Factorization Models in Recommender Systems

  • Published on 15-Jul-2015

Transcript

  • 15 … 2013

    1

  • SVD and Factorization Machines

    2

  • Outline

    SVD
    Factorization Machines

    3

  • Problem statement

    Data: {(u, i, r_ui)}_{(u,i)∈R}, where u is a user, i is an item, and r_ui is the rating (feedback) of user u for item i; R is the set of observed user/item pairs; R(u) = {i : (u, i) ∈ R}.

    Tasks:
    • rating prediction: estimate r̂_ui for unseen pairs (u, i);
    • recommendation: i* = arg max_i r̂_ui;
    • item similarity: sim(i1, i2);
    • ...

    4

  • SVD

    r̂_ui = μ + b_u + b_i        (biases)
           + ⟨p_u, q_i⟩          (personal part)

    where μ, b_u, b_i are the global, per-user, and per-item biases;
    p_u ∈ R^d is the latent (trait) vector of user u;
    q_i ∈ R^d is the latent (trait) vector of item i;
    d is the dimension of the latent space, a hyperparameter.

    Parameters: Θ = {p_u, q_i, b_u, b_i}

    5

  • SVD: the objective

    J(Θ) = Σ_{(u,i)∈R} l(r_ui, r̂_ui(Θ)) + Ω(Θ) → min_Θ

    With the quadratic loss:

    J(Θ) = Σ_{(u,i)∈R} (r_ui − r̂_ui(Θ))² + Ω(Θ)

    Other losses can be used as well: ordinal regression, pairwise losses, ...

    Regularizer:

    Ω(Θ) = λ_bias (Σ_u b_u² + Σ_i b_i²) + λ_user Σ_u ‖p_u‖² + λ_item Σ_i ‖q_i‖²

    6
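
    To make the model and the objective concrete, here is a minimal numpy sketch; the array sizes, initialization scale, regularization constants, and the toy ratings are illustrative assumptions, not part of the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, d = 100, 50, 10            # toy sizes (assumption)

    mu = 3.5                                     # global bias
    b_user = np.zeros(n_users)                   # b_u
    b_item = np.zeros(n_items)                   # b_i
    P = 0.1 * rng.standard_normal((n_users, d))  # p_u, latent user vectors
    Q = 0.1 * rng.standard_normal((n_items, d))  # q_i, latent item vectors

    def predict(u, i):
        # r_hat_ui = mu + b_u + b_i + <p_u, q_i>
        return mu + b_user[u] + b_item[i] + P[u] @ Q[i]

    def objective(ratings, lam_bias=0.1, lam_user=0.1, lam_item=0.1):
        # J(Theta): squared loss over the observed triples plus the regularizer
        loss = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
        reg = (lam_bias * (b_user @ b_user + b_item @ b_item)
               + lam_user * np.sum(P * P) + lam_item * np.sum(Q * Q))
        return loss + reg

    ratings = [(0, 1, 4.0), (0, 2, 3.0), (5, 1, 5.0)]  # toy observed (u, i, r_ui)
    print(objective(ratings))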

  • SVD in 2D: the Yahoo! Music dataset

    [Figure: popular artists (Linkin Park, Green Day, Nelly, Red Hot Chili Peppers, 50 Cent, Metallica, Bob Marley, ...) and genres (Pop, R&B, Rock, Rap, Hip-Hop, Disco, ...) embedded in two dimensions; legend: Artists / Genres.]

    Figure 1: The most popular musical tracks and genres in the Yahoo! Music dataset are embedded into a 2-dimensional space. The open cone suggests the region of highly preferred items for the user (her logo is at the center of the cone). Note how the learned embedding separates Rock and similar items from Hip-Hop and similar items. The low dimensionality (which is required for a visualization) causes a small number of items to be wrongly folded near less related items (e.g., Bob Marley).

    loss of accuracy. If the item vectors were normalized, the RoR problem would have been reduced to the well studied nearest neighbor search (as explained in section 3). However, such a normalization will introduce a distortion on the balance between the original item trait vector and the item bias which constitute the concatenated vector q_i. This would evidently result in an incorrect solution. Denoting the concatenated user and item vectors as p_u and q_i respectively, and the effective rating for the task of retrieval as r̂_ui, RoR reduces to the following task: given a user query p_u, we want to find an item q_i ∈ S such that:

    p_u^T q_i = max_{q∈S} p_u^T q    (6)

    Hence the RoR task is equivalent to the problem of finding the best-match for a query in a set of points with respect to the dot-product (described in equation 1). A very simplistic visualization of this task is depicted in Figure 1. For the given user, the best recommendations (in this case songs) lie within the open cone around the user vector (maximizing the cos(p_u, q_i) term) and are as far as possible from the origin (maximizing the ‖q_i‖ term).

    3. ALGORITHMS FOR FINDING BEST-MATCHES

    Efficiently finding the best match using the dot-product (equation 6) appears to be very similar to much existing work in the literature. Finding the best match with respect to the Euclidean (or more generally L_p) distance is the widely studied problem of nearest-neighbor search in metric spaces [4]. The nearest-neighbor search problem (in metric space) can be solved approximately with the popular locality-sensitive hashing (LSH) method [9]. LSH has been extended to other forms of similarity functions (as opposed to the distance as a dissimilarity function) like the cosine similarity [3]. In this section, we show that the problem stated in equation 6 is different from these existing problems.

    Nearest-neighbor search in metric space. The problem of finding the nearest neighbor in this setting is to find a point q_i ∈ S for a query p_u such that:

    q_i = argmin_{q∈S} ‖p_u − q‖₂ = argmax_{q∈S} (p_u^T q − ‖q‖²/2)
        ≠ argmax_{q∈S} p_u^T q   (unless ‖q‖ = const ∀ q ∈ S).

    If all the points in S are normalized to the same length, then the problem of finding the best match with respect to the dot-product is equivalent to the problem of nearest-neighbor search in any metric space. However, without this restriction, the two problems can yield very different answers.

    Cosine similarity. Finding the best match with respect to the cosine similarity is to find a point q_i ∈ S for a query p_u such that

    q_i = argmax_{q∈S} p_u^T q / (‖p_u‖ ‖q‖) = argmax_{q∈S} p_u^T q / ‖q‖
        ≠ argmax_{q∈S} p_u^T q   (unless ‖q‖ = const ∀ q ∈ S).

    As in the previous case, the best match with cosine similarity is the best match with dot-products if all the points in the set S are normalized to the same length. Under general conditions, the best matches with these two similarity functions can be very different.
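
    A quick numeric illustration of the two inequalities above, on random data (purely an assumption for demonstration): once the points in S have different norms, the dot-product winner, the nearest neighbor, and the cosine winner generally disagree.

    import numpy as np

    rng = np.random.default_rng(1)
    S = rng.standard_normal((1000, 8))
    S *= rng.uniform(0.1, 5.0, size=(1000, 1))   # give the points very different norms
    p = rng.standard_normal(8)                   # the query

    best_dot = np.argmax(S @ p)                                # max dot-product
    best_nn = np.argmin(np.linalg.norm(S - p, axis=1))         # nearest neighbor
    best_cos = np.argmax((S @ p) / np.linalg.norm(S, axis=1))  # max cosine

    print(best_dot, best_nn, best_cos)   # typically three different indices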

    Locality-sensitive hashing. LSH involves constructing hashing functions h which satisfy the following for any pair of points q, p ∈ R^D:

    Pr[h(q) = h(p)] = sim(q, p),   (7)

    where sim(q, p) ∈ [0, 1] is the similarity function of interest. For our situation, we can scale our dataset such that ∀ q ∈ S, ‖q‖ ≤ 1 and assume that the data is in the first quadrant (such as in non-negative matrix factorization models [19]). In that case, sim(q, p) = q^T p ∈ [0, 1] is our similarity function of interest. For any similarity function to admit a locality-sensitive hash function family (as defined in equation 7), the distance function d(q, p) = 1 − sim(q, p) must satisfy the triangle inequality (Lemma 1 in [3]). However, the distance function d(q, p) = 1 − q^T p does not satisfy the triangle inequality. Hence LSH cannot be applied to the dot-product similarity function even in restricted domains (the first quadrant).

    3.1 Why is finding the maximum dot-product harder?

    Unlike the distance functions in metric space, dot-products do not induce any form of triangle inequality (even under some assumptions as mentioned in the previous section). Moreover, this lack of any induced triangle inequality causes the similarity function induced by the dot-products to have no admissible family of locality-sensitive hashing functions. Any modification to the similarity function to conform to widely used similarity functions (like Euclidean distance or cosine similarity) will create inaccurate results.

    7

  • Matrix factorization

    Let R ∈ R^{n×m}. If rank(R) = k, then there exist X ∈ R^{n×k}, Y ∈ R^{m×k} such that

    R = XY^T

    If rank(R) > k, we look for an approximation:

    R ≈ XY^T

    ‖R − XY^T‖_F → min_{X,Y} is solved exactly by the truncated SVD.

    If some entries of R are unobserved, a weighted version is used (with w_ij = 0 on missing entries; ‖·‖_W is the weighted Frobenius norm):

    Σ_{i,j} w_ij (r_ij − x_i^T y_j)² = ‖R − XY^T‖²_W → min_{X,Y}

    8
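
    For the fully observed case, a minimal numpy sketch (random R, placeholder sizes): by the Eckart–Young theorem the truncated SVD attains the minimum of ‖R − XY^T‖_F over all rank-k factor pairs.

    import numpy as np

    rng = np.random.default_rng(2)
    R = rng.standard_normal((20, 15))
    k = 5

    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    X = U[:, :k] * s[:k]   # n x k  (singular values folded into X)
    Y = Vt[:k].T           # m x k

    print(np.linalg.norm(R - X @ Y.T, "fro"))  # the minimal rank-k error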


  • Non-uniqueness of the factorization

    For any invertible A ∈ R^{k×k}:

    R = XY^T = XAA^{-1}Y^T = (XA)(YA^{-T})^T

    • The truncated SVD is unique (under mild conditions)¹
    • If rank(W) = 1, the weighted problem reduces to an unweighted one
    • If rank(W) > 1, there is no closed-form solution; iterative methods are used¹

    ¹ Srebro, Jaakkola. Weighted Low-Rank Approximations (ICML 2003).

    9
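
    A short numeric check of this invariance; the factors and A are random (a random square matrix is invertible with probability 1):

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((6, 3))
    Y = rng.standard_normal((4, 3))
    A = rng.standard_normal((3, 3))          # invertible with probability 1

    X2, Y2 = X @ A, Y @ np.linalg.inv(A).T
    print(np.allclose(X @ Y.T, X2 @ Y2.T))   # True: same product, different factors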

  • Matrix completion

    Given a matrix M = (m_ij) with observed entries {m_ij}_{(i,j)∈Ω} and rank M = r.

    Theorem (informal): if the number of observed entries m = |Ω| satisfies m ≥ C n^{5/4} r log n, then with probability at least 1 − c n^{−3} log n the matrix M can be recovered exactly.²

    ² Candès, Recht. Exact Matrix Completion via Convex Optimization (2009).

    10

  • Optimization: SGD

    The contribution of a single pair (u, i) to the objective:

    J_{(u,i)}(Θ) = l(r_ui, r̂_ui(Θ)) + Ω_{ui}(Θ)

    Algorithm:
    • initialize Θ (e.g., at random);
    • pick a pair (u, i) ∈ R;
    • compute r̂_ui and the loss on this pair;
    • take a gradient step:

    p_u^new = p_u − η (∂l(r_ui, r̂_ui)/∂r̂_ui · ∂r̂_ui/∂p_u + ∂Ω_ui/∂p_u)
    q_i^new = q_i − η (∂l(r_ui, r̂_ui)/∂r̂_ui · ∂r̂_ui/∂q_i + ∂Ω_ui/∂q_i)

    • Simple, fast, ...

    11

  • Optimization: SGD

    J_{(u,i)}(b_u, b_i, p_u, q_i) = (r_ui − r̂_ui)² + λ_bias (b_u² + b_i²) + λ_user ‖p_u‖² + λ_item ‖q_i‖²

    r̂_ui(b_u, b_i, p_u, q_i) = μ + b_u + b_i + ⟨p_u, q_i⟩

    Algorithm:
    • initialize the parameters (e.g., at random);
    • pick a pair (u, i) ∈ R;
    • compute r̂_ui;
    • take a gradient step:

    e_ui = r_ui − r̂_ui
    p_u^new = p_u + η (e_ui q_i − λ_user p_u)
    q_i^new = q_i + η (e_ui p_u − λ_item q_i)
    b_u^new = b_u + η (e_ui − λ_bias b_u)
    b_i^new = b_i + η (e_ui − λ_bias b_i)

    • Simple, fast, ...

    11
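
    A minimal, self-contained sketch of these updates in numpy; the learning rate, regularization constants, sizes, and toy ratings are placeholder assumptions:

    import numpy as np

    def sgd_epoch(ratings, mu, b_user, b_item, P, Q,
                  eta=0.01, lam_bias=0.1, lam_user=0.1, lam_item=0.1):
        # One pass over the observed (u, i, r_ui) triples, updating in place.
        for u, i, r in ratings:
            e = r - (mu + b_user[u] + b_item[i] + P[u] @ Q[i])   # e_ui
            b_user[u] += eta * (e - lam_bias * b_user[u])
            b_item[i] += eta * (e - lam_bias * b_item[i])
            p_old = P[u].copy()          # use the old p_u when updating q_i
            P[u] += eta * (e * Q[i] - lam_user * P[u])
            Q[i] += eta * (e * p_old - lam_item * Q[i])

    # toy usage
    rng = np.random.default_rng(4)
    P = 0.1 * rng.standard_normal((10, 5)); Q = 0.1 * rng.standard_normal((8, 5))
    b_user = np.zeros(10); b_item = np.zeros(8)
    ratings = [(0, 1, 4.0), (2, 1, 2.0), (0, 3, 5.0)]
    for _ in range(100):
        sgd_epoch(ratings, 3.5, b_user, b_item, P, Q)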

  • Optimization: SGD in practice

    • the prediction can be passed through a transfer function:
      r̂_ui = f(μ + b_u + b_i + ⟨p_u, q_i⟩)
    • the learning rate η should decay over iterations³;
    • ...
    • Vowpal Wabbit⁴ contains an SGD implementation of SVD.

    ³ Bottou. Stochastic Gradient Learning in Neural Networks (1991).
    ⁴ https://github.com/JohnLangford/vowpal_wabbit

    12

  • Optimization: ALS

    For simplicity, drop the biases:

    Σ_{(u,i)∈R} (r_ui − p_u^T q_i)² + λ (Σ_u ‖p_u‖² + Σ_i ‖q_i‖²) → min

    Fix {q_i}; the problem then splits into independent subproblems, one per user:

    Σ_{i∈R(u)} (r_ui − p_u^T q_i)² + λ ‖p_u‖² → min_{p_u}   for each u

    • Alternate between updating {p_u} with {q_i} fixed and vice versa
    • Easy to parallelize, e.g., with MapReduce

    13

  • ALS: updating p_u

    Fix {q_i}. The optimal p_u has a closed form:

    p*_u = (Q_u^T Q_u + λI)^{-1} Q_u^T r_u = W^u d_u,
    where W^u = (Q_u^T Q_u + λI)^{-1},  d_u = Q_u^T r_u = Σ_{j∈R(u)} q_j r_uj

    Here Q_u is the matrix whose rows are the q_i with i ∈ R(u), and r_u is the vector of the user's ratings.

    • The solve for each user/item is independent of all the others.
    • ...

    14
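
    A minimal numpy version of this closed-form step, using a linear solve instead of an explicit inverse (the toy user and λ are assumptions):

    import numpy as np

    def als_user_step(r_u, Q_u, lam):
        # p*_u = (Q_u^T Q_u + lam*I)^{-1} Q_u^T r_u
        # Q_u: |R(u)| x d, rows are the q_i with i in R(u); r_u: the user's ratings.
        d = Q_u.shape[1]
        A = Q_u.T @ Q_u + lam * np.eye(d)
        return np.linalg.solve(A, Q_u.T @ r_u)

    # toy usage: one user who rated 3 items in a d = 4 model
    rng = np.random.default_rng(5)
    Q_u = rng.standard_normal((3, 4))
    r_u = np.array([5.0, 3.0, 4.0])
    print(als_user_step(r_u, Q_u, lam=0.1))

    Each user's solve (and, symmetrically, each item's) is independent of the others, which is what makes the parallelization mentioned on the previous slide straightforward; the cost is O(d²|R(u)| + d³) per user.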

  • SVD as a neighborhood model

    r̂_ui = q_i^T p*_u = Σ_{j∈R(u)} q_i^T W^u q_j r_uj = Σ_{j∈R(u)} r_uj ⟨q_i, q_j⟩_{W^u}

    W^u is positive definite, so it can be factored as W^u = V_u^T V_u. Then:

    r̂_ui = Σ_{j∈R(u)} r_uj ⟨V_u q_i, V_u q_j⟩

    • sim_u(i, j) = ⟨q_i, q_j⟩_{W^u} acts as a user-specific item similarity;
    • V_u is a user-specific linear transformation of the item space.

    15
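
    A sketch of this user-specific similarity in numpy, obtaining V_u from a Cholesky factorization of W^u (the function name and toy inputs are assumptions):

    import numpy as np

    def user_item_similarity(Q_u, Q, lam):
        # sim_u(i, j) = <q_i, q_j>_{W^u},  W^u = (Q_u^T Q_u + lam*I)^{-1} = V_u^T V_u
        d = Q_u.shape[1]
        W_u = np.linalg.inv(Q_u.T @ Q_u + lam * np.eye(d))
        V_u = np.linalg.cholesky(W_u).T      # upper factor: W^u = V_u^T V_u
        Z = Q @ V_u.T                        # row i is (V_u q_i)^T
        return Z @ Z.T                       # entry [i, j] = sim_u(i, j)

    rng = np.random.default_rng(6)
    Q = rng.standard_normal((8, 4))          # all item vectors
    sim = user_item_similarity(Q[[1, 3, 5]], Q, lam=0.1)
    print(sim.shape)                         # (8, 8)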

  • Example: a user with a single rating

    Suppose the user rated a single item i, with r_ui = 1: r_u = (1). Then:

    p* = (q_i q_i^T + λI)^{-1} q_i

    Prediction for any item j:

    r̂_j = q_j^T (q_i q_i^T + λI)^{-1} q_i = {Sherman–Morrison formula} =
        = ⟨q_j, q_i⟩ / (λ + ‖q_i‖²)

    • So the induced similarity is sim(i, j) ∝ ⟨q_j, q_i⟩
    • sim(i, j): ...

    16
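
    A quick numeric check of the Sherman–Morrison step above (random vectors, placeholder λ):

    import numpy as np

    rng = np.random.default_rng(7)
    q_i = rng.standard_normal(5)
    q_j = rng.standard_normal(5)
    lam = 0.1

    lhs = q_j @ np.linalg.solve(np.outer(q_i, q_i) + lam * np.eye(5), q_i)
    rhs = (q_j @ q_i) / (lam + q_i @ q_i)    # the closed form from the slide
    print(np.allclose(lhs, rhs))             # True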

  • Implicit SVD

    Σ_u Σ_i c_ui (r_ui − r̂_ui)² + Ω(Θ) → min

    The sum runs over all pairs; we set c_ui = 1 and r_ui = 0 for (u, i) ∉ R.

    ALS still works:
    • p*_u = (Q^T C^u Q + λI)^{-1} Q^T C^u r_u
    • Q^T C^u Q = Q^T Q + Q^T (C^u − I) Q
    • Q^T Q does not depend on u and is computed once
    • (C^u − I) has only |R(u)| nonzero entries
    • C^u r_u has only |R(u)| nonzero entries

    Interpretation:
    • r_ui = 0 means "dislike", r_ui = 1 means "like";
    • for (u, i) ∉ R we assume a dislike (with low confidence).

    17
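
    A sketch of the resulting efficient per-user solve (function and variable names are assumptions): only the |R(u)| observed rows of Q are touched, and Q^T Q is shared across all users.

    import numpy as np

    def implicit_als_user_step(items, c, Q, QtQ, lam):
        # p*_u = (Q^T C^u Q + lam*I)^{-1} Q^T C^u r_u, using
        # Q^T C^u Q = Q^T Q + Q^T (C^u - I) Q  (the second term is |R(u)|-sparse).
        d = Q.shape[1]
        Q_u = Q[items]                                   # |R(u)| x d
        A = QtQ + Q_u.T @ ((c - 1.0)[:, None] * Q_u) + lam * np.eye(d)
        b = Q_u.T @ c                                    # Q^T C^u r_u (r_ui = 1 on R(u))
        return np.linalg.solve(A, b)

    rng = np.random.default_rng(8)
    Q = rng.standard_normal((100, 8))
    QtQ = Q.T @ Q                                        # computed once per sweep
    p_u = implicit_als_user_step(np.array([3, 17, 42]),
                                 np.array([5.0, 2.0, 9.0]), Q, QtQ, lam=0.1)
    print(p_u)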

  • Max inner-product search

    For a fixed user u:  i* = arg max_i ⟨q_i, p_u⟩

    • exhaustive search is expensive when the item catalogue is large;
    • it is not nearest-neighbor search:
      arg min_{r∈S} ‖p − r‖²₂ = arg max_{r∈S} (⟨p, r⟩ − ½ ‖r‖²₂)
    • it is not maximum cosine similarity either:
      arg max_{r∈S} ⟨q, r⟩ / (‖q‖ ‖r‖) = arg max_{r∈S} ⟨q, r⟩ / ‖r‖
    • LSH is not directly applicable;

    18
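
    Without a sublinear index, the baseline is an exact linear scan; a minimal top-k version in numpy (names and sizes are illustrative):

    import numpy as np

    def top_k_items(p_u, Q, k=10):
        # Exact max inner-product retrieval by brute force: O(|I| * d).
        scores = Q @ p_u
        top = np.argpartition(-scores, k)[:k]     # unordered top-k in linear time
        return top[np.argsort(-scores[top])]      # order only the k winners

    rng = np.random.default_rng(9)
    Q = rng.standard_normal((10000, 32))
    p_u = rng.standard_normal(32)
    print(top_k_items(p_u, Q, k=5))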
