Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PageRank all the things! - Dutch PHP Conference...

PageRank all the things! - Dutch PHP Conference 2019

Joind.in: https://joind.in/talk/fb7e9
Video recording: https://www.youtube.com/watch?v=AeZJnG9lfRs

Most people know PageRank as Google’s algorithm for ranking search results, but it’s uses extend far beyond only that: PageRank has already been utilised for analysing social networks, finding the most important functions in source code, predicting traffic, and deriving a more accurate ranking table of teams in an ongoing sports competition. In this session we will cover the basics of linear algebra, developing an intuitive notion of how matrices and vectors interact, and use it to understand the principles of PageRank. Then we’ll jump straight into real-life applications of PageRank beyond web search and how these can be implemented in PHP using the math-php library.

Arnout Boks

June 08, 2019
Tweet

More Decks by Arnout Boks

Other Decks in Programming

Transcript

  1. @arnoutboks #dpc19 Story time # Team Pnt 1 DEO 2

    30 2 Netwerk 1 24 3 Punch 6 24 4 Delta 5 23 5 Red Stars 1 23 6 Delta 4 17 7 Kalinko 6 12 8 Kratos 7 7 9 Punch 9 6 10 Sovicos 4 4 next match: Netwerk 1 – Delta 4
  2. @arnoutboks #dpc19 Story time # Team Pnt 1 DEO 2

    30 2 Netwerk 1 24 3 Punch 6 24 4 Delta 5 23 5 Red Stars 1 23 6 Delta 4 17 7 Kalinko 6 12 8 Kratos 7 7 9 Punch 9 6 10 Sovicos 4 4 Can we account for the fact that some teams have yet played against weaker opposition than others?
  3. @arnoutboks #dpc19 To be continued… $ git stash Saved working

    directory and index state WIP on master: 5002d47 PageRank all the things! HEAD is now at 5002d47 PageRank all the things!
  4. @arnoutboks #dpc19 Vectors v = 8 3 -4 1 4

    dimension 4 (“4-vector”)
  5. @arnoutboks #dpc19 Matrices 8 0 42 3 3 7 4

    -7 1 1 2 6 M = 4 dimension 4, 3 (“4x3-matrix”) 3
  6. @arnoutboks #dpc19 Matrix-vector multiplication Mv = 8 x 8 0

    x 3 42 x -4 3 x 8 3 x 3 7 x -4 4 x 8 -7 x 3 1 x -4 1 x 8 2 x 3 6 x -4 8 3 -4
  7. @arnoutboks #dpc19 Matrix-vector multiplication Mv = 64 + 0 +

    -168 24 + 9 + -28 32 + -21 + -4 8 + 6 + -24
  8. @arnoutboks #dpc19 Matrix-vector multiplication Mv = 64 + 0 +

    -168 24 + 9 + -28 32 + -21 + -4 8 + 6 + -24 = -104 5 7 -10
  9. @arnoutboks #dpc19 Matrix-vector multiplication 8 0 42 3 3 7

    4 -7 1 1 2 6 8 3 -4 Mv = = -104 5 7 -10
  10. @arnoutboks #dpc19 Matrix-vector multiplication 8 0 42 3 3 7

    4 -7 1 1 2 6 8 3 -4 Mv = = -104 5 7 -10 “4x3-matrix multiplied by a 3-vector yields a 4-vector”
  11. @arnoutboks #dpc19 Chaining transformations M 4x3-matrix 4D 3D N 2x4-matrix

    2D Is there a 2x3-matrix describing this transformation?
  12. @arnoutboks #dpc19 Matrix multiplication 8 0 42 3 3 7

    4 -7 1 1 2 6 NM = 1 0 -2 1 3 1 0 -1
  13. @arnoutboks #dpc19 Matrix multiplication 8 0 42 3 3 7

    4 -7 1 1 2 6 NM = 1 0 -2 1 3 1 0 -1 8 3 4 1 = 1 27
  14. @arnoutboks #dpc19 Matrix multiplication 8 0 42 3 3 7

    4 -7 1 1 2 6 NM = 1 0 -2 1 3 1 0 -1 0 3 -7 2 = 1 16 27 1
  15. @arnoutboks #dpc19 Matrix multiplication 8 0 42 3 3 7

    4 -7 1 1 2 6 NM = 1 0 -2 1 3 1 0 -1 42 7 1 6 = 1 16 46 27 1 127
  16. @arnoutboks #dpc19 Matrix multiplication 8 0 42 3 3 7

    4 -7 1 1 2 6 NM = 1 0 -2 1 3 1 0 -1 = 1 16 46 27 1 127
  17. @arnoutboks #dpc19 Matrix multiplication 8 0 42 3 3 7

    4 -7 1 1 2 6 NM = 1 0 -2 1 3 1 0 -1 = 1 16 46 27 1 127 “2x4-matrix multiplied by 4x3-matrix yields a 2x3-matrix”
  18. @arnoutboks #dpc19 Matrix multiplication as chained transformation M 4x3-matrix 4D

    3D N 2x4-matrix 2D 3D NM 2x3-matrix 2D N(Mv) = (NM)v
  19. @arnoutboks #dpc19 The PageRank of a page depends on the

    PageRank of the pages linking to it
  20. @arnoutboks #dpc19 Approach Let n be the number of web

    pages Let s be an n-vector of scores for these pages Let M be an n×n-matrix describing the dependencies of scores
  21. @arnoutboks #dpc19 Approach Let n be the number of web

    pages Let s be an n-vector of scores for these pages Let M be an n×n-matrix describing the dependencies of scores s = Ms
  22. @arnoutboks #dpc19 Creating the matrix M A B C D

    A 0 0 0 0 B 0 0 0 0 C 0 0 0 0 D 1 0 0 0 Outbound 1 0 0 0 A D
  23. @arnoutboks #dpc19 Creating the matrix M A D A B

    C D A 0 1 0 0 B 1 0 1 0 C 0 1 0 1 D 1 1 0 0 Outbound 2 3 1 1 B C
  24. @arnoutboks #dpc19 Creating the matrix M A D A B

    C D A 0 1/3 0 0 B 1/2 0 1 0 C 0 1/3 0 1 D 1/2 1/3 0 0 Outbound 2 3 1 1 B C
  25. @arnoutboks #dpc19 Equation s = Ms A B C D

    A 0 1/3 0 0 B 1/2 0 1 0 C 0 1/3 0 1 D 1/2 1/3 0 0 sA sB sC sD M s
  26. @arnoutboks #dpc19 Equation s = Ms A B C D

    A 0 1/3 0 0 B 1/2 0 1 0 C 0 1/3 0 1 D 1/2 1/3 0 0 sA sB sC sD M s sA sB sC sD
  27. @arnoutboks #dpc19 Equation s = Ms A B C D

    A 0 1/3 0 0 B 1/2 0 1 0 C 0 1/3 0 1 D 1/2 1/3 0 0 sA sB sC sD M s sA sB sC sD =
  28. @arnoutboks #dpc19 Equation s = Ms A B C D

    A 0 1/3 0 0 B 1/2 0 1 0 C 0 1/3 0 1 D 1/2 1/3 0 0 sA sB sC sD M s sA sB sC sD = “The higher A scores, the more ‘points’ D gets for being one of the two pages A links to”
  29. @arnoutboks #dpc19 Eigenvalue problem Q: Does there exist a vector

    s such that for the given matrix M? s = Ms
  30. @arnoutboks #dpc19 Eigenvalue problem Q: Does there exist a vector

    s and a number λ such that for the given matrix M? λs = Ms
  31. @arnoutboks #dpc19 Eigenvalue problem Q: Does there exist a vector

    s and a number λ such that for the given matrix M? λs = Ms A: Yes, with λ = 1 Fine print: under certain technical conditions. Lookup "Perron–Frobenius theorem” if you’re interested.
  32. @arnoutboks #dpc19 PageRank as simulated surfing • Someone starts surfing

    somewhere on the internet • On every page, they follow a random link • What page is shown when the buzzer goes? • Score of a page is the probability of ending up on it
  33. @arnoutboks #dpc19 PageRank as simulated surfing A B C D

    A 0 1/3 0 0 B 1/2 0 1 0 C 0 1/3 0 1 D 1/2 1/3 0 0 sA sB sC sD M s sA sB sC sD s = “When on page A, there’s a ½ chance of going to D”
  34. @arnoutboks #dpc19 Power Method M s(n) s(n+1) transition probabilities probability

    per page after n clicks probability per page after n+1 clicks
  35. @arnoutboks #dpc19 Power Method M s(n) s(n+1) transition probabilities probability

    per page after n clicks probability per page after n+1 clicks s(0) reasonable initial guess
  36. @arnoutboks #dpc19 Power Method is proven to converge to the

    eigenvector (for a matrix M like we have)
  37. @arnoutboks #dpc19 Implementation in PHP <?php use Aboks\PowerIteration\PowerIteration; use MathPHP\LinearAlgebra\Matrix;

    $m = new Matrix([/* ... */]); $pi = new PowerIteration(); $pair = $pi->getDominantEigenpair($m); var_dump($pair->getEigenvector());
  38. @arnoutboks #dpc19 Behind the scenes <?php function getEigenvector(Matrix $m): Vector

    { $ones = array_fill(0, $m->getM(), 1); $v = new Vector($ones); for ($i = 0; $i < 1000; $i++) { $v = $m->vectorMultiply($v); } return $v; }
  39. @arnoutboks #dpc19 Implementation concerns Stopping criterion • Number of iterations

    • Eigenvector tolerance Scaling intermediate results Preventing reducible matrices
  40. @arnoutboks #dpc19 Preventing reducible matrices When moving on to a

    new page: • α probability of following a link • (1- α) probability of ‘teleporting’ to a random page
  41. @arnoutboks #dpc19 Preventing reducible matrices A B C D A

    (1/n)(1 – α) (1/n)(1 – α) + 1/3α (1/n)(1 – α) (1/n)(1 – α) B (1/n)(1 – α) + 1/2α (1/n)(1 – α) (1/n)(1 – α) + 1α (1/n)(1 – α) C (1/n)(1 – α) (1/n)(1 – α) + 1/3α (1/n)(1 – α) (1/n)(1 – α) + 1α D (1/n)(1 – α) + 1/2α (1/n)(1 – α) + 1/3α (1/n)(1 – α) (1/n)(1 – α) M’i,j = (1/n)(1 – α) + αMi,j
  42. @arnoutboks #dpc19 PageRank in general Analysis of networks/directed graphs: •

    Edge A → B increases score of B • The higher the score of A, the higher the score of B
  43. @arnoutboks #dpc19 Back to our story… $ git stash pop

    On branch master Changes to be committed: (use "git reset HEAD <file>..." to unstage) new file: volleyball-story.txt Dropped refs/stash@{0}(b180f4)
  44. @arnoutboks #dpc19 The results # Team Pnt 1 DEO 2

    30 2 Netwerk 1 24 3 Punch 6 24 4 Delta 5 23 5 Red Stars 1 23 6 Delta 4 17 7 Kalinko 6 12 8 Kratos 7 7 9 Punch 9 6 10 Sovicos 4 4
  45. @arnoutboks #dpc19 The results # Team Pnt 1 DEO 2

    30 2 Netwerk 1 24 3 Punch 6 24 4 Delta 5 23 5 Red Stars 1 23 6 Delta 4 17 7 Kalinko 6 12 8 Kratos 7 7 9 Punch 9 6 10 Sovicos 4 4 # Team % 1 Punch 6 25.79 2 DEO 2 16.86 3 Red Stars 1 15.31 4 Delta 5 14.13 5 Delta 4 11.03 6 Netwerk 1 8.79 7 Kalinko 6 3.13 8 Kratos '08 7 2.49 9 Punch 9 1.24 10 Sovicos 4 1.23
  46. @arnoutboks #dpc19 Feedback & Questions @arnoutboks @arnoutboks @aboks Arnout Boks

    Please leave your feedback on joind.in: https://joind.in/talk/fb7e9 We’re hiring!
  47. @arnoutboks #dpc19 Image Credits • https://www.flickr.com/photos/42283496@N08/4011391702/ • https://pixabay.com/illustrations/google-search-engine-browser- search-76517/ •

    https://unsplash.com/photos/68ZlATaVYIo • https://imgur.com/gallery/SLHBV • https://www.flickr.com/photos/125329869@N03/14418114781 • https://unsplash.com/photos/ZiQkhI7417A