

KOMIYA Atsushi
September 30, 2017

Ranking artifacts on the Maven central repository #渋谷java

Slides for the 20th #渋谷java meetup. The talk tries ranking the artifacts on the Maven central repository with PageRank.
https://shibuya-java.connpass.com/event/65433/



Transcript

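  1. Ranking artifacts
      • The Maven central repository holds more than 200,000 artifacts (not counting separate versions)
      • When choosing a library to build into an application, I want one with a track record of real use
      • I am the type who prefers to go along with the majority
      • I want a ranking of artifacts!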
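  2. Focusing on artifact dependencies
      • Hypothesis: "the more artifacts depend on an artifact, the more important that artifact is"
      • One option is to use each artifact's "number of incoming references" as the metric
          • The same idea as the "citation count" of a paper
      • Is there a better metric than a simple "number of incoming references"?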
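The simple "number of incoming references" metric from slide 2 can be sketched as follows. This is only an illustration: the class name `ReferenceCount` and the `{dependent, dependency}` pair layout are assumptions made for the example, not something taken from the talk.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReferenceCount {
    /**
     * Counts, for each artifact, how many dependency edges point at it.
     * Each edge is a two-element array {dependent, dependency} (hypothetical layout).
     */
    static Map<String, Integer> inDegree(List<String[]> edges) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] edge : edges) {
            // edge[1] is the artifact being depended on
            counts.merge(edge[1], 1, Integer::sum);
        }
        return counts;
    }
}
```

An artifact that nothing depends on never appears in the result, and every reference counts the same no matter how important the referring artifact is. That second limitation is what the next slide's metric addresses.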
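  3. PageRank
      • That thing from Google
      • Measures the importance of Web pages from the link relationships between them
      • Link relationships can be represented as a directed graph
      • Represent the dependencies between artifacts as a directed graph
          • Node: artifact; edge: dependency
          • Edges point from the depending artifact to the artifact being depended on
      • The higher an artifact's score, the more important it can be considered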
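To make the idea on slide 3 concrete, here is a minimal textbook power-iteration sketch of PageRank over a dependency graph. It is not the implementation used in the talk (which relies on GraphX, shown later), and its scores are normalized to sum to 1, so they are scaled differently from GraphX's output. The class name, the fixed iteration count, and the handling of nodes without outgoing edges are assumptions of this example.

```java
import java.util.Arrays;

final class PageRankSketch {
    /**
     * edges[i] = {from, to} with node IDs in 0..n-1. An edge points from the
     * depending artifact to the artifact it depends on.
     */
    static double[] pageRank(int n, int[][] edges, double damping, int iterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            // Rank held by nodes without outgoing edges is spread evenly over all nodes
            double dangling = 0.0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n + damping * dangling / n);
            // Each node passes an equal share of its rank along every outgoing edge
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }
}
```

For example, `PageRankSketch.pageRank(3, new int[][]{{0, 2}, {1, 2}}, 0.85, 50)` gives node 2, which both other nodes depend on, a higher score than nodes 0 and 1.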
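  4. How to collect the data?
      • Crawl every single POM file from https://repo1.maven.org/maven2/ ?
          • Counting separate versions, there are more than 2 million artifacts in total…
      • I want to download only the POM of each latest version
          • However, deciding which artifact is the latest version from the version information (a string) is a hassle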
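A small sketch of why judging the latest version from version strings is awkward. The helper below is hypothetical and deliberately naive: it only understands dot-separated integers, which is an assumption that many real version strings break.

```java
final class VersionCompare {
    /**
     * Compares two versions made only of dot-separated integers.
     * Anything else (qualifiers such as "-alpha2") is NOT handled.
     */
    static int compareNumeric(String a, String b) {
        String[] as = a.split("\\.");
        String[] bs = b.split("\\.");
        for (int i = 0; i < Math.max(as.length, bs.length); i++) {
            // Missing trailing segments are treated as 0, so "1" equals "1.0"
            int x = i < as.length ? Integer.parseInt(as[i]) : 0;
            int y = i < bs.length ? Integer.parseInt(bs[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }
}
```

Plain string comparison gets `"1.10".compareTo("1.9")` wrong (the result is negative, so 1.10 looks older), while `compareNumeric("1.10", "1.9")` is positive. The numeric version in turn throws `NumberFormatException` as soon as it has to parse a segment like `0-alpha2`, so a robust solution needs considerably more rules than this.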
  5. Using the index file
      • It turns out an index file covering every version of every artifact on the central repository is provided
          • https://maven.apache.org/repository/central-index.html
      • It consists of two files: a .properties file and a gzip-compressed file (over 300 MB)
      • Updated weekly
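  6. What the index file does and does not provide
      • Information available from the index file (partial list)
          • Group ID
          • Artifact ID
          • Version
          • Classifier (the sources / javadoc / linux-x86_64 kind of thing)
          • Last-modified timestamp of the artifact's file
              • This should be enough to identify the latest version of each artifact
      • Information not available from the index file
          • Dependencies between artifacts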
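One way to use the last-modified timestamp from slide 6 is to keep, for each group ID and artifact ID pair, the entry with the newest timestamp. The sketch below assumes that approach. The `Entry` class and its field names are illustrative and are not the indexer's own types, and entries carrying a classifier (sources, javadoc, and so on) are assumed to have been filtered out beforehand.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LatestVersions {
    /** One index entry; a hypothetical holder, not an indexer class. */
    static final class Entry {
        final String groupId;
        final String artifactId;
        final String version;
        final long lastModified; // epoch milliseconds

        Entry(String groupId, String artifactId, String version, long lastModified) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.lastModified = lastModified;
        }
    }

    /** Keeps, per "groupId:artifactId", the entry with the newest timestamp. */
    static Map<String, Entry> latestByTimestamp(List<Entry> entries) {
        Map<String, Entry> latest = new HashMap<>();
        for (Entry e : entries) {
            String key = e.groupId + ":" + e.artifactId;
            latest.merge(key, e, (a, b) -> a.lastModified >= b.lastModified ? a : b);
        }
        return latest;
    }
}
```

Note that "most recently modified" and "highest version" are not guaranteed to coincide, for instance when a maintenance release of an older line is published late, which may be why the slide only says this "should" work.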
  7. Scanning the index file
      • Use indexer-reader
          • group: 'org.apache.maven.indexer'
          • name: 'indexer-reader'
      • For concrete usage, see the implementation at the URL below
          • http://bit.ly/maven-indexer-demo
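  8. Dependencies between artifacts
      • There seems to be no way to get them other than reading the POM files on the Maven central repository
      • So there is nothing for it but to crawl them by brute force
      • Limiting the crawl to the latest version of each artifact makes it somewhat more bearable
          • That is still more than 200,000, though…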
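For the crawl described on slide 8, each POM can be located from its coordinates because the repository follows the standard Maven layout: group ID with dots turned into slashes, then artifact ID, then version, then `artifactId-version.pom`. A minimal sketch of that mapping follows; the class and method names are made up for this example.

```java
final class PomUrl {
    static final String BASE = "https://repo1.maven.org/maven2/";

    /** Builds the POM URL for the given coordinates using the standard repository layout. */
    static String of(String groupId, String artifactId, String version) {
        return BASE
                + groupId.replace('.', '/') + "/"
                + artifactId + "/"
                + version + "/"
                + artifactId + "-" + version + ".pom";
    }
}
```

For example, `PomUrl.of("junit", "junit", "4.12")` returns `https://repo1.maven.org/maven2/junit/junit/4.12/junit-4.12.pom`.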
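  9. Reading POM files
      • Use maven-model
          • group: 'org.apache.maven'
          • name: 'maven-model'

      import java.io.FileInputStream;
      import java.io.InputStream;
      import java.util.List;
      import org.apache.maven.model.Dependency;
      import org.apache.maven.model.Model;
      import org.apache.maven.model.io.xpp3.MavenXpp3Reader;

      public static void demo() throws Exception {
          try (InputStream in = new FileInputStream("path/to/pom.xml")) {
              Model model = new MavenXpp3Reader().read(in);
              // The dependencies declared in the POM can be obtained here
              List<Dependency> dependencies = model.getDependencies();
          }
      }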
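  10. Using Apache Spark / GraphX
      • GraphX
          • Provides an API for handling graphs and running computations on them on top of Spark
          • PageRank comes built in, with no fuss ❤
      • Given the size of this graph, it can be computed in local mode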
  11. Using Apache Spark / GraphX

      import org.apache.spark.SparkContext
      import org.apache.spark.graphx.GraphLoader

      def run(sc: SparkContext): Unit = {
        // A file that expresses each dependency as two numerically encoded
        // artifacts separated by a space
        val graph = GraphLoader.edgeListFile(sc, "path/to/dependency-graph.txt")

        // Compute PageRank
        val ranking = graph.pageRank(0.0001).vertices

        // Mapping from each artifact's numeric ID to its GAV (groupId|artifactId|version)
        val artifacts = sc.textFile("path/to/artifacts.txt").map { line =>
          val values = line.split(",")
          (values(0).toLong, values(1))
        }

        // Replace the numeric IDs with GAVs and write the result to a file
        artifacts.join(ranking)
          .map { case (id, (gav, rank)) => (gav, rank) }
          .sortBy(_._2, ascending = false)
          .map(t => t._1 + "," + t._2)
          .saveAsTextFile("path/to/result")
      }
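The two input files read on slide 11 have simple line formats: "sourceId destinationId" separated by a space for the edge list, and "id,GAV" for the mapping. How the talk produced them is not shown, so the following is only a sketch of one way to generate such lines. The class name and the sequential ID assignment are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EdgeListEncoder {
    // Insertion-ordered so that IDs are listed in the order they were assigned
    private final Map<String, Long> ids = new LinkedHashMap<>();
    private final List<String> edgeLines = new ArrayList<>();

    /** Returns the numeric ID for a GAV, assigning the next free ID on first sight. */
    long idOf(String gav) {
        Long id = ids.get(gav);
        if (id == null) {
            id = (long) ids.size();
            ids.put(gav, id);
        }
        return id;
    }

    /** Records one edge, from the depending artifact to the artifact it depends on. */
    void addDependency(String dependentGav, String dependencyGav) {
        long from = idOf(dependentGav);
        long to = idOf(dependencyGav);
        edgeLines.add(from + " " + to);
    }

    /** Lines for the edge-list file, one "from to" pair per line. */
    List<String> edgeLines() {
        return edgeLines;
    }

    /** Lines for the mapping file, one "id,GAV" pair per line. */
    List<String> artifactLines() {
        List<String> lines = new ArrayList<>();
        ids.forEach((gav, id) -> lines.add(id + "," + gav));
        return lines;
    }
}
```

Because the GAV is joined with `|` rather than `,`, splitting each mapping line on a comma, as the Scala code does, yields exactly the ID and the GAV.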
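  12. The dependency graph
      • Maven dependencies have a "scope"
          • compile, provided, runtime, test, system, import
      • Compute PageRank for each of the following scopes (or scope combinations)
          • All
          • compile
          • test
          • All (reversed: edges point from the artifact being depended on to the depending artifact)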
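The graph variants on slide 12 amount to filtering edges by scope and, for the last variant, flipping edge direction. A sketch under stated assumptions: the `Edge` class is hypothetical, and treating a missing scope as `compile` follows Maven's default but may not match how the talk handled it.

```java
import java.util.ArrayList;
import java.util.List;

final class ScopedEdges {
    /** A dependency edge; scope is null when the POM does not declare one. */
    static final class Edge {
        final String from;
        final String to;
        final String scope;

        Edge(String from, String to, String scope) {
            this.from = from;
            this.to = to;
            this.scope = scope;
        }
    }

    /** Keeps only the edges whose effective scope equals the requested one. */
    static List<Edge> withScope(List<Edge> edges, String scope) {
        List<Edge> out = new ArrayList<>();
        for (Edge e : edges) {
            // Assumption: an undeclared scope counts as Maven's default, "compile"
            String effective = e.scope == null ? "compile" : e.scope;
            if (effective.equals(scope)) {
                out.add(e);
            }
        }
        return out;
    }

    /** Flips every edge so that it points from the dependency back to its dependent. */
    static List<Edge> reversed(List<Edge> edges) {
        List<Edge> out = new ArrayList<>();
        for (Edge e : edges) {
            out.add(new Edge(e.to, e.from, e.scope));
        }
        return out;
    }
}
```

Running PageRank on the reversed graph rewards artifacts that depend on many others rather than artifacts that many others depend on.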
  13. About the ranking results
      • Only the Top 10 or Top 20 are presented here
      • Results down to the Top 100 are posted at the link below (Google Spreadsheet)
          • http://bit.ly/PackageRank
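  14. Ranking: all scopes (#1~10)
      Slide columns: PageRank, group, artifact, version. The numeric scores and version numbers are missing from this transcript; only textual version qualifiers survive.

       #   group             artifact          version
       1   junit             junit
       2   org.scala-lang    scala-compiler
       3   org.slf4j         slf4j-api         alpha
       4   org.mockito       mockito-core
       5   org.testng        testng
       6   org.scalatest     scalatest_…
       7   org.mockito       mockito-all       beta
       8   javax.servlet     servlet-api       alpha
       9   ch.qos.logback    logback-classic
      10   org.objenesis     objenesis

      http://bit.ly/PackageRank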
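  15. Ranking: all scopes (#11~20)
      Slide columns: PageRank, group, artifact, version. The numeric scores and version numbers are missing from this transcript; only textual version qualifiers survive.

       #   group             artifact            version
      11   javax.servlet     javax.servlet-api
      12   org.assertj       assertj-core
      13   log4j             log4j
      14   org.osgi          org.osgi.core
      15   org.slf4j         slf4j-log4j12       alpha
      16   org.scala-lang    scala-library
      17   net.bytebuddy     byte-buddy
      18   org.scalatest     scalatest_…
      19   net.bytebuddy     byte-buddy-agent
      20   org.slf4j         slf4j-simple        alpha

      http://bit.ly/PackageRank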
  16. Trends at the top of the ranking
      • Testing
          • junit, testng, scalatest, assertj, mockito
      • Languages
          • Scala (scala-compiler, scala-library)
      • Logging
          • slf4j, logback, log4j (not log4j2)
      • Other
          • objenesis, byte-buddy, servlet-api, org.osgi.core…
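  17. Ranking: compile
      Slide columns: PageRank, group, artifact, version. The numeric scores and version numbers are missing from this transcript; only textual version qualifiers survive.

       #   group                       artifact                             version
       1   org.scala-lang              scala-library
       2   org.slf4j                   slf4j-api                            alpha
       3   junit                       junit
       4   com.google.guava            guava
       5   org.antlr                   antlr-runtime (or antlr4-runtime)
       6   org.antlr                   stringtemplate
       7   com.google.code.gson        gson
       8   org.jetbrains               annotations
       9   com.google.code.findbugs    jsr305
      10   org.jetbrains.kotlin        kotlin-stdlib

      http://bit.ly/PackageRank-compile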
  18. Ranking: compile
      • Same table as slide 17, with a ❗ callout added after the last row (org.jetbrains.kotlin, kotlin-stdlib)
      • http://bit.ly/PackageRank-compile
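  19. Ranking: test
      Slide columns: PageRank, group, artifact, version. The numeric scores and version numbers are missing from this transcript; only textual version qualifiers survive.

       #   group                 artifact           version
       1   junit                 junit
       2   org.mockito           mockito-core
       3   org.slf4j             slf4j-api          alpha
       4   org.testng            testng
       5   org.scalatest         scalatest_…
       6   org.mockito           mockito-all        beta
       7   ch.qos.logback        logback-classic
       8   org.assertj           assertj-core
       9   org.slf4j             slf4j-log4j12      alpha
      10   org.spockframework    spock-core         groovy

      http://bit.ly/PackageRank-test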
  20. Ranking: all scopes (reversed)
      Slide columns: PageRank, group, artifact, version. Scores, version numbers, dots, hyphens and digits are missing from this transcript; names are restored where the coordinate is recognizable and otherwise left as extracted.

       1   org.apache.clerezza : platform.launcher.storageless.parent (incubating)
       2   org.qi4j.library : org.qi4j.library.shiro-web
       3   com.github.livesense : org.liveSense.assemblies
       4   org.apache.polygene.libraries : org.apache.polygene.library.shiro-web
       5   com.github.snowdream.android : widget
       6   org.bluestemsoftware.open.eoa.example.application.spring : ordermanagerapplication
       7   org.bluestemsoftware.open.eoa.example.application.spring : warehousemanagerapplication
       8   kr.pe.kwonnam.spymemcached-extra-transcoders : spymemcached-extra-transcoders-core
       9   org.apache.servicemix.bundles : org.apache.servicemix.bundles.aws-java-sdk
      10   me.tatarka.gsonvalue : gsonvalue

      http://bit.ly/PackageRank-inverted
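  21. Summary
      • Computed PageRank from artifact dependencies and tried ranking artifacts with it
          • The results look reasonably plausible… maybe?
      • Next, I would like to compute PageRank on only recently published artifacts with a short history
          • That might surface artifacts that are currently trending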