Jiro Kikkawa
November 22, 2018

Who added DOI Links to Wikipedia? by Jiro Kikkawa (@jir_o) #WikiCite / WikiCite2018

Hi again, WikiCite!
These are the slides for a lightning talk introducing my ongoing project.
The version on Dropbox may be more convenient (the contents are exactly the same, but the hyperlinks in the slides are clickable): https://www.dropbox.com/s/5yi1qczeqojqwcc/wikicite2018.pdf?dl=0
WikiCite 2018: https://meta.wikimedia.org/wiki/WikiCite_2018


  1. Who added DOI Links to Wikipedia?

    Jiro Kikkawa
    Graduate School of Library, Information and Media Studies, University of Tsukuba, JAPAN
    2018
    https://twitter.com/jir_o
    Mail: [email protected]
    Pokémon GO Friend Code: 4103 1517 3137
  2. Hi, Again! WikiCite

    • Last year, I talked about “DOI Links on English, Japanese, and Chinese Wikipedias”
      – Slide: https://speakerdeck.com/corgies/wikicite2017
      – Paper: http://hdl.handle.net/2241/00144576
    • Research interest: scholarly references on Wikipedia(s)
  3. Who added DOI Links to Wikipedia?

    • My ongoing project (as a Ph.D. study)
    • Building a dataset that
      – identifies “who” added DOI Links to the pages on English Wikipedia (as of 5 March 2017), and “when”
      – also identifies the “research fields” of each DOI Link by using Crossref metadata, ISSN, and Essential Science Indicators (Journal Categories)
    • Q. Who are the Top Editors for each Research Field?
  4. Dataset Overview

    | Type | No. of unique Editors | No. of total DOI Links | % of total DOI Links | No. of unique DOI Links | No. of unique Pages |
    |---|---|---|---|---|---|
    | User | 34,072 | 628,288 | 67.8% | 449,328 | 153,820 |
    | Bot | 21 | 264,569 | 28.5% | 181,236 | 59,942 |
    | IP (Not Login) | 16,349 | 33,946 | 3.7% | 28,787 | 14,499 |
    | ALL | - | 926,803 | 100.0% | 611,524 | 181,162 |

    * Main Namespace Only
  5. Top 3 Editors

    • All Fields (User)

    | Rank | User | # DOI Links | % |
    |---|---|---|---|
    | 1 | Rjwilmsi | 171,414 | 27.3 |
    | 2 | Boghog | 30,268 | 4.8 |
    | 3 | Chris_Capoccia | 16,022 | 2.6 |

    The editor “Rjwilmsi” alone added about one-quarter of ALL DOI Links added by Users on English Wikipedia (excluding IP users and Bots)
  6. Top Editors of 22 Research Fields (1/2)

    | No. | Research Field | # total DOI Links | Top Editor | # DOI Links | % |
    |---|---|---|---|---|---|
    | 1 | SOCIAL SCIENCES, GENERAL | 44,638 | Rjwilmsi | 16,852 | 37.8 |
    | 2 | ECONOMICS & BUSINESS | 9,344 | Rjwilmsi | 3,348 | 35.8 |
    | 3 | PSYCHIATRY/PSYCHOLOGY | 31,887 | Rjwilmsi | 12,471 | 39.1 |
    | 4 | Multidisciplinary | 61,670 | Rjwilmsi | 13,606 | 22.1 |
    | 5 | COMPUTER SCIENCE | 8,430 | Rjwilmsi | 2,249 | 26.7 |
    | 6 | ENVIRONMENT/ECOLOGY | 17,554 | Rjwilmsi | 5,556 | 31.7 |
    | 7 | CLINICAL MEDICINE | 87,107 | Rjwilmsi | 27,541 | 31.6 |
    | 8 | PHARMACOLOGY & TOXICOLOGY | 17,783 | Rjwilmsi | 5,079 | 28.6 |
    | 9 | IMMUNOLOGY | 11,570 | Rjwilmsi | 4,162 | 36.0 |
    | 10 | MOLECULAR BIOLOGY & GENETICS | 47,875 | Rjwilmsi | 13,115 | 27.4 |
    | 11 | MICROBIOLOGY | 14,684 | Rjwilmsi | 4,280 | 29.1 |
  7. Top Editors of 22 Research Fields (2/2)

    | No. | Research Field | # total DOI Links | Top Editor | # DOI Links | % |
    |---|---|---|---|---|---|
    | 12 | AGRICULTURAL SCIENCES | 6,602 | Rjwilmsi | 1,452 | 22.0 |
    | 13 | PLANT & ANIMAL SCIENCE | 56,609 | Rjwilmsi | 15,172 | 26.8 |
    | 14 | BIOLOGY & BIOCHEMISTRY | 48,969 | Rjwilmsi | 14,365 | 29.3 |
    | 15 | NEUROSCIENCE & BEHAVIOR | 29,602 | Rjwilmsi | 9,488 | 32.1 |
    | 16 | PHYSICS | 18,919 | Rjwilmsi | 2,723 | 14.4 |
    | 17 | SPACE SCIENCE | 28,927 | RJHall | 4,356 | 15.1 |
    | 18 | GEOSCIENCES | 23,377 | Rjwilmsi | 5,407 | 23.1 |
    | 19 | MATHEMATICS | 15,677 | Rjwilmsi | 5,078 | 32.4 |
    | 20 | MATERIALS SCIENCE | 4,478 | Material scientist | 742 | 16.6 |
    | 21 | ENGINEERING | 8,566 | Rjwilmsi | 2,042 | 23.8 |
    | 22 | CHEMISTRY | 34,020 | Rjwilmsi | 4,443 | 13.1 |
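A ranking like the one in these tables could be derived from the dataset records roughly as follows. This is only a sketch, not the author's code: it assumes each record carries the `user`, `anonymous`, `bot_flag`, and `research_field` keys shown in the dataset sample, and the function name `top_editor_per_field` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def top_editor_per_field(records):
    """Return {field: (editor, n_links, share_percent)} for logged-in, non-bot editors.

    Assumption: one record corresponds to one added DOI link, and a link that
    belongs to several research fields is counted once in each of them.
    """
    counts = defaultdict(Counter)
    for rec in records:
        if rec["bot_flag"] or rec["anonymous"]:
            continue  # keep only the "User" type
        for field in rec["research_field"]:
            counts[field][rec["user"]] += 1

    result = {}
    for field, counter in counts.items():
        editor, n_links = counter.most_common(1)[0]
        total = sum(counter.values())
        result[field] = (editor, n_links, round(100 * n_links / total, 1))
    return result


# Toy records (hypothetical editors), not taken from the real dataset.
records = [
    {"user": "EditorA", "anonymous": False, "bot_flag": False, "research_field": ["PHYSICS"]},
    {"user": "EditorA", "anonymous": False, "bot_flag": False, "research_field": ["PHYSICS", "CHEMISTRY"]},
    {"user": "EditorB", "anonymous": False, "bot_flag": False, "research_field": ["PHYSICS"]},
    {"user": "SomeBot", "anonymous": False, "bot_flag": True, "research_field": ["PHYSICS"]},
]

ranking = top_editor_per_field(records)
print(ranking["PHYSICS"])  # ('EditorA', 2, 66.7)
```

The percentage here is the top editor's share of the links added by Users in that field, which matches how the "%" column relates to "# total DOI Links" in the tables.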
  8. Q. Is this editor a Bot?

    User:Rjwilmsi - Wikipedia https://en.wikipedia.org/wiki/User:Rjwilmsi
    A. No (but really?)
  9. Findings

    As for Users,
    • 34,072 Editors added 628,288 DOI Links* to English Wikipedia as of 5 March 2017
    • The Top 3 Editors (All Fields) are Rjwilmsi, Boghog, and Chris_Capoccia
    • Rjwilmsi is the Top Editor in 20 out of 22 Research Fields!

    * Strictly speaking, these are not ALL DOI Links referenced on English Wikipedia, but the DOI Links whose Research Fields are identifiable by using ISSN and ESI journal category data
  10. Thank you for your attention :)

    • These results are just one part of my findings. I am now writing an original paper (in Japanese) using this dataset, and I would like to share the full results at the next WikiCite!
    • I will share the dataset, code, and papers with all of you as soon as possible after the paper is accepted and published. I hope this leads to future collaborations between members of the WikiCite community working on similar or related projects and scientometrics researchers.
    • If you are one of the editors I mentioned in this talk, please raise your hand or talk to me :)
  11. Who added DOI Links to Wikipedia?

    Jiro Kikkawa
    Graduate School of Library, Information and Media Studies, University of Tsukuba, JAPAN
    2018
    https://twitter.com/jir_o
    Mail: [email protected]
    Pokémon GO Friend Code: 4103 1517 3137
  12. Dataset Sample (JSON)

    {
      "doiname": "10.1126/science.1097859",
      "ns": "0",
      "page_id": "716919",
      "userid": "2715142",
      "user": "DadaNeem",
      "anonymous": false,
      "revid": "530812128",
      "timestamp_utc": "2013-01-01 21:13:01 UTC",
      "title": "Rico (dog)",
      "ra": "Crossref",
      "type": "journal-article",
      "doi_prefix": "10.1126",
      "publisher": "American Association for the Advancement of Science (AAAS)",
  13. Dataset Sample (JSON) Cont.

      "issn": ["10959203", "00368075"],
      "doi_created_date": "2004-06-10 19:53:06 UTC",
      "comment": "made more precise citations; 4 wikilinks",
      "parsedcomment": "made more precise citations; 4 wikilinks",
      "yearlag_issued": "9",
      "yearlag_doi_created": "9",
      "matched_type": "issn",
      "full_title": "SCIENCE",
      "research_field": ["Multidisciplinary"],
      "bot_flag": false
    }
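As a small illustration of how such a record could be read, the sketch below parses a subset of the sample and recomputes `yearlag_doi_created`. The slides do not define how the lag is calculated; the calendar-year difference used here is an assumption that happens to reproduce the value 9 in the sample. The helper names `parse_utc` and `year_lag` are hypothetical.

```python
import json
from datetime import datetime

# Subset of the sample record from the slides.
RECORD_JSON = """
{
  "doiname": "10.1126/science.1097859",
  "user": "DadaNeem",
  "timestamp_utc": "2013-01-01 21:13:01 UTC",
  "doi_created_date": "2004-06-10 19:53:06 UTC",
  "yearlag_doi_created": "9"
}
"""


def parse_utc(text):
    """Parse timestamps written like '2013-01-01 21:13:01 UTC'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S UTC")


def year_lag(record):
    """Calendar-year difference between the edit and the DOI creation date.

    Assumption: the dataset's lag is a plain difference of years,
    not the number of full years elapsed.
    """
    added = parse_utc(record["timestamp_utc"])
    created = parse_utc(record["doi_created_date"])
    return added.year - created.year


record = json.loads(RECORD_JSON)
print(record["user"], year_lag(record))  # DadaNeem 9
```

Note that the numeric fields in the sample are stored as strings, so a comparison against `yearlag_doi_created` needs a conversion such as `int(record["yearlag_doi_created"])`.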
  14. Note: How to extract DOI Links

    1. Download the dump data files from https://dumps.wikimedia.org/backup-index.html
       1. externallinks.sql
       2. iwlinks.sql
       3. pages.sql
    2. Import them into MySQL
  15. Note: How to extract DOI Links and identify the page ids

    3. Extraction conditions
       - External links that contain “doi.org” in the el_to column of externallinks.sql (non-DOI links must also be removed)
       - Interwiki links whose prefix equals “doi” in the iwl_prefix column of iwlinks.sql
       - INNER JOIN with pages.sql to keep only the main namespace
       - This process also identifies the page ids
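The extraction conditions above can be sketched as a single query. The example below is an illustration rather than the author's actual SQL: it uses Python's built-in `sqlite3` as a stand-in for MySQL, a few toy rows instead of the real dumps, and the MediaWiki table name `page` (the slides refer to the file as pages.sql). The `LIKE '%doi.org/10.%'` filter is an assumed, rough way to drop non-DOI links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
CREATE TABLE externallinks (el_from INTEGER, el_to TEXT);
CREATE TABLE iwlinks (iwl_from INTEGER, iwl_prefix TEXT, iwl_title TEXT);

INSERT INTO page VALUES (1, 0, 'Rico_(dog)'), (2, 1, 'Rico_(dog)');
INSERT INTO externallinks VALUES
  (1, 'https://doi.org/10.1126/science.1097859'),
  (1, 'https://example.org/not-a-doi'),
  (2, 'https://doi.org/10.1000/on-a-talk-page');
INSERT INTO iwlinks VALUES (1, 'doi', '10.1000/182'), (1, 'wikt', 'dog');
""")

QUERY = """
SELECT p.page_id, el.el_to AS link
FROM externallinks AS el
INNER JOIN page AS p ON p.page_id = el.el_from
WHERE p.page_namespace = 0            -- main namespace only
  AND el.el_to LIKE '%doi.org/10.%'   -- external links that point to doi.org
UNION ALL
SELECT p.page_id, iw.iwl_title
FROM iwlinks AS iw
INNER JOIN page AS p ON p.page_id = iw.iwl_from
WHERE p.page_namespace = 0
  AND iw.iwl_prefix = 'doi'           -- interwiki links with the doi prefix
ORDER BY 1, 2
"""

rows = conn.execute(QUERY).fetchall()
for page_id, link in rows:
    print(page_id, link)
```

Joining on `el_from` and `iwl_from` is what yields the page ids: both columns hold the id of the page that contains the link.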
  16. Note: How to identify when the DOI Links were added to the pages

    4. Get all revisions of each page that contains DOI Links
       - API:Revisions https://www.mediawiki.org/wiki/API:Revisions
    5. Get all hyperlinks referenced in each revision
       - API:Parsing_wikitext https://www.mediawiki.org/wiki/API:Parsing_wikitext
       - I used DOI names (regexp like 10.\d+\/.*?$) as the key and identified the oldest revision of each page in which each DOI link first appeared
       - That’s all (I needed a few months to process all of them)
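Once the revisions and their hyperlinks have been fetched, finding the first addition reduces to a scan from the oldest revision to the newest. The sketch below assumes the revisions are already available as dictionaries with `revid`, `timestamp`, `user`, and `links` keys (a hypothetical shape, not the raw API response) and uses a DOI-name pattern in the spirit of the one quoted on the slide. No network calls are made.

```python
import re
from urllib.parse import unquote

# Approximate DOI-name pattern; an assumption, not the author's exact regexp.
DOI_PATTERN = re.compile(r"10\.\d+/.+$")


def doi_from_link(url):
    """Return the DOI name contained in a link, or None if there is none."""
    match = DOI_PATTERN.search(unquote(url))
    return match.group(0) if match else None


def first_additions(revisions):
    """Map each DOI name to the oldest revision whose links contain it.

    ISO 8601 timestamps in UTC sort correctly as plain strings.
    """
    first = {}
    for rev in sorted(revisions, key=lambda r: r["timestamp"]):
        for url in rev["links"]:
            doi = doi_from_link(url)
            if doi is not None and doi not in first:
                first[doi] = (rev["revid"], rev["timestamp"], rev["user"])
    return first


# Toy revision history (hypothetical editors and revision ids).
revisions = [
    {"revid": 3, "timestamp": "2013-01-01T21:13:01Z", "user": "EditorB",
     "links": ["https://doi.org/10.1126/science.1097859", "https://example.org/"]},
    {"revid": 1, "timestamp": "2012-05-01T00:00:00Z", "user": "EditorA",
     "links": ["https://example.org/"]},
    {"revid": 4, "timestamp": "2014-02-02T10:00:00Z", "user": "EditorC",
     "links": ["https://doi.org/10.1126/science.1097859"]},
]

added = first_additions(revisions)
print(added["10.1126/science.1097859"])  # (3, '2013-01-01T21:13:01Z', 'EditorB')
```

One design consequence: a DOI that was removed and later re-added is still attributed to the editor of the oldest revision that contained it.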
  17. Note: How to identify the “research field” for each DOI Link

    1. Get metadata for each DOI name by using the Crossref REST API http://api.crossref.org
    2. Get ESI journal category data from http://ipscience-help.thomsonreuters.com/incitesLiveESI/8289-TRS.html
       1. I converted the journal category data into the form {Key: ISSN, Value: [Research Fields]}
    3. Identify the research field(s) for each DOI link by using its ISSN
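The ISSN lookup in steps 2 and 3 can be sketched as a dictionary keyed by ISSN. This is a minimal illustration under assumptions: the ESI data is taken to be available as (ISSN, research field) pairs, ISSNs are normalized by dropping the hyphen (the dataset sample stores them without one), and the function names are hypothetical. The two ISSNs and the field are the ones from the dataset sample.

```python
def normalize_issn(issn):
    """Drop the hyphen and surrounding spaces so '0036-8075' matches '00368075'."""
    return issn.replace("-", "").strip().upper()


def build_issn_index(esi_rows):
    """Build {ISSN: [research fields]} from (issn, field) pairs."""
    index = {}
    for issn, field in esi_rows:
        fields = index.setdefault(normalize_issn(issn), [])
        if field not in fields:
            fields.append(field)
    return index


def research_fields(issns, index):
    """Collect the research fields of all ISSNs attached to one DOI, without duplicates."""
    found = []
    for issn in issns:
        for field in index.get(normalize_issn(issn), []):
            if field not in found:
                found.append(field)
    return found


# Rows in an assumed (ISSN, field) layout, using the journal from the dataset sample.
esi_rows = [
    ("0036-8075", "Multidisciplinary"),
    ("1095-9203", "Multidisciplinary"),
]
index = build_issn_index(esi_rows)

print(research_fields(["10959203", "00368075"], index))  # ['Multidisciplinary']
print(research_fields(["99999999"], index))              # []
```

A DOI whose ISSNs are all missing from the index yields an empty list, which corresponds to the DOI Links whose research fields could not be identified.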