
Time Lag Analysis of Adding Scholarly References to English Wikipedia: How Rapidly Are They Added to and How Fresh Are They? / iConference2023

Presentation slides from iConference 2023 (https://www.ischools.org/iconference)

Kikkawa, Jiro; Takaku, Masao; Yoshikane, Fuyuki: "Time Lag Analysis of Adding Scholarly References to English Wikipedia: How Rapidly Are They Added to and How Fresh Are They?", Proceedings of the 18th International Conference, iConference 2023, Barcelona, Spain, Lecture Notes in Computer Science (LNCS), Vol. 13972, pp. 425-438, 2023.03. https://doi.org/10.1007/978-3-031-28032-0_33.

Jiro Kikkawa

March 14, 2023

Transcript

  1. Time Lag Analysis of Adding Scholarly References to English Wikipedia: How rapidly are they added to and how fresh are they?
    iConference 2023: Normality, Virtuality, Physicality, Inclusivity
    Jiro Kikkawa, Masao Takaku, Fuyuki Yoshikane ({jiro, masao, fuyuki}@slis.tsukuba.ac.jp), University of Tsukuba, Japan
    Paper: https://doi.org/10.1007/978-3-031-28032-0_33
    Slide: https://speakerdeck.com/corgies/iconference2023
  2. Background
    • Mass digitization of scholarly communication
      – Various kinds of communities and people, including researchers, specialists, and non-traditional readers, can use scholarly documents.
      – Wikipedia provides numerous references to scholarly documents and access to them; as of 2015, Wikipedia was one of the largest referrers of Crossref DOIs.
    • Scholarly references on Wikipedia
      – complement and improve the quality of Wikipedia content.
    [Figure 1. Example of the scholarly reference on English Wikipedia: the article "Library and information science" (https://en.wikipedia.org/wiki/Library_and_information_science) citing Chua, A.Y.K. and Yang, C.C. (2008), "The shift towards multi-disciplinarity in information science", Journal of the American Society for Information Science and Technology, 59(13): 2156–2170, doi:10.1002/asi.20929.]
  3. Purpose
    • Scholarly references complement and improve the quality of Wikipedia content.
      – Scholarly references should be added to Wikipedia articles as soon as possible. Moreover, their quantity and freshness are crucial for covering the latest academic knowledge.
      – However, little is known about these aspects, such as how rapidly scholarly references are added and how fresh they are.
      – In this study, we conduct a time lag analysis of the editors and edits that add scholarly references to Wikipedia to answer the following research questions (RQs):
        RQ1. How does the number of Wikipedia articles with scholarly references grow over time?
        RQ2. How long is the time lag between the publication date of each scholarly article and the addition of the corresponding scholarly reference to Wikipedia articles?
        RQ3. How long is the time lag between the creation date of each Wikipedia article and the date the first scholarly reference was added to that article?
  4. Purpose (continued)
    • In this study, we conduct a time lag analysis of the editors and edits that add scholarly references to Wikipedia to answer RQ1–RQ3 from the previous slide.
    • The contributions of this study:
      1. We clarified the long-term changes in the use of scholarly articles in the online encyclopedia community.
      2. We attempted to identify the factors behind these changes.
         – Where the time lags in RQ2 and RQ3 decreased over time, we investigated the factors causing the decrease.
  5. Related Works
    • Analysis of scholarly references on Wikipedia
    • The shift from quantity to quality in the Wikipedia community
    • Analysis of the freshness of the references in scholarly articles
  6. Analysis of scholarly references on Wikipedia
    • Most previous studies have focused on the scholarly documents themselves, and little is known about the editors and their contributions in adding scholarly references to Wikipedia. Previous studies examined:
      1. whether scholarly articles published in high-impact-factor journals tend to be referenced more on Wikipedia [Nielsen, 2007; Teplitskiy, 2016]
      2. whether scholarly articles published in open access journals tend to be referenced more on Wikipedia [Teplitskiy, 2016; Lin and Fenner, 2014; Pooladian and Borrego, 2017]
      3. whether references on Wikipedia can be used as a data source for research evaluation [Kousha and Thelwall, 2017]
      4. the characteristics of Wikipedia articles with scholarly references [Pooladian and Borrego, 2017]
      5. references with specific identifiers (e.g., DOI, arXiv, ISSN, and ISBN) [Kikkawa, 2016; Kikkawa, 2020b; Halfaker and Taraborelli, 2019] or in specific research fields [Thelwall, 2016; Pooladian and Borrego, 2017]
      6. the editors and their edits that add scholarly references to Wikipedia [Kikkawa, 2020a; Kikkawa, 2021b]
  7. Analysis of scholarly references on Wikipedia
    • We proposed methods to identify the first appearances of scholarly references on Wikipedia using paper titles and their identifiers.
      – We built a dataset of the first appearances of scholarly references in English Wikipedia articles as of 1 March 2017 and evaluated the precision of detecting first appearances: 93.3% overall, exceeding 90% in 20 of 22 research fields [Kikkawa, 2020a; Kikkawa, 2022].
      – We also published an updated version of the dataset, as of 1 October 2021 [Kikkawa, 2021a; Kikkawa, 2022].
      – Using this updated dataset, we conduct a time lag analysis of the scholarly references added to Wikipedia.
  8. Definition of the terms
    1. Scholarly reference
      • A reference added to a Wikipedia article that uniquely identifies a certain paper and its research field.
      • We did not consider the role a reference plays, e.g., whether it serves as evidence for a specific part of the article's content, merely mentions a paper, or is listed under further reading.
    2. First appearance of the scholarly reference
      • The earliest-added instance of a scholarly reference in each Wikipedia article.
      • If multiple references corresponding to the same paper were found in the same article, the oldest one was treated as the first appearance.
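The first-appearance rule above amounts to keeping the minimum timestamp per (article, paper) pair. The sketch below is illustrative only, not the authors' pipeline: it assumes a hypothetical list of `(page_id, doi, timestamp)` occurrences already extracted from revision histories, and it assumes DOIs are compared case-insensitively.

```python
from datetime import datetime

# Hypothetical record layout: (page_id, doi, timestamp) for every revision in which
# a reference to a given paper was found in a given article.
Occurrence = tuple[int, str, datetime]

def first_appearances(occurrences: list[Occurrence]) -> dict[tuple[int, str], datetime]:
    """Keep only the oldest timestamp for each (article, paper) pair."""
    first: dict[tuple[int, str], datetime] = {}
    for page_id, doi, ts in occurrences:
        key = (page_id, doi.lower())  # assumption: DOIs compared case-insensitively
        if key not in first or ts < first[key]:
            first[key] = ts
    return first
```

Duplicate references to the same paper within one article collapse to a single entry, while the same paper cited in two different articles yields two first appearances.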
  9. Dataset
    • Dataset of first appearances of the scholarly references on English Wikipedia articles as of 1 October 2021 [Kikkawa, 2022]
      – The first appearances of scholarly references and their research fields were identified using Crossref DOIs and Essential Science Indicators categories.
      – 1,474,347 scholarly references appearing in 313,240 English Wikipedia articles in the main namespace
    • Each editor is classified as follows:
      – User editor: human editors among the registered editors
      – Bot editor: non-human editors among the registered editors
      – IP editor: non-registered editors
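The three editor types form a simple decision rule over two properties of the editing account. A minimal sketch, assuming hypothetical boolean flags `is_registered` and `is_bot` (how bot status is actually determined, e.g. from an account's user groups, is not stated on the slide):

```python
from enum import Enum

class EditorType(Enum):
    USER = "User editor"  # registered, human
    BOT = "Bot editor"    # registered, non-human
    IP = "IP editor"      # not registered (edit attributed to an IP address)

def classify_editor(is_registered: bool, is_bot: bool) -> EditorType:
    """Hypothetical helper mapping two account flags onto the three editor types."""
    if not is_registered:
        return EditorType.IP
    return EditorType.BOT if is_bot else EditorType.USER
```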
  10. Analysis Methods
    RQ1. How does the number of Wikipedia articles with scholarly references grow over time? (Basic statistics of Wikipedia articles with scholarly references)
    • We investigated the number of created Wikipedia articles containing scholarly references by editor type and its transitions over time.
    RQ2. How long is the time lag between the publication date of each scholarly article and the addition of the corresponding scholarly reference to Wikipedia articles? (Time lag between publishing each scholarly article and adding the corresponding reference to the Wikipedia article)
    • We calculated the time lag between the publication of each scholarly article and the addition of the corresponding reference to Wikipedia articles.
      – e.g., if the timestamp of the first appearance of a scholarly reference is “2016-08-06 16:05:57 UTC” and the paper was published in 2015, the time lag is one year (= 2016 - 2015).
      – Cases where the published year was empty or the time lag was less than zero were removed as errors.
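The RQ2 lag is computed at year granularity: the calendar year of the first appearance minus the paper's published year, with empty years and negative lags discarded. A minimal sketch of that rule (the function name is hypothetical, and `None` stands in for a removed case):

```python
from datetime import datetime
from typing import Optional

def publication_lag_years(first_added: datetime, published_year: Optional[int]) -> Optional[int]:
    """Year of the reference's first appearance minus the paper's published year.
    Returns None for the cases treated as errors (empty year or negative lag)."""
    if published_year is None:
        return None  # published year is empty
    lag = first_added.year - published_year
    return lag if lag >= 0 else None  # negative lags are discarded
```

With the slide's example, `publication_lag_years(datetime(2016, 8, 6, 16, 5, 57), 2015)` gives 1. Note that only the year of the timestamp matters, so a reference added on 1 January and one added on 31 December of the same year get the same lag.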
  11. Analysis Methods
    RQ3. How long is the time lag between the creation date of each Wikipedia article and the date the first scholarly reference was added to that article? (Time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to that article)
    Step 1
      • We set the target as the first scholarly reference in each Wikipedia article.
      • We kept only the oldest reference in order to clarify the period during which each article had no references, and how that period changed over time.
    Step 2
      • We calculated the time lag between the creation date of each Wikipedia article and the date the first reference was added to it.
      • e.g., if the Wikipedia article was created at “2001-11-22 16:37:56 UTC” and the first reference was added at “2016-08-06 16:05:57 UTC,” the time lag is 5,370.98 days (converted from 464,052,481 seconds).
    Step 3
      • We analyzed the characteristics and transitions of the time lag by comparing groups of Wikipedia articles by creation year.
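Unlike RQ2, the RQ3 lag is measured at second resolution and then converted to days (1 day = 86,400 s). The sketch below reproduces the Step 2 example; the function name and the timestamp string format are assumptions based on how the slide writes the timestamps.

```python
from datetime import datetime, timezone

def creation_to_first_ref_lag(created: str, first_ref_added: str) -> tuple[int, float]:
    """Return the lag as (whole seconds, days) for two 'YYYY-MM-DD HH:MM:SS UTC' strings."""
    fmt = "%Y-%m-%d %H:%M:%S UTC"
    t0 = datetime.strptime(created, fmt).replace(tzinfo=timezone.utc)
    t1 = datetime.strptime(first_ref_added, fmt).replace(tzinfo=timezone.utc)
    seconds = int((t1 - t0).total_seconds())
    return seconds, seconds / 86_400

secs, days = creation_to_first_ref_lag("2001-11-22 16:37:56 UTC", "2016-08-06 16:05:57 UTC")
print(secs, round(days, 2))  # 464052481 5370.98
```

The span covers 5,371 calendar days (including the leap days of 2004, 2008, 2012, and 2016) minus 1,919 seconds, because the reference was added at an earlier time of day than the article was created.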
  12. Basic statistics of Wikipedia articles with scholarly references
    RQ1. How does the number of Wikipedia articles with scholarly references grow over time?
    Table 2. Number of created Wikipedia articles containing scholarly references by editor type for every 2 years (n=313,240)

    | Years     | Total   | User editors     | Bot editors     | IP editors     |
    |-----------|---------|------------------|-----------------|----------------|
    | 2001-2002 | 14,951  | 12,280 (82.13%)  | 0 (0.00%)       | 2,671 (17.87%) |
    | 2003-2004 | 34,633  | 25,913 (74.82%)  | 111 (0.32%)     | 8,609 (24.86%) |
    | 2005-2006 | 53,211  | 46,054 (86.55%)  | 174 (0.33%)     | 6,983 (13.12%) |
    | 2007-2008 | 52,395  | 39,592 (75.56%)  | 12,782 (24.40%) | 21 (0.04%)     |
    | 2009-2010 | 30,439  | 28,241 (92.78%)  | 2,103 (6.91%)   | 95 (0.31%)     |
    | 2011-2012 | 23,954  | 23,635 (98.67%)  | 167 (0.70%)     | 152 (0.63%)    |
    | 2013-2014 | 22,920  | 22,491 (98.13%)  | 261 (1.14%)     | 168 (0.73%)    |
    | 2015-2016 | 21,677  | 21,298 (98.25%)  | 214 (0.99%)     | 165 (0.76%)    |
    | 2017-2018 | 28,222  | 23,283 (82.50%)  | 4,810 (17.04%)  | 129 (0.46%)    |
    | 2019-2020 | 22,151  | 21,926 (98.98%)  | 6 (0.03%)       | 219 (0.99%)    |
    | 2021      | 8,687   | 8,632 (99.37%)   | 0 (0.00%)       | 55 (0.63%)     |
    | Overall   | 313,240 | 273,345 (87.26%) | 20,628 (6.59%)  | 19,267 (6.15%) |

    • The total number of articles created peaked at 53,211 in 2005-2006; from 2009-2010 onward, approximately 20,000-30,000 articles were created in each 2-year period (2021 is a single year, with data up to 1 October).
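Tables 2 and 3 group years into 2-year periods starting at 2001, with 2021 as its own bin (the dataset ends on 1 October 2021). A minimal sketch of that binning plus a per-period, per-editor-type count, assuming hypothetical `(creation_year, editor_type)` pairs per article; which edit's editor is attributed to each article is an assumption not spelled out on the slides.

```python
from collections import Counter

def period_label(year: int) -> str:
    """Map a year onto the bins of Tables 2 and 3: 2001-2002, ..., 2019-2020, and 2021."""
    if year >= 2021:
        return "2021"
    start = year if year % 2 == 1 else year - 1  # bins start on odd years
    return f"{start}-{start + 1}"

def count_by_period_and_type(articles: list[tuple[int, str]]) -> Counter:
    """articles: hypothetical (creation year, editor type) pairs -> counts per (period, type)."""
    return Counter((period_label(year), editor_type) for year, editor_type in articles)
```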
  13. Basic statistics of Wikipedia articles with scholarly references (Table 2, continued)
    RQ1. How does the number of Wikipedia articles with scholarly references grow over time?
    • Most articles were created by User editors, accounting for 87.26% of the total.
    • The percentage for Bot editors was low, at 6.59%.
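The overall percentages in Table 2 are each editor type's count divided by the 313,240 articles. A short check using only the figures from the table (the helper name is hypothetical):

```python
# Overall counts from Table 2 (n = 313,240 articles).
overall = {"User editors": 273_345, "Bot editors": 20_628, "IP editors": 19_267}

def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the total for each editor type, rounded to 2 decimals."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}

print(sum(overall.values()))  # 313240
print(shares(overall))        # {'User editors': 87.26, 'Bot editors': 6.59, 'IP editors': 6.15}
```

The same function applied to a single row, e.g. 2007-2008, reproduces the 24.40% Bot share that stands out in that period.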
  14. Time lag between publishing each scholarly article and adding the corresponding reference to the Wikipedia article
    RQ2. How long is the time lag between the publication date of each scholarly article and the addition of the corresponding scholarly reference to Wikipedia articles?
    Table 3. Results regarding the time lag between publishing scholarly articles and adding the corresponding references to Wikipedia articles every 2 years (n=1,458,546; Max, Median, Mode, Mean, and SD are time lags in years)

    | Years     | # of references added | Max | Median | Mode | Mean  | SD    |
    |-----------|-----------------------|-----|--------|------|-------|-------|
    | 2001-2002 | 607                   | 131 | 18.0   | 0    | 25.97 | 27.09 |
    | 2003-2004 | 3,818                 | 164 | 11.0   | 0    | 21.52 | 26.32 |
    | 2005-2006 | 35,416                | 174 | 6.0    | 0    | 13.79 | 19.06 |
    | 2007-2008 | 211,750               | 206 | 6.0    | 5    | 10.31 | 14.13 |
    | 2009-2010 | 135,900               | 207 | 7.0    | 1    | 12.14 | 17.25 |
    | 2011-2012 | 147,498               | 209 | 7.0    | 0    | 12.51 | 17.50 |
    | 2013-2014 | 157,427               | 196 | 7.0    | 0    | 12.72 | 17.38 |
    | 2015-2016 | 185,958               | 207 | 6.0    | 0    | 11.97 | 16.72 |
    | 2017-2018 | 221,565               | 201 | 7.0    | 0    | 12.33 | 16.90 |
    | 2019-2020 | 258,928               | 205 | 7.0    | 0    | 13.07 | 17.51 |
    | 2021      | 99,679                | 204 | 7.0    | 0    | 12.92 | 17.53 |

    • “Years” refers to the period in which the scholarly references were added to Wikipedia articles; e.g., 211,750 references were added to Wikipedia articles during 2007-2008.
    • The maximum values have been consistently near 200 years since 2007-2008.
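The per-period counts in Table 3 sum to the stated n. The difference from the dataset's 1,474,347 references is 15,801, which is consistent with the error cases removed in the RQ2 method (empty published year or negative lag), although the slides do not state this breakdown explicitly.

```python
# Number of references added per 2-year period, from Table 3.
added = {
    "2001-2002": 607, "2003-2004": 3_818, "2005-2006": 35_416,
    "2007-2008": 211_750, "2009-2010": 135_900, "2011-2012": 147_498,
    "2013-2014": 157_427, "2015-2016": 185_958, "2017-2018": 221_565,
    "2019-2020": 258_928, "2021": 99_679,
}
total = sum(added.values())
print(total)              # 1458546
print(1_474_347 - total)  # 15801
```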
  15. Time lag between publishing each scholarly article and adding the corresponding reference to the Wikipedia article (Table 3, continued)
    • After 2005-2006 (i.e., from 2007-2008 onward), the median, mean, and standard deviation were stable at approximately 6.0-7.0, 10-13, and 14-17.5 years, respectively.
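Each row of Table 3 summarizes the year-level lags of one period. A minimal sketch of those five statistics with Python's `statistics` module; whether the authors used the sample or the population standard deviation, and how they broke ties for the mode, is not stated, so both choices below are assumptions.

```python
import statistics

def summarize(lags: list[int]) -> dict[str, float]:
    """Summary statistics in the layout of Table 3 (time lag in years)."""
    return {
        "max": max(lags),
        "median": float(statistics.median(lags)),
        "mode": statistics.mode(lags),           # ties: first value encountered wins (Python 3.8+)
        "mean": round(statistics.mean(lags), 2),
        "sd": round(statistics.stdev(lags), 2),  # assumption: sample SD
    }
```

Because the mode counts individual references, a bot adding the same few papers to thousands of articles can move it sharply while barely shifting the median, which is what the next slide describes for 2007-2008.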
  16. Time lag between publishing each scholarly article and adding the corresponding reference to the Wikipedia article (Table 3, continued)
    • The mode values were either 0 or 1, except for 2007-2008.
    • The mode was 5 in 2007-2008 because two papers [1, 2], published in 2002 and 2003, were added to 1,722 and 1,212 Wikipedia articles, respectively, by the Bot editor ProteinBoxBot during this period.
    • ProteinBoxBot creates articles related to human genes.
    [1] Mammalian Gene Collection (MGC) Program Team: Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. PNAS 99(26), 16899–16903 (2002).
    [2] Ota, T., Suzuki, Y., Nishikawa, T., et al.: Complete sequencing and characterization of 21,243 full-length human cDNAs. Nature Genetics 36(1), 40–45 (2003).
  17. Time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to that article
    RQ3. How long is the time lag between the creation date of each Wikipedia article and the date the first scholarly reference was added to that article?
    Time lag groups:
      A. 0 days and at the same time
      B. 0 days but not at the same time
      C. less than 1 month
      D. 1 month or more but less than 6 months
      E. 6 months or more but less than 1 year
      F. 1 year or more but less than 3 years
      G. 3 years or more but less than 5 years
      H. 5 years or more
    [Figure 2. Distribution of the time lag between creating the Wikipedia articles and adding the first scholarly references for every 2 years (groups A–H; scale 0%–100%).]
    • Figure 2 presents the distribution of the time lag between the creation of Wikipedia articles and the addition of the first scholarly references, for every 2 years.
    • For the group “0 days and at the same time,” the percentage increased sharply from 2005–2006 to 2007–2008 (from 9.05% to 36.00%).
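The groups A–H bucket each creation-to-first-reference lag by fixed thresholds. The slides do not define them in days, so this sketch assumes: group A means a lag of exactly zero (reference present at creation), group B means a positive lag under 24 hours, a month is 30 days, and a year is 365 days. Treat it as one plausible reading, not the authors' exact rule.

```python
from datetime import timedelta

DAY = timedelta(days=1)
# Assumed thresholds: the slides do not define "month" or "year" in days.
MONTH, YEAR = 30 * DAY, 365 * DAY

def lag_category(lag: timedelta) -> str:
    """Assign a creation-to-first-reference lag to one of the groups A-H of Figure 2."""
    if lag < timedelta(0):
        raise ValueError("negative lag")
    if lag == timedelta(0):
        return "A"  # 0 days and at the same time
    if lag < DAY:
        return "B"  # 0 days but not at the same time (assumed: under 24 hours)
    if lag < MONTH:
        return "C"  # less than 1 month
    if lag < 6 * MONTH:
        return "D"  # 1 month or more but less than 6 months
    if lag < YEAR:
        return "E"  # 6 months or more but less than 1 year
    if lag < 3 * YEAR:
        return "F"  # 1 year or more but less than 3 years
    if lag < 5 * YEAR:
        return "G"  # 3 years or more but less than 5 years
    return "H"      # 5 years or more
```

Under these assumptions, the Step 2 example (464,052,481 seconds, about 5,371 days) falls into group H.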
  18. Time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to that article (Figure 2, continued)
    • For the group “0 days and at the same time,” the percentage increased sharply from 2005–2006 to 2007–2008 (from 9.05% to 36.00%).
    • In 2005, a hoax claiming that a certain journalist had been a suspect in the Kennedy assassinations was added to a Wikipedia article, and it became a public controversy.
    • In 2006, Jimmy Wales declared that the Wikipedia community would shift its focus from the quantity to the quality of its content.
    • The increase observed here can be seen as a response to this movement.
  19. Time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to that article (Figure 2, continued)
    • The percentage of “0 days and at the same time” gradually increased over the years, except in 2009–2010 (30.50%). In particular, it exceeded 50% in 2013–2014 (55.04%) and 60% in 2017–2018 (61.60%).
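The percentages quoted for group A are the share of articles in that group among all articles created in the same 2-year period. A minimal sketch, assuming hypothetical `(creation_period, category)` records where the category is a label A–H:

```python
from collections import Counter

def share_same_time(records: list[tuple[str, str]]) -> dict[str, float]:
    """records: hypothetical (creation period, category A-H) pairs.
    Return the percentage of category A ("0 days and at the same time") per period."""
    totals: Counter = Counter()
    same: Counter = Counter()
    for period, category in records:
        totals[period] += 1
        if category == "A":
            same[period] += 1
    return {p: round(100 * same[p] / totals[p], 2) for p in totals}
```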
  20. Conclusion
    • We conducted a time lag analysis of adding scholarly references to English Wikipedia as of October 2021.
    • We found no tendency for recently created Wikipedia articles to refer to fresher references: the time lag between the publication of scholarly articles and the addition of references to the corresponding papers in Wikipedia articles remained generally constant over the years.
  21. Conclusion (continued)
    • In contrast, the time lag between creating Wikipedia articles and adding the first scholarly references tended to decrease over time.
      – The percentage of cases in which scholarly references were added at the same time as the Wikipedia articles were created increased over the years, particularly from 2007-2008 onward.
      – We regard this trend as a response to policy changes in the Wikipedia community that was adopted by many different editors, rather than the result of mass activity by a small number of editors.
  22. References
    • [Halfaker and Taraborelli, 2019] Halfaker, A. and Taraborelli, D. (2019). Research:Scholarly article citations in Wikipedia - Meta. https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia
    • [Kikkawa, 2016] Kikkawa, J., Takaku, M., and Yoshikane, F. (2016). DOI Links on Wikipedia: Analyses of English, Japanese, and Chinese Wikipedias. In Proceedings of the 18th International Conference on Asia-Pacific Digital Libraries (ICADL 2016), pages 369–380. https://doi.org/10.1007/978-3-319-49304-6_40
    • [Kikkawa, 2020a] Kikkawa, J., Takaku, M., and Yoshikane, F. (2020a). A Method to Identify the Edits Adding Bibliographic References to Wikipedia. Journal of Japan Society of Information and Knowledge, 30(3):370–389. (in Japanese, English abstract available). https://doi.org/10.2964/jsik_2020_033
    • [Kikkawa, 2020b] Kikkawa, J., Takaku, M., and Yoshikane, F. (2020b). Analyses of Wikipedia Editors Adding Bibliographic References based on DOI Links. Journal of Japan Society of Information and Knowledge, 30(1):21–41. (in Japanese, English abstract available). https://doi.org/10.2964/jsik_2020_004
    • [Kikkawa, 2021a] Kikkawa, J., Takaku, M., and Yoshikane, F. (2021a). Dataset of first appearances of the scholarly bibliographic references on English Wikipedia articles as of 1 March 2017 and as of 1 October 2021. Zenodo. https://doi.org/10.5281/zenodo.5595573
    • [Kikkawa, 2021b] Kikkawa, J., Takaku, M., and Yoshikane, F. (2021b). Time-series Analyses of the Editors and Their Edits for Adding Bibliographic References on Wikipedia. Journal of Japan Society of Information and Knowledge, 31(1):3–19. (in Japanese, English abstract available). https://doi.org/10.2964/jsik_2020_037
    • [Kikkawa, 2022] Kikkawa, J., Takaku, M., and Yoshikane, F. (2022). Dataset of first appearances of the scholarly bibliographic references on Wikipedia articles. Scientific Data, 9:article no. 85, pp. 1–11. https://doi.org/10.1038/s41597-022-01190-z
  23. References (continued)
    • [Kousha and Thelwall, 2017] Kousha, K. and Thelwall, M. (2017). Are Wikipedia citations important evidence of the impact of scholarly articles and books? Journal of the Association for Information Science and Technology, 68(3):762–779. https://doi.org/10.1002/asi.23694
    • [Lin and Fenner, 2014] Lin, J. and Fenner, M. (2014). An analysis of Wikipedia references across PLOS publications. figshare. https://doi.org/10.6084/m9.figshare.1048991.v3
    • [Nielsen, 2007] Nielsen, F. Å. (2007). Scientific citations in Wikipedia. First Monday, 12(8). https://doi.org/10.5210/fm.v12i8.1997
    • [Pooladian and Borrego, 2017] Pooladian, A. and Borrego, Á. (2017). Methodological issues in measuring citations in Wikipedia: a case study in Library and Information Science. Scientometrics, 113(1):455–464. https://doi.org/10.1007/s11192-017-2474-z
    • [Teplitskiy, 2016] Teplitskiy, M., Lu, G., and Duede, E. (2016). Amplifying the impact of open access: Wikipedia and the diffusion of science. Journal of the Association for Information Science and Technology, 68(9):2116–2127. https://doi.org/10.1002/asi.23687
    • [Thelwall, 2016] Thelwall, M. (2016). Does Astronomy research become too dated for the public? Wikipedia citations to Astronomy and Astrophysics journal articles 1996–2014. El Profesional de la Información, 25(6):893–900. https://doi.org/10.3145/epi.2016.nov.06