Stats of bioRxiv (2021)

This deck investigates the posting status of bioRxiv.
In addition to the official statistics, it adds the number of journal submissions by field and the share of each field by region.

Related Work: Stats of arXiv (2020)
https://speakerdeck.com/2hz9qeedd/stats-of-arxiv-2020

FSCjJh3NeB

May 14, 2021

Transcript

  1. Summary

    - Survey of papers posted on bioRxiv
      - By discipline, neuroscience is growing remarkably.
      - More than 40% of submitted manuscripts may eventually be published in journals.
        - For those that were published, it took about half a year from the first submission to bioRxiv to publication, and the difference between fields was not large.
  2. Specification

    - bioRxiv
      - All items that could be collected through the API as of April 17, 2021 were collected.
    - Total data: 117,293*
      - Loaded items: Title, Abstract, Author, Field, DOI, etc.
      - Period: November 7, 2013 - April 17, 2021
      - The cited references were also obtained using Semantic Scholar.
      - If the article DOI has been assigned...
        - Journal name, publication date, etc. were collected separately using CrossRef's API.

    * Collected independently
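
The collection step described above can be sketched roughly as follows. This is a minimal sketch, not the script used for the deck: it assumes the public bioRxiv `details` endpoint takes a date interval and a cursor in the URL and returns JSON with a `collection` list, and the names `page_url`, `fetch_json`, and `collect_all` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed layout of the public bioRxiv API endpoint; check the official
# API documentation before relying on it.
BASE_URL = "https://api.biorxiv.org/details/biorxiv"


def page_url(start, end, cursor):
    """Build the URL for one page of results (dates as YYYY-MM-DD)."""
    return f"{BASE_URL}/{start}/{end}/{cursor}"


def fetch_json(url):
    """Download and decode one JSON page (performs a real network request)."""
    with urlopen(url) as resp:
        return json.load(resp)


def collect_all(start, end, fetch=fetch_json):
    """Advance the cursor page by page until an empty page is returned."""
    items = []
    cursor = 0
    while True:
        page = fetch(page_url(start, end, cursor))
        batch = page.get("collection", [])
        if not batch:
            break
        items.extend(batch)
        cursor += len(batch)
    return items
```

Passing the downloader in as `fetch` keeps the paging logic testable without network access, for example `collect_all("2013-11-07", "2021-04-17")` for the period covered here.
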
  3. Research award in DOI information

    [Figure: number of papers with a Journal DOI, with award information, and with award information containing "Japan".]
  4. Percentage of papers with Journal DOI by field

    - Calculations are based on the four-year period from October 2016 to the end of September 2020.
  5. Time from submission to publication (papers with a Journal DOI)

    - Calculations are based on the period of about seven years from 2013 to the end of September 2020.
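
The time to publication can be computed as the difference between the bioRxiv posting date and the journal publication date. The sketch below assumes ISO-formatted dates and uses the median per field; the deck does not say which summary statistic was used, so the median and the function names are assumptions for illustration.

```python
from datetime import date
from statistics import median


def days_to_publication(posted, published):
    """Days between first posting on bioRxiv and journal publication."""
    return (date.fromisoformat(published) - date.fromisoformat(posted)).days


def median_days_by_field(records):
    """records: iterable of (field, posted_date, published_date) tuples."""
    by_field = {}
    for field, posted, published in records:
        by_field.setdefault(field, []).append(
            days_to_publication(posted, published)
        )
    # Assumption: summarise each field with the median number of days.
    return {field: median(days) for field, days in by_field.items()}
```

A value of around 180 days would correspond to the "about half a year" reported in the summary.
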
  6. Number of citations per field

    - Calculations are based on four years of submissions, from October 2016 to September 2020.
  7. Highly Cited Papers

    - The most highly cited papers are concentrated in a few fields.
      - Neuroscience, genetics, and ecology seem to be the most frequently cited fields.
      - The maximum number of citations is less than 1,000 within the scope of this survey, which is an order of magnitude lower than the 10,000 citations in the field of information science.

    Scores for submissions from October 2016 to the end of September 2020:

    No. | DOI | date | category | title | cite
    1 | 10.1101/080333 | 2016-10-12 | Neuroscience | Genetic, transcriptome, proteomic and epidemiologi... | 741
    2 | 10.1101/099192 | 2017-01-09 | Genetics | Watching the clock for 25 years in FlyClockbase: V... | 587
    3 | 10.1101/203943 | 2017-10-16 | Neuroscience | Degeneracy in hippocampal physiology and plasticit... | 573
    4 | 10.1101/535005 | 2019-02-01 | Ecology | GIFT – A Global Inventory of Floras and Traits for... | 562
    5 | 10.1101/310763 | 2018-04-30 | Epidemiology | MicroCOSM: a model of social and structural driver... | 558
    6 | 10.1101/2020.03.23.003384 | 2020-03-23 | Genetics | Rat models of human diseases and related phenotype... | 487
    7 | 10.1101/2020.03.23.003392 | 2020-03-23 | Genetics | Rat models of human diseases and related phenotype... | 487
    8 | 10.1101/833988 | 2019-11-07 | Neuroscience | Arc Regulates Transcription of Genes for Plasticit... | 485
    9 | 10.1101/425488 | 2018-09-24 | Ecology | Complex responses of global insect pests to climat... | 453
    10 | 10.1101/503334 | 2018-12-26 | Ecology | Data paper: FoRAGE (Functional Responses from Arou... | 445
    11 | 10.1101/142760 | 2017-05-28 | Bioinformatics | Opportunities and obstacles for deep learning in b... | 432
    12 | 10.1101/307652 | 2018-04-28 | Neuroscience | Mapping molecular datasets back to the brain regio... | 412
    13 | 10.1101/405688 | 2018-08-31 | Animal Behavior and Cognition | The evolution of infanticide by females in mammals | 406
    14 | 10.1101/152264 | 2017-06-22 | Bioinformatics | Informatics for Cancer Immunotherapy | 404
    15 | 10.1101/2020.07.14.202085 | 2020-07-14 | Neuroscience | 10 years of EPOC: A scoping review of Emotiv’s por... | 398
  8. Frequency by number of citations

    - The shape appears to be similar to a power-law distribution (2016-2020).
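
One way to check the power-law impression is to tabulate the frequency of each citation count and estimate the exponent of the tail. The sketch below uses the standard continuous maximum-likelihood estimate, alpha = 1 + n / sum(ln(x_i / x_min)); the deck does not describe any fitting, so this estimator, the choice of `x_min`, and the function names are assumptions.

```python
import math
from collections import Counter


def citation_frequency(citations):
    """Map each citation count to the number of papers with that count."""
    return dict(sorted(Counter(citations).items()))


def power_law_alpha(values, x_min=1):
    """Continuous maximum-likelihood estimate of the power-law exponent.

    Only values >= x_min are used; papers with zero citations are
    therefore excluded when x_min is 1.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1 + len(tail) / log_sum
```

Because citation counts are integers, the continuous estimate is only an approximation, and a straight line on a log-log plot of `citation_frequency` output is a useful visual cross-check.
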
  9. Difference from bioRxiv official data

    [Figure: number of posts per month (own collection) and deviation from official values, by month.]

    - Independently collected data tends to be approximately 0.30% less than the official values.
    - From 2020 onward, many months show almost no deviation.
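
The deviation can be expressed as a signed percentage of the official value. The sketch below is illustrative only: the function names and the dictionary layout keyed by month are assumptions, and no real counts are reproduced here.

```python
def percent_deviation(own, official):
    """Signed deviation of the own count from the official count, in percent."""
    return (own - official) / official * 100.0


def monthly_deviation(own_counts, official_counts):
    """Both arguments map a month label such as '2019-04' to a post count."""
    return {
        month: percent_deviation(own_counts[month], official_counts[month])
        for month in official_counts
        if month in own_counts and official_counts[month] != 0
    }
```

A negative result means the independent collection found fewer posts than the official statistics for that month.
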
  10. Share of COVID-19-related papers

    - Papers listed as related to COVID-19 account for a fairly small percentage of the total.

    [Figure: new papers, cumulative papers, and COVID-19 papers by month.]
  11. Countries/Regions and Number of Submissions

    - Only the first author's email address is used for each manuscript.
    - gmail and hotmail are classified as Unknown.
    - .com, .edu, and .org are classified by the country code of the administrator.

    [Table: Region and Count. Regions in the order listed: US, Unknown, UK, Germany, China, France, Canada, Australia, Japan, Switzerland, Netherlands, India, Spain, Sweden, Italy, Israel, Brazil, Denmark, Belgium, Norway, Korea, Finland, Austria, Singapore, Portugal, New Zealand, Poland, Mexico, Taiwan, Argentina. The counts are not recoverable from the transcript.]
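
The classification rule can be sketched as below. This is a minimal sketch under stated assumptions: "country code of the administrator" is read as the country of the organisation that administers the domain, which is looked up here in a hypothetical hand-made table (`ADMIN_COUNTRY`); the country-code table is deliberately incomplete and `example.edu` is a placeholder domain.

```python
FREE_MAIL = {"gmail.com", "hotmail.com"}
GENERIC_TLDS = {"com", "edu", "org"}

# Hypothetical lookup of the administering organisation's country for
# generic domains; a real survey would need a fuller source for this.
ADMIN_COUNTRY = {"example.edu": "US"}

# Small illustrative subset of country-code TLDs.
CC_TO_REGION = {"jp": "Japan", "uk": "UK", "de": "Germany", "cn": "China"}


def classify_email(email, admin_country=ADMIN_COUNTRY):
    """Return a region label for a first author's email address."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in FREE_MAIL:
        return "Unknown"
    tld = domain.rsplit(".", 1)[-1]
    if tld in GENERIC_TLDS:
        # Generic TLDs carry no country, so fall back to the lookup table.
        return admin_country.get(domain, "Unknown")
    return CC_TO_REGION.get(tld, "Unknown")
```

Anything that cannot be resolved falls into "Unknown", which is consistent with Unknown being one of the largest categories in the table.
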
  12. Countries/Regions and Number of Posts

    [Table: posts by year (Y) and month (M), with a Total column and one column per region: US, Unknown, UK, Germany, China, France, Canada, Australia, Japan, Switzerland, Netherlands, India, Spain, Sweden, Italy, Israel, Brazil, Denmark, Belgium, Norway, Korea, Finland, Austria, Singapore, Portugal. The cell values are not recoverable from the transcript.]

    - Only the first author's email address is used for each manuscript.
    - gmail and hotmail are classified as Unknown.
    - .com, .edu, and .org are classified by the country code of the administrator.
  13. Field distribution

    - The field composition ratio of each country/region is compressed into two dimensions by multidimensional scaling.
    - The compositions of China and India are similar.
    - Unknown, China and India, Italy, and Japan differ in composition from the other countries and regions.
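
The two-dimensional compression can be illustrated with classical (Torgerson) multidimensional scaling. The deck does not state which MDS variant or distance measure was used, so the sketch below is an assumption throughout: it uses Euclidean distance between field-ratio vectors and finds the top two eigenvectors by plain power iteration, and all function names are hypothetical.

```python
import math
import random


def composition_distances(ratios):
    """Euclidean distance matrix between field-composition vectors."""
    return [[math.dist(a, b) for b in ratios] for a in ratios]


def classical_mds_2d(dist, iters=300, seed=0):
    """Embed n items in 2-D from an n x n distance matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (D is symmetric).
    b = [
        [-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
         for j in range(n)]
        for i in range(n)
    ]
    rng = random.Random(seed)
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            # Power iteration: repeatedly apply B and renormalise.
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eig = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eig, 0.0))
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so the next pass finds the following eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eig * v[i] * v[j]
    return coords
```

Each row of `ratios` would hold one country's share of posts per field; countries whose points land close together, as China and India do on the slide, have similar compositions.
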