wing.nus
October 17, 2022

Scholarly Document Processing Research in the Age of AI

Presented at the 3rd #sdp2022

Artificial Intelligence is poised to impact many fields, but how will the rise of AI impact the way that we do science and scholarly work? Thomas Kuhn, in his philosophical analysis of science, coined the term "paradigm shift" to describe the progress in scientific theory that results when the normal science of an existing paradigm collides with replicable observations that the theory cannot account for. With scientists in AI still expecting key discoveries to be made, should we expect a new paradigm to overturn current normal science in AI and other fields? Will the age of accelerations, as defined by Thomas Friedman, hold sway over how real-world contexts are either accounted for or discarded by research practitioners and scholars alike?

I relate my perspective on how normal science and paradigm-shifting science relate to the notion of research, fast and slow, and how scholarly document processing can facilitate both the mean and the variance of scientific discovery. I give an opinionated view of the importance of scholarly document processing as a meta-research agenda that can either aid thoughtful slow research or be leveraged to further exacerbate the acceleration of normal science.


Transcript

  1. Scholarly Document Processing Research in the Age of AI. Min-Yen Kan, National University of Singapore. Slides @ http://bit.ly/kan-sdp22. 17 Oct 2022, 3rd SDProc @ COLING 2022.
  2. Warning: This is a participatory keynote! …that is, there is a pop quiz. You have been warned! 🤣🤣 Please access the poll at http://pollev.com/knmnyn. Do skip the name registration.
  3. Fast and Slow (Kahneman and Tversky, Thinking, Fast and Slow). Daniel Kahneman. System 1: Fast, Automatic, Intuitive, Parallel, Associative. System 2: Slow, Controlled, Analytical, Serial, Logical.
  4. Neural Nets – System 1. Andrew Ng. System 1: Fast, Automatic, Intuitive, Parallel, Associative. System 2: Slow, Controlled, Analytical, Serial, Logical. ✏Your Turn: What do you think the loss function of research should be?
  5. The Age of Accelerations (Friedman, Thank You for Being Late). His three accelerations: • Moore’s law • Globalization • Mother Nature. Kurzweil’s “Second half of the chessboard”. Thomas Friedman.
  6. The Age of Accelerations (Friedman, Thank You for Being Late). Our three accelerants (take your pick): • arXiv • PapersWithCode • (Semantic) Scholar. Kurzweil’s “Second half of the chessboard”. Thomas Friedman.
  7. What about System 2? Are there scholarly problems that require more analytical, logical, and sustained thinking? Absolutely! System 1: Fast, Automatic, Intuitive, Parallel, Associative. System 2: Slow, Controlled, Analytical, Serial, Logical.
  8. A Brief History of Science: A fast primer. Photo Credit: 오힘찬 @ WikimediaCommons (CC BY-SA 4.0)
  9. Kuhn – challenging accumulative growth (Kuhn, The Structure of Scientific Revolutions). Paradigm Shift. Normal Science. To think about: What age is SDP in now? Thomas Kuhn.
  10. Science is a verb … in the sense that it is a method (activity) involving the making of hypotheses, the design of experiments and the analysis of data. But a critical part of the scientific process is the conversation phase after the experimentation is done. Scientists share their findings with the broader community through publications or presentations at meetings. What happens next is a back-and-forth discussion including a critique of methods or interpretation, and a comparison with previous findings. If there are flaws in the experimental design or interpretation, … scientists need to be willing to hear and respond to feedback. If there are conflicting results, it may require additional hypothesis making and experimentation. Only when the conversation runs its course do the conclusions become a part of accepted scientific understanding. (Steve Savage’s post on Science 2.0)
  11. Science in the Age of AI. Video Source: Video by RedEye450 from Pexels
  12. Loss function of research. Beam search analogy. Accelerations make the gradient steeper. Overload favors System 1. Publish or Perish. Suboptimal local minima.
  13. What affordances does AI yield? Better System 1! e.g., Neural Architecture Search (NAS). Figures from Ren et al. 2021 ACM Comput. Surv. 37(4)
  14. System 1 and 2 work together. One way: System 1 brings data for System 2 to deliberate with; System 2 gives feedback (end-to-end) to System 1. Neither system is perfect but the whole is better than the parts (multi-view learning). Let’s connect it back to our societal research loss function.
  15. Scholarly Document Processing for System 2: Slowing down. Photo Credits: sizumaru @ Flickr
  16. Challenges for System 2 SDP: 1. Discovering Adjacent Possibles (Branch Out) 2. Uncovering Discrepancies (Dive Deep) 3. Finding Provenance (Travel Back)
  17. 1. Discovering Adjacent Possibles: Liquid Networks, The Slow Hunch, Serendipity, Exaptation. Steven Johnson. (Johnson, Where Good Ideas Come From)
  18. Confirmation Bias in Recommender Systems. We train search and recommender systems, but on historical data. This results in confirmation bias (more like this). But if we want to afford System 2 thinking, we want serendipitous recommendation (to learn what we don’t know). Need to capture multimodal evidence and laborious human assessment.
  19. Next Gen Platforms & Toolkits (not everyone wants to do it globally and publicly). For discoverability: • Setting exploration criteria • Reproducible search • Suggesting alternative paths and terminologies. For discussion, collaboration and crediting: • “Calm” for Scientists (arXiv off) • MIT Deliberatorium • Big Science initiatives
  20. 2. Uncovering Discrepancies. ✏Your Turn: Is coffee bad for you?
  21. 2. Uncovering Discrepancies – Countering the Streetlight Effect. “What happens next is a back-and-forth discussion including a critique of methods or interpretation, and a comparison with previous findings. If there are flaws in the experimental design or interpretation, … scientists need to be willing to hear and respond to feedback.” Communities do not sufficiently report negative results. Difficult to organize discrepancies for systematic exploration, thus we cannot question the establishment. Related: Davies et al. Promoting inclusive metrics of success and impact to dismantle a discriminatory reward system in science. Photo by Guilherme Rossi @ Pexels
  22. Aids for Paradigm Shifts. Systematic reviews for what doesn’t work: “Our techniques improve on Dataset X but less well on Y.” Uncover choices left (un)stated by authors: “We compare against current relevant baselines [1, 2, 3]”. Machine reading of Limitations and Ethical Consideration sections.
  23. ✏Your Turn: What about citation half-life? How is it changing? https://symplectic.co.uk/guest-blog/research-data-mechanics/
  24. 3. Finding Provenance. Perhaps surprisingly, citation half-life has lengthened in most fields. Does this mean that we are finding the right works? Martín-Martín et al. Back to the past: on the shoulders of an academic search engine giant. Davis and Cochran, Cited Half-Life of the Journal Literature.
  25. Aids for Finding Provenance. Paraphrase, terminology and simplification services in situ (stay tuned for Head’s keynote). Lower the barrier for communication. Platforms for easier means for discussing problems and knowing of furthering research. Who cares about my research? (Multi-hop) Trace terms and ideas back to their source.
  26. We need to participate in Science! This is the last activity, I promise! http://pollev.com/knmnyn ✏Your Turn: Please use your own judgement to rank the three challenges presented
  27. Conclusion: SDP needs to get involved in Science. Let’s be deliberate about our tools for science. Care to discuss? Diversity and inclusion are also important for holistic progress in science. Thanks to: WING members: George Huang Po-Wei, Yajing Yang, Abhinav Ramesh Kashyap, Muthu Kumar Chandrasekaran. Collaborators: Min Song, Namhee Kim, and many more previous WING members, and my family, and all of you who’ve attended physically and virtually to listen! Thank you! Yanxia Qin, Aminesh Prasad, Kazunari Sugiyama, Juyoung An.
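
The Finding Provenance slides ask how citation half-life is changing. As a rough illustration that is not part of the talk, the sketch below treats half-life as the median age of a set of cited works, which is the general idea behind journal half-life metrics. The function name `cited_half_life` and the sample years are hypothetical, and published metrics typically interpolate over cumulative citation counts rather than taking a plain median.

```python
from statistics import median


def cited_half_life(citing_year, cited_years):
    """Simplified half-life: median age (in years) of the cited works.

    Assumption: age = citing_year - publication year of the cited work.
    Real bibliometric tools use a more refined, interpolated calculation.
    """
    ages = [citing_year - year for year in cited_years]
    return median(ages)


# Hypothetical reference list of a paper published in 2022.
references = [2021, 2020, 2018, 2012, 2002]
print(cited_half_life(2022, references))  # ages are 1, 2, 4, 10, 20 -> prints 4
```

Under this simplified reading, a lengthening half-life means the median cited work is getting older, which is the trend the slide describes for most fields.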