JQ and YQ for data science - Quick introduction

A quick introduction to jq and yq for first-year data science students. The presentation reviews the key capabilities of the tools and includes a short demo of using them with real data from the ETH Zurich library registry.

Oleg Nenashev

January 23, 2023

Transcript

  1. > whoami --academia
    “I have a PhD in Design… Hardware Design” Oleg @ Open Source Design Meetup
    Automation and Industrial Control
    Automation and Computer Systems
    Components of computing and control systems
    Thesis - Reengineering of digital hardware, and embedding test modules and interfaces into devices described by multilevel models
    BSc - 2009
    MSc - 2011
    PhD - 2015
    Applied Computer Science in Economics
    Secondary MSc - 2011
  2. • One of the most popular data formats
    • Service REST APIs
    • Metadata
    • [Un]structured data
  3. What is jq?
    “jq is a lightweight and flexible command-line JSON processor”
    “jq is like sed for JSON data”
    stedolan.github.io/jq
  4. “jq is like sed for JSON data”
    Input JSON
    Resulting JSON
    ❖ Filtering
    ❖ Search
    ❖ Sorting
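The three operations named on this slide can also be sketched outside jq. The snippet below is a minimal Python analogue using only the standard library; the records and the field names `title` and `year` are hypothetical and not taken from the talk. The jq filters quoted in the comments are approximate counterparts, not a transcription of the demo.

```python
import json

# Hypothetical input JSON; the records and field names are illustrative only.
raw = """
[
  {"title": "Sorting JSON", "year": 2021},
  {"title": "Intro to data formats", "year": 2019},
  {"title": "Streaming data", "year": 2023}
]
"""
records = json.loads(raw)

# Filtering: keep records from 2020 onwards.
# Roughly: jq 'map(select(.year >= 2020))'
recent = [r for r in records if r["year"] >= 2020]

# Search: keep records whose title contains a substring (case-sensitive).
# Roughly: jq 'map(select(.title | contains("data")))'
about_data = [r for r in records if "data" in r["title"]]

# Sorting: order records by a field.
# Roughly: jq 'sort_by(.year)'
by_year = sorted(records, key=lambda r: r["year"])

print(json.dumps([r["title"] for r in by_year], indent=2))
```

In each case the input is JSON and the result is again plain JSON-compatible data, which is the "Input JSON, Resulting JSON" flow the slide describes.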
  5. Advanced jq features
    Syntax
    • Math
    • Comparisons
    • Variable binding
    • Iterators
    • Micro-scripting
    • Regular expressions
    Tool
    • Stream processing
    • Modules / Plugins
    • IO operations
  6. Working with YAML
    • YAML is [almost] like JSON
    • Many tools
    • yq - a wrapper for YAML
    • github.com/mikefarah/yq
    • Other formats are supported
  7. jq - Where to get it?
    • Linux - [Any?] package manager
      ◦ sudo apt install jq
    • Windows - Chocolatey
      ◦ choco install jq
    • Docker - official image
      ◦ docker pull stedolan/jq
    • Python - jq binding
      ◦ pip install jq
    • VSCode plugins
  8. Our data
    • Let’s look up Renku publications
    • https://daas.library.ethz.ch/rib/v3/search?q=any,contains,%22Renku%22
    Source: en.wikipedia.org/wiki/Japanese_poetry#/media/File:'Kokinshu'-_Anthology_of_Classic_Japanese_Poetry_LACMA_M.91.250.332.jpg
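The search URL on this slide wraps the search term in percent-encoded double quotes (`%22`). As a minimal sketch, the same URL can be assembled in Python for other terms. The helper name `build_search_url` is hypothetical, and the `any,contains,<phrase>` layout is copied from the slide's URL rather than from the API documentation, so treat it as an assumption.

```python
from urllib.parse import quote

# Base endpoint as shown on the slide.
BASE_URL = "https://daas.library.ethz.ch/rib/v3/search"


def build_search_url(term: str) -> str:
    """Return a search URL for a quoted phrase, mirroring the slide's query.

    Hypothetical helper: the 'any,contains,<phrase>' layout is an assumption
    taken from the slide's URL, not from the API documentation.
    """
    # Wrap the term in double quotes and percent-encode it ('"' becomes %22).
    phrase = quote(f'"{term}"', safe="")
    return f"{BASE_URL}?q=any,contains,{phrase}"


print(build_search_url("Renku"))
```

The snippet only builds the URL and does not send a request, so it runs without network access.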
  9. Query examples
    cat eth-library-renku.json | jq …
    • Selector - '.docs[1]'
    • Selector fields - '.docs[].pnx.display.title'
    • Entries about data: '.docs[].pnx.display | select(.title[0] | contains("data"))'
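To make clear what each of these three filters selects, here is a minimal Python sketch that performs the same lookups with the standard `json` module. The sample document is a hypothetical stand-in for `eth-library-renku.json`: only its nesting (`docs`, `pnx`, `display`, and `title` as a list) is inferred from the filters on the slide, and the titles are invented for illustration.

```python
import json

# Hypothetical stand-in for eth-library-renku.json; only the nesting
# (docs -> pnx -> display -> title as a list) follows the slide's filters.
raw = """
{
  "docs": [
    {"pnx": {"display": {"title": ["Renku: reproducible data science"]}}},
    {"pnx": {"display": {"title": ["Collaborative research workflows"]}}},
    {"pnx": {"display": {"title": ["Open data in practice"]}}}
  ]
}
"""
data = json.loads(raw)

# '.docs[1]' : the second document (indexing is zero-based).
second_doc = data["docs"][1]

# '.docs[].pnx.display.title' : the title field of every document.
titles = [doc["pnx"]["display"]["title"] for doc in data["docs"]]

# '.docs[].pnx.display | select(.title[0] | contains("data"))' :
# display objects whose first title contains the substring "data".
about_data = [
    doc["pnx"]["display"]
    for doc in data["docs"]
    if "data" in doc["pnx"]["display"]["title"][0]
]

print(json.dumps(about_data, indent=2))
```

As with jq's `contains` on strings, the substring check here is case-sensitive, so a title containing only "Data" would not match.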
  10. Learn More
    • jq documentation
      ◦ https://stedolan.github.io/jq/
    • jq tutorials
      ◦ Szymon Stepniak: youtube.com/playlist?list=PLKaiHc24qCTSOGkkEpeIMupEmnInqHbbV
      ◦ https://www.baeldung.com/linux/jq-command-json
    • yq documentation
      ◦ https://mikefarah.gitbook.io/yq/
  11. Credits
    • All developers of JSON APIs
    • Stephen Dolan, jq author
      ◦ https://github.com/stedolan
    • Mike Farah, yq author
      ◦ https://github.com/mikefarah
    • OpenMoji project
      ◦ https://openmoji.org