gitbase: exploring git repos with SQL

Francesc Campoy from source{d} will talk about gitbase, a new open source project written entirely in Go that, as the saying goes, stands on the shoulders of giants. By integrating the codebases of go-git (the most successful git implementation in Go) and Vitess (a replication layer for all the MySQL databases at YouTube), gitbase provides an easy way to extract information from hundreds of git repositories with a simple SQL query.

The talk will give an in-depth description of the project, how source{d} implemented it, and what they learned along the way.

Francesc Campoy Flores

September 19, 2018

Transcript

  1. Some custom functions
    • LANGUAGE(path, content): returns the language of a file given its path and contents. Powered by github.com/src-d/enry.
  2. Lines of code per language
    # total lines of code per language in the Go repo
    SELECT lang, SUM(lines) AS total_lines
    FROM (
        SELECT t.tree_entry_name AS name,
               LANGUAGE(t.tree_entry_name, b.blob_content) AS lang,
               ARRAY_LENGTH(SPLIT(b.blob_content, '\n')) AS lines
        FROM refs r
        NATURAL JOIN commits c
        NATURAL JOIN commit_trees ct
        NATURAL JOIN tree_entries t
        NATURAL JOIN blobs b
        WHERE r.ref_name = 'HEAD'
    ) AS lines
    WHERE lang IS NOT NULL
    GROUP BY lang
    ORDER BY total_lines DESC;
  4. Some custom functions
    • UAST(content, language, [filter]): returns the Universal Abstract Syntax Tree resulting from parsing the given content in the given language. Powered by github.com/bblfsh/bblfshd.
  5. Number of functions per Go file
    SELECT files.repository_id, files.file_path,
           ARRAY_LENGTH(UAST(
               files.blob_content,
               LANGUAGE(files.file_path, files.blob_content),
               '//*[@roleFunction and @roleDeclaration]'
           )) AS functions
    FROM files
    NATURAL JOIN refs
    WHERE LANGUAGE(files.file_path, files.blob_content) = 'Go'
      AND refs.ref_name = 'HEAD'
  7. source{d} Engine
    • Too many moving pieces
    • Too many steps to get started
    • Solving it all with the power of Docker!
  8. go-mysql-server (github.com/src-d/go-mysql-server)
    • Ready-to-run MySQL server
    • Extensible via the Database and Table interfaces
    • Example: github.com/campoy/csvql
  9. Indexes
    • SQL indexes can speed up queries substantially
    • Vitess doesn't provide them
    • Pilosa does!
  10. Caches
    • Caching is the obvious way to make queries faster
    • We didn't want to reinvent the wheel
    • We didn't have to, thanks to HashiCorp
    • Based on github.com/golang/groupcache
  11. Regular Expressions
    • The regexp package in Go runs in linear time, but it is not always faster: https://swtch.com/~rsc/regexp/regexp1.html
    • Alternative: github.com/moovweb/rubex (Oniguruma)
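The Go sketch below gives a rough idea of what LANGUAGE(path, content) has to do. It guesses a language from the file extension and, failing that, from a shebang line. This is not enry's API or algorithm; enry combines several strategies with much larger tables. The lookup tables and the detectLanguage name are hypothetical. The query's `lang IS NOT NULL` filter suggests that unrecognized files come back as NULL, and this sketch returns an empty string in that case.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extLanguages and interpreters are tiny, hypothetical lookup tables
// used only for illustration.
var extLanguages = map[string]string{
	".go": "Go",
	".py": "Python",
	".c":  "C",
	".md": "Markdown",
	".sh": "Shell",
}

var interpreters = map[string]string{
	"python":  "Python",
	"python3": "Python",
	"bash":    "Shell",
	"sh":      "Shell",
}

// detectLanguage is a simplified stand-in for LANGUAGE(path, content):
// extension first, then the interpreter named on a "#!" first line.
// It returns "" when it cannot decide.
func detectLanguage(path string, content []byte) string {
	if lang, ok := extLanguages[strings.ToLower(filepath.Ext(path))]; ok {
		return lang
	}
	firstLine, _, _ := strings.Cut(string(content), "\n")
	if strings.HasPrefix(firstLine, "#!") {
		fields := strings.Fields(strings.TrimPrefix(firstLine, "#!"))
		if len(fields) > 0 {
			interp := filepath.Base(fields[0])
			// "#!/usr/bin/env python3": the real interpreter is the next field.
			if interp == "env" && len(fields) > 1 {
				interp = fields[1]
			}
			if lang, ok := interpreters[interp]; ok {
				return lang
			}
		}
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", detectLanguage("src/net/http/server.go", nil))
	fmt.Printf("%q\n", detectLanguage("make", []byte("#!/usr/bin/env bash\nset -e\n")))
	fmt.Printf("%q\n", detectLanguage("AUTHORS", []byte("Gophers\n")))
}
```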
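The lines-of-code-per-language query can be mirrored in plain Go to make its logic explicit. The steps are:

- count each blob's lines as the number of pieces produced by splitting on "\n";
- drop files with no detected language;
- sum the counts per language and sort in descending order.

The file type and the linesPerLanguage function are hypothetical stand-ins for the rows that the refs/commits/commit_trees/tree_entries/blobs join produces.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// file is a hypothetical in-memory stand-in for one row of the join:
// a path, its detected language ("" meaning unknown) and its content.
type file struct {
	Path, Lang, Content string
}

type langTotal struct {
	Lang  string
	Lines int
}

// linesPerLanguage mirrors the query: split-and-count per file, skip
// unknown languages (the IS NOT NULL filter), sum, sort descending.
func linesPerLanguage(files []file) []langTotal {
	totals := map[string]int{}
	for _, f := range files {
		if f.Lang == "" {
			continue
		}
		totals[f.Lang] += len(strings.Split(f.Content, "\n"))
	}
	out := make([]langTotal, 0, len(totals))
	for lang, n := range totals {
		out = append(out, langTotal{lang, n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Lines != out[j].Lines {
			return out[i].Lines > out[j].Lines
		}
		return out[i].Lang < out[j].Lang // deterministic order for ties
	})
	return out
}

func main() {
	files := []file{
		{"main.go", "Go", "package main\n\nfunc main() {}\n"},
		{"util.go", "Go", "package main"},
		{"gen.py", "Python", "print(1)\nprint(2)"},
		{"LICENSE", "", "BSD-style license"},
	}
	for _, t := range linesPerLanguage(files) {
		fmt.Printf("%s\t%d\n", t.Lang, t.Lines)
	}
}
```

Go's strings.Split returns a trailing empty string when the content ends in a newline. For such files, this count is therefore one higher than what `wc -l` reports.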
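The optional filter argument of UAST selects nodes by their roles. Assume a drastically simplified node type; it is hypothetical, and real Babelfish UAST nodes carry far more information. Under that assumption, a filter such as '//*[@roleFunction and @roleDeclaration]' amounts to walking the whole tree and keeping every node that has both roles. The role names used here are illustrative.

```go
package main

import "fmt"

// node is a hypothetical, heavily simplified UAST node.
type node struct {
	Type     string
	Roles    []string
	Children []*node
}

// hasRoles reports whether n carries every one of the wanted roles.
func (n *node) hasRoles(want ...string) bool {
	for _, w := range want {
		found := false
		for _, r := range n.Roles {
			if r == w {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// filter returns every node in the tree, root included, carrying all the
// wanted roles: roughly what the '//*[@a and @b]' filter selects.
func filter(root *node, want ...string) []*node {
	var out []*node
	var walk func(n *node)
	walk = func(n *node) {
		if n.hasRoles(want...) {
			out = append(out, n)
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return out
}

func main() {
	tree := &node{Type: "File", Children: []*node{
		{Type: "FuncDecl", Roles: []string{"Function", "Declaration"}, Children: []*node{
			{Type: "CallExpr", Roles: []string{"Function", "Call"}},
		}},
		{Type: "GenDecl", Roles: []string{"Declaration"}},
	}}
	for _, n := range filter(tree, "Function", "Declaration") {
		fmt.Println(n.Type)
	}
}
```

Because the tree is language-independent, the same role-based filter works for any language that Babelfish can parse. This is why the functions-per-file query only needs LANGUAGE to choose the parser.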
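Outside gitbase, the functions-per-Go-file count can be approximated for Go code alone with the standard library's go/parser. This swaps Babelfish's language-independent UAST for Go's own AST. The sketch counts *ast.FuncDecl nodes (functions and methods) and ignores function literals, so its numbers may differ from what the UAST filter returns. countFuncDecls is a hypothetical helper.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// countFuncDecls parses one Go source file and counts its function and
// method declarations. In Go these only appear at the top level, so
// scanning f.Decls is enough; function literals are not counted.
func countFuncDecls(filename, src string) (int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, d := range f.Decls {
		if _, ok := d.(*ast.FuncDecl); ok {
			n++
		}
	}
	return n, nil
}

func main() {
	files := []struct{ name, src string }{
		{"a.go", "package a\n\nfunc One() {}\nfunc Two() {}\n"},
		{"b.go", "package b\n\nvar x = func() int { return 1 }()\n"},
	}
	for _, f := range files {
		n, err := countFuncDecls(f.name, f.src)
		if err != nil {
			fmt.Println(f.name, "error:", err)
			continue
		}
		fmt.Println(f.name, n)
	}
}
```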
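The sketch below shows, in the spirit of csvql, what extending a SQL engine through Database and Table interfaces can look like: it serves CSV data as tables. The Database and Table interfaces here are hypothetical simplifications written only for this example. They are not go-mysql-server's real interfaces, which have different and richer method sets.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Table and Database are hypothetical, simplified interfaces that only
// echo the idea of plugging data sources into a SQL engine.
type Table interface {
	Name() string
	Schema() []string          // column names
	Rows() ([][]string, error) // every row, as strings
}

type Database interface {
	Name() string
	Tables() map[string]Table
}

// csvTable serves one CSV document: the header row is the schema and the
// remaining records are the rows.
type csvTable struct {
	name string
	data string
}

func (t csvTable) Name() string { return t.name }

func (t csvTable) records() ([][]string, error) {
	return csv.NewReader(strings.NewReader(t.data)).ReadAll()
}

func (t csvTable) Schema() []string {
	recs, err := t.records()
	if err != nil || len(recs) == 0 {
		return nil
	}
	return recs[0]
}

func (t csvTable) Rows() ([][]string, error) {
	recs, err := t.records()
	if err != nil || len(recs) == 0 {
		return nil, err
	}
	return recs[1:], nil
}

// csvDatabase groups several CSV tables under one database name.
type csvDatabase struct {
	name   string
	tables map[string]Table
}

func (d csvDatabase) Name() string             { return d.name }
func (d csvDatabase) Tables() map[string]Table { return d.tables }

func main() {
	db := csvDatabase{name: "csv", tables: map[string]Table{
		"langs": csvTable{name: "langs", data: "lang,lines\nGo,100\nC,20\n"},
	}}
	for name, t := range db.Tables() {
		rows, err := t.Rows()
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Println(db.Name(), name, t.Schema(), rows)
	}
}
```

The design point is that the engine only talks to the interfaces. A git repository, a CSV file or anything else becomes queryable once it can list its tables, describe their columns and produce rows.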
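Pilosa is a distributed bitmap index. The core idea keeps one bitmap of row IDs per distinct column value, so that a conjunction of equality predicates becomes a bitwise AND. That idea fits in a few lines of Go. The sketch is not Pilosa's API and does not show how gitbase integrates it; the bitmap type, buildIndex and lookup are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitmap is a fixed-size set of row IDs, one bit per row.
type bitmap []uint64

func newBitmap(rows int) bitmap { return make(bitmap, (rows+63)/64) }

func (b bitmap) set(row int) { b[row/64] |= 1 << (uint(row) % 64) }

// and intersects two bitmaps built over the same table (same row count).
func (b bitmap) and(o bitmap) bitmap {
	out := make(bitmap, len(b))
	for i := range b {
		out[i] = b[i] & o[i]
	}
	return out
}

// rows lists the set bits in ascending order.
func (b bitmap) rows() []int {
	var out []int
	for i, w := range b {
		for w != 0 {
			out = append(out, i*64+bits.TrailingZeros64(w))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return out
}

// index maps each distinct value of one column to the rows holding it.
type index struct {
	nrows  int
	values map[string]bitmap
}

func buildIndex(column []string) index {
	idx := index{nrows: len(column), values: map[string]bitmap{}}
	for row, v := range column {
		bm, ok := idx.values[v]
		if !ok {
			bm = newBitmap(len(column))
			idx.values[v] = bm
		}
		bm.set(row)
	}
	return idx
}

// lookup returns the rows equal to v, or an empty bitmap.
func (idx index) lookup(v string) bitmap {
	if bm, ok := idx.values[v]; ok {
		return bm
	}
	return newBitmap(idx.nrows)
}

func main() {
	// Two columns of the same four rows.
	repo := buildIndex([]string{"go", "go", "vitess", "go"})
	lang := buildIndex([]string{"Go", "C", "Go", "Go"})
	// WHERE repo = 'go' AND lang = 'Go'
	fmt.Println(repo.lookup("go").and(lang.lookup("Go")).rows())
}
```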
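The caching slide doesn't name a package. HashiCorp's golang-lru, which describes itself as based on the cache in groupcache, is a likely candidate. The strategy it implements, least-recently-used eviction, can be sketched with the standard container/list package. The lru type below is hypothetical and, unlike golang-lru's cache, not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores; keeping the key lets eviction
// delete the matching map entry.
type entry struct {
	key   string
	value []byte
}

// lru is a hypothetical fixed-capacity least-recently-used cache.
// It is not safe for concurrent use.
type lru struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRU(capacity int) *lru {
	return &lru{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the cached value and marks the key as most recently used.
func (c *lru) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Add inserts or updates a value, evicting the least recently used entry
// once the cache grows past its capacity.
func (c *lru) Add(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := newLRU(2)
	c.Add("SELECT 1", []byte("1"))
	c.Add("SELECT 2", []byte("2"))
	c.Get("SELECT 1")              // touch: now most recently used
	c.Add("SELECT 3", []byte("3")) // evicts "SELECT 2"
	_, ok := c.Get("SELECT 2")
	fmt.Println("SELECT 2 still cached:", ok)
}
```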
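Russ Cox's article linked above uses the pattern a?ⁿaⁿ, matched against aⁿ, to show the difference. Backtracking engines take time exponential in n on this input, while Go's regexp stays linear in the size of the input. The sketch below builds that pattern with a hypothetical helper, pathological, and times a few matches. Absolute timings depend on the machine, so none are claimed here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// pathological builds the anchored pattern a?^n a^n and the input a^n.
// Repetition is spelled out with strings.Repeat rather than {n}.
func pathological(n int) (*regexp.Regexp, string) {
	pattern := "^" + strings.Repeat("a?", n) + strings.Repeat("a", n) + "$"
	return regexp.MustCompile(pattern), strings.Repeat("a", n)
}

func main() {
	for _, n := range []int{10, 20, 30} {
		re, input := pathological(n)
		start := time.Now()
		ok := re.MatchString(input)
		fmt.Printf("n=%d match=%v took %v\n", n, ok, time.Since(start))
	}
}
```

The flip side, which the slide points out, is that a backtracking engine such as Oniguruma (used through rubex) can beat Go's regexp on everyday patterns that never hit this worst case.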