Ecosyste.ms Conference talk at EasyBuild User Meeting

A talk I gave at the 9th EasyBuild User Meeting. The talk covers the Ecosyste.ms project, which I've been working on for almost two years. It delves into the challenges and solutions involved in understanding and exploring open-source landscapes and communities.

Andrew Nesbitt

May 19, 2024

Transcript

  1. About me

    Open Source Software Developer
    Package Management Enthusiast
    Based in Somerset, United Kingdom
    - Website: https://nesbitt.io
    - GitHub: https://github.com/andrew
    - Email: [email protected]
    - Mastodon: https://mastodon.social/@andrewnez
  2. Exploring Open Source Software Ecosystems

    There are all kinds of reasons to analyse open source
    - Studying Open Source Software communities
    - Comparing, sorting and categorizing OSS projects
    - Comparing trends across software ecosystems
    - Discovering interesting, critical or unusual projects
    - Investigating security issues and trends
    - Recognizing important maintenance work
    - Finding and supporting overworked maintainers
    - Enabling Data Based Decision Making
    - Help make OSS Software Better
  3. Challenges in collecting OSS metadata

    - Disparate sources of metadata spread across many services
    - Many different data formats
    - Different ecosystem registries expose different kinds of APIs
    - Variety of rate limits and restrictions on keeping up to date
    - Huge amounts of data
    - PII compliance issues
    - Spam, malicious code and other unwanted noise
    - Diminishing returns for smaller software ecosystems
  4. Introducing Ecosyste.ms

    Tools and open datasets to support, sustain, and secure critical digital infrastructure.
    - Package manager metadata for 34 different software ecosystems
    - Source Repository metadata from 785 different forges
    - Issues, pull requests, commits and security advisory datasets
    - Tools and APIs for analysing, parsing, diffing and scanning OSS
    - Normalized data across many ecosystems and platforms
    - Mining dependency graphs from packages, repos and containers
    - All open source (AGPL) and open data (CC-BY-SA)
    - Website: https://ecosyste.ms
    - Code: https://github.com/ecosyste-ms
  5. Ecosyste.ms - The Numbers

    - 9 million software packages
    - 100 million package versions
    - 200 million public software repositories
    - 16 billion dependencies
    - 27 million issues and pull requests
    - 345 million commits
    - 8 billion activity events
    - 17 thousand security advisories
    - 450 thousand docker image SBOMs
    - 12TB of data in Postgres (~1TB indexes)
    - 300 million API requests per month
  6. Ecosyste.ms - What can you do with it?

    - Find Critical packages within an ecosystem
    - Explore unseen infrastructure
    - Discover key maintainers
    - Look at cross-ecosystem dependency graphs
    - Large scale analysis of software communities
    - Connect with other kinds of data
    - Scientific papers
    - Funding data
    - Software Foundations
    - And more!
  7. Ecosyste.ms Services

    Individual services for parsing, normalizing and aggregating OSS metadata
    - Packages
    - Timeline
    - Parser
    - Archives
    - Digest
    - Diff
    - Licenses
    - Repos
    - Open Collective
    - SBOM
    - Resolve
    - Advisories
    - Commits
    - Docker
    - Summary
    - Issues
    - OST
    - Papers
    - Awesome
  8. Ecosyste.ms Services: Packages

    Normalized package manager metadata from many software ecosystems
    - 34 software ecosystems
    - 59 package manager registries
    - 9.5 million packages
    - 101 million versions
    - 1.2 billion dependencies
    - Website: https://packages.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/packages
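To illustrate what "normalized" could mean here, the sketch below maps two registry responses onto one common record shape. It is a minimal example and not the Packages service's actual code: the function names, the output fields and the sample documents are all hypothetical, and the input layouts only assume the commonly seen npm (`dist-tags`, `license`) and RubyGems (`version`, `licenses`) response fields.

```python
def normalize_npm(doc):
    """Map an npm-style package document onto a common record shape (hypothetical shape)."""
    license_field = doc.get("license")
    return {
        "ecosystem": "npm",
        "name": doc["name"],
        "latest_version": doc.get("dist-tags", {}).get("latest"),
        # npm usually carries a single license string; wrap it in a list.
        "licenses": [license_field] if license_field else [],
    }


def normalize_rubygems(doc):
    """Map a RubyGems-style gem document onto the same record shape."""
    return {
        "ecosystem": "rubygems",
        "name": doc["name"],
        "latest_version": doc.get("version"),
        # RubyGems already uses a list of licenses (it may be null).
        "licenses": list(doc.get("licenses") or []),
    }


# Hypothetical sample documents, trimmed to the fields used above.
npm_doc = {"name": "left-pad", "dist-tags": {"latest": "1.3.0"}, "license": "WTFPL"}
gem_doc = {"name": "rake", "version": "13.2.1", "licenses": ["MIT"]}

records = [normalize_npm(npm_doc), normalize_rubygems(gem_doc)]
```

Once every registry is mapped to the same keys, cross-ecosystem queries (for example counting packages per license) no longer need per-registry special cases.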
  9. Ecosyste.ms Services: Parser

    Web service to parse dependency metadata from manifest files
    98 file types supported from 30 different software ecosystems:
    *.cabal, *.csproj, *.gemspec, *.nuspec, *.podspec, *.podspec.json, .github/workflows/*.yaml, .github/workflows/*.yml, Brewfile, Brewfile.lock.json, Cargo.lock, Cargo.toml, Cartfile, Cartfile.private, Cartfile.resolved, DESCRIPTION, Dockerfile, Gemfile, Gemfile.lock, Godeps, Godeps/Godeps.json, Gopkg.lock, Gopkg.toml, META.json, META.yml, Package.resolved, Package.swift, Pipfile, Pipfile.lock, Podfile, Podfile.lock, Project.json, Project.lock.json, REQUIRE, action.yaml, action.yml, bower.json, build.gradle, build.gradle.kts, cabal.config, composer.json, composer.lock, cyclonedx.json, cyclonedx.xml, docker-compose.yml, dub.json, dub.sdl, elm-package.json, elm-stuff/exact-dependencies.json, elm_dependencies.json, environment.yaml, environment.yaml.lock, environment.yml, environment.yml.lock, gems.locked, gems.rb, glide.lock, glide.yaml, go-resolved-dependencies.json, go.mod, go.sum, gradle-dependencies-q.txt, haxelib.json, ivy.xml, maven-dependency-tree.txt, maven-resolved-dependencies.txt, mix.exs, mix.lock, npm-ls.json, npm-shrinkwrap.json, package-lock.json, package.json, packages.config, packages.lock.json, paket.lock, pip-resolved-dependencies.txt, pnpm-lock.yaml, poetry.lock, pom.xml, project.assets.json, project.clj, pubspec.lock, pubspec.yaml, pyproject.toml, req*.pip, req*.txt, requirements.frozen, requirements/*.pip, requirements/*.txt, sbt-update-full.txt, setup.py, shard.lock, shard.yml, vcpkg.json, vendor/manifest, vendor/vendor.json, versions.json, yarn.lock
    - Website: https://parser.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/parser
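The file list above is a set of glob-style patterns, so one way to picture the first step of such a parser is matching repository paths against them. The sketch below is an assumption-laden illustration, not the Parser service's implementation: it uses a small subset of the patterns from the slide, and the pattern-to-ecosystem labels and the `detect_manifests` helper are hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Subset of the patterns listed on the slide; ecosystem labels are illustrative.
MANIFEST_PATTERNS = {
    "Cargo.toml": "cargo",
    "Cargo.lock": "cargo",
    "Gemfile": "rubygems",
    "*.gemspec": "rubygems",
    "package.json": "npm",
    "go.mod": "go",
    "req*.txt": "pypi",
    "pyproject.toml": "pypi",
}


def detect_manifests(paths):
    """Return (path, ecosystem) pairs for paths whose file name matches a known pattern."""
    found = []
    for path in paths:
        name = basename(path)
        for pattern, ecosystem in MANIFEST_PATTERNS.items():
            # fnmatchcase keeps matching case-sensitive on every platform.
            if fnmatchcase(name, pattern):
                found.append((path, ecosystem))
                break
    return found


repo_files = ["src/main.rs", "Cargo.toml", "docs/requirements.txt", "foo.gemspec", "README.md"]
manifests = detect_manifests(repo_files)
```

Matching on the file name only keeps the sketch short; directory-scoped patterns from the slide such as `requirements/*.txt` would need the full path instead.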
  10. Ecosyste.ms Services: Repos

    Repository metadata from a variety of software forges such as GitHub, GitLab, BitBucket, Codeberg, Gitea and Forgejo instances.
    - 785 forges
    - 205 million repositories
    - 195 million tags
    - 236 million manifest files
    - 17 billion dependencies
    - Website: https://repos.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/repos
  11. Ecosyste.ms Services: Advisories

    Security Advisory metadata connecting packages and repositories
    - 17,500 advisories
    - 12 ecosystems
    - 8,150 affected packages
    - 500,000+ affected versions
    - 1,000,000+ affected open source repositories
    - Website: https://advisories.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/advisories
  12. Ecosyste.ms Services: Issues

    Issue and Pull Request metadata aggregated
    - 789 forges
    - 2.8 million repositories indexed
    - 12 million issues
    - 26 million pull requests
    - 71 million comments
    - 3.2 million authors
    - 26% of all issues and pull requests created by bots
    - Website: https://issues.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/issues
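For scale, 26% of the 38 million issues and pull requests on this slide is roughly 9.9 million bot-created items. The slide does not say how bots are identified; the sketch below assumes one simple heuristic, that GitHub App accounts have logins ending in `[bot]`, and both helper names are hypothetical.

```python
def is_bot(login):
    """Heuristic only: treat logins ending in '[bot]' as bot accounts."""
    return login.endswith("[bot]")


def bot_share(author_logins):
    """Fraction of items (issues or pull requests) whose author looks like a bot."""
    if not author_logins:
        return 0.0
    bots = sum(1 for login in author_logins if is_bot(login))
    return bots / len(author_logins)


# Hypothetical author logins, one per issue or pull request.
authors = ["dependabot[bot]", "alice", "bob", "renovate[bot]"]
share = bot_share(authors)
```

A real classifier would likely need more signals, since some automation runs under ordinary user accounts that this suffix check would miss.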
  13. Ecosyste.ms Services: Commits

    Commit metadata aggregated and summarized
    - 789 forges
    - 1.4 million repositories indexed
    - 345 million commits counted
    - 6.2% commits authored by a bot
    - Average 223 commits per repository
    - Average 9.6 committers per repository
    - Website: https://commits.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/commits
  14. Ecosyste.ms Services: Docker

    Index of dependencies inside public docker images using syft to create SBOMs of each image.
    - 450,000 docker images indexed
    - 324 Billion downloads
    - 983,000 unique dependencies from 27 ecosystems
    - 130 Million dependencies
    - Includes system dependency usage metrics
    - Website: https://docker.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/docker
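As a rough picture of how an SBOM turns into dependency counts, the sketch below tallies packages by type from an SBOM document. It is not the Docker service's code: it assumes a syft-style JSON layout with a top-level `artifacts` list whose entries carry `name`, `version` and `type`, and the sample document and `count_by_type` helper are hypothetical.

```python
import json
from collections import Counter


def count_by_type(sbom_json):
    """Count SBOM artifacts per package type (for example 'deb' or 'gem')."""
    sbom = json.loads(sbom_json)
    return Counter(artifact.get("type", "unknown") for artifact in sbom.get("artifacts", []))


# Hypothetical, heavily trimmed SBOM for a single image.
sample_sbom = json.dumps({
    "artifacts": [
        {"name": "openssl", "version": "3.0.2", "type": "deb"},
        {"name": "zlib1g", "version": "1.2.11", "type": "deb"},
        {"name": "rails", "version": "7.1.0", "type": "gem"},
    ]
})

counts = count_by_type(sample_sbom)
```

Summing such counters across many images is one way the per-ecosystem and system-package usage figures on the slide could be derived.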
  15. Ecosyste.ms Case Study - Mapping Software Mentions

    Mapping dependency graphs from software mentions in Biomedical Papers in the CZI Software Mentions dataset.
    - Resolve full dependency tree for software mentioned in papers
    - Highlight and give credit to hidden contributors
    - Connect all biomedical papers by their shared dependencies
    - Paper: https://arxiv.org/abs/2404.06672
    - Website: https://papers.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/papers
  16. Ecosyste.ms Case Study - Critical OSS

    Discovering both the visible and invisible core pieces of open source software across every ecosystem.
    - Slides: https://tinyurl.com/joshbressers
    - Data: https://packages.ecosyste.ms/open-data
    - Related website: https://packages.ecosyste.ms/critical
  17. Ecosyste.ms Case Study - Funding.yml

    286,425 packages (3.03%) have declared a way to fund their development via a funding platform in their metadata.
    22% of “Critical” packages and 14% of the “Top 1%” of packages have funding metadata.
    Funded packages are detected via a funding url on their registry, via a funding.yml file in their source repository, or via the repository owner being part of GitHub Sponsors.
    Soon to be expanded with metadata on whether they belong to a Foundation.
    - Website: https://packages.ecosyste.ms/funding
    - Code: https://github.com/ecosyste-ms/packages
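The three detection rules on this slide amount to a logical OR over per-package signals. The sketch below shows that rule and checks the headline percentage; it is illustrative only, and the `Package` fields and helper names are hypothetical rather than the Packages service's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Package:
    name: str
    registry_funding_url: Optional[str] = None   # funding url published on the registry
    repo_has_funding_yml: bool = False            # funding.yml found in the source repository
    owner_in_github_sponsors: bool = False        # repository owner is part of GitHub Sponsors


def funding_signals(pkg):
    """List which of the three signals from the slide apply to a package."""
    signals = []
    if pkg.registry_funding_url:
        signals.append("registry_url")
    if pkg.repo_has_funding_yml:
        signals.append("funding_yml")
    if pkg.owner_in_github_sponsors:
        signals.append("github_sponsors")
    return signals


def is_funded(pkg):
    """A package counts as funded if any one signal is present."""
    return bool(funding_signals(pkg))


def implied_total(funded_count, share):
    """Total package count implied by a funded count and its share of the whole."""
    return funded_count / share


# Hypothetical packages.
a = Package("a", registry_funding_url="https://example.com/fund")
b = Package("b")
total = implied_total(286_425, 0.0303)  # about 9.45 million, in line with the 9.5 million packages indexed
```

Returning the list of signals, and not only a boolean, keeps it possible to report which detection route applied to each package.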
  18. Ecosyste.ms Case Study - Open Source Collective

    Joining together the transaction data of donations, expenses and funders on Open Source Collective with the activity data from the open source projects being funded.
    - Looking for correlations between funding and contributions
    - Allow funders to see the state of the projects they’ve supported
    - How are dependencies of OC projects also funded?
    - Fund your whole SBOM (coming soon)
    - Website: https://opencollective.ecosyste.ms
    - Code: https://github.com/ecosyste-ms/opencollective
  19. Ecosyste.ms Case Study - climatetriage.com

    Highlight “good first issues” and “help wanted” issues from open source software projects in the areas of climate change, sustainable energy, biodiversity and natural resources from opensustain.tech
    - Website: https://climatetriage.com
    - Code: https://github.com/protontypes/climate-triage
  20. Ecosyste.ms Roadmap

    What’s next?
    - Version and file level copyright and license data
    - Changelogs and release notes per version
    - OpenSSF Scorecards
    - Project classification
    - Search
    - More system package manager support
    - Software Foundations via https://fossfoundation.info
    - Reverse Dependency Tooling
    Propose ideas on https://github.com/ecosyste-ms/roadmap
  21. Reverse Dependency Tooling

    - Who depends on my open source project?
    - What versions of my software are people depending upon?
    - Are they direct dependents or transitive?
    - Which packages are pulling in my library as a transitive dependency?
    - Who is affected by a security advisory I’m about to publish?
    - Are there packages holding back a version upgrade of transitive dependencies?
    - Are people actually merging automated updates from Dependabot?
    - Can I check I’m not making breaking changes against downstream users?
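Several of these questions reduce to walking the dependency graph backwards. The sketch below separates direct from transitive dependents with a breadth-first search over reversed edges; it is a minimal illustration under assumed inputs (a list of `(package, dependency)` pairs with hypothetical names) and says nothing about how the planned tooling will work.

```python
from collections import defaultdict, deque


def dependents(edges, target):
    """Return (direct, transitive) dependents of target.

    edges is an iterable of (package, dependency) pairs. A package that depends
    on target both directly and through another package is reported as direct only.
    """
    # Reverse the graph: for each dependency, who depends on it?
    reverse = defaultdict(set)
    for package, dependency in edges:
        reverse[dependency].add(package)

    direct = set(reverse[target])
    seen = set(direct)
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        for parent in reverse[current]:
            if parent not in seen and parent != target:
                seen.add(parent)
                queue.append(parent)
    return direct, seen - direct


# Hypothetical dependency edges: (package, dependency).
edges = [
    ("app", "web"),
    ("web", "http"),
    ("cli", "http"),
    ("app", "cli"),
    ("tool", "app"),
]
direct, transitive = dependents(edges, "http")
```

With version ranges attached to each edge, the same traversal could be narrowed to the dependents affected by a specific release or advisory.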
  22. Thanks

    Let’s collaborate! Code and data are all free to use and share.
    - Website: https://ecosyste.ms
    - Code: https://github.com/ecosyste-ms
    - Mastodon: https://mastodon.social/@ecosystems
    - Email: [email protected]
  23. Questions

    In person or via #eum in Slack
    - Website: https://ecosyste.ms
    - Code: https://github.com/ecosyste-ms
    - Mastodon: https://mastodon.social/@ecosystems
    - Email: [email protected]