update to an Apache Struts dependency that was not considered critical • Details of 143M user accounts stolen • > $4B in damages https://www.wired.com/story/equifax-breach-no-excuse/
popular project to major contributor • The new maintainer installs Bitcoin-stealing code in the library • The library is being downloaded 2M times a week • Vulnerability discovered 2 months later https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident
54 (Kikas et al. 2017), or 80 (Zimmermann et al. 2019) transitive dependencies • 50% of transitive dependency closures change within 6 months on Cargo/Rust (Hejderup et al. 2019) ...and they deteriorate • Packages exist in RubyGems whose removal can bring down 500k (40%) other package versions (Kikas et al. 2017) • 391 highly influential maintainers affect more than 10k packages (Zimmermann et al. 2019). What research tells us
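Both the transitive-dependency counts and the RubyGems removal figure are reachability questions on the package dependency graph: forward reachability gives a package's transitive closure, and reachability over reversed edges gives every package that breaks when one disappears. A minimal Python sketch on a hypothetical toy graph (package names are invented; this is not the data or tooling of the cited studies):

```python
from collections import deque

# Hypothetical toy dependency graph: package -> packages it depends on directly.
DEPS: dict[str, set[str]] = {
    "app": {"web", "log"},
    "web": {"http", "json"},
    "http": {"socket"},
    "json": set(),
    "log": {"json"},
    "socket": set(),
    "cli": {"log"},
}


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Nodes reachable from `start` through one or more edges (breadth-first)."""
    seen: set[str] = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, ()))
    return seen


def reverse(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert edges: package -> packages that depend on it directly."""
    rev: dict[str, set[str]] = {node: set() for node in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            rev.setdefault(dep, set()).add(pkg)
    return rev


# Transitive dependency closure of "app".
print(sorted(reachable(DEPS, "app")))  # ['http', 'json', 'log', 'socket', 'web']
# Everything that breaks if "json" disappears from the registry.
print(sorted(reachable(reverse(DEPS), "json")))  # ['app', 'cli', 'log', 'web']
```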
the dependencies are outdated in 50% of important Maven packages • No updates even in the case of security disclosures (70% were unaware) • "Too difficult!", "No tools!" Vulnerabilities proliferate • 1/4 of library downloads have a vulnerability (Comcast TR) • 1/3 of top 133k sites have a vulnerable dependency (Lauinger et al. 2017) What research tells us
I know that one of my dependencies is outdated? • The update problem: How can I check if an updated dependency breaks my code? • The compliance problem: How do I know that I am not violating anyone’s copyrights? • The trust problem: How can I trust code I download from the Internet with my valuable data?
I update my library without breaking clients? How can I notify important clients that I am about to break them? • The deprecation problem: How can I remove features from my library? • The unlawful use problem: How can I spot instances of my code being distributed without permission? • The lack of incentive problem: Why should I use my (free!) time to maintain a library that large corporations depend upon? + the problems that developers have!
resolution in repo • Protects against breakage due to updates to dependencies • Also “protects” against fast distribution of security updates https://www.publicdomainpictures.net/en/view-image.php?image=80963 Dependency version pinning
Not much beyond simple package version matches (and a bit of compliance) • No support for assessing updates • No support for making decisions on which libraries to use • No support for maintainers We can do better than that!
services N/A (always on latest version)
Updates | Move responsibility to consumer. Version pinning. Client is expected to have tests/CI. | Move responsibility to consumer (faster). Depend on builds + tests to catch semantic updates.
Compliance | No generic solution | No generic solution
Impact | Semantic versioning | Land and forget. Move responsibility to client.
Unlawful / improper use | Special tools (FOSSology) or services (e.g. BlackDuck), but with tons of FPs | N/A, usually deployed within a company
Monorepos have the same problems, faster!
• Does this vulnerability affect my code? • Am I linking to GPL code? • Fully precise impact analysis • How many clients will I break if I change this? • Can I safely update? • Effectively, augmenting soundness with more precision
ecosystem 2. Generate call graphs for each package 3. Build unique IDs for nodes (functions) 4. Link the call graphs https://cdn.pixabay.com/photo/2014/12/21/23/28/recipe-575434_960_720.png
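Steps 3 and 4 depend on every function having an identifier that is unique across the whole ecosystem, so that a call recorded in one package's graph can be matched to a node in another's. The sketch below assumes a hypothetical ID layout (ecosystem/package/version/namespace/function); it is an illustration, not the scheme Präzi actually uses:

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class FunctionId:
    """Hypothetical globally unique identifier for a call-graph node."""

    ecosystem: str  # e.g. "cargo" or "pypi"
    package: str
    version: str
    namespace: str  # module or crate path that contains the function
    function: str

    def _join(self, parts: list[str]) -> str:
        # Percent-encode every component so that a "/" inside a name
        # cannot be confused with the separator.
        return "/".join(quote(p, safe="") for p in parts)

    def uri(self) -> str:
        """Fully qualified ID, unique across the whole ecosystem snapshot."""
        return self._join([self.ecosystem, self.package, self.version,
                           self.namespace, self.function])

    def unversioned(self) -> str:
        """Version-free key, used to match a cross-package call to whichever
        concrete version the dependency resolver selects."""
        return self._join([self.ecosystem, self.package,
                           self.namespace, self.function])


fid = FunctionId("cargo", "libfoo", "1.2.0", "foo::parse", "read")
print(fid.uri())          # cargo/libfoo/1.2.0/foo%3A%3Aparse/read
print(fid.unversioned())  # cargo/libfoo/foo%3A%3Aparse/read
```

Because the class is frozen, IDs are hashable and can serve directly as dictionary keys or set members when graphs are merged.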
what they accept • Need to account for missing packages • Need to identify and fix dependency descriptors • Need to deal with compilation errors • Dependency version ranges make dependency graphs time-dependent • Resolution at t must only consider versions released before t • Global call graphs need to be dynamic
[Figure: A v3.0 depends on B with range 1.*; B v1.1 is released at t1 and B v1.2 at t2, so the dependency of A v3.0 resolves to B v1.1 at t1 and to B v1.2 at t2 (time axis)]
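Time-dependent resolution can be sketched as: among the versions of a dependency released before t, pick the highest one that satisfies the declared range. The release table, the integer timestamps, and the `MAJOR.*`-only range syntax below are simplifying assumptions that mirror the A/B example:

```python
from typing import Optional

Version = tuple[int, int]

# Hypothetical release history of package B: (version, release time).
# Mirrors the figure: v1.1 appears at t1 = 10, v1.2 at t2 = 20.
RELEASES: dict[str, list[tuple[Version, int]]] = {
    "B": [((1, 0), 0), ((1, 1), 10), ((1, 2), 20), ((2, 0), 30)],
}


def matches(version: Version, spec: str) -> bool:
    """Simplifying assumption: only 'MAJOR.*' range specs are supported."""
    return version[0] == int(spec.split(".")[0])


def resolve(package: str, spec: str, t: int) -> Optional[Version]:
    """Highest version of `package` matching `spec` that was released before t."""
    candidates = [v for v, released in RELEASES.get(package, [])
                  if released < t and matches(v, spec)]
    return max(candidates, default=None)


# A v3.0 declares B 1.*; the answer depends on when we ask.
print(resolve("B", "1.*", 15))  # (1, 1)
print(resolve("B", "1.*", 25))  # (1, 2)
```

Asking between t1 and t2 yields B v1.1 and asking after t2 yields B v1.2, so the same A v3.0 links against a different call graph depending on when resolution happens.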
chains, etc. are not compatible/available • 2 types of nodes: • Normal function calls (statically resolved) • Linkage points, where function calls cross dependencies (dynamically resolved) • Nodes can have arbitrary metadata: containing file, vulnerabilities, license, etc.
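A minimal data model for a per-package call graph with the two node kinds and free-form metadata. Class and field names are hypothetical, and node keys are plain strings rather than any particular ID scheme:

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    INTERNAL = "internal"  # ordinary call, resolved statically inside the package
    LINKAGE = "linkage"    # call into a dependency, resolved when graphs are linked


@dataclass
class Node:
    uri: str
    kind: NodeKind
    metadata: dict[str, object] = field(default_factory=dict)  # file, license, CVEs...


@dataclass
class PackageCallGraph:
    package: str
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: set[tuple[str, str]] = field(default_factory=set)

    def add_call(self, caller: str, callee: str, external: bool = False) -> None:
        """Record caller -> callee; `external` marks a call that leaves the package."""
        self.nodes.setdefault(caller, Node(caller, NodeKind.INTERNAL))
        kind = NodeKind.LINKAGE if external else NodeKind.INTERNAL
        self.nodes.setdefault(callee, Node(callee, kind))
        self.edges.add((caller, callee))

    def linkage_points(self) -> list[str]:
        """Nodes that must be bound to a dependency's graph at link time."""
        return sorted(u for u, n in self.nodes.items() if n.kind is NodeKind.LINKAGE)


g = PackageCallGraph("app")
g.add_call("app::main", "app::parse")
g.add_call("app::parse", "libfoo::read", external=True)
g.nodes["app::parse"].metadata.update(file="src/parse.rs", license="MIT")
print(g.linkage_points())  # ['libfoo::read']
```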
resolve the latest version of each dependency in the transitive closure that was released at some t1 < t. 2. Retrieve the call graph for each resolved package 3. Identify linkage points and link them 4. Analyze the client application and link it to the dependency call graph
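Steps 2 to 4 can be sketched as follows, assuming hypothetical, precomputed call graphs in which a callee written `dep::fn` marks a linkage point. Linking binds each linkage point to the version chosen in step 1; a breadth-first search over the linked graph then answers "Does this vulnerability affect my code?" for a hypothetical vulnerable function `unsafe_copy`:

```python
from collections import deque

# Hypothetical per-version call graphs (step 2's output). Local calls use bare
# names; a callee written "dep::fn" is a linkage point into package `dep`.
CALL_GRAPHS: dict[tuple[str, str], list[tuple[str, str]]] = {
    ("app", "1.0"): [("main", "parse"), ("parse", "libfoo::read")],
    ("libfoo", "1.1"): [("read", "decode")],
    ("libfoo", "1.2"): [("read", "decode"), ("decode", "unsafe_copy")],
}


def link(resolved: dict[str, str]) -> dict[str, set[str]]:
    """Merge the graphs of the resolved versions (step 1's output: name -> version)
    into one graph whose node IDs are "package@version::function"."""
    graph: dict[str, set[str]] = {}
    for pkg, ver in resolved.items():
        for caller, callee in CALL_GRAPHS[(pkg, ver)]:
            src = f"{pkg}@{ver}::{caller}"
            if "::" in callee:  # linkage point: bind to the resolved dependency version
                dep, fn = callee.split("::", 1)
                dst = f"{dep}@{resolved[dep]}::{fn}"
            else:
                dst = f"{pkg}@{ver}::{callee}"
            graph.setdefault(src, set()).add(dst)
    return graph


def reaches(graph: dict[str, set[str]], start: str, target: str) -> bool:
    """Breadth-first search: is `target` reachable from `start` in the linked graph?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


old = link({"app": "1.0", "libfoo": "1.1"})
new = link({"app": "1.0", "libfoo": "1.2"})
print(reaches(old, "app@1.0::main", "libfoo@1.1::unsafe_copy"))  # False
print(reaches(new, "app@1.0::main", "libfoo@1.2::unsafe_copy"))  # True
```

The client code is identical in both runs; only the resolved version of `libfoo` differs, which is why the answer depends on the resolution time.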
sound, but not precise • RustPräzi is precise by construction, but may not be sound • Rust PDN vs RustPräzi-extracted PDN': 18k different edges • Sampled and manually analyzed 381 edges (95% confidence level)
problems • Unused dependencies • Dependencies only used in test code • Dynamic dispatch • Generic functions • Conditional compilation • Macros. Präzi can only be as sound as the call graph generator it uses
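Dynamic dispatch shows why soundness and precision pull in opposite directions. A sound generator must assume that a virtual call may reach every override in the class hierarchy (class hierarchy analysis, CHA), even if only one of them ever runs. The sketch applies CHA to a few hypothetical Python classes; it illustrates the general technique and is not the analysis performed by the Rust call graph generator behind Präzi:

```python
class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Square(Shape):
    def area(self) -> float:
        return 1.0


class Circle(Shape):
    def area(self) -> float:
        return 3.14


class Tagged(Square):
    """Inherits Square.area without overriding it."""


def all_subclasses(cls: type) -> list[type]:
    """`cls` and every (transitively) derived class that is currently loaded."""
    out = [cls]
    for sub in cls.__subclasses__():
        out.extend(all_subclasses(sub))
    return out


def cha_targets(declared: type, method: str) -> set[str]:
    """Class hierarchy analysis: every implementation `x.method()` may dispatch to
    when `x` is declared as `declared` (or any subclass of it)."""
    targets = set()
    for cls in all_subclasses(declared):
        impl = getattr(cls, method, None)
        if impl is not None:
            targets.add(impl.__qualname__)
    return targets


print(sorted(cha_targets(Shape, "area")))   # ['Circle.area', 'Shape.area', 'Square.area']
print(sorted(cha_targets(Square, "area")))  # ['Square.area']
```

If only `Square` objects ever reach a given call site, the edges to `Circle.area` and `Shape.area` are false positives: the price of staying sound. Note also that `__subclasses__()` only sees classes that are already loaded, a closed-world assumption that can itself miss targets.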
Python, including integration with package managers • Analyses on top of it: • Can I safely update? • Security vulnerability propagation • Dependency risk profiling • Compliance monitoring • A centralised service to host the graphs and serve the analyses • Getting the tools into the hands of developers
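For "Can I safely update?", one simple starting point is to compare the two releases a client would move between. The sketch below assumes each release is summarized as a map from function ID to a fingerprint of its body (a hypothetical representation, with invented function names) and classifies the dependency functions the client calls directly:

```python
from typing import Mapping


def update_impact(client_calls: set[str],
                  old: Mapping[str, str],
                  new: Mapping[str, str]) -> dict[str, set[str]]:
    """Classify the dependency functions a client calls directly when moving
    from the `old` to the `new` release.

    `old` and `new` map each function ID to a fingerprint of its body
    (hypothetical, e.g. a hash of its normalized code).
    """
    report: dict[str, set[str]] = {"removed": set(), "changed": set(), "unchanged": set()}
    for fn in client_calls:
        if fn not in new:
            report["removed"].add(fn)    # call will fail: function no longer exists
        elif old.get(fn) != new[fn]:
            report["changed"].add(fn)    # behaviour may differ: needs testing
        else:
            report["unchanged"].add(fn)
    return report


old_release = {"read": "h1", "decode": "h2", "legacy_open": "h3"}
new_release = {"read": "h1", "decode": "h9"}
print(update_impact({"read", "decode", "legacy_open"}, old_release, new_release))
```

Only direct calls are checked here; a fuller analysis would also follow the linked call graph to catch changes in functions the client reaches transitively.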
[Figure: platform architecture, "software analytics as streams". Data sources: package repositories (e.g. PyPI), project information, vulnerability information. Analysis layer: call-graph construction, security, compliance, change impact, quality and risk. Storage layer. Access: REST API, web UI, continuous integration server, developer.]
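One way to read "software analytics as streams" is as a chain of analysis stages, each consuming the output of the previous one as new releases arrive. A minimal in-process sketch using chained Python generators; the stage names, the `Release` type, and the data are hypothetical stand-ins for whatever streaming infrastructure the platform actually uses:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical call graphs, standing in for a real call-graph generator.
CALL_GRAPHS = {
    ("libfoo", "1.1"): [("read", "decode")],
    ("libfoo", "1.2"): [("read", "decode"), ("decode", "unsafe_copy")],
}


@dataclass(frozen=True)
class Release:
    package: str
    version: str


def releases() -> Iterator[Release]:
    """Source stage: new releases as they appear in a package repository."""
    for package, version in CALL_GRAPHS:
        yield Release(package, version)


def call_graphs(stream: Iterable[Release]) -> Iterator[tuple[Release, list[tuple[str, str]]]]:
    """Analysis stage 1: attach a call graph to every release."""
    for rel in stream:
        yield rel, CALL_GRAPHS[(rel.package, rel.version)]


def security(stream: Iterable[tuple[Release, list[tuple[str, str]]]],
             vulnerable: set[str]) -> Iterator[dict[str, object]]:
    """Analysis stage 2: flag releases whose call graph calls a vulnerable function."""
    for rel, edges in stream:
        hits = sorted({callee for _, callee in edges if callee in vulnerable})
        yield {"package": rel.package, "version": rel.version, "vulnerable_calls": hits}


# Stages are chained lazily; each result is produced as soon as its release arrives.
for result in security(call_graphs(releases()), vulnerable={"unsafe_copy"}):
    print(result)
```

Further analyses (compliance, change impact, risk) would slot in as additional stages reading the same call-graph stream.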
Horizon 2020 research and innovation programme under grant agreement No 825328. The opinions expressed in this document reflect only the author's view and in no way reflect the European Commission's opinions. The European Commission is not responsible for any use that may be made of the information it contains.