What's in Your Containers? Tracing the Origin of Binaries
Talk Title | What's in Your Containers? Tracing the Origin of Binaries |
Speakers | Philippe Ombredanne (ScanCode toolkit maintainer, ScanCode toolkit and nexB Inc.) |
Conference | Open Source Summit North America |
Conf Tag | |
Location | Los Angeles, CA, United States |
Date | Sep 10-14, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Video | |
We are all building containers every day from base images with possibly questionable pre-built binaries. Why questionable? Because we do not know what is in our own containers. Modern software is routinely assembled from a combination of thousands of open source and vendor-provided packages that we reuse as pre-built binaries (and sometimes build from source). An unknown, buggy, or vulnerable package can easily sneak into such a large quantity of third-party code, most of which is FOSS. Join me to dive into advanced techniques for identifying which known packages are built into ELF binaries, whether libraries or static executables. We will first review some basic approaches to identify distro and application packages using static analyzers (without running a container!) and existing techniques for binary analysis using symbols and content-defined fingerprints with locality-sensitive hashing. We will then review a new approach to determine the origin of the code in binaries, based on shared or unique binary information sets, to build efficient indexes of the minimal signatures needed to identify packages and versions of packages (such as OpenSSL) that may be statically linked in arbitrary binaries. Finally, we will show how this approach can be used for automated detection by subverting anti-virus scanners for known binary identification, and how to relate the collected origin information to actual known vulnerabilities.
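As a rough illustration of the "shared or unique information sets" idea, and not the speaker's actual implementation, the Python sketch below treats the printable strings found in a binary as its information set. It keeps only the strings that occur in exactly one known package version as that version's signature, then looks for those signatures in an arbitrary blob. The package names, the strings, and the helper names (`extract_strings`, `build_unique_index`, `identify`) are all hypothetical and chosen only for the example.

```python
from __future__ import annotations

import re
from collections import defaultdict


def extract_strings(data: bytes, min_len: int = 6) -> set[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {m.decode("ascii") for m in pattern.findall(data)}


def build_unique_index(corpus: dict[str, set[str]]) -> dict[str, str]:
    """Map each feature seen in exactly one package version to that version."""
    owners: dict[str, set[str]] = defaultdict(set)
    for package, features in corpus.items():
        for feature in features:
            owners[feature].add(package)
    # Features shared by several versions cannot tell them apart, so drop them.
    return {f: next(iter(p)) for f, p in owners.items() if len(p) == 1}


def identify(data: bytes, index: dict[str, str], min_hits: int = 2) -> dict[str, int]:
    """Count unique-feature hits per package version found in a binary blob."""
    hits: dict[str, int] = defaultdict(int)
    for s in extract_strings(data):
        if s in index:
            hits[index[s]] += 1
    return {pkg: n for pkg, n in hits.items() if n >= min_hits}


# Hypothetical reference corpus: feature sets from known package builds.
corpus = {
    "libfoo 1.0": {"libfoo version 1.0", "foo_init failed", "foo_legacy_mode"},
    "libfoo 1.1": {"libfoo version 1.1", "foo_init failed", "foo_fast_path"},
    "libbar 2.3": {"libbar version 2.3", "bar_open: bad handle"},
}
index = build_unique_index(corpus)

# Hypothetical binary with parts of libfoo 1.1 statically linked in.
binary = (
    b"\x7fELF\x00\x01"
    + b"foo_init failed\x00"
    + b"libfoo version 1.1\x00"
    + b"foo_fast_path\x00\x02"
)
result = identify(binary, index)
print(result)  # {'libfoo 1.1': 2}
```

The shared string `foo_init failed` is deliberately excluded from the index because it appears in both libfoo versions, so only the two strings unique to libfoo 1.1 count as evidence. A real index would need far richer features than plain strings and a much larger reference corpus.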