November 28, 2019

Code Property Graph: A modern, queryable data storage for source code

Vlad Ionescu and Fabian Yamaguchi outline Code Property Graph (CPG), a unique approach that represents the functional elements of code as an interconnected graph of data and control flows. This lets semantic information about code be stored scalably on distributed graph databases over the web while remaining rapidly accessible.

Talk Title: Code Property Graph: A modern, queryable data storage for source code
Speakers: Vlad A Ionescu (ShiftLeft), Fabian Yamaguchi (ShiftLeft)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018

Modern software and its development process have become tremendously complex. Systems are now built in polyglot environments with many dependencies and large code bases. As a code base grows and new contributors join, developers need an in-depth understanding of the code itself. Vlad Ionescu and Fabian Yamaguchi outline Code Property Graph (CPG), a unique approach that presents code as a queryable collection of data that developers can interact with and ask relevant questions of, much like a search engine.

CPG represents the functional elements of code, such as variables and methods, as an interconnected graph of data and control flows. Think of it like Facebook's graph search, except that the functions and variables are now your friends. This lets semantic information about code be stored scalably on distributed graph databases over the web while remaining rapidly accessible. Storing code in a CPG-based data structure makes it possible to identify associations between functions and data, and to query those associations for known bugs and issues.

This queryable representation of an application's entire code base also lets developers catch severe security and performance regressions before they reach production, and gives them the insight to explore the code and quickly find solutions that would otherwise have cost them a great deal of time. The data stored in large CPGs can also be mined for automated analysis, which surfaces associations between different code segments.
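The abstract describes code as a graph whose nodes are program elements and whose edges carry syntax, control flow, and data flow, which can then be queried for known bug patterns. As a rough illustration only, here is a minimal Python sketch of that idea; it is not ShiftLeft's implementation or API. The `CodePropertyGraph` class, the `data_flows` query, the edge labels (`AST`, `CFG`, `REACHING_DEF`), and the small hand-built graph for a hypothetical `handler` function are all assumptions made for this example. The query asks whether user input from `request.param` can reach `os.system`, a classic command-injection pattern.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    label: str                                  # node type, e.g. "METHOD", "CALL", "IDENTIFIER"
    props: dict = field(default_factory=dict)   # key/value properties, e.g. name, code


class CodePropertyGraph:
    """Hypothetical toy in-memory property graph, not a real CPG backend."""

    def __init__(self):
        self.nodes = {}
        self.edges = []  # list of (src_id, dst_id, edge_label)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)

    def add_edge(self, src, dst, label):
        self.edges.append((src, dst, label))

    def out(self, node_id, label):
        """Successors of node_id along edges carrying the given label."""
        return [d for s, d, l in self.edges if s == node_id and l == label]

    def calls(self, name):
        """All CALL nodes whose 'name' property matches."""
        return [n for n in self.nodes.values()
                if n.label == "CALL" and n.props.get("name") == name]


def data_flows(cpg, source_name, sink_name):
    """Return (source code, sink code) pairs where a sink call is reachable
    from a source call by following data-flow (REACHING_DEF) edges only."""
    sink_ids = {n.id for n in cpg.calls(sink_name)}
    findings = []
    for src in cpg.calls(source_name):
        seen, queue = {src.id}, deque([src.id])  # breadth-first search from the source
        while queue:
            cur = queue.popleft()
            if cur in sink_ids:
                findings.append((src.props["code"], cpg.nodes[cur].props["code"]))
            for nxt in cpg.out(cur, "REACHING_DEF"):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return findings


# Hand-built (hypothetical) graph for this snippet:
#   def handler(request):
#       cmd = request.param("cmd")
#       log(cmd)
#       os.system(cmd)
cpg = CodePropertyGraph()
cpg.add_node(1, "METHOD", name="handler", code="def handler(request)")
cpg.add_node(2, "CALL", name="param", code='request.param("cmd")')
cpg.add_node(3, "IDENTIFIER", name="cmd", code="cmd")
cpg.add_node(4, "CALL", name="log", code="log(cmd)")
cpg.add_node(5, "CALL", name="system", code="os.system(cmd)")

for child in (2, 4, 5):               # syntax (simplified): the method contains the calls
    cpg.add_edge(1, child, "AST")
cpg.add_edge(2, 4, "CFG")             # control flow: statement order
cpg.add_edge(4, 5, "CFG")
cpg.add_edge(2, 3, "REACHING_DEF")    # data flow: param() result -> cmd
cpg.add_edge(3, 4, "REACHING_DEF")    # cmd -> log(cmd)
cpg.add_edge(3, 5, "REACHING_DEF")    # cmd -> os.system(cmd)

print(data_flows(cpg, "param", "system"))
```

Following only `REACHING_DEF` edges is the design choice that matters here: `log(cmd)` precedes `os.system(cmd)` in control flow, yet `data_flows(cpg, "log", "system")` returns an empty list because no value flows from one call to the other. A real CPG would be generated by a language frontend and stored in a graph database rather than built by hand in memory.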
