February 3, 2020

How machines learn to code: Machine learning on source code

Thomas Endres and Samuel Hopstock demonstrate how to apply machine learning techniques to a program's source code. They cover the problems you may encounter, how to obtain enough relevant training data, how to encode source code as a feature vector so that it can be processed mathematically, which machine learning algorithms to use, and more.

Talk Title: How machines learn to code: Machine learning on source code
Speakers: Thomas Endres (TNG), Samuel Hopstock (TNG Technology Consulting)
Conference: Artificial Intelligence Conference
Conf Tag: Put AI to Work
Location: London, United Kingdom
Date: October 9-11, 2018
URL: Talk Page
Slides: Talk Slides

Machine learning on source code is a new area of research in artificial intelligence that, unlike classical problems such as image segmentation, does not yet have established standard techniques. For instance, there are standard methods for processing images that make machine learning algorithms take their two-dimensional structure into account. There are currently no comparable common techniques for encoding the semantic structure of source code, so new ways of representing a project's code mathematically are needed. The technology has a variety of possible applications, for example in static code analysis or in the automatic selection of relevant test cases.

Thomas Endres and Samuel Hopstock share methods for transferring classic machine learning approaches to this new field. Along the way, they detail approaches for both automatic and manual generation of training data and offer an overview of suitable models and machine learning frameworks for this challenge. They conclude by exploring the possibilities of using such models for the analysis of code.
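To make the idea of encoding source code as a feature vector more concrete, here is a minimal sketch. It is not the speakers' method: it assumes Python as the analyzed language, uses the standard library's `ast` module to parse the code, and simply counts how often a few syntax node types occur. The `NODE_TYPES` vocabulary, the `encode_source` function, and the `SAMPLE` snippet are hypothetical names chosen for illustration.

```python
import ast
from collections import Counter
from typing import List

# Hypothetical, hand-picked vocabulary of AST node types (an assumption for
# this sketch). A real system would likely derive it from the training data.
NODE_TYPES = ["FunctionDef", "If", "For", "While", "Call", "Return", "Assign", "BinOp"]


def encode_source(source: str) -> List[int]:
    """Return one count per entry in NODE_TYPES for the given Python source."""
    tree = ast.parse(source)
    # ast.walk visits every node in the syntax tree, in no guaranteed order.
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    return [counts[name] for name in NODE_TYPES]


# Hypothetical input snippet used only to demonstrate the encoding.
SAMPLE = '''
def total(values):
    result = 0
    for v in values:
        if v > 0:
            result = result + v
    return result
'''

if __name__ == "__main__":
    # Print each feature name next to its count.
    for name, value in zip(NODE_TYPES, encode_source(SAMPLE)):
        print(name, value)
```

A count vector like this discards ordering and nesting, which is exactly the semantic structure the abstract says is hard to capture. It is still a fixed-length numeric representation that classic machine learning algorithms can consume, which makes it a reasonable baseline to compare richer encodings against.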
