End-to-end learning for autonomous driving
Urs Muller presents the architecture and training methods used to build an autonomous road-following system. A key aspect of the approach is eliminating the need for hand-programmed rules and procedures such as finding lane markings, guardrails, or other cars, thereby avoiding the creation of a large number of “if, then, else” statements.
| Talk Title | End-to-end learning for autonomous driving |
|------------|--------------------------------------------|
| Speakers | Urs Muller (NVIDIA) |
| Conference | O’Reilly Artificial Intelligence Conference |
| Conf Tag | |
| Location | New York, New York |
| Date | September 26-27, 2016 |
| URL | Talk Page |
| Slides | Talk Slides |
| Video | |
Urs Muller presents the architecture and training methods used to build an autonomous road-following system. A key aspect of the approach is eliminating the need for hand-programmed rules and procedures such as finding lane markings, guardrails, or other cars, thereby avoiding the creation of a large number of “if, then, else” statements.

Urs and his team trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. Training ran on an NVIDIA DevBox with Torch 7; on the car, an NVIDIA DRIVE PX self-driving car computer, also running Torch 7, determines where to drive at 30 frames per second (FPS).

This end-to-end approach proved surprisingly powerful. With minimal training data from humans, the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance, such as parking lots and unpaved roads. With only the human steering angle as the training signal, the system automatically learns internal representations of the necessary processing steps, such as detecting useful road features; it was never explicitly trained to detect, for example, the outline of a road.

Compared to an explicit decomposition of the problem into lane-marking detection, path planning, and control, this end-to-end system optimizes all processing steps simultaneously. This should eventually lead to better performance, because the internal components self-optimize to maximize overall system performance rather than human-selected intermediate criteria (e.g., lane detection), and to smaller systems, because the network learns to solve the problem with a minimal number of processing steps.
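The pixels-to-steering idea can be sketched in code. The following is a minimal, hypothetical sketch in PyTorch (the talk used Torch 7; modern PyTorch is used here only for illustration). The layer sizes follow NVIDIA's published PilotNet description (five convolutional layers feeding four fully connected layers, one scalar output); the exact network presented in the talk may differ, and the class name, input resolution, and training snippet are assumptions, not the speaker's code.

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Hypothetical PilotNet-style CNN: raw camera pixels in, one steering value out.

    Layer sizes follow NVIDIA's published PilotNet description; the talk's
    exact architecture may differ.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),  # 1x18 feature map at 66x200 input
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),  # single output: the steering command
        )

    def forward(self, x):
        return self.regressor(self.features(x))

model = SteeringNet()
frame = torch.randn(1, 3, 66, 200)  # one 66x200 camera frame (assumed resolution)
steering = model(frame)
print(steering.shape)  # torch.Size([1, 1])

# Training signal is only the human driver's steering angle: a plain
# regression loss against the recorded command, no intermediate labels.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
human_angle = torch.zeros(1, 1)  # placeholder recorded steering angle
loss = nn.functional.mse_loss(model(frame), human_angle)
loss.backward()
optimizer.step()
```

Note that nothing in the loss mentions lanes, guardrails, or other cars; any such detectors emerge internally only insofar as they help predict the human steering angle, which is the point of the end-to-end formulation.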