A DGA Odyssey: Passive DNS Driven DGA Analysis
Talk Title | A DGA Odyssey: Passive DNS Driven DGA Analysis |
Speakers | Yiming Gong, Qitian Su, Zaifeng Zhang |
Conference | NANOG71 |
Location | San Jose, CA |
Date | Oct 2 2017 - Oct 4 2017 |
Domain Generation Algorithm (DGA) techniques are commonly used by bot-masters to evade detection: they dynamically produce large numbers of seemingly random domain names, only a few of which are selected as command and control (C&C) domains. This makes detection considerably more difficult. To block DGA domains, we need to find them early and identify them effectively. In this talk, we will share our DGA tracking experience based on passive DNS (PDNS) and malware sandbox databases. Starting from billions of PDNS records, we first extract highly suspicious DGA domains with a clustering algorithm. Then, to identify the generation algorithms and seeds behind these domains, we use malware sandbox data to locate the corresponding malware samples. In the end, we have identified 36 families. Their corresponding DGA domain feeds can be freely accessed from http://data.netlab.360.com/dga. Our talk is divided into four parts:
- The DGA families we detected: In this part, we will show the 36 DGA botnet families we track, as well as their activity statistics in China.
- From PDNS records to suspicious DGA domains: Identifying millions of suspicious active DGA domains among billions of passive DNS records is a great challenge. To address this problem, we use our Long Tail Cluster Algorithms (LTCA) to extract highly suspicious DGA domains. In this part, we will introduce the details of data cleaning and aggregation.
- From suspicious DGA domains to malware samples: Once we discover highly suspicious DGA domains, we need to locate the corresponding malware samples to identify the generation algorithm and seed behind them. We use malware sandbox data to bridge this gap. In this part, we will introduce our techniques, such as time window difference.
- Practical blocking experience: Due to the complexity of DNS, this DGA blacklist may still suffer from both false positives and false negatives in practice. Here we will introduce some practical cases we encountered and how we solved them.
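To make the idea of "a generation algorithm plus a seed" concrete, here is a minimal, hypothetical sketch of a date-seeded DGA in Python. It does not reproduce any of the 36 tracked families; the function name `toy_dga`, the SHA-256 mixing, the 12-letter labels, and the `.com` suffix are all illustrative assumptions.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 10, tld: str = ".com") -> list[str]:
    """Return `count` pseudo-random domain names derived from `seed` and `day`.

    Hypothetical toy algorithm for illustration; not any real malware family.
    """
    domains = []
    for i in range(count):
        # Combine seed, date, and counter so each day yields a fresh, reproducible list.
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).digest()
        # Map the first 12 digest bytes onto lowercase letters a-z.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2017, 10, 2), count=3):
        print(name)
```

Because the output is fully determined by the seed and the date, a defender who has recovered both can enumerate upcoming domains in advance and block them before they show up in DNS traffic.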
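The LTCA itself is not described in this abstract, so the following is only a simple, generic stand-in for the extraction step, not the authors' algorithm. It assumes PDNS records reduced to hypothetical `(domain, rcode)` pairs, keeps NXDOMAIN answers whose second-level label has high Shannon entropy, and groups them by suffix and label length, on the assumption that one family's generated labels often share both. The thresholds are arbitrary placeholders.

```python
import math
from collections import Counter, defaultdict
from collections.abc import Iterable


def label_entropy(label: str) -> float:
    """Shannon entropy of a domain label, in bits per character."""
    counts = Counter(label)
    n = len(label)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def candidate_groups(
    records: Iterable[tuple[str, str]],
    min_entropy: float = 3.0,
    min_group_size: int = 3,
) -> dict[tuple[str, int], set[str]]:
    """Group high-entropy NXDOMAIN names by (suffix, label length).

    Thresholds are hypothetical defaults, not values from the talk.
    """
    groups: dict[tuple[str, int], set[str]] = defaultdict(set)
    for domain, rcode in records:
        if rcode != "NXDOMAIN":
            continue
        parts = domain.lower().rstrip(".").split(".")
        if len(parts) < 2:
            continue
        # Naive split: treats only the last label as the suffix (ignores e.g. co.uk).
        label, tld = parts[-2], parts[-1]
        if label_entropy(label) >= min_entropy:
            groups[(tld, len(label))].add(domain.lower())
    # Keep only groups large enough to look like a generated batch.
    return {key: names for key, names in groups.items() if len(names) >= min_group_size}
```

A filter this simple also flags legitimate high-entropy names, such as some CDN hostnames, which is one way the false positives discussed in the blocking part can arise.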
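The abstract names time window difference but does not detail it. As a hypothetical illustration of bridging suspicious domains to malware samples, the sketch below ranks sandbox samples by the Jaccard similarity between a suspicious domain cluster and the domains each sample queried during its run. It is not the time window difference technique; the function names, the input layout, and the 0.1 threshold are assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_samples(
    cluster: set[str],
    sandbox_queries: dict[str, set[str]],
    min_score: float = 0.1,
) -> list[tuple[str, float]]:
    """Rank sandbox samples by overlap with a suspicious domain cluster.

    `sandbox_queries` maps a sample ID (e.g. a file hash) to the set of domains
    the sample looked up during its sandbox run. Layout and threshold are
    hypothetical.
    """
    scored = [(sample_id, jaccard(cluster, queried))
              for sample_id, queried in sandbox_queries.items()]
    # Drop weak matches, then list the strongest overlaps first.
    kept = [(sample_id, score) for sample_id, score in scored if score >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Because a date-seeded DGA emits a different list each day, an overlap score like this is only meaningful when the sandbox run and the PDNS observations cover comparable dates; a sample executed on another day may share few or no names with the cluster even if it belongs to the same family.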