Real-time fraud detection using process mining with Spark Streaming
If you consider user click paths a process, you can apply process mining. Process mining models users based on their actual behavior, which allows us to compare new clicks with modeled behavior and report any inconsistencies. Bolke de Bruin and Hylke Hendriksen explain how ING implemented process mining on Spark Streaming, enabling real-time fraud detection.
Talk Title | Real-time fraud detection using process mining with Spark Streaming |
Speakers | Bolke de Bruin (ING), Hylke Hendriksen (ING) |
Conference | Strata + Hadoop World |
Conf Tag | Big Data Expo |
Location | San Jose, California |
Date | March 29-31, 2016 |
How can we detect fraudulent behavior on our websites, at any scale and in real time? Bolke de Bruin and Hylke Hendriksen explain how, by treating a user's click path as a process being followed, ING applied a technique called process mining and adapted it to Spark Streaming, resulting in near real-time fraud detection and analysis.

Process mining is a field of research that focuses on automatically extracting, modeling, and using the knowledge contained in event logs produced by information systems. (In ING's case, the event logs are click logs.) Because the model is built from generated event logs, it captures the process as it occurred in real life, which is not necessarily the process as it was designed. As a result, the modeled process can be used to find the differences between the actual and the designed process, and so to detect problems, anomalies, and potential fraud.

Process mining had never before been used on a distributed stream computing platform. Adapting it to Spark Streaming meant selecting the most applicable process-mining algorithm, making sure it was fully incremental, understanding exactly how it operates, and using the functionality of Spark Streaming while adhering to the fundamentals of process mining. The implementation can handle all process mining use cases and is usable in other fields within the company.

The end result is a real-time, distributed fraud detection technique that applies to any process represented by an event log. It can be used, for example, for early warning in automated processes or for finding errors or misuse in processes such as loan application handling.
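To make the idea of comparing new clicks against modeled behavior concrete, here is a minimal sketch in plain Python. It is not ING's implementation, and the abstract does not name the algorithm they chose; this example assumes a simple directly-follows model (counts of which page follows which) that is updated one micro-batch at a time, similar in spirit to how a stateful streaming job would fold each batch into its state. The names `update_model`, `find_deviations`, `START`, and the sample page names are all hypothetical.

```python
from collections import defaultdict

START = "<start>"  # hypothetical marker for the first click of a session


def update_model(counts, last_page, batch):
    """Fold one micro-batch of (session_id, page) events into the model.

    counts[(a, b)] is how often page b directly followed page a.
    last_page remembers each session's most recent page between batches,
    which is what makes the update incremental.
    """
    for session_id, page in batch:
        prev = last_page.get(session_id, START)
        counts[(prev, page)] += 1
        last_page[session_id] = page


def find_deviations(counts, last_page, batch, min_count=1):
    """Return (session_id, prev, page) for transitions the model has
    seen fewer than min_count times. The model itself is not modified."""
    deviations = []
    for session_id, page in batch:
        prev = last_page.get(session_id, START)
        if counts.get((prev, page), 0) < min_count:
            deviations.append((session_id, prev, page))
        last_page[session_id] = page
    return deviations


# Build the model from two micro-batches of (assumed) normal traffic.
model = defaultdict(int)
training_state = {}
update_model(model, training_state,
             [("s1", "login"), ("s1", "overview"), ("s2", "login")])
update_model(model, training_state,
             [("s1", "transfer"), ("s2", "overview"), ("s2", "transfer")])

# Score a new session that jumps from login straight to transfer.
live_state = {}
alerts = find_deviations(model, live_state,
                         [("s9", "login"), ("s9", "transfer")])
print(alerts)
```

In this sketch a transition is reported only because it never appeared in the observed log; a real deployment would need a more careful notion of conformance and a policy for when deviations count as suspicious.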