Performance anomaly detection at scale (sponsored by Salesforce)
Automated anomaly detection in production using simple data science techniques enables you to identify issues more quickly and reduce the time it takes to get customers out of an outage. Tuli Nivas shows how applying simple statistics can change how performance data is viewed and make it easy to identify issues in production effectively.
|Talk Title|Performance anomaly detection at scale (sponsored by Salesforce)|
|---|---|
|Speakers|Tuli Nivas (Salesforce)|
|Conf Tag|Build resilient systems at scale|
|Location|New York, New York|
|Date|September 20-22, 2016|
As performance engineers, we understand the importance of software testing during and after development in order to identify performance bottlenecks. Due to various constraints—whether a scaled-down test environment, limited data volume, or code integration limitations—it’s not always possible to catch every bug in test. If performance bottlenecks are not identified and resolved in a timely manner, customers may be impacted. As a result, anomaly detection in production takes on an even bigger significance.

The scale at which this kind of anomaly detection needs to be done is noteworthy—a few servers in test versus thousands of servers in production, with time being of the essence. That’s why anomaly detection at scale is one of the biggest challenges for a performance engineer. One of the most widely used techniques for identifying performance bugs is to look at time series data for the various metrics and use it to find potential problems. However, this approach doesn’t scale well in production, even if the time series data can be consolidated into a few charts.

Tuli Nivas shares techniques that address how time consuming this kind of analysis can be and demonstrates how applying simple statistics and basic linear regression principles can improve the productivity of a performance engineer tenfold or more. This session is sponsored by Salesforce.