January 29, 2020

The Observatorium: Combining Machine Learning and Observability to Improve Incident Response

Talk Title: The Observatorium: Combining Machine Learning and Observability to Improve Incident Response
Speakers: Alex Kass (Engineering Manager, DigitalOcean)
Conference: Open Source Summit + ELC Europe
Location: Lyon, France
Date: Oct 27-Nov 1, 2019

At DigitalOcean, a global hosting company predicated on providing building blocks for developers, the proliferation of microservices necessary to support a worldwide cloud creates a unique-yet-universal conundrum: while the internal code is decidedly custom to DO, the incidents that arise are common to many companies. In the Observability group, open source tools like Prometheus, Kafka, and Spark play critical roles feeding data into a central application called The Observatorium, whose primary goal is to reduce MTTD/R by curating information intelligently. Combining distributed platform data engineering and predictive machine learning, all through open source tools, the team surfaces signals essential to first responders to help improve detection times and reduce service downtime. In this talk, the speaker will describe in detail the architecture of The Observatorium, and how its creative amalgamation of OSS tools has measurably improved the company’s overall reliability.
