Apache Spark and machine learning on microservices
Hadoop-based data platforms that power ETL jobs and machine learning pipelines are great examples of monolithic architectures that could be redesigned with microservices. Stepan Pushkarev walks you through building and deploying data processing, reporting services, training, and prediction pipelines as decoupled microservices connected with the rest of the enterprise architecture.
Talk Title | Apache Spark and machine learning on microservices |
Speakers | Stepan Pushkarev (hydrosphere.io) |
Conference | O’Reilly Software Architecture Conference |
Conference tagline | Engineering the Future of Software |
Location | London, United Kingdom |
Date | October 16-18, 2017 |
URL | Talk Page |
Slides | Talk Slides |
Data scientists often find it challenging to build a clean REST API, while web developers find machine learning internals almost impossible to understand. Big data engineers, meanwhile, tend to use clunky Hadoop distributions with dozens of tightly coupled tools and then carry that design forward, writing data processing scripts that communicate through unmanageable state and shared flags. Hydrosphere.io helps data scientists and big data engineers plug into the modern reactive and microservices architectures that traditional web and enterprise teams have already adopted. Topics include: