December 8, 2019

Improving computer vision models at scale

Talk Title	Improving computer vision models at scale
Speakers	Marton Balassi (Cloudera), Mirko Kämpf (Cloudera), Jan Kunigk (Cloudera)
Conference	Strata Data Conference
Conf Tag	Making Data Work
Location	London, United Kingdom
Date	May 22-24, 2018

Rigorous improvement of an image recognition model often requires multiple iterations of eyeballing outliers, inspecting statistics of the output labels, and then modifying and retraining the model. When the test data reaches petabyte scale, simply retrieving all the images that have been assigned a specific label becomes a technical challenge in its own right. Marton Balassi, Mirko Kämpf, and Jan Kunigk share a solution that automates running the model on the test data and populating an index of the resulting labels so that they become searchable. Images and labels are stored in HBase, the model is encapsulated in a PySpark program, and the images are indexed with Solr and can be accessed from a Hue dashboard.
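
To make the data flow concrete, here is a minimal in-memory sketch of the labeling-and-indexing pass, written in plain Python. It assumes a hypothetical `classify` function standing in for the trained model; the `store` and `index` dictionaries are placeholders for the HBase table and the Solr index, and the function names (`label_and_index`, `search`, `label_stats`) are illustrative, not the speakers' code.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical model interface: takes raw image bytes, returns predicted labels.
Classifier = Callable[[bytes], List[str]]


def label_and_index(
    images: Iterable[Tuple[str, bytes]], classify: Classifier
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Run the model on every test image and record the results twice:
    a per-image label store (standing in for the HBase table) and an
    inverted index from label to image ids (standing in for Solr)."""
    store: Dict[str, List[str]] = {}
    index: Dict[str, List[str]] = defaultdict(list)
    for image_id, data in images:
        labels = classify(data)
        store[image_id] = labels
        # Deduplicate while preserving order so an image is indexed once per label.
        for label in dict.fromkeys(labels):
            index[label].append(image_id)
    return store, dict(index)


def search(index: Dict[str, List[str]], label: str) -> List[str]:
    """Return the ids of all images that were assigned `label`."""
    return index.get(label, [])


def label_stats(store: Dict[str, List[str]]) -> Counter:
    """Count how many images received each label, to help spot skew or outliers."""
    return Counter(label for labels in store.values() for label in dict.fromkeys(labels))


# Usage with a toy classifier (purely illustrative, not a real model).
def toy_classify(data: bytes) -> List[str]:
    return ["cat"] if data.startswith(b"cat") else ["dog"]


images = [("img-001", b"cat-pixels"), ("img-002", b"dog-pixels"), ("img-003", b"cat-pixels")]
store, index = label_and_index(images, toy_classify)
print(search(index, "cat"))              # ['img-001', 'img-003']
print(label_stats(store).most_common())  # [('cat', 2), ('dog', 1)]
```

In the architecture described in the talk, the loop would run as a distributed PySpark job over the test set, and the results would be written to HBase and Solr rather than to Python dictionaries; this sketch only shows the shape of the computation, with the inverted index being what makes "all images with label X" a cheap lookup instead of a full scan.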

spark computer vision dashboard