[TALK]@Telematika

dataset

Evolution of a modern cloud-based data lake

March 3, 2020

Building a data lake is a hard task. You have to centralize all of the company's data in one place, make it easily accessible, and get governance right. And, last but not least, the cost has to stay reasonable. Together, these requirements make for quite a challenge. Viacheslav Inozemtsev outlines the experience of building Zalando's data lake.

Getting Started with Modern Time Series Database and Grafana

March 3, 2020

Telemetry and monitoring, also known as observability, has been a hot topic in the software industry for a few years. With so many moving pieces, the mindse …

Keynote: Trusted AI: Bringing Trust Back into AI through Open Source

February 27, 2020

As businesses move beyond experimentation to full-blown AI projects across the enterprise, they are recognizing that there's more to successful implementations than simply having the right datasets, A …

A novel solution for a data augmentation and bias problem in NLP using TensorFlow

February 24, 2020

Join KC Tung to discover how to use TensorFlow to solve a natural language processing (NLP) model bias problem with data augmentation for an enterprise customer (one of the largest airlines in the world). KC leveraged hidden gems in tf.data and the new API to find a novel use for text generation, and found that it surprisingly improved his NLP model.

Effective sampling methods within TensorFlow input functions

February 24, 2020

Many real-world machine learning applications require generative or reductive sampling of data. Laxmi Prajapat and William Fletcher demonstrate sampling techniques applied to training and testing data directly inside the input function using the tf.data API.
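The abstract doesn't say which sampling techniques the talk covers. One hedged illustration of rebalancing classes in an input pipeline is rejection resampling. Each example is kept with a class-dependent acceptance probability, chosen so that the kept stream follows a target class distribution. tf.data ships a transformation built on the same idea (`tf.data.experimental.rejection_resample`). The plain-Python `rejection_resample` below is a hypothetical stand-in written for illustration. It is neither that API nor the speakers' code.

```python
import random
from collections import Counter

def rejection_resample(examples, label_fn, target_dist, seed=0):
    """Hypothetical helper: keep each example with a class-dependent
    probability so the kept examples follow target_dist (class -> prob)."""
    counts = Counter(label_fn(x) for x in examples)
    total = sum(counts.values())
    initial = {c: n / total for c, n in counts.items()}
    # Ratio of desired to observed frequency, rescaled so the largest is 1.0
    # (the most under-represented class is always kept).
    ratios = {c: target_dist.get(c, 0.0) / initial[c] for c in initial}
    max_ratio = max(ratios.values())
    accept = {c: r / max_ratio for c, r in ratios.items()}
    rng = random.Random(seed)
    return [x for x in examples if rng.random() < accept[label_fn(x)]]

# Toy, imbalanced data: 90% "neg", 10% "pos"; target is a 50/50 split.
data = [("neg", i) for i in range(9000)] + [("pos", i) for i in range(1000)]
balanced = rejection_resample(data, lambda x: x[0], {"neg": 0.5, "pos": 0.5})
print(Counter(label for label, _ in balanced))
```

The method rejects majority-class examples instead of duplicating minority ones, so it never repeats data. The trade-off is that it throws away some of the majority class.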

Generative malware outbreak detection

February 23, 2020

Practical defense systems require precise detection during malware outbreaks, when only a handful of samples are available. Sean Park demonstrates how to detect in-the-wild malware samples from a single training sample of each kind, using TensorFlow's flexible architecture to implement a novel variable-length generative adversarial autoencoder.

Open Source Tools for ML Experiments Management

February 23, 2020

The rise of new AI and ML requires new workflows and new tools: data versioning, ML pipeline versioning, experiment metrics visualization, and others that have not been formalized or even named yet. T …

Anomaly detection using deep learning to measure the quality of large datasets

February 22, 2020

Any business, big or small, depends on analytics, whether for revenue generation, churn reduction, sales, or marketing. No matter the algorithm and techniques used, the result depends on the accuracy and consistency of the data being processed. Sridhar Alla examines techniques for evaluating data quality and detecting anomalies in the data.

Audience projection of target consumers over multiple domains: A NER and Bayesian approach

February 21, 2020

AI-powered market research relies on indirect approaches based on sparse and implicit consumer feedback (e.g., social network interactions, web browsing, or online purchases). These approaches are more scalable, authentic, and suitable for real-time consumer insights. Gianmario Spacagna proposes a novel audience projection algorithm that provides consumer insights over multiple domains.

Architecting a data analytics service both in the public cloud and in the on-premise private cloud: ETL, BI, and machine learning (sponsored by SK Holdings)

February 16, 2020

Jungwook Seo walks you through AccuInsight+, a cloud data analytics platform offering eight data analytics services on CloudZ (one of the biggest cloud service providers in Korea), which SK Holdings announced in January 2019.

Learning with limited labeled data

February 12, 2020

Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. But this could be avoided if machines could learn from a few labeled examples. Shioulin Sam explores and demonstrates an algorithmic solution that relies on collaboration between human and machine to label smartly, and she outlines product possibilities.
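The abstract doesn't name the method. One common form of human–machine labeling collaboration is active learning: the model asks a human to label the examples it is least sure about. Assuming that setup, least-confidence uncertainty sampling takes only a few lines. `least_confident` is a hypothetical helper. `probabilities` stands for any classifier's predicted class probabilities on the unlabeled pool.

```python
def least_confident(probabilities, k):
    """Hypothetical helper: return the indices of the k unlabeled examples whose
    highest predicted class probability is lowest (the model is least sure)."""
    confidence = [max(p) for p in probabilities]
    return sorted(range(len(probabilities)), key=lambda i: confidence[i])[:k]

# Toy predicted probabilities for four unlabeled examples (two classes).
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30], [0.50, 0.50]]
to_label = least_confident(pool_probs, k=2)
print(to_label)  # these indices would be sent to a human annotator
```

In a full loop, the newly labeled examples are added to the training set, the model is retrained, and the pool is scored again.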

Sketching data and other magic tricks

February 10, 2020

Go hands-on with Sophie Watson and William Benton to examine data structures that let you answer interesting queries about massive datasets in fixed amounts of space and constant time. This seems like magic, but they'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.
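A count-min sketch is one classic structure of this kind, though the abstract doesn't list which ones the talk covers. It estimates how often each item occurs in a stream using a fixed-size table of counters. Each update or query touches exactly `depth` counters, no matter how much data has been seen. Below is a minimal Python sketch. The width, depth, and SHA-256-based hashing are illustrative choices, not taken from the talk.

```python
import hashlib

class CountMinSketch:
    """Approximate item frequencies in fixed memory (depth x width counters).
    Estimates never undercount; hash collisions can make them overcount."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One hash per row, derived by salting the item with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Each row's counter is an upper bound; the minimum is the tightest one.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch(width=200, depth=4)
words = "the quick brown fox jumps over the lazy dog the end".split()
for w in words:
    cms.add(w)
print(cms.estimate("the"))  # at least 3, the true count
```

Memory is fixed at `width * depth` counters regardless of how many distinct items arrive. A wider table lowers the expected overcount, and more rows lower the chance of a large one.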

Working with time series: Denoising and imputation frameworks to improve data density

February 8, 2020

The application of smoothing and imputation strategies is common practice in predictive modeling and time series analysis. With a technique-agnostic approach, Anjali Samani provides qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density.
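As a concrete but hedged example of the two families of technique, here are two plain-Python helpers. The first imputes missing points by linear interpolation, and the second smooths with a centered moving average. These are generic illustrations, not the talk's frameworks. `None` marks a missing observation, and both helper names are hypothetical.

```python
def interpolate_missing(values):
    """Fill None gaps by linear interpolation between the nearest observed
    neighbours; leading/trailing gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled

def moving_average(values, window=3):
    """Centered moving average (odd window); the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

series = [1.0, None, 3.0, None, None, 6.0]
dense = interpolate_missing(series)
print(dense)
print(moving_average(dense, window=3))
```

Imputation runs first so that the smoother sees a complete series. Each choice, such as interpolation type or window size, trades fidelity to the raw data against density and noise.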

Your easy move to serverless computing and radically simplified data processing

February 7, 2020

Most analytic flows can benefit from serverless, from simple cases to complex data preparation for AI frameworks like TensorFlow. To address the challenge of integrating serverless easily without major disruption to your system, Gil Vernik explores the push to the cloud experience, which dramatically simplifies serverless for big data processing frameworks.

Machine Learning Models and Datasets Versioning Practices and Tools

February 5, 2020

The rise of AI and ML changes development workflows and requires new development tools: data versioning, ML pipeline versioning, experiment metrics tracking, and others that have not been formalized an …

Lightning Talk: A Perfect Match: AI4EU and Acumos for Europe

February 4, 2020

This presentation will make the link between the Linux Foundation's Acumos project and the AI4EU project, which started at the beginning of 2019 and will run for 3 years. At the beginning of 2020 we w …

Long-term real-time network traffic flow prediction using LSTM recurrent neural network

February 4, 2020

Real-time traffic volume prediction is vital in proactive network management, and many forecasting models have been proposed to address this. However, most are unable to fully use the information in traffic data to generate efficient and accurate traffic predictions for a longer term. Wei Cai explores predicting multistep, real-time traffic volume using many-to-one LSTM and many-to-many LSTM.
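Both LSTM variants are trained on sliding windows cut from the traffic series. What differs is the target. The sketch below shows that data preparation, assuming a 1-D list of traffic volumes and a hypothetical `make_windows` helper. In the many-to-one case, each window of `n_in` past values maps to the single value `n_out` steps ahead. In the many-to-many case, it maps to all of the next `n_out` values. Other many-to-one conventions exist, such as predicting one step and feeding predictions back in. The talk's exact setup isn't stated.

```python
def make_windows(series, n_in, n_out, many_to_many=False):
    """Hypothetical helper: slice a 1-D series into (input, target) pairs.

    many-to-one:  n_in past values -> the single value n_out steps ahead
    many-to-many: n_in past values -> the next n_out values
    """
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        future = series[start + n_in:start + n_in + n_out]
        y = future if many_to_many else future[-1]
        pairs.append((x, y))
    return pairs

traffic = [1, 2, 3, 4, 5, 6]  # toy volumes
print(make_windows(traffic, n_in=3, n_out=2))
print(make_windows(traffic, n_in=3, n_out=2, many_to_many=True))
```

These pairs would then be reshaped to `(samples, timesteps, features)` before being fed to an LSTM layer.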

Putting cutting-edge modern NLP into practice

February 3, 2020

AllenNLP is a PyTorch-based library designed to make it easy to do high-quality research in natural language processing (NLP). Joel Grus explains what modern neural NLP looks like; you'll get your hands dirty training some models, writing some code, and learning how you can apply these techniques to your own datasets and problems.

Scaling AI at Cerebras

February 3, 2020

Long training times are the single biggest factor slowing down innovation in deep learning. Today's common approach of scaling large workloads out over many small processors is inefficient and requires extensive model tuning. Urs Köster explains why, as model and dataset sizes grow, new ideas are needed to reduce training times.

Building machine learning inference pipelines at scale

January 31, 2020

Real-life ML workloads require more than training and predicting: data often needs to be preprocessed and postprocessed. Developers and data scientists have to train and deploy a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to build machine learning inference pipelines using open source libraries and how to scale them on AWS.

Managing machines

January 27, 2020

Machine learning (ML) drove massive growth at consumer internet companies over the last decade, enabled by open software, datasets, and AI research. For many problems, ML will produce better, faster, and more repeatable decisions at scale. Unfortunately, building and maintaining these systems is difficult and expensive. Pete Skomoroch explores what you need to produce better ML results.

Removing unfair bias in machine learning using open source (sponsored by IBM)

January 25, 2020

ML models are increasingly used to make decisions that impact lives. Ana Echeverri and Trisha Mahoney walk you through AI Fairness 360, a comprehensive open source Python toolkit developed by IBM researchers. It provides metrics to check for unwanted bias in datasets and machine learning models, along with state-of-the-art algorithms to mitigate such bias.
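To make "metrics to check for unwanted bias" concrete, consider two widely used group-fairness metrics, disparate impact and statistical parity difference. Both are available in AI Fairness 360, and both compare favorable-outcome rates between an unprivileged and a privileged group. The hand-rolled version below is for illustration only. It is not the AIF360 API, and the outcomes and group labels are toy values.

```python
def favorable_rate(outcomes, groups, group):
    """Share of favorable (1) outcomes among members of `group`."""
    selected = [o for o, g in zip(outcomes, groups) if g == group]
    return sum(selected) / len(selected)

def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Ratio of favorable rates; 1.0 is parity, and values below 0.8 are
    commonly flagged under the 'four-fifths rule'."""
    return (favorable_rate(outcomes, groups, unprivileged)
            / favorable_rate(outcomes, groups, privileged))

def statistical_parity_difference(outcomes, groups, unprivileged, privileged):
    """Difference of favorable rates; 0.0 is parity."""
    return (favorable_rate(outcomes, groups, unprivileged)
            - favorable_rate(outcomes, groups, privileged))

# Toy predictions (1 = favorable) for two groups, "a" (unprivileged) and "b".
preds = [1, 0, 1, 0, 1, 1, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(disparate_impact(preds, group, "a", "b"))
print(statistical_parity_difference(preds, group, "a", "b"))
```

The ratio form is scale-free, while the difference form reads directly as percentage points. Toolkits typically report both.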

Walking on Hexagons: Unification of Urban Data with H3

January 24, 2020

We are proposing the use of H3, as a standard hexagonal discrete global grid system, to index data for analysis. When it comes to big data, H3 is uniquely suited to this analysis because of its hexago …

Build your own data lake with AWS Glue and Amazon Athena (sponsored by Amazon Web Services)

January 14, 2020

Damon Cortesi demonstrates how to use AWS Glue and Amazon Athena to implement an end-to-end pipeline.

Herding elephants: Seamless data access in multicluster clouds

January 10, 2020

Travel platform Expedia Group likes to give its data teams flexibility and autonomy to work with different technologies. However, this approach generates challenges that cannot be solved by existing tools. Pradeep Bhadani and Elliot West explain how the company built a unified virtual data lake on top of its many heterogeneous and distributed data platforms.

Learning "learning to rank"

January 9, 2020

Identifying relevant documents quickly and efficiently enhances both user experience and business revenue every day. Sophie Watson demonstrates how to implement learning-to-rank algorithms and provides you with the information you need to implement your own successful ranking system.
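Whatever model produces the ranking, learning-to-rank systems are usually judged with a rank-aware metric. Normalized discounted cumulative gain (NDCG) is a common choice. The abstract doesn't say which metric the talk uses, so treat this as an assumption. Below is a minimal Python version with linear gain (some formulations use 2^rel - 1 instead), where `relevances` holds the graded relevance of each returned document in ranked order.

```python
import math

def dcg_at_k(relevances, k):
    # Each document's relevance is discounted by log2(position + 1), positions 1-based.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 1, 0], k=4))  # already ideal
print(ndcg_at_k([0, 1, 2, 3], k=4))  # reversed, so lower
```

Because the discount shrinks with position, putting a highly relevant document near the top matters far more than reordering items further down.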

Mastering data with Spark and machine learning

January 8, 2020

Enterprise data on customers, vendors, and products is often siloed and represented differently across diverse systems, hurting analytics, compliance, regulatory reporting, and 360-degree views. Traditional rule-based MDM systems with legacy architectures struggle to unify this growing data. Sonal Goyal offers an overview of a modern master data application using Spark, Cassandra, ML, and Elastic.

Reinforcement learning: A gentle introduction and an industrial application

January 6, 2020

Reinforcement learning (RL) learns complex processes autonomously, such as walking, beating the world champion at Go, or flying a helicopter. No big datasets with the right answers are needed: the algorithms learn by experimenting. Christian Hidber shows how and why RL works and demonstrates how to apply it to an industrial hydraulics application with 7,000 clients in 42 countries.
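Here is "learning by experimenting" in miniature: tabular Q-learning on a toy five-state corridor. It is nothing like the hydraulics application, and not necessarily the algorithm the talk uses. The agent starts at the left end and can step left or right. It is rewarded only for reaching the right end, so it has to discover the goal by trial and error. All parameters (`alpha`, `gamma`, `epsilon`, episode counts) are illustrative choices.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.3, seed=0):
    """Tabular Q-learning on a toy corridor: action 0 = left, 1 = right;
    reaching the last state gives reward 1 and ends the episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)  # explore, or break ties randomly
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Terminal transitions have no future value to bootstrap from.
            if next_state == goal:
                target = reward
            else:
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == goal:
                break
    return q

q = q_learning_chain()
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(len(q) - 1)]
print(policy)  # greedy action per non-terminal state
```

After training, the greedy action in every non-terminal state should be 1 (step right). The agent learns this purely from the reward signal, with no labeled examples.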