Data science at Deutsche Telekom: Predicting global travel patterns and network demand
Knowledge of customers' location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.
|Talk Title||Data science at Deutsche Telekom: Predicting global travel patterns and network demand|
|Speakers||Vaclav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)|
|Conference||Strata Data Conference|
|Conf Tag||Making Data Work|
|Location||London, United Kingdom|
|Date||April 30-May 2, 2019|
Gabor begins by discussing the motivation and business use case for the project. The Commercial Roaming Department analyzes how customers of Deutsche Telekom’s networks, in Germany and nine other European countries within the DTAG group, use other service providers’ networks and vice versa. These analyses are very important for negotiations with roaming partners around the world (Orange, Vodafone, O2, AT&T, Verizon, etc.). Each pair of service providers must have a contract with an agreed price list that sets how much one provider (such as DT) pays its roaming partner (such as Verizon) when its customers use the partner’s network abroad. The roaming environment is changing rapidly, so it’s essential to have a clear picture of customers and their travel patterns and a better understanding of the drivers behind them.
Václav then covers the security aspects and architecture. You’ll learn why Deutsche Telekom decided to use Cloudera Hadoop to build a platform that supports the necessary ad hoc and regular analytical activities. Because of very strict requirements from the Local Security and Data Privacy Department, the platform has to be a very secure environment. All customer data coming from the network must be anonymized and aggregated so that it’s not possible to identify a specific customer or their exact location, yet it must still be possible to use the data for analyses and predictions.
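The aggregation requirement described above can be illustrated with a minimal sketch in plain Python (not PySpark, and not the speakers' implementation). It counts distinct pseudonymous customers per home and visited country pair and drops small groups. The field names `customer`, `home`, and `visited`, the function `aggregate_visits`, and the threshold `MIN_GROUP_SIZE` are all hypothetical, and suppressing small groups is an assumed technique that the abstract does not mention.

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # hypothetical threshold, not taken from the talk


def aggregate_visits(records, min_group_size=MIN_GROUP_SIZE):
    """Count distinct customers per (home, visited) country pair.

    Pairs with fewer than min_group_size customers are dropped so that a
    rare combination cannot easily be traced back to one person.
    """
    # Deduplicate first so each customer counts once per country pair.
    unique = {(r["customer"], r["home"], r["visited"]) for r in records}
    counts = Counter((home, visited) for _, home, visited in unique)
    return {pair: n for pair, n in counts.items() if n >= min_group_size}
```

The result maps each country pair to a customer count, which is the kind of aggregate a dashboard could show without exposing individual rows.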
A very important part of the implemented security concept is Sentry, which, together with Kerberos and LDAP, is used to authenticate and authorize users so that they can see only the data they are permitted to see. Anonymization, aggregation, and lookup methods are implemented in PySpark, and the keys used to anonymize sensitive values (phone number, SIM card ID, phone ID, location, etc.) are stored in an HSM (hardware security module) outside the Cloudera Hadoop cluster. Cleansed data is then stored in HDFS in Parquet, in a semiflat/JSON format, so that it is accessible through ad hoc SQL queries in Hive or Impala. Visualization is achieved with Solr and Hue. The dashboards are regularly updated and then shared with Deutsche Telekom’s upper management. Cloudera Hadoop 6 was recently released, and commercial roaming analysts are missing some functions in Solr dashboards. Václav details Deutsche Telekom’s recent experience using C6, highlighting some of the challenges and possible workarounds.
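As a rough illustration of key-based anonymization of sensitive values, the sketch below replaces selected fields with a keyed HMAC-SHA256 digest using only the Python standard library. This is an assumption for illustration: the abstract does not say which anonymization method is used, and in the described platform the keys live in an HSM rather than being passed around as bytes. The names `pseudonymize`, `anonymize_record`, and the `msisdn` field are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed HMAC-SHA256 digest of value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, key: bytes, sensitive_fields) -> dict:
    """Copy record, replacing each sensitive field with its pseudonym.

    The key is passed in directly here only for illustration; the talk
    describes keys held in an HSM outside the cluster.
    """
    return {
        name: pseudonymize(str(val), key) if name in sensitive_fields else val
        for name, val in record.items()
    }
```

Because the digest is deterministic for a given key, the same phone number always maps to the same pseudonym, so records can still be joined and counted, while someone without the key cannot recompute the mapping.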