Though visualization is used in data science to understand the shape of the data, it's not widely used for statistical models, which are evaluated based on numerical summaries. Amit Kapoor explores model visualization, which aids in understanding the shape of the model, the impact of parameters and input data on the model, the fit of the model, and where it can be improved.
|Talk Title|Model visualization|
|Conference|Strata + Hadoop World|
|Conf Tag|Make Data Work|
|Location|New York, New York|
|Date|September 27-29, 2016|
Data science is a process of abstraction. To explain or predict a real phenomenon, the process starts with acquiring and refining the data. It then moves between three layers of abstraction: transformations (data abstraction), visualizations (visual abstraction), and modeling (symbolic abstraction). Together, the three layers build a truer (or closer) representation of the real phenomenon.

Data visualization (data-vis) helps us understand the portrait and shape of the data. The science of data-vis for exploratory data analysis is well developed for both static graphics (scatter-plot matrices, glyph-based approaches, geometric transforms like parallel coordinates) and interactive graphics (layering, brushing and linking, projections and tours). (For more information, see Amit Kapoor’s Strata + Hadoop World Singapore talk, Visualizing Multidimensional Data.)

Visualization is rarely extended to statistical models, however, which are typically evaluated on numerical summaries alone. Amit Kapoor demonstrates extending visualization to the statistical model (model-vis). Model visualization helps us understand the shape of the model and compare it to the shape of the data. It lets us see the fit of the model and where that fit can be improved. It also helps us understand the parameters in the model: how the model changes when the parameters change, and how the parameters change when the input data changes.

The science and tools for model-vis are still underdeveloped. Amit looks at practical examples of model-vis in regression (linear, lasso), classification (logistic, trees, LDA), and clustering (hierarchical) problems that can help us better understand the model.
This includes exploring model-vis approaches that:

Integrating these model-vis approaches into model evaluation strengthens a data scientist’s understanding of the model and leads to better model building. Model-vis complements data-vis, both for fitting better models and for communicating the insight from the data science process.
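To make the idea concrete, here is a minimal sketch of three model-vis views for a simple linear regression: the model drawn in data space, the residuals, and the slope refitted with each point left out. It is an illustration only and is not taken from the talk. The data, the function names (`fit_line`, `model_in_data_space`, `leave_one_out_slopes`, `plot_model`), and the choice of leave-one-out refits are all assumptions made for this example, and matplotlib is assumed to be available only if `plot_model` is called.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def model_in_data_space(xs, ys, n_grid=50):
    """Return the fitted line on a grid over x, plus per-point residuals."""
    intercept, slope = fit_line(xs, ys)
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    curve = [intercept + slope * g for g in grid]
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return grid, curve, residuals


def leave_one_out_slopes(xs, ys):
    """Refit with each point removed to see how the slope reacts to the data."""
    slopes = []
    for i in range(len(xs)):
        _, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        slopes.append(slope)
    return slopes


def plot_model(xs, ys):
    """Draw the three views; matplotlib is imported lazily (optional dependency)."""
    import matplotlib.pyplot as plt

    grid, curve, residuals = model_in_data_space(xs, ys)
    slopes = leave_one_out_slopes(xs, ys)
    fig, (ax_fit, ax_res, ax_loo) = plt.subplots(1, 3, figsize=(14, 4))

    # View 1: shape of the model against the shape of the data.
    ax_fit.scatter(xs, ys, label="data")
    ax_fit.plot(grid, curve, color="red", label="linear model")
    ax_fit.set_title("Model in data space")
    ax_fit.legend()

    # View 2: where the fit is off, point by point.
    ax_res.scatter(xs, residuals)
    ax_res.axhline(0, color="gray", linestyle="--")
    ax_res.set_title("Residuals vs. x")

    # View 3: how a parameter changes when the input data changes.
    ax_loo.bar(range(len(slopes)), slopes)
    ax_loo.set_title("Slope with point i left out")

    fig.tight_layout()
    plt.show()


# Hypothetical data: a quadratic trend that a straight line cannot capture.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]
intercept, slope = fit_line(xs, ys)
grid, curve, residuals = model_in_data_space(xs, ys)
```

Calling `plot_model(xs, ys)` on this toy data should show residuals that are positive at both ends and negative in the middle. That U-shaped pattern hints that a curved term is missing, which a single numerical summary of fit would not make obvious.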