Data Science, Big Data and Statistics

This article analyzes how Big Data is changing the way we learn from observations. We describe the changes in statistical methods in seven areas that have been shaped by the Big Data-rich environment: the emergence of new sources of information; visualization in high dimensions; multiple testing problems; analysis of heterogeneity; automatic model selection; estimation methods for sparse models; and merging network information with statistical models. Next, we compare the statistical approach with those in computer science and machine learning and argue that the convergence of different methodologies for data analysis will be the core of the new field of data science. Then, we present two examples of Big Data analysis in which several of the new tools discussed previously are applied, such as using network information or combining different sources of data. Finally, the article concludes with some final remarks.

Focus: Methods or Design
Source: TEST
Readability: Expert
Type: PDF Article
Open Source: No
Keywords: Machine learning, Sparse model selection, Statistical learning, Network analysis, Multivariate data, Time series
Learn Tags: Bias, Data Collection/Data Set, Data Tools, Design/Methods, Small Data
Summary: An argument that traditional statistical methods were developed for small data sets and are not suitable for current large and complex data sets.