
Software and methods for exploratory statistical data analysis


  • Introduction to exploratory analysis of big data
  • Syntax and objects of the R and Python languages, functions, object-oriented and functional programming (Python).
  • Multivariate exploratory statistical analysis. Principal component, discriminant and correspondence analyses, multidimensional scaling, hierarchical clustering, k-means, mixture models and DBSCAN.


At the end of this module, the student will understand and be able to explain the following main concepts:

  • Database-like organisation of R data frames. Syntax of the R and Python languages. Designing, programming and testing R and Python functions.
  • Statistical analyses of multidimensional data: dimension reduction and clustering with R and Python.
  • Statistical interpretation of various graphical displays including the different kinds of factor analyses and clustering.


The student will be able to:

  • Manage big data sets with R and Python.
  • Conduct exploratory analyses of real big data sets, including univariate, bivariate and multivariate analyses featuring PCA, MCA, FDA, k-means, mixture models, DBSCAN… depending on the data structure and the purpose of the analysis.
  • Detect relevant structures within complex data sets and produce insightful interpretations.

Prerequisites

Statistics [I3MIMT41]

Form of assessment

Learning outcomes are assessed continuously throughout the semester. Depending on the teaching format, the assessment may take different forms: a written exam, an oral exam, a record, a written report, peer review...