
Data Analysis


Programme (detailed contents) :

  • Introduction to exploratory analysis of big data
  • Syntax and objects of the R and Python languages, functions, object-oriented and functional programming (Python).
  • Multivariate exploratory statistical analysis. Principal component, discriminant and correspondence analyses, multidimensional scaling, NMF, hierarchical clustering, k-means, mixture models and EM algorithm, DBSCAN.
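As an illustration of one of the clustering methods listed above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not course material: the function name `kmeans`, the toy points and the choice of starting from the first k points are all assumptions made for the example, and the practical sessions rely on R and scikit-learn rather than hand-written code.

```python
def kmeans(points, k, n_iter=20):
    """Toy k-means (Lloyd's algorithm) on a list of equal-length numeric tuples."""
    # Simplification (assumption): start from the first k points.
    # Library implementations use random or k-means++ initialisation.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups in the plane.
group_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
group_b = [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)]
points = group_a + group_b

labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
print(centroids)  # [(0.25, 0.25), (5.25, 5.25)]
```

Even though both starting centroids lie in the first group, the alternation of assignment and update steps recovers the two groups here. On less clearly separated data the result depends on the initialisation, which is one reason library implementations run several restarts.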


Organisation :

  • Lectures : 27.5 h
  • Practical work on real data sets using R and Python libraries (scikit-learn) : 27.5 h


Main difficulties for students :

Gaining enough practical experience to conduct exploratory statistical analyses on large and complex data sets.


At the end of this module, the student will have understood and be able to explain (main concepts) :

  • How data are organised in R and Python data frames. The syntax of the R and Python languages. How to design, program and test functions in R and Python.
  • Statistical analyses of multidimensional data: dimension reduction and clustering with R and Python.
  • Statistical interpretation of various graphical displays including the different kinds of factor analyses and clustering.


The student will be able to :

  • Manage big data sets with R and Python.
  • Conduct exploratory analyses of real big data, including univariate, bivariate and multivariate analyses (PCA, MCA, FDA, NMF, k-means, mixture models, DBSCAN…) chosen according to the data structure and the purpose of the analysis.
  • Detect relevant structures within complex data sets and compile insightful interpretations.

Prerequisite

Statistics [I3MIMT41]

Form of assessment

Learning outcomes are assessed continuously throughout the semester. The form of assessment depends on the teaching activity: written exam, oral exam, record, written report, peer review...