 # Statistics

## Presentation

Program (detailed contents):

1. Descriptive statistics

• Unidimensional descriptive statistics
• Bidimensional descriptive statistics
• Principal component analysis

2. Probability

• Usual distributions (Gaussian, chi-square, Student, …): definitions and properties
• Multidimensional Gaussian distribution: definition and main properties, Cochran’s theorem

3. Inferential statistics

• Estimation in a parametric model: method of moments, maximum likelihood
• Cramér-Rao bound, efficiency of an estimator
• Confidence intervals for the mean and the variance of a Gaussian or non-Gaussian sample
• Parametric hypothesis testing: concept, tests on the mean, tests on the variance, p-value, tests for comparing two Gaussian distributions, Neyman-Pearson lemma, maximum likelihood ratio tests

4. Sampling

• Sampling modelling: definitions, sampling design, inclusion probabilities, Horvitz-Thompson estimator
• Simple random sampling with (SSAR) and without (SSSR) replacement: estimation of a total and a mean, a proportion and a ratio. Comparison between SSAR and SSSR
• Stratified sampling: estimation of a total and a mean, stratified sampling with proportional allocation, optimal stratified sampling
• Cluster sampling and two-stage sampling: estimation of a total and a mean, uniform cluster sampling, proportional cluster sampling

Organization:

Students receive a copy of the lecture notes for each of the four parts of this module. Lectures (CM) presenting the theoretical tools alternate with tutorials (TD) and lab work (TP) using the R software, distributed as follows:

1. Descriptive statistics:

• 5 x 1.25 = 6.25 CM
• 2 x 1.25 = 2.5 TD
• 3 x 2.75 = 8.25 TP

2. Probability:

• 2 x 1.25 = 2.5 CM
• 2 x 1.25 = 2.5 TD

3. Inferential statistics:

• 12 x 1.25 = 15 CM
• 10 x 1.25 = 12.5 TD
• 2 x 2.75 = 5.5 TP

4. Sampling:

• 6 x 1.25 = 7.5 CM
• 5 x 1.25 = 6.25 TD

Main difficulties for students:

• Understanding principal component analysis
• Knowing the usual probability distributions such as Gaussian, chi-square, Student, …
• Understanding the main concepts of inferential statistics
• Mastering the sampling methods presented

This module presents the fundamental tools of statistics and therefore calls for dedicated, regular homework.

## Objectives

At the end of this module, the student will have understood and be able to explain (main concepts):

• the main definitions of unidimensional and bidimensional descriptive statistics
• the main concepts of principal component analysis and the interpretation of its graphical results
• the properties of multidimensional Gaussian variables and of the usual probability distributions, such as the Gaussian, chi-square, Student and Fisher distributions
• parameter estimation in a parametric model
• the construction of a confidence interval
• the construction of a hypothesis test
• the main concepts of sampling
• the different sampling methods presented in this module

The student will be able to:

1. Descriptive statistics:

• carry out a descriptive statistical analysis with the R software
• manipulate the concepts of principal component analysis, know its main properties and comment on the graphical results

2. Probability:

• manipulate the usual probability distributions, including multidimensional Gaussian variables

3. Inferential statistics:

• estimate parameters in a parametric model and study the properties of these estimators
• build a confidence interval
• build a hypothesis test

4. Sampling:

• model a sampling strategy
• differentiate the sampling strategies introduced in this module
• build an estimator
• evaluate the bias and the variance of a Horvitz-Thompson estimator and propose an estimator for the unknown variance

## Prerequisites

Probability and Statistics (MIC2) I2MIMT31

## Form of assessment

Learning outcomes are assessed continuously throughout the semester. Depending on the teaching, the assessment may take different forms: a written exam, an oral exam, a record, a written report, peer review...

## Bibliography

Y. Tillé, Théorie des sondages : Echantillonnage et estimation en populations finies, Dunod, 2001

P. Ardilly, Les techniques de sondages, Technip, 2006

G. Saporta, Probabilités, analyse des données et Statistique, Technip, 2011

F. Husson, S. Lê and J. Pagès, Analyse de données avec R, Presses universitaires de Rennes, 2016

P. Barbe and M. Ledoux, Probabilités, EDP Sciences, 2007

N. Bouleau, Probabilités de l’ingénieur, Hermann, 1986

V. Rivoirard and G. Stoltz, Statistique en action, Vuibert, 2012

L. Wasserman, All of Statistics, Springer, 2010

## Additional information

Keywords: descriptive statistics, PCA, Gaussian vector, estimation, confidence interval, hypothesis testing, simple sampling, stratified sampling, cluster sampling