# Statistics

## Presentation

Program (detailed contents):

1. Descriptive statistics

• Unidimensional descriptive statistics
• Bidimensional descriptive statistics
• Principal component analysis

2. Probability

• Gaussian, chi-square, Student, … distributions: definition and properties
• Multidimensional Gaussian distribution: definition and main properties, Cochran’s theorem

3. Inferential statistics

• Estimation in a parametric model: method of moments, maximum likelihood
• Cramér-Rao bound, efficiency of an estimator
• Confidence intervals for the mean and the variance of a Gaussian or non-Gaussian sample
• Parametric hypothesis testing: concept, tests on the mean, tests on the variance, p-value, tests for comparing two Gaussian distributions, Neyman-Pearson lemma, maximum likelihood ratio tests

4. Sampling

• Modelling of sampling: definitions, sampling design, inclusion probabilities, Horvitz-Thompson estimator
• Simple random sampling with (SSAR) and without (SSSR) replacement: estimation of a total, a mean, a proportion and a ratio. Comparison between SSAR and SSSR
• Stratified sampling: estimation of a total and a mean, stratified sampling with proportional allocation, optimal stratified sampling
• Cluster sampling and two-stage sampling: estimation of a total and a mean, uniform cluster sampling, proportional cluster sampling

Organization:

Students receive a copy of the lecture notes for each of the four parts of this module. Lectures (CM) presenting the theoretical tools alternate with tutorials (TD) and lab work (TP) using the R software, distributed as follows:

1. Descriptive statistics:

5 x 1.25 = 6.25 CM

2 x 1.25 = 2.5 TD

3 x 2.75 = 8.25 TP

2. Probability:

2 x 1.25 = 2.5 CM

2 x 1.25 = 2.5 TD

3. Inferential statistics:

12 x 1.25 = 15 CM

10 x 1.25 = 12.5 TD

2 x 2.75 = 5.5 TP

4. Sampling:

6 x 1.25 = 7.5 CM

5 x 1.25 = 6.25 TD

Main difficulties for students:

• Understanding principal component analysis
• Knowing the usual probability distributions such as Gaussian, chi-square, Student, …
• Understanding the main concepts of inferential statistics
• Mastering the sampling methods presented

This module presents the fundamental tools of statistics and therefore calls for dedicated, regular homework.

## Objectives

At the end of this module, the student will have understood and be able to explain (main concepts):

• the main definitions of unidimensional and bidimensional descriptive statistics
• the main concepts of principal component analysis and the interpretation of its graphical results
• the properties of multidimensional Gaussian variables and of the usual probability distributions such as the Gaussian, chi-square, Student and Fisher distributions
• parameter estimation in a parametric model
• the construction of a confidence interval
• the construction of a hypothesis test
• the main concepts of sampling
• the different sampling methods presented in this module

The student will be able to:

1. Descriptive statistics:

• develop a descriptive statistical analysis with the R software
• apply the concepts of principal component analysis, know its main properties and comment on the graphical results

2. Probability:

• work with the usual probability distributions, including multidimensional Gaussian variables

3. Inferential statistics:

• estimate parameters in a parametric model and study the properties of these estimators
• build a confidence interval
• build a hypothesis test

4. Sampling:

• model a sampling strategy
• differentiate the sampling strategies introduced in this module
• build an estimator
• evaluate the bias and the variance of a Horvitz-Thompson estimator and propose an estimator for the unknown variance
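
As an illustration of the last objective, here is a minimal Python sketch of the Horvitz-Thompson estimator of a total under simple random sampling without replacement. It is not taken from the lecture notes: the population values, the sample size and the function name `horvitz_thompson_total` are hypothetical, and the module's own lab work uses R. The sketch enumerates every possible sample of a tiny population to check that the estimator is unbiased for the total.

```python
from itertools import combinations


def horvitz_thompson_total(sample_values, inclusion_probs):
    """Horvitz-Thompson estimate of a population total: sum of y_k / pi_k."""
    return sum(y / pi for y, pi in zip(sample_values, inclusion_probs))


# Hypothetical population of N = 6 values (illustrative data only).
population = [12.0, 7.5, 9.0, 14.5, 10.0, 11.0]
N = len(population)
n = 3  # assumed sample size

# Under simple random sampling without replacement, every unit has
# first-order inclusion probability pi_k = n / N.
pi = [n / N] * n

# Enumerate all possible samples of size n (each one is equally likely)
# and compute the Horvitz-Thompson estimate for each of them.
all_estimates = [
    horvitz_thompson_total(sample, pi) for sample in combinations(population, n)
]

# The expectation of the estimator over the sampling design.
mean_estimate = sum(all_estimates) / len(all_estimates)
true_total = sum(population)

print("true total:", true_total)
print("mean of the estimates over all samples:", mean_estimate)
```

Because the design is enumerated exhaustively, the average of the estimates is the exact expectation of the estimator, which coincides with the true total. The same enumeration could be reused to compute the exact variance of the estimator for this toy design.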

## Prerequisites

Probability and Statistics (MIC2) I2MIMT31

## Form of assessment

Learning outcomes are evaluated through continuous assessment during the semester. Depending on the teaching, the assessment takes different forms: written exam, oral exam, record, written report, peer review...