Statistics and data sciences

LEPL1109  2023-2024  Louvain-la-Neuve

5.00 credits
30.0 h + 30.0 h
Q1
Language
French
Prerequisites
To follow this course, the student must have a basic knowledge of probability, as taught in courses LEPL1108 or LBIR1212.
Main themes
This course presents fundamental statistical concepts in an engineering context (exploratory analysis, inference, simulation) as well as basic methods for analysing multivariate datasets (such as linear regression, principal component analysis and classification).
Learning outcomes

At the end of this learning unit, the student is able to:

  • Explore small and large datasets with few or many dimensions.
  • Infer features of a population from a sample using estimation, confidence intervals and statistical tests.
  • Connect the deductive approach of probability theory to the inductive approach of statistics, and identify the probabilistic models used in statistical inference.
  • Translate the textual formulation of a statistical inference problem into a precise statistical and mathematical formalism, recognizing the appropriate models and the corresponding estimation methods.
  • Solve an applied problem by following a logical approach based on the correct use of models and statistical inference.
  • Use Monte Carlo simulation, K-fold cross-validation and bootstrapping to estimate models and validate results.
  • Analyse multivariate data with the fundamental methods of linear regression, principal component analysis and classification/clustering.
  • Use statistical tools to validate the conclusions drawn from a model such as linear regression.
  • Link the mathematical objectives of a data mining method to its practical purposes.
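As an illustration of the bootstrapping outcome listed above, here is a minimal sketch of a percentile bootstrap confidence interval for a mean. It is not taken from the course material: the function name `bootstrap_mean_ci`, its parameters and the data values are assumptions made up for this example, and it uses only the Python standard library.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    # Indices of the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical measurements, invented for illustration only.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 4.4]
low, high = bootstrap_mean_ci(data)
print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

The percentile method is only one of several bootstrap interval constructions; it is chosen here because it is the shortest to write down.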
 
Content
- Exploratory analysis and sampling
- Introduction to multivariate data analysis
- Parametric estimation (method of moments and maximization of the log-likelihood) and properties of estimators (bias, variance, mean squared error).
- Statistical inference (confidence intervals and significance tests): comparison of the means of two or more normal populations, tests on proportions, tests on variances.
- Linear regression, including the analysis of coefficients and significance tests.
- Overview of learning techniques: supervised and unsupervised learning methods
- Links between objectives of data analysis methods and their mathematical representation.
- Regression and classification methods (such as linear models and least squares, k-nearest neighbors, logistic regression)
- Training error, test error and generalization error, the bias-variance tradeoff, and elements of statistical decision theory
- Resampling techniques for model selection/evaluation (e.g., validation set, K-fold cross validation)
- Unsupervised learning: dimensionality reduction (principal component analysis) and clustering methods (K-means).
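To make the resampling item in the list above concrete, the following is a minimal sketch of K-fold cross-validation used to estimate the test error of a simple least-squares line. It is an illustration under stated assumptions, not course material: the helper names `fit_line` and `k_fold_mse`, the synthetic data and the noise level are all invented for this example, and only the Python standard library is used.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error of the fitted line over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(
            statistics.fmean((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
        )
    return statistics.fmean(fold_errors)


# Synthetic data (assumption): a line y = 2 + 0.5*x plus Gaussian noise.
rng = random.Random(42)
xs = [i / 2 for i in range(30)]
ys = [2 + 0.5 * x + rng.gauss(0, 0.5) for x in xs]
cv_mse = k_fold_mse(xs, ys, k=5)
print(f"5-fold cross-validated MSE: {cv_mse:.3f}")
```

Each observation is held out exactly once, so the averaged error is computed only on data the model did not see during fitting, which is what distinguishes it from the training error.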
Teaching methods
The course is composed of:
- 10 lectures on the topics listed in the course content;
- 9 practical sessions, both classical and numerical;
- 3 hackathons, each built around a small Python project carried out in groups on topics covered in the lectures and the practical sessions.
Evaluation methods
The final grade (out of 20) combines an individual written exam and the hackathons:
  • Individual written exam (in-session) to assess understanding of concepts and techniques (theory and exercises, in the form of multiple-choice and open questions). This exam counts for 14 points (out of 20) of the final course grade.
  • The hackathons are evaluated during the semester (off-session*) and the average of their marks counts for 6 points (out of 20) of the final course grade. The hackathon mark is carried over to all exam sessions of the academic year.
The teachers reserve the right to question students orally on both their exam answers and the hackathons.
*: The hackathons result in a single overall mark for the out-of-session assessment. Failure to comply with the methodological guidelines set out on Moodle, particularly with regard to the use of online resources or collaboration between students, will result in an overall mark of 0 for the out-of-session assessment.
Other information
The course schedule is subject to change depending on public health conditions. Please check the Moodle website for more details.
Online resources
All teaching material is available on the Moodle website of the course. Please consult it for additional information.
Faculty or entity


Programmes offering this learning unit

Title of the programme
Acronym
Credits
Prerequisites
Learning outcomes
Bachelor in Engineering

Master [120] in Environmental Science and Management

Bachelor in Computer Science