Hands-on data science with Python

5.00 credits

30.0 h + 15.0 h

Teacher(s)

Caelen Olivier; Heuchenne Cédric

Language

English

Prerequisites

Programming knowledge (e.g. LDATS2030)

Main themes

This course covers the essential conceptual and practical components of applied data science. The major themes include:
– Application domains of data science and real-world case studies illustrating end-to-end project development.
– The standard life-cycle of a data science project: data acquisition, exploratory analysis, data cleaning and preprocessing, model development, evaluation, and deployment.
– Fundamentals of supervised learning, with an emphasis on classification and regression problems.
– Introduction to classical machine learning models, including decision trees, k-nearest neighbours, and neural networks…
– Practical implementation using the Python scientific ecosystem and modern tools supporting reproducible pipelines and deployment.

Learning outcomes

With regard to the learning outcomes (AA) framework of the Master [120] in Data Science : Statistic, this activity contributes to the development and acquisition of the following learning outcomes:

as first priority: 1.1 – 1.2 – 1.3 – 2.1 – 2.4 – 2.5 – 6.3
as secondary: 4.1 – 4.2 – 4.4 – 5.3 – 5.6 – 6.1

With regard to the learning outcomes (AA) framework of the Master [120] in Statistics: General, this activity contributes to the development and acquisition of the following learning outcomes:

as first priority: 1.3 – 2.2 – 2.5 – 3.3 – 5.3
as secondary: 2.4 – 5.4 – 5.6 – 6.3

Content

Introduction to Data Science
1. Overview of the data science workflow: business understanding, data understanding, modelling, evaluation, deployment
2. Case study
Data Extraction and Data Manipulation in Python
1. Introduction to the Python scientific ecosystem: NumPy, pandas, …
2. Data loading from files and APIs (CSV, JSON, SQL)
3. Data preprocessing: missing values, feature engineering, encoding categorical variables, scaling and normalization, …
4. Exploratory data analysis and visualisation (matplotlib, seaborn, …)
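As an illustration of the preprocessing steps listed above, here is a minimal sketch of mean imputation, one-hot encoding and standardization. It uses only the Python standard library to stay dependency-free; in practice these steps would more likely be done with pandas or scikit-learn. The toy records and the helper names (`impute_mean`, `standardize`, `one_hot`) are hypothetical and are not taken from the course material.

```python
from statistics import mean, pstdev

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25.0, "city": "Namur"},
    {"age": None, "city": "Mons"},
    {"age": 35.0, "city": "Namur"},
]


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator list per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


ages = impute_mean([r["age"] for r in rows])        # missing age filled with 30.0
ages_scaled = standardize(ages)                      # centred on 0
city_dummies = one_hot([r["city"] for r in rows])    # keys: "Mons", "Namur"
```

In a real project the imputation mean and the scaling parameters should be computed on the training split only and then reused on validation and test data, to avoid leaking information.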
Supervised Learning: Classification and Regression
1. Introduction to the Python machine learning ecosystem: scikit-learn (sklearn) and statsmodels
2. Classical machine learning models for tabular data:
  1. k-nearest neighbours
  2. decision trees and random forests
  3. introduction to neural networks and PyTorch
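To make the first of these models concrete, the following is a minimal sketch of a k-nearest neighbours classifier written with the standard library only. The function name `knn_predict` and the toy data are hypothetical; the course itself relies on scikit-learn, whose `KNeighborsClassifier` provides a full implementation.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Sort training points by Euclidean distance to the query point.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D toy data: two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

pred_low = knn_predict(train_X, train_y, (1.1, 1.0))
pred_high = knn_predict(train_X, train_y, (4.0, 4.0))
```

Because the prediction depends on raw distances, features on very different scales should normally be standardized before applying this method.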
Model Evaluation and Interpretability
1. Train/validation/test splits, cross-validation, performance metrics (accuracy, F1, ROC-AUC, MSE)
2. Feature importance and model explainability (SHAP, permutation importance)
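The evaluation ideas above can be sketched in a few lines of plain Python. This example shows contiguous k-fold index generation (without shuffling, which is a simplifying assumption) and three of the listed metrics: accuracy, F1 and MSE. All function names and data are hypothetical; scikit-learn offers equivalent, more complete tools in `sklearn.model_selection` and `sklearn.metrics`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


folds = list(k_fold_indices(5, 2))
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
err = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

In cross-validation, a model is fitted on each `train_idx` and scored on the matching `test_idx`, and the per-fold scores are then averaged.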
Introduction to Tools and Computing Environment
1. Jupyter, MLflow, Streamlit, FastAPI

Teaching methods

The course combines ex-cathedra lectures supported by slides with practical computer sessions in which students apply each concept in Python notebooks.

Evaluation methods

Group project and individual oral exam. The oral exam may include questions about the project. Both parts are mandatory to pass the course.

Bibliography

“Hands‑On Machine Learning with Scikit‑Learn and PyTorch: Concepts, Tools, and Techniques to Build Intelligent Systems”, Aurélien Géron, O’Reilly Media, ISBN-13: 979-8341607989.
“Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python”, Sebastian Raschka, Yuxi (Hayden) Liu & Vahid Mirjalili, Packt Publishing, ISBN-13: 978-1801819312.

Faculty or entity

LSBA

Programmes offering this learning unit

Title of the programme (acronym)

Master [120] in Data Science : Statistic (DATS2M)
Master [120] in Statistics: Biostatistics (BSTA2M)
Master [120] in Linguistics (LING2M)
Master [120] in Environmental Bioengineering (BIRE2M)
Master [120] in Mathematics (MATH2M)
Master [120] in Actuarial Science (ACTU2M)
Master [120] in Statistics: General (STAT2M)
Master [120] in Chemistry and Bioindustries (BIRC2M)
Master [120] in Mathematical Engineering (MAP2M)
Minor in Statistics, Actuarial Sciences and Data Sciences (MINSTAT)
University certificate: Statistics and data science (15/30 credits) (STAT2FC)