The version you’re consulting is not final. This course description may change. The final version will be published on 1st June.
5.00 credits
30.0 h + 15.0 h
Q2
Teacher(s)
Language
English
Prerequisites
Programming knowledge (e.g. LDATS2030)
Main themes
This course covers the essential conceptual and practical components of applied data science. The major themes include:
– Application domains of Data Science and real-world case studies illustrating end-to-end project development.
– The standard life-cycle of a data science project: data acquisition, exploratory analysis, data cleaning and preprocessing, model development, evaluation, and deployment.
– Fundamentals of supervised learning, with an emphasis on classification and regression problems.
– Introduction to classical machine learning models, including decision trees, k-nearest neighbours, and neural networks…
– Practical implementation using the Python scientific ecosystem and modern tools supporting reproducible pipelines and deployment.
Learning outcomes
At the end of this learning unit, the student is able to:
1. With regard to the AA framework of the Master [120] in Data Science : Statistic, this activity contributes to the development and acquisition of the following AAs:
Content
- Introduction to Data Science
  - Overview of the data science workflow: business understanding, data understanding, modelling, evaluation, deployment
  - Case study
- Data Extraction and Data Manipulation in Python
  - Introduction to the Python scientific ecosystem: NumPy, pandas, …
  - Data loading from files and APIs (CSV, JSON, SQL)
  - Data preprocessing: missing values, feature engineering, encoding categorical variables, scaling and normalization, …
  - Exploratory data analysis and visualisation (matplotlib, seaborn, …)
- Supervised Learning: Classification and Regression
  - Introduction to the Python machine learning ecosystem: sklearn & statsmodels
  - Classical machine learning models for tabular data:
    - k-nearest neighbours
    - decision trees and random forest
    - introduction to neural networks and PyTorch
- Model Evaluation and Interpretability
  - Train/validation/test splits, cross-validation, performance metrics (accuracy, F1, ROC-AUC, MSE)
  - Feature importance and model explainability (SHAP, permutation importance)
- Introduction to Tools and Computing Environment
  - Jupyter, MLflow, Streamlit, FastAPI
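As an illustration of how several of the topics listed above fit together, the following is a minimal sketch and not course material. It assumes scikit-learn is installed and uses a synthetic dataset generated with `make_classification`. The dataset, the model settings and the variable names (`models`, `results`, `importances`) are all assumptions chosen for the example. It shows a train/test split, cross-validation, two classical models (k-nearest neighbours and a random forest), two performance metrics, and permutation importance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic tabular data, used only for illustration (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Hold out a test set; stratify to keep class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# k-NN is distance-based, so features are scaled inside a pipeline.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation on the training set only.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    # Refit on the full training set, then evaluate once on the test set.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "cv_accuracy": cv_scores.mean(),
        "test_accuracy": accuracy_score(y_test, y_pred),
        "test_f1": f1_score(y_test, y_pred),
    }

# Permutation importance of each feature for the fitted random forest.
importances = permutation_importance(
    models["random_forest"], X_test, y_test, n_repeats=10, random_state=0
)

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
print("permutation importances:", importances.importances_mean.round(3))
```

Placing the scaler inside the pipeline means it is re-fitted on each cross-validation training fold, which avoids leaking information from the validation folds into preprocessing.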
Teaching methods
The course combines ex cathedra lectures supported by slides with practical computer sessions in which students apply each concept using Python notebooks.
Evaluation methods
Group project and individual oral exam. The oral exam may include questions about the project. Both parts are mandatory to pass the course.
Bibliography
“Hands‑On Machine Learning with Scikit‑Learn and PyTorch: Concepts, Tools, and Techniques to Build Intelligent Systems”, Aurélien Géron, O’Reilly Media, ISBN-13: 979-8341607989.
“Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python”, Sebastian Raschka, Yuxi (Hayden) Liu & Vahid Mirjalili, Packt Publishing, ISBN-13: 978-1801819312.
Faculty or entity
Programmes offering this learning unit
Title of the programme
Acronym
Credits
Prerequisites
Learning outcomes
Master [120] in Data Science : Statistic
Master [120] in Statistics: Biostatistics
Master [120] in Linguistics
Master [120] in Environmental Bioengineering
Master [120] in Mathematics
Master [120] in Actuarial Science
Master [120] in Statistics: General
Master [120] in Chemistry and Bioindustries
Master [120] in Mathematical Engineering
Minor in Statistics, Actuarial Sciences and Data Sciences
University Certificate: Statistics and Data Science (15/30 credits)