The version you’re consulting is not final. This course description may change. The final version will be published on 1st June.
5.00 credits
30.0 h + 15.0 h
Q2
Teacher(s)
Language
English
> French-friendly
Prerequisites
Artificial Intelligence, as covered by LINFO1361
Main themes
- Foundations of Reinforcement Learning (RL)
- Multi-armed bandits and exploration/exploitation
- Markov Decision Processes (MDP)
- Solving with dynamic programming
- Monte Carlo methods
- Temporal Difference Learning methods (Q-learning)
- Deep Reinforcement Learning
- Value function approximations (DQN and variants)
- Policy gradient methods (REINFORCE, AC, PPO)
- Monte Carlo Tree Search
- Large Reasoning Models and RL from Human Feedback
- Applications to games and simulated environments
- Contemporary challenges, limitations, and perspectives of RL
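As an illustration of the exploration/exploitation theme listed above, here is a minimal sketch of an ε-greedy agent on a Bernoulli multi-armed bandit. It is not taken from the course material: the function name `epsilon_greedy_bandit`, the arm probabilities, and the hyperparameters are all assumptions chosen for the example.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (hypothetical helper)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Bernoulli reward drawn from the (assumed) true success probability.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Assumed success probabilities for three arms; the last arm is the best one.
estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

After enough steps, most pulls should go to the arm with the highest success probability, while the ε fraction of random pulls keeps the other estimates up to date.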
Learning outcomes
At the end of this learning unit, the student is able to:
Given the learning outcomes of the "Master in Computer Science and Engineering" program, this course contributes to the development, acquisition and evaluation of the following learning outcomes:
Content
- General introduction to RL (agent, environment, states, actions, rewards, policy, value functions, convergence).
- Multi-armed bandits (exploration/exploitation, ε-greedy, upper confidence bound, softmax, Thompson sampling, regret).
- Markov Decision Processes: formalism and dynamics (Markov property, stochastic vs deterministic policies, action-value functions, Bellman equation, optimality).
- Solving with dynamic programming (policy evaluation, policy iteration, value iteration).
- Monte Carlo methods (state-value and action-value estimation, convergence).
- Temporal Difference Learning (bootstrapping, TD(0), variance, online learning).
- Q-Learning algorithms.
- Function approximation and Deep Q-Networks (gradient, nonlinear approximation, DQN).
- Monte Carlo Tree Search and deep variants.
- Policy gradient methods (REINFORCE, Actor-Critic, Proximal Policy Optimization).
- Introduction to Large Reasoning Models (LRMs) and RL from Human Feedback (RLHF): language modeling, Direct Preference Optimization (DPO), supervised fine-tuning.
- Applications to games and simulated environments with the open-source Gymnasium library.
- Case studies (Atari, CartPole, LunarLander) and/or practical project on implementation and comparative analysis of methods.
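To make the temporal-difference and Q-learning items above concrete, here is a minimal sketch of tabular Q-learning. It is an illustration under stated assumptions, not the course's implementation: instead of a Gymnasium environment it uses a hand-written five-state chain (the `step` function, the reward scheme, and all hyperparameters are hypothetical) so that the example has no dependencies.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (assumed toy setup)
LEFT, RIGHT = 0, 1

def step(state, action):
    """Deterministic chain: reward 1.0 only on reaching the last state."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: uniformly random action.
                action = rng.choice([LEFT, RIGHT])
            else:
                # Exploit: greedy action, ties broken at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in (LEFT, RIGHT) if q[state][a] == best]
                )
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action,
            # with no bootstrap term at the terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
# Greedy action in each non-terminal state after training.
greedy_policy = [
    max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)
]
```

In this toy chain the learned greedy policy should move right in every non-terminal state, and the action values for moving right should approach the discounted return γ^k, where k is the number of steps remaining after the move.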
Evaluation methods
- Project (30%): design and implementation of an AI based on RL for a situation involving an opponent (stochastic and imperfect-information game). The project will take the form of a friendly competition among students.
- Assignment 1 (10%): implementation of a classical RL algorithm.
- Assignment 2 (10%): implementation of a deep RL algorithm.
- Assignment 3 (10%): reading and critical analysis of a recent paper on RL.
- Final exam (40%): the final exam is comprehensive (it covers all course material) and open-book.
Other information
Background / prerequisites:
- LBIR1304 or LFSAB1105: a course on probability theory and mathematical statistics,
- LBIR1200 or LFSAB1101: a course on linear and matrix algebra,
- LFSAB1402: a good Python programming course,
- A course in multivariate calculus (mathematics).
Online resources
Available on Moodle
Bibliography
Some recommended reference books:
- Alpaydin (2004), "Introduction to machine learning". MIT Press.
- Bardos (2001), "Analyse discriminante. Application au risque et scoring financier". Dunod.
- Bishop (1995), "Neural networks for pattern recognition". Clarendon Press.
- Bishop (2006), "Pattern recognition and machine learning". Springer-Verlag.
- Bouroche & Saporta (1983), "L'analyse des données". Que Sais-je.
- Cornuéjols & Miclet (2002), "Apprentissage artificiel. Concepts et algorithmes". Eyrolles.
- Duda, Hart & Stork (2001), "Pattern classification, 2nd ed". John Wiley & Sons.
- Dunham (2003), "Data mining. Introductory and advanced topics". Prentice-Hall.
- Greenacre (1984), "Theory and applications of correspondence analysis". Academic Press.
- Han & Kamber (2005), "Data mining: Concepts and techniques, 2nd ed.". Morgan Kaufmann.
- Hand (1981), "Discrimination and classification". John Wiley & Sons.
- Hardle & Simar (2003), "Applied multivariate statistical analysis". Springer-Verlag. Available at http://www.quantlet.com/mdstat/scripts/mva/htmlbook/mvahtml.html
- Hastie, Tibshirani & Friedman (2001), "The elements of statistical learning". Springer-Verlag.
- Johnson & Wichern (2002), "Applied multivariate statistical analysis, 5th ed". Prentice-Hall.
- Lebart, Morineau & Piron (1995), "Statistique exploratoire multidimensionnelle". Dunod.
- Mitchell (1997), "Machine learning". McGraw-Hill.
- Naim, Wuillemin, Leray, Pourret & Becker (2004), "Réseaux bayesiens". Editions Eyrolles.
- Nilsson (1998), "Artificial intelligence: A new synthesis". Morgan Kaufmann.
- Ripley (1996), "Pattern recognition and neural networks". Cambridge University Press.
- Rosner (1995), "Fundamentals of biostatistics, 4th ed". Wadsworth Publishing Company.
- Saporta (1990), "Probabilités, analyse des données et statistique". Editions Technip.
- Tan, Steinbach & Kumar (2005), "Introduction to data mining". Pearson.
- Theodoridis & Koutroumbas (2003), "Pattern recognition, 3rd ed". Academic Press.
- Therrien (1989), "Decision, estimation and classification". Wiley & Sons.
- Venables & Ripley (2002), "Modern applied statistics with S". Springer-Verlag.
- Webb (2002), "Statistical pattern recognition, 2nd ed". John Wiley and Sons.
Faculty or entity
Programmes offering this learning unit (LU)
Title of the programme
Acronym
Credits
Prerequisites
Learning outcomes
Master [120] in Data Science : Statistic
Master [120] in Chemical and Materials Engineering
Master [120] in Civil Engineering
Master [120] in Biomedical Engineering
Master [120] in Forests and Natural Areas Engineering
Master [120] in Environmental Bioengineering
Master [120] in Mechanical Engineering
Master [120] in Electrical Engineering
Master [120] in Physical Engineering
Master [120] in Chemistry and Bioindustries
Master [120] in Computer Science and Engineering
Master [120] in Computer Science
Master [120] in Electro-mechanical Engineering
Master [120] in Mathematical Engineering
Master [120] in Data Science Engineering
University Certificate: Statistics and Data Science (15/30 credits)
Master [120] in Agricultural Bioengineering
Master [120] in Data Science: Information Technology
Master [120] in Energy Engineering