# Mathematical Foundations of Machine Learning (new syllabus in 2025)

**Program**

The program is composed of two parts of offline learning and online learning presented below.

### Part I: Offline learning (taught by Massih-Reza Amini)

Supervised Learning

This part gives an overview of foundations of supervised learning. We will see that learning is an inductive process where a general rule is to be found from a finite set of labeled observations by minimizing the empirical risk of the rule over that set. The study of consistency gives conditions that, in the limit of infinite sample sizes, the minimizer of the empirical risk will lead to a value of the risk that is as good as the best attainable risk. The direct minimization of the empirical risk is not tractable as the latter is not derivative, hence learning algorithms find the parameters of the learning rule by minimizing a convex upper-bound (or surrogate) of the empirical risk. We present, classical strategies for unconstrained convex optimization: gradient descente, Quasi-Newton approach, and conjugate gradient descente. We present classical learning algorithms for binary classification: the perceptron, logistic regression and boosting by linking the development of these models to the Empirical Risk Minimization framework as well as the Multi-class classification paradigm. Particularly, we present Multi-Layer Perceptron as well as the back-propagation algorithm that is in use in deep learning.

Unsupervised and semi-supervised learning

In this part, we will present generative models for clustering as well as two powerful tools for parameter estimation namely Expectation-Maximization (EM) and Classification Expectation-Maximization (CEM) algorithms. In the context of Big Data, labeling observations for learning is a tedious task. Semiu-supervised paradigm aims at learining with few labeled and a huge amount of unlabeled data. In this part we review the three families of techniques proposed in semi-supervised learning, that is Graphical, Generative and Discriminant models.

### Part II: Online learning (taught by Pierre Gaillard and Nicolas Gast)

**Adversarial bandits and online learning**

This part present different key paradigms that address sequential decision-making under uncertainty. Online prediction with expert advice which focuses on leveraging the wisdom of multiple experts to make predictions, where the goal is to perform nearly as well as the best expert in hindsight. This approach is crucial in scenarios where the true underlying model is unknown, and the learner must adapt to the advice of various experts over time. Online convex optimization extends this concept to more general settings, allowing for the optimization of convex functions in an online manner. Here, the learner makes decisions in a convex set and receives feedback in the form of convex loss functions, aiming to minimize regret over time. And finally, Adversarial bandits introduce a more challenging setting where the learner must explore and exploit in an environment where the rewards are chosen by an adversary. Unlike stochastic bandits, adversarial bandits do not assume any probabilistic structure in the reward generation process, making the learning task significantly more complex. The learner must employ strategies that balance exploration and exploitation effectively, even in the face of potentially malicious reward assignments. These three paradigms collectively contribute to the robustness and versatility of online learning algorithms, enabling them to tackle a wide range of real-world problems characterized by uncertainty and adversity.

**Reinforcement learning**

This part presents Reinforcement Learning (RL) which is a type of machine learning where an agent learns to make decisions by interacting with an environment, often modeled as a Markov Decision Process (MDP), which provides a mathematical framework describing how the agent’s actions affect the environment’s states and rewards. Classical RL algorithms, such as Q-Learning and SARSA, focus on estimating value functions or direct policy optimization to maximize cumulative rewards. These algorithms have laid the foundation for understanding and solving sequential decision-making problems. Modern RL has seen significant advancements with the integration of deep learning, giving rise to Deep Reinforcement Learning (Deep RL) algorithms like Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO), which can handle high-dimensional state and action spaces. Additionally, techniques like Monte Carlo Tree Search (MCTS), popularized by AlphaGo, have further enhanced RL by enabling more sophisticated planning and decision-making. These modern approaches have expanded the applicability of RL to complex real-world problems, including game playing, robotics, and resource management, where traditional methods may struggle.

**Materials**

**In brief**

- Period : Semester 9
- Credits : 6 ECTS
- Number of hours : 36h

**Pedagogical team**

**References**

**[1]** Massih-Reza Amini - Machine Learning, de la théorie à la pratique, Eyrolles (2nd edition), 2020.

**[2]** Christopher Bishop - Neural Networks for Pattern Recognition, Oxford University Press, 1995.

**[3]** Richard Duda, Peter Hart & David Strok - Pattern Classification, John Wiley & Sons, 1997.

**[4]** John Shawe-Taylor & Nello Cristianini - Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.

**[5]** Colin McDiarmid - On the method of bounded differences, Surveys in Combinatorics, 141:148-188, 1989.

**[6]** Mehryar Mohri, Afshin Rostamzadeh & Ameet Talwalker - Foundations of Machine Learning, MIT Press, 2012.

**[7]** Bernhard Schölkopf & Alexander J. Smola - Learning with Kernels, MIT Press, 2002.

**[8]** Vladimir Kolchinskii - Rademacher penalties and structural risk minimization, IEEE Transactions on Information Theory, 47(5):1902–1914, 2001.