Advanced Machine Learning: Applications to Vision, Audio and Text


6 ECTS - 18h


Karteek Alahari, Xavier Alameda-Pineda, Ahlame Douzal, Eric Gaussier, Georges Quénot and Didier Schwab


The course is split into two parts. The first part covers a wide range of machine learning algorithms. The second part focuses on deep learning, with more applied presentations on the three data modalities (vision, audio and text) and their combinations. The following is a non-exhaustive list of topics discussed:

  • Computing dot products in high dimensions & PageRank
  • Matrix completion/factorization (Stochastic Gradient Descent, SVD)
  • Monte Carlo and MCMC methods: Metropolis-Hastings and Gibbs sampling
  • Unsupervised classification: Partitioning, Hierarchical, Kernel and Spectral clustering
  • Alignment and matching algorithms (local/global, pairwise/multiple), dynamic programming, Hungarian algorithm,…
  • Introduction to Deep Learning concepts, including CNN, RNN, Metric learning
  • Attention models: Self-attention, Transformers
  • Auditory data: Representation, sound source localisation and separation
  • Natural language data: Representation, Seq2Seq, Word2Vec, Machine Translation, Pre-training strategies, Benchmarks and evaluation
  • Visual data: image and video representation, recap of traditional features, state-of-the-art neural architectures for feature extraction
  • Object detection and recognition, action recognition
  • Multimodal learning: audio-visual data representation, multimedia retrieval
  • Generative Adversarial Networks: Image-to-image translation, conditional generation


Final exam