Data Science Seminars and Challenge

Ronaldo Phlypo and Sana Louhichi


This course contains two parts.

Part I concerns the data challenge.

This part consists of a real-world problem, given to the students, for which data are readily available. The goal is for teams of five to six students to compete in solving the problem (at least partially).

The work is spread over the Autumn semester. It consists of building a prediction model or a methodology that solves the problem from a set of training data, followed by a blind evaluation of the model or methodology on a test bench (unseen data withheld from the students) using an appropriate performance measure.

At the end, the teams will present their solution path in a formal presentation and a short report.

Part II concerns the data science seminars.

This is a cycle of seminars or presentations whose common thread is the data challenge project. A first seminar will set the context and the problem for that year’s data challenge.

The other seminars will present different industrial or academic approaches and problems that are (loosely) related to the objective of the data challenge. Each presentation has a one-hour time slot, and students will have to read some resources beforehand so that they can orient their questions about the subject after the seminar.

Course Outline


basic concepts in applied mathematics, probability, and statistics


  • written report (10–20 pages) 40% (discusses the problem, details the developed method(s) with a bibliography covering the state of the art, and, in 1–2 pages, situates the problem or one of the proposed approaches with respect to one of the seminars)
  • oral presentation 20% (discusses the problem, the proposed technical solution, and perspectives)
  • project utility 20% (covers a utility vote from the customer/company and/or a ranking score of the proposed solution)