The working group typically offers various thesis topics each semester in the areas of computational statistics, machine learning, data mining, optimization, and statistical software. You are welcome to suggest your own topic as well.

Before you apply for a thesis topic, make sure that you fit the following profile:

Before you start writing your thesis, you must find a supervisor within the working group.

Send an email to janek.thomas [at] with the following information:

Your application will only be processed if it contains all required information.

Potential Thesis Topics

[Potential Thesis Topics] [Student Research Projects] [Current Theses] [Completed Theses]

Below is a list of potential thesis topics.

Current and completed theses are listed further down this page.

* Learning Embeddings for Categorical Variables (Supervisor: Florian Pfisterer)

Many machine learning algorithms naturally lend themselves to numeric data. For them to handle categorical data, either extensions of the algorithms or numerical representations (one-hot encoding etc.) are required. One class of such numerical representations are so-called 'embeddings', which can be obtained, for example, from neural networks. Embeddings can be learned from datasets using different methods. Methods for learning embeddings will be implemented and tested in this thesis.
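To make the contrast between the two representations concrete, here is a minimal sketch in plain Python. The feature levels, the embedding dimension, and all function names are hypothetical and chosen only for illustration; the embedding table is randomly initialised here, whereas in the thesis setting it would be learned, for example as the weights of a neural network layer.

```python
import random


def one_hot(levels, value):
    """Return the one-hot vector of `value` over the ordered `levels`."""
    return [1.0 if level == value else 0.0 for level in levels]


def init_embeddings(levels, dim, seed=0):
    """Build a randomly initialised embedding table (one dense vector per level).

    Assumption: in practice these vectors would be learned from data.
    """
    rng = random.Random(seed)
    return {level: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for level in levels}


def lookup_via_matmul(levels, table, value):
    """Multiply the one-hot vector with the embedding matrix.

    This selects exactly one row, which is why an embedding lookup can be
    seen as a linear layer applied to a one-hot input.
    """
    oh = one_hot(levels, value)
    dim = len(table[levels[0]])
    return [
        sum(oh[i] * table[levels[i]][j] for i in range(len(levels)))
        for j in range(dim)
    ]


levels = ["red", "green", "blue"]        # hypothetical categorical feature
table = init_embeddings(levels, dim=2)   # hypothetical embedding dimension

encoded = one_hot(levels, "green")       # sparse, length = number of levels
embedded = table["green"]                # dense, length = dim
```

The one-hot vector grows with the number of levels, while the embedding has a fixed, usually much smaller dimension.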

Possible directions:

* Compressing Ensembles of Machine Learning Models (Supervisor: Florian Pfisterer)

Complex ensembles of machine learning models usually perform better than single models, but they are very hard to deploy in real-world settings such as mobile phones or machines. The question to be answered in this work is whether the results of an ensemble can be compressed into a single model that is (possibly) easy to deploy, with minimal prerequisites and minimal technical and time overhead. A possible class of such approximators is the family of (feed-forward) neural networks (NNs). Training the NNs can be simplified, because overfitting to the predictions of the ensemble is no longer a problem but something to strive for. The work includes implementing functionality for training a learner on the output of an arbitrary ensemble or model. Afterwards, the model performance and the resulting stability and usability in the proposed compression context need to be evaluated. This includes comparing different NN architectures with respect to stability and evaluating possible extensions to the usual training process that would allow for faster or more stable training. An additional question is whether some parts of the preprocessing can also be approximated in this way, which would further reduce the overhead required for real-world deployment of such models.
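The core idea, training a single model on the predictions of an ensemble rather than on the original labels, can be sketched as follows. This is only an illustration under strong assumptions: the ensemble is a hypothetical average of three hand-written linear models, and the student is a one-dimensional least-squares fit instead of the neural network proposed above. All names are made up for the example.

```python
def ensemble_predict(x):
    """Hypothetical ensemble: the average of three simple base models."""
    members = [
        lambda v: 2.0 * v + 1.0,
        lambda v: 1.5 * v + 0.5,
        lambda v: 2.5 * v + 1.5,
    ]
    return sum(member(x) for member in members) / len(members)


def fit_student(xs, targets):
    """Closed-form least-squares fit of a single linear model y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(targets) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# The student never sees true labels, only the ensemble's predictions.
xs = [i / 10 for i in range(-20, 21)]
ensemble_outputs = [ensemble_predict(x) for x in xs]
a, b = fit_student(xs, ensemble_outputs)

# Largest disagreement between student and ensemble on the training inputs.
max_gap = max(abs((a * x + b) - t) for x, t in zip(xs, ensemble_outputs))
```

Because the targets are the ensemble's own outputs, reproducing them as closely as possible is the goal here, which is why overfitting to them is not a concern in this setting.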

* Multi-Output Prediction (Supervisor: Quay Au)

The general learning task of predicting multiple targets, which can be real-valued, binary, ordinal, categorical, or even of mixed type, is known as multi-output prediction. The general idea is to improve the accuracy of a predictor by exploiting the statistical dependencies among the output variables. Methods that transform the multi-output prediction problem into single-output prediction problems, so that ordinary classification and regression algorithms can be applied, shall be implemented in the machine learning R package mlr. Evaluating multi-output prediction problems is inherently challenging, and a suitable evaluation approach shall be worked out in this thesis.
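The simplest problem transformation fits one independent single-output learner per target. The sketch below illustrates this in plain Python with a hypothetical 1-nearest-neighbour learner and made-up data; it does not use mlr, and none of the function names correspond to an mlr API. Note that this particular transformation ignores the dependencies among the outputs, which more elaborate transformations try to exploit.

```python
def fit_1nn(X, y):
    """Hypothetical single-output learner: 1-nearest neighbour.

    Returns a prediction function for one target.
    """
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X]
        return y[dists.index(min(dists))]
    return predict


def fit_per_target(fit_single, X, Y):
    """Transform a multi-output problem into one single-output problem per target."""
    n_targets = len(Y[0])
    return [fit_single(X, [row[j] for row in Y]) for j in range(n_targets)]


def predict_multi(models, x):
    """Collect the per-target predictions into one multi-output prediction."""
    return [model(x) for model in models]


# Made-up data with mixed targets: one binary, one real-valued.
X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
Y = [[0, 1.5], [0, 2.0], [1, 9.0]]

models = fit_per_target(fit_1nn, X, Y)
prediction = predict_multi(models, [4.0, 4.5])
```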

* High-Dimensional Feature Selection (Supervisor: Xudong Sun)

High-dimensional feature selection remains a challenging topic. High-dimensional data include functional data such as curve or video data, high-throughput biotechnology data, and so on. This project will explore new advances in this field. Ideally, at least one up-to-date algorithm will be implemented.

* Video Activity Detection Using Convolutional Recurrent Neural Networks (Supervisor: Xudong Sun)

This project will use state-of-the-art recurrent and convolutional neural network models and benchmark the results on public datasets, for instance UCF101. It is an application of functional-on-scalar classification, extended from the one-dimensional curve case to the multidimensional image case.

* Online Machine Learning Implementation (Supervisor: Xudong Sun)

This project is about implementing several online machine learning algorithms, such as online RDA or online boosting. Applicants need a sound understanding of existing algorithms and are expected to adapt them to the online setting and implement them in R and/or Rcpp.
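What distinguishes an online algorithm is that the model is updated one observation at a time, without storing the full dataset. The following sketch shows this update pattern with online gradient descent for a linear least-squares model. It is a stand-in chosen for brevity, not online RDA or online boosting, and it is written in Python rather than R or Rcpp. The data stream, the learning rate, and the function name are assumptions made for the example.

```python
def online_sgd_step(weights, x, y, lr=0.1):
    """Update the weights using a single observation (x, y).

    Performs one gradient step on the squared error of a linear model.
    """
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = prediction - y
    return [w - lr * error * xi for w, xi in zip(weights, x)]


# Hypothetical noiseless stream generated from y = 3 + 2 * t;
# the first feature is a constant 1.0 acting as the intercept.
stream = [([1.0, k / 10], 3.0 + 2.0 * (k / 10)) for k in range(-10, 11)]

weights = [0.0, 0.0]
for _ in range(100):          # replay the stream to simulate a long data stream
    for x, y in stream:
        weights = online_sgd_step(weights, x, y)
```

Each update touches only the current observation, so memory use stays constant no matter how long the stream runs.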

Student Research Projects

We are always interested in mentoring interesting student research projects. Please contact us directly with an interesting research idea. In the future, you will also be able to find research project topics below.

For more information, please visit the official web page Studentische Forschungsprojekte (Lehre@LMU).

Current Theses (With Working Titles)

| Student | Title | Type |
| --- | --- | --- |
| J. Moosbauer | Bayesian Optimization under Noise for Model Selection in Machine Learning | MA |
| J. Fried | Interpretable Machine Learning - An Application Study using the Munich Rent Index | MA |
| B. Burger | Average Marginal Effects in Machine Learning | MA |
| J. Goschenhofer | | MA |
| S. Gruber | Visualization and Efficient Replay Memory for Reinforcement Learning | BA |

Completed Theses

Completed Theses (LMU Munich)

| Student | Title | Type | Completed |
| --- | --- | --- | --- |
| S. Coors | Automatic Gradient Boosting | MA | 2018 |
| D. Schalk | Efficient and Distributed Model-Based Boosting for Large Datasets | MA | 2018 |
| K. Engelhardt | Linear individual model-agnostic explanations - discussion and empirical analysis of modifications | MA | 2018 |
| N. Klein | Extending Hyperband with Model-Based Sampling Strategies | MA | 2018 |
| M. Dumke | Reinforcement learning in R | MA | 2018 |
| M. Lee | Anomaly Detection using Machine Learning Methods | MA | 2018 |
| J. Langer | RNN Bandmatrix | MA | 2018 |
| B. Klepper | Configuration of deep neural networks using model-based optimization | MA | 2017 |
| F. Pfisterer | Kernelized anomaly detection | MA | 2017 |
| M. Binder | Automatic model selection and hyperparameter optimization | MA | 2017 |
| V. Mayer | mlrMBO / RF distance based infill criteria | MA | 2017 |
| L. Haller | Kostensensitive Entscheidungsbäume für beobachtungsabhängige Kosten | BA | 2016 |
| B. Zhang | Implementation of 3D Model Visualization for Machine Learning | BA | 2016 |
| T. Riebe | Eine Simulationsstudie zum Sampled Boosting | BA | 2016 |
| P. Rösch | Implementation and Comparison of Stacking Methods for Machine Learning | MA | 2016 |
| M. Erdmann | Runtime estimation of ML models | BA | 2016 |
| A. Exterkate | Process Mining: Checking Methods for Process Conformance | MA | 2016 |
| J.-Q. Au | Implementation of Multilabel Algorithms and their Application on Driving Data (J.-Q. Au was a master's student at TU Dortmund) | MA | 2016 |
| J. Thomas | Stability Selection for Component-Wise Gradient Boosting in Multiple Dimensions | MA | 2016 |
| A. Franz | Detecting Future Equipment Failures: Predictive Maintenance in Chemical Industrial Plants | MA | 2016 |
| T. Kühn | Fault Detection for Fire Alarm Systems based on Sensor Data | MA | 2016 |
| B. Schober | Laufzeitanalyse von Klassifikationsverfahren in R | BA | 2015 |
| F. Pfisterer | Benchmark Analysis for Machine Learning in R | BA | 2015 |
| T. Kühn | Implementierung und Evaluation ergänzender Korrekturmethoden für statistische Lernverfahren bei unbalancierten Klassifikationsproblemen | BA | 2014 |

Completed Theses (Supervised by Bernd Bischl at TU Dortmund)

| Student | Title | Type | Completed |
| --- | --- | --- | --- |
| P. Probst | Anwendung von Multilabel-Klassifikationsverfahren auf Medizingerätestatusreporte zur Generierung von Reparaturvorschlägen | MA | 2015 |
| D. Kirchhoff | Erweiterung der Plattform OpenML um Ereigniszeitanalysen | MA | 2015 |
| J. Bossek | Modellgestützte Algorithmenkonfiguration bei Feature-basierten Instanzen: Ein Ansatz über das Profile-Expected-Improvement | Dipl. | 2015 |
| J. Richter | Modellbasierte Hyperparameteroptimierung für maschinelle Lernverfahren auf großen Daten | MA | 2015 |
| B. Elkemann | Implementierung einer Testsuite für mehrkriterielle Optimierungsprobleme | BA | 2014 |
| M. Dagge | R-Pakete für Datenmanagement und -manipulation großer Datensätze | BA | 2014 |
| K. U. Schorck | Lokale Kriging-Verfahren zur Modellierung und Optimierung gemischter Parameterräume mit Abhängigkeitsstrukturen | BA | 2014 |
| P. Kerschke | Kostensensitive Algorithmenselektion für stetige Black-Box-Optimierungsprobleme basierend auf explorativer Landschaftsanalyse | MA | 2013 |
| D. Horn | Exploratory Landscape Analysis für mehrkriterielle Optimierungsprobleme | MA | 2013 |
| J. Bossek | Feature-based Algorithm Selection for the Traveling-Salesman-Problem | BA | 2013 |
| O. Meyer | Implementierung und Untersuchung einer parallelen Support Vector Machine in R | Dipl. | 2013 |
| S. Hess | Sequential Model-Based Optimization by Ensembles: A Reinforcement Learning Based Approach | Dipl. | 2012 |
| P. Kerschke | Vorhersage der Verkehrsdichte in Warschau basierend auf dem Traffic Simulation Framework | BA | 2011 |
| L. Schlieker | Klassifikation von Blutgefäßen und Neuronen des menschlichen Gehirns anhand von ultramikroskopierten 3D-Bilddaten | BA | 2011 |
| H. Riedel | Uncertainty Sampling zur Auswahl optimaler Sampler aus der trunkierten Normalverteilung | BA | 2011 |
| S. Meinke | Over-/Undersampling für unbalancierte Klassifikationsprobleme im Zwei-Klassen-Fall | BA | 2010 |