Natural Language Processing

Research

This group focuses on methodological and applied research in the context of natural language processing (NLP), including (but not limited to) the following topics:

Reproducibility/Comparability/Benchmarking of LLMs
Active Learning for NLP
Resources and Evaluation
Bias and Stereotypes
Multi-Modal Deep Learning
Uncertainty Quantification

We have ongoing collaborations with the Bavarian Academy of Sciences (M. Schöffel), the MISODA working group at LMU (C. Heumann, E. Garces Arias), and the University of Applied Sciences Munich (V. Thurner, S. Thiemichen, S. Urchs).

Teaching

We are actively developing the Deep Learning for Natural Language Processing (DL4NLP) course together with colleagues from LMU Munich and the University of Vienna.

Members

Name				Position
Dr. Matthias Aßenmacher				Lead
Esteban Garces Arias				(External) Collaborating PhD Student
Matthias Schöffel				(External) Collaborating PhD Student
Stefanie Urchs				(External) Collaborating PhD Student

Students / Thesis supervision

If you are interested in writing your thesis under our supervision, please include the following information in your e-mail:
- your field of interest and at least a tentative idea of the direction of a potential thesis topic
- a CV and your current transcript of records
- a planned starting date for your thesis (please allow some time for developing and refining a research idea, so do not expect to start within, e.g., one week)
Disclaimer: Before you apply for a thesis topic in NLP, make sure that you fit the following profile:
- Willingness and ability to engage with a topic that (potentially) requires a notable amount of self-study, since NLP is normally not part of the regular curriculum of your studies in statistics.
- Readiness to do a considerable amount of programming (most probably in Python).
- Please include the following information in the e-mail mentioned above: previous experience with, and classes attended on, NLP, deep learning, machine learning, and programming.
- This is not meant to discourage you from writing your thesis on NLP, but rather to get expectations straight in advance.
If you want to apply for supervision of an external thesis, please also include the following information in your e-mail:
- A clear formulation of the thesis goal from an academic perspective, about one page long (it should not be a pure business case; such projects are better suited for, e.g., the Consulting module)
- Information on the external partner, detailed information on data availability, and the computational resources supplied by the project partner (if applicable)
- Again, this is not meant to discourage you or to set any artificial barriers, but to get expectations and goals straight in advance.

Publications

Gruber C, Hechinger K, Aßenmacher M, Kauermann G, Plank B (2024) More Labels or Cases? Assessing Label Variation in Natural Language Inference. Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language, pp. 22–32. Association for Computational Linguistics, Malta.
Deiseroth B, Meuer M, Gritsch N, Eichenberg C, Schramowski P, Aßenmacher M, Kersting K (2024) Divergent Token Metrics: Measuring degradation to prune away LLM components – and optimize quantization. Accepted at NAACL 2024.
Garces Arias E, Pai V, Schöffel M, Heumann C, Aßenmacher M (2023) Automatic Transcription of Handwritten Old Occitan Language. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 15416–15439. Association for Computational Linguistics, Singapore.
Öztürk IT, Nedelchev R, Heumann C, Garces Arias E, Roger M, Bischl B, Aßenmacher M (2023) How Different Is Stereotypical Bias Across Languages? 3rd Workshop on Bias and Fairness in AI (co-located with ECML-PKDD 2023).
Witte M, Schwenzow J, Heitmann M, Reisenbichler M, Aßenmacher M (2023) Potential for Decision Aids based on Natural Language Processing. Proceedings of the European Marketing Academy, 52nd, (114322).
Aßenmacher M, Rauch L, Goschenhofer J, Stephan A, Bischl B, Roth B, Sick B (2023) Towards Enhancing Deep Active Learning with Weak Supervision and Constrained Clustering. Proceedings of the 7th Workshop on Interactive Adaptive Learning (co-located with ECML-PKDD 2023).
Aßenmacher M, Sauter N, Heumann C (2023) Classifying multilingual party manifestos: Domain transfer across country, time, and genre. arXiv preprint arXiv:2307.16511.
Akkus C, Chu L, Djakovic V, Jauch-Walser S, Koch P, Loss G, Marquardt C, Moldovan M, Sauter N, Schneider M, Schulte R, Urbanczyk K, Goschenhofer J, Heumann C, Hvingelby R, Schalk D, Aßenmacher M (2023) Multimodal Deep Learning. arXiv preprint arXiv:2301.04856.
Koch P, Nuñez GV, Garces Arias E, Heumann C, Schöffel M, Häberlin A, Aßenmacher M (2023) A tailored Handwritten-Text-Recognition System for Medieval Latin. First Workshop on Ancient Language Processing (ALP 2023).
Rauch L, Aßenmacher M, Huseljic D, Wirth M, Bischl B, Sick B (2023) ActiveGLAE: A Benchmark for Deep Active Learning with Transformers. ECML-PKDD 2023.
Schulze P, Wiegrebe S, Thurner PW, Heumann C, Aßenmacher M, Wankmüller S (2023) Exploring Topic-Metadata Relationships with the STM: A Bayesian Approach. Accepted at Advances in Statistical Analysis (AStA).
Urchs S, Thurner V, Aßenmacher M, Heumann C, Thiemichen S (2023) How Prevalent is Gender Bias in ChatGPT? - Exploring German and English ChatGPT Responses. 1st Workshop on Biased Data in Conversational Agents (co-located with ECML-PKDD 2023).
Aßenmacher M, Dietrich M, Elmaklizi A, Hemauer EM, Wagenknecht N (2022) Whitepaper: New Tools for Old Problems.
Koch P, Aßenmacher M, Heumann C (2022) Pre-trained language models evaluating themselves - A comparative study. Proceedings of the Third Workshop on Insights from Negative Results in NLP, pp. 180–187. Association for Computational Linguistics, Dublin, Ireland.
Lebmeier E, Aßenmacher M, Heumann C (2022) On the current state of reproducibility and reporting of uncertainty for Aspect-based Sentiment Analysis. Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), Springer International Publishing, Grenoble, France.
Goschenhofer J, Ragupathy P, Heumann C, Bischl B, Aßenmacher M (2022) CC-Top: Constrained Clustering for Dynamic Topic Discovery. Workshop on Ever Evolving NLP (EvoNLP), Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.