Tutorials

Developing ASR Systems for Conversational Speech Transcription and Analysis
Barbara Schuppler
TU Graz – Austria
Abstract: Automatic speech recognition (ASR) has made substantial progress in recent years, but transcribing conversational speech remains a major challenge. This talk addresses the challenges of applying ASR to the analysis of talk-in-interaction, including the presence of disfluencies, non-speech tokens, overlapping speech, and the use of colloquial language and dialectal variation. I first present practical strategies for handling such phenomena in ASR and outline their limitations. I then present empirical analyses identifying specific characteristics of conversational speech, namely reduction, pronunciation variation, and disfluent utterance structures, all of which systematically decrease recognition performance. Finally, the talk turns to the question of how ASR-based systems may nevertheless facilitate linguistic and phonetic studies.

From Vision Language Models to Embodied AI
Elia Bruni
University of Osnabrück – Germany
Abstract: This tutorial provides a hands-on overview of how vision-language models and embodied AI systems are moving from benchmark performance to real-world deployment. We begin by examining the current landscape of multimodal foundation models — how they represent and ground language in visual perception — and where critical gaps remain when these models must operate in open-ended, physically situated environments. Drawing on recent evaluation work, we discuss why standard accuracy metrics often fail to capture the competencies that matter for downstream applications, and how more principled evaluation frameworks can guide both research and engineering decisions.
We then turn to real-world application scenarios — from robotics to safety-critical perception — exploring the recurring challenges of deploying multimodal AI under constraints of latency, robustness, and limited supervision. Participants will come away with a clearer picture of where the field stands, where it falls short, and what principled evaluation and design choices can bridge that gap.

NLP Beyond the Standard: Dialects, Variation, and Shared Representations in Multilingual Language Models
Barbara Plank
LMU Munich – Germany
Abstract: Multilingual language models have primarily focused on cross-lingual differences, with intra-language variation only recently gaining more attention. Dialects and non-standard varieties challenge core assumptions about data, representation, and evaluation. In this talk, I discuss what makes dialects particularly challenging for multilingual models, review approaches starting from early encoder-based methods, and give an overview of resources developed for dialectal NLP, with a focus on German dialects. I then turn to recent work on multilingual training dynamics and shared representations, analyzing when linguistic information and shared concept spaces emerge during training and where alignment breaks down. Although dialects are not yet explicitly modeled in this analysis, the findings provide insight into multilingual representation learning during pre-training.
