Tutorials

Fundamentals of Linguistic Linked Open Data

Jorge García
University of Zaragoza – Spain

Abstract: In this session, the fundamentals of linguistic linked data (LLD) will be presented from a practical perspective. It will combine theoretical explanation with hands-on exercises. First, the basic notions behind the Semantic Web in general and linked data (LD) in particular will be introduced (e.g., ontologies, RDF). Then, we will motivate the application of such principles to linguistic data, resulting in the emerging field of LLD. Finally, some application examples will be visited, particularly in the lexicography domain.

Short Bio: Jorge García currently works as a senior research fellow (“Ramón y Cajal” postdoctoral position) at the Department of Computer Science and Systems Engineering (University of Zaragoza, Spain), as a member of the Aragon Institute of Engineering Research (I3A) and of the Distributed Information Systems research group. His main research interests include the multilingual Semantic Web, ontology matching, linguistic linked data, and neuro-symbolic artificial intelligence. He chaired NexusLinguarum, the “European network for Web-centred linguistic data science”, a COST Action that joined the efforts of researchers from 42 countries, and currently acts as Vice-Chair of Goblin, the Global Network on Large-Scale, Cross-domain and Multilingual Open Knowledge Graphs. He has also been involved in six other EU projects related to the Semantic Web, multilingualism, and language technologies, acting as Principal Investigator in two of them.

Advanced Topics of Linguistic Linked Open Data

Max Ionov
University of Cologne – Germany

Abstract: In this session, we will explore the ways linguistic linked data can be stored and queried. Through a series of hands-on exercises, we will explore SPARQL, the query language for the Semantic Web. Building on the examples of LLOD in the lexicography domain presented earlier, we will cover some practical considerations on setting up graph databases and SPARQL endpoints, and write SPARQL queries, going from simple to more advanced ones that perform filtering and data transformation. Finally, we will look into federation, a cornerstone of Linked Data, which allows knowledge from various resources to be combined at once.

Short Bio: Max Ionov is a research assistant at the Cologne Center for eHumanities group and the Department for Digital Humanities at the University of Cologne. Currently, he is involved in the project Postil Time Machine and in the project Textdatenbank und Wörterbuch des Klassischen Maya. He is co-leading a task on less-resourced languages within the COST Action NexusLinguarum and takes part in the development of extensions (Morph, FrAC) of the OntoLex-Lemon vocabulary for publishing lexical data as Linguistic Linked Open Data. His research interests are linked data and the Semantic Web, computational support for empirical linguistic research, digital humanities, anaphora and coreference resolution, and discourse processing.

Beyond Naive RAG: How Entities and Graphs Enhance Retrieval-Augmented Generation

Matteo Palmonari
University of Milano-Bicocca – Italy

Abstract: Retrieval-Augmented Generation (RAG) has become a powerful paradigm for improving the accuracy and reliability of language models and for integrating external knowledge into responses to users’ prompts. However, naive RAG implementations often struggle with limitations such as irrelevant retrieval, lack of contextual awareness, and inefficient knowledge utilization. After a brief introduction to entity-aware knowledge representation structures and techniques to extract entity-related knowledge from text, we will explore the role of entities and graph-based structures in enhancing RAG systems. We will discuss a broad spectrum of solutions, from lightweight entity-centric enhancements to full-fledged GraphRAG approaches, highlighting trade-offs in complexity, efficiency, and performance. During the presentation we will discuss examples from the literature, from ongoing projects on vertical domains in the Italian language, and from commercial solutions that exploit entity-centric approaches to ground users’ prompts.

Short Bio: Matteo Palmonari is an Associate Professor in the Department of Informatics, Systems, and Communication at the University of Milan-Bicocca. His research spans data management and artificial intelligence, with a focus on semantic matching, knowledge graph profiling and exploration, natural language processing, and data enrichment. Recently, his work has concentrated on integrating symbolic and neural approaches, particularly for applications in the legal domain. He has played key roles in numerous innovation and research projects, serving as coordinator, scientific manager, or partner.

From Corpora to Capabilities: Rethinking Language Resources in the LLM Era

Zheng Yuan
University of Sheffield – UK

Abstract: The rise of Large Language Models (LLMs) has revolutionised the development and deployment of language technologies. Yet, language resources remain at the heart of these advancements. This talk reexamines the evolving role of language resources in the LLM era – spanning their application in pretraining, fine-tuning, evaluation, and integration with retrieval-augmented generation (RAG) systems. We will explore how traditional resources such as annotated corpora and lexicons are being reimagined, the growing emphasis on data quality and documentation, and the persistent challenges in supporting multilingual and low-resource languages. Through concrete examples and case studies, the talk will highlight how curated, transparent, and inclusive language resources can drive the development of responsible and capable AI systems.

Short Bio: Zheng Yuan is an Associate Professor in Natural Language Processing at the University of Sheffield and an Affiliated Researcher at the University of Cambridge, where she is also a Fellow in Computer Science at Trinity College. Her research sits at the intersection of machine learning and NLP, with a strong focus on real-world applications in education, creativity, healthcare, social media, and finance. Zheng’s work draws on insights from computer science, linguistics, education, and psychology. Previously, she served as Vice President of Data Science at Chatterbox Labs, Assistant Professor at King’s College London, and Research Associate at the University of Cambridge. She holds a PhD and MPhil from the University of Cambridge and a BSc(Eng) from Queen Mary University of London.

Supported by the Future Artificial Intelligence Research (FAIR) project.
