Labs

Marco Cremaschi

Lab. 1 (part 1) – Retrieval Augmented Generation, Large Language Models and Knowledge Bases

Marco Cremaschi
University of Milano-Bicocca – Italy

Abstract: The hands-on lab will demonstrate how to implement a Retrieval-Augmented Generation (RAG) system using open-source LLMs. Participants will learn how to customize and run local models with Ollama, generate context-aware embeddings via LangChain, and store relevant knowledge in vector databases. We will showcase strategies to efficiently retrieve and integrate external information, emphasizing the importance of entity-aware structures to enhance accuracy and contextual grounding. Through live coding and interactive exercises, attendees will gain practical insights into harnessing entity-focused techniques, bridging the gap between theoretical concepts of knowledge graph construction and real-world RAG implementations for various linguistic and semantic use cases.
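To make the retrieve-then-generate idea in this abstract concrete, here is a minimal sketch of the retrieval and prompt-assembly steps only. It is not the lab's material. It assumes a toy bag-of-words "embedding" and an in-memory list in place of the embedding model, LangChain, and vector database the lab actually uses. All names in it (`embed`, `cosine`, `retrieve`, `build_prompt`, `DOCUMENTS`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to a language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}"
    )


# Hypothetical mini knowledge store; a real system would use a vector database.
DOCUMENTS = [
    "LiLa is a Linked Data knowledge base of language resources for Latin.",
    "OntoLex-Lemon is a model for representing lexical data as Linked Data.",
    "Ollama runs open-source large language models on a local machine.",
]

if __name__ == "__main__":
    question = "Which tool runs language models locally?"
    print(build_prompt(question, retrieve(question, DOCUMENTS, k=1)))
```

In a full pipeline the assembled prompt would be passed to a locally served model, and the word-count vectors would be replaced by learned embeddings, which can also match paraphrases that share no words with the query.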

Short Bio: Marco Cremaschi is a research assistant at the University of Milano-Bicocca, Department of Informatics, Systems, and Communication (DISCo). He teaches courses at several academic institutions, including the University of Bergamo and Università Cattolica (Master's degree in Linguistic Computing). His research expertise spans knowledge graph construction, data enrichment, and, more recently, the application of artificial intelligence techniques to clinical decision support systems and predictive models in mental health. He has contributed to the Semantic Web and AI communities by publishing scientific papers in conferences and journals and by serving on the program committees of international conferences, in roles such as chair and reviewer. He is the founder of a university spin-off dedicated to applying artificial intelligence in the health and finance sectors.

Blerina Spahiu

Lab. 1 (part 2) – Evaluating Large Language Models for Linguistic Linked Data Generation

Blerina Spahiu
University of Milano-Bicocca – Italy

Abstract: Large Language Models (LLMs) are transforming the way we interact with language technologies, offering new possibilities for linguistic data processing and representation. This hands-on lab explores the potential of LLMs for generating Linguistic Knowledge Graphs, with a focus on the OntoLex-Lemon model. Through practical exercises, we will examine how LLMs perform in formalizing lexical data across different languages, evaluating their output using a multidimensional framework that considers lexical, morphological, and semantic factors. The session will highlight both the opportunities and challenges of integrating LLMs into the Semantic Web context, particularly for linguistic resource creation and enrichment.

Short Bio: Blerina Spahiu is an Assistant Professor at the University of Milano-Bicocca, within the Department of Informatics, Systems, and Communication (DISCo) in Italy. Her research expertise includes knowledge graph profiling, data quality evaluation, and data enrichment, with a recent focus on leveraging LLM-based multi-agent architectures for collaborative knowledge sharing and validation. She has been involved in several European and national research projects and currently serves as the group leader for Knowledge Graph Management within the GOBLIN COST Action. She is an active contributor to the academic community, serving as chair and program committee member for numerous top-tier international conferences, and collaborates closely with researchers across both academia and industry.

Eleonora Litta
Federica Iurescia

Lab. 2 – The LiLa Knowledge Base: Hands-on Session

Eleonora Litta and Federica Iurescia
Università Cattolica del Sacro Cuore – Italy

Abstract: This two-hour session is dedicated to discovering the potential of LiLa: Linking Latin, a Linked Data Knowledge Base of interlinked language resources for Latin, built on a language-independent architecture. In the first part of the session, we will introduce the Knowledge Base, exploring its architecture, core components, data modeling, and the range of services it offers. Participants will see how it organises, interlinks, and makes accessible a vast body of Latin resources, enhancing research possibilities in Classical studies and Computational Linguistics. The second part is a hands-on session in which participants work with real data and learn how to interact with the Knowledge Base using SPARQL queries, supported by LLMs that assist in crafting and refining search strategies.

Short Bio: Federica Iurescia is a Latinist specializing in linguistic annotation and computational methods for language analysis. She has contributed to the development of the LiLa: Linking Latin project, focussing on lexical and textual resources. She currently serves as a scientific collaborator at the CIRCSE research centre.
Eleonora Litta is a linguist specializing in linguistic annotation, corpus linguistics, and computational methods for language analysis. She has contributed to the development of the LiLa: Linking Latin project, especially focussing on morphological and semantic data. Currently, she serves as the scientific coordinator for the LiIta project, which focuses on interlinking linguistic resources for Italian via Linked Data.

Supported by the Future Artificial Intelligence Research (FAIR) project.