Abstract: A key challenge in cross-lingual NLP is developing general, language-independent architectures that are equally applicable to any language. However, this ambition is hindered by the large variation in the structural and semantic properties of the world's languages. As a consequence, existing language technology is still largely limited to a handful of resource-rich languages. In this tutorial, we introduce and discuss a range of techniques that deal with such cross-language variation to build robust multilingual and cross-lingual NLP models that work across typologically diverse languages, with the long-term goal of enabling language technology in low-resource languages as well. We provide an extensive overview of typologically informed and cross-lingual NLP transfer methods, focusing on: 1) the characterization of linguistic typology and the impact of semantic and syntactic variation on the performance of cross-lingual transfer and multilingual models; 2) techniques that integrate available discrete typological knowledge into neural NLP architectures to guide multilingual learning; 3) methods that implicitly capture cross-lingual variation directly from data and leverage it to guide cross-lingual and multilingual NLP models; and 4) recent efforts in neural representation learning that aim to construct widely portable cross-lingual representations and transfer methods with minimal cross-lingual supervision in zero-shot and few-shot learning setups, including adapter-based approaches, hypernetworks, target-specific tuning, and other potential solutions.
Short Bio: Ivan Vulić is a Senior Research Associate in the Language Technology Lab at the University of Cambridge and a Senior Scientist at PolyAI. He holds a PhD in Computer Science from KU Leuven, awarded summa cum laude. His core expertise is in representation learning, cross-lingual learning, and human language understanding; distributional, lexical, multi-modal, and knowledge-enhanced semantics in monolingual and multilingual contexts; transfer learning for enabling cross-lingual NLP applications such as conversational AI in low-resource languages; and machine learning for (cross-lingual) NLP. He has published more than 100 papers at top-tier NLP and IR conferences and in journals. He co-lectured tutorials on word vector space specialization at EACL 2017, ESSLLI 2018, and EMNLP 2019, and tutorials on cross-lingual representation learning and cross-lingual NLP at EMNLP 2017 and ACL 2019. He also co-lectured tutorials on conversational AI at NAACL 2018 and EMNLP 2019. He co-authored a book on cross-lingual word representations for the Morgan & Claypool Handbook series, published in June 2019, and has started writing a book on NLP methods for low-resource languages. He serves as an area chair and regularly reviews for all major NLP and machine learning conferences and journals. Ivan has given invited talks at academic and industry venues including Apple Inc., the University of Cambridge, UCL, the University of Copenhagen, Paris-Saclay, Bar-Ilan University, Technion IIT, the University of Helsinki, UPenn, KU Leuven, the University of Stuttgart, TU Darmstadt, the London REWORK summit, and the University of Edinburgh. He has co-organised a number of NLP workshops, served as the publication chair for ACL 2019, and currently serves as the tutorial chair for EMNLP 2021 and the program chair for *SEM 2021.