Tutorials

Knowledge Probing, Infusing and Reasoning in Natural Language Processing

Zaiqiao Meng
University of Glasgow

Abstract: Pre-trained language models (PLMs) have driven remarkable progress on a wide range of few- and zero-shot language understanding tasks, by pre-training model parameters in a task-agnostic way and transferring knowledge to specific downstream tasks via fine-tuning. Leveraging factual knowledge from knowledge graphs (KGs) to augment PLMs is of paramount importance for knowledge-intensive tasks, such as question answering and fact-checking. Especially in specific domains (e.g. the biomedical domain) where public training corpora are limited and noisy, trusted biomedical KGs are crucial for deriving accurate inferences. Therefore, how to measure the amount of world knowledge that is stored in PLMs (knowledge probing), how to integrate factual knowledge into the pre-trained models (knowledge infusing), and how to reason over the learned knowledge and rules (knowledge reasoning) are challenging tasks in the NLP field. In this tutorial, I will review the developments and approaches of knowledge graph utilization in NLP according to these three key tasks and dive into some specific models to better understand the challenges each task poses.

Short Bio: Zaiqiao is currently a Lecturer at the University of Glasgow, based within the IR Group of the IDA Section of the School of Computing Science. He previously worked as a Postdoctoral Researcher at the Language Technology Laboratory of the University of Cambridge and at the IR Group of the University of Glasgow. Zaiqiao obtained his Ph.D. in computer science from Sun Yat-sen University in December 2018. His research interests include information retrieval, recommender systems, graph neural networks, knowledge graphs and NLP. He has published more than 50 papers at top-tier ML, NLP and IR conferences and journals.

The Relevance of Deep Learning to Understanding the Cognitive Dimensions of Natural Language

Shalom Lappin
University of Gothenburg – Queen Mary University of London

Mail: shalom.lappin@gu.se

Abstract: In this talk I discuss the way in which recent work in deep learning can illuminate important cognitive properties of human language acquisition and linguistic representation. Specifically, it is possible to use these models to test the extent to which core elements of linguistic knowledge can be acquired by relatively domain general learning devices through different sorts of training and data. These results do not demonstrate that people actually acquire and represent knowledge of language in the way that Deep Neural Networks do. They demonstrate what sort of knowledge can, in principle, be achieved through domain general inductive procedures, in a computationally efficient way. Further psychological and neurolinguistic research is required to determine the extent to which these procedures correspond to those that humans apply. Recent progress in deep learning in NLP strongly motivates exploring alternatives to classical algebraic models for encoding linguistic information.

References: Marco Baroni (2021), “On the proper role of linguistically-oriented deep net analysis in linguistic theorizing”, arXiv, https://arxiv.org/abs/2106.08694.

Felix Hill, A. Lampinen, R. Schneider, S. Clark, M. Botvinick, J.L. McClelland, and A. Santoro (2020), “Drivers of systematicity and generalization in a situated agent”, in 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020, https://arxiv.org/pdf/1910.00571.pdf.

Shalom Lappin (2021), “Deep Learning and Linguistic Representation”, Taylor and Francis, Boca Raton and Oxford.

Alex Warstadt and Samuel R. Bowman (2020), “Do self-supervised neural networks acquire a bias towards structural linguistic generalizations?”, Proceedings of Cognitive Science, https://cognitivesciencesociety.org/cogsci20/papers/0381/0381.pdf.

Short Bio: Shalom Lappin is Professor of Natural Language Processing at Queen Mary University of London, Professor of Computational Linguistics and Director of the Centre for Linguistic Theory and Studies in Probability (CLASP) at the University of Gothenburg, and Emeritus Professor of Computational Linguistics at King’s College London.

Transcending Dependencies

Martha Palmer
University of Colorado, Boulder

Mail: martha.palmer@colorado.edu

Abstract: This talk will discuss symbolic representations of sentences in context, ranging from universal dependencies to abstract meaning representations (AMRs), and examine their capability for capturing certain aspects of meaning. A main focus will be the ways in which AMRs can be expanded to encompass figurative language, the recovery of implicit arguments and relations between events. These examples will be primarily in English, and indeed some features of AMR are fairly English-centric. The talk will conclude by introducing Uniform Meaning Representations, a multi-sentence annotation scheme that is revising AMRs to make them more suitable for other languages, especially low-resource languages, and expanding the annotation guidelines to include Number, Tense, Aspect and Modality as well as Temporal Relations.

References: Jens E. L. Van Gysel, Meagan Vigus, Jayeol Chun, Kenneth Lai, Sarah Moeller, Jiarui Yao, Tim O’Gorman, Andrew Cowell, William Croft, Chu-Ren Huang, Jan Hajič, James H. Martin, Stephan Oepen, Martha Palmer, James Pustejovsky, Rosa Vallejos, Nianwen Xue (2021), “Designing a Uniform Meaning Representation for Natural Language Processing”, Künstliche Intelligenz, https://doi.org/10.1007/s13218-021-00722-w.

Tim O’Gorman, Michael Regan, Kira Griffitt, Martha Palmer, Ulf Hermjakob and Kevin Knight (2018), “AMR Beyond the Sentence: the Multi-sentence AMR corpus”, in Proceedings of the International Conference on Computational Linguistics (COLING 2018), Santa Fe, NM, https://aclanthology.org/C18-1313.pdf.

Tim O’Gorman, Sameer Pradhan, Julia Bonn, Katie Conger, James Gung, Martha Palmer (2018), “The New PropBank: Aligning PropBank with AMR through POS Unification”, in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), https://aclanthology.org/L18-1231.pdf.

Claire Bonial, Bianca Badarau, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Tim O’Gorman, Martha Palmer, Nathan Schneider (2018), “Abstract Meaning Representation of Constructions: The More We Include, the Better the Representation”, in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May, https://aclanthology.org/L18-1266.pdf.

Short Bio: Martha Palmer is the Helen & Hubert Croft Professor of Engineering in the Computer Science Department, and Arts & Sciences Professor of Distinction for Linguistics, at the University of Colorado, with over 300 peer-reviewed publications. She is a co-Director of CLEAR, an Association for Computational Linguistics Fellow, an Association for the Advancement of Artificial Intelligence Fellow, and a co-editor of LiLT: Linguistic Issues in Language Technology. She has previously served as co-editor of the Journal of Natural Language Engineering, a member of the editorial board of Computational Linguistics, President of ACL, Chair of SIGLEX, and Founding Chair of SIGHAN.

The Linguistic Linked Open Data Cloud or how to share and integrate your language resources in the Web of Data

Elena Montiel-Ponsoda
Universidad Politécnica de Madrid – UPM

Mail: emontiel@fi.upm.es

Abstract: The Linked Data initiative arose in the early 2000s to encourage the representation, exposure and connection (linking) of data on the Web. The “Web of Data” is to be understood as an extension of the traditional “Web of Documents” in which data can be directly consumed by software agents. Shortly after, the linguistic community found value in publishing linguistic data according to the formats proposed by this initiative, so that it could be used by NLP systems. This has led to the emergence of models to represent traditional linguistic resources (lexicons, terminologies, thesauri, corpora) as linked data, as well as to the creation of a set of linked linguistic resources known as the Linguistic Linked Open Data cloud (LLOD cloud). This lecture aims to provide an introduction to the principles of Linked Data and its benefits for language resources. It gives an overview of existing models to represent linguistic information as Linked Data in the LLOD cloud, and offers students hands-on experience with systems that use Linked Data.

Short Bio: Elena Montiel-Ponsoda is an Associate Professor of Applied Linguistics at Universidad Politécnica de Madrid (UPM), in Madrid, Spain, and a member of the Ontology Engineering Group at the same university. Her main research interests lie in the common ground between Terminology and Ontology Engineering. Her research has focused on the development of models to enrich ontologies with multilingual information and to expose terminologies and other language resources as linked data. She is currently exploring the use of interlinked multilingual terminologies in semantic-based information retrieval. Recently, she coordinated the Lynx project, an innovation action funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780602. This project has created an ecosystem of smart cloud services based on a Legal Knowledge Graph, which integrates and links heterogeneous data sources across languages and jurisdictions.
