Eneko Agirre

Deep Learning for Natural Language Processing

Eneko Agirre
Universidad del País Vasco

Abstract: Deep Learning methods are impacting Artificial Intelligence research and applications, including Natural Language Processing. Classical applications like Machine Translation have seen significant improvements in performance. This brief introduction to Deep Learning techniques for Natural Language Processing stresses the relation to representation learning, including word embeddings and sentence embeddings. Architectures like Recurrent Neural Nets and Convolutional Neural Nets will be introduced, with applications to Text Classification and Machine Translation. Beyond the introduction, recent research on unsupervised cross-lingual embeddings and unsupervised machine translation will be highlighted.

Francesca Chiusaroli

Media linguistics for computational linguistics – crowdsourcing practices and digital tools for language analysis and testing

Francesca Chiusaroli
Università degli studi di Macerata

Abstract: The aim of the tutorial is to present interdisciplinary research in linguistics and information technology, carried out through an active experience on Twitter. The research arises from the creation of a metalinguistic label known as “Scritture Brevi”, which studies synthetic writings as conditioned by various media and environments (www.scritturebrevi.it). Since 2012, the hashtag of the same name (#scritturebrevi) has connected a wide Italian Twitter community engaged in textual practices and the related digital tools. Special crowdsourcing projects have grown out of the Scritture Brevi experience, such as experimental translation into emoji (Pinocchio in Emojitaliano) and sentiment analysis tests (#ITAmoji at EVALITA 2018). Original data sets are therefore made available for linguistic analysis, making it possible to observe, under real conditions, how short writings perform in light of information theory, to appraise the impact of digital writing on the standard language, and to face the compelling challenges of world communication (Emojilingo).

Felice Dell’Orletta

NLP stylistic analysis for author and textual profiling

Felice Dell’Orletta
Istituto di Linguistica Computazionale – CNR

Abstract: In recent years, Natural Language Processing (NLP) techniques combined with machine learning algorithms have begun to be used to investigate the “form” of a text rather than its content. This is the focus of NLP-based stylistic analysis, which aims to characterize the linguistic profile of a text by relying on the distribution of linguistically motivated features automatically extracted from linguistically annotated texts.

In this presentation, we focus on sets of linguistic features spanning different levels of linguistic description (lexical, morpho-syntactic and syntactic) and show that they reliably capture both the writing style of a text and the profile of its writer. A number of applications will be illustrated, showing how these features contribute to characterizing and modeling aspects of text style, such as the level of syntactic complexity and readability, as well as the textual genre, and to describing the sociolinguistic profile of the writer, such as her/his native language, gender, and level of language proficiency.

Agata Savary

Multiword expressions – the Achilles’ heel of natural language processing

Agata Savary
Université François Rabelais Tours

Abstract: Multiword expressions (MWEs) such as “all of a sudden” (suddenly), “the bottom line” (ultimate result), “to take part” (to participate), “to pull strings” (to use one’s influence), or “to do in” (to kill) are word combinations which exhibit lexical, syntactic, and especially semantic idiosyncrasies. Notably, they most often exhibit non-compositional semantics, i.e. their meaning cannot be deduced from the meanings of their components and from their syntactic structure in a way deemed regular for the given language. For these reasons, MWEs pose special challenges to linguistic modeling and to semantically-oriented natural language processing applications. This lecture is meant as an introduction to MWEs. It addresses the linguistic properties of MWEs and the criteria used to define them. It discusses sample NLP applications and the challenges that MWEs pose for them. It further presents the task of multilingual manual annotation of MWEs in running text, illustrated by the PARSEME corpus of verbal multiword expressions. Finally, it shows how such a corpus can be employed in the task of automatic identification of MWEs, which is a prerequisite for semantically-oriented downstream applications.