BLOG – Page 2

New AILC governing bodies elected

During the General Assembly held in Milan on June 30th, 2022 as part of CLiC-it 2021, the new AILC Association Board was elected; it will lead the Association for the three-year period 2022-2025. The Supervisory Committee required by the Articles of Association was also appointed to provide guidance. Good luck to the newly elected Board of Directors and Supervisory Committee! And heartfelt thanks to the outgoing members!

By root|2022-10-04T15:35:28+02:00|22 Sep, 2022|BLOG, NEWS|

Bid for Lectures on Computational Linguistics 2023

The Italian Association for Computational Linguistics (AILC) invites bids to host
the 2024 edition of the “Lectures on Computational Linguistics”.

The Lectures are an annual AILC initiative devoted to training in the field of
Computational Linguistics, and are the result of close collaboration with advanced university
education, in particular with Doctoral Schools. Information on the format of the Lectures and
on previous editions is available here.
Sites wishing to bid must submit a document containing the following
information:

Local organizing group: list the people involved in the local organization of the Lectures, including one person proposed to serve on the Scientific Committee of the Lectures for two years; describe the local organizers' previous experience in organizing training events.
Venue characteristics: indicate the location of the venue, the number of rooms available and their capacity, space for a poster session, availability of audio-video equipment, and the possibility of lunch in the canteen for participants.
Doctoral school and university courses connected to the venue: indicate the doctoral school(s) involved in the organization, any degree programmes concerned, and the corresponding number of students potentially interested in the Lectures.
Scientific profile of the venue: indicate the profile of the venue (humanities-oriented, computer-science-oriented, or mixed); optionally indicate some scientific topics that the host venue intends to propose to the Scientific Committee of the Lectures, should the venue be selected.
Accommodation: indicate the availability of accommodation, in particular low-cost options such as, where present, student residences and university guest facilities.
Social event: indicate options and costs for a dinner or other social event.
Transport: indicate the connections (air, rail) for reaching the venue; urban transport with travel times to the venue.
Budget: indicate the estimated costs for the rooms, the canteen costs, and any other fixed costs required by the host venue.
Sponsorship: indicate any sponsorship from university institutions (department, doctoral school).
Dates: indicate the possible dates for the Lectures (three days) in the period May-June 2023.

The host venue will be selected, in a joint session, by the Scientific Committee
of the Lectures and the AILC Board of Directors.
Bids must be sent by e-mail to the AILC President (Simonetta Montemagni – simonetta.montemagni@ilc.cnr.it) and to the Coordinator of the Scientific Committee of the Lectures (Elisabetta
Jezek – jezek@unipv.it) by 15 October 2023.

Contacts: Simonetta Montemagni (simonetta.montemagni@ilc.cnr.it) and Elisabetta Jezek (jezek@unipv.it)

By Simonetta Montemagni|2023-09-26T10:48:25+02:00|20 Jul, 2022|BLOG, EDUCATION, EVENTS, NEWS|

Dante or not Dante? That is the question

Dante and artificial intelligence: ever thought about them together? In this workshop you will get the chance to familiarize yourself with Digital Humanities and Computational Linguistics by playing around with the results of statistical models for language generation, which will try to imitate the writing style of Dante Alighieri. Will you be able to tell the real Dante from the robotic one? Or will you be fooled?

This workshop is organized in collaboration with AIUCD

This workshop was presented here:

Festival della Scienza 2021

By root|2024-12-04T17:42:41+01:00|21 Oct, 2021|BLOG, LABORATORY, POP|

But Does a Computer Understand Me? What Computational Linguistics Is and What It Is Used For

Tools based on natural language processing and artificial intelligence, such as recommendation systems on social media, automatic translators, and voice assistants, are now part of our daily lives, both in personal and professional contexts.

These technologies rely on the representation of linguistic knowledge, the research object of a discipline often little known outside its narrow specialist field: Computational Linguistics.

The more pervasive these tools become, the more we take them for granted, without questioning how they were created, precisely how they work, and, above all, what the consequences of their widespread, massive, and largely unconscious use might be. More info here.

Ludovica Pannitto, University of Trento

Malvina Nissim, University of Trento

By root|2024-12-04T17:05:40+01:00|21 May, 2021|BLOG, POP, SEMINARS|

Computational Linguistics and the COVID-19 Outbreak

This page is maintained by AILC (the Italian Association for Computational Linguistics). It groups some of the initiatives that the Computational Linguistics community is carrying out to contribute to the fight against COVID-19. Everyone is invited to collaborate by reporting new initiatives. Please do so through our contact form.

Datasets

CORD-19 – The Allen Institute COVID-19 Open Research Dataset, a collection of COVID-19 scientific papers, updated weekly (March 2020)
Processed CORD-19 – The Allen Institute corpus processed with Sketch Engine (March 2020)
40wita – A dataset of tweets in Italian collected daily by the University of Turin
Corona Corpus – A corpus of texts from online newspapers and magazines in 20 different English-speaking countries and part of the English-Corpora.org suite of corpora

Tools

COVID-19 Semantic Browser – A semantic search tool on COVID-19 scientific papers developed by Gabriele Sarti and hosted by Area Science Park (April 2020)
COVID19 Infodemics Observatory – A platform to monitor fake news on COVID-19, developed at FBK (March 2020)

Shared Tasks and Events

CLEF 2020: CheckThat! Lab Task 1 Tweet Check-Worthiness – The task asks participants to rank a stream of tweets on a number of topics, including COVID-19, according to their check-worthiness (March 2020)
Kaggle Tasks – Several tasks on COVID-19 (March 2020)
NLP COVID-19 Workshop – An emergency workshop at ACL 2020; authors are invited to submit papers related to NLP applied to combat the COVID-19 pandemic (July 2020)
TREC-COVID program – Launched by NIST and OSTP, the challenge will follow the TREC assessment process to evaluate search systems, based on the CORD-19 documents

Publications

Björn W. Schuller, Dagmar M. Schuller, Kun Qian, Juan Liu, Huaiyuan Zheng, Xiao Li. COVID-19 and Computer Audition: An Overview on What Speech & Sound Analysis Could Contribute in the SARS-CoV-2 Corona Crisis, arXiv.org.

By Manuela Speranza|2020-05-18T12:47:59+02:00|2 Apr, 2020|BLOG, HOME, RESOURCES|

COVID-19 Browser: Using Natural Language Processing to Fight the Pandemic

Our society is facing an unprecedented crisis due to the recent COVID-19 outbreak, which is straining healthcare systems all around the world. Recently, dozens of countries announced the shutdown of all non-essential activities for the foreseeable future, and scientists worldwide are striving to find cures and vaccines able to stop the ongoing pandemic.

In these hard times, everyone should bring their expertise to bear in the fight against the virus. For Gabriele Sarti, a Data Science student at the University of Trieste and a young member of the Italian Association for Computational Linguistics (AILC), this meant using his expertise in Natural Language Processing (NLP) to develop the COVID-19 Browser, a system leveraging state-of-the-art NLP techniques to extract meaningful information and guide scientists towards a better understanding of COVID-19.

As of today, more than 32,000 scientific papers have been published by research laboratories worldwide on the new coronavirus SARS-CoV-2 and the disease COVID-19. It is very likely that in such a large quantity of text a lot of useful information is lost, making our knowledge on the subject too sparse to be exploited to its full potential. COVID-19 Browser allows users to browse a large collection of those articles directly in their console, matching article abstracts with user queries formulated in natural language to delve deeper into our current knowledge of the subject.

The model underlying COVID-19 Browser is SciBERT-NLI. It builds on SciBERT, a cutting-edge language model trained by the American nonprofit AI2 on a corpus of 1.14M scientific papers, which Gabriele subsequently fine-tuned for the retrieval task.
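The idea of matching abstracts against a natural-language query can be sketched in a few lines. This is a minimal illustration and not the project's actual code: a toy bag-of-words vector stands in for the SciBERT-NLI sentence embeddings, and the function names (`embed`, `cosine`, `search`) and the sample abstracts are hypothetical. Only the ranking step, cosine similarity between the query vector and each abstract vector, reflects the general retrieval technique.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: a bag-of-words count vector.

    A real system would call a neural model here; this is an assumption
    made only to keep the sketch self-contained.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, abstracts: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Rank abstracts by similarity to the query and return the top_k."""
    q = embed(query)
    scored = [(cosine(q, embed(a)), a) for a in abstracts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical abstracts, invented for illustration only.
ABSTRACTS = [
    "Incubation period of the novel coronavirus in hospitalized patients",
    "Speech and sound analysis for respiratory disease screening",
    "Transmission of the virus on contaminated surfaces",
]

results = search("how long is the incubation period of the virus", ABSTRACTS, top_k=2)
for score, abstract in results:
    print(f"{score:.2f}  {abstract}")
```

Swapping `embed` for a neural sentence encoder would leave the ranking logic unchanged, and that swap is where a model such as SciBERT-NLI would make the difference: it can match queries and abstracts that share meaning but few words.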

Gabriele Sarti is a student in the Data Science master's programme at the University of Trieste (https://dssc.units.it/), and is affiliated with SISSA (https://www.sissa.it) and the CNR ItaliaNLP Lab in Pisa (http://www.italianlp.it). He is a member of the Italian Association for Computational Linguistics (https://www.ai-lc.it/en/) and plays an active role in its Dissemination Team.

Links

The code for the project is open-source and available here: https://github.com/gsarti/covid-papers-browser
A brief description of the model used is available here: https://huggingface.co/gsarti/scibert-nli
The paper collection used for the project is available here: https://pages.semanticscholar.org/coronavirus-research

By m.nissim|2020-04-06T10:33:03+02:00|24 Mar, 2020|BLOG, RESEARCH|

Affective lexica and other resources for Italian

An affective lexicon is a database of words (or word senses, phrases, or other kinds of lexical items) where each item is classified according to its content in terms of subjectivity, polarity (positive or negative), capability of evoking specific emotions and so on. Such resources are used to build automatic systems that analyze natural language (for example, from websites or social media) and “read” the sentiment expressed in the text. This activity is called Sentiment Analysis (or Opinion Mining) and it is gaining more and more attention from the scientific community as well as industry, because it can answer questions like “are customers happy with product X?” or “what type of people approve of policy Y?”. Italian is a somewhat poorly represented language in the panorama of language resources. This is true for affective lexica too, but thanks to a vibrant community, things are rapidly changing. We conducted a quick survey, asking the members of AILC about affective lexica for Italian. The results of the survey are summarized in the list below. Some of them are lexica, some are other kinds of resources and methods, in the Italian language or otherwise linked to the Italian NLP community.

Sentix
Affective lexicon, automatically built by aligning MultiWordNet, WordNet and SentiWordNet.
Each sense is given scores for positive polarity, negative polarity and intensity.
Available at http://valeriobasile.github.io/twita/downloads.html.
Publication: V. Basile and M. Nissim (WASSA 2013).

Lexicon created semi-automatically for participation in the EVALITA 2014 shared task SENTIPOLC.
Described in Di Gennaro, Rossi and Tamburini (EVALITA 2014).

Sentiment lexicon developed semi-automatically for the Opener project.
It contains 24,293 lexical entries labeled with positive/neutral/negative polarity.
Available at https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-73.

Proprietary sentiment lexicon containing single words, multiword expressions and idiomatic expressions, annotated with polarity, intensity, emotions and domain, distributed by CELI under commercial licence.
Described in A. Bolioli, F. Salamino, V. Porzionato (ESSEM 2013).

Polarized word embeddings can be created with the technique described in G. Attardi (IIR 2015) and implemented in DeepNL.

Database of affective norms for Italian developed for the INCREASE project.
Available at https://sites.google.com/view/mariamontefinese/norms-data?authuser=0 (other affective and semantic resources are available on the same Web page).
Described in Montefinese, M., Ambrosini, E., Fairfield, B. et al. Behav Res (2014).

Automatic method to build multilingual opinionated lexicons based on distant supervision.
Used for participation in the EVALITA 2016 shared task SENTIPOLC.
Dictionaries in English and Italian are available at http://sag.art.uniroma2.it/demo-software/distributional-polarity-lexicon/.
Described in G. Castellucci, D. Croce, R. Basili (2016) and G. Castellucci, D. Croce, R. Basili (2015).

SentiWords
High-coverage resource containing roughly 155,000 English words, each associated with a sentiment score between -1 and 1.
Available at http://hlt-nlp.fbk.eu/technologies/sentiwords.
Described in Gatti L., Guerini M. & Turchi M. (2015).

SentIta and Doxa
Italian databases and tools for sentiment analysis.
Described in S. Pelosi (CLiC-it 2015), A. Maisto and S. Pelosi (NOOJ 2014), Elia et al. (FSMNLP 2015).

This list is open to updates and additions. If you know of other resources that would fit the list above, please contact AILC and let us know.
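
To make the use of such lexica concrete, here is a minimal sketch of lexicon-based polarity scoring. It does not reproduce the format or content of any resource listed above: the four-word `LEXICON`, its scores in the range -1 to 1, the `polarity` and `label` functions and the 0.1 threshold are all hypothetical. It also ignores negation, word senses and intensity, which real systems have to handle.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score in [-1, 1].
# Invented for illustration; not taken from any real resource.
LEXICON = {
    "ottimo": 0.9,    # excellent
    "buono": 0.6,     # good
    "lento": -0.4,    # slow
    "pessimo": -0.9,  # terrible
}


def polarity(text, lexicon=LEXICON):
    """Average the scores of the words found in the lexicon (0.0 if none)."""
    tokens = re.findall(r"\w+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def label(score, threshold=0.1):
    """Map a numeric score to a coarse sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for sentence in [
    "Servizio ottimo, cibo buono",
    "Il servizio era lento e pessimo",
    "Nessuna opinione",
]:
    print(sentence, "->", label(polarity(sentence)))
```

Averaging word scores is the simplest possible aggregation. In practice, lexicon scores are often used as features for a trained classifier rather than summed directly.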

By Valerio Basile|2017-10-04T16:45:23+02:00|2 Oct, 2017|BLOG, RESOURCES|

The usefulness of research for companies

Innovation and research in Italian computational linguistics companies.

At the beginning of the 90s, when the young people of my generation were studying Computational Linguistics (or Natural Language Processing) at university, the Center for the Study of Language and Information at Stanford University was one of the most coveted, dreamed-of places. Many of us were in love with Head-Driven Phrase Structure Grammar (HPSG), invented by Carl Pollard and Ivan A. Sag in California. It seemed that HPSG could be the definitive word on formal grammars of natural languages, because it combined some universal principles of language (inspired by Noam Chomsky's linguistics) with a powerful computational framework. The approach, however, had two problems: it was difficult to create and manage all the rather complex rules, and parsing was not as fast as we would have liked. We devoted ourselves to research but could not build effective commercial services based on this or any other computational linguistics framework.

Some years have passed since then. In October 2016 I read an interview with Andrew Ng on the release by the Chinese company Baidu of a chatbot for making medical diagnoses: “As Melody has more conversations, it will also learn and keep getting better. This is just the start of a much larger, AI-driven transformation of the healthcare industry.” In 1990, Andrew Ng was 14 years old. After a couple of degrees and doctorates, in 2002 he began working at Stanford University. In 2011 he founded the Google Brain project at Google. Also in 2011 he taught an online Machine Learning course at Stanford University, which was followed by about 100,000 students around the world. In 2012 he founded Coursera. In 2014 Ng joined Baidu as chief scientist, and he has remained with the company to date. This exceptional man is a brilliant example of how the worlds of research, education and business nourish each other through continuous exchange.

Computational Linguistics, and Artificial Intelligence in general, are experiencing a period of incredible acceleration, with rapid transitions from research to the application of research results in practical services and back again, when the issues raised by real cases become a subject of study.
This lively exchange also takes place in Italian companies doing computational linguistics. Just as Italian researchers in this field have always been at the forefront globally, Italian companies doing computational linguistics also operate at an international level. For example, Expert System, a public limited company with offices in Modena, Naples and Rovereto, entered the United States some years ago and has grown in Europe. CELI, an SME with offices in Turin and Milan, provides Natural Language Processing technologies and consulting to international companies, from Korea to California. Euregio, based in Bolzano, uses NLP to provide media intelligence services. Interactive Media SpA, with offices in Rome, Trento and Brazil, specializes in speech solutions. The Apulian startup QuestionCube focuses on question answering and uses the main machine learning tools.
Almawave, part of the Almaviva Group, has also been integrating NLP technologies for some years. Other companies, both smaller and larger, are integrating these technologies to provide their services, combining machine learning with standard NLP technologies.

What services do they offer to customers? The main service is “Natural Language Understanding”, that is, the automatic analysis and understanding of written texts and speech.

This understanding is obviously partial compared to human understanding, but it is much faster, which makes it possible to do things that would otherwise not be feasible, or to greatly simplify complex activities.
In the next post of this blog we will describe in more detail the issues and problems of Computational Linguistics addressed in universities and companies.
One of the purposes of AILC is to facilitate exchanges between universities, research centers and companies in this sector. This blog can therefore be used to report some of the findings, the results obtained, the ongoing projects, and the problems encountered in the various areas of this discipline.

CELI, Expert System, Euregio and QuestionCube are already members of the Italian Association for Computational Linguistics. We hope that in the coming months other companies will join and contribute to the creation of an Italian ecosystem for Computational Linguistics and Artificial Intelligence.

By |2017-04-04T16:26:40+02:00|12 Dec, 2016|BLOG, INDUSTRY|
