RESOURCES

Computational Linguistics and the COVID-19 Outbreak

This page is maintained by AILC (the Italian Association for Computational Linguistics). It groups some of the initiatives that the Computational Linguistics community is carrying out to contribute to the fight against COVID-19. Everyone is invited to collaborate by reporting new initiatives. Please do so through our contact form.

Datasets

CORD-19 – The Allen Institute COVID-19 Open Research Dataset, a collection of COVID-19 scientific papers, updated weekly (March 2020)
Processed CORD-19 – The Allen Institute corpus processed with Sketch Engine (March 2020)
40wita – A dataset of tweets in Italian collected daily by the University of Turin
Corona Corpus – A corpus of texts from online newspapers and magazines in 20 different English-speaking countries, part of the English-Corpora.org suite of corpora

Tools

COVID-19 Semantic Browser – A semantic search tool on COVID-19 scientific papers developed by Gabriele Sarti and hosted by Area Science Park (April 2020)
COVID19 Infodemics Observatory – A platform to monitor fake news on COVID-19, developed at FBK (March 2020)

Shared Tasks and Events

CLEF 2020: CheckThat! Lab Task 1 Tweet Check-Worthiness – The task asks participants to rank a stream of tweets on a number of topics, including COVID-19, according to their check-worthiness (March 2020)
Kaggle Tasks – Several tasks on COVID-19 (March 2020)
NLP COVID-19 Workshop – An emergency workshop at ACL 2020. Authors are invited to submit papers on NLP applied to combating the COVID-19 pandemic (July 2020)
TREC-COVID program – Launched by NIST and OSTP, the challenge will follow the TREC assessment process to evaluate search systems, based on the CORD-19 documents

Publications

Björn W. Schuller, Dagmar M. Schuller, Kun Qian, Juan Liu, Huaiyuan Zheng, Xiao Li. COVID-19 and Computer Audition: An Overview on What Speech & Sound Analysis Could Contribute in the SARS-CoV-2 Corona Crisis, arXiv.org.

Affective lexica and other resources for Italian

An affective lexicon is a database of words (or word senses, phrases, or other kinds of lexical items) in which each item is classified according to its subjectivity, its polarity (positive or negative), its capability of evoking specific emotions, and so on. Such resources are used to build automatic systems that analyze natural language (for example, from websites or social media) and “read” the sentiment expressed in the text. This activity is called Sentiment Analysis (or Opinion Mining), and it is gaining more and more attention from the scientific community as well as from industry, because it can answer questions like “are customers happy with product X?” or “what type of people approve of policy Y?”.

Italian is a somewhat poorly represented language in the panorama of language resources. This is true for affective lexica too, but thanks to a vibrant community, things are rapidly changing. We conducted a quick survey, asking the members of AILC about affective lexica for Italian. The results of the survey are summarized in the list below. Some entries are lexica; others are different kinds of resources and methods, either for the Italian language or otherwise linked to the Italian NLP community.

Sentix
Affective lexicon, automatically built by aligning MultiWordNet, WordNet and SentiWordNet.
Each sense is given scores for positive polarity, negative polarity and intensity.
Available at http://valeriobasile.github.io/twita/downloads.html.
Publication: V. Basile and M. Nissim (WASSA 2013).
Lexicon created semi-automatically for participation in the EVALITA 2014 shared task SENTIPOLC.
Described in Di Gennaro, Rossi and Tamburini (EVALITA 2014).
Sentiment lexicon developed semi-automatically for the Opener project.
It contains 24,293 lexical entries labeled with positive/neutral/negative polarity.
Available at https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/20.500.11752/ILC-73.
Proprietary sentiment lexicon containing single words, multiword expressions and idiomatic expressions, annotated with polarity, intensity, emotions and domain, and distributed by CELI under a commercial licence.
Described in A. Bolioli, F. Salamino, V. Porzionato (ESSEM 2013).
Polarized word embeddings can be created with the technique described in G. Attardi (IIR 2015) and implemented in DeepNL.
Database of affective norms for Italian developed for the INCREASE project.
Available at https://sites.google.com/view/mariamontefinese/norms-data?authuser=0 (other affective and semantic resources are available on the same Web page).
Described in Montefinese, M., Ambrosini, E., Fairfield, B. et al. Behav Res (2014).
Automatic method to build multilingual opinionated lexicons based on distant supervision.
Used for participation in the EVALITA 2016 shared task SENTIPOLC.
Dictionaries in English and Italian are available at http://sag.art.uniroma2.it/demo-software/distributional-polarity-lexicon/.
Described in G. Castellucci, D. Croce, R. Basili (2016) and G. Castellucci, D. Croce, R. Basili (2015).
SentiWords
High-coverage resource containing roughly 155,000 English words, each associated with a sentiment score between -1 and 1.
Available at http://hlt-nlp.fbk.eu/technologies/sentiwords.
Described in Gatti L., Guerini M. & Turchi M. (2015).
SentIta and Doxa
Italian databases and tools for sentiment analysis.
Described in S. Pelosi (CLiC-it 2015), A. Maisto and S. Pelosi (NOOJ 2014), Elia et al. (FSMNLP 2015).

This list is open to updates and additions. If you know of other resources that would fit the list above, please contact AILC and let us know.
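
As a rough illustration of how a polarity lexicon can be used for Sentiment Analysis, the following minimal Python sketch scores a text by averaging the polarity of the words it finds in a lexicon. Everything in it is an assumption made for illustration: the tiny lexicon, its scores, the function names and the neutral threshold are hypothetical and are not taken from any of the resources listed above, whose formats and scoring schemes differ.

```python
import re

# Hypothetical toy lexicon: words mapped to a polarity score in [-1, 1].
# These entries and scores are invented for illustration only.
TOY_LEXICON = {
    "ottimo": 0.9,
    "felice": 0.8,
    "buono": 0.6,
    "lento": -0.4,
    "terribile": -0.8,
    "pessimo": -0.9,
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def polarity(text, lexicon=TOY_LEXICON):
    """Return the mean score of the tokens found in the lexicon, or 0.0 if none match."""
    scores = [lexicon[token] for token in tokenize(text) if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(score, threshold=0.1):
    """Map a numeric score to a coarse label; the threshold is an arbitrary choice."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["Il servizio è ottimo", "Prodotto pessimo e lento", "Arriva domani"]:
        print(sentence, "->", label(polarity(sentence)))
```

Averaging word scores ignores negation, intensifiers and word sense, which is one reason several of the resources above annotate senses, multiword expressions or intensity rather than single words only.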
