Abstract: You have all worked closely with word vectors and witnessed first-hand how they can encode meaning and aid tasks across the NLP spectrum. Your favourite algorithm provides you with these high-dimensional vectors. What kind of space do they live on?
The manifold hypothesis suggests that word vectors live on a submanifold within their ambient vector space. We argue that we should, more accurately, expect them to live on a “pinched manifold”: a space obtained from a manifold by gluing together certain points. The gluing points correspond to polysemous words, i.e. words with multiple meanings.
Our point of view suggests that monosemous and polysemous words can be distinguished based on the topology of their neighbourhoods. We present two kinds of empirical evidence to support this point of view:
(1) We introduce a measure of polysemy, based on tools from topological data analysis, that correlates well with the actual number of meanings of a word.
(2) We propose a simple, topologically motivated solution to the SemEval-2010 task on Word Sense Induction & Disambiguation that produces
Short Bio: Milica Gašić is a Professor of Dialogue Systems and Machine Learning at Heinrich Heine University Düsseldorf. Prior to her current position, she was a Lecturer in Spoken Dialog Systems at the Department of Engineering, University of Cambridge, where she led the Dialogue Systems Group. She completed her PhD under the supervision of Professor Steve Young, with a thesis on Statistical Dialogue Modelling. She holds an MPhil degree in Computer Speech, Text and Internet Technology from the University of Cambridge and a Diploma in Mathematics and Computer Science from the University of Belgrade. She is a member of ACL, a member of ELLIS, and a senior member of IEEE. She is a recipient of a European Research Council Starting Grant and an Alexander von Humboldt Sofja Kovalevskaja Award.