Towards Web 3.0, an interview with Roberto Navigli and Daniele Vannella (9th January 2015)

(This post is part of a series on GDG Rome DevFest 2014.)

150109-roberto-daniele.jpg

(Roberto Navigli and Daniele Vannella)

Agatino Grillo: Hi Roberto, hi Daniele. Could you introduce yourselves?

Roberto Navigli: I am a professor in the Department of Computer Science at the Sapienza University of Rome. Since I was a child, I have been very interested in the complexity of language. This is why, as a computer science student, I was quickly fascinated by research in Natural Language Processing and decided to do a Ph.D. on word sense disambiguation. In 2010 I was the first Italian winner of a prestigious ERC Starting Grant in Computer Science and Informatics (I was only 32 years old, with an amazing 1.3 million-euro contract!). Now I manage a group of 10 Ph.D. students doing research in many areas of Natural Language Processing, including Word Sense Disambiguation, Knowledge Acquisition, Ontology Learning, Semantic Information Retrieval, the Semantic Web and its applications. You can find more information about me at http://wwwusers.di.uniroma1.it/~navigli

Daniele Vannella: I am a Ph.D. student in the Department of Computer Science at “La Sapienza” University of Rome, under the supervision of Prof. Navigli. I hold a B.Sc. and an M.Sc. in Computer Science, both from “La Sapienza”. My research interests are in the areas of Word Sense Induction and Lexical Substitution. My curriculum is available at
http://www.di.uniroma1.it/sites/default/files/cv/curriculum-Vannella.pdf

Agatino Grillo: You were speakers at the Google Developer Group (GDG) DevFest on 8th November in Rome, with a code-lab titled “Towards Web 3.0 with BabelNet e Babelfy”. What is Web 3.0?

Roberto & Daniele: The term Web 3.0 is sometimes used as a synonym for “Semantic Web”, which, in Tim Berners-Lee’s definition, is a “common framework” that allows data to be shared and reused across application, enterprise, and community boundaries. More simply, to use Wikipedia’s definition, the Semantic Web aims at converting the current Web, dominated by unstructured and semi-structured documents, into a “web of data”, where data are interoperable and semantically connected.

150109-babelnet.png

(source: Wikipedia)

Agatino Grillo: What is BabelNet?

Roberto & Daniele: A key recent development in the Semantic Web area is the Linguistic Linked Open Data cloud. However, this cloud does not contain many rich resources and, with the exception of DBpedia, it is mostly monolingual. To address this and many other issues in semantics, we have introduced BabelNet, a very large multilingual semantic network that was created by automatically integrating existing knowledge resources, including machine-readable dictionaries such as WordNet, OmegaWiki and Wiktionary, and encyclopedic knowledge from Wikipedia and Wikidata.

Agatino Grillo: Could you explain BabelNet in more detail?

Roberto & Daniele: BabelNet is a sort of multilingual encyclopedic dictionary which connects concepts and named entities in a very large network of semantic relations, made up of more than 13 million entries, called Babel synsets. Each Babel synset represents a given meaning and contains all the synonyms which express that meaning in a range of different languages.
BabelNet provides, for example, lexical knowledge about the concept apple as a fruit, with its part of speech, its definitions and its set of synonyms in multiple languages, as well as encyclopedic knowledge about, among other entities, the Apple Inc. company, along with definitions in multiple languages, connections to other concepts and entities, etc.
Thanks to the semantic relations, it is also possible to learn that an apple is an edible fruit (fruit comestible in French, frutta in Italian, essbare Früchte in German) and that Apple Inc. is related to Mac and Mountain View, California. While version 1.0 covered 6 languages, BabelNet 3.0 makes giant strides in this respect, covering an amazing 271 languages!
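To make the idea of a Babel synset more concrete, here is a minimal Python sketch of a toy multilingual semantic network. It is not the BabelNet API: the class names Synset and SemanticNetwork, the "toy:" identifiers, the glosses and the relation labels "is-a" and "related-to" are all hypothetical, chosen only to mirror the apple example above. Each synset groups the synonyms that express one meaning, keyed by language code, and labeled edges connect synsets to each other.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """Toy stand-in for a Babel synset: one meaning, its synonyms in several languages."""
    synset_id: str  # hypothetical identifier, not a real BabelNet ID
    gloss: str      # short definition
    lexicalizations: dict[str, list[str]] = field(default_factory=dict)  # language code -> synonyms


@dataclass
class SemanticNetwork:
    """Toy multilingual semantic network: synsets plus labeled, directed relations."""
    synsets: dict[str, Synset] = field(default_factory=dict)
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)  # source id -> [(relation, target id)]

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset
        self.edges.setdefault(synset.synset_id, [])

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges[source].append((relation, target))

    def senses_of(self, lemma: str, lang: str) -> list[Synset]:
        """All synsets containing `lemma` among their synonyms in language `lang`."""
        return [s for s in self.synsets.values() if lemma in s.lexicalizations.get(lang, [])]

    def neighbours(self, synset_id: str, relation: str | None = None) -> list[Synset]:
        """Synsets reachable through one outgoing edge, optionally filtered by relation label."""
        return [self.synsets[t] for r, t in self.edges.get(synset_id, [])
                if relation is None or r == relation]


# Hypothetical toy data loosely following the apple example.
net = SemanticNetwork()
net.add(Synset("toy:apple_fruit", "fruit of the apple tree",
               {"en": ["apple"], "it": ["mela"], "fr": ["pomme"], "de": ["Apfel"]}))
net.add(Synset("toy:edible_fruit", "fruit that can be eaten",
               {"en": ["edible fruit"], "it": ["frutta"], "fr": ["fruit comestible"],
                "de": ["essbare Früchte"]}))
net.add(Synset("toy:apple_inc", "American technology company", {"en": ["Apple", "Apple Inc."]}))
net.add(Synset("toy:mac", "line of personal computers", {"en": ["Mac", "Macintosh"]}))
net.relate("toy:apple_fruit", "is-a", "toy:edible_fruit")
net.relate("toy:apple_inc", "related-to", "toy:mac")

print([s.synset_id for s in net.senses_of("mela", "it")])
# ['toy:apple_fruit']
print([s.lexicalizations["de"] for s in net.neighbours("toy:apple_fruit", "is-a")])
# [['essbare Früchte']]
```

Looking up the Italian word mela returns only the fruit sense, and following its is-a edge reaches the synset whose German synonym is essbare Früchte. The real resource is far larger and is accessed through BabelNet's own interfaces, not through a structure like this one.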

150109-babelnet2.jpg

Agatino Grillo: Why a “multilingual” approach?

Roberto & Daniele: The tremendous growth in the amount of multilingual text on the Web has significantly increased the need for multilingual resources in many research areas. Multilingual lexical knowledge is indispensable for taking the next step towards the multilingual Semantic Web, i.e., a Web in which multilinguality is not a barrier but an opportunity for sharing and spreading information across cultures and languages. BabelNet therefore provides a unified multilingual repository of knowledge for tackling problems in many areas, such as computer-assisted translation, localization, multilingual semantic processing of text and cross-lingual information retrieval.

Agatino Grillo: And Babelfy?

Roberto: Once we had developed the largest multilingual knowledge repository, the natural next step was to use it to tackle language ambiguity. Together with Andrea Moro, another Ph.D. student in my research group, we conceived and developed Babelfy, a unified approach to word sense disambiguation and entity linking in arbitrary languages, whose performance on both tasks is on a par with, or better than, that of task-specific state-of-the-art supervised systems.
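To give a feel for how a semantic network can help resolve ambiguity, the following Python sketch picks, for each word, the candidate sense with the most links to the candidate senses of the other words in the same context. This is a deliberately simplified stand-in, not Babelfy's actual algorithm (which is described in the Moro et al. TACL paper listed under Links); the function disambiguate, the sense names and the CANDIDATES and LINKS data are all hypothetical.

```python
from __future__ import annotations

# Hypothetical toy data: candidate senses for each word that may appear in a context.
CANDIDATES: dict[str, list[str]] = {
    "apple": ["apple_fruit", "apple_inc"],
    "mac": ["mac_computer", "mac_raincoat"],
    "pie": ["pie_dish"],
}

# Hypothetical toy data: undirected links between senses in a semantic network.
LINKS: set[tuple[str, str]] = {
    ("apple_inc", "mac_computer"),
    ("apple_fruit", "pie_dish"),
}


def linked(a: str, b: str, links: set[tuple[str, str]]) -> bool:
    """True if senses a and b are connected, in either direction."""
    return (a, b) in links or (b, a) in links


def disambiguate(words: list[str], candidates: dict[str, list[str]],
                 links: set[tuple[str, str]]) -> dict[str, str]:
    """Pick, for each word, the candidate sense most connected to the other words' candidates."""
    choice: dict[str, str] = {}
    for word in words:
        senses = candidates.get(word, [])
        if not senses:
            continue  # unknown word: nothing to choose from
        # Candidate senses of every other word in the context.
        others = [s for w in words if w != word for s in candidates.get(w, [])]
        # Score = number of links to those senses; ties go to the first candidate listed.
        choice[word] = max(senses, key=lambda s: sum(linked(s, o, links) for o in others))
    return choice


print(disambiguate(["apple", "mac"], CANDIDATES, LINKS))
# {'apple': 'apple_inc', 'mac': 'mac_computer'}
print(disambiguate(["apple", "pie"], CANDIDATES, LINKS))
# {'apple': 'apple_fruit', 'pie': 'pie_dish'}
```

With mac in the context, apple resolves to the company; with pie, it resolves to the fruit. Because concepts (the fruit) and named entities (the company) sit in the same toy network, a single scoring step handles both kinds of ambiguity, loosely echoing the unified approach described above.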

Agatino Grillo: Recently you announced BabelNet 3.0, covering 271 languages. What is new?

Roberto: BabelNet 3.0 is the result of the automatic integration of six different resources:

  • WordNet 3.0, a popular computational lexicon of English,
  • The Open Multilingual WordNet, a collection of wordnets available in different languages,
  • Wikipedia, the largest collaborative multilingual Web encyclopedia,
  • OmegaWiki, a large collaborative multilingual dictionary,
  • Wiktionary, a collaborative project to produce a free-content multilingual dictionary,
  • Wikidata, a free knowledge base that can be read and edited by humans and machines alike.

Additionally, it contains translations obtained from sense-annotated sentences. BabelNet is fully integrated with our Babelfy multilingual disambiguation and entity linking system, as well as with the Wikipedia Bitaxonomy, a state-of-the-art taxonomy of Wikipedia pages aligned to a taxonomy of Wikipedia categories. Don't forget to join our Facebook group at: https://www.facebook.com/groups/babelnet/

Agatino Grillo: Thanks Roberto, thanks Daniele.

Roberto & Daniele: Thanks to you!

Links

  • R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 193, Elsevier, 2012, pp. 217-250. http://babelnet.org
  • A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2, pp. 231-244, 2014. http://babelfy.org

Videos

  • Daniele Vannella - Linguistic Computing Laboratory (LCL) @ Università la Sapienza di Roma, BabelNet 2.0: un dizionario enciclopedico multilingue in formato elettronico (“BabelNet 2.0: a multilingual encyclopedic dictionary in electronic format”) (video, 30.9), 20th November 2013
  • Roberto Navigli (University of Rome): Babelfying the Multilingual Web (video), 23rd June 2014

Contacts

Roberto Navigli

Daniele Vannella

  • https://it-it.facebook.com/daniele.vannella
  • https://sites.google.com/a/di.uniroma1.it/danielevannella/
  • https://www.linkedin.com/pub/daniele-vannella/99/842/854

Connected posts