Semantic Web

Towards Web 3.0, an interview with Roberto Navigli and Daniele Vannella (9th January 2014)

(This post is part of a series on GDG Rome DevFest 2014)

150109-roberto-daniele.jpg

(Roberto Navigli and Daniele Vannella)

Agatino Grillo: Hi Roberto, hi Daniele. Could you introduce yourselves?

Roberto Navigli: I am a professor in the Department of Computer Science at the Sapienza University of Rome. Since I was a child, I have always been very interested in the complexity of language and this is the reason why, as a computer science student, I was quickly fascinated by research in the field of Natural Language Processing and decided to start a Ph.D. on the topic of word sense disambiguation. In 2010 I was the first Italian winner of a prestigious ERC Starting Grant in Computer Science and Informatics (I was only 32 years old with an amazing 1.3 million-euro contract!). Now I manage a group of 10 Ph.D. students doing research in many areas of Natural Language Processing, including Word Sense Disambiguation, Knowledge Acquisition, Ontology Learning, Semantic Information Retrieval, the Semantic Web and its applications. You can find more information on me at http://wwwusers.di.uniroma1.it/~navigli

Daniele Vannella: I am a Ph.D. student at the Department of Computer Science at “La Sapienza” University of Rome, under the supervision of Prof. Navigli. I have a B.Sc. and an M.Sc. in Computer Science, both from “La Sapienza”. My research interests are in the areas of Word Sense Induction and Lexical Substitution. My curriculum is available at
http://www.di.uniroma1.it/sites/default/files/cv/curriculum-Vannella.pdf

Agatino Grillo: You were speakers at the Google Developer Group (GDG) Fest on 8th November in Rome, in a code-lab titled “Towards Web 3.0 with BabelNet e Babelfy”. What is Web 3.0?

Roberto & Daniele: Web 3.0 is sometimes used as a synonym for the “Semantic Web” which, in Tim Berners-Lee’s definition, is a “common framework” that allows data to be shared and reused across application, enterprise, and community boundaries. More simply, using Wikipedia’s definition, we can say that the Semantic Web aims at converting the current Web, dominated by unstructured and semi-structured documents, into a “web of data”, where data are interoperable and semantically connected.

150109-babelnet.png

(source: Wikipedia)

Agatino Grillo: What is BabelNet?

Roberto & Daniele: A key recent development in the Semantic Web area is the Linguistic Linked Open Data cloud. However, this cloud does not contain many rich resources and, with the exception of DBpedia, it is mostly monolingual. To address this and many other issues in semantics, we have introduced BabelNet, a very large multilingual semantic network that was created by automatically integrating existing knowledge resources, including machine-readable dictionaries such as WordNet, OmegaWiki and Wiktionary, and encyclopedic knowledge from Wikipedia and Wikidata.

Agatino Grillo: Could you explain BabelNet in more detail?

Roberto & Daniele: BabelNet is a sort of multilingual encyclopedic dictionary which connects concepts and named entities in a very large network of semantic relations, made up of more than 13 million entries, called Babel synsets. Each Babel synset represents a given meaning and contains all the synonyms which express that meaning in a range of different languages.
BabelNet provides, for example, lexical knowledge about the concept apple as a fruit, with its part of speech, its definitions and its set of synonyms in multiple languages, as well as encyclopedic knowledge about, among other entities, the Apple Inc. company, along with definitions in multiple languages, connections to other concepts and entities, etc.
Thanks to the semantic relations, it is furthermore possible to learn that apple is an edible fruit (or a fruit comestible, a frutta, an essbare Früchte) and that Apple Inc. is related to Mac and Mountain View, California. While 6 languages were covered in version 1.0, BabelNet 3.0 makes giant strides in this respect and covers the amazing number of 271 languages!
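To make the idea of a Babel synset more concrete, here is a minimal Python sketch. It is not the BabelNet API: the class `ToySynset`, the helper `senses_of`, the `toy:` identifiers, the glosses and the relation names are all hypothetical, chosen only to mirror the apple example above (one meaning per entry, synonyms grouped by language, and typed links to other entries).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToySynset:
    """One meaning, with its synonyms grouped by language code (hypothetical model)."""
    synset_id: str                       # made-up identifier, not a real BabelNet ID
    gloss: str                           # illustrative definition, not taken from BabelNet
    lemmas: Dict[str, List[str]] = field(default_factory=dict)
    related: List[Tuple[str, str]] = field(default_factory=list)  # (relation, target id)

    def synonyms(self, lang: str) -> List[str]:
        """Return the words expressing this meaning in the given language."""
        return self.lemmas.get(lang, [])


# Lexical knowledge: apple as a fruit.
apple_fruit = ToySynset(
    synset_id="toy:apple-fruit",
    gloss="fruit of the apple tree",
    lemmas={"EN": ["apple"], "IT": ["mela"], "DE": ["Apfel"]},
    related=[("is-a", "toy:edible-fruit")],
)

# Encyclopedic knowledge: Apple Inc. as a named entity.
apple_inc = ToySynset(
    synset_id="toy:apple-inc",
    gloss="technology company",
    lemmas={"EN": ["Apple Inc.", "Apple"]},
    related=[("related-to", "toy:mac")],
)


def senses_of(word: str, lang: str, synsets: List[ToySynset]) -> List[ToySynset]:
    """Find every synset that lists `word` (case-insensitively) in `lang`."""
    wanted = word.lower()
    return [s for s in synsets if wanted in (l.lower() for l in s.synonyms(lang))]


candidates = senses_of("apple", "EN", [apple_fruit, apple_inc])
print([s.synset_id for s in candidates])
```

The lookup returns both entries for the English word "apple", which is exactly the ambiguity that a disambiguation system has to resolve from context.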

150109-babelnet2.jpg

Agatino Grillo: Why a “multilingual” approach?

Roberto & Daniele: The tremendous growth in the amount of multilingual text on the Web has significantly increased the need for multilingual resources in many research areas. Multilingual lexical knowledge is indispensable for implementing the next step towards the multilingual Semantic Web, i.e. a Web in which multilinguality is not a barrier, but an opportunity for sharing and spreading information across cultures and languages. As a result BabelNet provides a unified multilingual repository of knowledge for solving issues in many areas such as computer-assisted translation, localization, multilingual semantic processing of text, cross-lingual information retrieval, etc.

Agatino Grillo: And Babelfy?

Roberto: Having developed the largest multilingual knowledge repository, the first natural step was to use it to address the language ambiguity issue. With Andrea Moro, another Ph.D. student in my research group, we therefore conceived and developed Babelfy, a unified approach to word sense disambiguation and entity linking in arbitrary languages, with performance on both tasks on a par with, or surpassing, those of task-specific state-of-the-art supervised systems.

Agatino Grillo: Recently you announced BabelNet 3.0, covering 271 languages. What is new?

Roberto: BabelNet 3.0 is the result of the automatic integration of six different resources:

  • WordNet 3.0, a popular computational lexicon of English,
  • The Open Multilingual WordNet, a collection of wordnets available in different languages,
  • Wikipedia, the largest collaborative multilingual Web encyclopedia,
  • OmegaWiki, a large collaborative multilingual dictionary,
  • Wiktionary, a collaborative project to produce a free-content multilingual dictionary,
  • Wikidata, a free knowledge base that can be read and edited by humans and machines alike.

Additionally, it contains translations obtained from sense-annotated sentences. BabelNet is fully integrated with our Babelfy multilingual disambiguation and entity linking system, as well as with the Wikipedia Bitaxonomy, a state-of-the-art taxonomy of Wikipedia pages aligned to a taxonomy of Wikipedia categories. Don't forget to join our Facebook group at: https://www.facebook.com/groups/babelnet/

Agatino Grillo: Thanks Roberto, thanks Daniele.

Roberto & Daniele: Thanks to you!

Links

  • R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 193, Elsevier, 2012, pp. 217-250. http://babelnet.org
  • A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2, pp. 231-244, 2014. http://babelfy.org

Videos

  • Daniele Vannella - Linguistic Computing Laboratory (LCL) @ Università la Sapienza di Roma, BabelNet 2.0: un dizionario enciclopedico multilingue in formato elettronico  (video, 30.9) 20th November 2013
  • Roberto Navigli (University of Rome): Babelfying the Multilingual Web. (video)  23rd June 2014

Contacts

Roberto Navigli

Daniele Vannella

●    https://it-it.facebook.com/daniele.vannella
●    https://sites.google.com/a/di.uniroma1.it/danielevannella/
●    https://www.linkedin.com/pub/daniele-vannella/99/842/854

Connected posts

“Let’s expose Rome” with a Cloud Cult Platform: a conversation with Camelia Boban and Simone Pulcini about semantic web (15th December 2014)

(This post is part of a series on GDG Rome DevFest 2014)

141215-camelia-simone-2.jpg

Agatino Grillo: Hi Camelia, hi Simone. Could you introduce yourselves?

Camelia Boban: My name is Camelia. I come from Romania, where I graduated in Economics from Craiova University. I have been living in Italy since 1992. I am a freelance software developer, a member of the Google Developer Group L-Ab Lazio Abruzzo, a contributor to Wikipedia, the free collaborative online encyclopedia, affiliated with Wikimedia, the movement behind Wikipedia, and a promoter of “Wiki Loves Monuments”, the international photo contest organised by Wikipedia.

Simone Pulcini: Hi! My name is Simone. I have been working in software development for more than a decade. I specialize in enterprise architectures and modelling, and I have worked on a range of business projects for private firms as well as for public administration. I’m a certified UML developer (OCUP) and I'm working towards the Java Enterprise Architect (OCMJEA 6) certification. I’m the co-organizer of the Google Developers Group Rome chapter, together with Antonella Blasetti. I have a Master’s Degree in Computer Science from “La Sapienza” University of Rome.

locandina2-devfest-2.png

Agatino Grillo: You were speakers at the Google Developer Group (GDG) Fest on 8th November in Rome, in the code-lab dedicated to the semantic web using Google technologies. What was it about?

Camelia & Simone: The code-lab was titled «Roma non è mai stata così “Esposta”». The title is a pun that you can translate into English as “Let’s expose Rome”: “exposed” both in the sense of opening Rome’s monuments up to exploration and in the sense of running a risk. You can find the slides here http://www.slideshare.net/cameliaboban/cloud-cult-platform-roma-non-mai-stata-cos-esposta or here in PDF format or PPTX format

Agatino Grillo: Why a code lab about semantic web?

Camelia & Simone: Nowadays there is a lot of semantic data available on the Web. Its potential is enormous, but it is often very difficult to explore. Semantic Web technologies help users explore large amounts of data easily and interact with them.

4-slide-cloud-cult-platform-roma-mai-stata-cosi-esposta.jpg

Agatino Grillo: Could you explain in a nutshell what the “semantic web” is?

Camelia & Simone: Using Wikipedia’s definition, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents, into a “web of data”.
Semantic Web technology lets you move from a web of documents to a web of data using open and free “Linked Data” technologies and resources like RDF, SPARQL and DBpedia, which make it possible to access information without writing a lot of custom code. In our codelab we also used Google solutions like “Google Cloud Endpoints” and “Google App Engine” to build our application.
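As a rough illustration of what a “web of data” means in practice, the sketch below stores facts as subject-predicate-object triples, the shape that RDF gives to data, and answers a question by pattern matching. It is a toy in plain Python, not an RDF library: the `ex:` names, the sample facts and the `match` helper are all assumptions made up for this example.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts; the "ex:" prefix stands for an invented example namespace.
TRIPLES = [
    ("ex:Colosseum", "rdf:type", "ex:Monument"),
    ("ex:Colosseum", "ex:locatedIn", "ex:Rome"),
    ("ex:Pantheon", "rdf:type", "ex:Monument"),
    ("ex:Pantheon", "ex:locatedIn", "ex:Rome"),
    ("ex:Rome", "ex:capitalOf", "ex:Italy"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples that agree with every position that is not None."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# "Which things are monuments?" expressed as a pattern with a free subject.
monuments = sorted(subj for subj, _, _ in match(TRIPLES, p="rdf:type", o="ex:Monument"))
print(monuments)
```

Because every fact has the same three-part shape, the same `match` function answers any question about the data, which is the sense in which little custom code is needed.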

Agatino Grillo: What did you propose in your lab?

Camelia & Simone: We used data produced by DBpedia, a community effort to extract structured information from Wikipedia and to make it available on the web by exposing it as RDF, a public standard for “linked data”.
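For readers who want to try DBpedia themselves, here is a hedged sketch of how a SPARQL request could be prepared in Python. It is not the codelab's code. The query shape, the choice of the `dbo:location` property, the helper names and the `format` parameter are assumptions for illustration; the public endpoint is assumed to be https://dbpedia.org/sparql. The sketch only builds the request URL and does not contact the network.

```python
from urllib.parse import urlencode

# Assumed public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_query(city_resource: str, limit: int = 10) -> str:
    """Build a SPARQL query for things located in a city (illustrative only)."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?place ?label WHERE {{
  ?place dbo:location <{city_resource}> ;
         rdfs:label ?label .
  FILTER (lang(?label) = "en")
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL for the endpoint."""
    params = {
        "query": query,
        # Assumption: the endpoint accepts a `format` parameter for JSON results.
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = build_query("http://dbpedia.org/resource/Rome", limit=5)
print(build_request_url(query))
```

The printed URL can be opened in a browser or fetched with any HTTP client; whether `dbo:location` returns the monuments you expect depends on how the relevant Wikipedia pages were extracted.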

Agatino Grillo: And what about Google Cloud Endpoints?

Camelia & Simone: “Google Cloud Endpoints” consists of tools, libraries and capabilities that make it easier to create a web backend for web clients and mobile clients such as Android or Apple’s iOS. For the backend we use an App Engine app, which frees us from system admin work, load balancing, scaling, and server maintenance.

Agatino Grillo: Thanks Camelia, thanks Simone.

Camelia & Simone: Thanks to you

1-slide-cloud-cult-platform-roma-mai-stata-cosi-esposta.jpg

Slides

Project’s code (codelab)

Links

Requirements for codelab: https://docs.google.com/document/d/1Ys7HpA_9vKT5fBGtQ1rj0Y2DdURaVXcbodQU...

How to contact Camelia

How to contact Simone

Connected posts