Mapping Natural Language Labels to Structured Web Resources

Abstract

Mapping natural language terms to a Web knowledge base enriches information systems without additional context, with new relations and properties from Linked Open Data. In this paper we formally define this task, which is related to word sense disambiguation, named entity recognition, and ontology matching. We provide a manually annotated dataset of labels linked to DBpedia as a gold standard for evaluation, and we use it to experiment with a number of methods, including a novel algorithm that leverages the specific characteristics of the mapping task. The empirical evidence confirms that general term mapping is a hard task that cannot be easily solved by applying existing methods designed for related problems. However, incorporating NLP ideas such as representing the context and properly treating multiword expressions can significantly boost performance, in particular the coverage of the mapping. Our findings open up the challenge of finding new ways to approach term mapping to Web resources and to bridge the gap between natural language and the Semantic Web.