Choosing Between Graph Databases and RDF Engines for Consuming and Mining Linked Data (Universidad Simon Bolívar, Caracas, Venezuela)
Document type: Web page
Graphs naturally represent Linked Data, and implementations of graph-based tasks are required not only for data consumption but also for mining patterns among links. Although efficient graph-based algorithms and engines have been implemented, there is no clear understanding of how these solutions behave on Linked Data.
In a twist that has inevitable written all over it, the database industry has at last begun to take heed of the power of consumerization. The once mighty RDBMS is now obliged to make room for an emerging and increasingly important partner in the data center: the graph database. Twitter’s doing it, Facebook’s doing it, even online dating sites are doing it; what they are doing is tracing relationship graphs. After all, social is social, and ultimately it’s all about relationships.
Document type: Web page
There are triplestores (semantic databases), and there are general-purpose graph databases.
Both are based on the similar concept of linking one "item" to another via a relationship. Triplestores support RDF and are queried with SPARQL, but such add-ons can be (and are) implemented on top of general-purpose graph databases as well.
What is the fundamental difference that would make you prefer a semantic db / triplestore to a general-purpose graph database like Neo4j?
Document type: Web page
Graph Databases vs. RDF Triple Stores
To summarize, both graph databases and triple stores are designed to store linked data. RDF is a specific kind of linked data that is queried using SPARQL, so it is fair to say that RDF triple stores are a kind of graph database. But, there are some subtle but important differences that are described below.
How They Are Similar
· Graph databases and RDF triple stores focus on the relationships between the data, often referred to as “linked data.” Data points are called nodes, and the relationship between one data point and another is called an edge.
· A web of nodes and edges can be put together into interesting visualizations—a defining characteristic of graph databases.
How They Are Different
· Graph databases are more versatile with query languages: Neo4j can run an RDF triple store and use SPARQL but generally focuses on its own proprietary language, Cypher. Other graph databases support G, GraphLog, GOOD, SoSQL, BiQL, SNQL, and more. RDF triple stores use only SPARQL as the query language.
· Graph databases can store various types of graphs, including undirected graphs, weighted graphs, hypergraphs, etc. RDF triple stores focus solely on storing rows of RDF triples.
· Graph databases are node-centric (or property-centric), whereas RDF triple stores are edge-centric. An RDF triple store is really just a list of graph edges, many of which are 'properties' of a node and not critical to the graph structure itself.
· Graph databases are better optimized for graph traversals (degrees of separation or shortest path algorithms). With RDF triple stores, the cost of traversing an edge tends to be logarithmic.
· RDF triple stores also provide inferences on data but graph databases do not (e.g., if humans are a subclass of mammals and man is a subclass of humans, then it can be inferred that man is a subclass of mammals).
· RDF triple stores are more closely associated with the “semantic web” and the standardized universe of knowledge stored as RDF triples on DBpedia and other sources, whereas graph databases are seen as pragmatic rather than academic.
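Two of the differences above, the lookup cost in an edge-centric store and inference over subclass relations, can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: the data, the `objects` and `superclasses` helpers, and the in-memory sorted list are all hypothetical and do not reflect how any particular engine is implemented.

```python
from bisect import bisect_left

# Hypothetical toy data: each triple is (subject, predicate, object).
# Keeping the list sorted stands in for the index of an edge-centric store.
triples = sorted([
    ("man", "subClassOf", "human"),
    ("human", "subClassOf", "mammal"),
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
])


def objects(subject, predicate):
    """Edge-centric lookup: binary search (logarithmic) in the sorted triples."""
    i = bisect_left(triples, (subject, predicate, ""))
    found = []
    while i < len(triples) and triples[i][:2] == (subject, predicate):
        found.append(triples[i][2])
        i += 1
    return found


# Node-centric alternative: each node holds its outgoing edges directly,
# so following an edge does not require searching a global index.
adjacency = {}
for s, p, o in triples:
    adjacency.setdefault(s, {}).setdefault(p, []).append(o)


def superclasses(cls):
    """Infer all superclasses by following subClassOf edges transitively."""
    inferred = []
    frontier = objects(cls, "subClassOf")
    while frontier:
        parent = frontier.pop()
        if parent not in inferred:
            inferred.append(parent)
            frontier.extend(objects(parent, "subClassOf"))
    return inferred


print(objects("alice", "knows"))
print(adjacency["alice"]["knows"])
print(superclasses("man"))
```

The last call derives that "man" is a subclass of "mammal" even though no such triple was stored, which is the kind of inference the list attributes to RDF triple stores.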
Document type: Web page
A new version of the Linked Open Vocabularies (LOV) application has been released after substantial re-engineering. It uses MongoDB and ElasticSearch to provide fast access to the data, and NodeJS to deliver a clean, fast user interface.
The LOV project, now almost 4 years old, brings the following improvements:
- Tags for vocabularies instead of hierarchical categories (e.g. “Time”).
- Fast free-text search over 469 vocabularies, more than 46,000 terms, and 462 agents (creators, contributors, publishers).
- A set of APIs (http://lov.okfn.org/dataset/lov/api) for accessing LOV data.
- A SPARQL endpoint over the LOV records and the latest version of each vocabulary.
WebVOWL is a user-oriented web application for visualizing ontologies. It implements the Visual Notation for OWL Ontologies (VOWL), providing graphical depictions of OWL (Web Ontology Language) elements which, combined in a graph, represent the ontology.
VOWL visualizations are generated automatically from JSON files that contain the ontologies. An OWL2VOWL converter, written in Java, is provided to transform ontologies into the required JSON format.
Document type: Video
This video, a little over 5 minutes long, presents the main ideas behind the web project www.mismuseos.net, developed by RIAM I+L LAB, a Spanish technology SME working in the semantic web field.
The main goal of Mismuseos.net is to present a case of exploitation of Linked Data for the G.L.A.M. community through innovative end-user applications built on GNOSS, a semantic and social software platform. Mismuseos.net is a free access semantic online solution for end-users that allows them to find and discover museums-related content, and also reach some related external information thanks to the correlation with other datasets. We currently have collections of seven Spanish museums, where users can browse over 15,000 pieces of art and 2,650 artists. The featured applications are: faceted searches, enriched contexts and navigation through graphs. The search engine enables aggregated searches by different facets and summarization of results for each successive search.
Mismuseos.net obtains the information about cultural goods from the Europeana dataset and the online collections of public Spanish museums. It also extracts and links data from additional datasets of the Linking Open Data cloud, either to supplement information or to generate enriched contexts: DBpedia, Geonames and Didactalia (a GNOSS project with an index of more than 50,000 open educational resources).
Mismuseos.net shows a case of exploitation of Linked Data for the G.L.A.M. community through innovative end-user applications built on GNOSS, a semantic and social software platform. In more detail, the project is guided by the following goals:
- Put data to work: exploit public datasets and information on museums to generate benefits for users and improve the user’s experience thanks to the potential of the semantic web.
- Link datasets both to enrich content and generate accurate contexts of information building a cultural and educational graph.
- Connect cultural and educational worlds in a knowledge ecosystem.
SEMANTIC SOFTWARE SOLUTION: MISMUSEOS.NET
Mismuseos.net is a free access semantic online solution for end-users that allows them to find and discover museums-related content, and also reach some related external information thanks to the correlation with other datasets. Mismuseos.net structures, organizes and makes available to you, in accordance with the principles promoted by the Linked Data Project, an extensive catalog of artworks that museums publish on the Web. Moreover, it links the catalog with other existing LOD educational knowledge bases allowing the generation of educational contexts related to cultural goods.
We currently have a collection of seven Spanish museums (a meta-museum), where users can browse over 15,000 pieces of art and 2,650 artists. These are the museums included so far: Museo Bellas Artes de Bilbao, Museo Reina Sofía, Museo del Prado, Museo Sorolla, Museo de la Fundación Lázaro Galdiano, Museo del Greco and Museo de la Biblioteca Nacional (Museum of the National Spanish Library).
Datasets used: Europeana (CER.ES collection), DBpedia, Geonames and Didactalia (GNOSS)
Mismuseos.net uses several datasets and sources of information:
- Europeana dataset, specifically the data from the CER.ES collection (CER.ES is the Digital Network of Collections of Spanish Museums), and the online collections of public Spanish museums. These two datasets were used to obtain the information about cultural goods (basically pieces of art and museum information).
- DBpedia, used to supplement the information about authors and to extract the locations of authors and museums.
- Geonames, used to obtain the geolocation data of artists and museums once the names of the places have been obtained from the primary source or from DBpedia. This information will be exploited in the future to locate them on a map view.
- Didactalia, an index of over 50,000 educational resources on gnoss.com, linked to provide users with related educational content.
To sum up the process, the primary information has been enriched, cleaned and normalized when necessary, and uploaded to the project online space inside the gnoss.com platform, so that we can consume and exploit the data and present the end-user applications. We have prepared a general navigation through tabs that includes a homepage with content selection, a tab for the collection (pieces of art) and another one for artists. In the near future, we will also include a tab for museums. The previous entities (pieces of art, artists and museums) are represented on the platform with their specific ontologies thanks to the semantic CMS of GNOSS using standard vocabularies if available.
Technology and main features:
The solution has been developed on gnoss.com, a social and semantic platform with a deep focus on the generation of social knowledge ecosystems and end-user applications in a Linked Data environment. It includes faceted searches, recommendation systems and adapted contexts in education, university and enterprises. GNOSS could be conceived as a network of networks or a linked networks space oriented to using semantic technologies for data and service integration. Moreover, it has a wide range of configurable social tools, which have been mostly deactivated in the case of Mismuseos.net.
- Semantic Content Management System (SemCMS): semantic forms engine
GNOSS expresses user-generated content as structured data with default basic semantic standard vocabularies. This is done automatically when a user shares content on the platform. Besides, GNOSS has an engine for developing specific ontologies to represent knowledge objects, and, as a consequence, specific search engines if necessary. The semCMS allows uploading an OWL file describing the concepts and relations within a particular knowledge domain, and it generates a semantic form with all the classes and properties represented in the OWL file. This is the case of MisMuseos.net, which has ontologies with particular vocabularies for artworks, artists and museums. So, all the information is available in RDF files.
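The idea of deriving a form from an OWL file can be sketched as follows. This is not the GNOSS semCMS implementation; it is a minimal illustration that assumes the ontology is serialized as RDF/XML and that each property declares its class through `rdfs:domain`. The ontology fragment, the `example.org` namespace and the `form_fields` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical ontology fragment, not taken from Mismuseos.net.
OWL_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="http://example.org/art#Artwork"/>
  <owl:Class rdf:about="http://example.org/art#Artist"/>
  <owl:DatatypeProperty rdf:about="http://example.org/art#title">
    <rdfs:domain rdf:resource="http://example.org/art#Artwork"/>
  </owl:DatatypeProperty>
  <owl:ObjectProperty rdf:about="http://example.org/art#author">
    <rdfs:domain rdf:resource="http://example.org/art#Artwork"/>
  </owl:ObjectProperty>
  <owl:DatatypeProperty rdf:about="http://example.org/art#birthDate">
    <rdfs:domain rdf:resource="http://example.org/art#Artist"/>
  </owl:DatatypeProperty>
</rdf:RDF>"""


def local_name(iri):
    """Return the fragment after '#', e.g. 'Artwork'."""
    return iri.rsplit("#", 1)[-1]


def form_fields(owl_xml):
    """Map each OWL class to the list of properties whose domain is that class."""
    root = ET.fromstring(owl_xml)
    forms = {
        local_name(cls.get(f"{{{RDF}}}about")): []
        for cls in root.findall(f"{{{OWL}}}Class")
    }
    for kind in ("DatatypeProperty", "ObjectProperty"):
        for prop in root.findall(f"{{{OWL}}}{kind}"):
            domain = prop.find(f"{{{RDFS}}}domain")
            if domain is None:
                continue  # property without a declared class: skip it
            cls = local_name(domain.get(f"{{{RDF}}}resource"))
            forms.setdefault(cls, []).append(local_name(prop.get(f"{{{RDF}}}about")))
    return forms


print(form_fields(OWL_XML))
```

Each key of the resulting dictionary would correspond to one form, and each listed property to one field of that form.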
- Faceted searches
Mismuseos.net has a powerful faceted search engine that is generated from its semantic graphs (RDF triples); the search engine exploits those graphs through reasoned or inference-based searches. It provides specific configurable facets for each item type. For instance, in the case of pieces of art, users can search by facets such as collection type (sculpture, drawing, painting, etc.), museum, keywords, author, time period, art techniques, etc.
Selecting a search option filters the results in consecutive searches and therefore restricts them to a manageable number of entries. The engine also summarizes the results, so that users can better understand how they relate to all search facets. The values are recalculated for every set of results in aggregated searches. In addition, users can filter only by options that have results, which avoids incoherent search options.
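The filtering and summarization behaviour described above can be sketched in a few lines. This is an illustration only, assuming plain in-memory records rather than the RDF graphs the platform actually queries; the sample artworks, the `FACETS` tuple and the `search` function are hypothetical.

```python
from collections import Counter

# Hypothetical records; the real engine works over RDF graphs.
ARTWORKS = [
    {"title": "A", "type": "painting", "museum": "Museo del Prado"},
    {"title": "B", "type": "painting", "museum": "Museo Sorolla"},
    {"title": "C", "type": "sculpture", "museum": "Museo del Prado"},
    {"title": "D", "type": "drawing", "museum": "Museo del Greco"},
]
FACETS = ("type", "museum")


def search(items, selected):
    """Filter by the selected facet values and recount every facet.

    Counts are computed over the filtered results only, so a facet value
    with no matching items never appears as an option.
    """
    results = [
        item for item in items
        if all(item[facet] == value for facet, value in selected.items())
    ]
    summary = {facet: Counter(item[facet] for item in results) for facet in FACETS}
    return results, summary


results, summary = search(ARTWORKS, {"museum": "Museo del Prado"})
print([item["title"] for item in results])
print(dict(summary["type"]))
```

After selecting one museum, the "type" facet offers only the values still present in the result set, so "drawing" is no longer offered as an option.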
- Contexts or related information: enriched content in Mismuseos.net
In Mismuseos.net, we have set several contexts depending on the object or entity that the user is viewing, which offer dynamically generated content:
1. Contexts for the entity ‘piece of art’: related works by the same artist and artworks within the same particular time period (internal), artist information (internal); related educational resources of Didactalia (external, on gnoss.com).
2. Contexts for the entity ‘artist’: artworks of the artist (internal), contemporary artists (internal), related paper toys and educational resources of Didactalia (external).
Document type: Web page
Ludwig Wittgenstein’s 20,000 pages of manuscripts and typescripts (his ‘Nachlass’) display his continuous philosophical development and contain revisions, rearrangements and ‘multiple versioning.’ Their publication poses a number of challenges, for book as well as for digital editions (Huitfeldt 1994). Since its creation in 1990, the Wittgenstein Archives at the University of Bergen (WAB) has tried to meet these challenges through digital editorial philology. In 2000, WAB’s ‘Bergen Electronic Edition’ (BEE) of Wittgenstein’s Nachlass was published by Oxford University Press, and in 2009, 5000 pages from the Nachlass were made freely available (Open Access) on the Web ( http://www.wittgensteinsource.org/, cf. Fig. 1). Since 2001, XML-based text encoding (TEI P5) has been one of several central ingredients through which WAB has worked continuously to improve access to Wittgenstein’s manuscripts.
The OWL Working Group has published 15 documents with the Recommendation for OWL 2. The OWL 2 elements are based on XSD 1.1 (“XML Schema Definition Language (XSD) 1.1, Part 2: Datatypes”):
- OWL 2 Web Ontology Language Document Overview.
- OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax.
- OWL 2 Web Ontology Language Mapping to RDF Graphs.
- OWL 2 Web Ontology Language Direct Semantics.
- OWL 2 Web Ontology Language RDF-Based Semantics.
- OWL 2 Web Ontology Language Conformance.
- OWL 2 Web Ontology Language Profiles.
- OWL 2 Web Ontology Language New Features and Rationale.
- OWL 2 Web Ontology Language Quick Reference Guide.
- OWL 2 Web Ontology Language XML Serialization.
- rdf:PlainLiteral: A Datatype for RDF Plain Literals.
- OWL 2 Web Ontology Language Primer.
- OWL 2 Web Ontology Language Manchester Syntax. (Working Group Note)
- OWL 2 Web Ontology Language Data Range Extension: Linear Equations. (Working Group Note)
Document type: Web page
One of the challenges facing the Semantic Web for Health Care and Life Sciences is converting relational databases into Semantic Web format. The issues and steps involved in such a conversion have not been well documented. This paper describes experiences in converting SenseLab, a collection of relational (Oracle) databases for neuroscience research, into OWL. Converting these databases into RDF/OWL format is an important step toward realizing the benefits of the Semantic Web in integrative neuroscience research. The paper describes how some of the SenseLab databases were represented in the Resource Description Framework (RDF) and the Web Ontology Language (OWL), and discusses the advantages and disadvantages of these representations.
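To give a sense of what such a conversion involves, here is a minimal sketch of one generic mapping (table to class, row to resource, column to property). It is not the method used for SenseLab: the `neuron` table, its columns, the `example.org` namespace and the `table_to_triples` helper are all hypothetical, and SQLite stands in for Oracle so that the example is self-contained.

```python
import sqlite3

BASE = "http://example.org/senselab/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_to_triples(conn, table, key):
    """Emit N-Triples-style lines: one typed resource per row, one property per column.

    Assumes simple values; a real converter would also escape literals
    and attach datatypes.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}{table.capitalize()}> .")
        for column, value in record.items():
            if column == key or value is None:
                continue  # the key is already in the IRI; NULLs produce no triple
            triples.append(f'{subject} <{BASE}{table}#{column}> "{value}" .')
    return triples


# Hypothetical sample data standing in for a relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE neuron (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
conn.execute("INSERT INTO neuron VALUES (1, 'Purkinje cell', 'cerebellum')")
conn.execute("INSERT INTO neuron VALUES (2, 'Mitral cell', NULL)")

triples = table_to_triples(conn, "neuron", "id")
for line in triples:
    print(line)
```

A mapping this direct keeps the relational structure visible in the output; deciding which tables should become classes and which columns should become links between resources is where the trade-offs of such a conversion arise.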