¿Mis datos en manos de terceros? Ventajas de expresar contenidos con estándares de la web semántica/ My data in third parties’ hands? Advantages of expressing content with semantic web standards

Ricardo Alonso Maturana

26/05/2010

ENGLISH VERSION/ TEXTO EN ESPAÑOL

gnoss.com es un espacio de redes sociales que pueden enlazarse Open Data, lo que significa que sus datos están disponibles, para cualquiera que lo desee, en un formato estándar. Para ello, los datos en gnoss se expresan en una clase de ficheros denominados RDF (Resource Description Framework), que son uno de los estándares de W3C para la web semántica. En nuestro caso, un RDF es un modelo semántico de descripción de un recurso concreto cuya forma viene determinada en un lenguaje de marcado o etiquetado denominado OWL (Ontology Web Language).

Una ontología concreta expresa un modo de categorizar, modelar o representar nuestro conocimiento con relación a un campo, entidad u objeto determinado. Lo normal es que las ontologías representen las entidades, que en nuestro lenguaje natural son denotadas mediante los nombres propios y comunes, y sus relaciones. Como lo hacemos en un lenguaje que puede ser "comprendido" por las máquinas, éstas pueden "entendernos" a nosotros o, visto al revés, nosotros podemos conversar con ellas utilizando nuestras capacidades de un modo natural, esto es, podemos razonar con ellas. ¡Y esto representa una gran oportunidad para todos!

El problema es que con las ontologías pasa lo mismo que con nuestras palabras: no todo el mundo les atribuye el mismo significado. En el mundo offline casi cualquier acuerdo entre personas termina derivando en una cuestión terminológica. Cuando firmamos un acuerdo o un contrato, nos tomamos mucho tiempo para especificar que significa en concreto cada palabra. De lo contrario, pueden surgir conflictos que precisan de alguien que interprete el texto desde una posición neutral, esto es, que ejerza un arbitraje. Lo mismo pasa casi con cualquier código o sistema de normas. ¡Por eso el lenguaje jurídico se parece tanto al de los informáticos, aunque ellos no lo sepan!

Con los sistemas y las máquinas sucede lo mismo: si no utilizan la misma ontología, si no atribuyen el mismo significado y relaciones a una entidad, no pueden entenderse; y, si no se entienden, no pueden comunicarse, lo que en términos un poco más técnicos quiere decir que no pueden interoperar. Imaginemos, por ejemplo, que para un sistema la ontología de "persona" (el conjunto mínimo de atributos que permite a un sistema identificar a un humano en concreto) son: nombre, apellido del padre, apellido de la madre y fecha de nacimiento. Parece una descripción muy lógica e intuitiva, pero no se entenderá con otra que la describa en estos términos: nombre, primer apellido, segundo apellido, fecha de nacimiento y mucho menos con una tercera que lo haga de este modo: first name, last name, etc... Los sistemas pueden tener sus datos abiertos, pero como no se entiendan... Para ello se necesitan acuerdos, formas normalizadas de definir una determinada entidad.

Hay muchas cosas que necesitan conocer los sistemas, especialmente en el contexto de una red social, para poder comunicarse con sentido con las personas: para ser "inteligentes"; y si, además, queremos que se entiendan e interoperen con otros sistemas, precisamos que todos ellos hablen con las mismas palabras, esto es, que utilicen las mismas ontologías. A estas ontologías sobre las que existe un acuerdo (que puede ser universal, muy amplio o...menos amplio) las denominamos vocabularios. Algunos vocabularios de carácter muy general resultan especialmente importantes. Dado que los sistemas funcionan sobre la base de documentos digitalizados y descripciones de personas, las ontologías que representan nuestra idea general de lo que es un recurso o documento digital, las que modelan la descripción de una persona y aquellas que describen un sistema de categorías o tesauro resultan especialmente importantes. Ellas representan del modo más inclusivo a casi cualquier contenido que puede encontrarse en internet y por ello hacen que las máquinas y los sistemas puedan interoperar entre sí. Por supuesto, existen muchas más ontologías y vocabularios, generalmente pertenecientes a dominios más concretos o sectoriales (como las que representan el conjunto de patologías clínicas, por ejemplo).

ONTOLOGÍA DE GNOSS.COM

La ontología de gnoss.com la hemos ido construyendo nosotros, lo que quiere decir que no nos hemos fijado en el modo en el que otras personas o grupos entendían tal o cual concepto, objeto o cosa. Responde a nuestra visión del mundo. El problema en este caso radica en que, aunque se trata de una ontología abierta, las personas que la interpreten deben asumir nuestra visión para poder expresar nuestros datos en sus páginas web. Evidentemente se trata de un gran problema. Aunque no paramos de, por así decirlo, "hablar", lo hacemos en un lenguaje privado. Esto suele ser así en los albores de una tecnología, cuando no se conocen bien ni sus límites, ni sus posibilidades.

En efecto, la web semántica es algo muy nuevo y no existían acuerdos previos sobre cómo describir un tesauro o una patología clínica, del mismo modo que durante muchos años no existía un estándar que regulara el sentido de la rosca de los tornillos. Como es sabido, la estandarización industrial corrió de la mano de una oficina de estandarización (la ISO), pues bien, la de la web corre a cargo de W3C. Somos conscientes de la importancia de trabajar con estándares ontológicos si realmente queremos no sólo que nuestros datos estén abiertos y disponibles, sino que sean de verdad enlazables desde otras aplicaciones.

Como ya hemos señalado, W3C ha avanzado en la estandarización de algunas ontologías muy generales. Como son muy generales, resultan de aprovechamiento casi universal y por tanto muy útiles para resolver problemas de interoperabilidad de muy amplio espectro. Estas ontologías se refieren a:

· El modo en el que debemos describir a una persona para que los sistemas sepan que se trata de una persona (FOAF).

· El modo en el que deben estar descritos la información y los recursos en una red social, comunidad o grupo de trabajo colaborativo para que puedan ser interpretados y mostrados desde otra; esto es, el modo en el que debemos describir la información para que las redes sociales puedan intercambiar información o interoperar semánticamente sobre la base del conocimiento o interpretación automática de la misma por parte de los sistemas (SIOC).

· El modo en el que organizamos o categorizamos la información (el modo en el que creamos tesauros o taxonomías) (SKOS).

En gnoss.com estamos migrando nuestra ontología con el fin de expresarla de acuerdo con estos estándares. Este trabajo estará finalizado para principios de junio de 2010 (en una semana aproximadamente), con lo que gnoss.com, además de ser un espacio Open Data, será un espacio de Linking Data, esto es, sus datos serán enlazables, interpretables y expresables desde cualquier web que trabaje dentro de los estándares de la web semántica. Estos estándares son los que en el corto y medio plazo se irán imponiendo para resolver los profundos problemas de aislamiento a los que nos somete el no hacerlo así. Los sistemas de salud, las administraciones públicas y las grandes corporaciones están asumiendo la necesidad de trabajar con ellos si quieren aprovechar el potencial de sus sistemas y de la relación entre ellos y las personas. Poco a poco lo irán haciendo el resto de las empresas y personas. Aparte de poder disponer de mis datos PARA SIEMPRE en forma segura, el hecho de que las máquinas puedan interpretar documentos en “modo casi humano” presenta grandes ventajas cuando de lo que se trata de buscar o rescatar la información o de descubrir relaciones ocultas en ella.

LINKED DATA vs OPEN SOURCE: POR QUÉ LAS ALTERNATIVAS BASADAS EN DATOS ABIERTOS SON SUPERIORES A LAS DE CÓDIGO ABIERTO

Jon Bishop ofrece en 9 Free Ning Alternatives And Some Open Source Solutions un resumen de las alternativas gratuitas a Ning sobre plataformas de terceros. De muchas de ellas, y de las dificultades de trasladar los contenidos a dichas plataformas ya hemos hablado. El post propone también un conjunto de soluciones Open Source (Código Abierto): BuddyPress [Message from Buddypress]; Elgg - [Message from Elgg]; Pligg; Dolphin; LovdByLess; Insoshi; Astrospaces. Algunas personas han reflexionado, como nosotros, sobre los riesgos de poner los datos en manos de terceros y han llegado a la conclusión de que la solución consiste en ser propietario de la plataforma y de su código.

Se trata de una alternativa aparentemente razonable porque evitaría esa dependencia de terceros que tantos quebraderos de cabeza nos puede llegar a dar, como se ha visto. Ahora bien, para empezar, construir una plataforma con algunas de las soluciones Open Source que existen en el mercado y que acabamos de enumerar, siempre será un trabajo y…la comunidad de desarrolladores podría abandonar en algún momento su mantenimiento. Es un riesgo, pero menor que el que supone que nuestros datos se queden en un silo del que no podamos sacarlo, pensarán algunos. Aparte del hecho de que deberemos en algún momento superar la cultura del bricolaje informático, parece necesario expresar con toda claridad algo que con frecuencia queda oculto en el debate OpenSource: el problema no está en el código, está en los datos, en poder interoperar con ellos y no simplemente en tenerlos.

Mis datos abiertos se pueden expresar en otros lugares, a través de otros ‘frames’ y, sobre todo, pueden conectarse con otros para producir una experiencia de conocimiento más expresiva, evolutiva y extensible. Porque una solución de datos abiertos tiene más extensibilidad, flexibilidad y expresividad que cualquier otra que consideremos. El código es infoestructura y por tanto tratar con él podríamos considerarlo como fontanería o bricolaje de la web. Es el equivalente a la caja de herramientas del Ford T (entonces no había muchos talleres y se asumía que el que se comprara un coche debería dedicar un buen rato a mantenerlo y, eventualmente, a repararlo). Hoy día a nadie se le pasa por la cabeza que tendrá que meter mano en el motor de su coche. Pues bien, del mismo modo que no se nos ocurre, cuando compramos un piso, picar la pared para comprobar de qué están hechas las cañerías, tampoco nos debería preocupar el código, sino sólo el hecho de que nuestros datos estén fácilmente disponibles cuando los necesite y para lo que los necesite.

Por supuesto, las soluciones menos recomendables son aquellas cuyos datos están cerrados y que, además, no son Open Source (como lamentablemente es el caso de Ning, y…de la mayor parte de las redes sociales, incluidas las muy populares); en segundo lugar, en esta lista que va de menos a más en "recomendabilidad", estarían las redes verticales construidas con Código Abierto; pero sin duda, las mejores o más recomendables serían aquellas cuyos datos están abiertos y pueden ser enlazables, esto es, las soluciones expresadas de acuerdo con los estándares de la web semántica que incluyen, a su vez, sistemas de representación del conocimiento u ontologías que son también estándar.

gnoss.com representa una solución de esa naturaleza. Es un espacio para alojar redes cuyos datos pueden conectarse con otras redes, por supuesto de las que están albergadas dentro del propio gnoss.com, pero eventualmente también con aquellas otras que estén fuera, pero que compartan la misma ontología. De las que aparecen en la gráfica de abajo, Twine, la solución conceptualmente más próxima a la nuestra, ha sido recientemente comprada por Evri y está en trance de extinción.

Conviene hacerlo notar una vez más: ¡Son los datos! Si de algo debemos preocuparnos es de ser los absolutos propietarios de los datos; una vez asegurado esto, lo demás debería darnos un poco lo mismo. Las gráficas de abajo expresan con claridad la superioridad de las soluciones Linked Open Data con relación a cualquier otra que podamos considerar y, en particular, las ventajas asociadas con nuestro proyecto.

Open Linked Data es la solución más escalable y flexible.

Y la más expresiva y extensible

Expresividad: es la medida de capacidad de un lenguaje para definir la semántica de un dominio de conocimiento, esto es, para representar conceptos y relaciones entre conceptos.
Extensibilidad: es la medida de capacidad de un lenguaje para permitir el futuro crecimiento de un sistema, es decir, la inclusión de nuevos conceptos y relaciones, y del esfuerzo requerido en el sistema para implementar la extensión.

Información relacionada:

-Si te quieres ir de Ning, deberías conocer por qué una solución Open Data puede resultar superior

-Usabilidad débil y usabilidad fuerte

Los gráficos de este post poseen Copyright de RIAM Intelearning Lab.

ENGLISH VERSION/ TEXTO EN ESPAÑOL

gnoss.com is a space of social networks that can be linked as Open Data, which means that their data are available, in a standard format, to anyone who wants them. To this end, GNOSS data are expressed in a type of file called RDF (Resource Description Framework), which is one of the W3C standards for the semantic web. In our case, an RDF file is a semantic model describing a particular resource, whose form is determined in a markup or tagging language called OWL (Web Ontology Language).
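To make the idea of "a resource described in RDF" a little more concrete, here is a minimal sketch in plain Python. It is not how gnoss.com generates its files: the example.org URIs are invented, and the choice of the Dublin Core terms vocabulary for title, creator and date is an assumption made only for illustration. It shows a resource as a set of subject, predicate, object statements, written out in the N-Triples syntax.

```python
# A resource described as (subject, predicate, object) statements.
# All example.org URIs are hypothetical; Dublin Core terms is an assumed vocabulary.

DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize triples one statement per line, each ending with ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


doc = "http://example.org/resource/42"  # hypothetical resource
triples = [
    (uri(doc), uri(DCT + "title"), literal("My data in third parties' hands?")),
    (uri(doc), uri(DCT + "creator"), uri("http://example.org/person/ricardo")),
    (uri(doc), uri(DCT + "date"), literal("2010-05-26")),
]

print(to_ntriples(triples))
```

Each printed line is one statement about the resource; any system that knows the vocabulary can read it without knowing anything about the application that produced it.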

A specific ontology expresses a way of categorizing, modeling or representing our knowledge in relation to a field, an entity or an object. Ontologies normally represent entities, which in natural language are denoted by proper and common nouns, and the relationships between them. Because we do so in a language that machines can ‘understand’, they can ‘understand’ us. Seen from the other side, we can converse with them using our abilities in a natural way, that is, we can reason with them. And this is a great opportunity for everyone!

The problem is that ontologies behave much like our words: not everyone gives them the same meaning. In the offline world, almost any agreement between people ends up turning into a question of terminology. When we sign an agreement or a contract, we take a long time to specify exactly what each word means. Otherwise, conflicts may arise that require someone to interpret the text from a neutral position, i.e. someone who acts as an arbitrator. The same thing happens with almost any code or set of rules. That is why the language of lawyers is so much like that of computer scientists, although they don’t know it!

The same thing happens with systems and machines: if they don’t use the same ontology, if they don’t give the same meaning and relations to an entity, they can’t understand each other. And if they don’t understand each other, they can’t communicate, which in somewhat more technical terms means that they can’t interoperate. Imagine, for example, that for one system the ontology of "person" (the minimum set of attributes that allows a system to identify a particular human) is: name, father’s surname, mother’s surname and date of birth. It seems a very logical and intuitive description, but that system won’t understand another one that describes a person in these terms: name, first surname, second surname, date of birth; and even less a third one that does it this way: first name, last name, etc... The systems may have their data open, but if they don’t understand each other... What is needed are agreements, standard ways of defining a particular entity.
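The mismatch described above can be sketched in a few lines of Python. Everything here is hypothetical: the three record layouts follow the example in the text, while the shared field names (`given_name`, `family_name`, `birth_date`), the mapping functions and the sample person are assumptions invented for illustration.

```python
# Three hypothetical systems describing the same person with different attributes.
system_a = {"name": "Ana", "father_surname": "García",
            "mother_surname": "López", "birth_date": "1980-01-15"}
system_b = {"name": "Ana", "first_surname": "García",
            "second_surname": "López", "birth_date": "1980-01-15"}
system_c = {"first_name": "Ana", "last_name": "García López",
            "birth_date": "1980-01-15"}


# One mapping per system into an agreed (here: invented) shared vocabulary.
def from_a(record):
    return {
        "given_name": record["name"],
        "family_name": f'{record["father_surname"]} {record["mother_surname"]}',
        "birth_date": record["birth_date"],
    }


def from_b(record):
    return {
        "given_name": record["name"],
        "family_name": f'{record["first_surname"]} {record["second_surname"]}',
        "birth_date": record["birth_date"],
    }


def from_c(record):
    return {
        "given_name": record["first_name"],
        "family_name": record["last_name"],
        "birth_date": record["birth_date"],
    }


normalized = [from_a(system_a), from_b(system_b), from_c(system_c)]

# Raw records cannot be compared: the attribute names do not match.
print(system_a == system_b)                               # False
# Once expressed in the shared vocabulary, all three describe the same person.
print(all(person == normalized[0] for person in normalized))  # True
```

The mappings are where the "agreement" lives: each system keeps its internal layout, but they can only interoperate once all of them can be expressed in the same terms.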

There are many things that systems need to know, especially in the context of a social network, to communicate meaningfully with people: to be ‘smart’. And if we also want them to understand other systems and interoperate with them, we need them all to speak with the same words, that is, to use the same ontologies. Ontologies on which there is agreement (which may be universal, very broad or… less broad) are called vocabularies. Some very general vocabularies are particularly important. Since systems operate on the basis of digital documents and descriptions of people, three kinds of ontology matter most, because they allow you to connect most of the entities that exist on the web: a) those that represent our general idea of a resource or digital document, b) those that model the description of a person and c) those that describe a system of categories or thesaurus. They represent, in the most inclusive way, almost any content that can be found on the Internet, and so they allow machines and systems to interoperate with each other. Of course, there are many other ontologies and vocabularies, usually belonging to more specific or sectoral domains (such as those representing the set of clinical pathologies, for example).

ONTOLOGY OF GNOSS.COM

We have been building the ontology of gnoss.com ourselves, which means that we haven’t looked at the way other people or groups understand this or that concept, object or thing. It reflects our view of the world. The problem here is that, although it is an open ontology, the people who interpret it must adopt our vision in order to express our data on their websites. Obviously this is a big problem. Although we never stop ‘talking’, so to speak, we do it in a private language. This is usually the case at the dawn of a technology, when neither its limits nor its possibilities are well understood.

In fact, the semantic web is something very new, and there were no previous agreements on how to describe a thesaurus or a clinical pathology, just as for many years there was no standard regulating the direction of screw threads. As is well known, industrial standardization was led by a standards body (ISO); for the web, that role falls to W3C. We understand the importance of working with standard ontologies if we really want our data to be not only open and available, but also truly linkable from other applications.

As already noted, W3C has made progress in standardizing some very general ontologies. Because they are so general, they can be used almost universally and are therefore very useful for solving interoperability problems across a very wide range. These ontologies concern:

· The way we should describe a person so that systems know that it is a person (FOAF).

· The way in which information and resources in a social network, community or collaborative workgroup must be described so that they can be interpreted and displayed from another one; in other words, the way we should describe information so that social networks can exchange it and interoperate semantically, on the basis of systems knowing or automatically interpreting that information (SIOC).

· The way we organize or categorize information (the way we create thesauri or taxonomies) (SKOS).
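As a rough illustration of how the three vocabularies fit together, the sketch below describes a person, a post and a category as plain Python tuples. The namespace URIs and the terms used (`foaf:Person`, `foaf:name`, `sioc:UserAccount`, `sioc:account_of`, `sioc:Post`, `sioc:has_creator`, `sioc:topic`, `skos:Concept`, `skos:prefLabel`, `skos:broader`) come from the published vocabularies; the example.org resources and the helper function are invented, and this is not a description of how gnoss.com models its own data. For brevity the sketch does not distinguish URIs from literal values.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical site hosting the data

graph = {
    # FOAF: who the person is
    (EX + "person/ana", RDF_TYPE, FOAF + "Person"),
    (EX + "person/ana", FOAF + "name", "Ana García"),
    # SIOC: the person's account and what it published in the community
    (EX + "account/ana", RDF_TYPE, SIOC + "UserAccount"),
    (EX + "account/ana", SIOC + "account_of", EX + "person/ana"),
    (EX + "post/1", RDF_TYPE, SIOC + "Post"),
    (EX + "post/1", SIOC + "has_creator", EX + "account/ana"),
    (EX + "post/1", SIOC + "topic", EX + "concept/semantic-web"),
    # SKOS: how the content is categorized
    (EX + "concept/semantic-web", RDF_TYPE, SKOS + "Concept"),
    (EX + "concept/semantic-web", SKOS + "prefLabel", "Semantic web"),
    (EX + "concept/semantic-web", SKOS + "broader", EX + "concept/web"),
}


def instances_of(graph, class_uri):
    """Return every subject declared to be of the given class, sorted."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == class_uri)


print(instances_of(graph, FOAF + "Person"))
print(instances_of(graph, SIOC + "Post"))
print(instances_of(graph, SKOS + "Concept"))
```

Because the class and property names are shared, a consumer that has never seen this site can still ask "which things here are people?" and get a meaningful answer.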

At gnoss.com we are migrating our ontology in order to express it in accordance with these standards. This work will be completed in early June 2010. Then gnoss.com, besides being an Open Data space, will become a Linking Data space, that is, its data will be linkable, interpretable and expressible from any website that works within the standards of the semantic web. These are the standards that will gradually prevail in the short and medium term, to solve the deep problems of isolation that come from not working this way. Health systems, public administrations and large corporations are accepting the need to work with them if they want to exploit the potential of their systems and of the relationship between those systems and people. Other companies and individuals will follow little by little. Apart from having my data available FOREVER and securely, the fact that machines can interpret documents in an ‘almost human way’ brings great advantages when it comes to searching for or retrieving information, or discovering hidden relationships in it.

LINKED DATA vs OPEN SOURCE: WHY THE ALTERNATIVES BASED ON OPEN DATA ARE SUPERIOR TO THE OPEN SOURCE ONES

In 9 Free Ning Alternatives And Some Open Source Solutions, Jon Bishop summarizes the free alternatives to Ning on third-party platforms. We have already discussed many of them, and the difficulties of moving content to such platforms. The post also proposes a set of Open Source solutions: BuddyPress [Message from Buddypress]; Elgg - [Message from Elgg]; Pligg; Dolphin; LovdByLess; Insoshi; Astrospaces. Some people have reflected, as we have, on the risks of putting data in third parties’ hands and have concluded that the solution is to own the platform and its code.

This seems a reasonable alternative because it would avoid the dependence on third parties that can give us so many headaches, as we have seen. But, to start with, building a platform with any of the Open Source solutions just listed will always mean work and… the developer community might abandon its maintenance at some point. It is a risk, but a smaller one than having our data stuck in a silo from which we cannot get them out, some will think. Apart from the fact that at some point we will have to get over the culture of computer DIY, it seems necessary to state clearly something that often stays hidden in the Open Source debate: the problem is not in the code but in the data, in being able to interoperate with them and not simply in having them.

My open data can be expressed in other places, through other ‘frames’ and, above all, they can connect with other data to produce a more expressive, evolving and extensible knowledge experience. This is because an open data solution has more extensibility, flexibility and expressiveness than any other we might consider. Code is ‘infostructure’, and so dealing with it can be regarded as the plumbing or DIY of the web. It is the equivalent of the Ford Model T toolbox (back then there were few garages, and it was assumed that whoever bought a car would spend a good deal of time maintaining it and possibly repairing it). Today nobody expects to have to tinker with their car’s engine. Well, just as it would not occur to us, on buying an apartment, to chip at the wall to check what the pipes are made of, we should not worry about the code either, but only about our data being readily available when we need them and for whatever we need them.

Of course, the least advisable solutions are those whose data are closed and which, moreover, are not Open Source (as is unfortunately the case with Ning and… with most social networks, including the very popular ones). Second in this list, which runs from least to most advisable, would be vertical networks built with Open Source. But without any doubt, the best or most advisable are those whose data are open and can be linked, i.e. solutions expressed according to the semantic web standards which include, in turn, knowledge representation systems or ontologies that are also standard.

gnoss.com is a solution of this kind. It is a space for hosting networks whose data can connect with other networks: those hosted within gnoss.com itself, of course, but potentially also those outside it that share the same ontology. Of those that appear in the chart below, Twine, the solution conceptually closest to ours, was recently bought by Evri and is on its way to extinction.
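What "connecting the data of two networks that share an ontology" can mean in practice is sketched below, under stated assumptions: the two networks, their resources and the query helper are invented, and `foaf:name`, `foaf:maker` and the Dublin Core `title` term are used only as an example of a shared vocabulary. Neither data set can answer the question on its own; their union can, because both refer to the same person by the same URI and use the same property names.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"

ANA = "http://example.org/person/ana"              # hypothetical shared identifier
POST = "http://other.example.net/post/7"           # hypothetical post on another network

# Network A knows who the person is; network B knows what she wrote.
network_a = {(ANA, FOAF + "name", "Ana García")}
network_b = {
    (POST, DCT + "title", "Linked data notes"),
    (POST, FOAF + "maker", ANA),
}

# Linking is just a set union: same URIs and same vocabulary, so statements line up.
merged = network_a | network_b


def titles_by_author_name(graph, name):
    """Titles of resources whose maker has the given name."""
    people = {s for s, p, o in graph if p == FOAF + "name" and o == name}
    works = {s for s, p, o in graph if p == FOAF + "maker" and o in people}
    return sorted(o for s, p, o in graph if p == DCT + "title" and s in works)


print(titles_by_author_name(network_b, "Ana García"))  # nothing: B does not know her name
print(titles_by_author_name(merged, "Ana García"))     # found once the data are linked
```

No code from either network had to be shared or understood for this to work, which is the point the text makes about data versus code.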

It is worth noting once again: it’s the data! If there is one thing we should care about, it is being the absolute owners of our data; once that is assured, the rest should matter rather little to us. The charts below clearly show the superiority of Linked Open Data solutions over any other we might consider and, in particular, the advantages associated with our project.

Open Linked Data is the most scalable and flexible solution.

And the most expressive and extensible one.

Expressiveness: the measure of the capacity of a language to define the semantics of a knowledge domain, that is, to represent concepts and the relationships between concepts.
Extensibility: the measure of the capacity of a language to allow the future growth of a system, that is, the inclusion of new concepts and relationships, and of the effort required in the system to implement the extension.

Related information:

- If you want to leave Ning, you should know why an Open Data solution can be superior.

- Weak usability and strong usability.

The graphics in this post are copyright of RIAM Intelearning Lab.

Editors:

Ricardo Alonso Maturana
Editores Watermelon

Authors:

Ricardo Alonso Maturana

Categories:

Grafos de conocimiento: Web semántica
