How will we interact with the data network?

Original author: Tom Heath

The semantic web is a shared information space of linked data, intended more for machines than for people. Is that so? Yes and no. Machine-readable data with precise semantics, published on the web, together with the ability to link that data into distributed sets, are indeed the defining features of the semantic web. Together, these features make it possible to collect and combine heterogeneous data on an unprecedented scale, with machines doing the routine work for us.

However, all this is meaningless without a person to reap the benefits of the emerging opportunities. A network of machine-readable data (the semantic web, or data network) is far from removing people from the process. On the contrary, it opens up great prospects for the interaction of people and machines.

To date, the semantic web community has mainly been developing technical infrastructure to make the data network feasible in principle, and publishing sets of related data to fill it with content. If we want to make full use of the prospects and capabilities of the data network, we need to overcome this initial stage and work on understanding how the paradigm of user interaction with the network is changing.

In this article, I will discuss some aspects of how our interaction with the data network can differ from the interaction with the existing network of documents, and what this can mean for both users and creators of network content.

Semantic Web: From Vision to Reality

In 1999, Jakob Nielsen wrote about a looming crisis: the network was growing at an incredible rate, and he predicted that without more attention to the principles of user interface design it would become a useless dump of documents. Almost 10 years have passed since then, and the network is entering a new stage of its development. A data network, or semantic web, is emerging; it was foreseen more than a decade ago and is the result of many years of work on its underlying technologies. Although the two can be regarded as distinct concepts, the data network is probably best seen as another step in the development of the web we know, and not as something completely different from the existing network of hypertext documents.

Today, even growth statistics are not given in terms of pages or sites. Instead, they talk about the number of triplets placed on the data network using the Resource Description Framework (RDF) and the number of links created by triplets between different data sets.

RDF is a W3C specification for making machine-readable statements about entities. Each statement consists of three parts, the subject, the predicate and the object, and is therefore called a triplet (or triple). In most cases, the subject of a triplet is a Uniform Resource Identifier (URI), which can identify anything the data creator wants: a person, a place, a document on the network, an abstract concept, anything at all. The predicate defines the nature of the relationship between the subject and the object; it is taken from vocabularies published on the network and is itself identified by a URI. The object of an RDF triplet is usually a string literal or another URI. If the object is a URI from another namespace, that is, it identifies something in a different data set, then the triplet creates a link between these sets, joining isolated islands of data into a giant distributed store built on the architecture of the web. This is a real data network.
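To make the structure of a triplet concrete, here is a minimal sketch in plain Python. It models a triplet as a tuple of three strings and flags those whose object is a URI in a different data set from the subject. The example.org and example.com URIs are invented for illustration, and comparing host names is only a rough stand-in for "a different namespace"; a real application would use an RDF library rather than bare tuples.

```python
from urllib.parse import urlparse

# A triplet is (subject, predicate, object). The example.org / example.com
# URIs are made up; the predicates come from the FOAF vocabulary.
triples = [
    ("http://example.org/people/alice",
     "http://xmlns.com/foaf/0.1/name",
     "Alice"),                                  # object is a string literal
    ("http://example.org/people/alice",
     "http://xmlns.com/foaf/0.1/based_near",
     "http://example.com/places/london"),       # object is another URI
]


def is_uri(value):
    """Crude check (an assumption of this sketch): HTTP(S) strings are URIs."""
    return value.startswith(("http://", "https://"))


def is_cross_dataset_link(triple):
    """True if the object is a URI hosted somewhere other than the subject."""
    subject, _predicate, obj = triple
    return is_uri(obj) and urlparse(subject).netloc != urlparse(obj).netloc


links = [t for t in triples if is_cross_dataset_link(t)]
for subject, predicate, obj in links:
    print(subject, "->", obj)
```

Only the second triplet counts as a link between data sets, because its object lives under a different host than its subject.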

When the participants in the original Linking Open Data project last tried to estimate the size of the data network, their cautious figures showed that the data sets on the network contained more than two billion RDF triplets, three million of which were links between sets. The network is growing so fast that any estimate is likely to be out of date by the time it is published.

Another feature of RDF is that you can combine triplets contained in any number of documents distributed over the network. Source documents can be merged painlessly, without the resulting graph having to conform to any particular schema. One consequence is that integrating heterogeneous data becomes much less of a headache.
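The merging described here can be sketched, under simplifying assumptions, as a plain set union. The two "documents" below are hypothetical, the example.org and example.net URIs are invented, and the sketch ignores blank nodes, which a real RDF merge has to handle.

```python
# Two hypothetical documents, each contributing triplets about one entity.
ALICE = "http://example.org/people/alice"

document_a = {
    (ALICE, "http://xmlns.com/foaf/0.1/name", "Alice"),
    (ALICE, "http://xmlns.com/foaf/0.1/homepage", "http://example.org/~alice/"),
}
document_b = {
    (ALICE, "http://xmlns.com/foaf/0.1/name", "Alice"),  # repeated statement
    (ALICE, "http://xmlns.com/foaf/0.1/made", "http://example.net/papers/42"),
}

# Merging needs no shared schema: the union of two sets of triplets is
# itself a valid set of triplets, and repeated statements collapse into one.
merged = document_a | document_b


def describe(graph, subject):
    """Collect every (predicate, object) pair known about one subject."""
    return {(p, o) for s, p, o in graph if s == subject}


for predicate, obj in sorted(describe(merged, ALICE)):
    print(predicate, obj)
```

Neither document has to know about the other: the combined description of the entity falls out of the union.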

Throw out your home page!

In a network of documents, individuals and organizations often devote much attention to designing visually attractive sites that create the right impression on their target audience. But if RDF lets you combine data from multiple sources into a consistent picture of some entity, how will this affect the way we publish data on the network? It will mean that web pages, in the form we are used to, simply disappear.

Developers of Web 2.0 mashups have been demonstrating this for some time now, combining data from several different sources and presenting it in a new form that none of the original sources could offer on its own. A data network is the logical extension of this idea: it lets developers create links between the data sources on the network, so that others can use them to build large-scale specialized mashups, and it makes heterogeneous data easier to integrate.

Documents will always be useful containers for data, but in many cases, I believe, their role will be limited to that. On the semantic web, you cannot control how the information you publish will be presented: it is just data. In terms of visual design, RDF continues the long-standing principle of separating content from presentation. For some content creators this may be troubling: how do you maintain a brand with less control over presentation? For others, it is an opportunity to stop worrying about appearance and focus on publishing relevant, high-quality data, letting everyone create the presentation they want rather than settling for one chosen for them by someone else.

At the data level, creators have some influence over what their data links to, mainly by creating those links themselves and publishing them for others to use. However, in a data network nobody can control, with any degree of certainty, which sources their data ends up combined with. The result is the possibility of reusing data, and that is exactly what we want. As already described, data published in a reusable form makes it possible to create new representations whose value is greater than the sum of their parts, and which the creators of the original data could not have foreseen.

It is for these reasons that I propose abandoning home pages. Researchers know well how hard it is to bring all the pieces of their professional activity together into a single whole: projects, papers, membership of committees and editorial boards, blog entries and photo albums are scattered across isolated islands on the network, perhaps copied to a personal website or linked via hypertext, and perhaps not, given the effort involved.

A homepage on a data network can take many forms. In the simplest case, it can be just a set of RDF triplets linking together the data we want to present scattered in different places. To collect this data into a single representation suitable for human use is the work of the machine.

To back this up with action: the next time I print business cards, I will put on them not the address of my home page but my URI, confident that a person with a browser, semantic or not, can look up this URI and find what the network knows about me.

What should a semantic browser look like?

Taking these ideas further, we can see that the document containing an RDF graph mainly indicates where the data came from, and does not act as a fixed package for that data.

Much more important than the documents themselves are the entities described in them: people, places and concepts. I use the term “data network” here, but really as shorthand for “network of data about entities”, and arbitrary entities at that. We may not be able to retrieve a car over HTTP, but we can identify it with an HTTP URI and use the network to get a description of the car as RDF.

Data network browsers must operate at the entity level. Creating simple browsers to display RDF triplets and documents containing them is one of the options for people to interact with this information space. We saw a similar approach in the early browsers of the semantic web, but they probably miss the point. Viewing one page at a time, which is familiar to us from the existing network of documents, nullifies the potential of a generalized presentation of data collected from many places.

Thus, semantic web browsers should not merely display a low-level representation of the data. Instead, they should treat entities (in the broadest sense) as the basic elements of the interface. The entity in question should be the center of attention, while the browser collects and organizes the information related to it, transparently to the user.

We see hints of a similar approach in semantic browsers such as Tabulator and DBpedia Mobile, where the entity in question is in the spotlight, and specific documents only provide pieces of data that together make up the whole picture. Despite this movement in the right direction, there is still room for improvement.

Conventional browsers have largely failed to deliver on the original vision of the network as a medium for both reading and writing. Although this vision is gradually being realized through, for example, blogs, wikis, and tagging-based services like Flickr, a significant degree of indirection remains when it comes to editing network documents. In some cases, the process still involves starting an HTML editor, making the necessary changes, and using another application (such as an FTP client) to upload the modified document.

Browsers for the semantic web, which I prefer to call "entity browsers," have a chance to provide much more opportunities for direct processing in their interfaces. Different types of objects imply different types of actions, and knowing the type of object that the user is focused on will allow browsers to provide a set of actions specifically for that object, and possibly even adapt them to the context.

For example, if a user is currently viewing information about a person, the browser can offer to send that person a message, share an object with them, or make an appointment, without the description of that person having to state explicitly that any of these functions is available. Instead, the semantic web as a whole can supply the knowledge and capabilities needed to perform them: for example, a definition that describes “make an appointment” as an action that can be performed on an entity of type “person”, a definition of what a meeting consists of, or suggestions for a venue based on the relationships between the participants and the time of day.
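One way to picture type-driven actions is a lookup from entity types to the actions a browser could offer. The sketch below is purely illustrative: the `ACTIONS_BY_TYPE` registry and the action names are hypothetical, standing in for knowledge that, as argued above, would come from the semantic web itself rather than being hard-coded. Only the `rdf:type` and FOAF class URIs are real vocabulary terms.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document"

# Hypothetical registry: which actions apply to which type of entity.
ACTIONS_BY_TYPE = {
    FOAF_PERSON: ["send message", "share item", "make appointment"],
    FOAF_DOCUMENT: ["open", "cite"],
}


def available_actions(graph, entity):
    """Return the actions implied by every rdf:type stated for the entity."""
    actions = []
    for subject, predicate, obj in graph:
        if subject == entity and predicate == RDF_TYPE:
            for action in ACTIONS_BY_TYPE.get(obj, []):
                if action not in actions:
                    actions.append(action)
    return actions


# Invented example data: one person and one document.
graph = [
    ("http://example.org/people/alice", RDF_TYPE, FOAF_PERSON),
    ("http://example.org/papers/42", RDF_TYPE, FOAF_DOCUMENT),
]

print(available_actions(graph, "http://example.org/people/alice"))
```

The description of the person says nothing about messaging or appointments; the browser derives those options from the entity's type alone.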

Obviously, the data network does not let you manipulate real things, such as cars or dogs, which are not online and never will be. However, in a data network we can refer explicitly to anything, not just documents. Herein lies great potential for reducing the level of indirection in network interfaces. We no longer have to link to web pages about entities; we can link to the entities themselves.

In case there are doubts: none of this is a passing fashion, but a direction whose realization will take years and may take various forms. Speaking at the World Wide Web conference in 2007, Bill Buxton of Microsoft Research stated that the variety of “web browsers” will soon be the same as the variety of “ink browsers” (meaning paper) today, in terms of differences in form, function, location and importance. I did not get the impression that Buxton had the data network in mind when he made this statement, but it nevertheless seems plausible. A true network of entities will require a similar variety of interfaces through which we use it. The browser is just one of the approaches.

Back button for semantic web?

Accepting the transition from documents to entities, and from predefined representations to dynamically created ones, will require not only completely new interfaces but also some changes to the interaction elements we are already familiar with. If browsing is no longer just following links from one document to another, but also involves a generalized representation of data collected from various sources, then the back button in the interface will take on a slightly different meaning. The browser should return the user not to the previous document but to the previous entity in question. More importantly, the “undo” button familiar from word processors may become critical in an environment where a huge amount of data can be collected with minimal effort, but not all of it may be suitable for the current task.
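A back button that works on entities rather than documents can be sketched as a simple history stack of entity URIs. The class and method names below are hypothetical, and the URIs are invented.

```python
class EntityHistory:
    """Hypothetical browsing history that tracks entities, not documents."""

    def __init__(self):
        self._visited = []

    def visit(self, entity_uri):
        """Record that the user has moved their focus to a new entity."""
        self._visited.append(entity_uri)

    def back(self):
        """Return to the previous entity, staying put if there is none."""
        if len(self._visited) > 1:
            self._visited.pop()
        return self._visited[-1] if self._visited else None


history = EntityHistory()
history.visit("http://example.com/places/london")
history.visit("http://example.org/people/alice")
print(history.back())
```

However many documents contributed data about each entity, going back involves a single step from one entity to the one viewed before it.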

The circle of potential sources providing data on some entity will be enormous. Imagine that you entered the London URI in the address bar of your semantic web browser. All available London information on the network cannot be placed on one interface. The user must decide which sources to add depending on the current task or context, or allow the browser to make this decision for him with the ability to cancel the addition of certain sources. This functionality becomes even more important if the automatic reasoning performed on the semantic data in the network creates new knowledge that previously did not exist explicitly in any of the individual sources.
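Adding sources to a view and then cancelling one of them could look roughly like the following sketch. `EntityView` and its methods are hypothetical names, and the source and predicate URIs are invented. The view keeps each source's contribution separate so that it can be withdrawn later.

```python
class EntityView:
    """Hypothetical view of one entity, assembled from removable sources."""

    def __init__(self, entity):
        self.entity = entity
        self._sources = []  # ordered list of (source_uri, triplets)

    def add_source(self, source_uri, triples):
        """Keep only the triplets about this entity, tagged with their source."""
        relevant = {t for t in triples if t[0] == self.entity}
        self._sources.append((source_uri, relevant))

    def undo_last_source(self):
        """Withdraw the most recently added source; return its URI, if any."""
        return self._sources.pop()[0] if self._sources else None

    def triples(self):
        """The current combined description of the entity."""
        combined = set()
        for _source_uri, source_triples in self._sources:
            combined |= source_triples
        return combined


LONDON = "http://example.com/places/london"
view = EntityView(LONDON)
view.add_source("http://example.org/geo.rdf",
                [(LONDON, "http://example.org/vocab/country", "United Kingdom")])
view.add_source("http://example.net/events.rdf",
                [(LONDON, "http://example.org/vocab/hosts", "http://example.net/events/7")])
view.undo_last_source()  # the events data turned out to be irrelevant
print(len(view.triples()))
```

Keeping provenance per source is what makes the cancellation possible; a view that merged everything into one undifferentiated set could not take a single source back out.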

Managing a set of data sources is becoming an issue in its own right. When several colleagues and I evaluated demonstrations of various semantic web technologies given to the delegates of the European Semantic Web Conference in 2006, one of the main themes that came up was “integrity”. Delegates were offered various semantic web applications to use, and they expected the data to be combined and presented as a whole. For various reasons (described in other publications) this was not possible, which disappointed the delegates and left a poor impression.

The key to developing data network browsers will be search services like Sindice, which provide a way to find other RDF documents on the semantic network that mention some entity. Services of this kind can help make sure that the data received by the user is complete, that is, that they include everything that the user expects. But there is still the question of checking whether a certain representation of the data is useful.

Any system designed to integrate heterogeneous data in real time and present the result to the user will need to use complex models of relevance, quality and reliability, taking into account the current task of the user and its context. How this can be achieved is a question of the future.

IEEE Internet Computing

Original (English): How Will We Interact with the Web of Data?

Translation: daeq, dulanov, vvvolf, jupy (the choice of article for the translation and the translation itself was performed as part of the mailing group).

License: translated by the crowd