If ‘semantic web’ annoys you, read on…

Say "semantic web" to a lot of people and the shutters on their brains come down. They may have lived through the disappointments of the AI or expert systems eras. Or they may simply know how impossibly tedious it would be to retrofit their web pages with semantic data.

Say "linked data" to them and they might ask "what's that?" with a reasonably open mind. At some point during the explanation, it will dawn on them that the terms are identical to those used in the semantic web. By then, of course, it's too late: they're hooked.

The basic idea is that web pages, HTML or otherwise, contain some information that links them to other web pages in a meaningful way. Nothing particularly new in that, you might say. But the meaningful bit in this context is not what the human reads – a bit of clickable text that takes you to another web page – but what a computer application can read and make sense of.

An example might be understood as: 'The prime minister is Gordon Brown'. This might be expressed as prime minister:Gordon Brown. And these elements, in turn, might point to well-defined explanations of the two concepts elsewhere on the web. In dbpedia.org/page/ the links would be Prime_minister and Gordon_Brown, respectively. Other authentic sources include Freebase, the Guardian or the New York Times. The application might drill into these pages, plucking out useful information and following other links, which would have been defined in a similar fashion.
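As a rough sketch of the idea (not how any particular publisher does it), the statement can be held as a pair of labels, each resolved to a DBpedia page address. The `dbpedia_page` helper and the variable names are made up for illustration; only the dbpedia.org/page/ prefix and the two page names come from the example above.

```python
# Prefix for DBpedia's pages, as mentioned in the text.
DBPEDIA_PAGE = "http://dbpedia.org/page/"


def dbpedia_page(label: str) -> str:
    """Hypothetical helper: turn a plain label into a DBpedia page address.

    The page names in the example use underscores instead of spaces.
    """
    return DBPEDIA_PAGE + label.strip().replace(" ", "_")


# 'The prime minister is Gordon Brown', held as two linked concepts.
statement = ("Prime minister", "Gordon Brown")
links = tuple(dbpedia_page(term) for term in statement)

for term, link in zip(statement, links):
    print(f"{term} -> {link}")
```

An application would then fetch those addresses and follow whatever further links it found there; that step is left out here because it depends on the network and on the format each source publishes.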

Of course, because this page has been published, it becomes a potential resource for others to link to. It rather depends what the page was about. The Gordon Brown entry, in this case, was just one element. It might have been 'The British Cabinet in March 2010', for example. And others might have found that information useful.

(If you want to experiment a bit, go to <sameAs> where you can whack in terms and read their definitions in plain text.)

Many public and not-so-public bodies have been making their resource or link information openly available. Friend of a Friend (or FOAF) provides a means of defining yourself. The Library of Congress has published its Subject Headings – a list of standard names which everyone may as well use to ensure consistency. But using them is not essential: you (or someone else) can always declare equivalence using a sameAs or exactMatch type of relationship. For example, 'Brown, Gordon' can be equated to 'Gordon Brown'.
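To make the equivalence idea concrete, here is a minimal sketch in plain Python. It assumes that sameAs and exactMatch statements can be read in both directions and chained together, which is a simplification of how the real vocabularies define them. The `statements` list and the `equivalents` function are invented for illustration; the names themselves come from the examples in this piece.

```python
from collections import defaultdict

# Illustrative statements: (name, relationship, name).
statements = [
    ("Brown, Gordon", "sameAs", "Gordon Brown"),
    ("Gordon Brown", "exactMatch", "Gordon_Brown"),
    ("Gordon Brown", "holds office", "Prime_minister"),  # not an equivalence
]

# Relationships treated as declaring "these are the same thing" (assumption).
EQUIVALENCE = {"sameAs", "exactMatch"}


def equivalents(name, statements):
    """Return every name reachable from `name` through equivalence statements."""
    neighbours = defaultdict(set)
    for left, relation, right in statements:
        if relation in EQUIVALENCE:
            # Record the link in both directions.
            neighbours[left].add(right)
            neighbours[right].add(left)

    seen, todo = {name}, [name]
    while todo:
        current = todo.pop()
        for other in neighbours[current] - seen:
            seen.add(other)
            todo.append(other)
    return seen


print(sorted(equivalents("Brown, Gordon", statements)))
```

Nobody had to agree on one spelling in advance: anyone can publish an extra equivalence statement later, and the lookup picks it up.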

As you rummage, you'll come across terms such as RDF, URI, graphs, triples and so on. These exist to clarify rather than confuse. The Resource Description Framework (RDF) defines how information should be expressed. Fundamentally, each item is a triple comprising a subject, a predicate (or property) and an object, as in 'Gordon Brown; is a; politician'. A uniform resource identifier (URI) might define each of those elements. And the collection of triples is referred to as an RDF graph. Of course, you'll get exceptions, and finer nuances, but that's the basic idea.
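The triple and graph vocabulary maps neatly onto ordinary data structures. The sketch below is a toy model in plain Python, not an RDF library: the `Triple` type, the `match` function and the example.org URIs are all placeholders invented for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Placeholder namespace; real data would use URIs published by real sources.
EX = "http://example.org/"

# A graph is simply a collection of triples.
graph = {
    Triple(EX + "Gordon_Brown", EX + "is_a", EX + "Politician"),
    Triple(EX + "Gordon_Brown", EX + "holds_office", EX + "Prime_minister"),
    Triple(EX + "Prime_minister", EX + "is_a", EX + "Office"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None means 'anything'."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


# Everything the graph says about Gordon Brown.
for triple in match(graph, subject=EX + "Gordon_Brown"):
    print(triple.predicate, "->", triple.obj)
```

Leaving a slot empty in the pattern is what lets an application ask open questions such as "what is this thing?" or "what else points here?".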

The point of all this is that, as with the rest of the web, it must be allowed to flourish in a decentralised and scalable way, which means without central control, although open standards are very important and make life easier for all participants.

With this general introduction, it's possible to see how data sets can be joined together without the explicit permission or participation of the providers. You could find a URI and, from that, find all the other datasets that reference it, if you wanted to. Because of the common interest, you (or your application, more like) would be able to collect further information about the subject.
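A small sketch of that joining step, again in plain Python: given several independently published sets of triples, find the ones that mention a URI and pull together what they say. The dataset names, the example.org URIs and the `gather` function are all hypothetical; real datasets would be fetched from their publishers.

```python
# Placeholder URIs standing in for identifiers that several publishers share.
PERSON = "http://example.org/Gordon_Brown"
CABINET = "http://example.org/British_Cabinet_March_2010"

# Three invented datasets, each a list of (subject, predicate, object) triples.
datasets = {
    "cabinet": [(CABINET, "has_member", PERSON)],
    "biography": [(PERSON, "is_a", "http://example.org/Politician")],
    "patents": [("http://example.org/Patent_1", "filed_in", "http://example.org/Leeds")],
}


def gather(uri, datasets):
    """Map each dataset name to the triples in it that mention `uri`.

    A triple mentions the URI if it appears as subject or object.
    Datasets with no mention are left out of the result.
    """
    found = {}
    for name, triples in datasets.items():
        hits = [t for t in triples if uri in (t[0], t[2])]
        if hits:
            found[name] = hits
    return found


for name, hits in gather(PERSON, datasets).items():
    print(name, hits)
```

The two matching datasets never referred to each other. The shared URI alone is what ties them together.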

Talis is a UK company that's deep into this stuff. It's been going for around 40 years and was originally a library services provider. It has spread its wings somewhat and now divides its attention between education, library and platform services. The platform element is the part that's deeply into linked data. It recently set up a demonstration for the Department for Business, Innovation and Skills (BIS) to show some of the potential of linked data. It takes RDF information from three sources – the Technology Strategy Board (TSB), Research Councils UK (RCUK) and the Intellectual Property Office (IPO) – and produces a heat map of activity in mainland Britain. You can see how much investment is going in, how many patents are being applied for and so on. You can zoom in to ever finer-grained detail and use a slider to see how the map changes over time. You can play with the Research Funding Explorer yourself or follow the links in this piece by Richard Wallis to see a movie.

The question in your mind must be, "All very well, but what's in it for me?" For a start, you can get hold of a lot of data which might be useful in your business – information about customers, sources of supply or geographic locations, for example. So, you may find value purely as a consumer. However, you may also be able to give value by sharing data sets or taxonomies that your company has developed. This might sound like madness, but we've already seen in the social web that people who give stuff away become magnets for inbound links and reputational gains. In this case, you could become the authoritative source for certain definitions and types of information. It all depends what sort of organisation you are and how you want to be seen by others.

2 thoughts on “If ‘semantic web’ annoys you, read on…”

  1. I’m puzzled. You begin by pointing out that semantic / ontology based approaches to managing data have failed in the market. No arguing there. But I don’t see how changing “semantic web” to “linked data” (or “web 3.0”, which was the name used the last time this tactic was tried) solves any of the problems which caused semantic web to have failed in the past. Most (if not all) of the activity in this space is academic or funded by the public service. We’re yet to see significant adoption by the general business community.
    Really, this reduces to one key point: what are we doing differently this time which will allow semantic solutions to succeed today?

  2. It was AI and Expert Systems that had ‘failed’, not the semantic web. That has neither failed nor succeeded yet. To my mind, one of the dangers the semantic web faces is the negative association that the word ‘semantic’ has in many minds. The word itself is the barrier.
    So by using the “linked data” term, you lower the expectations to realistic levels.
    The rest of the piece was about explaining the basics of the subject because I have an audience (through IT-Director and IT Analysis which mirror some of my blogs) who might find potential in linked data.
