Linked Data and the Digital Public Space

nevali
December 11, 2011

Presentation at the Kultivate Linked Data Workshop at HEFCE in London on Monday 12th December 2011. See http://pres.spindle.org.uk/2011/kultivate/ for more information.

Transcript

  1. Digital Public Space
    (The title slide's background is binary-encoded ASCII text. Decoded, it reads:)
    This document has been prepared as part of the Digital Public Space project by the BBC Archive Development team, led by Tony Ageh, Controller of Archive Development.
    The Digital Public Space project seeks to make the nation's cultural archives freely available to everyone, forever, to download, keep, share, remix and annotate. It also aims to make commercial licensing of archive material easy and straightforward. And finally, because what we consider to be “cultural heritage” is changing every day and is increasingly recognisably shaped by our individual actions, it needs to work with personal archives just as it does institutional archives.
    This will be no mean feat. None of the challenges surrounding metadata, universal access, copyright, duty-of-care, identity, media preservation, media formats, usability, nor financing are insignificant.
    However, we share the vision and sheer bloody-mindedness required to make this happen one way or another: and so we will work as hard as we can to do just that.
    The Digital Public Space project is brought to you by Tony Ageh, Alexander Baker, Jake Berger, Cathy Derrick, Ben Green, Mike Griffiths, Steve Jupe, Adam Lee, Darren Leighton, Bill Thompson, John Zubrzycki, and Kim Walsh, with special thanks to Fran Alexander, Brandon Butterworth, April Carter, Oliver Gardiner, Dirk-Willem van Gulik, Roly Keating, Jo Mason, Garrie Mallen, Ken McEnery, Richard Northover, Helen Papadopulous, Matthew Postgate, Chris Sizemore, Simon Still, Brenda Timmons, Gudrun Vickery, the team at Metabroadcast, BBC Research and Development, the UK Sound and Vision Collections Group, and everybody else who has given advice, cr…
  2. Identifiers
    • All very important, everybody knows that
    • The great thing about identifiers is that there are so many of them…
  3. URIs
    • URIs have some unique properties
    • Anybody can make them
    • They can refer to anything
    • A (significant) subset of URIs are resolvable
  4. Resolvable URIs are useful
    The Linked Data golden rule:
    • Give everything a URI and publish the data about that thing at that URI
  5. What makes a good Linked Data URI?
    • It uses http: or https: (you want things to be able to consume your data, right?)
    • It differentiates between “documents” and “things described by those documents”
    • It’s not going to change
    • It’s the same place humans go for a page about that thing
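Some of that checklist can be checked by a program, and some can't. The sketch below is a hypothetical lint function (`check_linked_data_uri` is our own name, not from any library) covering only what is visible in the URI string: an http/https scheme, a fragment that separates the thing from the document about it (the hash-URI approach this talk uses), and no file extension baked into the path. Whether a URI will change, or whether humans land in the same place, can't be tested mechanically.

```python
import posixpath
from urllib.parse import urlsplit


def check_linked_data_uri(uri: str) -> list[str]:
    """Return a list of problems found in `uri` (empty if none were found).

    Hypothetical helper: it only inspects the URI string itself.
    """
    problems = []
    parts = urlsplit(uri)
    # Criterion 1: consumers should be able to dereference it over HTTP(S).
    if parts.scheme not in ("http", "https"):
        problems.append("scheme is not http or https")
    # Criterion 2: a fragment distinguishes the thing from the document about it.
    if not parts.fragment:
        problems.append("no fragment to distinguish the thing from its document")
    # Criterion 3 (partial proxy for "not going to change"):
    # a file extension such as .html or .cgi is likely to change over time.
    _, ext = posixpath.splitext(parts.path)
    if ext:
        problems.append(f"path ends in a file extension ({ext})")
    return problems
```

For the example item URI used later in this talk, `check_linked_data_uri("http://foobarbooks.org/items/52279a4d-1707-45e7-b19d-f5571218e9dd#item")` returns an empty list, while something like `ftp://example.com/catalogue/AB12458.html` trips all three checks.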
  6. The pattern
    http://example.com/catalogue/AB12458#thing
    • http://example.com: your server
    • /catalogue/: the largest collection you have containing the item (in this case, “our catalogue of everything”)
    • AB12458: your persistent identifier
    • #thing: the fragment identifier
  7. The fragment identifier?
    • A cheap (but elegant) hack to avoid ambiguity
    • Allows differentiation between “resources” and “things described by resources”
    • Lots of common properties can be used to describe either the document or the thing
    • It doesn’t really matter what you pick as your fragment identifier
  8. Part 2: A step-by-step guide to publishing some Linked Data with just a text editor and a Web server (or: “Look Ma, no database!”)
  9. Our example scenario
    • It’s data about a book (should be familiar territory for many of you)
    • We’ve got our own internal, stable catalogue reference: in our case a UUID, but the exact nature isn’t too important
    • We’ve got a web server to publish it on
  10. What we’re asserting
    • The book’s ISBN
    • The book’s title and author
    • Some links to other sources of information about this book
  11. It looks like this:
    52279a4d-1707-45e7-b19d-f5571218e9dd
    ISBN-13: 978-1899066100
    Title: Acronyms & synonyms in medical imaging
    Author: David Allison
    More information can be found at: http://bnb.data.bl.uk/id/resource/011012558
  12. Our identifier
    • The prefix (where we’re going to put it all): http://foobarbooks.org/items/
    • Our local identifier (our catalogue reference): 52279a4d-1707-45e7-b19d-f5571218e9dd
    • The fragment (which differentiates the book itself from the document describing it): #item
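Putting the three parts together is just concatenation, but it is worth validating the local identifier first so that a malformed catalogue reference never becomes a published URI. A minimal sketch, assuming the catalogue reference is always a UUID as in this example; `make_item_uri` is a hypothetical name.

```python
import uuid

PREFIX = "http://foobarbooks.org/items/"
FRAGMENT = "item"


def make_item_uri(catalogue_ref: str, prefix: str = PREFIX,
                  fragment: str = FRAGMENT) -> tuple[str, str]:
    """Return (document_uri, thing_uri) for a catalogue reference."""
    # uuid.UUID raises ValueError for anything that isn't a valid UUID,
    # and str() normalises it to the lowercase, hyphenated form.
    local_id = str(uuid.UUID(catalogue_ref))
    document_uri = prefix + local_id            # where the description lives
    thing_uri = f"{document_uri}#{fragment}"    # the book itself
    return document_uri, thing_uri
```

With the reference from the slide, `make_item_uri("52279a4d-1707-45e7-b19d-f5571218e9dd")` gives `http://foobarbooks.org/items/52279a4d-1707-45e7-b19d-f5571218e9dd` for the document and the same URI plus `#item` for the book.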
  13. Publishing the data
    • We’ll start with the human-readable version
    • Let’s create 52279a4d-1707-45e7-b19d-f5571218e9dd.html
    • Nothing particularly out of the ordinary here!
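The slide doesn't show the HTML itself; any ordinary page will do. As an illustration, here is one way to write that file from the record on slide 11 using only the standard library. The file name comes from the slide; the page layout and the `write_item_page` helper are our own assumptions.

```python
import html
from pathlib import Path

RECORD = {
    "id": "52279a4d-1707-45e7-b19d-f5571218e9dd",
    "isbn13": "978-1899066100",
    "title": "Acronyms & synonyms in medical imaging",
    "author": "David Allison",
    "see_also": "http://bnb.data.bl.uk/id/resource/011012558",
}


def write_item_page(record: dict, docroot: Path) -> Path:
    """Write <id>.html into docroot and return its path (hypothetical helper)."""
    esc = html.escape  # escapes &, <, > and quotes
    page = f"""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>{esc(record['title'])}</title></head>
<body>
<h1>{esc(record['title'])}</h1>
<p>Author: {esc(record['author'])}</p>
<p>ISBN-13: {esc(record['isbn13'])}</p>
<p>More information: <a href="{esc(record['see_also'])}">{esc(record['see_also'])}</a></p>
</body>
</html>
"""
    path = docroot / f"{record['id']}.html"
    path.write_text(page, encoding="utf-8")
    return path
```

Escaping matters even for this tiny record: the “&” in the title has to become `&amp;` in the HTML.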
  14. What happens when you resolve the URI?
    1. You type… http://foobarbooks.org/items/52279a4d-1707-45e7-b19d-f5571218e9dd#item …into your address bar
    2. Your browser requests… /items/52279a4d-1707-45e7-b19d-f5571218e9dd …from the server at foobarbooks.org on TCP port 80 (that’s where HTTP lives). The #item fragment is never sent to the server.
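Those two steps can be reproduced with `urllib.parse`: split off the fragment (it stays in the browser), work out the host and the scheme's default port, and keep the path (plus any query string) as the request target. A minimal sketch; `request_target` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def request_target(uri: str) -> tuple[str, int, str]:
    """Return (host, port, request path) a browser would use for `uri`."""
    parts = urlsplit(uri)
    # The fragment (#item) is client-side only; it is not part of the request.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Use an explicit port if the URI has one, otherwise the scheme's default.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port, path
```

For the item URI this yields `("foobarbooks.org", 80, "/items/52279a4d-1707-45e7-b19d-f5571218e9dd")`, which is exactly the `Host` header and request line shown in the next slide.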
  15. Requesting the document
    The request looks something like this:
    GET /items/52279a4d-1707-45e7-b19d-f5571218e9dd HTTP/1.0
    Host: foobarbooks.org
    Accept: text/html;q=1.0, text/xml;q=0.7, application/xml;q=0.8
    Accept-Language: en-GB, en-US;q=0.9, en;q=0.7
    User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7_2; en-GB) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/534.51.22 Safari/533.21.1
  16. And then…?
    3. The server sends back a 404 Not Found response, because there’s no document named exactly /items/52279a4d-1707-45e7-b19d-f5571218e9dd (we only created the .html file)
    4. Oh :(
  17. Cool URIs don’t change
    • We could change our identifier to explicitly include the “.html”
    • “.cgi, or even .html, is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid.” (Tim Berners-Lee, “Cool URIs don’t change”)
    • Why not just configure the server to not need the filename extension?
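Web servers can usually do this out of the box (Apache's MultiViews option is one example), but the idea is simple enough to sketch: if no file matches the requested name exactly, look for one whose name is the requested name plus a known extension. A minimal sketch assuming all item files sit flat in one directory; `resolve_item` and `KNOWN_EXTENSIONS` are hypothetical.

```python
from pathlib import Path
from typing import Optional

KNOWN_EXTENSIONS = (".html",)  # at this stage only the HTML document exists


def resolve_item(docroot: Path, request_path: str) -> Optional[Path]:
    """Map /items/<id> to a file in docroot, or return None (i.e. 404)."""
    name = request_path.rsplit("/", 1)[-1]
    if not name or name.startswith("."):
        return None  # refuse empty names, dotfiles and ".."
    exact = docroot / name
    if exact.is_file():
        return exact
    # No exact match: try the name with each known extension appended.
    for ext in KNOWN_EXTENSIONS:
        candidate = docroot / (name + ext)
        if candidate.is_file():
            return candidate
    return None  # nothing matched: the server answers 404 Not Found
```

The published URI stays extensionless; the `.html` is purely a storage detail that can change later without breaking links.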
  18. Let’s try that request again…
    3. The server sends back the HTML document that we created.
    4. Hurrah!
  19. Big deal
    • So far, we’ve assigned a URI, created a human-readable document and published it on a Web server.
    • Welcome to 1992.
  20. Adding some structured data
    • We can describe our book using RDF
    • RDF is useful because it identifies everything using URIs, even the classes and properties
    • This means that (a) anybody can play, and (b) there’s not much risk of conflict
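To make “identifies everything using URIs” concrete, here are the assertions from slide 10 written as N-Triples: one subject, predicate, object statement per line, with every property named by a full URI. The vocabulary choices (Dublin Core for title and creator, BIBO for the ISBN, `rdfs:seeAlso` for the BNB link) are our assumption; the talk doesn't say which properties it used.

```python
THING = "http://foobarbooks.org/items/52279a4d-1707-45e7-b19d-f5571218e9dd#item"

# Property URIs: an assumed choice of widely used vocabularies.
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"  # allows a plain-text name
BIBO_ISBN13 = "http://purl.org/ontology/bibo/isbn13"
RDFS_SEEALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping \\, " and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def uri(u: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{u}>"


triples = [
    (uri(THING), uri(BIBO_ISBN13), literal("978-1899066100")),
    (uri(THING), uri(DC_TITLE), literal("Acronyms & synonyms in medical imaging")),
    (uri(THING), uri(DC_CREATOR), literal("David Allison")),
    (uri(THING), uri(RDFS_SEEALSO), uri("http://bnb.data.bl.uk/id/resource/011012558")),
]


def to_ntriples(statements) -> str:
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in statements)
```

Note that the subject is the `#item` URI, the book, not the document URI without the fragment.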
  21. Publishing our RDF
    • Start with RDF/XML
    • Many people find it ugly, but it’s the one RDF serialisation everything can process
    • So, we’ll create 52279a4d-1707-45e7-b19d-f5571218e9dd.rdf
    • This is why putting .html into our item URI would be problematic: without special knowledge, consumers would only retrieve the HTML document!
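RDF/XML can be produced with any XML library. Here is a sketch using `xml.etree.ElementTree` that builds the contents of the `.rdf` file for the book. The property namespaces (Dublin Core, BIBO, RDF Schema) are an assumed vocabulary choice, not something the talk specifies, and `item_rdfxml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
DC = "http://purl.org/dc/elements/1.1/"
BIBO = "http://purl.org/ontology/bibo/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Register prefixes so the output uses rdf:, dcterms: etc. instead of ns0:, ns1:.
for prefix, ns in {"rdf": RDF, "dcterms": DCTERMS, "dc": DC,
                   "bibo": BIBO, "rdfs": RDFS}.items():
    ET.register_namespace(prefix, ns)


def item_rdfxml(thing_uri: str, isbn13: str, title: str,
                author: str, see_also: str) -> bytes:
    """Return an RDF/XML document describing one book."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": thing_uri})
    ET.SubElement(desc, f"{{{BIBO}}}isbn13").text = isbn13
    ET.SubElement(desc, f"{{{DCTERMS}}}title").text = title
    ET.SubElement(desc, f"{{{DC}}}creator").text = author
    # A link to another resource uses rdf:resource rather than text content.
    ET.SubElement(desc, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": see_also})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Writing the returned bytes to `52279a4d-1707-45e7-b19d-f5571218e9dd.rdf` next to the HTML file gives the server its second variant.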
  22. What happens now?
    • When a human requests the item from the Web server, nothing changes. They still get the HTML.
    • When a piece of software which understands RDF/XML requests it, though…
  23. Requesting the RDF
    The request looks something like this:
    GET /items/52279a4d-1707-45e7-b19d-f5571218e9dd HTTP/1.0
    Host: foobarbooks.org
    Accept: application/rdf+xml, text/turtle, application/json
    User-Agent: MyLinkedDataProcessor/1.0
    And the response:
    HTTP/1.0 200 OK
    Content-Type: application/rdf+xml
    Content-Location: /items/52279a4d-1707-45e7-b19d-f5571218e9dd.rdf

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    :
    :
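The server-side decision behind that exchange can be sketched in a few lines: parse the `Accept` header's media ranges and q-values, then pick whichever of our two files the client rates highest. This is deliberately simplified (it handles exact types, `type/*` and `*/*`, but ignores parameters other than `q`), `negotiate` and `VARIANTS` are hypothetical names, and the `Vary` header is our addition; real servers such as Apache's mod_negotiation implement the full rules.

```python
VARIANTS = {  # media type -> file extension, in server preference order
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
}


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q-value."""
    prefs = {}
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media] = q
    return prefs


def quality(media_type: str, prefs: dict[str, float]) -> float:
    """q-value for a concrete type, falling back to type/* and then */*."""
    major = media_type.split("/")[0]
    for key in (media_type, f"{major}/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def negotiate(item_path: str, accept: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request for item_path."""
    prefs = parse_accept(accept or "*/*")
    best, best_q = None, 0.0
    for media_type in VARIANTS:  # ties go to the earlier (preferred) variant
        q = quality(media_type, prefs)
        if q > best_q:
            best, best_q = media_type, q
    if best is None:
        return 406, {}  # Not Acceptable: no variant matches
    return 200, {
        "Content-Type": best,
        "Content-Location": item_path + VARIANTS[best],
        "Vary": "Accept",  # caches must key the response on Accept
    }
```

Fed the browser's header from slide 15, this chooses the `.html` file; fed `application/rdf+xml, text/turtle, application/json`, it returns the `Content-Type` and `Content-Location` shown above.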
  24. What can you do now?
    • You could add structured data to the HTML document, which would mean some consumers don’t have to request the RDF/XML
    • You could help consumers which don’t understand content negotiation by adding a <link> element to your HTML
    • You could publish data in other formats: JSON, Turtle, CSV, MARC…
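For the second suggestion, the usual convention is a `<link rel="alternate">` element in the page's `<head>` giving the URL and media type of the RDF/XML version, so a consumer that only fetches the HTML can still find the data. A small sketch of a helper that builds it; `alternate_link` is a hypothetical name.

```python
import html


def alternate_link(href: str, media_type: str = "application/rdf+xml") -> str:
    """Build a <link rel="alternate"> element pointing at another format."""
    # Escape attribute values so characters like & or " can't break the markup.
    return (
        f'<link rel="alternate" type="{html.escape(media_type)}" '
        f'href="{html.escape(href)}">'
    )
```

For example, `alternate_link("/items/52279a4d-1707-45e7-b19d-f5571218e9dd.rdf")` returns `<link rel="alternate" type="application/rdf+xml" href="/items/52279a4d-1707-45e7-b19d-f5571218e9dd.rdf">`.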
  25. The BBC Archive:
    • 2.3m hours of film & video
    • 300,000 hours of audio
    • 4m photographs
    • 20,000 rolls of microfilm
  26. What are we doing?
    • Taking lots of that lovely Linked Data
    • Caching it (to make it easier for consuming applications)
    • Breaking it down into people, places, events, collections and… things
    • Finding the overlaps
  27. Two ways of finding overlaps
    • The easy way: explicit links between catalogues in their data (the “Linked” part of “Linked Data”)
    • The harder, but more common way: heuristic matching
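The talk doesn't describe the matching code, so the following is only an illustration of what “heuristic matching” can mean, not the BBC prototype's method: normalise two catalogue records' titles and creators, score their similarity with `difflib`, and treat pairs above a threshold as candidate overlaps. The `title`/`creator` field names and the 0.9 threshold are assumptions.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.replace("&", " and ")
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def is_candidate_match(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Heuristic: titles and creators must both be near-identical."""
    return (similarity(rec_a["title"], rec_b["title"]) >= threshold
            and similarity(rec_a["creator"], rec_b["creator"]) >= threshold)
```

Even this toy shows why the “Garbage In, Garbage Out” caveat applies: a catalogue that writes “Allison, David” instead of “David Allison” would need extra normalisation before the two records matched.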
  28. What do you get?
    • Linked Data In, Linked Data Out
    • (Sometimes) Garbage In, Garbage Out
    • A “Webby” platform for exploration of cultural archives
  29. Status
    • We’ve got a prototype aggregator
    • We built a couple of interfaces, and commissioned a third party to build another
    • We threw a small subset (500k items) at it
  30. What’s next?
    • Put a lot more data in and let the matching code do its thing
    • Make as much as we can available to as many people as possible