Chapter 2 provides some background on RDF, the semantic web, and where SPARQL fits in, but before going into that, let's start with a bit of hands-on experience writing and running SPARQL queries to keep the background part from looking too theoretical.
But first, what is SPARQL? The name is a recursive acronym for SPARQL Protocol and RDF Query Language, which is described by a set of specifications from the W3C.
Note
The W3C, or World Wide Web Consortium, is the same standards body responsible for HTML, XML, and CSS.
As you can tell from the "RQL" part of its name, SPARQL is designed to query RDF, but you're not limited to querying data stored in one of the RDF formats. Commercial and open source utilities are available to treat relational data, XML, JSON, spreadsheets, and other formats as RDF so that you can issue SPARQL queries against data in these formats, or against combinations of these sources, which is one of the most powerful aspects of the SPARQL/RDF combination.
The "Protocol" part of SPARQL's name refers to the rules for how a client program and a SPARQL processing server exchange SPARQL queries and results. These rules are specified in a separate document from the query specification document and are mostly an issue for SPARQL processor developers. You can go far with the query language without worrying about the protocol, so this book doesn't go into any detail about it.
Chapter 2 describes more about RDF and all the things that people do with it, but to summarize: RDF isn't a data format, but a data model with a choice of syntaxes for storing data files. In this data model, you express facts with three-part statements known as triples. Each triple is like a little sentence that states a fact. We call the three parts of the triple the subject, predicate, and object, but you can think of them as the identifier of the thing being described (the "resource"; RDF stands for "Resource Description Framework"), a property name, and a property value:
subject (resource identifier) | predicate (property name) | object (property value) |
---|---|---|
richard | homeTel | (229) 276-5135 |
cindy | email | cindym@gmail.com |
The ex002.ttl file below has some triples expressed using the Turtle RDF format. (We'll learn about Turtle and other formats in Chapter 2.) This file stores address book data using triples that make statements such as "richard's homeTel value is (229) 276-5135" and "cindy's email value is cindym@gmail.com." RDF has no problem with assigning multiple values for a given property to a given resource, as you can see in this file, which shows that Craig has two email addresses:
# filename: ex002.ttl

@prefix ab: <http://learningsparql.com/ns/addressbook#> .

ab:richard ab:homeTel "(229) 276-5135" .
ab:richard ab:email   "richard49@hotmail.com" .

ab:cindy ab:homeTel "(245) 646-5488" .
ab:cindy ab:email   "cindym@gmail.com" .

ab:craig ab:homeTel "(194) 966-1505" .
ab:craig ab:email   "craigellis@yahoo.com" .
ab:craig ab:email   "c.ellis@usairwaysgroup.com" .
Like a sentence written in English, Turtle (and SPARQL) triples usually end with a period. The spaces you see before the periods above are not necessary, but are a common practice to make the data easier to read. As we'll see when we learn about the use of semicolons and commas to write more concise datasets, an extra space is often added before these as well.
Tip
Comments in Turtle data and SPARQL queries begin with the hash (#) symbol. Each query and sample data file in this book begins with a comment showing the file's name so that you can easily find it in the ZIP file of the book's sample data.
The first nonblank line of the data above, after the comment about the filename, is a prefix declaration that, like a triple, ends with a period. It tells us that the prefix "ab" will stand in for the URI http://learningsparql.com/ns/addressbook#, just as an XML document might tell us with the attribute setting xmlns:ab="http://learningsparql.com/ns/addressbook#". An RDF triple's subject and predicate must each belong to a particular namespace in order to prevent confusion between similar names if we ever combine this data with other data, so we represent them with URIs. Prefixes save you the trouble of writing out the full namespace URIs over and over.
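To make the role of a prefix concrete, here is a minimal Python sketch of how a prefixed name expands to a full URI. The expand helper and the prefixes dictionary are illustrative names of my own, not part of any RDF library, and the sketch ignores details of the real Turtle grammar such as escaping.

```python
# Hypothetical helper for illustration only; real RDF tools do this for you.
prefixes = {"ab": "http://learningsparql.com/ns/addressbook#"}


def expand(prefixed_name, prefixes):
    """Replace the prefix before the first colon with its namespace URI."""
    prefix, local_name = prefixed_name.split(":", 1)
    return prefixes[prefix] + local_name


richard_uri = expand("ab:richard", prefixes)
print(richard_uri)  # http://learningsparql.com/ns/addressbook#richard
```

Because only the expanded URI matters, two documents that map different prefixes to the same namespace URI are still talking about the same resources.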
A URI is a Uniform Resource Identifier. URLs (Uniform Resource Locators), also known as web addresses, are one kind of URI. A locator helps you find something, like a web page (for example, http://www.learningsparql.com/resources/index.html), and an identifier identifies something. So, for example, the unique identifier for Richard in my address book dataset is http://learningsparql.com/ns/addressbook#richard. A URI may look like a URL, and there may actually be a web page at that address, but there might not be; its primary job is to provide a unique name for something, not to tell you about a web page where you can send your browser.
A SPARQL query typically says "I want these pieces of information from the subset of the data that meets these conditions." You describe the conditions with triple patterns, which are similar to RDF triples but may include variables to add flexibility in how they match against the data. Our first queries will have simple triple patterns, and we'll build from there to more complex ones.
The following ex003.rq file has our first SPARQL query, which we'll run against the ex002.ttl address book data shown above.
Note
The SPARQL Query Language specification recommends that files storing SPARQL queries have an extension of .rq, in lowercase.
The following query has a single triple pattern, between the curly braces, to indicate the subset of the data we want. This triple pattern ends with a period, like a Turtle triple, and has a subject of ab:craig, a predicate of ab:email, and a variable in the object position.
A variable is like a powerful wildcard. In addition to telling the query engine that triples with any value at all in that position are OK to match this triple pattern, the values that show up there get stored in the ?craigEmail variable so that we can use them elsewhere in the query:
# filename: ex003.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?craigEmail
WHERE
{ ab:craig ab:email ?craigEmail . }
This particular query asks for any ab:email values associated with the resource ab:craig. In plain English, it's asking for any email addresses associated with Craig.
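One way to picture what the processor does with this triple pattern is the following Python sketch. It is only an analogy under simplifying assumptions: the triples are plain tuples of strings copied from ex002.ttl, and the name craig_emails is my own. A real SPARQL engine works on parsed URIs and literals, not on raw strings.

```python
# The ex002.ttl triples as (subject, predicate, object) tuples of strings.
triples = [
    ("ab:richard", "ab:homeTel", "(229) 276-5135"),
    ("ab:richard", "ab:email", "richard49@hotmail.com"),
    ("ab:cindy", "ab:homeTel", "(245) 646-5488"),
    ("ab:cindy", "ab:email", "cindym@gmail.com"),
    ("ab:craig", "ab:homeTel", "(194) 966-1505"),
    ("ab:craig", "ab:email", "craigellis@yahoo.com"),
    ("ab:craig", "ab:email", "c.ellis@usairwaysgroup.com"),
]

# The subject and predicate are fixed, so they must match exactly. The object
# position plays the role of the variable: any value is accepted there, and
# each accepted value is collected.
craig_emails = [o for (s, p, o) in triples
                if s == "ab:craig" and p == "ab:email"]

print(sorted(craig_emails))
```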
Note
Spelling SPARQL query keywords such as PREFIX, SELECT, and WHERE in uppercase is only a convention. You may spell them in lowercase or in mixed case.
Tip
In a set of data triples or a set of query triple patterns, the period after the last one is optional, so the single triple pattern above doesn't really need it. Including it is a good habit, though, because adding new triple patterns after it will be simpler. In this book's examples, you will occasionally see a single triple pattern between curly braces with no period at the end.
As illustrated in Figure 1-1, a SPARQL query's WHERE clause says "pull this data out of the dataset," and the SELECT part names which parts of that pulled data you actually want to see.
What information does the query above select from the triples that match its single triple pattern? Anything that got assigned to the ?craigEmail variable.
Tip
As with any programming or query language, a variable name should give a clue about the variable's purpose. Instead of calling this variable ?craigEmail, I could have called it ?zxzwzyx, but that would make it more difficult for human readers to understand the query.
A variety of SPARQL processors are available for running queries against both local and remote data. (You will hear the terms SPARQL processor and SPARQL engine, but they mean the same thing: a program that can apply a SPARQL query against a set of data and let you know the result.) For queries against a data file on your own hard disk, the free, Java-based program ARQ makes it pretty simple. ARQ is part of the Apache Jena framework, so to get it, follow the Downloads link from ARQ's homepage at http://jena.apache.org/documentation/query and download the binary file whose name has the format apache-jena-*.zip.
Unzipping this will create a subdirectory with a name similar to the ZIP file name; this is your Jena home directory. Windows users will find arq.bat and sparql.bat scripts in a bat subdirectory of the home directory, and users with Linux-based systems will find arq and sparql shell scripts in the home directory's bin subdirectory. (The former of each pair enables the use of ARQ extensions unless you tell it otherwise. Although I don't use the extensions much, I tend to use that script simply because its name is shorter.)
On either a Windows or Linux-based system, add that script directory to your path, create an environment variable called JENA_HOME that stores the name of the Jena home directory, and you're all set to use ARQ. On either type of system, you can then run the ex003.rq query against the ex002.ttl data with the following command at your shell prompt or Windows command line:
arq --data ex002.ttl --query ex003.rq
Note
Running either ARQ script with a single parameter of --help lists all the other command-line parameters that you can use with it.
ARQ's default output format shows the name of each selected variable across the top and lines drawn around each variable's results using the hyphen, equals, and pipe symbols:
--------------------------------
| craigEmail                   |
================================
| "c.ellis@usairwaysgroup.com" |
| "craigellis@yahoo.com"       |
--------------------------------
The following revision of the ex003.rq query uses full URIs to express the subject and predicate of the query's single triple pattern instead of prefixed names. It's essentially the same query, and gets the same answer from ARQ:
# filename: ex006.rq
SELECT ?craigEmail
WHERE
{
  <http://learningsparql.com/ns/addressbook#craig>
  <http://learningsparql.com/ns/addressbook#email>
  ?craigEmail .
}
The differences between this query and the first one demonstrate two things:
You don't need to use prefixes in your query, but they can make the query more compact and easier to read than one that uses full URIs. When you do use a full URI, enclose it in angle brackets to show the processor that it's a URI.
Whitespace doesn't affect SPARQL syntax. The new query has line breaks separating the triple pattern's three parts and still works just fine.
Note
The formatting of this book's query examples follows the conventions in the SPARQL specification, which aren't particularly consistent anyway. In general, important keywords such as SELECT and WHERE go on a new line. A pair of curly braces and their contents are written on a single line if they fit there (typically, if the contents consist of a single triple pattern, like in the ex003.rq query) and are otherwise broken out with each curly brace on its own line, like in example ex006.rq.
The ARQ command above specified the data to query on the command line. SPARQL's FROM keyword lets you specify the dataset to query as part of the query itself. If you omitted the --data ex002.ttl parameter shown in that ARQ command line and used this next query, you'd get the same result, because the FROM keyword names the ex002.ttl data source right in the query:
# filename: ex007.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?craigEmail FROM <ex002.ttl>
WHERE
{ ab:craig ab:email ?craigEmail . }
(The angle brackets around "ex002.ttl" tell the SPARQL processor to treat it as a URI. Because it's just a filename and not a full URI, ARQ assumes that it's a file in the same directory as the query itself.)
Warning
If you specify one dataset to query with the FROM keyword and another when you actually call the SPARQL processor (or, as the SPARQL query specification says, "in a SPARQL protocol request"), the one specified in the protocol request overrides the one specified in the query.
The queries we've seen so far had a variable in the triple pattern's object position (the third position), but you can put variables in any or all of the three positions. For example, let's say someone called my phone from the number (229) 276-5135, and I didn't answer. I want to know who tried to call me, so I create the following query for my address book dataset, putting a variable in the subject position instead of the object position:
# filename: ex008.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?person
WHERE
{ ?person ab:homeTel "(229) 276-5135" . }
When I have ARQ run this query against the ex002.ttl address book data, it gives me this response:
--------------
| person     |
==============
| ab:richard |
--------------
Triple patterns in queries often have more than one variable. For example, I could list everything in my address book about Cindy with the following query, which has a ?propertyName variable in the predicate position and a ?propertyValue variable in the object position of its one triple pattern:
# filename: ex010.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?propertyName ?propertyValue
WHERE
{ ab:cindy ?propertyName ?propertyValue . }
The query's SELECT clause asks for values of the ?propertyName and ?propertyValue variables, and ARQ shows them as a table with a column for each one:
-------------------------------------
| propertyName | propertyValue      |
=====================================
| ab:email     | "cindym@gmail.com" |
| ab:homeTel   | "(245) 646-5488"   |
-------------------------------------
In most RDF data, the subjects of the triples won't be names that are so understandable to the human eye, like the ex002.ttl dataset's ab:richard and ab:cindy resource names. They're more likely to be identifiers assigned by some process, similar to the values a relational database assigns to a table's unique ID field. Instead of storing someone's name as part of the subject URI, as our first set of sample data did, more typical RDF triples would have subject values that make no human-readable sense outside of their important role as unique identifiers. First and last name values would then be stored using separate triples, just like the homeTel and email values were stored in the sample dataset.
Another unrealistic detail of ex002.ttl is the way that resource identifiers like ab:richard and property names like ab:homeTel come from the same namespace: in this case, the http://learningsparql.com/ns/addressbook# namespace that the ab: prefix represents. A vocabulary of property names typically has its own namespace to make it easier to use it with other sets of data.
Note
When working with RDF, a vocabulary is a set of terms stored using a standard format that people can reuse.
When we revise the sample data to use realistic resource identifiers, to store first and last names as property values, and to put the resource identifiers in their own separate http://learningsparql.com/ns/data# namespace, we get this set of sample data:
# filename: ex012.ttl

@prefix ab: <http://learningsparql.com/ns/addressbook#> .
@prefix d:  <http://learningsparql.com/ns/data#> .

d:i0432 ab:firstName "Richard" .
d:i0432 ab:lastName  "Mutt" .
d:i0432 ab:homeTel   "(229) 276-5135" .
d:i0432 ab:email     "richard49@hotmail.com" .

d:i9771 ab:firstName "Cindy" .
d:i9771 ab:lastName  "Marshall" .
d:i9771 ab:homeTel   "(245) 646-5488" .
d:i9771 ab:email     "cindym@gmail.com" .

d:i8301 ab:firstName "Craig" .
d:i8301 ab:lastName  "Ellis" .
d:i8301 ab:email     "craigellis@yahoo.com" .
d:i8301 ab:email     "c.ellis@usairwaysgroup.com" .
The query to find Craig's email addresses would then look like this:
# filename: ex013.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?craigEmail
WHERE
{
  ?person ab:firstName "Craig" .
  ?person ab:email ?craigEmail .
}
Note
Although the query uses a ?person variable, this variable isn't in the list of variables to SELECT (a list of just one variable, ?craigEmail, in this query) because we're not interested in the ?person variable's value. We're just using it to tie together the two triple patterns in the WHERE clause. If the SPARQL processor finds a triple with a predicate of ab:firstName and an object of "Craig", it will assign (or bind) the URI in the subject of that triple to the variable ?person. Then, wherever else ?person appears in the query, it will look for triples that have that URI there.
Let's say that our SPARQL processor has looked through our address book dataset triples and found a match for that first triple pattern in the query: the triple {d:i8301 ab:firstName "Craig"}. It will bind the value d:i8301 to the ?person variable, because ?person is in the subject position of that first triple pattern, just as d:i8301 is in the subject position of the triple that the processor found in the dataset to match this triple pattern.
Note
When referring to a triple in the middle of a sentence, like in the first sentence of the above paragraph, I usually wrap it in curly braces to show that the three pieces go together.
For queries like ex013.rq that have more than one triple pattern, once a query processor has found a match for one triple pattern, it moves on to the query's other triple patterns to see if they also have matches. It reports a result only if it can find a set of triples that match the set of triple patterns as a unit. This query's one remaining triple pattern has the ?person and ?craigEmail variables in the subject and object positions, but the processor won't go looking for a triple with any old value in the subject, because the ?person variable already has d:i8301 bound to it. So, it looks for a triple with that as the subject, a predicate of ab:email, and any value in the object position, because this second triple pattern introduces a new variable there: ?craigEmail. If the processor finds a triple that fits this pattern, it will bind that triple's object to the ?craigEmail variable, which is the variable that the query's SELECT clause is asking for.
As it turns out, two triples in ex012.ttl have d:i8301 as a subject and ab:email as a predicate, so the query returns two ?craigEmail values, "craigellis@yahoo.com" and "c.ellis@usairwaysgroup.com":
--------------------------------
| craigEmail                   |
================================
| "c.ellis@usairwaysgroup.com" |
| "craigellis@yahoo.com"       |
--------------------------------
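The binding process described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how ARQ is implemented: triples and patterns are tuples of strings, a term beginning with "?" is treated as a variable, and the match and solve functions are hypothetical names of my own.

```python
# A few ex012.ttl triples as (subject, predicate, object) tuples of strings.
triples = [
    ("d:i9771", "ab:firstName", "Cindy"),
    ("d:i9771", "ab:email", "cindym@gmail.com"),
    ("d:i8301", "ab:firstName", "Craig"),
    ("d:i8301", "ab:email", "craigellis@yahoo.com"),
    ("d:i8301", "ab:email", "c.ellis@usairwaysgroup.com"),
]


def match(pattern, triple, bindings):
    """Match one pattern against one triple, honoring earlier bindings.

    Returns the extended bindings, or None if the triple does not fit.
    """
    new_bindings = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if new_bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_bindings


def solve(patterns, triples, bindings=None):
    """Yield each set of bindings that satisfies all patterns as a unit."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match(first, triple, bindings)
        if extended is not None:
            yield from solve(rest, triples, extended)


# The two triple patterns of ex013.rq.
ex013 = [("?person", "ab:firstName", "Craig"),
         ("?person", "ab:email", "?craigEmail")]
emails = sorted(b["?craigEmail"] for b in solve(ex013, triples))
print(emails)
```

Once ?person is bound to d:i8301 by the first pattern, Cindy's email triple can no longer match the second pattern, which is why only Craig's two addresses come back.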
Note
A set of triple patterns between curly braces in a SPARQL query is known as a graph pattern. Graph is the technical term for a set of RDF triples. While there are utilities to turn an RDF graph into a picture, it doesn't refer to a graph in the visual sense, but as a data structure. A graph is like a tree data structure without the hierarchy: any node can connect to any other one. In an RDF graph, nodes represent subject or object resources, and the predicates are the connections between those nodes.
The ex013.rq query used the ?person variable in two different triple patterns to find connected triples in the data being queried. As queries get more complex, this technique of using a variable to connect up different triple patterns becomes more common. When you progress to querying data that comes from multiple sources, you'll find that this ability to find connections between triples from different sources is one of SPARQL's best features.
If your address book had more than one Craig, and you specifically wanted the email addresses of Craig Ellis, you would just add one more triple pattern to the graph pattern:
# filename: ex015.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?craigEmail
WHERE
{
?person ab:firstName "Craig" .
?person ab:lastName "Ellis" .
?person ab:email ?craigEmail .
}
This gives us the same answer that we saw before.
Let's say that my phone showed me that someone at "(229) 276-5135" had called me and I used the same ex008.rq query about that number that I used before, but this time, I queried the more detailed ex012.ttl data instead. The result would show me the subject of the triple that had ab:homeTel as a predicate and "(229) 276-5135" as an object, just as the query asks for:
---------------------------------------------
| person                                    |
=============================================
| <http://learningsparql.com/ns/data#i0432> |
---------------------------------------------
If I really want to know who called me, "http://learningsparql.com/ns/data#i0432" isn't a very helpful answer.
Tip
Although the ex008.rq query doesn't return a very human-readable answer from the ex012.ttl dataset, we just took a query designed around one set of data and used it with a different set that had a different structure, and we at least got a sensible answer instead of an error. This is rare among standardized query languages and one of SPARQL's great strengths: queries aren't as closely tied to specific data structures as they are with a query language like SQL.
What I want is the first and last name of the person with that phone number, so this next query asks for that:
# filename: ex017.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?first ?last
WHERE
{
  ?person ab:homeTel "(229) 276-5135" .
  ?person ab:firstName ?first .
  ?person ab:lastName ?last .
}
ARQ responds with a more readable answer:
----------------------
| first     | last   |
======================
| "Richard" | "Mutt" |
----------------------
Revising our query to find out everything about Cindy in the ex012.ttl data is similar: we ask for all the predicates and objects (stored in the ?propertyName and ?propertyValue variables) associated with the subject that has an ab:firstName of "Cindy" and an ab:lastName of "Marshall":
# filename: ex019.rq
PREFIX a: <http://learningsparql.com/ns/addressbook#>
SELECT ?propertyName ?propertyValue
WHERE
{
  ?person a:firstName "Cindy" .
  ?person a:lastName "Marshall" .
  ?person ?propertyName ?propertyValue .
}
In the response, note that the values from the ex012.ttl file's new ab:firstName and ab:lastName properties appear in the ?propertyValue column. In other words, their values got bound to the ?propertyValue variable, just like the ab:email and ab:homeTel values:
-------------------------------------
| propertyName | propertyValue      |
=====================================
| a:email      | "cindym@gmail.com" |
| a:homeTel    | "(245) 646-5488"   |
| a:lastName   | "Marshall"         |
| a:firstName  | "Cindy"            |
-------------------------------------
Note
The a: prefix used in the ex019.rq query was different from the ab: prefix used in the ex012.ttl data being queried, but ab:firstName in the data and a:firstName in this query still refer to the same thing: http://learningsparql.com/ns/addressbook#firstName. What matters are the URIs represented by the prefixes, not the prefixes themselves, and this query and this dataset happen to use different prefixes to represent the same namespace.
What if you want to check for a piece of data, but you don't even know what subject or property might have it? The following query only has one triple pattern, and all three parts are variables, so it's going to match every triple in the input dataset. It won't return them all, though, because it has something new called a FILTER that instructs the query processor to only pass along triples that meet a certain condition. In this FILTER, the condition is specified using regex(), a function that checks for strings matching a certain pattern. (We'll learn more about FILTERs in Chapter 3 and regex() in Chapter 5.) This particular call to regex() checks whether the object of each matched triple has the string "yahoo" anywhere in it, with the "i" flag making the comparison case-insensitive:
# filename: ex021.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT *
WHERE
{
?s ?p ?o .
FILTER (regex(?o, "yahoo","i"))
}
Note
It's a common SPARQL convention to use ?s as a variable name for a triple pattern subject, ?p for a predicate, and ?o for an object.
The query processor finds a single triple that has "yahoo" in its object value:
---------------------------------------------------------------------------------
| s                                         | p        | o                      |
=================================================================================
| <http://learningsparql.com/ns/data#i8301> | ab:email | "craigellis@yahoo.com" |
---------------------------------------------------------------------------------
Something else new in this query is the use of the asterisk instead of a list of specific variables in the SELECT list. This is just a shorthand way to say "SELECT all variables that get bound in this query." As you can see, the output has a column for each variable used in the WHERE clause.
Tip
This use of the asterisk in a SELECT list is handy when you're doing a few ad hoc queries to explore a dataset or trying out some ideas as you build to a more complex query.
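The FILTER step in ex021.rq can be pictured with Python's own regular expression module. This is a rough analogy only: SPARQL's regex() follows the XPath regular expression rules, which differ from Python's in some details, and the tuples below are an assumed string rendering of a few ex012.ttl triples.

```python
import re

# A few ex012.ttl triples as (subject, predicate, object) tuples of strings.
triples = [
    ("d:i0432", "ab:email", "richard49@hotmail.com"),
    ("d:i9771", "ab:email", "cindym@gmail.com"),
    ("d:i8301", "ab:firstName", "Craig"),
    ("d:i8301", "ab:email", "craigellis@yahoo.com"),
    ("d:i8301", "ab:email", "c.ellis@usairwaysgroup.com"),
]

# The pattern ?s ?p ?o matches every triple. The filter then keeps only the
# ones whose object contains "yahoo" anywhere, ignoring case like the "i" flag.
kept = [t for t in triples if re.search("yahoo", t[2], re.IGNORECASE)]
print(kept)
```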
Let's modify a copy of the ex015.rq query that asked for Craig Ellis's email addresses to also ask for his home phone number. (If you review the ex012.ttl data, you'll see that Richard and Cindy have ab:homeTel values, but not Craig.)
# filename: ex023.rq
PREFIX ab: <http://learningsparql.com/ns/addressbook#>
SELECT ?craigEmail ?homeTel
WHERE
{
  ?person ab:firstName "Craig" .
  ?person ab:lastName "Ellis" .
  ?person ab:email ?craigEmail .
  ?person ab:homeTel ?homeTel .
}
When I ask ARQ to apply this query to the ex012.ttl data, it gives me headers for the variables I asked for but no data underneath them:
------------------------
| craigEmail | homeTel |
========================
------------------------
Why? The query asked the SPARQL processor for the email address and phone number of anyone who meets the four conditions listed in the graph pattern. Even though resource d:i8301 meets the first three conditions (that is, the data has triples with d:i8301 as a subject that matched the first three triple patterns), no resource in the data meets all four conditions because no one with an ab:firstName of "Craig" and an ab:lastName of "Ellis" has an ab:homeTel value. So, the SPARQL processor didn't return any data.
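Because all four triple patterns in ex023.rq share ?person as their subject, the "all conditions must hold" rule can be sketched as a set intersection in Python. The subjects_with helper is a hypothetical name of my own, and the shortcut works only for this special case where every pattern has the same subject variable.

```python
# Richard's and Craig's ex012.ttl triples as tuples of strings.
triples = [
    ("d:i0432", "ab:firstName", "Richard"),
    ("d:i0432", "ab:lastName", "Mutt"),
    ("d:i0432", "ab:homeTel", "(229) 276-5135"),
    ("d:i0432", "ab:email", "richard49@hotmail.com"),
    ("d:i8301", "ab:firstName", "Craig"),
    ("d:i8301", "ab:lastName", "Ellis"),
    ("d:i8301", "ab:email", "craigellis@yahoo.com"),
    ("d:i8301", "ab:email", "c.ellis@usairwaysgroup.com"),
]


def subjects_with(predicate, obj=None):
    """Subjects that have the predicate, optionally with a specific object."""
    return {s for (s, p, o) in triples
            if p == predicate and (obj is None or o == obj)}


# Resources that satisfy the first three triple patterns.
first_three = (subjects_with("ab:firstName", "Craig")
               & subjects_with("ab:lastName", "Ellis")
               & subjects_with("ab:email"))

# Adding the fourth condition leaves nothing, so the query returns no rows.
all_four = first_three & subjects_with("ab:homeTel")

print(first_three)
print(all_four)
```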
In Chapter 3, we'll learn about SPARQL's OPTIONAL keyword, which lets you make requests like "Show me the ?craigEmail value and, if it's there, the ?homeTel value as well."
Querying data on your own hard drive is useful, but the real fun of SPARQL begins when you query public data sources. You need no special software, because these data collections are often made publicly available through a SPARQL endpoint, which is a web service that accepts SPARQL queries.
The most popular SPARQL endpoint is DBpedia, a collection of data from the gray infoboxes of fielded data that you often see on the right side of Wikipedia pages. Like many SPARQL endpoints, DBpedia includes a web form where you can enter a query and then explore the results, making it very easy to explore its data. DBpedia uses a program called SNORQL to accept these queries and return the answers on a web page. If you send a browser to http://dbpedia.org/snorql/, you'll see a form where you can enter a query and select the format of the results you want to see, as shown in Figure 1-2. For our experiments, we'll stick with "Browse" as our result format.
I want DBpedia to give me a list of albums produced by the hip-hop producer Timbaland and the artists who made those albums. If Wikipedia has a page for "Some Topic" at http://en.wikipedia.org/wiki/Some_Topic, the DBpedia URI to represent that resource is usually http://dbpedia.org/resource/Some_Topic. So, after finding the Wikipedia page for the producer at http://en.wikipedia.org/wiki/Timbaland, I sent a browser to http://dbpedia.org/resource/Timbaland. I found plenty of data there, so I knew that this was the right URI to represent him in queries. (The browser was actually redirected to http://dbpedia.org/page/Timbaland, because when a browser asks for the information, DBpedia redirects it to the HTML version of the data.) This URI will represent him just like http://learningsparql.com/ns/data#i8301 (or its shorter, prefixed name version, d:i8301) represents Craig Ellis in ex012.ttl.
I now see on the upper half of the SNORQL query form in Figure 1-2 that http://dbpedia.org/resource/ is already declared with a prefix of just ":", so I know that I can refer to the producer as :Timbaland in my query.
Tip
A namespace prefix can simply be a colon. This is popular for namespaces that are used often in a particular document because the reduced clutter makes it easier for human eyes to read.
The producer and musicalArtist properties that I plan to use in my query are from the http://dbpedia.org/ontology/ namespace, which is not declared on the SNORQL query input form, so I included a declaration for it in my query:
# filename: ex025.rq
PREFIX d: <http://dbpedia.org/ontology/>
SELECT ?artist ?album
WHERE
{
  ?album d:producer :Timbaland .
  ?album d:musicalArtist ?artist .
}
This query pulls out triples about albums produced by Timbaland and the artists listed for those albums, and it asks for the values that got bound to the ?artist and ?album variables. When I replace the default query on the SNORQL web page with this one and click the button, SNORQL displays the results to me underneath the query, as shown in Figure 1-3.
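A web form is only one way to reach an endpoint. As a rough sketch of what a program might send instead, the Python below builds the URL for a SPARQL protocol GET request without actually sending it. Two things here are assumptions that do not come from the text above: the endpoint address http://dbpedia.org/sparql, and the explicit declaration of the ":" prefix, which SNORQL would otherwise supply for us.

```python
from urllib.parse import urlencode

# Assumption: the address of DBpedia's SPARQL endpoint; check its documentation.
ENDPOINT = "http://dbpedia.org/sparql"

# ex025.rq with the ":" prefix declared explicitly, because a bare endpoint
# is not assumed to predeclare it the way the SNORQL form does.
query = """
PREFIX : <http://dbpedia.org/resource/>
PREFIX d: <http://dbpedia.org/ontology/>
SELECT ?artist ?album
WHERE
{
  ?album d:producer :Timbaland .
  ?album d:musicalArtist ?artist .
}
"""

# The SPARQL protocol carries the query text in a parameter named "query";
# urlencode percent-encodes characters such as "?", "#", and newlines.
request_url = ENDPOINT + "?" + urlencode({"query": query})
print(request_url)
```

Pasting the printed URL into a browser, or fetching it from a program, would submit the query; the format of the response depends on the endpoint and on the headers the client sends.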
The scroll bar on the right shows that this list of results is only the beginning of a much longer list, and even that may not be complete. Remember, Wikipedia is maintained by volunteers, and while there are some quality assurance efforts in place, they are dwarfed by the scale of the data to work with.
Also note that it didn't give us the actual names of the albums or artists, but names mixed with punctuation and various codes. Remember how :Timbaland in my query was an abbreviation of a full URI representing the producer? Names such as :Bj%C3%B6rk and :Cry_Me_a_River_%28Justin_Timberlake_song%29 in the result are abbreviations of URIs as well. These artists and songs have their own Wikipedia pages and associated data, and the associated data includes more readable versions of the names that we can ask for in a query. We'll learn about the rdfs:label property that often stores these more readable labels in Chapters 2 and 3.
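The "various codes" in those names are percent-encoded characters, a standard way of writing characters that aren't allowed directly in a URI. As a quick illustration, and not a substitute for asking for the proper labels in a query, this Python sketch decodes the two local names shown above. Turning underscores into spaces is my own guess at the naming convention.

```python
from urllib.parse import unquote

# Local names copied from the query results, without the leading colon.
names = ["Bj%C3%B6rk", "Cry_Me_a_River_%28Justin_Timberlake_song%29"]

# unquote() turns each %XX sequence back into the character it encodes,
# treating the bytes as UTF-8 by default.
decoded = [unquote(name).replace("_", " ") for name in names]
print(decoded)
```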
In this chapter, we learned:
What SPARQL is
The basics of RDF
The meaning and role of URIs
The parts of a simple SPARQL query
How to execute a SPARQL query with ARQ
How the same variable in multiple triple patterns can connect up the data in different triples
What can lead to a query returning nothing
What SPARQL endpoints are and how to query the most popular one, DBpedia
Later chapters describe how to create more complex queries, how to modify data, how to build applications around your queries, the potential role of inferencing, and the technology's roots in the semantic web world, but if you can execute the queries shown in this chapter, you're ready to put SPARQL to work for you.