Tripletest Report

1. Introduction

The tests in this report measure memory utilization and speed for adds, deletes, and queries on Sesame- and Kowari-based triplestores.

A. Data Sources

Two large datasets were used for these tests.

The first dataset, consisting of about 500,000 triples, was the Wordnet RDF Representation from semanticweb.org. It represents definitions of, and relationships between, concepts in the English language. The RDF schema for this dataset is available at http://www.semanticweb.org/library/wordnet/wordnet-20000620.rdfs.

The second dataset, consisting of about 20,000,000 triples, was derived from several plaintext files available at http://imdb.com/interfaces/. The entire dataset (110MB zipped) is available at http://perseus.lib.virginia.edu:8080/cc/imdbrdf.zip. The graph structure is described here.

B. Hardware and Software

The Wordnet tests were run on a Windows XP machine with an AMD Athlon XP 2800 processor. The IMDB tests were run on a Linux machine with four 3.20 GHz Intel Xeon processors. All tests used Sun's Java 1.4.2 JRE with default JVM settings.

The tests were performed using Tripletest, a Java utility that uses Trippi v0.9.4 to provide a uniform add/query/delete interface to the triplestores being tested.

Triplestore Test Configurations:

Kowari v1.0.5 - Running locally, using a triple buffer of size 30,000 for adds/deletes.
Sesame v1.2RC2 - Running locally, using a triple buffer of size 500 for adds/deletes. Both the native and the RDBMS sails were tested. For the RDBMS sail, a local instance of MySQL 4.0.21-nt (with no special configuration) was used.

Inference capabilities were not tested.

C. Understanding the Graphs

Some graphs are plotted with logarithmic coordinates for readability. This is indicated by the scale on the graph and, where applicable, by the accompanying text.

Query result graphs convey a lot of information, so they need some explanation. Memory utilization is sampled at regular time intervals while a query's result set is retrieved. Samples taken before the first result is returned are plotted with decorated lines; samples taken afterward are plotted with undecorated lines. The X coordinates therefore show how long each query took (1) to return its first result and (2) to complete.
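
The measurement described above can be sketched in Java as follows. This is a minimal, hypothetical illustration (the QueryTimer class and its fields are not part of Tripletest or Trippi): a background thread samples used heap (Runtime.totalMemory() minus Runtime.freeMemory()) at a fixed interval while the calling thread consumes a result iterator, and the code records when the first result arrives and when iteration completes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: sample heap use at fixed intervals while a query's results are consumed. */
class QueryTimer {
    /** One memory sample, flagged if it was taken before the first result arrived. */
    static final class Sample {
        final long elapsedMs;
        final long usedBytes;
        final boolean beforeFirstResult; // corresponds to the "decorated" part of a graph
        Sample(long elapsedMs, long usedBytes, boolean beforeFirstResult) {
            this.elapsedMs = elapsedMs;
            this.usedBytes = usedBytes;
            this.beforeFirstResult = beforeFirstResult;
        }
    }

    final List<Sample> samples = new ArrayList<>();
    long firstResultMs = -1; // time until the first result; stays -1 if there were none
    long completeMs = -1;    // time until the result iterator was exhausted
    int resultCount = 0;
    private volatile boolean gotFirst = false;

    void run(Iterator<?> results, long intervalMs) {
        final long start = System.nanoTime();
        final Runtime rt = Runtime.getRuntime();
        Thread sampler = new Thread(() -> {
            do { // always take at least one sample
                long used = rt.totalMemory() - rt.freeMemory();
                long elapsed = (System.nanoTime() - start) / 1_000_000;
                synchronized (samples) {
                    samples.add(new Sample(elapsed, used, !gotFirst));
                }
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return; // interrupted because the query finished
                }
            } while (!Thread.currentThread().isInterrupted());
        });
        sampler.start();
        while (results.hasNext()) {
            results.next();
            resultCount++;
            if (!gotFirst) {
                firstResultMs = (System.nanoTime() - start) / 1_000_000;
                gotFirst = true;
            }
        }
        completeMs = (System.nanoTime() - start) / 1_000_000;
        sampler.interrupt(); // stop sampling
        try {
            sampler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Plotting usedBytes against elapsedMs, with beforeFirstResult choosing the line style, gives graphs of the kind described above. Used-heap figures from Runtime include garbage that has not yet been collected, so they are approximate.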

2. Wordnet 20k Test

The first test was intended to quickly determine how the Sesame native sail's add/delete speed compared to that of the Sesame RDBMS (MySQL) sail.

Sesame-native generally outperformed Sesame-MySQL in this test. To find out why Sesame-MySQL performed so poorly on deletes, I logged the database queries issued by the RDBMS sail and found that it was constantly sending "OPTIMIZE TABLE" requests to MySQL. Since this can be an expensive operation, I temporarily disabled these requests, but performance became worse as a result. A quick look at the sail's source code indicates that it uses several temporary tables during the add and delete process; why they are necessary here is unclear. TODO: Ask the Sesame developers about this.

A. Adds

For adds, the native sail was about 5x faster.

Memory usage while adding was roughly equivalent.

B. Deletes

Sesame-MySQL took an inordinate amount of time to delete the 20k triples; its rate finally fell to 11 seconds per 10,000 triples while it deleted the last chunk of 500. Sesame-native stayed fairly constant at 1 to 2 seconds per 10,000 triples.
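
Delete rates here are normalized to seconds per 10,000 triples, so a rate can be converted back into the time spent on a single chunk. As a worked example, assuming the rate r for a chunk is its elapsed time t_chunk scaled from its size n_chunk to 10,000 triples, the final 500-triple Sesame-MySQL chunk at 11 seconds per 10,000 took about 0.55 seconds, while Sesame-native's 1 to 2 seconds per 10,000 corresponds to roughly 0.05 to 0.1 seconds per 500-triple chunk:

```latex
r = \frac{t_{\text{chunk}}}{n_{\text{chunk}}} \times 10{,}000
\quad\Longrightarrow\quad
t_{\text{chunk}} = \frac{r \, n_{\text{chunk}}}{10{,}000}
= \frac{11\ \text{s} \times 500}{10{,}000} = 0.55\ \text{s}
```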

Memory usage while deleting followed a similar pattern: Sesame-MySQL required more memory at first and then tapered off, while Sesame-native remained low.

3. Wordnet 500k Test

This test compares Sesame-native and Sesame-MySQL with Kowari for add and query performance. Note that Kowari has fewer add/delete datapoints than Sesame because of the different triple buffer sizes: Kowari performs better with a large buffer (30,000 triples), whereas Sesame performs better with a smaller one (500 triples).
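
The link between buffer size and the number of datapoints can be illustrated with a small, hypothetical Java sketch. The TripleBuffer class and its Sink interface are illustrations only, not Trippi's API. Triples are queued until the buffer holds bufferSize items and are then written to the store in one batch. Assuming one timing datapoint is recorded per flush, a 30,000-triple buffer yields far fewer points than a 500-triple buffer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a fixed-size triple buffer that writes to a store in batches. */
class TripleBuffer<T> {
    /** Stand-in for a triplestore's batch add or delete call (hypothetical). */
    interface Sink<E> {
        void write(List<E> batch);
    }

    private final int bufferSize;
    private final Sink<T> sink;
    private final List<T> pending = new ArrayList<>();
    private int flushCount = 0;

    TripleBuffer(int bufferSize, Sink<T> sink) {
        if (bufferSize <= 0) throw new IllegalArgumentException("bufferSize must be positive");
        this.bufferSize = bufferSize;
        this.sink = sink;
    }

    /** Queues one triple, flushing automatically when the buffer is full. */
    void add(T triple) {
        pending.add(triple);
        if (pending.size() >= bufferSize) flush();
    }

    /** Sends any queued triples to the sink as one batch. */
    void flush() {
        if (pending.isEmpty()) return;
        sink.write(new ArrayList<>(pending)); // copy so the sink may keep the batch
        pending.clear();
        flushCount++; // one datapoint per flush, under the stated assumption
    }

    int flushCount() {
        return flushCount;
    }

    /** Number of batches needed to write the given number of triples. */
    static long batchesFor(long triples, int bufferSize) {
        return (triples + bufferSize - 1) / bufferSize; // ceiling division
    }
}
```

For a load of roughly 500,000 triples, batchesFor(500_000, 30_000) gives 17 batches for the Kowari configuration, while batchesFor(500_000, 500) gives 1,000 for the Sesame configuration.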

A. Adds

Kowari is a bit slower to add triples than Sesame-native, but faster than Sesame-MySQL.

Kowari requires more memory, which is expected given its larger update buffer. Memory utilization is not alarming for any of the stores.

B. Queries

Query 1

Sesame (SPO) Query:
* * *

Kowari (SPO) Query:
* * *

Kowari and Sesame-native both completed successfully, but Sesame-MySQL failed with an out-of-memory error after 22 seconds. This resembles the result seen with Jena in a previous test and, similarly, seems to be caused by not using streaming ResultSets from MySQL. See Sesame Bug #59.
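
For reference, MySQL Connector/J reads an entire ResultSet into client memory by default. Its documentation describes streaming rows one at a time by creating a forward-only, read-only statement and setting the fetch size to Integer.MIN_VALUE. The sketch below shows that pattern. It is not taken from Sesame's RDBMS sail, whether it would resolve Sesame Bug #59 is untested, and the class name StreamingStatements is hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper showing the Connector/J streaming-result pattern. */
final class StreamingStatements {
    private StreamingStatements() {}

    /**
     * Creates a statement whose results Connector/J streams row by row
     * instead of loading the whole result set into the JVM heap.
     */
    static Statement createStreaming(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,  // streaming needs a forward-only cursor
                ResultSet.CONCUR_READ_ONLY);  // and a read-only result set
        stmt.setFetchSize(Integer.MIN_VALUE); // Connector/J's signal to stream rows
        return stmt;
    }
}
```

Rows are then read with executeQuery and next() as usual. Per the Connector/J documentation, all rows of a streaming result set must be read, or the set closed, before other queries are issued on the same connection.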

Query 2

Sesame (SeRQL) Query:
select word 
from   {wordnet:102669463} schema:wordForm {word}

Kowari (iTQL) Query:
select $word
from   <#test>
where  <wordnet:102669463> <schema:wordForm> $word

Query 3

Sesame (SeRQL) Query:
select word, definition 
from   {myConcept}    schema:wordForm      {"happy"}; 
                      schema:similarTo     {thatConcept}, 
       {thatConcept}  schema:wordForm      {word};
                      schema:glossaryEntry {definition} 

Kowari (iTQL) Query:
select $word $definition 
from   <#test>
where  $myConcept   <schema:wordForm>      'happy' 
and    $myConcept   <schema:similarTo>     $thatConcept 
and    $thatConcept <schema:glossaryEntry> $definition 
and    $thatConcept <schema:wordForm>      $word
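
Both forms of Query 3 describe the same conjunction of triple patterns. In the SeRQL path expression, ";" attaches another property to the same subject and "," starts a new path; in iTQL the patterns are simply joined with "and". The hypothetical Java sketch below (the PatternJoin class is an illustration, not code from Sesame or Kowari, which use indexes rather than scanning every triple for each pattern) evaluates such a conjunction as a nested-loop join: each pattern extends the variable bindings that survived the previous patterns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: evaluate a conjunction of triple patterns by nested-loop join. */
class PatternJoin {
    /** Terms starting with '$' are variables; anything else must match exactly. */
    static boolean isVar(String term) {
        return term.startsWith("$");
    }

    /** Tries to extend a binding so that one term matches one value; false on conflict. */
    static boolean bind(Map<String, String> b, String term, String value) {
        if (!isVar(term)) return term.equals(value);
        String bound = b.get(term);
        if (bound == null) {
            b.put(term, value);
            return true;
        }
        return bound.equals(value);
    }

    /** Returns every variable binding that satisfies all patterns against the triples. */
    static List<Map<String, String>> solve(List<String[]> triples, List<String[]> patterns) {
        List<Map<String, String>> results = new ArrayList<>();
        results.add(new HashMap<>()); // start from one empty binding
        for (String[] p : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : results) {
                for (String[] t : triples) {
                    Map<String, String> ext = new HashMap<>(b); // fresh copy per candidate
                    if (bind(ext, p[0], t[0]) && bind(ext, p[1], t[1]) && bind(ext, p[2], t[2])) {
                        next.add(ext);
                    }
                }
            }
            results = next; // only bindings that also satisfy this pattern survive
        }
        return results;
    }
}
```

With a few hypothetical Wordnet-style triples (URIs abbreviated to plain strings), the four Query 3 patterns bind $word and $definition for each concept similar to one with the word form "happy".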

Query 4

Sesame (SeRQL) Query:
select distinct word, superTypeA, superTypeB, superTypeC, superTypeD 
from   {concept}   rdf:type         {schema:Verb}; 
                   schema:hyponymOf {h1}; 
                   schema:wordForm  {word}, 
       {h1}        schema:hyponymOf {h2}; 
                   schema:wordForm  {superTypeA}, 
       {h2}        schema:hyponymOf {h3}; 
                   schema:wordForm  {superTypeB}, 
       {h3}        schema:hyponymOf {h4}; 
                   schema:wordForm  {superTypeC}, 
       {h4}        schema:wordForm  {superTypeD} 

Kowari (iTQL) Query:
select $word $superTypeA $superTypeB $superTypeC $superTypeD 
from   <#test>
where  $concept <rdf:type>          <schema:Verb>
and    $concept <schema:hyponymOf> $h1 
and    $concept <schema:wordForm>  $word 
and    $h1      <schema:hyponymOf> $h2 
and    $h1      <schema:wordForm>  $superTypeA 
and    $h2      <schema:hyponymOf> $h3 
and    $h2      <schema:wordForm>  $superTypeB 
and    $h3      <schema:hyponymOf> $h4 
and    $h3      <schema:wordForm>  $superTypeC 
and    $h4      <schema:wordForm>  $superTypeD

Kowari and Sesame-native both performed well on this query. Note again how Sesame-MySQL uses most of its memory before the first result is returned.

4. IMDB 20M Test

Since Sesame-MySQL had been ruled out as an option for a large-scale triplestore, this test was intended to compare Sesame-native with Kowari. Unfortunately, the test ran out of disk space when the Kowari load of 20M triples was 99% complete (after about 24 hours), so only add performance up to that point could be compared. Query performance is compared in the next (smaller) test.

A. Adds

Kowari's add performance degrades linearly, with a shallow slope, as the number of triples in the store grows. Sesame-native is much faster overall, but shows an interesting pattern of spikes that double in size while occurring at decreasing frequency. The cause of these spikes is unknown.

Again, these differences are explained by differing buffer sizes.

5. IMDB 7M Test

This test compares Sesame-native to Kowari for query performance with a relatively large number of triples. It uses the productions.rdfxml and actresses.rdfxml files from the IMDB dataset.

A. Queries

Query 1

Sesame (SPO) Query:
* * *

Kowari (SPO) Query:
* * *

Kowari was over 2x faster for this query.

Query 2

Sesame (SeRQL) Query:
select movieName 
from   {actress}    imdb:name         {"Zuniga, Daphne"};
                    imdb:playedRole   {role}, 
       {role}       imdb:inProduction {production}, 
       {production} rdf:type          {imdb:Movie};
                    imdb:name         {movieName}

Kowari (iTQL) Query:
select $movieName 
from   <#test>
where  $actress    <imdb:name>         'Zuniga, Daphne' 
and    $actress    <imdb:playedRole>   $role 
and    $role       <imdb:inProduction> $production 
and    $production <rdf:type>          <imdb:Movie>
and    $production <imdb:name>         $movieName

Kowari was about 54x faster for this query.

Note that the X scale above is logarithmic for readability.

Query 3

Sesame (SeRQL) Query:
select actressName, characterName, billingPosition 
from   {actress}    imdb:name            {actressName};
                    imdb:playedRole      {role}, 
       {role}       imdb:characterName   {characterName};
                    imdb:billingPosition {billingPosition};
                    imdb:inProduction    {production}, 
       {production} imdb:name            {"Shrek 2 (2004)"} 
where  billingPosition < "10"^^xsd:int

Kowari (iTQL) Query:
select $actressName $characterName $billingPosition 
from   <#test>
where  $actress         <imdb:name>            $actressName 
and    $actress         <imdb:playedRole>      $role 
and    $role            <imdb:characterName>   $characterName 
and    $role            <imdb:billingPosition> $billingPosition 
and    $role            <imdb:inProduction>    $production 
and    $production      <imdb:name>            'Shrek 2 (2004)' 
and    $billingPosition <tucana:lt>            '10' in <#xsd>

Kowari failed this query, apparently because it does not support xsd:int comparisons. Future tests will use xsd:double, which both Kowari and Sesame support.

Query 4

Sesame (SeRQL) Query:
select char, title 
from   {role} imdb:billingPosition {"151"^^xsd:int}; 
              imdb:characterName   {char};
              imdb:inProduction    {prod}, 
       {prod} imdb:name            {title} 

Kowari (iTQL) Query:
select $char $title 
from   <#test>
where  $role <imdb:billingPosition> '151'^^xsd:int 
and    $role <imdb:characterName>   $char 
and    $role <imdb:inProduction>    $prod 
and    $prod <imdb:name>            $title

Kowari was about 205x faster for this query.

Note that the X scale above is logarithmic for readability.

Query 5

Sesame (SeRQL) Query:
select actressName, productionName 
from   {actress}    imdb:name          {actressName};
                    imdb:playedRole    {role}, 
       {role}       imdb:characterName {"Laverne"};
                    imdb:inProduction  {production}, 
       {production} imdb:name          {productionName}; 
                    rdf:type           {imdb:TVMovie}

Kowari (iTQL) Query:
select $actressName $productionName 
from   <#test>
where  $actress    <imdb:name>          $actressName 
and    $actress    <imdb:playedRole>    $role 
and    $role       <imdb:characterName> 'Laverne' 
and    $role       <imdb:inProduction>  $production 
and    $production <imdb:name>          $productionName 
and    $production <rdf:type>           <imdb:TVMovie>

Kowari was about 357x faster for this query.

Note that the X scale above is logarithmic for readability.

Query 6

Sesame (SeRQL) Query:
construct 
distinct  {actress1}    imdb:appearedWith {actress2}, 
          {actress2}    imdb:appearedWith {actress1} 
from      {actress1}    imdb:name         {"Tosca, MariAna"};
                        imdb:playedRole   {role1}, 
          {role1}       imdb:inProduction {production}, 
          {actress2}    imdb:playedRole   {role2}, 
          {role2}       imdb:inProduction {production}

Kowari (iTQL) Query:
select $actress1 $actress2 
from   <#test> 
where  $actress1 <imdb:name>         'Tosca, MariAna' 
and    $actress1 <imdb:playedRole>   $role1 
and    $role1    <imdb:inProduction> $production 
and    $actress2 <imdb:playedRole>   $role2 
and    $role2    <imdb:inProduction> $production

Kowari Tuples-to-Triples Template:
$actress1 <imdb:appearedWith> $actress2 
$actress2 <imdb:appearedWith> $actress1

Kowari was about 9,294x faster for this query.

Note that the X scale above is logarithmic for readability.
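
In the Kowari run of Query 6, the iTQL query returns tuples, and the tuples-to-triples template converts each tuple into triples, playing the role of SeRQL's construct clause. A minimal, hypothetical Java sketch of that conversion follows (the TupleTemplate class is illustrative and is not Trippi's implementation): each row's bindings are substituted into every template triple, and a LinkedHashSet drops duplicate triples, much as distinct does in the SeRQL query.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: turn tuple rows into triples by filling in a triple template. */
class TupleTemplate {
    /** Replaces a "$var" term with its bound value; constants pass through unchanged. */
    static String resolve(String term, Map<String, String> row) {
        if (!term.startsWith("$")) return term;
        String value = row.get(term.substring(1));
        if (value == null) throw new IllegalArgumentException("unbound variable " + term);
        return value;
    }

    /** Applies every template triple to every row, dropping duplicate triples. */
    static List<String> apply(List<String[]> template, List<Map<String, String>> rows) {
        Set<String> out = new LinkedHashSet<>(); // keeps first-seen order, removes repeats
        for (Map<String, String> row : rows) {
            for (String[] t : template) {
                out.add(resolve(t[0], row) + " " + resolve(t[1], row) + " " + resolve(t[2], row));
            }
        }
        return new ArrayList<>(out);
    }
}
```

If two rows bind the same pair of actresses (for example, because they share more than one production), the resulting appearedWith triples are emitted only once.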

June 8th - June 10th, 2005

Table of Contents

1. Introduction

A. Data Sources

B. Hardware and Software

C. Understanding the Graphs

2. Wordnet 20k Test

A. Adds

B. Deletes

3. Wordnet 500k Test

A. Adds

B. Queries

Query 1

Query 2

Query 3

Query 4

4. IMDB 20M Test

A. Adds

5. IMDB 7M Test

A. Queries

Query 1

Query 2

Query 3

Query 4

Query 5

Query 6