The tests included in this report show memory utilization and speed for adds, deletions, and queries of Sesame and Kowari-based triplestores.
Two large datasets were used for these tests.
Wordnet tests were conducted on a Windows XP machine with an AMD Athlon XP 2800 processor. IMDB tests were conducted on a Linux machine running with 4 Intel(R) Xeon(TM) CPU 3.20GHz processors. All tests were run using Sun's Java 1.4.2 JRE with default JVM settings.
The tests were performed using Tripletest, a Java utility that uses Trippi v0.9.4 to provide a uniform add/query/delete interface to the triplestores being tested.
Triplestore Test Configurations:
Some graphs are plotted with logarithmic co-ordinates for readbility. This is indicated by the scale on the graph as well as accompanying text where applicable.
Query result graphs convey a lot of information, so they need a little explanation: Memory utilization is measured at even increments of time throughout the process of getting a query result set. Before the first result is returned memory use is plotted with decorated lines. After the first result is returned, the lines are no longer decorated. From the X coordinates of the lines, you can determine how long a query took 1) to return the first result, and 2) to complete.
You can click any inline graph to see a 1024x768 version.
The first test was intended to quickly determine how the Sesame native sail's add/delete speed compared to that of the Sesame RDBMS (MySQL) sail.
Sesame-native generally outperformed Sesame-MySQL in this test. To find out why Sesame-MySQL was performing so poorly on deletes, I logged the database queries being done by the RDBMS sail and found that it was constantly making "OPTIMIZE TABLE" requests to MySQL. Since this can be an expensive operation, I temporarily disabled these OPTIMIZE requests but performance became worse as a result. A quick look at the source code for this sail indicates that it uses several temporary tables during the add and delete process. The necessity of using temporary tables here is unclear. TODO: Ask the Sesame developers about this.
For adds, the native sail was about 5x faster.
Memory usage while adding was roughly equivalent.
Sesame-MySQL took an inordinate amount of time with 20k triples, finally decreasing to a rate of 11 seconds/10,000 while deleting the last chunk of 500. Sesame-native stayed pretty constant at 1-to-2 seconds/10,000.
Memory usage while deleting was similar to the above: Sesame-MySQL required more to start, then tapered off, while Sesame-native remained low.
This test compares Sesame-native and Sesame-MySQL with Kowari for add and query performance. Note that for adds and deletes, Kowari has fewer datapoints than Sesame. This is due to the size of the triple buffers. Kowari performs better with a large buffer, whereas Sesame performs better with a smaller one.
Kowari is a bit slower to add triples than Sesame-native, but faster than Sesame-MySQL.
Kowari requires more memory, which makes sense, since its update buffer is larger. Memory utilization is not alarming in any case.
Sesame (SPO) Query: * * * Kowari (SPO) Query: * * *
Kowari and Sesame-native both completed successfully, but Sesame-MySQL failed with an out of memory error after 22 seconds. This is similar to the result we saw with Jena on a previous test, and similarly seems to be caused by not using streaming ResultSets from MySQL. See Sesame Bug #59.
Sesame (SeRQL) Query: select word from {wordnet:102669463} schema:wordForm {word} Kowari (iTQL) Query: select $word from <#test> where <wordnet:102669463> <schema:wordForm> $word
Sesame (SeRQL) Query: select word, definition from {myConcept} schema:wordForm {"happy"}; schema:similarTo {thatConcept}, {thatConcept} schema:wordForm {word}; schema:glossaryEntry {definition} Kowari (iTQL) Query: select $word $definition from <#test> where $myConcept <schema:wordForm> 'happy' and $myConcept <schema:similarTo> $thatConcept and $thatConcept <schema:glossaryEntry> $definition and $thatConcept <schema:wordForm> $word
Sesame (SeRQL) Query: select distinct word, superTypeA, superTypeB, superTypeC, superTypeD from {concept} rdf:type {schema:Verb}; schema:hyponymOf {h1}; schema:wordForm {word}, {h1} schema:hyponymOf {h2}; schema:wordForm {superTypeA}, {h2} schema:hyponymOf {h3}; schema:wordForm {superTypeB}, {h3} schema:hyponymOf {h4}; schema:wordForm {superTypeC}, {h4} schema:wordForm {superTypeD} Kowari (iTQL) Query: select $word $superTypeA $superTypeB $superTypeC $superTypeD from <#test> where $concept <rdf:type> <schema:Verb> and $concept <schema:hyponymOf> $h1 and $concept <schema:wordForm> $word and $h1 <schema:hyponymOf> $h2 and $h1 <schema:wordForm> $superTypeA and $h2 <schema:hyponymOf> $h3 and $h2 <schema:wordForm> $superTypeB and $h3 <schema:hyponymOf> $h4 and $h3 <schema:wordForm> $superTypeC and $h4 <schema:wordForm> $superTypeD
Having excluded Sesame-MySQL as an option for a large-scale triplestore, this test was intended to compare Sesame-native to Kowari. Unfortunately, the test ran out of disk space when the Kowari load of 20M triples was 99% done (after about 24 hours), so only the add numbers up to that point could be compared. Query numbers are compared in the next (smaller) test.
Kowari's add performance degrades (linearly, at a low slope) with the number of triples in the store. Sesame-native is much faster overall, but notice the interesting pattern of spikes doubling in size at a decreasing frequency. No guesses as to the cause of that.
Again, these differences are explained by differing buffer sizes.
This test compares Sesame-native to Kowari for query performance with a relatively large number of triples. It uses the productions.rdfxml and actresses.rdfxml files from the IMDB dataset.
Sesame (SPO) Query: * * * Kowari (SPO) Query: * * *
Kowari was over 2x faster for this query.
Sesame (SeRQL) Query: select movieName from {actress} imdb:name {"Zuniga, Daphne"}; imdb:playedRole {role}, {role} imdb:inProduction {production}, {production} rdf:type {imdb:Movie}; imdb:name {movieName} Kowari (iTQL) Query: select $movieName from <#test> where $actress <imdb:name> 'Zuniga, Daphne' and $actress <imdb:playedRole> $role and $role <imdb:inProduction> $production and $production <rdf:type> <imdb:Movie> and $production <imdb:name> $movieName
Kowari was about 54x faster for this query.
Sesame (SeRQL) Query: select actressName, characterName, billingPosition from {actress} imdb:name {actressName}; imdb:playedRole {role}, {role} imdb:characterName {characterName}; imdb:billingPosition {billingPosition}; imdb:inProduction {production}, {production} imdb:name {"Shrek 2 (2004)"} where billingPosition < "10"^^xsd:int Kowari (iTQL) Query: select $actressName $characterName $billingPosition from <#test> where $actress <imdb:name> $actressName and $actress <imdb:playedRole> $role and $role <imdb:characterName> $characterName and $role <imdb:billingPosition> $billingPosition and $role <imdb:inProduction> $production and $production <imdb:name> 'Shrek 2 (2004)' and $billingPosition <tucana:lt> '10' in <#xsd>
Sesame (SeRQL) Query: select char, title from {role} imdb:billingPosition {"151"^^xsd:int}; imdb:characterName {char}; imdb:inProduction {prod}, {prod} imdb:name {title} Kowari (iTQL) Query: select $char $title from <#test> where $role <imdb:billingPosition> '151'^^xsd:int and $role <imdb:characterName> $char and $role <imdb:inProduction> $prod and $prod <imdb:name> $title
Kowari was about 205x faster for this query.
Sesame (SeRQL) Query: select actressName, productionName from {actress} imdb:name {actressName}; imdb:playedRole {role}, {role} imdb:characterName {"Laverne"}; imdb:inProduction {production}, {production} imdb:name {productionName}; rdf:type {imdb:TVMovie} Kowari (iTQL) Query: select $actressName $productionName from <#test> where $actress <imdb:name> $actressName and $actress <imdb:playedRole> $role and $role <imdb:characterName> 'Laverne' and $role <imdb:inProduction> $production and $production <imdb:name> $productionName and $production <rdf:type> <imdb:TVMovie>
Kowari was about 357x faster for this query.
Sesame (SeRQL) Query: construct distinct {actress1} imdb:appearedWith {actress2}, {actress2} imdb:appearedWith {actress1} from {actress1} imdb:name {"Tosca, MariAna"}; imdb:playedRole {role1}, {role1} imdb:inProduction {production}, {actress2} imdb:playedRole {role2}, {role2} imdb:inProduction {production} Kowari (iTQL) Query: select $actress1 $actress2 from <#test> where $actress1 <imdb:name> 'Tosca, MariAna' and $actress1 <imdb:playedRole> $role1 and $role1 <imdb:inProduction> $production and $actress2 <imdb:playedRole> $role2 and $role2 <imdb:inProduction> $production Kowari Tuples-to-Triples Template: $actress1 <imdb:appearedWith> $actress2 $actress2 <imdb:appearedWith> $actress1