In the last week or so I have been focused on getting large imports to run through to completion without memory leaks. The good news is that I have now successfully imported files as large as 100M triples (about 25GB of N-Triples) in a single batch on a machine with only 8GB of RAM without causing memory thrashing. This is definitely progress! With that big obstacle now out of the way, my plan is to start looking at query functionality and performance.

Starting with functionality, BrightstarDB already has support for all of SPARQL 1.1 thanks to the awesome DotNetRDF library. However, the LINQ implementation still has some gaps in it – notably around grouping and aggregation. I’m not yet sure how well the SPARQL notion of grouping and the LINQ notion of grouping line up – that’s still an area for investigation.

On performance, the recent changes to caching should definitely have helped query performance on machines where the entire store can fit into memory. What I want to look at next is leveraging the functionality of DotNetRDF that allows resource strings to be lazily evaluated, so that joins can be done on resource IDs alone. This should help in two respects. Firstly, it should be a lot faster: it reduces the number of look-ups of resource values by ID for a given query, and comparing 64-bit resource IDs in a join is typically quite a lot faster than comparing full strings. Secondly, it should reduce the amount of memory consumed by the query processor for intermediate result sets, as they will be able to hold 64-bit resource IDs instead of full resource strings much of the time.
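To make the idea a bit more concrete, here is a rough sketch of an ID-only join with lazy string resolution. It is written in Python purely for illustration and does not use the DotNetRDF or BrightstarDB APIs. The names `ResourceTable`, `pairs_for` and `hash_join` are all hypothetical, and the in-memory dictionary is just a stand-in for the store's resource index.

```python
class ResourceTable:
    """Hypothetical stand-in for the store's resource index (ID <-> string)."""

    def __init__(self):
        self._by_value = {}
        self._by_id = []
        self.lookups = 0  # counts ID -> string resolutions

    def intern(self, value):
        """Return the integer ID for a resource string, assigning one if new."""
        rid = self._by_value.get(value)
        if rid is None:
            rid = len(self._by_id)
            self._by_value[value] = rid
            self._by_id.append(value)
        return rid

    def resolve(self, rid):
        """Look up the string for an ID; this is the cost we want to defer."""
        self.lookups += 1
        return self._by_id[rid]


def pairs_for(triples, predicate_id):
    """Return (subject_id, object_id) pairs for one predicate."""
    return [(s, o) for s, p, o in triples if p == predicate_id]


def hash_join(left, right):
    """Join where left.object == right.subject, comparing integer IDs only."""
    index = {}
    for s, o in right:
        index.setdefault(s, []).append(o)
    return [(ls, lo, ro) for ls, lo in left for ro in index.get(lo, [])]


table = ResourceTable()
ex = "http://example.org/"
raw = [
    (ex + "alice", ex + "knows", ex + "bob"),
    (ex + "carol", ex + "knows", ex + "bob"),
    (ex + "bob", ex + "name", "Bob"),
    (ex + "alice", ex + "name", "Alice"),
]
# Triples are held as ID tuples; no strings are needed during the join.
triples = [tuple(table.intern(term) for term in t) for t in raw]

knows = pairs_for(triples, table.intern(ex + "knows"))
name = pairs_for(triples, table.intern(ex + "name"))

# Roughly: SELECT ?x ?n WHERE { ?x ex:knows ?y . ?y ex:name ?n }
joined = hash_join(knows, name)

# Strings are resolved only for the projected columns of the final rows.
rows = [(table.resolve(x), table.resolve(n)) for x, _y, n in joined]
```

In this sketch the join variable `?y` is never turned back into a string, so the only look-ups are the four needed for the two result rows. The intermediate results are plain tuples of integers.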

Competing for attention is the Portable Class Library port. It’s one of those tasks that I just know is going to be painful to do, so it keeps getting put off. At some point I will have to bite the bullet, though!

In the meantime (and by way of a plug) I am also in the middle of preparing a course for the XML Summer School. This year’s Summer School includes two days of Semantic Web as well as a lot of other fun stuff. If you are just getting into RDF and semantic web technologies in general, this is a great opportunity to learn in the “dreaming spires” of Oxford – be sure to check it out!
