Tuesday, January 05, 2016

Evaluating RavenDB as an embedded database

During the last two months of 2015, we've been evaluating RavenDB 3.0.30000 as an embedded database hosted in a Windows Service. We employ an architecture based on Command Query Responsibility Segregation and Event Sourcing architectural styles, but this post is relevant to anybody who wants to use RavenDB in embedded scenarios.

What we intended to achieve
This is a high-level diagram of our architecture:


In this architecture, you can identify two clearly separated sides. The right side (or write side) deals with the business commands and the logic that determines the effect on the domain. The left side (or read side) handles query requests and is tasked with answering those queries as efficiently as possible. The only communication between those sides is the exchange of business events that the domain raises as a result of executing commands. Those events represent functional changes in the domain, such as a ProductDiscontinuedEvent or a PermitSignedEvent. The so-called projectors are there to project those functional events into a (denormalized) form that can be easily consumed by the code that needs to answer the queries.
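
To make the role of a projector a bit more concrete, here is a minimal sketch in C#. Only the event name comes from our domain; the event's properties, the ProductSummary read model and the dictionary that stands in for the query store are assumptions made up for this illustration, not our production code.

```csharp
using System;
using System.Collections.Generic;

// Event name taken from the post; its properties are assumptions.
public sealed class ProductDiscontinuedEvent
{
    public string ProductId { get; set; }
    public DateTime DiscontinuedAt { get; set; }
}

// Hypothetical denormalized read model, shaped for the query that needs it.
public sealed class ProductSummary
{
    public string Id { get; set; }
    public bool IsDiscontinued { get; set; }
    public DateTime? DiscontinuedAt { get; set; }
}

public sealed class ProductSummaryProjector
{
    // Stand-in for the query store (an RDBMS table or a document collection).
    private readonly IDictionary<string, ProductSummary> store;

    public ProductSummaryProjector(IDictionary<string, ProductSummary> store)
    {
        this.store = store;
    }

    // Applies the functional event to the read model, creating it if needed.
    public void Handle(ProductDiscontinuedEvent @event)
    {
        if (!store.TryGetValue(@event.ProductId, out var summary))
        {
            summary = new ProductSummary { Id = @event.ProductId };
            store[summary.Id] = summary;
        }

        summary.IsDiscontinued = true;
        summary.DiscontinuedAt = @event.DiscontinuedAt;
    }
}
```

The query side then only has to read a ProductSummary by its identifier; it never has to interpret the events itself.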

Before the evaluation, those projections were persisted as simple tables in an RDBMS such as SQL Server, Oracle or lighter variants such as SQL Compact or SQLite. We initially projected the events within the same HTTP request that caused the business command to be executed, but the read side doesn't necessarily need to be updated in real time. In fact, by making the projection process completely asynchronous, you can trade some consistency of the query store for higher performance. You can do that using a dedicated process or, like us, using a background thread hosted within the ASP.NET process.

The spike
Unfortunately, that introduces an additional challenge when multiple instances of this ASP.NET process run in a load-balanced web farm. In that case, you need to figure out which of the ASP.NET processes is going to handle the asynchronous projection work. You don't want all of them to compete for the same database resources, so some kind of synchronization protocol would be needed. Instead, we've been spiking a scenario in which an embedded instance of RavenDB is used as the query store at the left side of the diagram. With that, we hoped to benefit from the following (potential) RavenDB USPs:

  • Schema-less projections. Just write an object graph to the store, even if the structure has changed.
  • A very low deployment footprint because the binaries can be shipped with the product.
  • The ability to use a local per-application-server database rather than a shared networked RDBMS.
  • High-performance, asynchronous indexing with advanced features like map-reduce, hierarchical indexes and facet-based queries using Lucene.
  • Excellent operational dashboard with built-in profiling features.

In the spike, we created a little console application that hosts RavenDB in embedded mode and uses NEventStore to read the events from a SQL Server database. Those events are then projected onto a relatively complex object that is loaded from and stored to the local RavenDB database, using a single RavenDB session per functional transaction (the group of events that were raised together by a single business command).

RavenDB supports two different storage engines: Esent, which is Windows' internal database, and Voron, a proprietary engine built by Oren Eini (the brains behind RavenDB) specifically so that it could be optimized for RavenDB. It took us quite some time on the RavenDB discussion forums to get a definitive answer on which one is best suited for our purpose. Most people recommended Esent, both for its write speed and its memory consumption, which brings me to the next point.

While testing with various production databases, we noticed exceptionally high memory usage. On my machine, which has 16GB, it was not unusual to see the process' memory footprint exceed 10GB. Since RavenDB is designed to take as much memory as possible to be fast, this is kind of expected behavior. But since we intend to host it in a little Windows Service that runs alongside IIS on each application server, being able to control the memory is crucial. With the help of Oren, we managed to get the right settings in place to get a better grip on memory usage. With Voron, we never managed to get memory usage under 3GB, but we were much more successful with Esent.

var documentStore = new EmbeddableDocumentStore
{
    Conventions =
    {
        MaxNumberOfRequestsPerSession = 100,
    },
    Configuration =
    {
        DefaultStorageTypeName = "Esent",
    }
};

// Settings that limit how much memory Esent and RavenDB may claim
// and how many documents are indexed per batch.
var configuration = documentStore.Configuration;
configuration.Settings.Add("Raven/Esent/CacheSizeMax", "256");
configuration.Settings.Add("Raven/Esent/MaxVerPages", "32");
configuration.Settings.Add("Raven/MemoryCacheLimitMegabytes", "512");
configuration.Settings.Add("Raven/MaxNumberOfItemsToIndexInSingleBatch", "4096");
configuration.Settings.Add("Raven/MaxNumberOfItemsToPreFetchForIndexing", "4096");
configuration.Settings.Add("Raven/InitialNumberOfItemsToIndexInSingleBatch", "64");

Large portions of the existing projection class hierarchy from the production code base weren't JSON serializable, so we had to jump through quite a few hoops to get it to work nicely with RavenDB. In particular, our overzealous usage of value objects on projections proved to be a pain. In fact, at some point, we managed to completely kill RavenDB with a single 200MB document caused by a bug in our serialization code. We had to send memory dumps to Oren's team to learn how to diagnose these kinds of problems. Definitely something to remember while taking the next steps.

Performance Results
As a benchmark, we used a production database that we had used during a prior performance analysis of SQL Server. It contains 1.6 million events (grouped in 412,000 functional transactions). Rebuilding the projections (and their associated lookup tables) through the production code base using SQL Server on my local HDD took 0h23m. However, in production, the SQL Server instance and the application server are different machines. Rerunning that test against a networked SQL Server (with a latency of 1ms) resulted in a total time of 1h22m. Running the same test with RavenDB on the Esent engine and an SSD took 0h46m. Switching to my HDD didn't make much of a difference (+2 minutes), and I can't really explain that.

Still, we were quite disappointed with the performance. We knew our projection wasn't particularly optimized for JSON serialization, but we expected a bit more. After doing some performance and memory analysis with the JetBrains tools, we concluded that the majority of the time is spent in RavenDB. We also participated in several discussions on Google Groups to get a better understanding of this. RavenDB has two features that we suspected could help, Aggressive Caching and the Patching API, but neither appeared to be meant for embedded scenarios.

So the next thing we tried was to introduce a Least Recently Used cache to prevent too many unnecessary loads when several closely located functional transactions affect the same projection. This dropped the rebuild time to 0h34m. Not that spectacular, but a similar run using a much larger production database (14 million events) caused the test run to drop from 5h50m to 4h35m.
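
For readers unfamiliar with the idea, the following is a minimal sketch of a Least Recently Used cache in C#. It is not the cache from our spike: the LruCache class, its GetOrAdd method and the load delegate are names invented for this example, and how a cached projection gets written back to RavenDB is left out on purpose.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not the cache used in the spike.
public sealed class LruCache<TKey, TValue>
{
    private readonly int capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> index;

    // Most recently used entries live at the front of the list.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
        index = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>(capacity);
    }

    public int Count => index.Count;

    // Returns the cached value, or calls load (e.g. a database read) on a miss.
    public TValue GetOrAdd(TKey key, Func<TKey, TValue> load)
    {
        if (index.TryGetValue(key, out var node))
        {
            // Cache hit: move the entry to the front.
            order.Remove(node);
            order.AddFirst(node);
            return node.Value.Value;
        }

        var value = load(key);

        if (index.Count >= capacity)
        {
            // Cache is full: evict the least recently used entry.
            var oldest = order.Last;
            order.RemoveLast();
            index.Remove(oldest.Value.Key);
        }

        index[key] = order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
        return value;
    }
}
```

The capacity is a trade-off: a larger cache avoids more loads, but it also keeps more projections in memory, which matters when memory usage is already a concern.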

The last thing we tried was to process the functional transactions in batches of 10, thereby extending the lifetime of the RavenDB session a bit. RavenDB is designed for short-lived sessions and applies all kinds of safe-by-default limits, but we had already stretched those limits a bit. After we did that, the rebuild times for the two databases dropped to 0h21m and 3h07m, respectively. That's definitely good enough for the spike.
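
The batching itself doesn't need anything RavenDB-specific. A rough sketch, assuming a hypothetical InBatchesOf extension method (not part of RavenDB or NEventStore) that groups a stream of functional transactions into lists of at most 10:

```csharp
using System;
using System.Collections.Generic;

public static class BatchingExtensions
{
    // Splits a sequence into consecutive lists of at most `size` items.
    public static IEnumerable<IReadOnlyList<T>> InBatchesOf<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    private static IEnumerable<IReadOnlyList<T>> Iterate<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }

        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}

// Usage sketch (not compiled here; ProjectTransaction is a hypothetical method):
//
// foreach (var batch in transactions.InBatchesOf(10))
// {
//     using (var session = documentStore.OpenSession())
//     {
//         foreach (var transaction in batch) ProjectTransaction(session, transaction);
//         session.SaveChanges(); // a single save per batch instead of per transaction
//     }
// }
```

Keep in mind that every load within a session counts toward MaxNumberOfRequestsPerSession, so the batch size and that setting have to be chosen together.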


  • Esent is still the engine of choice, especially now that we managed to control the memory consumption.
  • Don't bother with the Patching API in an embedded scenario. It won't give you any benefits, although Oren mentioned that he's working on some improvements for RavenDB 4.x.
  • Use only primitive types on the projections. This prevents unnecessary (de)serialization.
  • Don't map a hierarchy of sub-classes to a single document. Store them as separate documents so that individual documents stay small and focused. Use hierarchical indexes to query over the class hierarchies.
  • Consider splitting big collections within your object into separate documents, and use the Include feature where needed.
  • The operational dashboard (Raven Management Studio) is exceptionally good for tracing and diagnosing what's going on.

All in all, we are very confident that this is the way to go for scenarios like the one discussed above. So what do you think? Are you considering RavenDB yourself? And if you're already using it, I'd love to hear about your experiences with RavenDB. Just let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.