Friday, January 29, 2016

Why C#'s “var” keyword can hamper maintainability

Coding conventions never cease to be a great source of heated debates. However, within the C# realm, two specific topics tend to resurface regularly. The first one is whether or not to use underscores for class fields (but I'm not going to discuss that here). The other one is the usage of the var keyword. I don't know why, but during code reviews, the over-zealous usage of var always triggers a feeling of annoyance in me. So beware, this is going to be a very opinionated view on that.

In short, in my opinion (and that of my coding guidelines), var should only be used if the actual type is immediately visible from the statement you're looking at. The only exception I can think of is anonymous types (often seen when using LINQ). So as far as I'm concerned, let's check out some proper usage of var.

var largeOrders =
   from order in orders
   where order.Items > 10 && order.TotalValue > 1000
   select new { OrderId = order.Id };

var repository = unitOfWork.GetRepository<Company>();

var orders = new List<Order>();

All three examples should make it very clear what the involved type is, even if it is an anonymous type. Now for some dubious usage of var:

// I assume it's a string, but maybe it's some kind of domain-specific value type
var key = CommandNameResolver.GetName<IncludeLogbookEntryReferenceInHandoverCommand>();

// What was that default type C# used for numbers?
var i = 3;

// No clue. An enumerable of DateTime maybe? And if so, why not use a TimeSpan?
var startOfShifts = GetShiftStartTimes();

// What codes? Strings? Guids? Something else?
foreach (var code in codes) { }

I know that modern IDEs like Visual Studio make it easy to determine the type you're looking at, but we do most code reviews, if not all, through GitHub Pull Requests. So consider this example screenshot.

[Screenshot of a GitHub Pull Request diff]

When I review a Pull Request that uses a var and the actual type is not immediately visible, I start to wonder about a couple of things.

  • If the var is assigned from a call to another method, how much information is that method returning? For instance, I wonder whether it might return an entire class even though the caller only needs a single value. A nice analogy is handing over the entire freight truck when all you need is a single parcel. Especially if the call site passes the return value on to another method, it becomes important to understand what is being passed around.
  • If the variable name implies some kind of collection, might it be returning an IEnumerable<T>? If so, its execution could be deferred. So what impact will that have on the calling code? Is that second iteration I just noticed going to cause some weird side-effects? (See the sketch after this list.)
  • If the statement implies a boolean outcome, will it return a nullable boolean? And if so, did the caller deal with that correctly?
  • Does the type being used originate from a project, component or NuGet package that isn't supposed to be used in that part of the system?
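To make a couple of those concerns concrete, compare how much is visible when the type is spelled out. This is just a sketch; GetShiftStartTimes and IsHandoverComplete are hypothetical methods standing in for the real code.

// With explicit types, the deferred nature of the collection and the
// nullability of the outcome are visible right there in the diff.
IEnumerable<DateTime> startOfShifts = GetShiftStartTimes();
bool? isHandoverComplete = IsHandoverComplete();

// With var, neither of those characteristics is apparent to the reviewer.
var shifts = GetShiftStartTimes();
var complete = IsHandoverComplete();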

I really can't answer those questions without drilling a bit deeper into the code base, which is less than trivial on GitHub. Worst case, I might have to pull down the sources into my local repo and use Visual Studio to understand the code being changed.

It's widely accepted that source code is read many more times than it is written. Now imagine you have to design an architecture for a system with similar characteristics. Wouldn't you optimize that system for reading? If so, why wouldn't you optimize your code for readability as well? Using a var just because you're too lazy to type the full type name sounds like a lame excuse. And if you really are that lazy, install ReSharper and use ALT-ENTER to quickly replace the var with the explicit type. The argument that using var makes it easier to refactor your code is IMHO bull. If you change the type a method returns, don't you want to make sure the call sites are actually prepared for that, rather than assuming everything is fine just because it still compiles? And if they are, just use the infamous ALT-ENTER again…

All in all, I remain convinced that using an explicit type helps others understand your code much faster and increases the chance that coupling errors will surface during code reviews. So what do you think? Let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Thursday, January 21, 2016

Why every software architect is also an entrepreneur

I'm not sure when it happened exactly, but at some point this month, while watching the new TV show Billions, it dawned on me: being an architect is just like being an entrepreneur, just without the huge financial benefits and risks. I might be wrong about this, or worse, I might be insulting our profession, but before you judge me, give me a minute to explain.

For instance, just like an entrepreneur, an architect has to do a risk analysis before he or she makes a decision. E.g. can we take a shortcut here without causing irreparable technical debt? Or, what's the risk if we postpone this refactoring to a later point in time? In a similar fashion, the perfect solution doesn't exist. You always have to make a trade-off between multiple options for which the consequences (both good and bad) aren't always clear yet. Take the choice between a relational database and a NoSQL solution. The former may be good for conservative clients and will increase the chance that it is accepted by management. At the same time, being agile and an evolving database schema don't always go hand in hand. Going for a NoSQL solution will give you some obvious benefits, but there's less experience in the field. If you happen to run into unknown territory, you might be on your own.

Successful entrepreneurs have to balance short-term investments against long-term ones, and architects do too. In most projects I've been involved with, the amount of development capacity is limited and the pressure to deliver is high. So do you put those developers on introducing an automated UI testing pipeline that you expect to need very soon, or are you going to build that one feature the business people have been asking for? You know that you haven't delivered enough features to be competitive in the market. But you also know that if you continue without a proper testing framework, you won't be able to deliver the right quality anymore. Unfortunately, you don't know exactly when this point of no return will be reached. And that's true for business investments as well. The entrepreneur might put some money into a new and better location for his shop, which he is really going to need if his business continues to grow at the current rate. But then he won't be able to replace that occasionally malfunctioning baking machine in his current shop.

Another similarity can be found when you consider an entrepreneur whose company sells professional coffee brewing machines. You'll have to sell those machines to your potential clients. This involves some serious marketing efforts to make those machines more attractive than your competitors'. But even if there's no competition, you still might need to convince your clients that they need to upgrade to newer or more expensive machines. As an architect, you're doing a similar thing with the developers who need to work with your architectural ideas. In most agile companies, you can't really force them to apply your architectural principles (well, you can, but you won't get the results you want). Instead, you need to sell your ideas to them so that they understand why certain extra layers of abstraction are needed, or why they need to write a unit test before they write production code. This requires face-to-face sessions, pair programming, written guidelines and investments in tools that support that architectural vision as much as possible.

If an entrepreneur doesn't have the money to buy that new location for his shop I mentioned before, he might have to go to an external investor or a bank to get some money. But he won't get that money without getting 'buy-in' for his business plan. The investing party needs to trust that their money will be spent wisely and will give them a sufficient 'return on investment'. Similarly, if you're an architect and you want to invest in a NoSQL solution that will take a couple of developers a few weeks to complete, you'll probably have to convince management to accept that that development capacity cannot be put on feature development. But you can't explain the technical benefits to them, simply because they will not understand you. Instead, you need to translate those technical benefits into business opportunities. Maybe you can convince them that making this investment will allow them to deliver new features more quickly. You probably can't quantify this accurately enough, but by putting yourself in their shoes, you might be able to help them see this investment is worth the time.

Another difficult decision, especially when your time is limited, is choosing between quality, functionality and the delivery date. When a deadline is approaching, it's the job of an architect to clarify to all stakeholders the consequences of jeopardizing the internal software quality. If the deadline is really close, it might be better to skip some functionality or forfeit some of those bells and whistles (a.k.a. gold plating). If that is absolutely not possible, you can decide to drop quality for this sprint, but only under the explicit agreement that this will be repaired right after the deadline, even if it will cost twice as much capacity. If you're a vendor of coffee brewing machines and your competition has just introduced their newest product, you could decide to ship a product a bit earlier. And since you can't really drop functionality from a physical thing like a coffee machine, you might decide not to wait for that updated batch of components, even though you know some of the internal machinery has reported construction errors. Maybe you're lucky and nobody will notice. And even if somebody does complain, it'll probably be cheaper to replace the machine than to postpone the market introduction now. And why not? Volkswagen managed to hide their emission-cheating device for years.

So what do you think? Does this make sense? Or am I comparing apples with oranges? Let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Tuesday, January 05, 2016

Evaluating RavenDB as an embedded database

During the last two months of 2015, we've been evaluating RavenDB 3.0.30000 as an embedded database hosted in a Windows Service. We employ an architecture based on the Command Query Responsibility Segregation (CQRS) and Event Sourcing architectural styles, but this post is relevant to anybody who wants to use RavenDB in embedded scenarios.

What we intended to achieve
This is a high-level diagram of our architecture:

[High-level architecture diagram]

In this architecture, you can identify two clearly separated sides. The right side (or write side) deals with the business commands and the logic that determines the effect on the domain. The left side (or read side) handles query requests and is tasked with answering those queries as efficient as possible. The only communication between those sides involves the exchange of business events that are raised by the domain as a result of executing commands. Those events represent functional changes in the domain such as a ProductDiscontinuedEvent or a PermitSignedEvent. The so-called projectors are there to project those functional events into a (denormalized) form that can be easily consumed by the code that needs to answer the queries.
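To make that a bit more tangible, here is a rough sketch of what such a projector can look like. The event, projection and store types below are hypothetical placeholders, not our actual production code.

using System;

// Hypothetical functional event raised by the domain.
public class ProductDiscontinuedEvent
{
    public Guid ProductId { get; set; }
    public DateTime OccurredAt { get; set; }
}

// Hypothetical denormalized document served by the query side.
public class ProductProjection
{
    public Guid Id { get; set; }
    public bool IsDiscontinued { get; set; }
    public DateTime? DiscontinuedAt { get; set; }
}

// Minimal abstraction over whatever the query store happens to be.
public interface IProjectionStore
{
    T Load<T>(Guid id);
    void Save<T>(T projection);
}

// The projector applies the functional event to the projection, so the
// query side can answer requests without touching the domain.
public class ProductProjector
{
    private readonly IProjectionStore store;

    public ProductProjector(IProjectionStore store)
    {
        this.store = store;
    }

    public void Handle(ProductDiscontinuedEvent @event)
    {
        ProductProjection projection = store.Load<ProductProjection>(@event.ProductId);
        projection.IsDiscontinued = true;
        projection.DiscontinuedAt = @event.OccurredAt;
        store.Save(projection);
    }
}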

Before the evaluation, those projections were persisted as simple tables in an RDBMS such as SQL Server, Oracle, or lighter variants such as SQL Compact or SQLite. Initially, we projected the events within the same HTTP request that caused the business command to be executed, but the read side doesn't necessarily need to be updated in real time. In fact, by making the projection process completely asynchronous, you can trade off some consistency of the query store and gain higher performance instead. You can do that using a dedicated process or, like us, using a background thread hosted within the ASP.NET process.

The spike
Unfortunately, that introduces an additional challenge when multiple instances of this ASP.NET process run in a load-balanced web farm. In that case, you need to figure out which of the ASP.NET processes is going to handle the asynchronous projection work. You don't want all of them to compete for the same database resources, so some kind of synchronization protocol would be needed. Instead, we've been spiking a scenario in which an embedded instance of RavenDB is used as the query store on the left side of the diagram. With that, we hoped to benefit from the following (potential) RavenDB USPs.

  • Schema-less projections. Just write an object graph to the store, even if the structure has changed.
  • A very low deployment footprint because the binaries can be shipped with the product.
  • The ability to use a local per-application-server database rather than a shared networked RDBMS.
  • High-performance, asynchronous indexing with advanced features like map-reduce, hierarchical indexes and facet-based queries using Lucene.
  • Excellent operational dashboard with built-in profiling features.

In the spike, we created a little console application that hosts RavenDB in embedded mode and uses NEventStore to read the events from a SQL Server database. Those events are then projected onto a relatively complex object that is loaded from and stored to the local RavenDB database, using a single RavenDB session per functional transaction (a group of events that were raised within the same functional transaction).
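In heavily simplified form, the core loop of that console application looked something like this. FunctionalTransaction, eventSource and projectors are placeholders for our own plumbing; OpenSession, Store and SaveChanges are the regular RavenDB client calls.

// One RavenDB session per functional transaction: all documents touched by
// the events in that transaction are flushed in a single SaveChanges call.
foreach (FunctionalTransaction transaction in eventSource.GetTransactions())
{
    using (IDocumentSession session = documentStore.OpenSession())
    {
        foreach (object @event in transaction.Events)
        {
            // Find the projector that knows how to handle this event type
            // and let it load/modify/store the affected projections.
            projectors.GetProjectorFor(@event).Project(@event, session);
        }

        session.SaveChanges();
    }
}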

Challenges
RavenDB supports two different storage engines: Esent, which is Windows' built-in database engine, and Voron, a proprietary engine built specifically by Oren Eini (the brains behind RavenDB) so that it could be optimized for RavenDB. It took us quite some time on the RavenDB discussion forums to get a definitive answer on which one is best suited for our purpose. Most people recommended Esent, both for its write speed and its memory consumption, which brings me to the next point.

While testing with various production databases, we noticed exceptionally high memory usage. On my machine, which has 16GB, it was not unusual to see the process's memory footprint exceed 10GB. Since RavenDB is designed to take as much memory as possible to be fast, this is kind of expected behavior. But since we intend to host it in a little Windows Service that runs alongside IIS on each application server, being able to control the memory is crucial. With the help of Oren, we managed to get the right settings in place to get more grip on memory usage. With Voron, we never managed to get memory usage under 3GB, but we were much more successful with Esent.

var documentStore = new EmbeddableDocumentStore
{
    Conventions =
    {
        // Raise the safe-by-default limit of 30 requests per session.
        MaxNumberOfRequestsPerSession = 100,
    },
    Configuration =
    {
        // Use Esent rather than Voron; see the memory observations above.
        DefaultStorageTypeName = "Esent",
    },
};

var configuration = documentStore.Configuration;

// Cap Esent's own cache and version store.
configuration.Settings.Add("Raven/Esent/CacheSizeMax", "256");
configuration.Settings.Add("Raven/Esent/MaxVerPages", "32");

// Limit RavenDB's in-memory document cache.
configuration.Settings.Add("Raven/MemoryCacheLimitMegabytes", "512");

// Keep the asynchronous indexing batches modest so indexing doesn't grab
// large chunks of memory while we are rebuilding projections.
configuration.Settings.Add("Raven/MaxNumberOfItemsToIndexInSingleBatch", "4096");
configuration.Settings.Add("Raven/MaxNumberOfItemsToPreFetchForIndexing", "4096");
configuration.Settings.Add("Raven/InitialNumberOfItemsToIndexInSingleBatch", "64");

// The settings must be in place before the store is initialized.
documentStore.Initialize();

Large portions of the existing projection class hierarchy from the production code base weren't JSON serializable, so we had to jump through quite a few hoops to get them to work nicely with RavenDB. In particular, our overzealous use of value objects on projections proved to be a pain. In fact, at some point, we managed to completely kill RavenDB with a single 200MB document caused by a bug in our serialization code. We had to send memory dumps to Oren's team to learn how to diagnose these kinds of problems. Definitely something to remember while taking the next steps.

Performance Results
As a benchmark, we used a production database that we had also used during a prior performance analysis of SQL Server. It contains 1.6 million events (grouped in 412,000 functional transactions). Rebuilding the projections (and their associated lookup tables) through the production code base using SQL Server on my local HDD took 0h23m. However, in production, the SQL Server instance and the application server are different machines, so rerunning that test against a networked SQL Server (with a latency of 1ms) resulted in a total time of 1h22m. Doing the same test using RavenDB with the Esent engine on an SSD took 0h46m. Switching to my HDD didn't make much of a difference though (+2 minutes), which I can't really explain.

However, we were quite disappointed with the performance. We knew our projection wasn't particularly optimized for JSON serialization, but we expected a bit more. After doing some performance and memory analysis with the JetBrains tools, we concluded that the majority of the time is spent inside RavenDB. We also participated in several discussions on Google Groups to get a better understanding of this. RavenDB has two features that we suspected could help, Aggressive Caching and the Patching API, but neither appeared to be meant for embedded scenarios.

So the next thing we tried was to introduce a Least Recently Used (LRU) cache to prevent unnecessary loads when several closely located functional transactions affect the same projection. This dropped the rebuild time to 0h34m. Not that spectacular, but a similar run using a much larger production database (14 million events) caused the test run to drop from 5h50m to 4h35m.
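A minimal sketch of that idea: before asking RavenDB for a projection, consult a small LRU cache keyed by document id. The projectionCache (an LruCache) and ProductProjection are hypothetical; any capacity-bound LRU implementation will do here.

// Returns the projection from the LRU cache when possible, otherwise loads
// it from RavenDB and remembers it for subsequent functional transactions.
private ProductProjection GetProjection(IDocumentSession session, string id)
{
    ProductProjection projection;
    if (!projectionCache.TryGet(id, out projection))
    {
        projection = session.Load<ProductProjection>(id);
        projectionCache.Set(id, projection);
    }
    else
    {
        // The cached instance isn't tracked by this (new) session yet, so
        // re-register it to have its changes persisted on SaveChanges.
        session.Store(projection);
    }

    return projection;
}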

The last thing we tried was to process the functional transactions in batches of 10, thereby extending the lifetime of the RavenDB session a bit. RavenDB is designed for short-lived sessions and uses all kinds of safe-by-default limits, but we had already stretched those a bit. After we did that, the rebuild times for the two databases dropped to 0h21m and 3h07m respectively. That's definitely good enough for the spike.
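Sketched out, the only change to the earlier loop is that one session now wraps a small batch of functional transactions instead of a single one. Batch is a hypothetical helper that splits the stream into groups of the given size; the batch size of 10 is the one mentioned above.

// Project ten functional transactions per RavenDB session, so the cost of
// opening a session and calling SaveChanges is amortized over the batch.
foreach (IEnumerable<FunctionalTransaction> batch in eventSource.GetTransactions().Batch(10))
{
    using (IDocumentSession session = documentStore.OpenSession())
    {
        foreach (FunctionalTransaction transaction in batch)
        {
            foreach (object @event in transaction.Events)
            {
                projectors.GetProjectorFor(@event).Project(@event, session);
            }
        }

        session.SaveChanges();
    }
}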

Conclusions

  • Esent is still the engine of choice, especially now that we've managed to control its memory consumption.
  • Don't bother with the Patching API in an embedded scenario. It won't give you any benefits, although Oren mentioned that he's working on some improvements for RavenDB 4.x.
  • Use only primitive types on the projections. This prevents unnecessary (de)serialization.
  • Don't map a hierarchy of sub-classes to a single document. Store them as separate documents so that individual documents stay small and focused. Use hierarchical indexes to query over the class hierarchies.
  • Consider splitting big collections within your object into separate documents, and use the Include feature where needed (see the sketch after this list).
  • The operational dashboard (Raven Management Studio) is exceptionally good for tracing and diagnosing what's going on.
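As an illustration of the Include suggestion above, this is roughly what it looks like in the client API. OrderProjection, CustomerProjection and the document ids are made up for the example.

using (IDocumentSession session = documentStore.OpenSession())
{
    // Ask RavenDB to send the referenced customer document along with the
    // order in the same round-trip.
    OrderProjection order = session
        .Include<OrderProjection>(o => o.CustomerId)
        .Load("orderprojections/42");

    // Served from the session's cache; no second request to the store.
    CustomerProjection customer = session.Load<CustomerProjection>(order.CustomerId);
}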

All in all, we are very confident that this is the way to go for scenarios like the one discussed above. So what do you think? Are you considering RavenDB yourself? And if you're already using it, I'd love to hear about your experiences with RavenDB. Just let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.