Sunday, February 04, 2018

Fluent Assertions 5.0: The best unit test assertion library in the .NET realm just got better

It has been almost a year since version 4.19, the last functional release of Fluent Assertions, was shipped. Not because of a lack of feature requests, but simply because this new version has cost me all the private time I had. My main goal for this release was to repair some of the design mistakes I have made over the years and to introduce the only key feature that Fluent Assertions was still missing compared to other libraries. This also gave me the time to run a contest resulting in a great new logo designed by Ben Palmer. So after three betas and five release candidates, I present to you Fluent Assertions 5.0. It contains loads of new features, small and big, but also tries to break with the past.


Embracing standards

Over the years, I've been using different techniques to support multiple platforms. I started with using Linked Files to share files between multiple versions of the main project. This worked, but it hampered any attempt to keep aggressively refactoring my code. Moving files around doesn't work well if you have five links to that same file. Then, with Visual Studio 2013 (I think), we got Shared Projects. This allowed me to refactor away and use conditional symbols to share the same files with the platform-specific projects. The next innovation that happened in the .NET space was the Portable Class Library. With the help of Oren Novotny (who is a master of anything .NET), we refactored the code-base to employ a mechanism where the bulk of the code was in a single PCL assembly and the platform-specific stuff would go in a smaller platform-specific assembly. At run-time, it used a bait-and-switch mechanism to dynamically load the platform-specific assembly and connect the implementation classes to the interface hooks the core assembly offered. In a way, it was doing dependency injection.

However, all of this is in the past, now that we have .NET Standard and cross-compilation. With this release, Fluent Assertions is built from a single project that targets .NET Standard 1.4, 1.6 and 2.0, as well as the full .NET 4.5 Framework. You might wonder why I target multiple versions of .NET Standard. The simple reason is that .NET Standard 1.4 doesn't support all the features of the .NET Framework. The higher the .NET Standard version, the more features will light up. A nice side-effect of all of this is that it's now also much easier to contribute to this little project of mine (yes, that's a hint).

Moving towards a unified API

One of the things that has annoyed me for years is the inconsistency of the API. This all started when I introduced this very powerful and useful API for comparing deep object graphs, ShouldBeEquivalentTo. I really liked being able to use the type of the subject for nice fluent expressions, so I needed access to the generic type parameter representing the subject-under-test. I could not simply define another Should<T>() method, since the compiler prefers that overload over Should<T> where T : IEnumerable<TItem>. That's why I settled for ShouldBeEquivalentTo. This caused a lot of confusion, especially since there was already a Should().BeEquivalentTo() on collections. I've tried to change that in a non-breaking way a couple of times, but it always resulted in a suboptimal experience.

In 5.0, I made several behavioral changes (more on that later) that allowed me to finally align all assertions. You'll now find that all assertions start with Should(), e.g.

  • object.Should().BeEquivalentTo(anotherObject)
  • action.Should().Throw<MyException>()
  • func.Should().NotThrow()
  • monitoredObject.Should().Raise("Event")
  • executionTime.Should().Exceed()

You may wonder about that existing Should().BeEquivalentTo() that was available on collections. Well, that now behaves as a deep, structural comparison of collections. Existing calls will do exactly what they did before; they'll just give you more information when the items in one collection don't match those in the other collection.
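For example, something like this (a minimal sketch using anonymous types; the data is made up) now compares the items member by member:

var actual = new[] { new { Name = "John", Age = 30 } };
var expected = new[] { new { Name = "John", Age = 31 } };

// Performs a structural comparison of the items and reports exactly which
// member of which item differs, instead of relying on Equals.
actual.Should().BeEquivalentTo(expected);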

Subject Identification

As a passionate open-source developer you keep an eye on your competition, in particular if that competition has features that make developers choose the competition. Now, Fluent Assertions has grown up to be a mature and complete library and there's not a lot left to wish for. There was a single feature, however, that I really wanted to get into 5.0: identifying the name of the variable on which an assertion is executed.

Well, as of 5.0, when this assertion fails:

IEnumerable numbers = new[] { 1, 2, 3 };
numbers.Should().HaveCount(4, "because we thought we put four items in the collection");

The failure message will look like this:

"Expected numbers to contain 4 item(s) because we thought we put four items in the collection, but found 3."

Fluent Assertions will traverse the stack trace to find the line of code that invokes the assertion and then extract the name of the variable or constant from your actual C# files. So you'll need to build your unit tests in debug mode, even on a build server, to really benefit from this; in release builds, the compiler tends to inline lambda invocations. If it can't find this information, it will fall back on a more generic name like collection or object.

Note that analyzing the thread's stack trace is not supported in any .NET Standard version preceding 2.0, so this feature will only work under .NET Standard 2.0 or the full .NET Framework. Also, if you've been building your own extensions around existing calls to Should(), consider tacking on the [CustomAssertion] attribute. You can read more about this in the extensibility guidelines.
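For example, a home-grown assertion might look something like this (a minimal sketch; the assertion itself is made up):

using FluentAssertions;

public static class MyAssertionExtensions
{
    [CustomAssertion]
    public static void ShouldBeWholeNumber(this decimal value, string because = "", params object[] becauseArgs)
    {
        // Because of [CustomAssertion], the caller's variable name is reported
        // as the subject instead of the local parameter "value".
        (value % 1).Should().Be(0, because, becauseArgs);
    }
}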

Redefining equivalency

Since I was on a roll to introduce breaking changes anyway, this release finally gave me the opportunity to repair quite a few of the behavioral design mistakes in the structural equivalency API. So in addition to the aforementioned change from ShouldBeEquivalentTo to Should().BeEquivalentTo, a lot more has changed.

First of all, the equivalency algorithm will now use the expectation to drive the comparison. For years, it would use the properties and/or fields of the subject-under-test to run a recursive comparison. Don't ask me why, because I don't remember that anymore (or I have blocked that part of my brain). But now, the expectation that you pass in really represents what you expect the subject to look like. This also makes it very natural to use an anonymous type as the expectation.
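For example (a minimal sketch; the customer object is hypothetical), you can now assert on just the members you care about:

// Only the members defined on the expectation (Name and City) drive the
// comparison; everything else on the actual customer is ignored.
customer.Should().BeEquivalentTo(new
{
    Name = "Jane",
    City = "Amsterdam"
});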

Another thing I've changed is to disable auto-conversion of member values. This has been requested many times, mostly because it confused so many people. For example, the conversion logic would allow you to treat a DateTime property and its string representation as equivalent. This no longer happens, but if you really want to, you can still opt in to that behavior using the WithAutoConversion and WithAutoConversionFor methods.
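If you do depend on the old behavior, opting back in looks something like this (a sketch; subject, expectation and the member path are made up, and I'm assuming the predicate exposes the path through SelectedMemberPath):

// Re-enable conversion for all members...
subject.Should().BeEquivalentTo(expectation, options => options.WithAutoConversion());

// ...or only for the members that really need it.
subject.Should().BeEquivalentTo(expectation, options => options
    .WithAutoConversionFor(ctx => ctx.SelectedMemberPath == "BirthDate"));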

Similar to auto-conversion, I've considerably simplified the way Fluent Assertions determines whether or not an object has value semantics. Before this release, it used a static IsValueType lambda and some awkwardly unclear heuristics. As of now, any type that overrides Object.Equals is treated as having value semantics. Why? Well, the entire purpose of that method (and its sibling GetHashCode) is to allow you to add value semantics to a reference type. Why wouldn't I comply with that .NET design principle? However, I also acknowledge the fact that not everybody will follow this principle faithfully, so you can override this using ComparingByMembers and ComparingByValue. Don't worry, I have some upgrading tips at the bottom of this post. Oh, and don't forget you can also set these and other options globally using AssertionOptions.AssertEquivalencyUsing. Read all about this in the updated documentation.
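In code, overriding that behavior looks something like this (a sketch; MyValueObject and the subjects are made-up names):

// Force a member-wise comparison for a type that overrides Object.Equals...
actual.Should().BeEquivalentTo(expected, options => options.ComparingByMembers<MyValueObject>());

// ...or compare a type by value even though it doesn't override Object.Equals.
actual.Should().BeEquivalentTo(expected, options => options.ComparingByValue<System.IO.DirectoryInfo>());

// Or configure it once for the entire test run.
AssertionOptions.AssertEquivalencyUsing(options => options.ComparingByMembers<MyValueObject>());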

Formatting your objects beautifully

The formatting engine in Fluent Assertions is based on built-in and custom implementations of the IValueFormatter interface. Unfortunately, this design has suffered from a long-standing mistake: it could not properly detect cyclic references. The fix for that required me to change the method signature in a breaking way:

string Format(object value, FormattingContext context, FormatChild formatChild);

The context parameter provides information about the depth of the graph as well as an indication of whether the formatter should use line-breaks in its output. But the fundamental change here is the FormatChild delegate that is passed in. In previous releases, if a formatter needed to format data itself, it would directly call Formatter.ToString. But that did not allow me to keep track of the graph that was being formatted. By using the formatChild parameter instead, Fluent Assertions will automatically detect a cyclic dependency and display a clear message for that value. If you want to build your own formatters, check out the extensibility guide.
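To give you an idea, a custom formatter might now look something like this (a minimal sketch; Duration is a made-up type and I'm assuming the FormatChild delegate takes a child name plus the value to format):

using FluentAssertions.Formatting;

public class Duration
{
    public int Hours { get; set; }
    public object Details { get; set; }
}

public class DurationFormatter : IValueFormatter
{
    public bool CanHandle(object value) => value is Duration;

    public string Format(object value, FormattingContext context, FormatChild formatChild)
    {
        var duration = (Duration)value;

        // Delegate nested values to formatChild so Fluent Assertions can keep
        // track of the graph and detect cyclic references.
        return $"{duration.Hours} hour(s) with details {formatChild("Details", duration.Details)}";
    }
}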

New event monitoring API

Being able to assert that a C# event was raised has been part of the API for years now. But with the trend towards multi-threaded development and the introduction of async and await, this API started to fall apart. It relied on thread-static state (did I already mention how bad static mutable state is?). So in this release, I've introduced a slightly modified syntax that makes the monitoring scope explicit and independent of the thread on which something is running.

var subject = new EditCustomerViewModel();

using (var monitoredSubject = subject.Monitor())
{
    // ChangeName and "NameChanged" are illustrative; call whatever member
    // raises the event you want to verify.
    subject.ChangeName("Jane");

    monitoredSubject.Should().Raise("NameChanged");
}
Note that the object you execute the Should().Raise call on is not the same object as your subject. The Monitor method returns an object implementing IMonitor, which extends IDisposable to define when monitoring should be stopped. And for those people who love to build their own assertions, that object exposes a load of metadata that you can use any way you like. If you want to learn more about this, check out the updated documentation.
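For example, something along these lines (a hedged sketch; I'm assuming members such as OccurredEvents and EventName here, and ChangeName is a made-up method):

using (var monitoredSubject = subject.Monitor())
{
    subject.ChangeName("Jane");

    // Inspect the recorded metadata directly instead of using the built-in
    // Raise assertions.
    monitoredSubject.OccurredEvents
        .Where(e => e.EventName == "NameChanged")
        .Should().NotBeEmpty();
}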

Upgrading tips

So while dogfooding the betas and release candidates on our own projects, I collected a couple of notes that might help you understand any issues that you may run into while upgrading. In general, be prepared to discover some false positives that were hidden by earlier bugs in Fluent Assertions.

  • The changes to BeEquivalentTo will be the most visible ones:
    • Disabling auto-conversion may cause some tests to fail because different types used to be convertible. Fix the expectation or use the WithAutoConversionFor option.
    • Your tests may fail because of BeEquivalentTo reporting missing properties. This is caused by the expectation object being the driving factor for the structural comparison. Use Including or Excluding to fix that.
    • They may also fail because the expectation doesn't define any properties. This is often a signal that you pass in an abstract type as the expectation. Change the expectation or use the IncludeAllRuntimeProperties option.
    • Use WithTracing to understand how FA has evaluated your object graph.
  • The date and time extensions such as those to define 20.September(2018).At(19, 51) have moved to FluentAssertions.Extensions, so do a global regex text replace from

    using FluentAssertions;

    to

    using FluentAssertions;
    using FluentAssertions.Extensions;
  • WithInnerException returns the inner exception, so we removed WithInnerMessage. Just use WithMessage instead.

Sponsor us

If you check out the release notes, you'll see that this release is quite big. But I could not have pulled this off without help from the community. First of all, a big shoutout goes to Adam Voss, Jonas Nyrup and Artur Krajewski for helping me finalize this release. Next to that, I'm really thankful for the new logo provided by Ben Palmer. And finally, big thanks to JetBrains for providing us with licenses for their new IDE, Rider, as well as ReSharper. I honestly have not touched Visual Studio since I switched to Rider at the start of this project.

And we need your help as well. Support us by becoming a sponsor on Patreon or by providing a one-time donation through PayPal.

Help wanted

But now that version 5.0 is out the door, don't think that the work is done. There are still a lot of feature requests, more than enough to keep a lot of contributors busy for the foreseeable future. Just check out the GitHub items marked with Help Wanted to get you going. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for knowledge that significantly improves the way you build your projections in an Event Sourced world.

Wednesday, November 01, 2017

The Ugly of Event Sourcing–Real-world Production Issues

Event Sourcing is a beautiful solution for high-performance or complex business systems, but you need to be aware that it also introduces challenges most people don't tell you about. After having dedicated a post to the challenges of dealing with projection migrations and how to optimize that, it is time to talk about some of the problems that can happen in production.


So you've managed to design your aggregate boundaries properly and optimized your projection rebuilding logic so that migrations from one version to another complete painlessly, but then you face a run-time issue in production that you never saw before. I generally divide these kinds of problems into two categories: those that you run into quite quickly and those that keep you awake outside business hours.

Issues that usually reveal themselves pretty quickly

Something we ran into a couple of times is a change in the maximum length of some aggregate root property. Maybe the title of a product was constrained to 50 characters, which seemed to be a very sensible limit for a long time. But then somebody changes the domain and increases that length. If your projection store isn't prepared for that, you'll end up with truncated data at best or a database error at worst. You could just define that column as the database's max length, but I know for a fact that this has some serious performance implications on SQL Server. That's why we have the projector explicitly truncate the event data. Something similar but less likely can happen with columns that were supposed to hold a 32-bit integer, but then changed into 64-bit longs.
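In its simplest form that explicit truncation is nothing more than this (a hypothetical snippet; the projector, event and column length are made up):

// Truncate the event data to the column length so a longer title produced by
// a newer version of the domain can never blow up the projection.
private static string Truncate(string value, int maxLength) =>
    value != null && value.Length > maxLength ? value.Substring(0, maxLength) : value;

// Inside the projector:
// projection.Title = Truncate(@event.Title, maxLength: 50);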

Another interesting problem we've run into is an event with a property that everybody expects to have a valid value, while almost nobody remembers that older versions of that event didn't even have that property. You won't spot that problem during day-to-day development, unless you happen to be running a build against older production databases like we do. The more versions you have of an event (we have one that is suffixed with V5), the more of this knowledge dissipates into history. Unless you test your projectors against every earlier incarnation of an event (instead of relying on upconverters to do their thing), the only thing you can do is to document your events properly.
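To make that concrete, imagine a hypothetical ProductAdded event that gained a Category property in a later version; an upconverter then has to invent a sensible default when replaying old streams:

public class ProductAddedV1 { public string ProductId; public string Title; }
public class ProductAddedV2 { public string ProductId; public string Title; public string Category; }

public class ProductAddedUpconverter
{
    public ProductAddedV2 Convert(ProductAddedV1 oldEvent) => new ProductAddedV2
    {
        ProductId = oldEvent.ProductId,
        Title = oldEvent.Title,
        Category = "Unknown" // the V1 event never carried a category
    };
}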

In most cases the developers that change the domain are also the ones that work on the projection code. So it's not entirely inconceivable that such a developer makes assumptions about the order in which the projector gets to see the events. We have a guideline that states that you shouldn't extend events for the purpose of improving the projector, and assuming an order may not feel like a violation of that. Just imagine what happens when somebody alters the domain in such a way that the order changes. And yes, this happened to us as well.

Issues that won't show up until the most inconvenient time

A very common problem in a CQRS architecture is the separation between the domain side (the write side) and the query side (the read side). Somehow those two worlds need to be kept in sync. With Event Sourcing, this is done using events. And though in most cases, the same developer deals with both sides of the system, both sides may evolve independently, especially in bigger code bases. This introduces the risk that the projection code doesn't entirely handle the events the way the domain intended them to be used. At some point somebody will replace, split or merge one or more events in the domain and forget to update the corresponding projections. And this is exactly what happened to us, more than once.

Another class of pain-in-the-butt problems are projectors that have misbehaving or unexpected dependencies. You may remember from one of my earlier posts that we started with a CQRS architecture and a traditional database-backed domain model. We didn't move to Event Sourcing until much later. To keep that migration as smooth as possible, we introduced synchronous projectors that would maintain the immediately consistent query data as projections. If those synchronous projectors had been completely autonomous (as they should be), everything would have been fine and we could all have gone on with our lives.

However, over the years, some unexpected dependencies sneaked into the codebase. Apparently some developer decided it was a good idea to reuse the data that was maintained by another projector. This surfaced in two separate incidents. The first happened when we were rebuilding a projection after a schema upgrade. The projector ended up reading from another projection that was at a state much further in the event store's history. As this didn't cause any crashes, it took us quite some time to figure out why the rebuilt projection contained some unexpected data. The other one was quite similar and was caused by an asynchronous projector relying on the data persisted by a synchronous projector. Again, the autonomy of projectors is a key requirement.

In that respect, lookups can have similar problems, even though they must be owned by the projector that maintains the main projection. Reuse of lookups is not that common, but not entirely exceptional either. I've seen lookups used for recurring things such as finding a user's full name based on their identity. Since this is quite a common requirement, I can imagine such a lookup being reused. However, the freshness of that lookup must be considered carefully. First, who maintains the lookup and how does the state of the lookup reflect on the projector that relies on it? What happens if they get updated at a different rate? And what if the lookup uses some kind of in-memory LRU cache? How will that work in a web farm? All questions that need to be answered on a case-by-case basis. Although there's no generic guideline here, we tend to ensure a lookup is used and owned by a single projector only. This simplifies the situation a bit and allows us to make more localized decisions on cacheability, exception handling and how that affects the lookup, as well as its accuracy.

Those who have been using NEventStore as their storage engine are kind of forced into a model where the event store actively pushes events into the projectors. In other words, the event store tracks whether or not an event was handled by all projectors. So unless your solution wraps the projectors' work in one large database transaction, your projectors need to be idempotent. A common solution is to use the version of the aggregate that is often included in the event to see if that event was already handled. Although that is a pretty naïve solution, it gets worse if you need to aggregate events from multiple aggregates. Do you track two separate versions per projection? Or do you create some separate administration per projection? These kinds of problems led us to believe that we shouldn't use NEventStore anymore.
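A minimal sketch of that version-based approach (all names are hypothetical) looks like this:

public class ProductRenamed { public long AggregateVersion; public string NewTitle; }
public class ProductProjection { public long LastProcessedVersion; public string Title; }

public class ProductProjector
{
    public void Handle(ProductRenamed @event, ProductProjection projection)
    {
        // Skip events that were already applied during an earlier, partially
        // committed run, so replaying a batch cannot corrupt the projection.
        if (@event.AggregateVersion <= projection.LastProcessedVersion)
        {
            return;
        }

        projection.Title = @event.NewTitle;
        projection.LastProcessedVersion = @event.AggregateVersion;
    }
}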

Hey, didn't I say we only had two categories of problems? I did, but it just so happens there is a third, undocumented category.

Things you would never expect could happen

To speed up the projection work, at some point we started to experiment with batching a large number of projection operations into a single unit-of-work (we were and are still using NHibernate). But because we didn't want to maintain a database transaction of that size, we relied on the idempotency of the projectors to be able to replay multiple events when any of the projection work failed. This all worked fine for a while, until we got reports about projection exceptions referring to non-null database constraints. After some in-depth analysis, extended logging and painstakingly long debug sessions, we found the following events (no pun intended) happened:

  1. Event number 20 required a projection to be deleted, which it did.
  2. Some more unrelated events were handled, after which the application stopped or crashed for some reason.
  3. After restarting, the process resumed with event 10, which expected this projection to still be there.
  4. Since our code just creates a projection the first time it is referred to, we created a new instance of this projection with all its properties set to default values, except those related to event 10.
  5. This projection got flushed into the database where it ran into a couple of non-null constraints and…boom!

This made us decide to abandon the idea of batching until we managed to reduce the scope of those transactions.

Another interesting problem happened when we got a production report about a unique key violation happening in one of the projection tables. Since that projector maintained a single projection per aggregate and the violation involved the functional key of that aggregate, we were at a loss initially. After requesting a copy of the production database and studying the event streams, we discovered two aggregates whose identities were exactly the same except for their casing. Our event store does not treat those identities as equivalent because we started our project with an earlier version of NEventStore that required GUIDs as the stream identities. We convert natural keys to GUIDs using an algorithm written by Bradley Grainger to generate deterministic GUIDs from strings. However, SQL Server, which serves as our projection store, does not care about casing differences. So even though our event store treated those identities as separate streams, the projection code ran into the database's unique key violation. Fortunately most event store implementations use strings for identifying streams. For our legacy scenario, we decided to generate those GUIDs from the lowercase version of the identity.
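A simplified illustration (not necessarily Bradley Grainger's exact algorithm) of generating such a deterministic GUID from the lower-cased identity:

using System;
using System.Security.Cryptography;
using System.Text;

public static class DeterministicGuid
{
    public static Guid Create(string naturalKey)
    {
        // Lower-case the natural key first so identities that only differ in
        // casing end up in the same stream and can't collide later in a
        // case-insensitive projection store.
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(naturalKey.ToLowerInvariant()));
            return new Guid(hash);
        }
    }
}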

In another mystery case we received some complaints that editing a particular document got slower and slower. Reading the data didn't show any issues, but writing definitely did. We quickly concluded that the involved projection was perfectly fine and started to look for bugs in the event store code, transaction management and the way we hydrate aggregates from events. We couldn't find anything out of the ordinary, until we requested a dump of that specific aggregate's event history. We have a special diagnostics page to dump the event stream in JSON format, but somehow that page timed out. We needed to get the actual production database before we discovered a single event stream with over 100K events! Some kind of background job that ran regularly was updating the aggregate pretty often. But since the aggregate method involved didn't check for idempotency, a new event was emitted for each update. After a couple of months this definitely added up. We had to delete the entire event stream from the event store and rebuild the involved projections to resolve the issue.

However, the most painful problem we encountered did not surface until after months of regular load testing. It appeared as if a projector missed some events for some reason. We first assumed the projector itself had a bug, but then we discovered similar problems with other projectors. We also learned that it only happened under high load, so we suspected that the projection plumbing didn't properly roll back the transactions that wrap the projections. We blamed just about every part of the code base and even looked at the implementation of NEventStore itself. But we never considered the fact that a SQL Server identity column (which we use to determine the order in which we should project events) could result in inserts that complete out of order. So if the second insert completes before the first one, it is possible that the projector will process that second event before it even had a chance to see the first one. We had to use exclusive locks during event store inserts to prevent this. And since our read-write ratio is 100:1, this doesn't affect our performance in any way. Other event stores have used an alternative solution: just reloading a page of events if a gap is detected.
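That gap-detection alternative boils down to something like this (a hedged sketch; LoadEventsAfter, Checkpoint and lastCheckpoint are hypothetical names):

// If the first event in the page doesn't directly follow the last checkpoint,
// an insert with a lower identity value may still be committing, so wait a
// little and re-read the page instead of processing later events first.
var page = LoadEventsAfter(lastCheckpoint);

if (page.Count > 0 && page[0].Checkpoint > lastCheckpoint + 1)
{
    System.Threading.Thread.Sleep(100);
    page = LoadEventsAfter(lastCheckpoint);
}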

What does that mean for the future?

Well, we did learn from all of this and identified a couple of guidelines that might be useful to you too.

  • Projections should never crash. Always truncate textual data, but log a warning if that happens.
  • If a projector throws and retrying doesn't help (so it's not just a transient exception), mark the projection as corrupt so that the UI can handle this.
  • Projectors should be autonomous. In other words, they run independent of other projectors, track their own progress and decide themselves when to rebuild. The consequence of this is that they need to run asynchronously.
  • Build infrastructure to extract individual streams or related streams for diagnostic purposes.
  • Account for case sensitivity of aggregate identities. How you handle them, however, depends on the event store implementation and the underlying persistence store.

A lot of the problems described in this post have been the main driving force for us to invest in LiquidProjections, a set of light-weight libraries for building various types of autonomous projectors. But that's a topic for another blog post….

What about you?

Hopefully this will be my last post on the dark side of Event Sourcing, which means I'd love to know whether you recognize any of these problems. Did you run into any other noticeable issues? Or did you find alternative or better solutions? If so, let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for knowledge that significantly improves the way you build your projections in an Event Sourced world.