Thursday, June 23, 2016

Microservices: The State of the Union

After attending a full-day track with multiple sessions and open spaces on microservices at QCon New York, it is clear that this technique has taken flight since I first heard about it at the same conference in 2014. A lot of companies have made the jump to solve their technical and organizational scaling issues, including Uber, Amazon and Netflix, and many of them used QCon to share their experiences. This is a random list of thoughts, observations and ideas from that day.

  • A lot of my concerns from that time seem to have been resolved by the industry. It's incredible how many (open-source) libraries, products and tools have emerged since then. Kafka seems to be a big one here.
  • Container technology seems to have become a crucial element in a successful migration to microservices. Some would even call it a disruptive technology that changes the way we build software. Deployment times went from months, to days, to minutes. But apparently the next big thing is Serverless Architecture; AWS Lambda and Azure Functions are examples of that.
  • Domain-Driven Design is becoming the de facto technique for building microservices. You can see that in this nice definition: "A loosely coupled service-oriented architecture defined by bounded contexts". Even the DDD statement that if you need to know too much about surrounding services, you probably have your bounded context wrong, applies to microservices. Unfortunately, according to an open-space discussion, defining those boundaries is the most complex task.
  • One of the next challenges the community is expected to resolve is the complexity of authorization, security groups, network partition and such.
  • Something that Daniel Bryant noted (and something I observed myself while talking to the people at QCon) is that microservices is becoming the new solution-to-all-problems. This is dangerous and leads to cargo culting.
  • Microservices is not just about technology. It also has a significant effect on the organization. In fact, as the workshop on the last day showed (about which a blog post will follow), these two go hand in hand.
  • Failure testing by injecting catastrophic events using Chaos Monkey (part of the Simian Army), Failure Injection Testing and Gremlin seems to be becoming a commodity.
  • For obvious reasons, Continuous Delivery is not a luxury anymore; it has become a prerequisite.
  • Version-aware routing and discovery of services through an API gateway is being used by everyone that moved to microservices and seems to be a prerequisite. Such a gateway also provides a service registry to find services by name and version and to get an IP address and port. Smart Pipes are supposed to make that even more transparent. Examples of gateways that were used include Kong, Apigee, AWS API Gateway and Mulesoft.
  • With respect to communication protocols, the consensus is to use JSON over HTTP for external/public interfaces (which makes them easy to consume by all platforms), and to use more bandwidth-optimized protocols like Thrift, Protobuf, Avro or SBE internally. XML has been ruled out by all parties. Swagger (through Swashbuckle for .NET) and DataWire Quark were mentioned for documenting the interfaces. Next to that, developers proposed having the owning team build a reference driver for every service; consumers can use that to understand the semantics of the service.
  • Even a deployment strategy has formed. I've heard multiple people state that you should deploy the existing code with either a new/updated platform stack or new dependencies, but not both. Consequently, deploying new code should happen without changing the dependencies or platform. This shortens the feedback loop for diagnosing deployment problems.
  • A commonly recurring problem is overloading the network, also called retry storming: because each service has its own timeout and retry logic, failures cause exponential retry amplification. A proposed solution is introducing a cascading timeout budget. Using event publishing rather than RPC can prevent this problem altogether.
  • If you have many services, you might eventually run into connection timeout issues, so using shared containers allows connections to be reused. Don't use connection pooling either. Next to that, if services have been replicated for scale-out purposes, you should only retry on a different connection to a different instance.
  • Regressions apparently don't surface immediately, so all of the speakers agreed that canary testing is the only reliable way to find them.
  • A big-bang replacement is the worst thing you can do, and most of the successful companies used a form of the Strangler pattern to replace a part of the functionality.
  • A nice pattern that was mentioned a couple of times is that every service exposes a /test end-point that can be used to verify versions and dependencies during canary testing.
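The cascading timeout budget mentioned above can be sketched roughly like this. This is a hypothetical illustration (Python, with invented function names, not any specific library's API): each hop receives the time remaining from its caller, deducts what it spent, and fails fast when the budget is gone instead of layering its own independent timeouts and retries on top.

```python
import time

class BudgetExhausted(Exception):
    """Raised when a request arrives with no time budget left."""

def call_downstream(operation, budget_ms):
    # Fail fast when the budget handed down by the caller is already spent,
    # instead of starting a fresh timeout/retry cycle of our own.
    if budget_ms <= 0:
        raise BudgetExhausted("no time left; give up instead of retrying")
    start = time.monotonic()
    result = operation(budget_ms)            # the callee receives the remaining budget
    spent_ms = (time.monotonic() - start) * 1000.0
    return result, budget_ms - spent_ms      # pass what's left on to the next hop

# A two-hop chain: each hop forwards whatever budget remains.
def fetch_price(budget_ms):
    return 42

def fetch_product(budget_ms):
    price, _remaining = call_downstream(fetch_price, budget_ms)
    return {"price": price}
```

The entry point would start the chain with, say, `call_downstream(fetch_product, budget_ms=500)`; every service deeper in the call graph then shares that single 500 ms budget rather than multiplying its own.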

So what do you think? Are you already trying or operating microservices? If yes, any practices to share? If not, are you planning to? Love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Tuesday, June 21, 2016

Event Sourcing from the Trenches: Mixed Feelings

While visiting QCon New York this year, I realized that a lot of the architectural problems discussed there could benefit from the Event Sourcing architecture style. Since I've been in charge of architecting such a system for several years now, I started to reflect on the work we've done: what worked for us, what I would do differently next time, and what I still haven't made up my mind about. So after having discussed my thoughts on projections, I still have a couple of doubts to discuss.

Don't write to more than one aggregate per transaction
As long as you postpone dispatching those events until all the work is done, you should be fine. Although Vaughn Vernon stated this rule in his posts about effective aggregate design, I still don't see the real pragmatic value here. Truly using a single transaction per aggregate means you need to design your aggregates perfectly, so that every business action only affects a single aggregate. I seriously doubt most people will manage to do that. And if you can't, sticking to a single aggregate per transaction means you need to build logic for handling retries, as well as compensating logic for when other parts of the bounded context are interested in those events. However, never ever use transactions across bounded contexts.
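The "postpone dispatching" idea can be made concrete with a minimal unit-of-work sketch (hypothetical names, Python rather than .NET): the events of all affected aggregates are persisted first, and subscribers are only notified afterwards, so they never observe events from a transaction that might still roll back.

```python
class Aggregate:
    def __init__(self):
        self.uncommitted_events = []

class UnitOfWork:
    def __init__(self, persist, dispatch):
        self.persist = persist      # writes events to the store (one transaction in real code)
        self.dispatch = dispatch    # notifies subscribers, e.g. projections
        self.tracked = []

    def track(self, aggregate):
        self.tracked.append(aggregate)

    def commit(self):
        # First persist the changes of *all* affected aggregates...
        pending = []
        for aggregate in self.tracked:
            self.persist(aggregate.uncommitted_events)
            pending.extend(aggregate.uncommitted_events)
            aggregate.uncommitted_events = []
        # ...and only then dispatch the events to interested subscribers.
        for event in pending:
            self.dispatch(event)
```

Whether the persist step spans one aggregate or several is then a local decision, not something the dispatching logic has to care about.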

Separation of the aggregate root class and its state
Inspired by Lokad.CQRS, we used a separate class to contain the When methods that change the internal state as a result of an event, both during aggregate-root method invocations and during rehydration. However, using a separate state class results in some cumbersome use of properties that point to it. Having them on the main aggregate-root class is going to make it very big, but maybe using a partial class makes sense.
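Roughly, that separation looks like this (a Python sketch of the Lokad.CQRS-inspired pattern with invented names; the real thing is C#): the state class owns the When methods, and the aggregate root routes both new events and historical events through them.

```python
class OrderState:
    """Holds the internal state; its When methods are the only things that
    mutate it, whether a command raises a new event or the aggregate is
    being rehydrated from the event store."""
    def __init__(self):
        self.shipped = False

    def when_order_shipped(self, event):
        self.shipped = True

class Order:
    def __init__(self):
        self.state = OrderState()
        self.changes = []            # new events, not yet persisted

    @classmethod
    def load(cls, history):
        order = cls()
        for event in history:
            order._mutate(event)     # same When methods, nothing recorded as new
        return order

    def ship(self):
        # Business rules read the state through the state class.
        if self.state.shipped:
            raise ValueError("order was already shipped")
        self._raise({"type": "order_shipped"})

    def _raise(self, event):
        self._mutate(event)
        self.changes.append(event)

    def _mutate(self, event):
        getattr(self.state, "when_" + event["type"])(event)
```

The cumbersome part is visible in `ship`: every rule has to reach through `self.state`, which is the property noise mentioned above.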

Functional vs unique identifiers
For the aggregates that have a natural key, we use that key to identify the stream their events belong to. However, Greg Young once mentioned that using a Guid or similar is probably better, but somehow that never aligned with what I've learned to value from Pat Helland's old article Data on the Inside, Data on the Outside. Maybe you should do both?

Share by contract vs by type
Whether to share events as a binary package or through some platform-agnostic mechanism (e.g. JSON Schema) is a difficult one for me. Some people argue that sharing the binary package is going to cause an enormous amount of coupling, but I would think that sharing just some JSON Schema still means you're tied to that contract. For instance, if you're in the .NET space, being able to use a NuGet package that only contains the events from a bounded context sounds very convenient. The only thing a schema-based representation will help you with is that it forces you to add a transformation step from that schema into some internal type. By doing that, you have a bit more flexibility in decoupling versioning differences. But somehow, I'm not convinced the added complexity is worth it (yet).

Feedback, please!
So what do you think? Do my thoughts make sense? Am I too pragmatic here? Are you using Event Sourcing yourself? If so, care to share some experiences? Really love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Monday, June 20, 2016

Event Sourcing from the Trenches: Projections

While visiting QCon New York this year, I realized that a lot of the architectural problems discussed there could benefit from the Event Sourcing architecture style. Since I've been in charge of architecting such a system for several years now, I started to reflect on the work we've done: what worked for us, what I would do differently next time, and what I still haven't made up my mind about. So after having discussed the domain events, let me share some of my thoughts on projections.

Optimized for querying
Projections, denormalized aggregations of events, have only one purpose: to be optimized for reading. So if your API or user interface needs data to be grouped or aggregated in a certain way, project the events like that into whatever storage mechanism you use. The workload should fall on the projection logic, not on the reading side.
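As a minimal sketch of that principle (hypothetical event shapes, Python for brevity), a projection folds the event stream into exactly the shape the consumer asks for, so a read becomes a plain lookup with no joins or aggregation at query time:

```python
def project_revenue_per_country(events):
    """Fold order events into the shape the UI asks for:
    total revenue grouped by country."""
    revenue = {}
    for event in events:
        if event["type"] == "order_paid":
            country = event["country"]
            revenue[country] = revenue.get(country, 0) + event["amount"]
    return revenue
```

If another screen needs the same data grouped by month instead, that's a second projection over the same events, not a query-time transformation of this one.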

Projections should not enforce constraints
Projections should be seen as an aggregated cache of events and consequently shouldn't be used to enforce any constraints. In fact, projection code should never crash, ever. This may sound trivial, but if your code base has been evolving for a couple of years and the events and their underlying constraints have changed several times, bugs are inevitable. So make sure your projection code is resilient. A (somewhat naïve) example would be to always cut a string value to the maximum length of the underlying database column, even though you know the event value never exceeds it…now. That's why I love NoSQL databases so much…
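That naïve example looks something like this (illustrative names; the column width is an assumption): the projector defaults missing values and truncates strings to fit, rather than letting an old, no-longer-conforming event crash it.

```python
MAX_NAME_LENGTH = 50   # assumed width of the underlying database column

def project_product_row(event, row):
    # Defensive projection logic: keep working even when old events no longer
    # satisfy today's constraints. Missing values get a default and strings
    # are cut to the column width instead of crashing the projector.
    row["name"] = (event.get("name") or "")[:MAX_NAME_LENGTH]
    return row
```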

Don't share projections
A logical consequence of optimizing projections for a single purpose is that they are optimized for a single purpose… Even if the persisted projection schema looks like a good fit for another kind of query, don't reuse it. They may look similar now, but they are inevitably going to deviate, and the worst problem you can have is needing to add all kinds of alternate execution paths to your logic to make sure both interests are served equally. Just duplicate the data.

Keep projections close to the consumer
If you persist the projections to a persistent store like a database, don't assume that the projection belongs to something like a data access layer. As I said before, projections should be seen as a local query cache for a very particular purpose, so keep them as close to the consumer as possible. If you use one in a particular HTTP API implementation, move the projection code next to the API code. In fact, I would go so far as to say that you should test the two together. Such a test should use the events as input and observe the HTTP response as output.

Allow each projector to decide on the persistence store
Another logical result of all those earlier statements is that each projection can be persisted to whatever storage fits best. Sure, you can store it in a relational database, but I highly recommend using a NoSQL database instead. You could even build up a specific projection in memory as soon as the system starts. And what about projecting your events directly to a local HTML or JSON file that is served by an HTTP API or web site? That's the beauty of Event Sourcing.

Track progress locally
So if each projection can use a different storage mechanism, the projection code needs to track its own progress. This is something that NEventStore got wrong: it relied on a central structure to determine whether an event had been dispatched to all projections. Instead, design your projection logic to track progress itself. That gives it a lot of autonomy, including the possibility to rebuild itself when code changes deem that necessary. Notice that this does mean your event store should allow arbitrary subscribers, each interested in a different starting point.
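A toy sketch of that autonomy (hypothetical API, Python): each projector keeps its own checkpoint, and the store supports arbitrary subscribers reading from any starting point.

```python
class EventStore:
    """Toy append-only store that allows arbitrary subscribers, each reading
    from their own starting point (1-based sequence numbers)."""
    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)

    def read_from(self, first_sequence):
        for sequence, event in enumerate(self.events, start=1):
            if sequence >= first_sequence:
                yield sequence, event

class Projector:
    """Tracks its own checkpoint instead of relying on a central dispatch
    table, so it can fall behind, catch up, or rebuild independently."""
    def __init__(self, handle):
        self.handle = handle
        self.checkpoint = 0   # in real code, persisted together with the projection

    def catch_up(self, store):
        for sequence, event in store.read_from(self.checkpoint + 1):
            self.handle(event)
            self.checkpoint = sequence
```

Rebuilding a projection after a code change is then just resetting its own checkpoint to zero; no other projector is affected.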

Asynchronous projection as a first-class concern
Another consequence of all that autonomy is that essentially all projection logic should run asynchronously. We started with the notion of having the projection logic run as part of the same thread of control that caused the action on the domain. This allowed us to avoid changing the UI to deal with the fact that projections are eventually consistent. But a lot of projections don't have to be up-to-date all the time and can perfectly well run in the background. This scales much better, in particular when a new version of the code base needs a rebuild of the persisted projections. We added this possibility afterwards, which meant it had to work around the existing code base. So if you can, make this a primary architectural principle. And if you really need 'synchronous' behavior, have your projection logic expose an interface that you can observe to see if it has caught up.

Feedback, please!
So what do you think? Do my thoughts make sense? Am I too pragmatic here? Are you using Event Sourcing yourself? If so, care to share some experiences? Really love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Friday, June 17, 2016

Event Sourcing from the Trenches: Domain Events

While visiting QCon New York this year, I realized that a lot of the architectural problems discussed there could benefit from the Event Sourcing architecture style. Since I've been in charge of architecting such a system for several years now, I started to reflect on the work we've done: what worked for us, what I would do differently next time, and what I still haven't made up my mind about. After having discussed aggregates in my last post, let me share some of my thoughts on domain events.

Avoid CRUD terminology
Don't use terms like create, update and delete in the names of your events. The business doesn't talk in those terms, so you shouldn't either. Stick to the terminology from the Ubiquitous Language of the involved bounded context. Again, Event Storming will help here as well, since the names will surface during the discussions with the business.

Prefer fine-grained events
Avoid coarse-grained events, even if they originated from an Event Storming session. High-level events might be more aligned with the business, but they also require a lot of context to understand. As an example, consider a risk assessment for some work to be done. Only when such an assessment involves a high-risk task is an entire team of specialists needed. Now consider an assessment whose risk level is reduced from high to low. It basically means the assessment team needs to be disbanded. How would you model that? With a single high-level event like TaskRiskLevelReduced? But that means all subscribers need to understand that this implies the assessment team is no longer necessary. I prefer to raise separate events, e.g. a TaskRiskLevelReduced first, followed by a RiskAssessmentTeamDisbanded. If the rules change in the future, you'll be more flexible.
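In code, the decision point might look like this (a Python sketch; the event shapes are illustrative, only the event names come from the example above): the aggregate raises the business consequence as its own fine-grained event rather than leaving subscribers to infer it.

```python
def reduce_risk_level(assessment_id, new_level):
    """Return the events to raise when an assessment's risk level is reduced.
    The consequence (disbanding the team) is made explicit as its own event,
    so subscribers don't each have to encode that business rule."""
    events = [{"type": "TaskRiskLevelReduced",
               "assessment": assessment_id,
               "level": new_level}]
    if new_level == "low":
        # If this rule changes later, only this decision point changes,
        # not every subscriber of TaskRiskLevelReduced.
        events.append({"type": "RiskAssessmentTeamDisbanded",
                       "assessment": assessment_id})
    return events
```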

Make event merging a first class concern
Since our aggregates never needed to handle concurrent commands, we could get away with optimistic and pessimistic locking at the aggregate level. But after two years or so, we couldn't maintain this stance anymore, and adding event merging to your architecture afterwards can be challenging. So please consider adding this to your ES implementation as soon as possible. I just wish there was a .NET open-source project that could do that.

Allow multi-event conversion
Events evolve; you can't avoid that. Sometimes an event receives an extra property that has a suitable default. Sometimes an event is renamed or converted into another event. But sometimes you need to convert a couple of events into a single new one, or the other way around. So when you load your events from an event store, make sure your up-converters can keep state to support this. As far as I know, none of the .NET event store libraries have this out of the box.
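A stateful up-converter could be sketched like this (Python, with invented event names; real .NET converters would be typed classes): merging a legacy pair of events into one new event requires remembering the first event until its counterpart streams by, which is exactly why one-to-one stateless converters aren't enough.

```python
class NameChangedMerger:
    """Merges the legacy pair FirstNameChanged + LastNameChanged
    into a single NameChanged event while replaying a stream."""
    def __init__(self):
        self.pending_first = None   # state carried between events

    def convert(self, event):
        """Return the list of upgraded events to emit for this input event."""
        if event["type"] == "FirstNameChanged":
            self.pending_first = event
            return []               # hold it back until the counterpart arrives
        if event["type"] == "LastNameChanged" and self.pending_first is not None:
            merged = {"type": "NameChanged",
                      "first": self.pending_first["first"],
                      "last": event["last"]}
            self.pending_first = None
            return [merged]
        return [event]              # anything else passes through untouched
```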

Use primitive types
In the DDD world, it's very common to build domain-specific value types to represent concepts such as ISBN numbers, phone numbers, zip codes, etc. Never ever put these on your events. Events represent something that has happened. Just imagine somebody changing the constraints that apply to an ISBN number while the old events don't comply with them. Moreover, serializing and deserializing those types to the underlying store is usually more expensive than using simple primitive types.
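The distinction looks roughly like this (an illustrative Python sketch with a deliberately simplistic ISBN check): the value type enforces today's rules inside the domain, while the event only carries the raw string, so old events still load even if the rules are tightened later.

```python
class Isbn:
    """Domain value type enforcing *today's* rules; it lives in the domain,
    never on the events themselves. (The validation here is simplistic.)"""
    def __init__(self, value):
        digits = value.replace("-", "")
        if len(digits) not in (10, 13) or not digits.isdigit():
            raise ValueError(f"{value!r} is not a valid ISBN")
        self.value = value

def book_added_event(isbn):
    # The event carries only the primitive string. If the ISBN constraints
    # change later, deserializing old events won't trip over new validation.
    return {"type": "BookAdded", "isbn": isbn.value}
```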

Don't enrich events
Don't enrich events with information from the aggregate or its associations just to make it available for projection purposes later on. It creates unnecessary duplication of data and increases the chance you'll need event versioning. The only exception I can think of is when that data is functional. E.g. when you add a product to an order and it's important to track the price at that point in time, you would probably make the price part of the aggregate's events. Sure, you could handle all that in the projection code instead, but that means that code needs to understand how prices are handled.

Events are your contract
Treat your events as the contract of your bounded context. Other bounded contexts or external systems can consume or subscribe to them to keep track of what's going on. It's the perfect integration technique for decoupling systems, in particular for building microservices.

Don't use them as a notification mechanism within the bounded context
In the original definition of a domain event, events were used as a notification mechanism from one domain entity to another. So if a product was discontinued, an event would be raised for that, and a domain event handler somewhere else in the code base could handle it and perform a business action on all running orders that included that product. I used to love that mechanism, but found that it makes it particularly difficult to trace what's going on in a system. Next to that, DDD introduced domain services for this purpose. Notice the words bounded context in this discussion; I'm only talking about the code within the bounded context.

Feedback, please!
So what do you think? Do my thoughts make sense? Am I too pragmatic here? Are you using Event Sourcing yourself? If so, care to share some experiences? Really love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Thursday, June 16, 2016

Event Sourcing from the Trenches: Aggregates

While visiting QCon New York this year, I realized that a lot of the architectural problems discussed there could benefit from the Event Sourcing architecture style. Since I've been in charge of architecting such a system for several years now, I started to reflect on the work we've done: what worked for us, what I would do differently next time, and what I still haven't made up my mind about. So please let me share some of my thoughts on this.

Event Sourcing (ES) is an awesome architecture style for high-performance systems that supports some powerful concepts like fine-grained conflict handling, optimized projections, potentially unlimited horizontal scaling, and great business buy-in. But it introduces a lot of complexity, like eventual consistency, event versioning and projection migration challenges. As with every (design) pattern, methodology or tool, you need to consider the trade-offs and the problem you're trying to solve. Don't jump on the ES train just because it sounds like a cool thing to work on. We only migrated from a CQRS-based architecture to ES to build an application-level replication protocol, even though we knew about ES when the entire project started. Granted, because of my positive experience in the current project, I would definitely consider ES for any non-trivial system. But I'm fully aware I might be falling into the second-system trap.

Use Event Storming
Event Storming is a technique to identify your (business) events from conversations with the business rather than extracting them from your domain model. Since we migrated from a relational-database-backed domain model loosely based on Domain-Driven Design principles to Event Sourcing, we had to extract our events from the existing code. This sometimes resulted in what Yves Reynhout amusingly called "property sourcing": events that were rather technical and never captured the actual business intent. Event Storming helps you identify the dynamics of the process you're trying to model, rather than taking the state-oriented domain modelling approach. A nice side effect is that it will surface potential conflicts in definitions, warranting the introduction of separate bounded contexts.

Don't rely on aggregates to be in sync at all times
If your order (logically) references a product, don't rely on that product to exist or to be in a certain state. By following this principle, your code will be designed to handle non-existing data from day one. If, in the future, your performance requirements grow so far that you need to partition your event store, you can do so; trying to make your code handle these situations later on is going to be extremely painful. Projection code that aggregates events from multiple aggregates or maintains lookups is particularly susceptible to this. Prepare for it.

Postpone snapshots until you really need them
Snapshots add complexity that you may not need. For instance, in our project we have two kinds of aggregates: those that live a couple of days and receive a lot of events but get abandoned after that, and those that live very long (like a user aggregate) but receive only a couple of events over a period of months. In both cases, we had no need for snapshots, since the number of events per aggregate is rather low.

Identify a partition key for your aggregate
Even though you won't need it immediately, a partition key allows you to scale in the future by partitioning the event store by that key. For instance, orders in a purchase system may be tied to a country; it's not like an order suddenly moves from one country to another. And if the unexpected still happens, you always have the choice to copy the aggregate into a new aggregate with a different partition key. If there's no natural partition key, try to come up with something synthetic anyway.
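Picking such a key could be as simple as this sketch (illustrative Python; the partition count and helper name are assumptions): prefer a natural, stable business key, and fall back to a synthetic key derived deterministically from the stream id.

```python
import hashlib

def partition_key_for(stream_id, natural_key=None):
    """Pick a partition key for an aggregate's event stream: prefer a stable
    natural key (e.g. the order's country), otherwise derive a synthetic one
    deterministically from the stream id so the same stream always lands in
    the same partition."""
    if natural_key is not None:
        return natural_key
    digest = hashlib.sha1(stream_id.encode("utf-8")).hexdigest()
    return f"partition-{int(digest, 16) % 16}"   # 16 partitions, chosen arbitrarily
```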

Feedback, please!
So what do you think? Do my thoughts make sense? Am I too pragmatic here? Are you using Event Sourcing yourself? If so, care to share some experiences? Really love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Tuesday, June 14, 2016

How to optimize your culture for learning, growth and collaboration

On the first day of QCon New York, I attended several talks and open spaces related to culture, be it improving the efficiency of developers, handling disagreement in a respectful way, or creating an environment that embraces the learning experience.

For instance, one of the questions the attendees asked was how to convert low performers into high performers. As you can expect, nobody has that magic formula, but several ideas came up. Peer reviews and pair programming are the obvious examples; giving homework or assignments is another. But while discussing this, the question came up of what motivates people. The agreement was that being able to make a difference is a success factor here, so working on something you don't value is obviously not helping. Somebody used the analogy of a journalist working on a story they're not interested in: they need to finish it, and there's a deadline. One little side-track dealt with the insecurity some potentially talented developers suffer from. They may be afraid to ask questions of the most knowledgeable people within the organization, which holds them back from obtaining a deeper understanding of the code they are working on. They may fall back on trying to find solutions on the internet or coming up with an idea themselves, thereby completely missing alignment with the architectural vision. So having some people around who are actively approachable is extremely important for them. In a way you can conclude that fear can seriously hamper the growth of a high potential. Which brings me to the next point.

During that same open space, somebody described a situation in which the CTO likes to be challenged by the professionals in his company by publicly arguing about topics. His way of working involves cheesy values such as "Fight fairly, but argue to win". I don't have to explain to you how bad that is for the motivation of the people in that company. Even if you're a strong communicator and feel very secure, you might attempt to talk with this guy a couple of times, but eventually you'll just stop. Now imagine the same for the more introverted or insecure people. Healthy arguments are…well…healthy, but where do you draw the boundary?

Sonali Sridhar, who gave a talk on this same topic, explained how her non-profit organization, the Recurse Center, uses a set of simple social rules to stimulate healthy and respectful conversations. For instance, you're not allowed to feign surprise when somebody asks you a question you would expect them to know the answer to already; it's condescending and will cause people to avoid asking questions in the future. Another one rules out the use of the phrase "Well, actually", which is often used to emphasize an error in some argument that is totally irrelevant; it derails the original argument and moves the focus from the person who was speaking to the person who used the phrase. She also mentioned the "no subtle -isms" rule, referring to subtle remarks that might have an origin in sexism, racism, etc. And finally, she stated the simplest rule of all: treat people as adults. The longer I think about this, the more it makes sense to me. I'm pretty sure I've feigned surprise here and there…

So what do you think? Do you agree that these things can help build a working environment where failure as a way to learn is a good thing? And what do you think about those social rules? Love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Monday, June 13, 2016

A Git collaboration workflow that provides feedback early and fast

At Aviva Solutions, we’ve been using Git for a little over two years now, and I can wholeheartedly say that after having worked with TFS for years, we'll never go back… ever. But with any new technology, practice or methodology, you need to go through several cycles before you find a way that works well for you. After we switched over from TFS, we kept working in a somewhat centralized fashion (hey, old habits die hard), where all the feature and team branches are kept on the central repository. If the entire code base involves only a couple of developers, all is fine and dandy. But if you're working with 20 developers, not so much, unless you love those rainbow-style history graphs…


So after a couple of months, in order to achieve a bit more isolation and less noise, the first teams started to fork the main repository. This worked quite well for most of the teams, in particular due to the power of pull requests. But it took more than a year before we managed to coerce all teams into doing that. That may surprise you, but our teams get a lot of autonomy. For instance, they decide on their development process (Scrum, Kanban or a hybrid) and the way they work together. But switching from the nicely integrated combination of Visual Studio and TFS to a hybrid of command-line tools and half-baked desktop apps proved to be harder than I expected. Apparently the concepts of clones, forks and remotes were not as trivial as I thought.

We couldn't just revoke all write access and force teams to work on forks without causing a riot. So it took another six months until we finally got all teams to agree on a process where everybody works on forks and uses pull requests (PRs) to submit their changes for merging by a small group of gatekeepers. Teams use the PR for getting build status updates, tracking test status and having their code reviewed by their peers and technical specialists. The gatekeepers don't do code reviews themselves; they just make sure the code was reviewed and tested by the proper people. This has worked quite well for the last nine months or so. But what about the teams themselves? How do developers within a team work together on their shared tasks and user stories?

Now that people can't push directly to the central repository anymore, they have no choice but to use a fork to do their work. We use GitHub, which does not have the concept of a team fork, so most teams use the fork of the first person that started working on a particular task. All developers involved in that work get write access and can push directly to that repository. Most teams seem to be fine with that, and I have worked like that myself as well. But in my experience, this approach has several caveats:

  • Code only gets reviewed when the combined work of the involved developers gets scrutinized as part of the final pull request, which means the rework comes at the end as well. And if you value a clean source control history, it'll be very difficult to amend the existing commits, resulting in those dreaded "code review rework" commits.
  • The changes being pushed by the individual developers can break the shared branch, e.g. by pushing compile errors or failing unit tests. Even if that shared branch is linked to a WIP (work-in-progress) pull request, your build system might not be fast enough to alert the team. They might spend half an hour trying to figure out why their code doesn't compile, only to discover they pulled a bad commit from another team member.
  • A developer might push a solution that doesn't align with the agreed solution for the involved task. But if the other devs don't look at that code until the final code review, the rework might be substantial.

That feedback loop is a bit too long for my taste. So we've tried an approach which probably could be best described as a multi-level pull request flow. It looks like this:


The idea is simple. Everybody works on their own fork, but there's a single branch on one of the designated forks that serves as the integration point for each other's changes. So in this example, the shared branch is on John's fork and none of the three will write directly to that branch. Whenever Dean has completed his task, rather than pushing directly to the shared branch, he'll issue a pull request to John's fork. This PR is then used for an early code review by either Mike or John, after which Dean does the rework on his own branch. Since this team values a clean history, Dean completes his task by cleaning up/squashing/reordering the commits on his branch using an interactive rebase. As soon as that has been done and the involved builds report a success status, John or Mike will merge the PR. Obviously, John and Mike follow the same workflow for their own tasks.

When all collaborative work has been completed, one of the guys rebases their work on the latest state of affairs of the central repository and files a pull request to it. If not all of the earlier pull requests were merged using GitHub's new squash-merge feature, an (interactive) rebase will get rid of those noisy merge commits. Granted, that final PR might still receive some code review comments, but if you have been involving the right people in the earlier PRs, that should be minimal. Again, the goal is to keep the feedback loop as short as possible.

What do you think? Did I tell you something you didn't know yet? And what about you? Any tips to add to this post? Love to hear your thoughts by commenting below. And follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Thursday, June 09, 2016

How to get the best performance out of NHibernate (and when not to use it at all)

Use the right tool for the right problem

A very common sentiment I'm getting from the .NET community is an aversion to object-relational mappers like NHibernate and Entity Framework. Granted, if I could, I would use an (embeddable) NoSQL solution like RavenDB myself. They remove the object-relational friction OR/Ms try to solve and allow you to decouple your code from scalability bottlenecks like shared database servers. And quite often they provide some cool features such as map-reduce indexes and faceted search. If a NoSQL product is not an option, some would argue that writing native SQL is always the better option. But in my opinion, unless you need to squeeze the last bit of performance out of a database, writing your own SQL statements is a waste of time. And even then, I would probably prefer a lightweight mapping library such as Dapper over writing the mapping code myself.

Those same people would also argue that NHibernate adds a lot of overhead and complexity, and there's some truth in that. It's a very powerful OR/M that is very good at mapping complex object-oriented designs to a relational database schema, but there's also a lot you can do wrong that will completely kill your performance. I once made the mistake of creating an abstraction on top of NHibernate (inspired by this article). It sounded like a nice idea for testability purposes, but treating NHibernate as a persistent LINQ-enabled collection forced me to limit myself to the common denominator (a.k.a. LINQ).

Regardless, if you can't use a NoSQL solution like RavenDB, you can't apply an architectural style that avoids the object-relational mismatch (e.g. Event Sourcing and/or CQRS), you need to support multiple database vendors, and you don't need the raw performance of native SQL, I would still recommend NHibernate over Entity Framework.

Now, when you do, please don't make the mistakes I made, and apply some or all of the following tips & tricks. For the record, I'm assuming you use Fluent NHibernate to avoid those ugly XML files. I never bothered with the built-in fluent API because I kind of got the impression it is still work-in-progress (hopefully somebody can convince me otherwise).

Don't abstract NHibernate

If you're practicing Test Driven Development, don't abstract away the code that uses NHibernate. Instead, write unit tests against an in-memory SQLite or SQL Server LocalDB database, using NHibernate's built-in schema generation tool to create the schema. It will surface any edge cases in NHibernate's LINQ support much earlier, and you will be able to profile the underlying raw queries right from inside your unit test. Which brings me to the next point…
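As an illustration, a test fixture could build a session factory against an in-memory SQLite database like this. This is a sketch using Fluent NHibernate; the fixture class name is made up, ProductClassMap refers to the mapping shown later in this post, and the exact SchemaExport overload may differ per NHibernate version:

    using FluentNHibernate.Cfg;
    using FluentNHibernate.Cfg.Db;
    using NHibernate;
    using NHibernate.Cfg;
    using NHibernate.Tool.hbm2ddl;

    public class InMemoryDatabaseFixture
    {
        private Configuration configuration;

        public ISession OpenSession()
        {
            ISessionFactory factory = Fluently.Configure()
                .Database(SQLiteConfiguration.Standard.InMemory().ShowSql())
                .Mappings(m => m.FluentMappings.AddFromAssemblyOf<ProductClassMap>())
                .ExposeConfiguration(cfg => configuration = cfg)
                .BuildSessionFactory();

            ISession session = factory.OpenSession();

            // Export the schema on the session's own connection; an in-memory
            // SQLite database only lives as long as that connection.
            new SchemaExport(configuration)
                .Execute(false, true, false, session.Connection, null);

            return session;
        }
    }

Every test then opens a fresh session against a clean, fully generated schema, without touching a real database server.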

Understand the run-time behavior

Ayende's NHProf is an awesome tool to find performance bottlenecks and common mistakes. It will not only show you the queries that have been executed, but also the entire stack trace of the code that was involved. Next to that, it can provide you with the actual results of a query as well as the full query execution plan from the underlying database. For each query it shows what part of the execution time was spent in the database and what part was added by NHibernate. It will even show you whether or not the query benefited from NH's 2nd-level cache. And did I mention all the warnings it will give you when you're making common mistakes such as N+1 selects, inefficient transaction management, or requesting unbounded result sets? In a way, NHProf gives you a holistic view of your application's database interaction.


Prefer NH's own QueryOver API over LINQ

LINQ is a common denominator and doesn't support everything NH does, such as inner joins, left and right outer joins, aliasing, projection transformers, etc. One more reason not to abstract NHibernate…

decimal mostExpensiveProduct = session.QueryOver<Product>()
    .Select(Projections.Max<Product>(x => x.Price))
    .SingleOrDefault<decimal>();

Use optimistic concurrency and dynamic updates

NH's default behavior is to include all columns in every UPDATE or INSERT statement, regardless of whether the mapped property has changed. NH will also include all columns in the WHERE clause when doing an optimistic concurrency check. You can improve the speed of the latter by adding some kind of incremental number or timestamp and mapping that one as the version for the entity. That ensures that only the versioning column is included in the WHERE clause. But you can do even better by enabling dynamic updates on the mapping, e.g. using the DynamicUpdate method of the ClassMap<T>. This will tell NH to only include the columns that actually changed in the UPDATE and INSERT statements. I don't need to explain why that will give you a nice performance boost.

public class ProductClassMap : ClassMap<Product>
{
    public ProductClassMap()
    {
        // Only include changed columns in UPDATE statements
        DynamicUpdate();

        Id(x => x.ProductId);
        Version(x => x.Version);
        Map(x => x.Name);
        Map(x => x.Price);
    }
}

Avoiding the insert-insert-update for child collections

A long-standing issue that has caused a lot of confusion in many of my projects is the way NH deals with parent-child relationships (also known as HasMany associations). For reasons related to association ownership, NH will first insert the children without a foreign key, and then issue another update of those children after the parent has been added. Because of this algorithm, the foreign key column on the child table has to be nullable, and inserting a new parent with five children involves a total of eleven SQL statements: five to insert the children, one to insert the parent, and another five to update the foreign keys of those children. I only recently discovered that this has changed in NHibernate 3.2 and that you can now fix it using the following (fluent) construct.

public class OrderClassMap : ClassMap<Order>
{
    public OrderClassMap()
    {
        HasMany(x => x.Products)
            .Not.Inverse()
            .Not.KeyNullable()
            .Not.KeyUpdate();
    }
}

Notice the Not.KeyNullable() and Not.KeyUpdate(). You need both to make this work. The additional Not.Inverse() is not really needed, but can be used to emphasize that the Order in this association is responsible for maintaining the foreign key relationships. In NH jargon, this means that this side owns the association. You only need the Inverse() option in bi-directional associations so that NHibernate knows whether the Parent or the Child is responsible for properly setting up foreign keys. If this inverse thing still confuses you, I can highly recommend this article.
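For a bi-directional association, the two sides could be sketched like this. The Invoice/InvoiceLine entities and their properties are hypothetical, chosen just to illustrate where Inverse() goes:

    // Parent side: the Invoice does not own the association...
    public class InvoiceClassMap : ClassMap<Invoice>
    {
        public InvoiceClassMap()
        {
            Id(x => x.Id);
            HasMany(x => x.Lines).Inverse();
        }
    }

    // ...so each InvoiceLine maintains the foreign key through
    // its reference back to the parent.
    public class InvoiceLineClassMap : ClassMap<InvoiceLine>
    {
        public InvoiceLineClassMap()
        {
            Id(x => x.Id);
            References(x => x.Invoice);
        }
    }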

Lazy-loading heavyweight properties

Sometimes you need to map a property on your entity that is pretty expensive to load and save, e.g. a byte array or some serialized JSON or XML. I know, you may want to avoid that in the first place, but if you can't, you need to know that you can mark properties as lazy-loading like this:

Map(x => x.Thumbnail).LazyLoad();

So, assuming the ProductClassMap from the earlier examples, when you fetch one or more products, NH will exclude the thumbnail data from the SELECT statement. But as soon as you access that property, it will issue a separate SELECT to get the actual column data. One caveat though: if you have multiple lazy-loaded properties, NH will fetch all of them as soon as you access any one of them.

Eager fetching of associations

Associations between entities are never initialized by default. This is a well-known source of the infamous N+1 SELECT problem that happens when you load a bunch of entities using a query and then iterate over them. The first SELECT will get the parent entities, but accessing the association property of each parent entity will cause another SELECT to fetch the related children. You can tell NH to fetch those children as part of the query on a case-by-case basis. But if you know you'll always need parent and children together, and you can't merge those two tables into a nice and efficient Cartesian product, map the association as a Not.LazyLoad().Fetch.Join().
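Both options could be sketched like this. The Order/Products names are hypothetical, and the query-level variant relies on the Fetch extension method from the NHibernate.Linq namespace:

    using System.Linq;
    using NHibernate.Linq;

    // Option 1: eager-fetch per query, only where you know
    // you'll iterate over the children.
    var orders = session.Query<Order>()
        .Fetch(o => o.Products)
        .ToList();

    // Option 2: always fetch the children with the parent,
    // declared once in the mapping.
    HasMany(x => x.Products)
        .Not.LazyLoad()
        .Fetch.Join();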

Components without value semantics

A very much misunderstood aspect of component mapping is that the classes that are mapped as a component must behave like a component. They must expose value-type semantics and have no identity other than the combined values of all of their properties. In other words, they must override Equals() and GetHashCode(). This is especially important when you map a property to a collection of components, like this:

HasMany<Address>(x => x.Shipments)
    .Component(x =>
    {
        x.Map(c => c.ZipCode);
        x.Map(c => c.Number);
        x.Map(c => c.State);
    });

If you don't, NH can't determine the equality of the objects in your collection property, resulting in some weird behavior. I've seen NH delete and re-insert the same set of child objects every time somebody added an additional child to the collection. You don't notice that until you run that profiler again.
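A sketch of what that could look like for the Address component used above, assuming those three properties make up its value:

    public class Address
    {
        public string ZipCode { get; set; }
        public string Number { get; set; }
        public string State { get; set; }

        public override bool Equals(object obj)
        {
            // Two addresses are equal when all their properties are equal.
            var other = obj as Address;
            return other != null &&
                ZipCode == other.ZipCode &&
                Number == other.Number &&
                State == other.State;
        }

        public override int GetHashCode()
        {
            // Combine the hash codes of the properties that define the value.
            return (ZipCode ?? "").GetHashCode() ^
                   (Number ?? "").GetHashCode() ^
                   (State ?? "").GetHashCode();
        }
    }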

Some more tips & tricks

  • If you're in a position where you can't change too much of an existing database schema, and your application has vastly different needs in the way that data is read or written, you can consider mapping multiple ClassMaps to the same table. As long as all but one of the class maps are declared as ReadOnly, NHibernate will happily allow it. This has proven to be a very efficient technique to have different lazy-loading settings for the same table structure.
  • If you're into Domain Driven Design like me, you might be tempted to create all kinds of domain-specific and rich custom NHibernate types and map them to your columns. So rather than having a string-valued property to represent an ISBN number, you might define your own Isbn type. Now, if you value performance, don't. Just run a good CPU profiler like JetBrains' dotTrace to understand the impact of that.
  • Don't underestimate the power of NHibernate's second level cache offered by the likes of SysCache2. It can give you an enormous performance boost, especially if you deal with a lot of immutable data and you can avoid the infrastructural complexity of a distributed cache. Just don't forget to wrap all your code with a call to EnlistTransaction and CompleteTransaction. Ayende has written enough about that.
  • Say you need to remove an entire range of entities from your database. You could query for them using LINQ or QueryOver and then issue individual Delete() statements on the session, but you can do better by employing NH's DML operations API. It supports HQL statements that resemble native SQL without any coupling to a specific database vendor, like this:

    session.CreateQuery("delete Order o where o.CreatedAt > :minDate")
        .SetParameter("minDate", minDate)
        .ExecuteUpdate();
  • You might know that you need to define cascading operations on parent-child collections. Just don't make the mistake of doing this on HasManyToMany or References mappings. They are meant to create associations between entities whose lifetimes are independent of each other. Doing it wrong caught us by surprise a couple of times, only to discover that somebody had added a Cascade.All() or Cascade.AllDeleteOrphan().
  • NHibernate allows you to map simple collections of single elements, such as numbers or strings, to a child collection. But if you do, think hard about the uniqueness of those elements. By default, NH will treat an IList or array as a bag and allow duplicate items. If you don't want that, add an AsSet to the mapping like this:

    HasMany(x => x.Options)
        .Table("Options")
        .KeyColumn("ParentId")
        .Element("OptionValue")
        .AsSet();

Well, that got a bit out of hand. What do you think? Did I tell you something you didn't know yet? And what about you? Any tips to add to this post? Love to hear your thoughts by commenting below. And follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.