Sunday, November 06, 2016

The three mental modes of working with unit tests

The other day, while pairing up on some unit tests, I realized that I generally have three modes of looking at my unit tests.

The Writing Mode

While writing, I mostly focus on the mechanics of getting the test to pass. By then, I usually have a mental model and a particular scenario in mind, and my thoughts mostly focus on finding the most elegant syntax and structure to get my test from red to green. Since I already know the exact scenario, I don't pay too much attention to the name. If I'm really in the flow, the edge cases and alternative scenarios just pop into the back of my mind without me needing to really think about them. In this mode, I also spend a lot of thought on finding opportunities to refactor the test itself or the underlying constructs. For instance, is the scope of my test correct? Does the subject-under-test have too many dependencies? Since I practice Test Driven Development, some of these refactoring opportunities surface quickly enough when my set-up code explodes, or when my test code no longer communicates its intent.

The Review Mode

While reviewing somebody's pull request, I switch to review mode, in which I use the unit tests to understand the scope, the responsibilities and the dependencies of a class or set of classes. To understand those responsibilities, I pay particular attention to the names of the tests, thereby completely ignoring the implementation of the test itself. With the names as my only truth, I try to understand the observable behavior of the subject-under-test (SUT) under different scenarios. They should make me wonder about possible alternative scenarios or certain edge cases. In other words, they should make it possible for me to look at the code from a functional perspective. That doesn't mean they need to be understandable by business analysts or product owners, but they must help me understand the bigger picture.

Only when I'm satisfied that the developer considered all the possible scenarios do I start to look at the implementation details of particular test cases. What dependencies does the SUT have? Are there any I didn't expect? If so, did I understand the test case correctly, or is the test hiding important details? Are all the dependencies I did expect there? If not, where are they? Is everything I see important to understand the test? If not, what aspects could be moved to a base class (for BDD-style tests), or is a Test Data Builder or Object Mother a better solution? Do all assertion statements make sense? Did he or she use any constant values that are difficult to reason about? Is each test case testing a single thing? What if the test fails? Does it give the developer a proper message about what went wrong, functionally or technically? A proper assertion framework can help here, because what use is an error like "Expected true, but found false"?
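
To make that concrete, here is a minimal sketch of the kind of test I like to encounter during a review, using xUnit and Fluent Assertions; the Order class and the scenario are made up for illustration.

  using System;
  using FluentAssertions;
  using Xunit;

  public class Order
  {
      public DateTime Deadline { get; set; }
      public bool IsOverdue { get; private set; }

      public void EvaluateAt(DateTime now)
      {
          IsOverdue = now > Deadline;
      }
  }

  public class OrderSpecs
  {
      [Fact]
      public void When_the_deadline_has_passed_it_should_mark_the_order_as_overdue()
      {
          // Arrange
          var order = new Order { Deadline = new DateTime(2016, 10, 25) };

          // Act
          order.EvaluateAt(new DateTime(2016, 10, 26));

          // Assert: on failure this reads something like
          // "Expected True because the deadline has passed, but found False."
          // instead of a bare "Expected true, but found false".
          order.IsOverdue.Should().BeTrue("the deadline has passed");
      }
  }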

The Analysis Mode

Now consider that a test fails and I'm the one who needs to analyze the cause. In this debugging mode, I first need to understand what the test was supposed to verify. For this, I need a name that clearly explains the specifics of the test case on a functional level. Again, I won't let my thoughts be distracted by the implementation. The name should help me understand what the expected behavior is and help me make up my mind on whether that scenario makes sense at all. After I conclude that the test case indeed makes sense, I'll start studying the implementation to determine whether the code really does what the test name suggests. Does it bring the context into the right state? Does it set up the dependencies correctly (either explicitly or through some kind of mocking framework)? Does it invoke the SUT using the right parameters? And does the assertion code expect something that makes sense to me considering the initial state and the action performed? Only after I've confirmed that the implementation is correct is it time to launch a debugger.

I know the world is not perfect, but staying out of debugger hell should be a primary concern for the test writer. This is a difficult endeavour and requires the developer to ensure the intent of a unit test is crystal clear. Naming conventions, hiding the irrelevant stuff, and a clear cause and effect are paramount to prevent you from shooting yourself in the foot in the long run. If you're looking for tips to help you with this, consider reading my prior post on writing maintainable unit tests.

So what do you think? Do you recognize yourself in these modes? What do you think is important to be successful in unit testing and/or Test Driven Development? I'd love to know what you think by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better tests.

Sunday, October 30, 2016

Principles for Successful Package Management

A couple of months ago I shared some tips & tricks to help you prevent ending up in NuGet dependency hell. As a big fan of the SOLID principles, I've always wondered why nobody thought of applying these principles at the package level. If SOLID can help you build cohesive, loosely coupled components which do one thing only and do that well, why can't we do the same thing at the package level? As it happens, my colleague Jonne enthusiastically referred me to the book Principles of Package Design by Matthias Noback. It's available from Leanpub and does exactly that, offering a couple of well-named guidelines inspired by SOLID that will help you design better packages for NuGet, NPM or whatever package management solution you use.

The first half of the 268 pages provides an excellent refresher on the SOLID principles. He even does a decent job of explaining the inversion of control principle (although I would still refer to the original to really grasp that often misunderstood principle). After that he carefully dives into the subtleties of cohesion as a guiding principle before he commences on the actual package design principles. The examples are all in PHP (yeah, really), but the author clearly explains how they would apply to other platforms. Notice that this post is mostly an exercise for me to see if I got the principles right, so I would highly recommend buying the .epub, .mobi or PDF from Leanpub. It's only 25 USD and well worth your money. So let's briefly discuss the actual principles.

The Release/Reuse Equivalency Principle

IMHO, the first principle has a rather peculiar name. Considering its purpose, it could have been called The Ship a Great Package Principle. The gist of this principle is that you should not ship a package if you don't have the infrastructure in place to properly support it. This means that the package should follow some kind of clear (semantic) versioning strategy, have proper documentation, a well-defined license, proper release notes, and be covered by unit tests. The book goes to great lengths to help you with techniques and guidance on ensuring backwards compatibility. Considering the recentness of the book and the fact that it mentions Semantic Versioning, I would have expected some coverage of GitFlow and GitHubFlow. Nonetheless, most of the stuff mentioned here should be obvious, but you'd be surprised how often I run into an unmaintainable and undocumented package.

The Common Reuse Principle

The purpose of the second principle is much clearer. It states that classes and interfaces that are almost always used together should be packaged together. Consequently, classes and interfaces that don't meet that criterion don't have a place in that package. This has a couple of implications. Users of your package shouldn't need to take the entire package if they just need a couple of classes. Even worse, if they use only a subset of the package's contents, they should not be confronted with additional package dependencies that have nothing to do with the classes they actually use. And if that specific package does have a dependency, then it should be an explicit dependency. A nice side effect of this principle is that it makes packages Open for Extension and Closed for Modification.

I've seen packages that don't seem to have any dependencies until you use certain classes that employ dynamic loading. NHibernate is a clear violator of this principle, in contrast to the well-defined purpose of the Owin NuGet package. My own open-source library, Fluent Assertions, also seems to comply. When a contributor proposed to build a Json extension to my library, I offered to take in the code and ship the two NuGet packages from the same repository. So if somebody doesn't care about Json, they can use the core package only, without any unexpected dependencies on NewtonSoft.Json.

The Common Closure Principle

The third principle is another one that needs examples to really grasp its meaning. Even the definition doesn't help that much:

The classes in a package should be closed against the same kinds of changes. A change that affects a package affects all the classes in that package.

According to many examples in the book, the idea is that packages should not require changes (and thus a new release) for unrelated changes. Any change should affect the smallest number of packages possible, preferably only one. Conversely, a change to a particular package is very likely to affect all classes in that package. If it only affects a small portion of the package, or it affects more than one package, chances are you have your boundaries wrong. Applying this principle might help you decide which class belongs in which package. Reflecting on Fluent Assertions again made me realize that even though I managed to follow the Common Reuse Principle, I can't release the core and Json packages independently. A fix in the Json package means that I also need to release the core package.

The Acyclic Dependencies Principle

For once, the fourth principle discussed in this book is well described by its definition:

The dependency structure between packages must be a directed acyclic graph, that is, there must be no cycles in the dependency structure.

In other words, your package should not depend on a package whose dependencies would eventually result in a cyclic dependency. At first thought, this looks like stating the obvious. Of course you don't want to have a dependency like that! However, that cyclic dependency might not be visible at all. Maybe your dependency depends on something else that ultimately depends on a package that is hidden in the obscurity of all the other indirect dependencies. In such a case, the only way to detect that is to carefully analyze each dependency and create a visual dependency graph.

Another type of dependency that the book doesn't really cover is the diamond dependency (named after the shape of its dependency graph). Within the .NET realm this is quite a common thing. Just consider the enormous amount of NuGet packages that depend on NewtonSoft's Json .NET. So for any non-trivial package, it's quite likely that more than one dependency eventually depends on that infamous Json library. Now consider what happens if those dependencies depend on different versions.

The book offers a couple of in-depth approaches and solutions to get yourself out of this mess. Extracting an adapter or mediator interface to hide an external dependency behind is one. Using inversion of control so that your packages only depend on abstract constructs is another. Since the book is written by a PHP developer, it's no surprise that it doesn't talk about ILMerge or its open-source alternative ILRepack. Both are solutions that merge an external .NET library into the main DLL of your own package. This essentially allows you to treat that dependency as internal code without any visible or invisible DLL dependencies. An alternative to merging your .NET libraries is to use a source-only NuGet package. This increasingly popular technique allows you to take a dependency on a NuGet package that only contains, surprise, source code that is compiled into your main package. LibLog, TinyIoc and even my own caching library FluidCaching use this approach. It greatly reduces the dependency chain of your package.
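
To illustrate the adapter approach, here is a minimal sketch (the interface and class names are made up) of hiding Json.NET behind an abstraction, so that only one class in the package references the external library:

  using Newtonsoft.Json;

  // The abstraction the rest of the package depends on.
  public interface ITextSerializer
  {
      string Serialize(object value);
      T Deserialize<T>(string text);
  }

  // The only class that references Newtonsoft.Json directly; swapping the serializer
  // later only affects this adapter.
  public class JsonNetSerializer : ITextSerializer
  {
      public string Serialize(object value)
      {
          return JsonConvert.SerializeObject(value);
      }

      public T Deserialize<T>(string text)
      {
          return JsonConvert.DeserializeObject<T>(text);
      }
  }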

The Stable Dependencies Principle

The name of the principle is quite self-explanatory, but the definition is even clearer.

The dependencies between packages in a design should be in the direction of the stability of the packages. A package should only depend upon packages that are more stable than it is.

In other words, you need to make sure you only depend on stable packages. The more stable your dependency, the more stable your package is going to look to your consumers. Determining whether a package is stable or not isn't an exact science. You need to do a bit of digging for that. For instance, try to figure out how often a dependency has introduced a breaking change. And if they did, did they use Semantic Versioning to make that clear? How many other public packages depend on that package? The more dependents, the higher the chance that the package owners will try to honor the existing API contracts. And how many dependencies does that package have? The more dependencies, the higher the chance some of those dependencies introduce breaking changes or instability. And finally, check out its code and judge how well that package follows the principles mentioned in this post. The book doesn't mention this, but my personal rule of thumb for deciding whether I will use a package as a dependency is to consider what would happen if the main author abandoned the project. The code should either be good enough for me to maintain it myself/ourselves, or the project should be backed by a large group of people that can ensure continuity.

The Stable Abstractions Principle

Now if you understand (and agree with) the Stable Dependencies Principle, you'll most definitely understand and agree with the Stable Abstractions Principle. After all, what's more stable: an interface, an abstract type or a concrete implementation? An interface does not have any behavior that can change, so it is the most stable type you can depend on. That's why a well-designed library often uses interfaces to connect many components together and quite often provides you with an interface-only package. For the same reason, the Inversion of Control principle tries to nudge you in the same direction. In fact, in the .NET world even interfaces are sometimes frowned upon and replaced with old-fashioned delegate types. These represent a very tiny and very focused interface, so it doesn't get any more stable than that. And because of their compatibility with C#'s lambda statements, you don't even need to use a mocking library.
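
As a minimal sketch of that idea (all names are made up), consider a class that depends on a delegate to obtain the current time; in a unit test a plain lambda replaces the dependency, no mocking library required:

  using System;

  public delegate DateTime GetCurrentTime();

  public class Invoice
  {
      public DateTime CreatedAt { get; set; }
  }

  public class InvoiceGenerator
  {
      private readonly GetCurrentTime getCurrentTime;

      public InvoiceGenerator(GetCurrentTime getCurrentTime)
      {
          this.getCurrentTime = getCurrentTime;
      }

      public Invoice CreateDraft()
      {
          // The delegate is the only 'interface' this class depends on.
          return new Invoice { CreatedAt = getCurrentTime() };
      }
  }

  // In a test, a lambda is all you need:
  // var generator = new InvoiceGenerator(() => new DateTime(2016, 10, 30));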

So what about you?

The names are not always that catchy and easy to remember, mostly because they all use similar wording, but the underlying philosophy makes a lot of sense to me. I've already started to re-evaluate the design decisions of my projects. The only thing I was hoping to read more about is the explicit consequence of building a component or package as a library versus building it as a framework. This is something that heavily influences the way I'm building LiquidProjections, my next open-source project.

So what do you think? Do you see merit in these principles? Do they feel as helpful as the original SOLID principles? I'd love to know what you think by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better designs.

Thursday, October 06, 2016

The magic of keeping a band of developers together

I work as a consultant for Aviva Solutions, and since the nature of my job is to be involved in moderately long-running client projects, I don't get to the office that often. And if I do, it's on different days of the week. Over the last year or so, our locations in Koudekerk aan de Rijn and Eindhoven have grown by almost 10 new developers. Since we haven't had any company-wide meetings since May, it should not come as a surprise that I arrived at the office one day, only to conclude that I could not recall the names of the new people. When our company was only 15 people strong, any new colleague joining was a big thing. And it's still a big thing. It's just that it is becoming less visible. Sound familiar? It's even more embarrassing considering that I'm usually the one that creates the new Github and FlowDock accounts. But even if I do remember a new name, I often can't map that to somebody's face. That's why I ask all new employees to add a little picture of themselves to their Flowdock profile. In a way, I'm cheating a bit.

A similar challenge happens on the other side of the hiring spectrum. When somebody joins a company like Aviva, it's important for that person to feel valuable and be able to identify with the company's culture. The only way to do that is to engage with your coworkers, finding out who knows what, understanding the chain of command (which we don't have), and surfacing the areas where that person can add value. The problem in 2016 is that competitors are looking for people as well and recruiters are getting more aggressive by the day. So how do you keep a group of people like that together? Sure, you can offer them more money, give them expensive company cars, laptops and phones. But that's never really going to help. If they don't feel connected to the company, they might leave for the first deal they get.

So how do we stay connected? Well, twice a year, we have company-wide meetings, either at our HQ, our office in Eindhoven, a restaurant or some other venue. This usually involves a formal part where the two owners and the social committee share an update on sales, running projects, HR, the financial outlook and any upcoming social events. Then we have dinner, an occasional drink and some kind of activity (for those that want that). For example, when the Eindhoven office had just opened, the meet-up was organized at that office, and all co-workers were offered an overnight stay to go have a fun night out in downtown Eindhoven. This not only allowed the people that joined the Eindhoven office to meet the others, but it also ensured that everybody has seen the new office and knows where to find it. We really encourage people to work in a different office occasionally, regardless of whether you're involved with an internal project or working for a client.

We also regularly get together for pizza sessions where we exchange project experiences, explore new technologies and run trial talks for public events. These are very informal evenings where everybody can share their findings or have discussions, whether or not you have presentation skills, and whether you're a senior developer or just joined the company. Quite often, these little evenings are visited by colleagues from other companies or people who just happen to have heard about the topic (we usually Tweet and post on Facebook about them). Sometimes, these events turn into something bigger when we work together with the .NET community to organize public events.

I particularly like Flowdock as a low-threshold collaboration tool that is accessible as a web site, a desktop app or a smartphone app. We have different channels (or flows) for trolling around, getting technical support, or having deep discussions on technical topics. Next to that, all our projects have dedicated flows, so everybody can read along and learn about the daily troubles and tribulations of their co-workers, even if you're stationed at a client's office. Flowdock is probably the most engaging platform I've come across. Neither Skype, Jabber nor any other platform has helped us as much to keep in touch with each other. It also allowed us to avoid those long email threads that nobody is waiting for. And since we heavily use GitHub for our projects' sources, we can directly see what's going on from inside Flowdock.

Now, all of this helps a lot to keep the group together, but the ultimate trick to make this group a team is to stuff the entire company in a plane and fly us to a warm place in Spain, a nice city like Prague or, like this year's 10-year-anniversary special edition trip, Ibiza. Yes, it sounds like pure luxury and spoiling your employees rotten. And yes, we did have a lot of fun, took a trip on a catamaran and explored the island driving around in those old 2CVs. But for 12 people, this was the first time they joined us on our annual trip. It allowed them to get to know their co-workers, find people with similar interests, share some frustration about that last project, get advice on how to resolve a technical or other work-related challenge, or receive tips on advancing their careers. Some would even use the weekend to debate technical solutions or new technological hypes. Nonetheless, the point is that before that weekend they were just employees. After that weekend they were colleagues, and in some unique cases, friends. What more can you expect from a company?

Aviva Summer Event Ibiza

So what do you think? Are you a junior, medior or senior .NET or front-end developer? Did you just graduate from university or a polytechnical college? Does a company like this appeal to you? Comment below, or even better, contact me by email, twitter or phone to visit one of the upcoming events or join us for a coffee and taste the unique atmosphere of an office with passionate people….

Bonus Content: We’ve compiled a little compilation of our trip. Check it out on YouTube.

Tuesday, August 30, 2016

Continuous Delivery within the .NET realm

Continuous what?

Well, if you browse the internet regularly, you will encounter two different terms that are used rather inconsistently: Continuous Delivery and Continuous Deployment. In my words, Continuous Delivery is a collection of various techniques, principles and tools that allow you to deploy a system into production with a single press of a button. Continuous Deployment takes that to the next level by completely automating the process of putting code changes that were committed to source control into production, all without human intervention. These concepts are not trivial to implement and involve both technological innovations as well as some serious organizational changes. In most projects involving the introduction of Continuous Delivery, an entire cultural shift is needed. This requires some great communication and coaching skills. But sometimes it helps to build trust within the organization by showing the power of technology. So let me use this post to highlight some tools and techniques that I use myself.

What do you need?
As I mentioned, Continuous Delivery involves a lot more than just development effort. Nonetheless, these are a few of the practices I believe you need to be successful.

  • As much of your production code as possible must be covered by automated unit tests. One of the most difficult parts of that is determining the right scope of those tests. Practicing Test Driven Development (TDD), a test-first design methodology, can really help you with this. After trying both traditional unit testing as well as TDD, I can tell you that it is really hard to add maintainable and fast unit tests after you've written your code.
  • If your system consists of multiple distributed subsystems that can only be tested after they've been deployed, then I would strongly recommend investing in acceptance tests. These 'end-to-end' tests should cover a single subsystem and use test stubs to simulate the interaction with the other systems.
  • Any manual testing should be banned. Period. Obviously I realize that this isn't always possible due to legacy reasons. So if you can't do that for certain parts of the system, document which part and do a short analysis on what is blocking you.
  • A release strategy as well as a branching strategy are crucial. Such a strategy defines the rules for shipping (pre-)releases, how to deal with hot-fixes, when to apply labels, and what version numbering scheme to use.
  • Build artifacts such as DLLs or NuGet packages should be versioned automatically without the involvement of any development effort.
  • During the deployment, the administrator often has to tweak web/app.config settings such as database connection strings and other infrastructure-specific settings. This has to be automated as well, preferably by parametrizing deployment builds.
  • Build processes, if they exist at all, are quite often tightly integrated with build engines like Microsoft's Team Build or JetBrain's Team City. But many developers forget that the build script changes almost as often as the code itself. So in my opinion, the build script itself should be part of the same branching strategy that governs the code and be independent of the build product. This allows you to commit any changes needed to the build script together with the actual feature. An extra benefit of this approach is that developers can test the build process locally.
  • Nobody is more loathed by developers than DBAs. A DBA that needs to manually review and apply database schema changes is a frustrating bottleneck that makes true agile development impossible. Instead, use a technique where the system uses metadata to automatically update the database schema during the deployment.

What tools are available for this?

Within the .NET open-source community a lot of projects have emerged that have revolutionized the way we build software.

  • OWIN is an open standard for building components that expose some kind of HTTP end-point and that can be hosted anywhere. WebAPI, RavenDB and ASP.NET Core MVC are all OWIN based, which means you can build NuGet packages that expose HTTP APIs and host them in IIS, a Windows Service or even a unit test without the need to open a port at all (see the sketch after this list). Since you have full control of the internal HTTP pipeline, you can even add code to simulate network connectivity issues or high-latency networks.
  • Git is much more than a version control system. It changes the way developers work at a fundamental level. Many of the more recent tools such as those for automatic versioning and generating release notes have been made possible by Git. Git even triggered de-facto release strategies such as GitFlow and GitHubFlow that directly align with Continuous Delivery and Continuous Deployment. In addition to that, online services like GitHub and Visual Studio Team Services add concepts like Pull Requests that are crucial for scaling software development departments.
  • xUnit is a parallel-executing unit test framework that will help you build software that runs well in highly concurrent systems. Just try to convert existing unit tests built using more traditional test frameworks like MSTest or NUnit to xUnit. It'll surface all kinds of concurrency issues that you normally wouldn't detect until you run your system in production under high load.
  • Although manual testing of web applications should be minimized and superseded by JavaScript unit tests using Jasmine, you cannot entirely do without a couple of automated end-to-end tests. These smoke tests can really help you get a good feeling for the overall end-to-end behavior of the system. If this involves automated tests against a browser and you've built them using the Selenium UI automation framework, then BrowserStack would be the recommended online service. It allows you to test your web application against various browser versions and provides excellent diagnostic capabilities.
  • Composing complex systems from small components maintained by individual teams has proven to be a very successful approach for scaling software development. MyGet offers (mostly free) online NuGet-based services that encourage teams to build, maintain and release their own components and libraries and distribute them using NuGet, all governed by their own release calendar. In my opinion, this is a crucial part of preventing a monolith.
  • PSake is a PowerShell-based, make-inspired build system that allows you to keep your build process in your source code repository just like all your other code. Not only does this allow you to evolve your build process with new requirements and commit it together with the code changes, it also allows you to test your build in complete isolation. How cool is it to be able to test your deployment build from your local PC?
  • So if your code and your build process can be treated as first-class citizens, why can't we do the same to your infrastructure? You can, provided you take the time to master PowerShell DSC and/or modern infrastructure platforms like TerraForm. Does your new release require a newer version of the .NET Framework (and you're not using .NET Core yet)? Simply commit an updated DSC script and your deployment server is re-provisioned automatically.
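
To illustrate the OWIN point from the list above, the sketch below (class and test names are made up) hosts a trivial OWIN pipeline inside a unit test using the Microsoft.Owin.Testing package, without opening a port:

  using System.Threading.Tasks;
  using Microsoft.Owin.Testing;
  using Owin;
  using Xunit;

  public class Startup
  {
      public void Configuration(IAppBuilder app)
      {
          // A trivial middleware that handles every request in-process.
          app.Run(context =>
          {
              context.Response.ContentType = "text/plain";
              return context.Response.WriteAsync("Hello from OWIN");
          });
      }
  }

  public class HttpApiSpecs
  {
      [Fact]
      public async Task The_api_can_be_tested_without_opening_a_port()
      {
          using (var server = TestServer.Create<Startup>())
          {
              var response = await server.HttpClient.GetAsync("/");

              Assert.Equal("Hello from OWIN", await response.Content.ReadAsStringAsync());
          }
      }
  }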

Where do you start?

By now, it should be clear that introducing Continuous Delivery or Deployment isn't for the faint of heart. And I didn’t even talk about the cultural aspects and the change management skills you need to have for that. On the other hand, the .NET realm is flooded with tools, products and libraries that can help you to move in the right direction. Provided I managed to show you some of the advantages, where do you start?

  • Switch to Git as your source control system. All of the above is quite possible without it, but using Git makes a lot of it a lot easier. Just try to monitor multiple branches and pull requests with Team Foundation Server based on a wildcard specification (hint: you can't).
  • Start automating your build process using PSake or something similar. As soon as you have a starting point, it'll become much easier to add more and more of the build process and have it grow with your code-base.
  • Identify all configuration and infrastructural settings that deployment engineers normally change by hand and add them to the build process as parameters that can be provided by the build engine. This is a major step in removing human errors.
  • Replace any database scripts with some kind of library like Fluent Migrator or the Entity Framework that allows you to update the schema through code (see the sketch after this list). By doing that, you could even decide to support downgrading the schema in case a (continuous) deployment fails.
  • Write so-called characterization tests around the existing code so that you have a safety net for the changes needed to facilitate continuous delivery and deployment.
  • Start the refactoring efforts needed to be able to automatically test more chunks of the (monolithical) system in isolation. Also consider extracting those parts into a separate source control project to facilitate isolated development, team ownership and a custom life cycle.
  • Choose a versioning and release strategy and strictly follow it. Consider automating the version number generation using something like GitVersion.

Let's get started

Are you still building, packaging and deploying your projects manually? How much time have you lost trying to figure out what went wrong, only to find out you forgot some setting or important step along the way? If this sounds familiar, hopefully this post will help you pick up some nice starting points. And if you still have questions, don't hesitate to contact me on twitter or by reaching out to me at TechDays 2016.

Monday, July 25, 2016

Scaling a growing organization by reorganizing the teams

During this year's QCon conference held in New York, I attended a full-day workshop on the scalability challenges a growing organization faces, hosted by Randy Shoup. In my previous two posts I discussed a model to understand the needs of an organization in its different life phases, as well as a migration strategy for getting from a monolith to a set of well-defined microservices.

The Universal Scalability Law…again

However, Randy also talked about people, or more specifically, how to reorganize the teams for scalability without ignoring the Universal Scalability Law. What this means is that you should be looking for a way to have lots of developers in your organization working on things in isolation (thereby reducing contention) without the need for a lot of communication (a.k.a. coherence). So any form of team structuring that involves a lot of coordination between teams is obviously out of the question, particularly skill-based teams, project-based teams or large teams.

For the same reason, Randy advises against geographically split teams or outsourcing to so-called job shops. Not only do those involve a lot of coordination, but the local conversations that inevitably happen become disruptive to melding a team. Just like Randy, I find face-to-face discussions crucial for effective teams. But if your team is not co-located, those local conversations will never reach the rest of the team. Yes, you may persist in posting a summary of that discussion on some kind of team wiki, Flowdock/Slack or other team collaboration tool, but they will still miss the discussions that led to that summary. Even using a permanent video conferencing set-up doesn't always solve that, particularly if the people in the team don't share the same native language (which is already a problem for co-located teams).

The ideal team

He also said something about the effect of getting more people into the organization. In his view, 5 people is the ideal. That number of people can sit around a table, benefits from high-bandwidth communication, and roles can remain fluid. When you reach about 20 people, you require structure, which in turn can become a trough of productivity and motivation. When you reach 100 people, you must shift your attention from coordinating individuals to coordinating teams. A clear team structure and well-defined responsibilities become critical. Knowing this, it's hardly surprising that Randy likes to size his teams using the "2 pizza rule": the number of people you can feed with two pizzas. So a co-located team consisting of 4-6 people in a mix of junior and senior has his preference.

Ideally he wants such a team to take ownership of a component or service, including maintenance and support as well as the roadmap for that component or service. This implies that all teams are full-stack from a technological perspective and are capable of supporting their component or service all the way into production. But Randy emphasizes that managers shouldn't see teams like this as software factories. Teams should have an identity and be able to build up pride of ownership. This also implies taking responsibility for the quality of those services. He purposely mentioned the problem of teams not having the time to do their work right and taking shortcuts because of (perceived) pressure from management or other stakeholders. In his opinion, this is the wrong thing to do, since it means you'll need to do the work twice. The more constrained the team is in time, the more important it is to do it the right way first.

The effect of team structure on architecture

Another argument for his ideas is provided by Conway's Law. Melvin Conway observed that in a lot of organizations the structure of the software system closely follows the structure of the organization. This isn't a big surprise, since quite often, cross-team collaboration requires some kind of agreed way of working, both on the communication level as well as on the technical level. Quite often, architectural seams like API contracts or modules emerge from this. So based on that observation, he advises organizations to structure their teams along the boundaries they want to accomplish in the software architecture. And this is how Conway's Law is usually used. But in this workshop, Randy had already steered us towards the notion of using microservices for scaling the organization. So does Conway's Law apply here? Each team owns one or more microservices, or, in other words, the API contracts I just discussed. They work in isolation, but negotiate about the features provided by their services. I would say that is a resounding yes!

All things considered, it should not come as a surprise that he believes microservices are the perfect architecture for scaling an organization on both the people and the technical level. So what do you think? Are microservices the right technology to scale teams? Or is this the world upside down? Let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Sunday, July 17, 2016

Scaling a growing organization by rearchitecting the monolith

During this year's QCon conference held in New York, I attended a full-day workshop on the scalability challenges a growing organization faces, hosted by Randy Shoup. In my previous post, I elaborated on Randy's classification system to illustrate the phases of a growing organization and how that affects technology. In his opinion, facilitating an organization that enters the scaling phase means re-architecting the monolith.

About Rearchitecting

In his words, rearchitecting is the act of re-imagining the architecture of an existing system so that there's a path to meet the requirements of the organization in its current form. As I said before, rebuilding a system is not an option, particularly not in the scaling phase. By that time, you'll have way too many clients that need new features and other improvements. On the other hand, you will very likely suffer from typical monolithic symptoms like lack of isolation in the code base, teams that are stepping on each other's toes, new engineers that need months to get a decent understanding of the code, painful and slow releases, etc., etc. Instead, you want to have components whose lifecycle is independent of the others and that are deployed in an automated fashion. Sounds familiar? Enter microservices…

According to Randy, microservices are the perfect solution for rearchitecting an existing monolith. Each service is simple, can be independently scaled, tested and deployed, and allows optimal tuning without affecting any of the other services. The tooling, platform and practices have evolved considerably since people started talking about microservices two years ago. Building them is a lot less of a challenge than it used to be, but it still comes with a cost. You'll end up with lots of small source code repositories as well as the organizational structure to support all that (e.g. who owns what repo). Finding the repo that belongs to a particular microservice requires naming conventions and tools not provided by the current online providers. On a technical level, you need to consider the network latency and the availability of a service. You'll also need sophisticated tooling to track, manage, version and control dependencies between the services. As many QCon sessions have demonstrated, a lot of tooling has emerged. But just like the wave of JavaScript frameworks and libraries that occurred when React became popular, I suspect it'll take a while until the dust has settled.

From monolith to microservices

So now that we've established the decision to re-architect the monolith into microservices, how are we going to do that? Well, if it's up to Randy, you start carving up the monolith by finding a vertical seam that allows you to wall off a functional feature behind an interface. This is obviously the hardest part, since monoliths typically don't expose a lot of cohesion. The logic related to a feature is spread out over the codebase, sometimes crosses layers and involves way too much coupling. The next step is to write automated tests around that interface so you can replace the implementation with a remote microservice without causing breaking changes in the semantics of the feature involved. As you can see, this is anything but a big bang approach, and can be done in relatively small and low-risk steps. However, Randy shared that in his own experience, it is very bad to combine a migration like this with the introduction of new features. He stressed the importance of first completing the migration of an existing feature so that it can serve as the basis of a new microservice, and only then adding the additional functionality. Doing both at the same time is simply too risky and may blow up the project.
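
A minimal sketch of such a seam (all names are hypothetical): the feature is first walled off behind an interface backed by the existing in-process code, and once automated tests pin down its behavior, an implementation that calls the new microservice can be swapped in.

  using System.Globalization;
  using System.Net.Http;

  public interface IPricingService
  {
      decimal GetPriceFor(string productCode);
  }

  // Step 1: the existing monolith logic, now hidden behind the seam.
  public class InProcessPricingService : IPricingService
  {
      public decimal GetPriceFor(string productCode)
      {
          // ...the original pricing logic extracted from the monolith...
          return 42m;
      }
  }

  // Step 2: once automated tests cover the interface, replace the implementation
  // with a call to the new microservice without breaking the consumers.
  // Assumes the HttpClient's BaseAddress points at the pricing service.
  public class RemotePricingService : IPricingService
  {
      private readonly HttpClient client;

      public RemotePricingService(HttpClient client)
      {
          this.client = client;
      }

      public decimal GetPriceFor(string productCode)
      {
          string body = client.GetStringAsync("prices/" + productCode).Result;
          return decimal.Parse(body, CultureInfo.InvariantCulture);
      }
  }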

Now, he doesn't want you to become too overzealous and just start carving like crazy. Instead, he advises you to start with a pilot implementation. The pilot should represent an end-to-end vertical part of the system's experience, something significant enough to be representative for the monolith's complexity. Such a pilot provides an opportunity for the team to learn and use that experience to manage expectations. At the same time, it can be used to demonstrate the feasibility to the stakeholders.

When the pilot is deemed successful, it is time to continue the migration on a larger scale. However, Randy advises prioritizing future candidates for a migration to microservices based on their business value. In other words, prefer those parts of the monolith that give you the highest return on investment. If that doesn't help you, focus on the areas with the greatest rate of change. I mean, that was the whole premise of switching to microservices: being able to work and deploy in isolation. And finally, as you would approach any technological task with a lot of uncertainty, consider solving the hardest problems first.

Anti-Patterns

He also identified a couple of anti-patterns while working on his own migration projects. For instance, the Mega-Service, similar to the God class, is a microservice that is not focused on a single feature. If you're practicing Domain Driven Design, I think aligning a microservice with a single bounded context makes sense. Smaller, like a couple of domain Aggregates, is probably fine too. But a microservice that crosses multiple domain boundaries is very likely a bad idea.

Another anti-pattern, the Leaky Abstraction Service, deals with the subtle issues of growing a microservice from its implementation rather than defining the consumer's contract first. Randy is clearly adamant about making sure microservices are designed from a consumer-first approach. He believes that the usage of a microservice is the true metric of the value of such a service. So a service that is designed without any particular business client, the so-called Client-less Service, is obviously an anti-pattern as well. One final anti-pattern he mentioned that day is the Shared Persistence anti-pattern: two or more microservices that share their data store. As microservices are supposed to be independent, introducing any kind of coupling is always a bad idea.

Well, that's all I got on technology from that day. Next time, I'll talk a bit on the people side of his story. What do you think? Are microservices the next big thing to move away from monoliths? And do you agree with his migration approach? Let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Monday, July 11, 2016

Understanding a growing organization and the effect on technology

The characteristics of a growing organization
During this year's QCon conference held in New York, I attended a full-day workshop on the scalability challenges a growing organization faces, hosted by Randy Shoup. Randy explained to us how every start-up goes through several phases, each with a different focus, specifically search, execution and scaling. The most difficult part of that growth is to scale the organization, the process, the culture and the technology at the same time. Some organizations have proven to be very good at the organizational level, but lacked on the technology level. Others tried to cling to the original culture, but didn't understand that this is something that changes as well. Randy emphasized that properly scaled agile teams, a DevOps culture and modern architectural styles like microservices aren't a luxury anymore.

To support his argument, he explained how the Universal Scalability Law, originally coined by Neil Gunther, applies to both software and organizational scalability. This law states that throughput is limited by two things: contention and coherence. Contention is caused by any form of queuing on a shared resource, be it some technical element, or an authoritative person, department or process. Coherence defines the amount of coordination and communication that is needed between nodes, machines, processes and people.
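
For reference, the Universal Scalability Law is usually expressed as a formula for the relative capacity of N nodes (or people); a small sketch with purely illustrative parameter values:

  // Universal Scalability Law (Gunther): alpha models contention (queuing on a
  // shared resource) and beta models coherence (the cost of keeping everybody
  // consistent). The parameter values below are purely illustrative.
  public static class UniversalScalabilityLaw
  {
      public static double RelativeCapacity(int n, double alpha, double beta)
      {
          return n / (1 + alpha * (n - 1) + beta * n * (n - 1));
      }
  }

  // Example: RelativeCapacity(10, 0.05, 0.001) is roughly 6.5, so ten people do not
  // give you ten times the throughput of one, and beyond a certain size adding more
  // people actually reduces the total throughput.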

A real-world analogy of that could be the process of moving people out of a room through a door. If the door is narrow, only one person can get through, which means it'll take a while to empty the room. Having a wider door or even two different doors are obvious solutions to the problem. But if you have two doors, you'll need to coordinate the group or agree on an algorithm to decide who will go through which door. In other words, you need to ensure coherence. Most growing organizations can apply this law both on the organization itself, but also on the architecture. You can see that for example when more developers are hired. Work needs to be distributed over the teams, the architect or product owner becomes a bottleneck, and more coordination is needed between the people.

The phases of a startup
But, as Randy explained, there is a time when you don't need to think about this and a time when it becomes a real problem. In short, you need the right tool for the right job at the right time. Even a prototype or a monolith has merits under the right circumstances. To illustrate that, Randy divided the growth process of a start-up into three phases: search, execution and scaling.

The search phase is all about finding the right business model, finding a product that fits the market and acquiring the first customers. In other words, the organization is discovering the market. In this phase it is imperative to try new things quickly, so prototyping is an essential part of it. Scalability is not a concern yet; worrying about it might even slow you down and thereby jeopardize the chance of reaching your market in time. Paul Graham, one of the founders of the popular start-up investor Y Combinator, even encourages start-ups to do stuff that doesn't scale. So it's fine to take a technology or platform that your organization can't or doesn't want to support, as long as it allows you to quickly try out products and solutions. In fact, it's even advisable to take a non-conforming technology, since it might prevent you from converting that prototype into the real deal. Which brings me to the execution phase.

In the execution phase, an organization focuses on meeting the near-term requirements as cheaply as possible to meet the evolving customer needs. In a way, it's entering the market. Just enough architecture is the way to go and scalability concerns are not an issue yet. The point is that the organization wants to learn and improve rapidly, thereby expanding the market as fast as possible. Consequently, it will use familiar technology that is simple and easy to use and guarantees high team productivity. Organizations in this phase typically build monolithic systems that employ a single database. Although we all know that this will ultimately result in a lot of coupling, performance and scalability bottlenecks, trying to build something very scalable in this phase might actually kill your business. However, identifying natural seams in the architecture and using internal componentization can prepare you for the next phase.

The last phase is all about owning the market and scaling the organization to meet global demands. More centralized teams and standardization become necessary. Choices are made on the preferred network protocols, common services such as document management, as well as on development tools and source control systems. Tools are introduced to facilitate the discovery and sharing of code through libraries, searchable code and code reviews. But the monolith must also finally make way for a next-generation architecture that uses scalable persistence and supports concurrency and asynchrony. Many of the prior concerns are replaced by dedicated services for data analytics, searching, caching and queuing. However, Randy emphasizes that rebuilding a system from scratch is out of the question. In his opinion (and mine), that would be the worst thing you can do. There's just so much information and history in that existing monolith, it is naïve to think that you'll be able to remember all that while building a new system. Instead, he wants us to start re-architecting our systems. What that means and how to approach that will be the topic of my next post.

So what do you think? Do you recognize the phases that Randy uses? Love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Thursday, June 23, 2016

Microservices: The State of the Union

After attending a full-day track with multiple sessions and open-spaces on microservices at QCon New York, it is clear that this technique has taken flight since I first heard about it at the same conference in 2014. A lot of companies have made the jump to solve their technical and organizational scaling issues, including Uber, Amazon and Netflix. And many of them used QCon to share their experiences. This is a random list of thoughts, observations and ideas from that day.

  • A lot of my concerns from that time seem to have been resolved by the industry. It's incredible how many (open-source) libraries, products and tools have emerged since then. Kafka seems to be a big one here.
  • Container technology seems to have become a crucial element in a successful migration to microservices. Some would even call it a disruptive technology that changes the way we build software. Deployment times went from months, to days, to minutes. But apparently the next big thing is Serverless Architecture. AWS Lambda and Azure Functions are examples of that.
  • Domain Driven Design is becoming the de-facto technique for building microservices. You can see that in this nice definition: "A loosely coupled service-oriented architecture defined by bounded contexts". Even the DDD statement that if you need to know too much about surrounding services, you probably have your bounded context wrong applies to microservices. Unfortunately, according to an open space discussion, defining those boundaries is the most complex task.
  • One of the next challenges the community is expected to resolve is the complexity of authorization, security groups, network partition and such.
  • Something that Daniel Bryant noted (and something I observed myself while talking to people at QCon) is that microservices are becoming the new solution-to-all-problems. This is dangerous and leads to cargo culting.
  • Microservices is not just about technology. It also has a significant effect on the organization. In fact, as the workshop on the last day showed (about which a blog post will follow), these two go hand in hand.
  • Failure testing by injecting catastrophic events using Chaos Monkey (part of the Simian Army), Failure Injection Testing and Gremlin seem to become a commodity.
  • For obvious reasons Continuous Delivery is not a luxury anymore, it has become a prerequisite.
  • Version-aware routing and discovery of services through an API gateway is being used by all those that moved to microservices and seems to be a prerequisite. Such a gateway also provides a service registry to find services by name and version and get an IP address and port. Smart Pipes are supposed to make that even more transparent. Examples of gateways that were used include Kong, Apigee, AWS API Gateway and Mulesoft.
  • With respect to communication protocols, the consensus is to use JSON over HTTP for external/public interfaces (which makes them easy to consume from any platform), and to use more bandwidth-optimized protocols like Thrift, Protobuf, Avro or SBE internally. XML has been ruled out by all parties. Next to that, Swagger (through Swashbuckle for .NET) and DataWire Quark were mentioned for documenting the interfaces. Also, developers proposed having the owning team build a reference driver for every service. Consumers can use that to understand the semantics of the service.
  • Even a deployment strategy has formed. I've heard multiple speakers state that you should deploy the existing code with either a new/updated platform stack or new dependencies, but not both. Consequently, deploying new code should happen without changing the dependencies or platform. This should shorten the feedback loop for diagnosing deployment problems.
  • A common recurring problem is overloading the network, also called retry storming: because each service has its own timeout and retry logic, retries get duplicated exponentially. A proposed solution is introducing a cascading timeout budget. Using event publishing rather than RPC can surely prevent this problem altogether.
  • If you have many services, you might eventually run into connection timeout issues. So using shared containers allows reusing connections. Don't use connection pooling either. Next to that, if services have been replicated for scale-out purposes, you should only retry on a different connection to a different instance.
  • Regressions don't surface immediately apparently, so all of the speakers agreed that canary testing is the only reliable way to find them.
  • A big-bang replacement is the worst thing you can do, and most of the successful companies used a form of the Strangler pattern to replace a part of the functionality.
  • A nice pattern that was mentioned a couple of times is that every service exposes a /test end-point that can be used to verify versions and dependencies during canary testing.
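
As an illustration of that last point, such a /test end-point could be as simple as the hypothetical Web API controller below, reporting the deployed version and the dependency versions it was built against:

  using System.Collections.Generic;
  using System.Web.Http;

  public class TestController : ApiController
  {
      // GET /test - used during canary testing to verify what is actually deployed.
      // The service name and version numbers are made up for illustration.
      [HttpGet]
      [Route("test")]
      public IHttpActionResult Get()
      {
          return Ok(new
          {
              Service = "ordering",
              Version = "1.4.2",
              Dependencies = new Dictionary<string, string>
              {
                  { "pricing", "2.1.0" },
                  { "catalog", "3.0.1" }
              }
          });
      }
  }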

So what do you think? Are you already trying or operating microservices? If yes, any practices to share? If not, are you planning to? Love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Tuesday, June 21, 2016

Event Sourcing from the Trenches: Mixed Feelings

While visiting QCon New York this year, I realized that a lot of the architectural problems that were discussed there could benefit from the Event Sourcing architecture style. Since I've been in charge of architecting such a system for several years now, I started to reflect on the work we've done: what worked for us, what would I do differently next time, and what is it that I still haven't made my mind up about. So after having discussed my thoughts on projections, I still have a couple of doubts to discuss.

Don't write to more than one aggregate per transaction
As long as you postpone dispatching the resulting events until all the work is done, you should be fine. I know that Vaughn Vernon stated this rule in his posts about effective aggregate design, but I still don't see the real pragmatic value here. Really trying to touch only a single aggregate per transaction means you need to design your aggregates perfectly, so that every business action only affects a single aggregate. I seriously doubt most people will manage to do that. And if you can't, sticking to a single aggregate per transaction means you need to build logic for handling retries and compensating logic for when other parts of the bounded context are interested in those events. However, never ever use transactions across bounded contexts.

Separation of the aggregate root class and its state
Inspired by Lokad.CQRS, we used a separate class to contain the When methods that change the internal state as a result of an event, both during aggregate root (AR) method invocations as well as during rehydration. However, using a separate state class results in some cumbersome usage of properties that point to that state class. Having them on the main AR class is going to make it very big, but maybe using a partial class makes sense.
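
A minimal sketch of what that separation looks like (names simplified for illustration):

  using System;
  using System.Collections.Generic;

  public class OrderShipped
  {
      public DateTime ShippedAt { get; set; }
  }

  // Holds the state and the When() methods, used both when the aggregate changes
  // and when it is rehydrated from its event stream.
  public class OrderState
  {
      public bool IsShipped { get; private set; }

      public void When(OrderShipped @event)
      {
          IsShipped = true;
      }
  }

  public class Order
  {
      private readonly OrderState state = new OrderState();

      // Collected here so a repository can append them to the event store later.
      private readonly List<object> uncommittedEvents = new List<object>();

      public void Ship(DateTime now)
      {
          if (state.IsShipped)
          {
              throw new InvalidOperationException("This order has already been shipped.");
          }

          Apply(new OrderShipped { ShippedAt = now });
      }

      private void Apply(OrderShipped @event)
      {
          state.When(@event);
          uncommittedEvents.Add(@event);
      }
  }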

Functional vs unique identifiers
For the aggregates that have a natural key, we use that key to identify the stream the events belong to. However, Greg Young once mentioned that using a Guid or something similar is probably better, but somehow that never aligned with what I've learned to value from Pat Helland's old article Data on the Inside, Data on the Outside. Maybe you should do both?

Share by contract vs by type
Whether to share events as a binary package or through some platform-agnostic mechanism (e.g. Json Schema) is a difficult one for me. Some people argue that sharing the binary package is going to cause an enormous amount of coupling. But I would think that sharing just some Json Schema still means you're tied to that contract. For instance, if you're in the .NET space, being able to use a NuGet package that only contains the events from a bounded context that can be consumed by another context sounds very convenient. The only thing a schema-based representation will help you with is that it will force you to add a transformation step from that schema into some internal type. By doing that, you have a bit more flexibility in decoupling versioning differences. But somehow, I'm not convinced the added complexity is worth it (yet).

Feedback, please!
So what do you think? Do my thoughts make sense? Am I too pragmatic here? Are you using Event Sourcing yourself? If so, care to share some experiences? Really love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Monday, June 20, 2016

Event Sourcing from the Trenches: Projections

While visiting QCon New York this year, I realized that a lot of the architectural problems that were discussed there could benefit from the Event Sourcing architecture style. Since I've been in charge of architecting such a system for several years now, I started to reflect on the work we've done, e.g. what worked for us, what would I do differently next time, and what is it that I still haven't made my mind up about. So after having discussed the domain events, let me share some of my thoughts on projections.

Optimized for querying
Projections, denormalized aggregations of events, have only one purpose: being optimized for reading. So if your API or user interface needs data to be grouped or aggregated in a certain way, project the events like that into whatever storage mechanism you use. The workload should be on the projection logic, not on the reading side.
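
As a rough sketch of what that looks like, consider a projector that maintains an order summary exactly in the shape the UI wants to read it. The event and record types are made up for this example.

using System;
using System.Collections.Generic;

public class ProductAddedToOrderEvent
{
    public Guid OrderId { get; set; }
    public decimal Price { get; set; }
}

// The read model is shaped exactly the way the UI wants to consume it.
public class OrderSummaryRecord
{
    public Guid OrderId { get; set; }
    public decimal TotalAmount { get; set; }
    public int NumberOfProducts { get; set; }
}

public class OrderSummaryProjector
{
    private readonly IDictionary<Guid, OrderSummaryRecord> store;

    public OrderSummaryProjector(IDictionary<Guid, OrderSummaryRecord> store)
    {
        this.store = store;
    }

    public void Handle(ProductAddedToOrderEvent @event)
    {
        // Assumes an earlier event already created the summary record for this order.
        OrderSummaryRecord record = store[@event.OrderId];

        // All grouping and aggregation happens here, at projection time,
        // so the query side can stay a dumb read.
        record.TotalAmount += @event.Price;
        record.NumberOfProducts++;
    }
}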

Projections should not enforce constraints
Projections should be seen as an aggregated cache of events and consequently shouldn't be used to enforce any constraints. In fact, projection code should never crash, ever. This may sound trivial, but if your code base has been evolving for a couple of years and the events and the underlying constraints have changed several times, bugs are inevitable. So make sure your projection code is resilient. A (somewhat naïve) example would be to always cut a string value to the maximum length of the underlying database column, even though you know the event value never exceeds it…now. That's why I love NoSQL databases so much….
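
That truncation idea could look something like this simple (and admittedly naïve) helper:

public static class ProjectionResilienceExtensions
{
    // Defensively truncate, so an unexpectedly long value from an old or future
    // event version never blows up the projection.
    public static string Truncate(this string value, int maxLength)
    {
        if (string.IsNullOrEmpty(value) || value.Length <= maxLength)
        {
            return value;
        }

        return value.Substring(0, maxLength);
    }
}

// Usage inside a projector, assuming the Name column is defined as nvarchar(50):
//   record.Name = @event.Name.Truncate(50);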

Don't share projections
A logical consequence of optimizing projections for a single purpose is that they are optimized for a single purpose…. Even though the persisted projection schema looks like a good fit for another kind of query, don't reuse it. Even though they may look similar now, they are inevitably going to deviate. The worst problem you can have is that you need to add all kinds of alternative execution paths in your logic to make sure both interests are served equally. Just duplicate the data.

Keep projections close to the consumer
If you persist the projections to a store like a database, don't assume that the projection belongs to something like a data access layer. As I said before, projections should be seen as a local query cache for a very particular purpose. So keep them as close to the consumer as possible. If you use one in a particular HTTP API implementation, move the projection code next to the API code. In fact, I would go so far as to say that you should test the two together. Such a test should use the events as input and observe the HTTP response as output.
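
A sketch of such a test could look like the following. I'm using xUnit here purely for illustration, and the event, projector and API classes are hypothetical stand-ins; a real test would go through the actual HTTP pipeline.

using System.Collections.Generic;
using System.Linq;
using Xunit;

public class ProductDiscontinuedEvent
{
    public string ProductCode { get; set; }
}

public class DiscontinuedProductsProjector
{
    private readonly List<string> discontinuedProducts = new List<string>();

    public IEnumerable<string> DiscontinuedProducts
    {
        get { return discontinuedProducts; }
    }

    public void Handle(ProductDiscontinuedEvent @event)
    {
        discontinuedProducts.Add(@event.ProductCode);
    }
}

// Stand-in for the real HTTP endpoint.
public class DiscontinuedProductsApi
{
    private readonly DiscontinuedProductsProjector projector;

    public DiscontinuedProductsApi(DiscontinuedProductsProjector projector)
    {
        this.projector = projector;
    }

    public string GetAsJson()
    {
        return "[" + string.Join(",", projector.DiscontinuedProducts.Select(code => "\"" + code + "\"")) + "]";
    }
}

public class When_a_product_is_discontinued
{
    [Fact]
    public void It_should_show_up_in_the_api_response()
    {
        // Arrange: the events are the input of the test...
        var projector = new DiscontinuedProductsProjector();
        projector.Handle(new ProductDiscontinuedEvent { ProductCode = "ABC-123" });

        // Act: ...and the (simulated) HTTP response is the output.
        string response = new DiscontinuedProductsApi(projector).GetAsJson();

        // Assert
        Assert.Equal("[\"ABC-123\"]", response);
    }
}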

Allow each projector to decide on the persistence store
Another logical result of all those earlier statements is that each projection can be persisted to whatever storage fits best. Sure, you can store it in a relational database, but I highly recommend using a NoSQL database instead. You could even build up a specific projection in memory as soon as the system starts. And what about projecting your events directly to a local HTML or JSON file that is served by an HTTP API or web site? That's the beauty of Event Sourcing.

Track progress locally
So if each projection can use a different storage mechanism, the projection code needs to be able to track its own progress. This is something that NEventStore got wrong: it relied on a central structure for determining whether an event was dispatched to all projections. Instead, make sure you design your projection logic to track progress itself. It gives the projection logic a lot of autonomy, including the possibility to rebuild itself when code changes deem that necessary. Notice that this does mean your event store should allow arbitrary subscribers, each interested in a different starting point.
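
A rough sketch of a projector that owns its own checkpoint could look like this; none of these types come from a specific event store library.

using System.Collections.Generic;

public class EventEnvelope
{
    public long Checkpoint { get; set; }
    public object Event { get; set; }
}

public interface ICheckpointStore
{
    long LoadLastProcessedCheckpoint(string projectorName);
    void SaveCheckpoint(string projectorName, long checkpoint);
}

public interface IEventStoreSubscription
{
    // The event store must allow independent subscribers, each starting from
    // its own checkpoint.
    IEnumerable<EventEnvelope> ReadFrom(long checkpoint);
}

public class OrderSummaryProjectorWithCheckpoint
{
    private const string Name = "OrderSummary";
    private readonly ICheckpointStore checkpoints;
    private readonly IEventStoreSubscription subscription;

    public OrderSummaryProjectorWithCheckpoint(ICheckpointStore checkpoints, IEventStoreSubscription subscription)
    {
        this.checkpoints = checkpoints;
        this.subscription = subscription;
    }

    public void CatchUp()
    {
        long lastProcessed = checkpoints.LoadLastProcessedCheckpoint(Name);

        foreach (EventEnvelope envelope in subscription.ReadFrom(lastProcessed + 1))
        {
            Project(envelope.Event);

            // Progress is stored by (and ideally next to) the projection itself,
            // so it can decide on its own when to resume or rebuild.
            checkpoints.SaveCheckpoint(Name, envelope.Checkpoint);
        }
    }

    private void Project(object @event)
    {
        // The actual projection logic is omitted for brevity.
    }
}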

Asynchronous projection as a first-class concern
Another consequence of all that autonomy is that all that projection logic should essentially run asynchronously. We started out with the projection logic running as part of the same thread of control that caused the action on the domain. This allowed us to avoid having to change the UI to deal with the fact that projections are eventually consistent. But a lot of projections don't have to be up-to-date all the time and can perfectly well run in the background. This scales much better, in particular when a new version of the code base needs a rebuild of the persisted projections. We added this possibility afterwards, which means it has to work around the existing code base. So if you can, make this a primary architectural principle. And if you really need 'synchronous' behavior, have your projection logic expose an interface that you can observe to see if it has caught up.
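
Such an interface could look something like the sketch below; the names are made up, but the idea is that a caller can wait until the projection has processed a particular checkpoint.

using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface a projector could expose, so a caller that really needs
// 'synchronous' behavior can wait until the projection has processed a
// particular point in the event stream.
public interface ITrackCatchUpProgress
{
    long LastProcessedCheckpoint { get; }

    Task WaitUntilCaughtUpWith(long checkpoint, TimeSpan timeout, CancellationToken cancellationToken);
}

// Example usage after handling a command that returned the checkpoint of the
// last event it produced:
//
//   await projector.WaitUntilCaughtUpWith(result.Checkpoint, TimeSpan.FromSeconds(5), CancellationToken.None);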

Feedback, please!
So what do you think? Do my thoughts make sense? Am I too pragmatic here? Are you using Event Sourcing yourself? If so, care to share some experiences? Really love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Friday, June 17, 2016

Event Sourcing from the Trenches: Domain Events

While visiting QCon New York this year, I realized that a lot of the architectural problems that were discussed there could benefit from the Event Sourcing architecture style. Since I've been in charge of architecting such a system for several years now, I started to reflect on the work we've done, e.g. what worked for us, what would I do differently next time, and what is it that I still haven't made my mind up about. After having discussed aggregates in my last post, let me share some of my thoughts on domain events.

Avoid CRUD terminology
Don't use terms like create, update and delete in the names of your events. The business doesn't talk in those terms, so you should not either. Stick to the terminology from the Ubiquitous Language that applies to the involved bounded context. Again, Event Storming will help here as well, since the names will surface during the discussion with the business.

Prefer fine-grained events
Avoid coarse-grained events, even if they originated from an Event Storming session. High-level events might be more aligned with the business, but they also require a lot of context to understand. As an example, consider a risk assessment for some work that needs to be done. If such an assessment involves a high-risk task, an entire team of specialists is needed. Now consider an assessment for a high-risk task whose risk level is reduced to low risk. It basically means the assessment team needs to be disbanded. How would you model that? With a single high-level event like TaskRiskLevelReduced? That would mean that all subscribers need to understand that this implies the assessment team is no longer necessary. I prefer to raise separate events, e.g. a TaskRiskLevelReduced followed by a RiskAssessmentTeamDisbanded. If the rules change in the future, you'll be more flexible.
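
In code, that could look something like the sketch below. The class and property names are illustrative only.

using System;

// Two fine-grained events instead of a single coarse-grained one, so that
// subscribers don't need to know that lowering the risk level also implies
// disbanding the assessment team.
public class TaskRiskLevelReduced
{
    public Guid TaskId { get; set; }
    public string NewRiskLevel { get; set; }
}

public class RiskAssessmentTeamDisbanded
{
    public Guid TeamId { get; set; }
}

public class RiskAssessment
{
    private readonly Guid taskId;
    private Guid? assessmentTeamId;

    public RiskAssessment(Guid taskId, Guid? assessmentTeamId)
    {
        this.taskId = taskId;
        this.assessmentTeamId = assessmentTeamId;
    }

    public void ReduceRiskLevel(string newLevel)
    {
        Apply(new TaskRiskLevelReduced { TaskId = taskId, NewRiskLevel = newLevel });

        if (newLevel == "Low" && assessmentTeamId.HasValue)
        {
            Apply(new RiskAssessmentTeamDisbanded { TeamId = assessmentTeamId.Value });
            assessmentTeamId = null;
        }
    }

    private void Apply(object @event)
    {
        // Updating internal state and recording the event is omitted for brevity.
    }
}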

Make event merging a first class concern
Since our aggregates initially never needed to handle concurrent commands, we could get away with optimistic and pessimistic locking on the aggregate level. But after two years or so, we couldn't maintain that stance anymore. Adding event merging to your architecture afterwards can be challenging, so please consider adding this to your ES implementation as soon as possible. I just wish there was a .NET open-source project that could do that.

Allow multi-event conversion
Events evolve; you can't avoid that. Sometimes an event receives an extra property that has a suitable default. Sometimes an event is renamed or converted into another event. But sometimes you need to convert a couple of events into a single new one, or the other way around. So if you load your events from an event store, make sure your up-converters can have state to support this. As far as I know, none of the .NET event store libraries have this out-of-the-box.
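
The shape of such a stateful up-converter could be something like this sketch; none of these types exist in an actual library as far as I know.

using System.Collections.Generic;

public class StreetChangedEvent { public string NewStreet { get; set; } }
public class CityChangedEvent { public string NewCity { get; set; } }
public class AddressChangedEvent { public string Street { get; set; } public string City { get; set; } }

public interface IStatefulUpConverter
{
    // Returns zero or more converted events; returning nothing means the
    // converter is buffering until it has seen enough of the stream.
    IEnumerable<object> Convert(object oldEvent);
}

public class AddressChangeUpConverter : IStatefulUpConverter
{
    private StreetChangedEvent pendingStreetChange;

    public IEnumerable<object> Convert(object oldEvent)
    {
        var streetChanged = oldEvent as StreetChangedEvent;
        if (streetChanged != null)
        {
            // Buffer the first half of the pair and emit nothing yet.
            pendingStreetChange = streetChanged;
            yield break;
        }

        var cityChanged = oldEvent as CityChangedEvent;
        if (cityChanged != null && pendingStreetChange != null)
        {
            // Merge the two old events into the single new one.
            yield return new AddressChangedEvent
            {
                Street = pendingStreetChange.NewStreet,
                City = cityChanged.NewCity
            };

            pendingStreetChange = null;
            yield break;
        }

        // Anything else passes through untouched.
        yield return oldEvent;
    }
}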

Use primitive types
In the DDD world, it's very common to build domain-specific value types to represent concepts such as ISBN numbers, phone numbers, zip codes, etc. Never ever put these on your events. Events represent something that has happened. Just imagine what happens if somebody changes the constraints that apply to an ISBN number and an old event no longer complies with them. Moreover, serializing and deserializing those types to the underlying store is usually more expensive than when you use simple primitive types.
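
In other words, keep the event itself primitive and do the conversion to the rich value type inside the domain, roughly like this:

// Keep the event itself dumb and primitive...
public class BookRegisteredEvent
{
    public string Isbn { get; set; }   // a plain string, not a value type
    public string Title { get; set; }
}

// ...and only convert it to the rich domain-specific value type inside the domain,
// where today's constraints apply.
public class Isbn
{
    public Isbn(string value)
    {
        // Validation that may change over time lives here, not in the event.
        Value = value;
    }

    public string Value { get; private set; }
}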

Don't enrich events
Don't enrich events with information from the aggregate or its associations just to make it available for projection purposes later on. It creates unnecessary duplication of data and increases the chance that you'll need event versioning. The only exception I can think of is when that data is functional. E.g. when you add a product to an order and it's important to track the price at that point in time, you would probably make it part of the aggregate's events. Alternatively, you could handle all of that in the projection code, but that means that code needs to understand how prices are handled.

Events are your contract
Treat your events as the contract of your bounded context. Other bounded contexts or external systems can consume or subscribe to them to keep track of what's going on. It's the perfect integration technique for decoupling systems, in particular for building micro-services.

Don't use them as a notification mechanism within the bounded context
In the original definition of a domain event, events were used as a notification mechanism from one domain entity to another. So, if a product was discontinued, an event would be raised for that, and a domain event handler somewhere else in the code base could handle it and perform a business action on all running orders that included that product. I used to love that mechanism, but found that it makes it particularly difficult to trace what's going on in a system. Next to that, DDD introduced domain services for that purpose. Notice the words 'bounded context' in this discussion; I'm only talking about the code within the bounded context.

Feedback, please!
So what do you think? Do my thoughts make sense? Am I too pragmatic here? Are you using Event Sourcing yourself? If so, care to share some experiences? Really love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Thursday, June 16, 2016

Event Sourcing from the Trenches: Aggregates

While visiting QCon New York this year, I realized that a lot of the architectural problems that were discussed there could benefit from the Event Sourcing architecture style. Since I've been in charge of architecting such a system for several years now, I started to reflect on the work we've done, e.g. what worked for us, what would I do differently next time, and what is it that I still haven't made my mind up about. So please let me share some of my thoughts on this.

Event Sourcing (ES) is an awesome architecture style for high-performance systems that supports some powerful concepts like fine-grained conflict handling, optimized projections, potentially unlimited horizontal scaling, and great business buy-in. But it introduces a lot of complexity like eventual consistency, event versioning and projection migration challenges. As with every (design) pattern, methodology or tool, you need to consider the trade-offs and the problem you're trying to solve. Don't jump on the ES train just because it sounds like a cool thing to work on. We only migrated from a CQRS-based architecture to ES to build an application-level replication protocol, even though we knew about ES when the entire project started. Granted, because of my positive experience in the current project, I would definitely consider ES for any non-trivial system. But I'm fully aware I might be falling in the second-system trap.

Use Event Storming
Event Storming is a technique to identify your (business) events from conversations with the business rather than extracting them from your domain. Since we migrated from a relational-database-backed domain model loosely based on Domain Driven Design principles to event sourcing, we had to extract our events from the existing code. This sometimes resulted in what Yves Reynhout amusingly called "property sourcing": events that were rather technical and never captured the actual business intent. Event Storming helps you identify the dynamics of the process you're trying to model, rather than following a state-oriented domain modelling approach. A nice side-effect is that it will surface potential conflicts in definitions, warranting the introduction of separate bounded contexts.

Don't rely on aggregates to be in sync at all times
If your order (logically) references a product, don't rely on that product to exist or to be in a certain state. By following this principle, your code will be designed to handle non-existing data from day one. If, in the future, your performance requirements grow to the point that you need to partition your event store, you can do so. Trying to make your code handle these situations later on is going to be extremely painful. Projection code that aggregates events from multiple aggregates or maintains lookups is particularly susceptible to this, so prepare for it.

Postpone snapshots until you really need them
It adds complexity that you may not need. For instance, in our project we have two kinds of aggregates: they either live for a couple of days and receive a lot of events before being abandoned, or they live very long (like a user aggregate) but receive only a couple of events over a period of months. In both cases, we had no need for snapshots since the number of events per aggregate is rather low.

Identify a partition key for your aggregate
Even though you won't need it immediately, a partition key allows you to scale in the future by partitioning the event store by that key. For instance, orders in a purchasing system may be tied to a country; it's not like an order suddenly moves from one country to another. And if the unexpected still happens, you always have the option to copy the aggregate into a new aggregate with a different partition key. If there's no natural partition key, try to come up with something synthetic anyway.

Feedback, please!
So what do you think? Do my thoughts make sense? Am I too pragmatic here? Are you using Event Sourcing yourself? If so, care to share some experiences? Really love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Tuesday, June 14, 2016

How to optimize your culture for learning, growth and collaboration

On the first day of QCon New York, I attended several talks and open spaces that had some relation to culture, be it improving the efficiency of developers, handling disagreement in a respectful way, or creating an environment that embraces the learning experience.

For instance, one of the questions the attendees asked is how to convert low performers into high performers. As you can expect, nobody has that magic formula, but several ideas came up. Peer reviews and pair programming are the obvious examples; giving homework or assignments are others. While discussing this, the question came up of what motivates people. The consensus was that being able to make a difference is a success factor here, so working on something you don't value is obviously not helping. Somebody used the analogy of a journalist working on a story they're not interested in: you need to finish it, and there's a deadline. One little side-track dealt with the insecurity some potentially talented developers suffer from. They may be afraid to ask questions of the most knowledgeable people within the organization, which holds them back from obtaining a deeper understanding of the code they are working on. They may fall back on trying to find solutions on the internet or coming up with an idea themselves, thereby completely missing the alignment with the architectural vision. So having some people around who are actively approachable is extremely important for them. In a way you can conclude that fear can seriously hamper the growth of a high potential. Which brings me to the next point.

During that same Open Space, somebody described a situation in which the CTO likes to be challenged by the professionals in the company by publicly arguing about topics. His way of working involves cheesy values such as "Fight fairly, but argue to win". I don't have to explain to you how bad that is for the motivation of the people in that company. Even if you're a strong communicator and feel very secure, you might attempt to talk with this guy a couple of times, but eventually you'll just stop. Now imagine the same for the more introverted or insecure people. Healthy arguments are…well…healthy, but where do you draw the line?

Sonali Sridhar, who gave a talk on this same topic, explained how her non-profit organization, the Recurse Center, uses a set of simple social rules to stimulate healthy and respectful conversations. For instance, you're not allowed to feign surprise when somebody asks you a question you would expect them to know the answer to already. It's condescending and will cause people to avoid asking questions in the future. Another one is to rule out the use of the phrase "Well, actually". It is often used by people to emphasize an error in some argument that is totally irrelevant; it derails the original argument and moves the focus from the person that was speaking to the person that used the phrase. She also mentioned the "no subtle -isms" rule, which refers to subtle remarks that might have an origin in sexism, racism, etc. And finally, she stated the simplest rule of all: treat people as adults. The longer I think about this, the more it makes sense to me. I'm pretty sure I've feigned surprise here and there…

So what do you think? Do you agree that these things can help build a working environment where failure as a way to learn is a good thing? And what do you think about those social rules? Love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Monday, June 13, 2016

A Git collaboration workflow that provides feedback early and fast

At Aviva Solutions, we’ve been using Git for a little over two years now and I can wholeheartedly say that after having worked with TFS for years, we'll never go back… ever. But as with any new technology, practice or methodology, you need to go through several cycles before you find a way of working that works well for you. After we switched over from TFS, we kept working in a kind of centralized fashion (hey, old habits don't die) where all those feature and team branches are kept on the central repository. If the entire code base involves only a couple of developers, all is fine and dandy. But if you're working with 20 developers, not so much, unless you love those rainbow-style history graphs…

[Image: a rainbow-style branch history graph]

So after a couple of months, in order to achieve a bit more isolation and less noise, the first teams started to fork the main repository. This worked quite well for most of the teams, in particular due to the power of pull requests. But it took more than a year before we managed to coerce all teams into doing that. That may surprise you, but teams get a lot of autonomy. For instance, they decide on their own development process (Scrum, Kanban or a hybrid of the two) and the way they work together. But switching from the nicely integrated combination of Visual Studio and TFS to a hybrid of command-line tools and half-baked desktop apps proved to be harder than I expected. Apparently the concepts of clones, forks and remotes were not as trivial as I thought.

We couldn't just revoke all write-access and force teams to work on forks without causing a riot. So it took another six months until we finally got all teams to agree on a process where everybody works on forks and uses pull requests (PRs) to submit their changes for merging by a small group of gate keepers. Teams use the PR for getting build status updates, tracking test status and performing code reviews by their peers and technical specialists. The gate keepers don't do code reviews themselves. They just make sure the code was reviewed and tested by the proper people. This has worked quite well for the last 9 months or so, but what about teams? How do developers within teams work together on their shared tasks and user stories?

Now that people can't directly push to the central repository anymore, they have no choice but to use a fork to do their work. We use GitHub, which does not have the concept of a team fork, so most teams use the fork of the first person that started working on a particular task. All developers involved in that work get write access and can push directly to that repository. Most teams seem to be fine with that and I have worked like that myself as well. But in my experience, this approach has several caveats:

  • Code only gets reviewed when the combined work of the involved developers gets scrutinized as part of the final pull request, which means the rework comes at the end as well. And if you value a clean source control history, it'll be very difficult to amend existing commits, so you end up with those dreaded "code review rework" commits.
  • The changes being pushed by the individual developers can break the shared branch, e.g. by pushing compile errors or failing unit tests. Even if that shared branch is linked to a WIP (work-in-progress) pull request, your build system might not be fast enough to alert the team. They might be spending half an hour trying to figure out why their code doesn't compile, only to discover they pulled a bad commit from another team member.
  • A developer might push a solution that doesn't align with the agreed solution for the involved task. But if the other devs don't look at that code until the final code review, the rework might be substantial.

That feedback loop is a bit too long for my taste. So we've tried an approach that could probably best be described as a multi-level pull request flow. It looks like this:

[Diagram: the multi-level pull request flow]

The idea is simple. Everybody works on their own fork, but there's a single branch on one of the designated forks that serves as the integration point for each other's changes. So in this example, the shared branch is on John's fork and none of the three will push directly to that branch. Whenever Dean has completed his task, rather than pushing directly to the shared branch, he'll issue a pull request to John's fork. This PR is then used to do an early code review by either Mike or John, after which Dean does the rework on his branch. Since this team values a clean history, Dean will complete his task by cleaning up/squashing/reordering the commits on his branch using an interactive rebase. As soon as this has been done and the involved builds report a success status, John or Mike will merge the PR. Obviously, John and Mike follow the same workflow.

When all collaborative work has been completed, one of the guys will rebase their work on the latest state of affairs of the central repository and file a pull request to it. If not all earlier mentioned pull requests were merged using GitHub's new squashing merge technique, an (interactive) rebase will get rid of those noisy merge commits. Granted, that final PR might still receive some code review comments. But if you have been involving the right people in any of the earlier PRs, that should be minimized. Again, the goal is to keep the feedback loop as short as possible.

What do you think? Did I tell you something you didn't know yet? And what about you? Any tips to add to this post? Love to hear your thoughts by commenting below. And follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Thursday, June 09, 2016

How to get the best performance out of NHibernate (and when not to use it at all)

Use the right tool for the right problem

A very common sentiment I'm getting from the .NET community is an aversion to object-relational mappers like NHibernate and Entity Framework. Granted, if I could, I would use an (embeddable) NoSQL solution like RavenDB myself. They remove the object-relational friction that OR/Ms try to solve and allow you to decouple your code from scalability bottlenecks like shared database servers. And quite often they provide some cool features such as map-reduce indexes and faceted search. If a NoSQL product is not an option, some would argue that writing native SQL is always the better option. But in my opinion, unless you need to squeeze the last bit of performance out of a database, writing your own SQL statements is a waste of time. And even then, I would probably prefer a lightweight mapping library such as Dapper over writing the mapping code myself.

Those same people would also argue that NHibernate adds a lot of overhead and complexity, and there's some truth in that. It's a very powerful OR/M that is very good at mapping complex object-oriented designs to a relational database schema, but there's also a lot you can do wrong that will completely kill your performance. I once made the mistake of creating an abstraction on top of NHibernate (inspired by this article). It sounded like a nice idea for testability purposes, but treating NHibernate as a persistent LINQ-enabled collection forced me to limit myself to the common denominator (a.k.a. LINQ).

So if you can't use a NoSQL solution like RavenDB, can't apply an architectural style that avoids the object-relational mismatch (e.g. Event Sourcing and/or CQRS), need to support multiple database vendors, and don't need the raw performance of native SQL, I would still recommend NHibernate over Entity Framework.

Now, when you do, please don't make the mistakes I made, and apply some or all of the following tips & tricks. For the record, I'm assuming you use Fluent NHibernate to avoid those ugly XML files. I never bothered with the built-in fluent API because I kind of got the impression it is still work-in-progress (hopefully somebody can convince me otherwise).

Don't abstract NHibernate

If you're practicing Test Driven Development, don't abstract away the code that uses NHibernate. Instead, write your unit tests against an in-memory SQLite or SQL Server LocalDB database created with NHibernate's built-in schema generation tool. It will surface any edge cases in NHibernate's LINQ support much earlier, and you will be able to profile the underlying raw queries right from inside your unit tests; a minimal fixture for that is sketched below. Which brings me to the next point…
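
Such a fixture could look roughly like this, assuming Fluent NHibernate, the SQLite driver and the ProductClassMap shown further down in this post. Note that the schema must be exported on the same open connection, because an in-memory SQLite database disappears as soon as that connection closes.

using System;
using FluentNHibernate.Cfg;
using FluentNHibernate.Cfg.Db;
using NHibernate;
using NHibernate.Cfg;
using NHibernate.Tool.hbm2ddl;

public class InMemoryDatabase : IDisposable
{
    private readonly ISessionFactory sessionFactory;
    private Configuration configuration;

    public InMemoryDatabase()
    {
        sessionFactory = Fluently.Configure()
            .Database(SQLiteConfiguration.Standard.InMemory().ShowSql())
            .Mappings(m => m.FluentMappings.AddFromAssemblyOf<ProductClassMap>())
            .ExposeConfiguration(cfg => configuration = cfg)
            .BuildSessionFactory();

        Session = sessionFactory.OpenSession();

        // Export the generated schema to the open in-memory connection.
        new SchemaExport(configuration).Execute(false, true, false, Session.Connection, null);
    }

    public ISession Session { get; private set; }

    public void Dispose()
    {
        Session.Dispose();
        sessionFactory.Dispose();
    }
}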

Understand the run-time behavior

Ayende's NHProf is an awesome tool for finding performance bottlenecks and common mistakes. It will not only show you the queries that have been executed, but also the entire stack trace of the code that was involved. Next to that, it can provide you with the actual results of a query as well as the full query execution plan from the underlying database. For each query it shows what part of the execution time was spent in the database and what part was added by NHibernate. It will even show you whether or not the query benefitted from NH's second-level cache. And did I mention all the warnings it gives you when you're making common mistakes such as N+1 selects, inefficient transaction management, or requesting unbounded result sets? In a way, NHProf gives you a holistic view of your application's database interaction.

[Screenshot: NHProf]

Prefer NH's own QueryOver API over LINQ

LINQ is a common denominator and doesn't support everything NH supports, such as inner joins, left and right outer joins, aliasing, projection transformers, etc. One more reason not to abstract NHibernate…

decimal mostExpensiveProduct = session.QueryOver<Product>()
    .Select(Projections.Max<Product>(x => x.Price))
    .SingleOrDefault<decimal>();

Use optimistic concurrency and dynamic updates

NH's default behavior is to include all columns in every UPDATE or INSERT statement, regardless of whether the mapped property has changed or not. NH will also include all columns in the WHERE clause when doing an optimistic concurrency check. You can improve the speed of the latter by adding some kind of incremental number or timestamp and mapping it as the version of the entity. That ensures that only the versioning column is included in the WHERE clause. But you can do even better by enabling dynamic updates on the mapping, e.g. using the DynamicUpdate method of ClassMap<T>. This will tell NH to only include the columns that actually changed in the UPDATE and INSERT statements. I don't need to explain why that will give you a nice performance boost.

public class ProductClassMap : ClassMap<Product>
{
    public ProductClassMap ()
    {
        DynamicUpdate();

        Id(x => x.ProductId);
        Version(x => x.Version);
        Map(x => x.Name);
        Map(x => x.Price);
    }
}

Avoiding the insert-insert-update for child collections

A long-standing issue that has caused a lot of confusion in many of my projects is the way NH deals with parent-child relationships (also known as HasMany associations). For reasons related to association ownership, NH will first insert the children without a foreign key, and then issue another update of those children after the parent has been added. Because of this weird algorithm, the foreign key column on the child table has to be nullable. As a result, inserting a new parent with 5 children involves a total of 11 SQL statements: 5 to insert the children, one to insert the parent and another 5 to update the foreign keys of those children. I only recently discovered that this has changed in NHibernate 3.2 and that you can now fix it by using the following (fluent) construct.

public class OrderClassMap : ClassMap<Order>
{
    public OrderClassMap ()
    {
        HasMany(x => x.Products)
            .Not.Inverse()
            .Not.KeyNullable()
            .Not.KeyUpdate()
            .Cascade.AllDeleteOrphan();
    }
}

Notice the Not.KeyNullable() and Not.KeyUpdate(); you need both to make this work. The additional Not.Inverse() is not really needed, but can be used to emphasize that the Order in this association is responsible for maintaining the foreign key relationships. In NH jargon, this means that this side owns the association. You only need the Inverse() option in bi-directional associations, so that NHibernate knows whether the parent or the child is responsible for properly setting up the foreign keys. If this inverse thing still confuses you, I can highly recommend this article.

Lazy-loading heavyweight properties

Sometimes you need to map a property on your entity that is pretty expensive to load and save, e.g. a byte array or some serialized JSON or XML. I know, you may want to avoid that in the first place, but if you can't, you need to know that you can mark such properties as lazy-loaded like this:

Map(x => x.Thumbnail).LazyLoad();

So, assuming the Thumbnail property is mapped on the Product from the earlier examples, when you fetch one or more products, NH will exclude the thumbnail data from the SELECT statement. But as soon as you access that property, it will issue a separate SELECT to get the actual column data. One caveat though: if you have multiple lazy-loaded properties, NH will fetch all of them as soon as you access any of them.

Eager fetching of associations

Associations between entities are never initialized by default. This is a well-known source of the infamous N+1 SELECT problem that happens when you load a bunch of entities using a query and then iterate over them. The first SELECT will get the parent entities, but accessing the association property of each parent entity will cause another SELECT to fetch the related children. You can tell NH to fetch those children as part of the query on a case-by-case basis. But if you know you'll always need them together and you can't merge those two tables into a nice and efficient Cartesian product, map the association as Not.LazyLoad().Fetch.Join(), as sketched below.
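
Assuming the Order/Products association from the earlier OrderClassMap (and an OrderId property on the Order entity, which is my assumption), such a mapping could look like this:

using FluentNHibernate.Mapping;

public class EagerOrderClassMap : ClassMap<Order>
{
    public EagerOrderClassMap()
    {
        Id(x => x.OrderId);

        // Load the products together with their order in one joined SELECT,
        // instead of a separate SELECT per order.
        HasMany(x => x.Products)
            .Not.LazyLoad()
            .Fetch.Join()
            .Cascade.AllDeleteOrphan();
    }
}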

Components without value semantics

A very much misunderstood aspect of component mapping is that the classes that are mapped as a component must behave like a component. They must expose value-type semantics and have no identity other than the combined values of all of their properties. In other words, they must override Equals() and GetHashCode(). This is especially important when you map a property to a collection of components, like this:

HasMany<Address>(x => x.Shipments)
    .KeyColumn("OrderId")
    .Table("OrderShipments")
    .Component(x =>
    {
        x.Map(c => c.ZipCode);
        x.Map(c => c.Number);
        x.Map(c => c.State);
    })
    .Cascade.AllDeleteOrphan();

If you don't, NH can't determine the equality of the objects in your collection property, resulting in some weird behavior. I've seen NH delete and re-insert the same set of child objects every time somebody added an additional child to the collection. You won't notice that until you run that profiler again.

Some more tips & tricks

  • If you're in a position where you can't change too much of an existing database schema, and your application has vastly different needs in the way data is read and written, you can consider mapping multiple ClassMaps to the same table (see the sketch after this list). As long as all but one of those class maps are declared as ReadOnly, NHibernate will happily allow this. This has proved to be a very efficient technique to have different lazy-loading settings for the same table structure.
  • If you're into Domain Driven Design like me, you might be tempted to create all kinds of domain-specific and rich custom NHibernate types and map them to your columns. So rather than having a string-valued property to represent an ISBN number, you might define your own Isbn type. Now, if you value performance, don't. Just run a good CPU profiler like JetBrains' dotTrace to understand the impact of that.
  • Don't underestimate the power of NHibernate's second level cache offered by the likes of SysCache2. It can give you an enormous performance boost, especially if you deal with a lot of immutable data and you can avoid the infrastructural complexity of a distributed cache. Just don't forget to wrap all your code with a call to EnlistTransaction and CompleteTransaction. Ayende has written enough about that.
  • Suppose you need to remove an entire range of entities from your database. You could query for them using LINQ or QueryOver and then issue individual Delete() statements on the session, but you can do better by employing NH's DML operations API. It supports HQL statements that resemble native SQL, without any coupling to a specific database vendor, like this:

    session.CreateQuery("delete Order o where o.CreatedAt > :minDate")
        .SetDateTime("minDate", minDate)
        .ExecuteUpdate();
  • You might know that you need to define cascading operations on parent-child collections. Just don't make the mistake of doing this on HasManyToMany or References mappings. They are meant to create associations between entities whose lifetimes are independent of each other. Doing it wrong caught us by surprise a couple of times, only to discover somebody had added a Cascade.All or Cascade.AllDeleteOrphan().
  • NHibernate allows you to map simple collections of single elements such as numbers or strings to a child collection. But if you do, think hard about the uniqueness of those elements. By default, NH will treat an IList or array as a bag and allow duplicate items. If you don't want that, add an AsSet to the mapping like this:
    HasMany(x => x.Options)
        .Table("Options")
        .KeyColumn("ParentId")
        .Element("OptionValue")
        .AsSet();
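
Coming back to the bullet about mapping multiple classes to the same table, a rough sketch of a second, read-only class mapped to the same Products table could look like this; the class, table and property names are assumptions based on the earlier examples.

using FluentNHibernate.Mapping;

// A read-only projection class on top of the same Products table, tuned for a
// specific reading scenario.
public class ProductSummary
{
    public virtual int ProductId { get; set; }
    public virtual string Name { get; set; }
}

public class ProductSummaryClassMap : ClassMap<ProductSummary>
{
    public ProductSummaryClassMap()
    {
        Table("Products");
        ReadOnly();

        Id(x => x.ProductId);
        Map(x => x.Name);
    }
}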

Well, that got a bit out of hand. What do you think? Did I tell you something you didn't know yet? And what about you? Any tips to add to this post? Love to hear your thoughts by commenting below. And follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.