Thursday, March 24, 2016

The responsibilities of an open-source developer

The proudest moment anybody initiating an open-source project can experience is when that project finally gains the momentum to make a difference within the community it targets. When my colleague Martin and I published the first release of Fluent Assertions on CodePlex in 2011 (yeah, those were the days), not even in our wildest dreams did we expect that by 2016 our NuGet package would have been downloaded that many times. On top of that, almost every week somebody posts a blog post about our little .NET assertion library. And if you scan nugetmusthaves.com, you'll find 54 packages relying on Fluent Assertions. It supports all current .NET versions and platforms, including Xamarin (thanks to Oren Novotny). Even the .NET Core Lab team is using it, which is why we have supported CoreCLR, .NET Native and .NET 4.6 since their earliest betas.

But with that popularity comes great responsibility…

Rather than working on the next killer feature or API, you'll be spending a lot of your private time answering questions on GitHub, Gitter, StackOverflow or email. And when you do, stay friendly and constructive. Thank people for taking the time to file an issue, and respond in a timely fashion. Nothing is more annoying for somebody who uses your library, runs into an issue he or she can't resolve, takes the time to post about it, and then doesn't get a response for weeks. That sure is a recipe for losing fans quickly. Even if you believe they've made a mistake and everything is fine with your library, take the time to explain your reasoning. More often than not I expected to get a nasty response when I rejected an issue, but I've come to learn that people are just happy to get some help that gets them going.

Sometimes the issue is more complex, e.g. when somebody is using your library in an unsupported way (you'd be surprised what people come up with sometimes). Explain the design philosophy behind your library and why you're rejecting a particular issue. If, like Fluent Assertions, your library offers extension points, provide a link to an example that shows how to build their own extensions. If it's a potential bug which you can't reproduce easily, don't be afraid to ask them to help you reproduce it, or to request a little project that reproduces the problem. Now, most open-source developers I know don't have the free time to build every possible feature that is requested. So if you do think an issue or suggestion has merit, tell them that you're accepting pull requests.
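To illustrate what such an extension could look like: the sketch below shows the extension-method pattern that this kind of extension point typically builds on. The `Order` type and the `ShouldBeShipped` method are made up for illustration; a real Fluent Assertions extension would plug into the library's own assertion infrastructure instead of throwing directly.

```csharp
using System;

// Hypothetical domain types, made up purely for this example.
public enum OrderStatus { Pending, Shipped }

public class Order
{
    public int Id { get; set; }
    public OrderStatus Status { get; set; }
}

// A custom, domain-specific assertion exposed as an extension method.
public static class OrderAssertions
{
    public static void ShouldBeShipped(this Order order)
    {
        if (order.Status != OrderStatus.Shipped)
        {
            throw new InvalidOperationException(
                $"Expected order {order.Id} to be shipped, but found {order.Status}.");
        }
    }
}
```

Callers can then write `order.ShouldBeShipped();`, which reads like any other assertion in their tests while encoding domain knowledge in one reusable place.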

Now that I've mentioned that, I need to explain my philosophy on using open-source software in professional software development. If it's up to me, you can use whatever open-source library you want, but only if it meets one of the following two requirements.

  1. It's backed by a large group of developers or an active contributor base, so that if one of the original authors abandons the project, the project itself isn't at risk.
  2. Its code is well-written, readable, properly documented and covered by unit tests at a level that makes you confident enough to continue supporting the code as part of your own code base.

With this in mind, I think I have the following responsibilities towards my own project.

  • Ensure that all code uses the same layout, coding styles and naming conventions.
  • Make the code a testimony on how I envision high-quality code.
  • All public and protected APIs are properly documented.
  • All changes are backwards compatible with prior versions. If breaking changes are needed, they are staged for the next major release.
  • The API is consistent and follows the Principle of Least Surprise.
  • All edge cases are covered by well-factored unit tests.
  • Ensure a clean source control history that will make it easy for contributors and users to find out what has been fixed when.

These will hopefully encourage more people to join the project, and will lower the threshold in case somebody ever needs to fork the code and maintain it themselves.

I expect it to be no surprise that these responsibilities in some form transfer to contributors who submit a pull request. And that's the hard part. You should show contributors the respect they deserve. They've decided to commit a potentially substantial amount of their free time to provide you with a pull request that solves a particular bug or adds a feature, so show how much you value that. Sometimes you won't get a single PR for weeks, and then you suddenly receive a couple of them in a short time. If you're lucky, those PRs meet whatever requirements you've set (e.g. through GitHub templates) to honor responsibilities such as those I mentioned above. If not, you have to decide how to deal with that. You can take the PR as-is and do the rework yourself, or, as I tend to do, do a thorough code review and ask whether they're willing to pick up the rework. And that's a double-edged sword: either you do the work in your precious free time, or you risk discouraging contributors from contributing again.

So what do you think about this? Do you agree with these responsibilities of an open-source project? Would you accept any PR, or do you prefer to uphold your standards? I'd love to hear your thoughts by commenting below.

That being said, now that you know how I think about all this: if you would like to join the ranks of those who made Fluent Assertions such a great library, we have lots of work up for grabs. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Thursday, March 10, 2016

Profiling legacy code using characterization tests

As you might have read, I've been refactoring some example code for a multi-threaded cache that I got from CodeProject into a source-only NuGet package, soon to be published as FluidCaching. Since this cache has been built to be very performant, the internal algorithms are not trivial to grasp. The fact that the code doesn't meet any coding conventions, doesn't have any unit tests and clearly violates all SOLID principles doesn't help either. My intention was to do some serious internal refactoring and get rid of the custom-built thread-synchronization primitives that can be replaced with built-in .NET Framework classes.

But how can I do that without breaking any of the existing functionality? Without unit tests, there's no safeguard that protects me from breaking things. Heck, the existing code might even contain bugs (and it did). Sure, I've used the original code in a RavenDB spike, but that doesn't prove it will work correctly under all circumstances. A couple of years ago, Michael Feathers wrote a brilliant book full of strategies for dealing with legacy code. I don't recall his exact definition of 'legacy code', but by my definition, legacy code is every line of code that isn't covered by an automated test, isn't governed by well-defined coding conventions, or hasn't been designed according to accepted object-oriented practices such as Clean Code and SOLID.

One of the many techniques he discusses in his book is to take an existing code base and write a couple of automated end-to-end tests that set up the appropriate initial state and exercise the API to observe the results. Depending on the complexity of that code base, you might not even be able to predict the outcome of the test. So just run those test cases, observe the outcome and finalize the tests by adding the necessary code to assert that the next run gives you the same results. When I did this for FluidCaching, the mere act of tweaking the input parameters and observing the outcome gave me great insight into the underlying caching algorithm. By encoding those combinations as unit tests, I was gradually building what Michael coined characterization tests. In a way, you're building a profile of that code base in a similar fashion to how FBI profilers build a profile of serial killers.
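A characterization test built that way might look something like the sketch below. `BoundedCache` is a toy stand-in for the real FluidCaching code (the actual algorithm is far more involved), and the asserted values were not designed up-front; they represent what an initial run of the existing code was observed to do.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for the real cache under test; illustrative only.
public class BoundedCache
{
    private readonly int capacity;
    private readonly Dictionary<string, string> items = new Dictionary<string, string>();
    private readonly Queue<string> insertionOrder = new Queue<string>();

    public BoundedCache(int capacity)
    {
        this.capacity = capacity;
    }

    public void Add(string key, string value)
    {
        if (items.Count >= capacity)
        {
            // Evict the oldest entry to stay within capacity.
            items.Remove(insertionOrder.Dequeue());
        }

        items[key] = value;
        insertionOrder.Enqueue(key);
    }

    public string Get(string key) => items.TryGetValue(key, out var value) ? value : null;
}

public class CacheCharacterizationTests
{
    // With a test framework like xUnit this method would carry a [Fact] attribute.
    public void Adding_more_items_than_capacity_evicts_the_oldest()
    {
        var cache = new BoundedCache(capacity: 5);

        for (int i = 0; i < 10; i++)
        {
            cache.Add(i.ToString(), "item " + i);
        }

        // Recorded from the first run: only the five newest items survive.
        if (cache.Get("0") != null) throw new Exception("expected item 0 to be evicted");
        if (cache.Get("9") != "item 9") throw new Exception("expected item 9 to be kept");
    }
}
```

The essential point is the workflow, not the assertions themselves: run the code, capture whatever it does, and pin that behavior down so the next refactoring can't silently change it.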


When your characterization tests cover enough of the code base (and a code coverage tool like NCrunch can really help you here), you're ready for the next steps. You can start refactoring the internal implementation while relying on the safety net provided by your tests. Or you can start isolating external dependencies like databases, file systems, web services or proprietary integration points. Without these, your tests will run much faster and allow you to replace expensive set-up code with more focused and readable test data builders or test doubles. Jeremy D. Miller used to refer to this approach as Isolate The Ugly Stuff. More isolation leads to more control, less fear and a faster development cycle, which allows for more aggressive refactoring.
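As a minimal sketch of that isolation step (all names below are made up for illustration): hide the expensive dependency behind an interface, and let tests substitute a fast, deterministic double for the real thing.

```csharp
using System.Collections.Generic;

// The "ugly stuff" (a database, web service, etc.) hides behind this seam.
public interface IProductStore
{
    decimal GetPrice(string productId);
}

// In production, an implementation of IProductStore would talk to the real
// database. In tests, this in-memory double replaces expensive set-up code
// and doubles as a simple test data builder.
public class InMemoryProductStore : IProductStore
{
    private readonly Dictionary<string, decimal> prices = new Dictionary<string, decimal>();

    public InMemoryProductStore With(string id, decimal price)
    {
        prices[id] = price;
        return this; // fluent style, so tests read as one expression
    }

    public decimal GetPrice(string productId) => prices[productId];
}
```

A test can now arrange its data with `new InMemoryProductStore().With("sku-1", 9.99m)` instead of seeding a database, which is both faster and far easier to read.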

Once the code base no longer relies on expensive external dependencies, it's time to get rid of any static mutable state. In .NET, the use of DateTime.Now is a great example of that; thread-static class fields are another. These may sound like decent ideas, but just try to run your tests in parallel (like xUnit does out-of-the-box): chances are that your tests start to fail in a non-deterministic way. Even the Service Locator pattern promoted by Microsoft's Enterprise Library is a bad example. Obviously, all these little refactorings should open up the possibility of introducing more well-written unit tests that keep you out of debugger hell. Just be careful to think deeply about the scope of a unit, and consider these tips and tricks while you're at it. When you have enough unit tests in place and you feel confident that you have control of your code base, you can consider getting rid of those ugly and unmaintainable characterization tests. However, I would recommend keeping a few around as integration or subcutaneous tests, provided they are easy enough to understand in case they fail.
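A common remedy for the DateTime.Now problem is to put the clock behind an injectable abstraction, so time-dependent code becomes deterministic and safe to run in parallel. The interface and class names below are one possible shape, not a prescribed one:

```csharp
using System;

// Abstraction over the ambient clock, so tests can control time.
public interface IClock
{
    DateTime UtcNow { get; }
}

// Production implementation: delegates to the real system clock.
public class SystemClock : IClock
{
    public DateTime UtcNow => DateTime.UtcNow;
}

// Test implementation: time only moves when the test says so.
public class FixedClock : IClock
{
    public DateTime UtcNow { get; set; } = new DateTime(2016, 3, 10, 0, 0, 0, DateTimeKind.Utc);
}

// Example consumer: a cache entry that no longer reads DateTime.Now directly.
public class CacheEntry
{
    private readonly IClock clock;
    private readonly DateTime createdAt;

    public CacheEntry(IClock clock)
    {
        this.clock = clock;
        createdAt = clock.UtcNow;
    }

    public bool IsExpired(TimeSpan maxAge) => clock.UtcNow - createdAt > maxAge;
}
```

In a test you'd construct the entry with a FixedClock, advance its UtcNow by a few minutes, and assert the expiry behavior without any Thread.Sleep or flakiness.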

So what do you think? Did you go through a similar exercise before? I'd love to hear about those experiences by commenting below. Oh, and please follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.