Hey architect! Building a complex system with four talented developers is one thing. Building one with 40 developers is a whole different league.
The quote above is closer to the truth than you might expect. Many large organizations develop software systems with hundreds of software developers. But how often does that go well? Not that often… And what is the better approach? Defining all aspects of the system upfront? Or postponing decisions as late as possible? Let's look at both…
A strict approach
How difficult is it to design, in full detail, a system that is completely unique (they always are) and for which many functional details will change along the way? Quite difficult if you ask me. Moreover, many stakeholders think they know exactly what they are asking from the development teams, yet they keep coming up with new requirements as soon as the building starts. A solution many organizations choose is to hire shiploads of project managers whose single task is to tightly control those changes. This happens a lot in organizations where a hierarchical structure is still the norm and where architects live in the proverbial ivory towers. What the project managers do to the functional scope, the architects do to the technical design: restrict change.
But are those changes really something you want to avoid? Aren't they needed to end up with a software system that supports the end users as well as possible? I believe they are.
I guess you might be wondering how to approach this problem then.
A flexible approach
At the other end of the spectrum you'll find those organizations that follow the agile principles. Agile projects can be characterized by time-boxed, well-scoped periods of two to three weeks in which multi-disciplinary teams complete several functional increments, including functional design, construction, testing, and in some cases, even delivery. The function of an architect doesn't really exist here, so many team members will take that up as one of their roles. The big challenge in this approach is that no big, detailed functional design is ever done upfront. Instead, a team will do as much design as is necessary to estimate and complete the work planned for the upcoming period of two or three weeks. That doesn't mean the team just jumps into the project. Usually they work with a high-level reference architecture that defines the boundaries and responsibilities of the team and evolves along the way.
Agile teams have a lot of autonomy and need to be empowered to make the decisions they think are needed to complete the functionality they committed to. But, as soon as the number of teams increases, this approach tends to be susceptible to uncoordinated architecture changes and overlap where teams are solving the same problems in different ways or doing the same work. A solution that you see quite often is to introduce one or more architects whose single task is to try to keep up with what the teams are doing. Not a scalable approach if you ask me. Then what?
Best of both worlds
Is it possible to combine the strengths of these two approaches? Yes, it is!
First, you need somebody who'll sit with the most experienced representatives from the teams to define a reference architecture that can solve the problem at hand. Then those people will need to break down the system into smaller autonomous components which the teams can own and develop in relative isolation. Note though that this somebody, often called an agile architect, does not only need to look at the problem from a technological perspective, but will also need to consider the combination of people, processes and tools. Proper architecture involves all of these aspects.
But next to that, each component should be owned by a team with enough autonomy to make decisions without the involvement of some kind of global architect. At the same time, such a component should be open enough so that other teams can contribute improvements and features. In fact, they should be able to work with a temporary copy of that component as long as their contribution hasn't been fully accepted by the owning team. Only then can you avoid dependencies in which one team has to wait for another before they can continue their own work.
But that's not all. Teams should be able to look at a component's version and immediately tell whether that component contains breaking changes. To accomplish that, they'll need to agree on a versioning scheme that explicitly differentiates between bug fixes, backwards-compatible functional improvements and breaking changes. Ideally, they use a distribution tool that understands that scheme, provides notifications about new versions, and empowers them to select the version that best suits their needs.
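To make that a bit more tangible, here is a minimal sketch of what "telling from the version number" could look like. It assumes plain MAJOR.MINOR.PATCH numbers where the first number signals breaking changes, the second new features and the third bug fixes. The function names are made up for illustration, and pre-release labels and the special rules for 0.x versions are ignored.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release labels)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def classify_upgrade(current: str, candidate: str) -> str:
    """Describe what moving from `current` to `candidate` means for a consumer."""
    old, new = parse(current), parse(candidate)
    if new <= old:
        return "not an upgrade"
    if new.major != old.major:
        return "breaking change"
    if new.minor != old.minor:
        return "backwards-compatible feature"
    return "bug fix"


def best_compatible(current: str, available: list[str]) -> str:
    """Pick the highest available version that stays within the current major version."""
    old = parse(current)
    compatible = [
        version
        for version in available
        if parse(version).major == old.major and parse(version) >= old
    ]
    # Compare parsed tuples so that 1.10.0 correctly ranks above 1.2.3.
    return max(compatible, key=parse, default=current)
```

A team that depends on version 1.2.0 could call best_compatible("1.2.0", [...]) with the versions its package feed reports and safely take whatever comes back, while classify_upgrade tells them when a newer version needs a closer look.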
In short, autonomy, clear responsibilities and boundaries, an unambiguous versioning scheme, and the right tools are a great recipe for scaling agile software development.
What else can you do?
In the sixties, Melvin Conway studied the relationship between the architecture of a system and the organization that builds it. His main conclusion was that, eventually, the structure of a complex architecture will mirror the way teams communicate, which in practice is shaped heavily by how they are physically located within an organization. If you think about it, that's not a weird conclusion at all. If teams feel any barriers while trying to talk with other teams, such as physical separation, they tend to introduce rules of conduct on how they exchange information. Introducing a well-defined protocol with which components communicate is a practical example of this. Since then, many software practitioners have confirmed Conway's observations. I particularly favor co-located teams, i.e. teams that sit together in the same open space, because that keeps communication lines short. Over time however, I've learned first-hand that this can also result in large monolithic systems. Instead, creating physical boundaries that closely align with the desired architectural boundaries might be just what you need to prevent that in the first place. In other words, don't deny Conway's Law.
Ok, now back to reality
So far so good. Now that I've covered a lot of theory, let's see if this is even remotely possible to do in reality. First of all, no off-the-shelf solution or product can help you here. The trick is to collect the right tools and services and combine them in a smart way.
Git & GitHub
To begin with, you're going to need Git, an open-source version control system that, unlike Microsoft's own Team Foundation Server, has been designed to be used in a distributed, occasionally connected environment. Using Git, each component identified in the prior discussion will end up in a separate repository where only the owning team can make changes. This gives those teams the power they need to control what they own. As an example of one of the many cloud services that can host Git repositories, we've been using GitHub for both private and public repositories for over a year now.
Forks & Pull Requests
Two crucial concepts that GitHub offers are forks and pull requests. A fork, a term originating from the Unix world, allows you to make a personal or team-specific copy of an existing repository while retaining a reference to its origin. This allows a team to make changes to an existing component in isolation, without needing write access to the original repository. When you combine this with the pull request concept, those same changes can be contributed back to the owning team by sending a request to pull the changes from the fork into the owning repository. This is not a requirement though. It's perfectly fine for a team to fork the other team's repository and continue from there. But if you do use pull requests, you can use them as a central hub for code reviews, discussions and rework, all to make sure the owning team can incorporate the changes without too much hassle. In a way, the owning team gains maximum control over their components, without holding back any other team.
NuGet & MyGet
I haven't yet mentioned the form in which components are shared between teams. Yes, you can share the original source code, but that's a great recipe for losing control quickly and creating way too many dependencies between the teams. A much better option, and the de facto standard within the open-source .NET community, is to employ NuGet packages, a ZIP-based packaging format that combines binary DLLs with metadata about the supported .NET versions, release notes and such.
Obviously, you don't want your internal corporate packages to publicly appear on nuget.org. That's why a couple of guys have built MyGet, a commercial offering that allows you to share NuGet packages in a secure and controlled environment. MyGet offers a hosting solution, but you can also run it on-premises. If you do work on an open-source project, MyGet even allows you to use it as a staging area for intermediate versions, before you publish your package to nuget.org using MyGet's one-click publishing features.
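Because a NuGet package is just a ZIP archive with a .nuspec manifest in its root, you can inspect one with nothing but a standard library. The sketch below builds a tiny stand-in package in memory and reads its metadata back. The package name Contoso.Ordering, its contents and both function names are made up for illustration, and a real package contains more entries and metadata than shown here.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def build_sample_package() -> io.BytesIO:
    """Create a minimal, hypothetical .nupkg-like archive in memory."""
    nuspec = """<package>
  <metadata>
    <id>Contoso.Ordering</id>
    <version>1.2.0</version>
    <releaseNotes>Adds bulk ordering.</releaseNotes>
  </metadata>
</package>"""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The manifest lives in the root; binaries go under lib/<target framework>/.
        archive.writestr("Contoso.Ordering.nuspec", nuspec)
        archive.writestr("lib/net45/Contoso.Ordering.dll", b"placeholder, not a real assembly")
    buffer.seek(0)
    return buffer


def read_package_metadata(package: io.BytesIO) -> dict[str, str]:
    """Return the id, version and release notes found in the package manifest."""
    with zipfile.ZipFile(package) as archive:
        nuspec_name = next(name for name in archive.namelist() if name.endswith(".nuspec"))
        root = ET.fromstring(archive.read(nuspec_name))
    metadata = {}
    for element in root.iter():
        # Real manifests usually declare an XML namespace; strip it if present.
        tag = element.tag.split("}")[-1]
        if tag in ("id", "version", "releaseNotes") and element.text:
            metadata[tag] = element.text.strip()
    return metadata


print(read_package_metadata(build_sample_package()))
```

The point is not that you would write this yourself, since NuGet does it for you, but that the format is simple and transparent enough for any tool in your pipeline to work with.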
Semantic Versioning & GitFlow
Both NuGet and MyGet have a versioning system that supports the notion of pre-release components and gives you fine-grained control over how one version of a component relates to another. There are several schemes that you can use, but I'm pretty fond of one called semantic versioning. This scheme contains unambiguous rules on how to version minor and major changes as well as bug fixes and patches. It creates clarity for the teams using a component. As an example, consider version 1.2 of a component. Compared to version 1.1, it should be fully backwards compatible and just add new functionality that doesn't affect existing consumers. Version 2.0 is different though. It is not backwards compatible and requires changes to the components that use it. This may sound trivial at first, but you might be amazed how often version numbers are incremented just for commercial or marketing purposes. Still, determining the version number of a particular component remains a manual step. So wouldn't it be nice to have some kind of Git branching strategy that could generate version numbers automatically based on conventions? Indeed, if you use the GitFlow branching strategy (which gives special meaning to the master, develop and release branches) and combine this with GitVersion, your component versions can be derived automatically from your branch names and Git history.
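To illustrate the idea of convention-based versioning, here is a deliberately simplified sketch. It is not GitVersion's actual algorithm. The pre-release labels (alpha, beta), the rule that develop targets the next minor version and the function name are all assumptions chosen for this example.

```python
import re


def version_for_branch(branch: str, last_release: str, commits_since: int) -> str:
    """Derive a package version from a GitFlow-style branch name (hypothetical rules)."""
    major, minor, _ = (int(part) for part in last_release.split("."))

    if branch == "master":
        # master only ever contains released code.
        return last_release

    release = re.fullmatch(r"release-(\d+)\.(\d+)\.(\d+)", branch)
    if release:
        # The branch name itself states the version being stabilized.
        return f"{'.'.join(release.groups())}-beta.{commits_since}"

    if branch == "develop":
        # Assume work on develop targets the next minor version.
        return f"{major}.{minor + 1}.0-alpha.{commits_since}"

    # Any other branch: use a sanitized branch name as the pre-release label.
    label = re.sub(r"[^0-9A-Za-z-]", "-", branch)
    return f"{major}.{minor + 1}.0-{label}.{commits_since}"
```

With rules like these, a build of develop seven commits after release 1.2.0 would be published as 1.3.0-alpha.7, and nobody has to think about what number to type in.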
Even Microsoft has seen the light
Doesn't that sound great? But if all these tools and practices work so well, I hear you asking, why aren't other organizations doing this already? Well, more and more companies are adopting this approach. In fact, even Microsoft has seen the light and decided to make almost all parts of the .NET platform open-source. And not just through Microsoft's own CodePlex or Visual Studio Online services. No, they've dropped everything on GitHub and now even accept pull requests from the community. I gather you wouldn't have thought that possible five years ago…
So what do you do to scale your agile software development teams? Let me know by commenting below or tweeting me at @ddoomen.