Monday, July 25, 2016

Scaling a growing organization by reorganizing the teams

During this year's QCon conference held in New York, I attended a full-day workshop on the scalability challenges a growing organization faces, hosted by Randy Shoup. In my previous two posts I discussed a model to understand the needs of an organization in its different life phases, as well as a migration strategy for getting from a monolith to a set of well-defined microservices.

The Universal Scalability Law…again

However, Randy also talked about people, or more specifically, how to reorganize the teams for scalability without ignoring the Universal Scalability Law. What this means is that you should be looking for a way to have lots of developers in your organization working on things in isolation (thereby reducing contention) without the need for a lot of communication (a.k.a. coherence). So any form of team structuring that involves a lot of coordination between teams is obviously out of the question, particularly skill-based teams, project-based teams or large teams.

For the same reason, Randy advises against geographically split teams and outsourcing to so-called job shops. Not only do these involve a lot of coordination, but local conversations become disruptive to melding a team. Just like Randy, I find face-to-face discussions crucial for effective teams. But if your team is not co-located, those local conversations will never reach the rest of the team. Yes, you may persist in posting a summary of that discussion on some kind of team wiki, Flowdock/Slack or other team collaboration tool, but the others will still miss the discussions that led to that summary. Even a permanent video conferencing set-up doesn't always solve that, particularly if the people in the team don't share the same native language (which is already a problem for co-located teams).

The ideal team

He also said something about the effect of getting more people into the organization. In his view, 5 people is the ideal: that number can sit around a table, benefits from high-bandwidth communication, and roles can stay fluid. When you reach about 20 people, you need structure, which in turn can become a trough of productivity and motivation. When you reach 100 people, you must shift your attention from coordinating individuals to coordinating teams; a clear team structure and well-defined responsibilities become critical. Knowing this, it's no surprise that Randy likes to size his teams using the "2 pizza rule": the number of people you can feed with 2 pizzas. So a team of 4-6 people, in a mix of junior and senior and (obviously) co-located, has his preference.

Ideally, he wants such a team to take ownership of a component or service, including maintenance and support as well as its roadmap. This implies that all teams are full-stack from a technological perspective and capable of supporting their component or service all the way into production. But Randy emphasizes that managers shouldn't treat teams like software factories. Teams should have an identity and be able to build up pride of ownership. This also implies taking responsibility for the quality of those services. He explicitly mentioned the problem of teams not having the time to do their work right and taking shortcuts because of (perceived) pressure from management or other stakeholders. In his opinion, this is the wrong thing to do, since it means you'll need to do the work twice. The more constrained the team is in time, the more important it is to do it right the first time.

The effect of team structure on architecture

Another argument for his ideas is provided by Conway's Law. Melvin Conway observed that in many organizations the structure of the software system closely follows the structure of the organization. This isn't a big surprise, since cross-team collaboration usually requires some agreed way of working, both on the communication level and on the technical level. Quite often, architectural seams like API contracts or modules emerge from this. Based on that observation, he advises organizations to structure their teams along the boundaries you want to see in your software architecture. That is how Conway's Law is usually applied. But in this workshop, Randy had already steered us towards the notion of using microservices for scaling the organization. So does Conway's Law apply here? Each team owns one or more microservices, or, put differently, the API contracts I just discussed. They work in isolation, but negotiate about the features provided by their services. I would say that is a resounding yes!

All things considered, it should not come as a surprise that he believes microservices are the perfect architecture for scaling an organization on both the people and the technical level. So what do you think? Are microservices the right technology to scale teams? Or is this the world upside down? Let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Sunday, July 17, 2016

Scaling a growing organization by rearchitecting the monolith

During this year's QCon conference held in New York, I attended a full-day workshop on the scalability challenges a growing organization faces, hosted by Randy Shoup. In my previous post, I elaborated on Randy's classification system to illustrate the phases of a growing organization and how that affects technology. In his opinion, facilitating an organization that enters the scaling phase means re-architecting the monolith.

About Rearchitecting

In his words, rearchitecting is the act of re-imagining the architecture of an existing system so that there's a path to meet the requirements of the organization in its current form. As I said before, rebuilding a system is not an option, particularly not in the scaling phase. By that time, you'll have way too many clients that need new features and other improvements. On the other hand, you will very likely suffer from typical monolithic symptoms: lack of isolation in the code base, teams stepping on each other's toes, new engineers needing months to get a decent understanding of the code, painful and slow releases, etc. Instead, you want components whose lifecycle is independent of the others and that are deployed in an automated fashion. Sounds familiar? Enter microservices…

According to Randy, microservices are the perfect solution for rearchitecting an existing monolith. Each service is simple, can be independently scaled, tested and deployed, and allows optimal tuning without affecting any of the other services. The tooling, platform and practices have evolved considerably since people started talking about microservices two years ago. Building them is a lot less of a challenge than it used to be, but it still comes with a cost. You'll end up with lots of small source code repositories, as well as the organizational structure to support all that (e.g. who owns what repo). Finding the repo that belongs to a particular microservice requires naming conventions and tools not provided by the current online providers. On a technical level, you need to consider the network latency and the availability of each service. You'll also need sophisticated tooling to track, manage, version and control dependencies between the services. As many QCon sessions have demonstrated, a lot of tooling has emerged. But just like the wave of JavaScript frameworks and libraries that occurred when React became popular, I suspect it'll take a while until the dust has settled.

From monolith to microservices

So now that we've established the decision to re-architect the monolith into microservices, how are we going to do that? Well, if it's up to Randy, you start carving up the monolith by finding a vertical seam that allows you to wall off a functional feature behind an interface. This is obviously the hardest part, since monoliths typically don't expose a lot of cohesion: the logic related to a feature is spread out over the codebase, sometimes crosses layers, and involves way too much coupling. The next step is to write automated tests around that interface so you can replace the implementation with a remote microservice without causing breaking changes in the semantics of the feature involved. As you can see, this is anything but a big-bang approach, and it can be done in relatively small and low-risk steps. However, Randy shared that in his own experience, it is very bad to combine a migration like this with the introduction of new features. He stressed the importance of first completing the migration of an existing feature, so that it can serve as the basis of a new microservice, and only then adding the additional functionality. Doing both at the same time is simply too risky and may blow up the project.
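To make the carving steps a bit more tangible, here's a minimal sketch of the idea: wall a feature off behind an interface, write tests against that interface, then swap in a remote implementation. All names and the pricing example are mine, not from Randy's talk.

```python
# Hypothetical sketch of walling off one feature behind a vertical seam
# before extracting it into a microservice.
from abc import ABC, abstractmethod

class PricingService(ABC):
    """The seam: callers depend only on this contract."""
    @abstractmethod
    def quote(self, sku: str, quantity: int) -> float: ...

class InProcessPricing(PricingService):
    """Step 1: the existing monolith logic, moved behind the interface."""
    def quote(self, sku, quantity):
        return 9.99 * quantity  # the original monolith code lives here

class RemotePricing(PricingService):
    """Step 3: same contract, now backed by the extracted microservice."""
    def __init__(self, base_url):
        self.base_url = base_url
    def quote(self, sku, quantity):
        # e.g. an HTTP call to the new service would go here
        raise NotImplementedError

def contract_test(service: PricingService):
    """Step 2: tests written against the interface, so the swap can't
    silently change the feature's semantics."""
    assert service.quote("ABC", 2) == 2 * service.quote("ABC", 1)

contract_test(InProcessPricing())  # must keep passing after the swap
```

Because the contract test exercises the interface rather than the implementation, it runs unchanged against both the in-process and the remote version, which is exactly what makes the migration a sequence of small steps.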

Now, he doesn't want you to become too overzealous and just start carving like crazy. Instead, he advises you to start with a pilot implementation. The pilot should represent an end-to-end vertical part of the system's experience, something significant enough to be representative of the monolith's complexity. Such a pilot provides an opportunity for the team to learn and to use that experience to manage expectations. At the same time, it can be used to demonstrate the feasibility to the stakeholders.

When the pilot is deemed successful, it is time to continue the migration on a larger scale. However, Randy advises prioritizing future candidates for migration to microservices based on their business value. In other words, prefer those parts of the monolith that give you the highest return on investment. If that doesn't help you, focus on the areas with the greatest rate of change. After all, that was the whole premise of switching to microservices: being able to work and deploy in isolation. And finally, as you would approach any technological task with a lot of uncertainty, consider solving the hardest problems first.

He also identified a couple of anti-patterns from his own migration projects. For instance, the Mega-Service, similar to the God class, is a microservice that doesn't focus on a single feature. If you're practicing Domain-Driven Design, I think aligning a microservice with a single bounded context makes sense. Smaller, like a couple of domain Aggregates, is probably fine too. But a microservice that crosses multiple domain boundaries is very likely a bad idea.

Another anti-pattern, the Leaky Abstraction Service, deals with the subtle issues of growing a microservice from its implementation rather than defining the consumer's contract first. Randy is clearly adamant about making sure microservices are designed from a consumer-first approach. He believes that the usage of a microservice is the true metric of the value of such a service. So a service that is designed without any particular business client, the so-called Client-less Service, is obviously an anti-pattern as well. One final anti-pattern he mentioned that day is the Shared Persistence anti-pattern: two or more microservices that share their data store. As microservices are supposed to be independent, introducing any kind of coupling is always a bad idea.
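The Shared Persistence anti-pattern is easy to illustrate with a toy sketch. In the hypothetical example below (all names and the data model are mine), the broken version has two services reaching into the same store, so a schema change in one silently breaks the other; the fix gives each service its own data and an explicit API between them.

```python
# Anti-pattern: two "services" share one data store.
shared_db = {"user:1": {"name": "Ada", "plan": "pro"}}

class AccountsServiceBad:
    def rename(self, user_id, name):
        shared_db[f"user:{user_id}"]["name"] = name  # schema coupling!

class BillingServiceBad:
    def plan_for(self, user_id):
        # Breaks the moment Accounts changes its schema or keys.
        return shared_db[f"user:{user_id}"]["plan"]

# Fix: each service owns its data; Billing asks Accounts via its API.
class Accounts:
    def __init__(self):
        self._db = {"1": {"name": "Ada", "plan": "pro"}}  # private
    def get_plan(self, user_id):
        """The public contract other services depend on."""
        return self._db[user_id]["plan"]

class Billing:
    def __init__(self, accounts_api):
        self._accounts = accounts_api
    def plan_for(self, user_id):
        return self._accounts.get_plan(user_id)

print(Billing(Accounts()).plan_for("1"))  # -> pro
```

In the fixed version, Accounts can change its storage layout freely as long as `get_plan` keeps its contract, which is precisely the independence microservices are supposed to give you.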

Well, that's all I got on technology from that day. Next time, I'll talk a bit on the people side of his story. What do you think? Are microservices the next big thing to move away from monoliths? And do you agree with his migration approach? Let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.

Monday, July 11, 2016

Understanding a growing organization and the effect on technology

The characteristics of a growing organization

During this year's QCon conference held in New York, I attended a full-day workshop on the scalability challenges a growing organization faces, hosted by Randy Shoup. Randy explained to us how every start-up goes through several phases, each with a different focus: search, execution and scaling. The most difficult part of that growth is scaling the organization, the process, the culture and the technology at the same time. Some organizations have proven to be very good at the organizational level, but lacked on the technology level. Others tried to cling to their original culture, not understanding that culture changes as well. Randy emphasized that properly scaled agile teams, a DevOps culture and modern architectural styles like microservices aren't a luxury anymore.

To illustrate his plea, he explained how the Universal Scalability Law, originally coined by Neil Gunther, applies to both software and organizational scalability. This law states that throughput is limited by two things: contention and coherence. Contention is caused by any form of queuing on a shared resource, be it a technical element or an authoritative person, department or process. Coherence is the amount of coordination and communication needed between nodes, machines, processes and people.

A real-world analogy could be the process of moving people out of a room through a door. If the door is narrow, only one person can get through at a time, which means it'll take a while to empty the room. Having a wider door or even two different doors are obvious solutions to the problem. But if you have two doors, you'll need to coordinate the group or agree on an algorithm to decide who will go through which door. In other words, you need to ensure coherence. Most growing organizations can apply this law both to the organization itself and to the architecture. You can see that, for example, when more developers are hired. Work needs to be distributed over the teams, the architect or product owner becomes a bottleneck, and more coordination is needed between the people.
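The law can also be written down numerically. A minimal sketch, using Gunther's usual formulation with α for contention and β for coherence; the parameter values below are purely illustrative:

```python
def usl_throughput(n, alpha, beta):
    """Universal Scalability Law: relative throughput for n nodes (or people).

    alpha: contention penalty (queuing on a shared resource)
    beta:  coherence penalty (pairwise coordination/communication)
    """
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# With alpha = beta = 0 scaling is linear; even a tiny coherence cost
# means throughput peaks and then *decreases* as you add people.
for n in (1, 5, 20, 100):
    print(n, round(usl_throughput(n, alpha=0.05, beta=0.001), 1))
```

With these illustrative parameters, 20 people get more done than 100, which is exactly the argument for small, decoupled teams: keep both the shared bottlenecks (α) and the pairwise communication (β) low.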

The phases of a startup

But, as Randy explained, there is a time when you don't need to think about this and a time when it becomes a real problem. In short, you need the right tool for the right job at the right time. Even a prototype or a monolith has its merits under the right circumstances. To illustrate that, Randy divided the growth process of a start-up into three phases: search, execution and scaling.

The search phase is all about finding the right business model, finding a product that fits the market and acquiring the first customers. In other words, the organization is discovering the market. In this phase it is imperative to try new things quickly, so prototyping is an essential part of it. Scalability is not a concern yet; it might even slow you down and thereby jeopardize the chance that you reach your market in time. Paul Graham, one of the founders of the popular start-up investor Y Combinator, even encourages start-ups to do stuff that doesn't scale. So it's fine to take a technology or platform that your organization can't or doesn't want to support, as long as it allows you to quickly try out products and solutions. In fact, it's even advisable to take a non-conforming technology, since it might prevent you from converting that prototype into the real deal. Which brings me to the execution phase.

In the execution phase, an organization is focusing on meeting the near-term requirements as cheaply as possible to meet the evolving customer needs. In a way, it's entering the market. Just enough architecture is the way to go, and scalability concerns are still not an issue. The point is that the organization wants to learn and improve rapidly, thereby expanding the market as fast as possible. Consequently, it will use familiar technology that is simple and easy to use and guarantees high team productivity. Organizations in this phase typically build monolithic systems that employ a single database. Although we all know that this will ultimately result in a lot of coupling, performance and scalability bottlenecks, trying to build something very scalable in this phase might actually kill your business. However, identifying natural seams in the architecture and using internal componentization can prepare you for the next phase.
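Internal componentization can be sketched even inside a single process: each module owns its own data and exposes only a narrow interface, so it is easy to carve out later. A hypothetical example (the component names and methods are mine):

```python
# One deployable monolith, but each component hides its own data and
# exposes a small interface -- the "natural seams" for a later split.

class Orders:
    """Owns its own data; other components never touch it directly."""
    def __init__(self):
        self._by_id = {}
    def place(self, order_id, sku):
        self._by_id[order_id] = sku
    def get_sku(self, order_id):
        return self._by_id[order_id]

class Billing:
    """Depends only on Orders' public methods -- the seam along which
    Orders could later be extracted into a separate service."""
    def __init__(self, orders):
        self._orders = orders
    def invoice(self, order_id, unit_price):
        return f"Invoice for {self._orders.get_sku(order_id)}: {unit_price}"

orders = Orders()
orders.place("o1", "WIDGET")
print(Billing(orders).invoice("o1", 9.99))
```

Nothing here is distributed yet, and that's the point: the boundaries already exist in code, so extracting `Orders` later only changes how `get_sku` is reached, not who depends on what.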

The last phase is all about owning the market and scaling the organization to meet global demands. More centralized teams and standardization become necessary. Choices are made on the preferred network protocols, common services such as document management, as well as on development tools and source control systems. Tools are introduced to facilitate the discovery and sharing of code through libraries, searchable code and code reviews. But the monolith must also finally make way for a next-generation architecture that uses scalable persistence and supports concurrency and asynchrony. Many of the prior concerns are replaced by dedicated services for data analytics, searching, caching and queuing. However, Randy emphasizes that rebuilding a system from scratch is out of the question. In his opinion (and mine), that would be the worst thing you can do. There's just so much information and history in that existing monolith that it is naïve to think you'll be able to remember all of it while building a new system. Instead, he wants us to start re-architecting our systems. What that means and how to approach it will be the topic of my next post.

So what do you think? Do you recognize the phases that Randy uses? Love to hear your thoughts by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better solutions.