Monday, July 10, 2017

Key takeaways from QCon New York 2017: The Tech Stuff

This year, for the third time since I joined Aviva Solutions, I attended the New York edition of the famous QCon conference organized by InfoQ. As always, this was a very inspiring week with topics on large-scale distributed architecture, microservices, security, APIs, organizational culture and personal development. It also allowed me the mental breathing room to form a holistic view on the way I drive software development myself. So let me first share the key takeaways on tech stuff.

The state of affair on microservices

QCon is not QCon without a decent coverage of microservices, and this year was no different. In 2014, the conference was all about the introduction of microservices and the challenge around deployment and versioning. Since then, numerous tools and products emerged that should make this all a no-brainer. I don't believe in silver bullets though, especially if a vendor tries to convince people they've build visual tools that allow you to design and connect your microservices without coding (they really did). Fortunately the common agreement is that microservices should never be a first-class architecture, but are a way to break down the monolith. Randy Shoup's summarized this perfectly: "If you don't end up regretting early technology decisions, you probably overengineered"

Interestingly enough, a lot of the big players are moving away from frameworks and products that impose too much structure on microservice teams. Instead, I've noticed an increasing trend to use code generators to generate most of the client and service code based on some formal specification. Those handle the serialization and deserialization concerns, but also integrate reliability measures such as the circuit breaker pattern. And this is where the new kid in town joins the conversation: gRpc. Almost every company that talked about microservices seems to be switching to gRpc and Protobuf as their main communication framework. In particularly the efficiency of the wire format, its reliance on HTTP/2, the versioning flexibility and gRpc's Interface Definition Language (IDL) are its main arguments. But even with code generators and custom libraries, teams are completely free to adopt whatever they want. No company, not even Netflix, imposes any restrictions on its team. Cross-functional "service" teams, often aligned with business domains are given a lot of autonomy.

About removing developer friction

Quite a lot of the talks and open space sessions I attended talked about the development experience, and more specifically about removing friction. Although some of it should be common sense, they all tried to minimize the distance between a good idea and having it run in production.

  • Don't try to predict the future, but don't take a shortcut. Do it right (enough) the first time. But don't forget that right is not perfect. And don't build stuff that already exists as a decent and well-supported open-source project.
  • Don't bother developers with infrastructure work. Build dedicated tools that abstract the infrastructure in a way that helps the developers get their work done quickly. Especially Spotify seems to be moving away from the true DevOps culture. They noticed that there was too much overlap and it resulted in too many disparate solutions.
  • Bugs should not be tracked as a separate thing. Just fix them right away or decide to not fix them at all. Tracking all of these bugs is just going to create a huge list of bugs that no one will look at again….ever.
  • Closely tied to that is to keep distractions away from the team by assigning a Red Hot Engineer on rotation. This person handles all incoming requests, is the first responder when builds fail, and keeps anybody else from disturbing the team.
  • To track the happiness of the team, introduce and update a visual dashboard that shows the teams sentiment on various factors using traffic light. Adrian Trenaman showed a nice example of this. This should also allow you to track or prove whether any actions helped or not.
  • Don't run your code locally anymore. If you're unsure if something works, write a unit test and learn to trust your tests. Just don't forget how to make those tests maintainable and self-explanatory.

clip_image001

Drop your OTA environment. Just deploy!

Another interesting trend at QCon was the increased focus on reducing overhead by dropping a separate development, testing and acceptance environments while trying to bring something into production. Many companies have found that those staging environments don't really make their product better and have a lot of drawbacks. They are often perceived as a fragile and expensive bottleneck. And when something fails, it is difficult to understand failure. The problems companies find in those environments are not that critical at all and never as interesting as the ones that happen in production. In fact, they might even give the wrong incentive, the one where developers rely on some QA engineer to do the real testing work on the test environment, rather than invest in automated testing.

According to their talks, both the Gilt Group and Netflix seem to wholeheartedly support this mindset by working according to a couple of common principles. For starters, teams have end-to-end ownership of the quality and performance of the features they build. In other words, you build it, you run it. Teams have unfettered control to their own infrastructure. They assume continuous delivery in the design process. For instance, by heavily investing in automated testing, employing multi-tenant Canary Testing and making sure there's one way to do something. A nice example that Michael Bryzek of Gilt gave was a bot that would place a real order every few minutes and then cancel it automatically. Teams also act like little start-ups by providing services to other dev teams. This gives them the mentality to try to provide reliable services that are designed to allow delay instead of outage. They may even decide to ship MVPs of their services to quickly help out other teams to conquer a new business opportunity, and then mature their service in successive releases.

You should be afraid for hackers

The second day's keynote was hosted by the CTO of CloudStrike, a security firm often involved in investigating hacking attempts by nation states such as China. It was a pretty in-depth discussion on how they and similar government agencies map the behavior of hacking groups. I never really realized this, but it's amazing to see how persistent some of these groups are. I kind of assumed that hackers would find the path of least resistance, but the patience with which they inject malware, lure people into webpages or downloading .LNK files that will install the initial implant is truly scary. I particular awed at the idea how hackers manage to embed an entire web shell into a page which allows them to run arbitrary Windows commands on a host system with elevated administrator rights. My takeaway from this session was that if you're targeted by any of these groups, there's nothing you can do. Unless you have the money to hire a company like CloudStrike of course….

The Zen of Architecture

Juval Lowy, once awarded the prestigious title of Software Legend, has been in our business for a long time. Over the years, I've heard many rumors about his characters but it is safe to say….all of them are true. Nonetheless, his one

day workshop was one of the best, the most hilarious and intriguing workshops ever. After ridiculing the status quo of the software development and agile community, he enlightened us on the problems of functional decomposition. According to Juval, this results in a design that focusses on breaking down the functionality into smaller functional component that don’t take any of the non-functional requirements into account. He showed us many examples of real-world failures to reinforce that notion.

Instead, he wants us to decompose based on volatility. He wants us identify the areas of the future system that will potentially see the most amount change, and encapsulate those into components and services. The objective is keep thinking about what would change and encapsulate accordingly. That this is not always self-evident, and may take longer than management is expecting, is something that we as architects should be prepared for. However, as many architects still do, this mindset does allow you to stop fighting changes. Just encapsulate the change so that it doesn't touch the entire system. Even when I'm writing this, I'm still not sure how I architecture my systems. What Juval says makes sense, but also sounds very logical. Regardless, his workshop was a great reminder for us architects that we should keep sharing our thought processes, trade-offs, insights and use-class analysis to the developers we work with.

Juval also had some opinions about Agile (obviously). First of all, unlike many agilists, he believes the agile mindset and architecture are not a contradiction at all. He sees architecture as an activity that happens within the agile development process. But he does hold a strong opinion on how sprints are organized. Using some nice real-world stories, he explained and convinced us that sprints should not go back-to-back. Any great endeavor starts with some proper planning, so you need some room between the sprints to consider the status quo, make adjustments and work on the plan for the next sprint. That doesn't necessarily mean that everything is put on hold while the architect decomposes the next set of requirements based on volatility. It's perfectly fine for the developers to work on client apps, the user interface, utilities and infrastructure.

Opinions please

So what do you think? Does any of this resonate well with you? If not, what concerns do you have? Let me know by commenting below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for knowledge that significantly improves the way you build your systems in an agile world.