After listening to the podcast episode "Architecture Is Context — Making the Right Architecture Decisions — Eltjo Poort" on techleadjournal.dev, I decided to write a summary. Eltjo Poort describes architectural topics and the role of architecture in software engineering in such a clear and calm way that I wanted to provide a wrap-up for other people like you. He also has a subtle, nondogmatic approach to Agile, which is refreshing. That said, the podcast itself is absolutely worth listening to, and I can only encourage you to do so.
In the following sections, I summarize Eltjo Poort's thoughts on the relationship between business context and architectural decisions and then discuss the responsibilities of an architect, or of architecture as a function. Afterward, we take a closer look at legacy and technical debt, and finally at architecture evaluation and the two extremes of Eltjo Poort's model of responsibilities.
When people start their careers as software engineers, they learn many helpful design principles like information hiding, low coupling, and high cohesion. It is essential to know these principles by heart to deliver proper software; however, to provide solid architecture, we have to consider more. Eltjo Poort argues that we must look not only at these generic good design principles but also at the business context.
We have to show that, in a particular business context with specific drivers, an architectural design decision is the right one. Every design decision has alternatives, and we can only evaluate the trade-offs if we understand the drivers. The drivers range from commercial pressure to time pressure and many others, depending on the business and the company we are operating in or for.
Let's have a look at information hiding. It is an excellent design principle if our goal is to make the software easy to modify in the future; sometimes, however, delivering on time or other aspects are even more critical.
This is why architecture is all about context. Don't make the mistake of hiding in your ivory tower! We can only make the right decision if we talk to business and technical stakeholders.
Eltjo Poort says that we must first understand the software's requirements and whether, and how, they conflict. In fact, many do not. Functional requirements (FRs), for example, rarely contradict each other. There is obviously no conflict in requiring both a login and a checkout process for an e-commerce application. High scalability and a fast development pace for the very same application, on the other hand, most probably do conflict. Therefore, in most cases, the non-functional requirements (NFRs) are the driving forces behind architectural decisions. We can separate them into two groups: quality attribute requirements, like performance or usability, and constraints, like delivering on time or building the software with a particular group of people. Those are the things that really drive the architecture; FRs rarely do.
To sum up, the first step is to understand the NFRs and derive our criteria from them. The next step is to understand the alternatives. We should be fully aware of our choices and understand them very well. We have to think about things like what a good development language for our software would be. Maybe Java or Python? Or should we skip a development language altogether and use a low-code or no-code option, or configure some existing software? Good architecture decisions also take into account which choices we didn't make.
Finally, once we have the criteria based on the NFRs and the alternatives, we have to make the trade-offs. Doing this on our own is very hard and makes little sense, since we don't have a holistic view of the business; therefore, we need business and technical stakeholders to come up with decisions together. The role of the architect or tech lead should not be to make all the decisions but rather to bring the team and the stakeholders together and make sure architectural decisions are actually made. Eltjo Poort points out that the architect or tech lead should not have the mandate to make all decisions on his or her own. From my point of view, this is an essential statement.
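The three steps above (criteria, alternatives, trade-offs) can be made tangible with a small weighted scoring sketch. This is only one possible way to structure the discussion, not something taken from the podcast: the driver names, weights, alternatives, and scores below are all hypothetical, and in practice they would come out of the conversation with business and technical stakeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Driver:
    name: str
    weight: float  # relative importance; assumed values, agreed with stakeholders


def score_alternative(scores: dict[str, int], drivers: list[Driver]) -> float:
    """Weighted sum of an alternative's per-driver scores (1 = poor, 5 = good)."""
    return sum(driver.weight * scores[driver.name] for driver in drivers)


# Hypothetical drivers derived from NFRs and constraints.
drivers = [
    Driver("scalability", 0.2),
    Driver("development pace", 0.5),
    Driver("team skills", 0.3),
]

# Hypothetical alternatives, including the ones we may end up not choosing.
alternatives = {
    "monolith": {"scalability": 2, "development pace": 5, "team skills": 4},
    "microservices": {"scalability": 5, "development pace": 2, "team skills": 2},
    "low-code platform": {"scalability": 2, "development pace": 4, "team skills": 3},
}

# Rank alternatives from best to worst fit for this particular context.
ranking = sorted(
    alternatives,
    key=lambda name: score_alternative(alternatives[name], drivers),
    reverse=True,
)
```

With different weights, for example a context where scalability dominates, the same alternatives would rank differently, which is exactly the point: the numbers only make the trade-off visible, the decision still belongs to the group.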
To understand what really matters for the architecture, we always have to ask the business stakeholders why they want certain things. Just knowing what the business needs is not enough; we have to understand why they need it to find the real architectural drivers. Only by understanding the "why" behind business requirements can we support them in making the right decisions. Business stakeholders often say they want everything because they are probably unaware of the trade-offs. Therefore, it's part of the architecture role to point out the drawbacks in a way that business stakeholders understand. So we must speak their language and address their issues. If we say that certain requirements lead to a more complex system, business stakeholders won't care, and why should they? It's the job of the tech people to care about the complexity of systems, not theirs. If we explain, however, that a more complex system needs more people to maintain it, leading to higher costs, then we address real business problems. In short, we have to map technical trade-offs into business language to bring about sound decisions.
Eltjo Poort mentioned a funny and very on-point presentation on this topic by Eelco Rommes and Jochem Schulenklopper that is well worth your time: Why They Just Don't Get It.
At the end of the day, there is no perfect decision, and new people or stakeholders who were not involved in the decision-making process often don't understand why specific decisions were made at the time. Even worse, sometimes they see the software and think that it looks like lousy architecture. So how can we help stakeholders or project members understand legacy decisions?
First, we should emphasize that bad decisions or legacy decisions from the past are inevitable. But why do so many architectural decisions turn out wrong? The main problem is the state of our knowledge at the time we have to make important, high-impact decisions. In the design phase at the beginning of the project, we have the least knowledge, and this is, of course, pretty challenging. We can try to delay architectural decisions as long as possible, but at some point we have to make them, and we want to make the important, high-impact decisions first. Each decision reduces our design space, and we don't want the low-impact decisions to constrain our high-impact design decisions. For example, we don't want to decide on communication protocols for inter-service communication of an e-commerce application before we decide whether to go for a monolith or a more distributed system. In other words, we want the high-impact decisions to constrain the low-impact decisions, not the other way around.
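One way to picture "high-impact decisions constrain low-impact decisions" is as a dependency graph that we walk in topological order. The sketch below uses Python's standard `graphlib` module; the decision names are illustrative, and "message serialization format" is a hypothetical third decision added only to make the chain longer.

```python
from graphlib import TopologicalSorter

# Each decision maps to the set of decisions that constrain it
# (i.e. that should be made first). All entries are illustrative.
constrained_by = {
    "monolith or distributed system": set(),
    "inter-service communication protocol": {"monolith or distributed system"},
    "message serialization format": {"inter-service communication protocol"},
}

# static_order() yields every decision after the decisions that constrain it.
decision_order = list(TopologicalSorter(constrained_by).static_order())
```

If someone tried to make the protocol choice a prerequisite for the monolith-or-distributed choice as well, `TopologicalSorter` would raise a `CycleError`, which mirrors the problem in the text: two decisions cannot sensibly constrain each other.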
So far, we have talked a lot about architectural decisions. Next, we want to discuss architectural deliverables.
Obviously, architectural decisions are one major deliverable. This doesn't mean that the architect must make all the decisions; rather, it is an architectural responsibility to ensure that those decisions are made and that the right people are involved. The decisions themselves can be made by the team or by a department of the organization. The important thing is that these high-impact architectural decisions get made, whether or not someone holds the role of architect. It is possible not to have a dedicated architect, but we can't avoid making architectural decisions.
This leads to the question of the most important responsibility in architecture. Is it producing architectural documents or making decisions? In an agile organization, or as Eltjo Poort says, an organization that deals with change upfront, it is more important to deliver the architectural decisions than the models and documentation. Nevertheless, the models are important and needed to preserve knowledge, validate assumptions, and communicate architecture. In a more traditional organization, architects usually produce extensive architectural documentation up front, 200 or even 500 pages of it. One can imagine that it takes quite a long time to create those documents and that this also leads to very long feedback loops with stakeholders. Delivering individual architectural decisions is a better choice if we want to accelerate the project's progress.
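To illustrate what delivering an individual architectural decision could look like, here is a minimal sketch of a lightweight decision record. The field names and the example content are my own assumptions, not a format proposed in the podcast; the idea is simply to capture the drivers, the choice, and the alternatives we did not choose in something short enough to get fast feedback on.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DecisionRecord:
    """A small, reviewable record of one architectural decision (hypothetical format)."""
    title: str
    decided_on: date
    drivers: list[str]                      # the NFRs and constraints behind the decision
    chosen: str
    rejected: dict[str, str] = field(default_factory=dict)  # alternative -> reason
    involved: list[str] = field(default_factory=list)       # who took part in the decision

    def summary(self) -> str:
        """One-line summary suitable for a stakeholder review."""
        rejected = "; ".join(f"{alt} ({why})" for alt, why in self.rejected.items())
        return (
            f"{self.title}: chose {self.chosen} because of "
            f"{', '.join(self.drivers)}; rejected {rejected}"
        )


# Illustrative example; all values are made up.
record = DecisionRecord(
    title="Deployment style for the web shop",
    decided_on=date(2022, 5, 1),
    drivers=["time to market", "small team"],
    chosen="modular monolith",
    rejected={"microservices": "operational overhead too high for the current team"},
    involved=["product owner", "tech lead", "operations"],
)
```

A record like this takes minutes to review, compared with the weeks a several-hundred-page document might need, and it still preserves the "why" for people who join later.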
When dealing with rapid change in the business context, we want to make architectural decisions quickly and in small steps. If we don't focus on huge architectural documents, how can we get feedback on smaller individual architectural decisions in practice? To answer this question, we must scrutinize what we are trying to achieve with architectural documentation. Ultimately, we have three main goals.
The first is collaborative decision-making. We want to involve stakeholders and ask them to validate the trade-offs.
The second is preserving and providing knowledge for later. The documentation gives the delivery team guidance for the implementation and helps teams or new team members understand legacy decisions. This can be especially important for the maintenance of the application. These first two goals have quite different timelines: the first is about collaboration and feedback loops, the second about preserving information over a longer period. The language is also different. We will most probably communicate differently with stakeholders than with the delivery team.
The third goal is architectural or project governance. Many organizations or teams need it to show that they are compliant or in control of their IT delivery. This means that we must show that what we do goes through a proper approval process. The goal is therefore to produce an artifact that is approved before the team commits to an architecture. This artifact has an entirely different lifecycle again, since we only need it once, at the time of approval. Its language is different as well: it should contain only the bare minimum needed to show the approving stakeholders that their concerns are addressed and that we are in control of the cost and risks of the software. If we put everything into this document, it becomes huge and only delays the approval.
To sum up, we want to split the documentation into three parts, one per goal: material for collaborative decision-making with stakeholders, documentation that preserves knowledge for the delivery team and later maintainers, and a minimal artifact for governance and approval.
Eltjo Poort once said: "The world is drowning in a pool of technical debt." In his opinion, one reason for the amount of technical debt is the culture of "move fast and break things", promoted most prominently by Facebook. At its core, this means delivering business value fast and accepting that things break. Delivering new features at a rapid pace is certainly a good practice, but he argues that some teams or companies overdo it, and their mountain of technical debt keeps growing. Another source of technical debt is time pressure and company KPIs that value delivering new features over a healthy platform. To be honest, I personally think that this has the bigger impact on the amount of technical debt in the software industry. But how should we deal with it?
First, the name technical debt is a bit misleading. Why? If we ask the business stakeholders for two sprints without business features to fix technical debt that was incurred to meet the go-live date, they will say no and probably argue that this is the development team's problem. In fact, it is clearly a business problem: technical debt can be a vulnerability that attackers could exploit, or messy code that slows down feature development. Both situations lead to higher costs for the company, which makes them a problem of the entire company, not an exclusive concern of the tech people. This raises the question of how to get rid of technical debt.
Eltjo Poort proposes developing software at a sustainable pace as a remedy. To understand what he means, we must first discuss the items in our backlog. Philippe Kruchten came up with a model that divides backlog items into those that deliver direct business value, like new features for our end users, and those that deliver indirect business value, which is invisible to the end users. Items with indirect business value are often called enablers since they enable business value to be created on top of them, like platform improvements or architectural components. A good example is caching, which can improve the performance of an application. Enablers can be split further into those that build the architectural runway and those that reduce technical debt. We want to reduce technical debt because we are paying interest on it, e.g., in the form of lower velocity or risks like vulnerabilities.
In each sprint, we have to decide how many enablers to put into scope, which is a tough decision since each enabler ticket reduces the number of business features. If we do no enablers at all, the curve of new business value is steep at first; however, this is not sustainable. We must take care of the enablers at some point, or it will backfire. The worst-case scenario is reaching a point where we are talking about rewriting the entire application. A sustainable pace is one where we spend a balanced amount of time on business features and enablers. Frameworks like SAFe, for example, say that a certain percentage of each sprint should always go to enabler stories. If we have to meet a deadline, however, it is OK to skip the enabler stories for one or two sprints as long as we return to our usual proportion afterward. This can be a pretty wise business decision.
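The capacity arithmetic behind a sustainable pace can be sketched in a few lines. The 20% enabler share, the 40-point sprint capacity, and the sprint numbers are assumptions chosen for illustration; neither the podcast nor this summary prescribes specific values.

```python
def split_capacity(capacity_points: int, enabler_share: float) -> tuple[int, int]:
    """Return (feature_points, enabler_points) for one sprint."""
    if not 0.0 <= enabler_share <= 1.0:
        raise ValueError("enabler_share must be between 0 and 1")
    enabler_points = round(capacity_points * enabler_share)
    return capacity_points - enabler_points, enabler_points


CAPACITY = 40             # assumed story points per sprint
ENABLER_SHARE = 0.2       # assumed share reserved for enablers
deadline_sprints = {3}    # hypothetical sprint in which enablers are skipped

# Plan five sprints: skip enablers only in the deadline sprint,
# then return to the usual proportion.
plan = {
    sprint: split_capacity(CAPACITY, 0.0 if sprint in deadline_sprints else ENABLER_SHARE)
    for sprint in range(1, 6)
}

total_enabler_points = sum(enabler for _, enabler in plan.values())
```

Skipping enablers in one sprint costs 8 of the 40 enabler points planned over five sprints in this example; skipping them in every sprint would cost all of them, which is where the interest on technical debt starts to pile up.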
Besides leading the architecture practice at CGI, Eltjo Poort helps other companies improve and modernize their architectural way of working. Based on a model of responsibilities, they measure the architectural maturity level of teams or organizations. They defined the following five responsibilities:
Some patterns appeared when assessing how good companies are at delivering on those responsibilities. Some teams had a strong focus on modeling and validation but none on decision-making and delivery support. They considered this to be somebody else's job. This pattern is called the waterfall wasteland. In the waterfall wasteland, the architecture teams make upfront designs and hand them over to the delivery team for implementation. They are not in touch with the team, and by the time the team starts, they have already moved on to the next design.
On the other side, some teams make quick technical decisions and implement them immediately. They don't do modeling and validation at all because they consider them not to be agile. This pattern is called the agile outback. Maybe they get this idea from the principles behind the Agile Manifesto, which state that the best architectures, requirements, and designs emerge from self-organizing teams. However, nothing there says that teams should not do modeling and validation. A common argument against validation is "fail early and fail often". In most situations, though, upfront validation can save a lot of pain. Additionally, there are some businesses where production issues are unaffordable or at least pretty expensive. No one wants to sit in an airplane whose software was engineered with the fail-early principle in mind ;).
The goal is to be somewhere in the middle between those two extremes, although different contexts require being closer to one side or the other.
Last but not least, my attempt to give you some key takeaways: