Deployments without risk are the dream, yet often a challenging goal to attain. Our team chases this goal too, a little more with every release. One obvious solution is to release frequently, with small changes, so the risk of each increment stays very low. However, since we operate in a highly regulated market where each release comes with overhead, our ability to release frequently is limited. Consequently, our releases tend to be infrequent, substantial, and fraught with risk. Changing the release process itself is a formidable task, since it must satisfy specific process requirements. Hence, our focus has been on making the releases we do ship safer. Even if you are already making low-risk releases, these techniques still make a lot of sense. In a perfect world, my suggestion would be to release frequently and deploy in a secure, risk-free manner.
If you are interested in how we got to this point, I recommend reading my previous blog post about our Kubernetes migration.
But to keep it short: the key to a more stable application, and therefore releases with less risk, has been the use of metrics, which fundamentally changed how we operate. We improved our stability immensely by analyzing production metrics and finding issues that only occur under a certain amount of traffic and pressure on the application. With that in place, we can instantly analyze a new release and be more confident about it.
Another important step toward safer releases has been rigorous testing of rollbacks, so we are prepared for unforeseen emergencies. However, rollbacks always get complicated when database changes are involved, which, for us, is nearly every time.
Flyway serves as our primary tool for executing migrations. During a rollback, however, changes have to be reverted manually, which often leads to follow-up adjustments in subsequent releases. So you end up preparing lots of manual migration steps just to be fast and safe in an emergency, which works, but is not great. We want to remove these manual steps because they are inherently error-prone.
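As a sketch of what this looks like with Flyway (the file, table, and column names here are hypothetical, and the SQL is PostgreSQL-flavored): the versioned migration is applied automatically, while the corresponding "down" script has to be written and kept by hand, since Flyway only automates undo migrations in its paid tiers, as far as I know.

```sql
-- V42__add_invoice_due_date.sql: applied automatically by Flyway on startup
ALTER TABLE invoice ADD COLUMN due_date DATE;

-- rollback_add_invoice_due_date.sql: the manual "down" script,
-- written and maintained by hand, only run in an emergency rollback
ALTER TABLE invoice DROP COLUMN due_date;
```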
Our plan to improve both rollbacks and rollouts centers on harnessing the capabilities of Argo Rollouts. However, this requires a greater level of control over our database infrastructure: we have to make the database, and every change and migration applied to it, backward compatible. That means the “older” and “newer” versions of the application can communicate with the same database.
Consequently, every modification to the database now prompts careful consideration of how a potential rollback would work. With this in mind, I revisited our recent release changes, focusing on making them inherently backward compatible to streamline future operations.
Adding a new column:
There should be no problem here: just make the column nullable so the old version can ignore it. You can make it required in the next release if needed.
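A minimal sketch of this two-release pattern (table and column names are hypothetical):

```sql
-- Release N: add the column as nullable so the old version can ignore it
ALTER TABLE customer ADD COLUMN phone_number VARCHAR(30);

-- Release N+1 (optional): once all rows have a value, enforce the constraint
ALTER TABLE customer ALTER COLUMN phone_number SET NOT NULL;
```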
Removing a column:
Just remove the column from your code. In the next release, you can also drop the column in the database.
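Sketched out, the database change only happens one release later (names are hypothetical):

```sql
-- Release N: no database change; the code simply stops reading and writing the column

-- Release N+1: the old version no longer runs, so the column can safely go
ALTER TABLE customer DROP COLUMN fax_number;
```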
Dropping a table:
Just remove the entity code and keep the table until the next release. Then you are safe to drop the table as well.
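The same one-release delay applies here (table name is hypothetical):

```sql
-- Release N: delete the entity and repository code, keep the table untouched

-- Release N+1: no deployed version references the table anymore
DROP TABLE legacy_audit_log;
```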
Renaming a column:
Now it gets a little more complicated. Add a new column with the target name, but keep the old column. In this version of the application, you must write the same data to both columns, the old and the new one. In the next version of the application, you can delete the old column.
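As a sketch, with the application writing to both columns during release N (column names are hypothetical):

```sql
-- Release N: add the new column; the application now writes to both columns
ALTER TABLE customer ADD COLUMN family_name VARCHAR(100);
-- one-time backfill so reads from the new column are complete
UPDATE customer SET family_name = surname WHERE family_name IS NULL;

-- Release N+1: only the new column is used, so the old one can be dropped
ALTER TABLE customer DROP COLUMN surname;
```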
Moving a column to a new table:
As before, add the new table and column and fill both from the code. Remove the old one in the next release.
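The same dual-write pattern, sketched for a column moving into its own table (all names are hypothetical):

```sql
-- Release N: create the new table; the code writes to both the old and new location
CREATE TABLE customer_address (
    customer_id BIGINT PRIMARY KEY REFERENCES customer(id),
    street      VARCHAR(200)
);
-- one-time backfill from the old location
INSERT INTO customer_address (customer_id, street)
SELECT id, street FROM customer WHERE street IS NOT NULL;

-- Release N+1: drop the column from its old location
ALTER TABLE customer DROP COLUMN street;
```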
Renaming an enumeration value that is stored in the database:
Here too, keep the old value and handle both in your code. With the next release, you can drop the old one.
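Assuming the enumeration is persisted as a string, this could look like the following (table, column, and value names are hypothetical):

```sql
-- Release N: no data change; the code accepts both the old and the new name,
-- so a rolled-back version still understands every row it encounters

-- Release N+1: the old version is gone, so existing rows can be migrated
-- and the old value removed from the code
UPDATE orders SET status = 'CANCELLED' WHERE status = 'ANNULLED';
```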
As you can see, it’s not too hard, and with a little extra effort, you can make your database backward-compatible and be more relaxed when rolling out or back.
Having achieved safety measures on the database front, we can now explore Argo Rollouts.
Argo Rollouts is a Kubernetes controller used for managing and automating the rollout of changes in Kubernetes clusters. It is an open-source tool that extends Kubernetes’ native deployment capabilities to provide more advanced deployment strategies.
Fundamentally, it supports two key concepts: Canary and Blue-Green deployments.
Canary deployments and blue-green deployments are two common strategies for releasing software updates to production with minimal risk and zero downtime. With canary deployments, a small, defined subset of production traffic is redirected to the newer version of the application, known as the canary, while the rest of the traffic continues to be served by the current version. The canary is closely monitored to ensure that there are no critical issues or performance problems before the traffic to the new version is gradually increased. If issues show up in the canary, traffic can be rolled back to the previous version.
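In Argo Rollouts, a canary is described declaratively in a `Rollout` resource. A minimal sketch (the application name, services, and image are hypothetical) that shifts traffic in two steps with pauses for monitoring in between:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app                        # hypothetical application name
spec:
  replicas: 5
  strategy:
    canary:
      canaryService: my-app-canary    # receives the canary share of traffic
      stableService: my-app-stable    # keeps serving the rest
      steps:
      - setWeight: 10                 # send 10% of traffic to the new version
      - pause: {duration: 10m}        # watch the metrics
      - setWeight: 50
      - pause: {duration: 10m}        # if all looks good, full promotion follows
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example.registry/my-app:1.2.3
```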
On the other hand, blue-green deployments involve running two identical environments side-by-side, one running the current version of the application (blue) and the other running the newer version (green). The traffic is initially routed to the blue environment, while the green environment is thoroughly tested to ensure that it is working as expected. Once the green environment is considered stable and ready for production release, the traffic is migrated to it, while the blue environment is taken down. This approach allows for fast and easy rollbacks if critical issues arise during production deployment.
Both strategies can be implemented in different ways. One is to provide two identical environments and switch between them; Argo Rollouts, however, handles this differently. It deploys two services that route traffic either to the “old” version of the application or to the newer one. So only the application is deployed twice; the rest (like the database) is deployed just once.
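A blue-green `Rollout` makes those two services explicit: the active service keeps routing production traffic to the stable version while the preview service exposes the new one for testing. A minimal sketch (names and image are hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    blueGreen:
      activeService: my-app-active    # serves production traffic ("blue")
      previewService: my-app-preview  # exposes the new version for testing ("green")
      autoPromotionEnabled: false     # promote manually once the preview looks good
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: example.registry/my-app:1.2.3
```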
Are you familiar with the concept of progressive delivery? This methodology gradually introduces new code to users, employing techniques such as blue-green and canary deployments. It also encompasses strategies like feature toggles, allowing specific user groups to access a feature while offering the ability to promptly disable it without requiring a new deployment. For applications operating within highly regulated environments like ours, progressive delivery balances speed and control. This approach aligns precisely with the requirements of our industry. For further exploration of this topic, I highly recommend reading a comprehensive blog post on progressive delivery, providing a deeper insight: https://launchdarkly.com/blog/what-is-progressive-delivery-all-about/
As previously mentioned, Argo Rollouts serves not only as a deployment tool but also as a robust mechanism for facilitating rollbacks in case of issues. While manual rollbacks are feasible with blue-green or canary deployments, relying solely on manual observation is prone to errors. And we do have metrics, so why not use them? Argo Rollouts integrates seamlessly with monitoring collectors like Prometheus, utilizing defined queries to analyze metrics. It then autonomously decides to proceed with the progressive rollout or initiate a rollback to the previous version. This process can be adjusted in speed by defining steps and wait periods, allowing Argo Rollouts to manage the rest. Manual intervention for rollback (or promotion) remains an option. For a more in-depth understanding of how Argo Rollouts utilizes analysis to enable progressive delivery, I highly recommend exploring their documentation here: https://argo-rollouts.readthedocs.io/en/stable/features/analysis/
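A metric-driven rollback can be sketched with an `AnalysisTemplate` like the one below, closely modeled on the example in the Argo Rollouts documentation; the Prometheus address, metric names, and threshold are assumptions for illustration. Referenced from a canary's steps, a failing analysis run automatically aborts the rollout.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name                # passed in from the Rollout
  metrics:
  - name: success-rate
    interval: 1m                      # measure once per minute
    failureLimit: 3                   # three failed measurements abort the rollout
    successCondition: result[0] >= 0.95
    provider:
      prometheus:
        address: http://prometheus.monitoring:9090   # hypothetical address
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```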
If you’re already leveraging ArgoCD for its powerful GitOps-based continuous delivery and application lifecycle management, integrating Argo Rollouts into your deployment workflow can significantly enhance your capabilities. While ArgoCD ensures your applications’ desired state matches the deployed state with precision, Argo Rollouts introduces easier access to advanced deployment strategies such as canary, blue/green, and A/B testing, providing granular control over the rollout process. Furthermore, with features like automated rollbacks and observability integration, Argo Rollouts ensures safer, more reliable deployments, reducing downtime and mitigating risks associated with new releases. By combining Argo CD with Argo Rollouts, you create a robust, flexible, and resilient CI/CD pipeline that caters to complex deployment needs in modern Kubernetes environments.
Argo Rollouts offers lots of examples and demos to try it out. Just check out their GitHub repo and read their documentation: https://github.com/argoproj/rollouts-demo/tree/master
We also tried the canary deployment strategy locally with Minikube, using the example from their repo, and it worked great. For a detailed walkthrough of the local setup, check out this tutorial: https://mahira-technology.medium.com/deploying-argo-rollout-on-minikube-bd49388230e1
If you take anything away from this post (or just skipped to the conclusion), it is that Argo Rollouts is pretty cool and can help your business ship applications with less risk. You should definitely try it out locally: it is quite easy, and you quickly get an idea of what it could do for you.
If you have any questions, you can always message me. And if you did try it out, I would be really interested in how it worked out for you.
Ship fast, stay safe, and, at the very least, stay in control.
I really liked the closing line from the LaunchDarkly post, so I shamelessly borrowed it for my own.