Whether you are a software engineer of some kind, on a DevOps team, or a VP of Engineering at a tech company, sooner or later you need to deploy your application(s) to production... so how is that going to happen? Unless your clients don't care about downtime, how long your deployments take, or whether any issues arise (and those are not good clients, or not a good product), you need to make your deployments robust, easy to use, automated, free of downtime, and fun. Sounds like a joke of a challenge... until you dig in. I want to give a TL;DR version of this topic, so let's start digging.
Birds or "Canary Deployments"
Canary: a bird, or a type of deployment process. The basic idea: say you have 2 production web nodes behind a Load Balancer. During the deployment you remove the 1st node from the Load Balancer, so it no longer serves customers, and deploy your latest code to that 1st node. Your app keeps working as it did before; the only difference is that just 1 web node is serving now, while the other one is detached. You stop at this point. Now the fun part starts, and you start using that designated node:
- You run all sorts of migration/long-running scripts
- You test basic functionality
- You test the new feature you just rolled out
- You play with that node and do whatever you need to do
- You DO NOT create, update or delete any data in your tests, because the node still points to the production DB; read-only, real-life scenarios are fine.
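The canary steps above can be sketched in a few lines. This is a minimal, in-memory illustration, not a real deployment tool: `LoadBalancer`, `deploy`, `smoke_test` and the node names are hypothetical stand-ins for your actual load balancer API, deploy script and read-only checks.

```python
from dataclasses import dataclass, field


@dataclass
class LoadBalancer:
    """Toy in-memory pool of nodes receiving traffic (hypothetical)."""
    pool: set[str] = field(default_factory=set)

    def detach(self, node: str) -> None:
        self.pool.discard(node)

    def attach(self, node: str) -> None:
        self.pool.add(node)


def deploy(node: str, release: str, versions: dict[str, str]) -> None:
    # Stand-in for the real deploy step (copy code, restart app server, ...).
    versions[node] = release


def smoke_test(node: str, versions: dict[str, str], release: str) -> bool:
    # Stand-in for read-only checks against the detached node.
    return versions.get(node) == release


def canary(lb: LoadBalancer, versions: dict[str, str], release: str) -> str:
    canary_node = sorted(lb.pool)[0]   # pick one node deterministically
    lb.detach(canary_node)             # stop routing customers to it
    deploy(canary_node, release, versions)
    if not smoke_test(canary_node, versions, release):
        raise RuntimeError(f"canary {canary_node} failed its checks")
    # Stop here: the node stays detached until deployment day.
    return canary_node


lb = LoadBalancer({"web-1", "web-2"})
versions = {"web-1": "v1", "web-2": "v1"}
node = canary(lb, versions, "v2")
print(node, lb.pool, versions)  # web-1 {'web-2'} {'web-1': 'v2', 'web-2': 'v1'}
```

Note that the canary node is deliberately not re-attached: customers keep hitting `web-2` on the old release while you poke at `web-1`.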
Now you are ready for the production deployment. You pick the day and time, deploy your new code (the same code already running on that 1 node) to all the nodes, and you can use a rolling deployment to avoid downtime (see below). Canary Deployment is a great technique when you are rolling out new features and want to test them in isolation on a production environment.
Amazon does something similar when it wants to test how a new feature will perform. What they do differently, though, is that they do not remove that 1 node from the Load Balancer, which lets them get real feedback/results from real users. That 1 node is accessed by real users, but only for a small portion of all transactions. Neat, eh?!
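Here is a sketch of the general idea of keeping the canary in the pool but sending it only a small share of traffic. It is not Amazon's actual implementation; the 5% default, the node names and the `route` function are assumptions for illustration. Users are bucketed by a hash of their id so each user consistently lands on the same side.

```python
import hashlib


def route(user_id: str, canary_node: str, stable_nodes: list[str],
          canary_percent: int = 5) -> str:
    """Send roughly canary_percent% of users to the canary node.

    Hashing the user id (instead of picking randomly per request) keeps
    each user on the same side, so nobody flips between old and new code.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    if bucket < canary_percent:
        return canary_node
    # Everyone else is spread over the stable (old-release) nodes.
    return stable_nodes[bucket % len(stable_nodes)]


stable = ["web-2", "web-3"]
hits = sum(route(f"user-{i}", "web-1", stable) == "web-1" for i in range(10_000))
print(f"{hits / 100:.1f}% of users hit the canary")  # roughly 5%
```

Raising `canary_percent` step by step is a natural way to widen the rollout once the canary looks healthy.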
Blue/Green Deployments
You need to be really well funded for this one. In short, you need a complete copy of prod during the deployment process :) Kinda expensive, right?! The setup is as follows:
- You have a green production environment, which serves users
- You have a blue production environment, which doesn't serve users
- That's it :)
What needs to be done: you deploy the new release to the blue environment and do all the required maintenance etc.; then, on the actual deployment day and time, you just flip the DNS so that users are served from the blue environment. Basically, green becomes blue and blue becomes green. That way there is no downtime (keep the old environment running for a while after the flip, since DNS caches only pick up the change once the record's TTL expires), and you have already verified that the code you pushed is working fine. The only scary part is what happens if there are weird issues with the DB data: you don't replicate the DB between green and blue, you just point to the correct one on deployment day, so there is only ever one production DB. "Nothing behaves the same way as the production environment, except production!"
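A minimal model of the blue/green flip, assuming a single "pointer" (the DNS record or load-balancer target) decides which environment is live. `Environment`, `Router` and the connection string are hypothetical; the point is that both environments share one production DB and the cutover, as well as the rollback, is just swapping which one is live.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    name: str     # "blue" or "green"
    release: str  # version currently deployed there
    db_url: str   # both environments point at the same production DB


class Router:
    """Stand-in for the DNS record (or LB target) that decides who serves users."""

    def __init__(self, live: Environment, idle: Environment) -> None:
        self.live, self.idle = live, idle

    def deploy_idle(self, release: str) -> None:
        # Deploy and verify on the environment users cannot reach.
        self.idle.release = release

    def flip(self) -> None:
        # The cutover: idle becomes live and live becomes idle.
        self.live, self.idle = self.idle, self.live


PROD_DB = "postgres://prod-db/app"  # hypothetical connection string
router = Router(Environment("green", "v1", PROD_DB), Environment("blue", "v1", PROD_DB))
router.deploy_idle("v2")
router.flip()
print(router.live.name, router.live.release)  # blue v2
router.flip()                                 # rollback is just another flip
print(router.live.name, router.live.release)  # green v1
```

The rollback only works this cheaply because the old environment is left untouched until you are confident in the new one.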
Rolling Deployments
Everyone loves sushi (well, almost everyone): you roll them one by one until all the rolls are ready. That's the idea: roll one at a time. With rolling deployments you deploy new code to your nodes one node at a time, or 2, 3, 4 at a time, depending on how many nodes you have under the Load Balancer. So when you deploy new code to, say, 1 node, the steps are:
- Remove that node from Load Balancer
- Deploy new code to that node
- Add that node back to Load Balancer
- Repeat these steps for all the nodes
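The four steps above map onto a simple loop over batches. This is a sketch under assumptions: `lb_pool` is an in-memory set standing in for the Load Balancer, `deploy_v2` is a hypothetical deploy step, and a real tool would usually deploy the nodes within a batch in parallel. `batch_size=1` gives the one-node-at-a-time variant.

```python
from typing import Callable


def rolling_deploy(nodes: list[str], lb_pool: set[str], batch_size: int,
                   deploy: Callable[[str], None]) -> list[list[str]]:
    """Deploy to `nodes` batch_size at a time; the rest keep serving traffic."""
    if not 1 <= batch_size < len(nodes):
        raise ValueError("batch_size must leave at least one node in the LB")
    batches = [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]
    for batch in batches:
        lb_pool.difference_update(batch)  # 1. remove the batch from the LB
        for node in batch:
            deploy(node)                  # 2. deploy new code (plus restarts)
        lb_pool.update(batch)             # 3. add the batch back to the LB
    return batches                        # 4. the loop repeats for every batch


fleet = ["web-1", "web-2", "web-3", "web-4"]
lb_pool = set(fleet)
versions = {node: "v1" for node in fleet}


def deploy_v2(node: str) -> None:
    # Hypothetical deploy step: in real life, ship code and restart the app.
    versions[node] = "v2"


batches = rolling_deploy(fleet, lb_pool, batch_size=2, deploy=deploy_v2)
print(batches)  # [['web-1', 'web-2'], ['web-3', 'web-4']]
```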
You can also do rolling deployments to half of your fleet at a time. Say you have 10 nodes under the Load Balancer (LB): you remove 5 of them from the LB, deploy code to them (do all the required restarts etc.), and add them back to the LB. That's it. This way you achieve zero downtime during the deployment, as long as the remaining half can handle the full load on its own. While you work on the first half of the nodes, the other half keeps serving and making users happy; once the first half is done and back in the LB, it starts serving and the other half goes through the deployment. For this process to work as expected, you and your team need to write quality code that is backwards compatible, which means, for example, that the old code can work with the new DB schema (if it changed) and so on. Backwards compatibility is a different topic, though :)
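During a rolling deployment, old and new code run side by side against the same DB. One common way to keep them compatible is to make schema changes additive (add a nullable column instead of renaming or dropping one) and have the new code tolerate rows without it. The sketch below uses plain dicts as stand-ins for DB rows; `display_name`, `greeting_v1` and `greeting_v2` are hypothetical names.

```python
# Rows as the DB might return them. A migration added a nullable `display_name`
# column; rows written before it (or by old nodes mid-rollout) don't have it set.
old_row = {"id": 1, "email": "ann@example.com"}
new_row = {"id": 2, "email": "bob@example.com", "display_name": "Bob"}


def greeting_v1(row: dict) -> str:
    # Old release: only knows `email` and ignores columns it doesn't use.
    return f"Hello, {row['email']}"


def greeting_v2(row: dict) -> str:
    # New release: prefers the new column but falls back when it is missing
    # or NULL, so it also works on rows the old nodes keep writing.
    name = row.get("display_name") or row["email"]
    return f"Hello, {name}"


for row in (old_row, new_row):
    print(greeting_v1(row), "|", greeting_v2(row))
```

The same property is what makes a rollback safe: the old release keeps working even after the new column exists.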
There are also other ways of doing deployments, and there are a lot of great SaaS companies that provide this functionality and handle it for you. That is all great; however, knowing how it all works is still a huge plus. To name a few things: you should know why it is important to write backwards-compatible code; you need to understand that the database has to be backwards compatible as well, in case you need to roll back or you do Canary Deployments; you need to know that data migration scripts should be very efficient; etc. There are great tools out there for implementing an efficient deployment strategy: Ansible, Ansistrano, Travis, Jenkins, AWS, GCP.