Overview
A bit of background: our development process suffered from a lot of tech rot. There was a base Vagrant image that spun up infrastructure that loosely matched our development environment in AWS, but the big issue was that nobody in the organization officially supported it. The config management that bootstrapped the Vagrant images was written in Chef, while development and production used Salt. Many developers didn't even use the Vagrant image and just set up their local environments to mimic production as best they could.
So as an organization we decided to use Docker to help fix the problem. The goal was for developers to develop in a container that would then be pushed into production, which runs Kubernetes.
The idea was that if it worked on a developer's laptop, it would work in production. The mile-high overview is that all development infrastructure is spun up and down using docker-compose, and the same Dockerfile that compose uses is also used in our dev and production environments.
So the basic overview of our pipeline looks like this:
- Developers use docker-compose to spin up infrastructure and develop locally.
- They check in their code to GitLab and create a merge request.
- Once the merge request is merged into the dev branch, Jenkins builds the application and, if that passes, runs a build job that builds the container and pushes it to a local registry.
- Once the container image is in the registry, Jenkins sends an API request to our deployment tool, which then uses a mix of the Kubernetes API and Fabric to deploy the application into Kubernetes (a rough sketch of this hand-off follows the list).
- The deployment tool uses a blue/green deployment process that launches a mirror of the currently running replication controller on a different NodePort. Once the new replication controller is online and ready to accept traffic, it tells the load balancers to reload their configs to send traffic to the new color and then shuts down the old color.
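As a rough illustration of that hand-off from CI to the deployment tool (the endpoint, registry address, and payload shape are made up for this example; the real tool is internal):
import requests

DEPLOY_TOOL = "https://deploy.internal.example.com"  # placeholder for our internal Flask tool


def trigger_deploy(app, image_tag):
    # Called by the Jenkins build job once the container image has been pushed.
    resp = requests.post(
        "{0}/api/deploy/{1}".format(DEPLOY_TOOL, app),
        json={"image": "registry.internal:5000/{0}:{1}".format(app, image_tag)})
    resp.raise_for_status()
    return resp.json()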
Details
Here are the details of how we handle production traffic in Kubernetes.
We spin up all AWS infrastructure using a tool called Terraform. This lets us keep our infrastructure as code and easily make changes that are mirrored in all environments. The difficult part was planning for how Kubernetes handles load balancers in AWS. If you create a load-balanced service, Kubernetes will create the ELB for you. That is great if you are willing to do some legwork around it and use Kubernetes' built-in rolling upgrades, but since we use Terraform to build our ELBs and also match the Route 53 DNS entries to those ELBs, that isn't a great solution for us.
We decided to use NodePorts in Kubernetes. A NodePort allocates a single port on each Kubernetes node for the service and maps that port to the containers inside Kubernetes. It is essentially port forwarding traffic from outside the cluster to the inside via kube-proxy. This is great for us since it allows us to set up a single ELB as an entry point into Kubernetes and send all outside traffic through it.
Our network flow looks something like this for our website.
Traffic to website -> ELB -> Nginx on Kubernetes nodes -> kube-proxy -> www containers
We decided to run nginx locally on the nodes mainly because it was the simplest way to deal with new applications coming online or nginx config changes. We just run Salt over the nodes and the nginx configs are updated; there is no need to deal with nginx containers at all.
So a sample of what our nginx config looks like for a host would be something like
include /etc/nginx/backends.d/blog.conf;

server {
    listen 80;
    server_name blog.domain.com;

    if ($http_x_forwarded_proto != "https") {
        return 301 https://$host$request_uri;
    }

    access_log /var/log/nginx/blog.access.log extra;
    error_log /var/log/nginx/blog.error.log;

    root /srv/blog/current/public;
    index index.php index.html index.htm;

    location ~* \.(png|jpg|jpeg|gif|swf|xml|txt|ico|pdf|flv)$ {
        expires 365d;
    }

    location / {
        location ~ /\.ht { deny all; }
        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_pass blogLB;
        proxy_set_header X-Forwarded-Proto https;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /code/public/$fastcgi_script_name;
        include fastcgi_params;
        fastcgi_param APPLICATION_ENV production;
    }
}
The key things in the example above are the include at the top and the fastcgi_pass blogLB; directive.
The included file looks like
upstream blogLB {
    server 127.0.0.1:30000;
    server 127.0.0.1:30001 down;
}
The idea here is that one port is blue and the other port is green. We distribute this file via Salt, and Salt will only create the file if it is missing; it will never change the file if it is already there. The Salt config looks like
/etc/nginx/backends.d:
  file.recurse:
    - source: salt://kubernetes/files/node/nginx/backends.d
    - template: jinja
    - file_mode: 644
    - makedirs: True
    - replace: False
So we have a convention for which ports can be in use; it will make sense later in the post why we do it this way. We start at port 30000 and go up in 10s from there for each application in Kubernetes that needs outside access (a small sketch of the convention follows the list).
- The blue port will always end in 0, so blog would be 30000 and www might be 30010.
- The green port will always end in 1, so blog would be 30001 and www might be 30011.
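To make that convention concrete, here is a minimal sketch; the helper name and the app_index parameter are hypothetical and not part of our actual tooling:
# Each externally exposed app gets a block of 10 nodePorts starting at 30000.
# Blue is the port ending in 0, green is the port ending in 1.
BASE_PORT = 30000


def node_ports(app_index):
    """Return the (blue, green) nodePort pair for the Nth externally exposed app."""
    blue = BASE_PORT + app_index * 10
    return blue, blue + 1


print(node_ports(0))  # blog -> (30000, 30001)
print(node_ports(1))  # www  -> (30010, 30011)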
Salt controls our Kubernetes service definition files, and a service YAML looks like
apiVersion: v1
kind: Service
metadata:
  name: blog-blue
  namespace: blog
  labels:
    name: blog-blue
spec:
  type: NodePort
  ports:
    - port: 9000
      nodePort: 30000
  selector:
    name: blog-blue
---
apiVersion: v1
kind: Service
metadata:
  name: blog-green
  namespace: blog
  labels:
    name: blog-green
spec:
  type: NodePort
  ports:
    - port: 9000
      nodePort: 30001
  selector:
    name: blog-green
We also namespace each application. We have a few applications that need Redis, so it makes everything a lot easier to just call tcp://redis:6379 instead of adding unneeded config entries.
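As an illustration, assuming a Python app using the redis-py client (the client choice and key name here are just for the example), a pod in the blog namespace can reach its Redis service by the bare service name:
import redis

# Inside the "blog" namespace, Kubernetes DNS resolves the service name "redis"
# to that namespace's Redis service, so no per-environment host config is needed.
r = redis.StrictRedis(host="redis", port=6379, db=0)
r.set("example:key", "value")
print(r.get("example:key"))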
So you'll see above that this creates a nodePort on 30000 and tells kube-proxy to send it to port 9000 on the blog-blue containers, and a nodePort on 30001 that sends traffic to port 9000 on the blog-green replication controller's containers.
So once that is set up, our deployment application, a Flask web tool, is in charge of deployments. The flow of the deployment tool goes something like this:
- Finds all Kubernetes nodes via the AWS API using a tag search.
- Makes a request to the Kubernetes API server to find the color that is currently running.
- Looks at how many replicas are currently running so it can create the new color's replication controller with the same number.
- Creates the new replication controller.
- Runs a loop that checks the status of all pods in the replication controller to make sure they are all online before continuing (a rough sketch of these discovery and readiness steps follows the list).
- Deploys the application to all Kubernetes nodes (this can be optional; some apps have static assets that nginx serves).
- Runs a few sed commands over /etc/nginx/backends.d/blog.conf to mark the new color's backends as up and the old color's as down.
- Reloads nginx so the new color is now taking traffic.
- Waits 30 seconds for connections to the old replication controller to drain.
- Deletes the old replication controller.
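Here is a minimal sketch of those discovery and readiness-check steps, assuming boto3 for the AWS tag search and plain HTTP calls against the Kubernetes v1 API; the tag name, label convention, and API server address are illustrative, and the real tool also handles auth, TLS, and error cases:
import time

import boto3
import requests

KUBE_API = "https://kube-master.internal:6443/api/v1"  # placeholder API server address


def find_kubernetes_nodes(region="us-east-1"):
    # Find the Kubernetes nodes with an AWS tag search (tag key/value are made up here).
    ec2 = boto3.client("ec2", region_name=region)
    result = ec2.describe_instances(
        Filters=[{"Name": "tag:role", "Values": ["kubernetes-node"]}])
    return [instance["PrivateIpAddress"]
            for reservation in result["Reservations"]
            for instance in reservation["Instances"]]


def current_color_and_replicas(namespace, app):
    # Ask the API server which color replication controller is running and its replica count.
    rcs = requests.get("{0}/namespaces/{1}/replicationcontrollers".format(
        KUBE_API, namespace), verify=False).json()["items"]
    for rc in rcs:
        name = rc["metadata"]["name"]  # e.g. blog-blue or blog-green
        if name.startswith(app + "-"):
            return name.split("-")[-1], rc["spec"]["replicas"]
    return None, 1  # fall back to a single replica if nothing is running


def wait_for_pods(namespace, app, color, wanted):
    # Poll until every pod for the new color reports Running.
    while True:
        pods = requests.get(
            "{0}/namespaces/{1}/pods".format(KUBE_API, namespace),
            params={"labelSelector": "name={0}-{1}".format(app, color)},
            verify=False).json()["items"]
        online = [p for p in pods if p["status"]["phase"] == "Running"]
        print("Waiting for all pods to come online. {0}/{1} are now online".format(
            len(online), wanted))
        if len(online) >= wanted:
            return
        time.sleep(5)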
I mentioned above that we use Fabric; it is in charge of deploying the application to the Kubernetes nodes, running the sed commands, and reloading nginx. The logic for that is as follows.
from fabric.api import cd, env, parallel, sudo


@parallel(pool_size=20)
def post_deploy():
    # deploy the app to this node
    sudo("/sbin/deploy-blog")
    with cd("/etc/nginx/backends.d"):
        # remove the down flag from all backends
        sudo("sed -i -r -e 's/ down;/;/g' {f}".format(f=env.backend_file))
        # mark the old color down so the new color becomes the live set
        if env.color_new == "blue":
            sudo("sed -i -r -e 's/1;/1 down;/g' {f}".format(f=env.backend_file))
        else:
            sudo("sed -i -r -e 's/0;/0 down;/g' {f}".format(f=env.backend_file))
    sudo("service nginx reload")
    with cd("/srv/blog/releases"):
        # keep only the five most recent releases
        sudo("rm -rf $(ls -t | awk 'NR>5')")
As I mentioned above, this is why we use the 0 and 1 conventions for nodePorts. The logic is pretty simple.
- Remove down from all lines in the backend file.
- Then, based on color_new, decide which color to bring down.
- Reload nginx.
- Remove old releases.
Output from our deployment tool looks like this
Collecting hosts from us-east-1
--- Starting deploy for blog in blog ---
- Deploying to replica color: green
--- Deploying green replication controller ---
- Using the failback replica count of 1 for blog
- Creating the new replication controller now for green at version 20151025.143403
- Got the following return code: 201
- Successfully deployed the new replication controller
- Waiting for the pods to come online fully before moving on
- Waiting for all pods to come online. 1/1 are now online
- All pods are online!
--- Running post deploy now ---
[10.122.43.70] Executing task 'post_deploy'
[10.122.41.136] Executing task 'post_deploy'
[10.122.41.136] sudo: /sbin/deploy-blog
[10.122.43.70] sudo: /sbin/deploy-blog
[10.122.41.136] out: s3://domain-deploy/blog/blog.tgz -> ./blog.tgz [1 of 1]
[10.122.41.136] out: s3://domain-deploy/blog/blog.tgz -> ./blog.tgz [1 of 1]
[10.122.41.136] out:
[10.122.43.70] out: s3://domain-deploy/blog/blog.tgz -> ./blog.tgz [1 of 1]
[10.122.43.70] out: s3://domain-deploy/blog/blog.tgz -> ./blog.tgz [1 of 1]
[10.122.43.70] out:
[10.122.43.70] out: 4096 of 12106639 0% in 0s 54.91 kB/s
[10.122.43.70] out: 12106639 of 12106639 100% in 0s 16.29 MB/s done
[10.122.41.136] out: 4096 of 12106639 0% in 0s 103.34 kB/s
[10.122.41.136] out: 12106639 of 12106639 100% in 0s 12.89 MB/s done
[10.122.41.136] out:
[10.122.41.136] sudo: sed -i -r -e 's/ down;/;/g' blog.conf
[10.122.43.70] out:
[10.122.43.70] sudo: sed -i -r -e 's/ down;/;/g' blog.conf
[10.122.41.136] sudo: sed -i -r -e 's/0;/0 down;/g' blog.conf
[10.122.43.70] sudo: sed -i -r -e 's/0;/0 down;/g' blog.conf
[10.122.41.136] sudo: service nginx reload
[10.122.43.70] sudo: service nginx reload
[10.122.41.136] out: Redirecting to /bin/systemctl reload nginx.service
[10.122.41.136] out:
[10.122.41.136] sudo: rm -rf $(ls -t | awk 'NR>5')
[10.122.43.70] out: Redirecting to /bin/systemctl reload nginx.service
[10.122.43.70] out:
[10.122.43.70] sudo: rm -rf $(ls -t | awk 'NR>5')
--- Running the cleanup ---
- Pausing for 30 seconds to let connections drain from the old replication controller
- Deleting old replication controller
- Deleting old pods in the replication controller
- Setting the current color status for the deployment
-- DEPLOY DONE --
Done.
This process has worked out extremely well for us and really improved the workflow from development to production. It also gives us a lot of control over deployments, as we have the option to verify a deployment. That means if blue is currently in production and green comes online, the deployment will pause and notify a Slack room with a URL to test. The team can test the production release and then issue a command to finish the deployment, which tears down the old replication controller and puts the new one into service.
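For reference, the pause-and-notify step can be as simple as posting the test URL to a Slack incoming webhook; the webhook URL and message format below are placeholders rather than our actual integration:
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder


def notify_verification(app, color, test_url):
    # Post the verification URL to the deploy room; a human finishes the deploy afterwards.
    requests.post(SLACK_WEBHOOK, json={
        "text": "{0} {1} is online for verification: {2}".format(app, color, test_url)})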
If you have any questions on this process, I encourage you to join the great DevOps Engineers Slack community, hop into the #kubernetes room, and ping Mike Zupan.