Overview

A bit of background: our development process suffered from a lot of tech rot. There was a base Vagrant image that spun up infrastructure that roughly matched our development environment in AWS, but the big issue was that nobody in the organization officially supported it. The config management that bootstrapped the Vagrant image was written in Chef, while development and production used Salt. Many developers didn't even use the Vagrant image and instead set up their local environments to mimic production as best they could.

So as an organization we decided to use Docker to fix the problem. The goal was for developers to develop against a container that would then be pushed into production, which runs Kubernetes.

The idea was that if it worked on a developer's laptop, it would work in production, so we set out to figure out how to make that happen. The mile-high overview is that all development infrastructure should be spun up and down using docker-compose, and the same Dockerfile that compose builds from should be used in both our dev and production environments.
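
To make that concrete, here is a minimal docker-compose sketch of the kind of setup described above; the service names, ports, and images are illustrative rather than our actual config:

# docker-compose.yml (illustrative)
www:
  build: .              # built from the same Dockerfile used for production images
  ports:
    - "8080:9000"       # expose the app port locally
  volumes:
    - .:/code           # mount the source for live editing in dev
redis:
  image: redis:2.8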

So the basic overview of our pipeline looks like this.

Details

Here are the details of how we handle production traffic in Kubernetes.

We spin up all AWS infrastructure using a tool called Terraform. This lets us keep our infrastructure as code and easily make changes that are mirrored in all environments. The difficult part was planning for how Kubernetes handles load balancers in AWS. If you create a load-balanced service, Kubernetes will create the ELB in AWS for you. That is great if you are willing to do some legwork around it and use Kubernetes' built-in rolling upgrades. Since we use Terraform to build our ELBs and to match the Route 53 DNS entries to those ELBs, that isn't a great solution for us.

We decided to use nodePorts in Kubernetes. A nodePort allocates a single port on the Kubernetes nodes for the service and maps that port to the service's containers inside Kubernetes. It is essentially port forwarding traffic from outside the cluster to the inside via kube-proxy. This is great for us because it lets us set up a single ELB as the entry point into Kubernetes and send all outside traffic through it.

Our network flow looks something like this for our website.

Traffic to website -> ELB -> Nginx on Kubernetes nodes -> kube-proxy -> www containers

We decided to run nginx directly on the nodes, mainly because it was the simplest way to deal with new applications coming online and with nginx config changes. We just run Salt over the nodes and the nginx configs are updated; there is no need to deal with nginx containers at all.
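
As an example, a minimal Salt state for managing a vhost config and reloading nginx when it changes might look like the following; the file names and paths here are illustrative, not our actual states:

/etc/nginx/conf.d/blog.conf:
  file.managed:
    - source: salt://nginx/files/blog.conf
    - template: jinja

nginx:
  service.running:
    - reload: True
    - watch:
      - file: /etc/nginx/conf.d/blog.conf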

A sample of what our nginx config looks like for one host is something like this:

include /etc/nginx/backends.d/blog.conf;

server {
    listen       80;
    server_name blog.domain.com;

    if ($http_x_forwarded_proto != "https") {
        return    301 https://$host$request_uri;
    }

    access_log  /var/log/nginx/blog.access.log extra;
    error_log /var/log/nginx/blog.error.log;

    root   /srv/blog/current/public;
    index  index.php index.html index.htm;


    location ~* \.(png|jpg|jpeg|gif|swf|xml|txt|ico|pdf|flv)$ {
        expires 365d;
    }

    location / {
        location ~ /\.ht { deny  all; }

        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_pass  blogLB;
        proxy_set_header X-Forwarded-Proto https;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME /code/public/$fastcgi_script_name;
        include fastcgi_params;

        fastcgi_param APPLICATION_ENV production;
    }
}

The key things in the example above are the include at the top and the fastcgi_pass blogLB; directive.

The included file looks like

upstream blogLB {
  server 127.0.0.1:30000;
  server 127.0.0.1:30001 down;
}

The idea here is that one port is blue and the other is green. We distribute this file via Salt, and Salt will only create the file; it never changes it if it already exists. The salt config looks like this:

/etc/nginx/backends.d:
  file.recurse:
    - source: salt://kubernetes/files/node/nginx/backends.d
    - template: jinja
    - file_mode: 644
    - makedirs: True
    - replace: False

So we have a convention for which ports can be in use. Why we do it this way will make sense later in the post.

We start at port 30000 and go up in tens from there, one block per application in Kubernetes that needs outside access.
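
As a rough illustration of that convention (the helper below is hypothetical, not part of our tooling):

# Each externally reachable app gets a block of ten nodePorts starting at 30000;
# blue uses the base port (ending in 0) and green uses base + 1 (ending in 1).
def node_ports(app_index):
    base = 30000 + app_index * 10
    return {"blue": base, "green": base + 1}

print(node_ports(0))  # first app (the blog) -> {'blue': 30000, 'green': 30001}
print(node_ports(1))  # second app           -> {'blue': 30010, 'green': 30011}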

Salt controls our Kubernetes service definition files, and a service YAML looks like this:

apiVersion: v1
kind: Service
metadata:
  name: blog-blue
  namespace: blog
  labels:
    name: blog-blue
spec:
  type: NodePort
  ports:
  - port: 9000
    nodePort: 30000
  selector:
    name: blog-blue


---

apiVersion: v1
kind: Service
metadata:
  name: blog-green
  namespace: blog
  labels:
    name: blog-green
spec:
  type: NodePort
  ports:
  - port: 9000
    nodePort: 30001
  selector:
    name: blog-green

We also namespace each application. A few of our applications need Redis, so it makes everything a lot easier to just call tcp://redis:6379 instead of adding unneeded config entries.
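
For example, a redis service defined inside the application's namespace (a sketch with illustrative names) resolves by its short name for every pod in that namespace:

apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: blog
spec:
  ports:
  - port: 6379
  selector:
    name: redis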

You'll see that the YAML above creates a nodePort on 30000 and tells kube-proxy to send that traffic to port 9000 on the blog-blue containers, and a nodePort on 30001 whose traffic goes to port 9000 on the blog-green replication controller's containers.
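
The services find their pods through the name label, so the matching replication controllers carry the same label. A trimmed-down sketch of what the blue one might look like (the image name is illustrative):

apiVersion: v1
kind: ReplicationController
metadata:
  name: blog-blue
  namespace: blog
spec:
  replicas: 1
  selector:
    name: blog-blue
  template:
    metadata:
      labels:
        name: blog-blue
    spec:
      containers:
      - name: blog
        image: registry.domain.com/blog:20151025.143403
        ports:
        - containerPort: 9000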

Once that is set up, deployments are handled by a Flask web tool. The flow of the deployment tool goes something like this:
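
To give a feel for the replication-controller step of that flow (create the new color's replication controller, then wait for its pods to come online), here is a minimal sketch that assumes the tool talks to the Kubernetes REST API directly. The endpoint paths are the standard v1 API, but the helper itself is illustrative and not our actual code:

import time

import requests

# base URL of the Kubernetes API server (illustrative; auth and TLS settings omitted for brevity)
API = "https://kube-master:6443/api/v1"


def deploy_color(namespace, color, rc_manifest, replicas):
    # create the new replication controller; a 201 means it was accepted
    r = requests.post("{0}/namespaces/{1}/replicationcontrollers".format(API, namespace),
                      json=rc_manifest)
    print(" - Got the following return code: {0}".format(r.status_code))

    # poll the pods labeled with the new color until they are all Running
    while True:
        pods = requests.get("{0}/namespaces/{1}/pods".format(API, namespace),
                            params={"labelSelector": "name={0}-{1}".format(namespace, color)}).json()
        online = [p for p in pods.get("items", []) if p["status"]["phase"] == "Running"]
        print(" - Waiting for all pods to come online. {0}/{1} are now online".format(len(online), replicas))
        if len(online) >= replicas:
            break
        time.sleep(5)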

As part of that flow we use Fabric, which is in charge of deploying the application to the Kubernetes nodes, running the sed commands that flip the nginx upstream, and reloading nginx. The logic for that is as follows:

from fabric.api import cd, env, parallel, sudo


@parallel(pool_size=20)
def post_deploy():

    #
    # deploy the app to the node
    sudo("/sbin/deploy-blog")

    with cd("/etc/nginx/backends.d"):
        #
        # remove the down flag from every backend
        sudo("sed -i -r -e 's/ down;/;/g' {f}".format(f=env.backend_file))

        #
        # mark the old color's port as down so only the new live set gets traffic
        # (blue ports end in 0, green ports end in 1)
        if env.color_new == "blue":
            sudo("sed -i -r -e 's/1;/1 down;/g' {f}".format(f=env.backend_file))
        else:
            sudo("sed -i -r -e 's/0;/0 down;/g' {f}".format(f=env.backend_file))

    sudo("service nginx reload")

    #
    # keep only the five most recent releases on disk
    with cd("/srv/blog/releases"):
        sudo("rm -rf $(ls -t | awk 'NR>5')")

This is why we use the 0 and 1 convention for nodePorts: the sed commands only need to know which digit a port ends in to flip which color is live. The logic is pretty simple.
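
For example, if green is currently live and we deploy blue, blog.conf goes from this:

upstream blogLB {
  server 127.0.0.1:30000 down;
  server 127.0.0.1:30001;
}

to this:

upstream blogLB {
  server 127.0.0.1:30000;
  server 127.0.0.1:30001 down;
}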

Output from our deployment tool looks like this:

Collecting hosts from  us-east-1
--- Starting deploy for blog in blog ---
    - Deploying to replica color: green
--- Deploying green replication controller ---
     - Using the failback replica count of 1 for blog
     - Creating the new replication controller now for green at version 20151025.143403
     - Got the following return code: 201
     - Successfully deployed the new replication controller
     - Waiting for the pods to come online fully before moving on
     - Waiting for all pods to come online. 1/1 are now online
     - All pods are online!
--- Running post deploy now ---
[10.122.43.70] Executing task 'post_deploy'
[10.122.41.136] Executing task 'post_deploy'
[10.122.41.136] sudo: /sbin/deploy-blog
[10.122.43.70] sudo: /sbin/deploy-blog
[10.122.41.136] out: s3://domain-deploy/blog/blog.tgz -> ./blog.tgz  [1 of 1]
[10.122.41.136] out: s3://domain-deploy/blog/blog.tgz -> ./blog.tgz  [1 of 1]
[10.122.41.136] out:
[10.122.43.70] out: s3://domain-deploy/blog/blog.tgz -> ./blog.tgz  [1 of 1]
[10.122.43.70] out: s3://domain-deploy/blog/blog.tgz -> ./blog.tgz  [1 of 1]
[10.122.43.70] out:
[10.122.43.70] out:      4096 of 12106639     0% in    0s    54.91 kB/s
[10.122.43.70] out:  12106639 of 12106639   100% in    0s    16.29 MB/s  done
[10.122.41.136] out:      4096 of 12106639     0% in    0s   103.34 kB/s
[10.122.41.136] out:  12106639 of 12106639   100% in    0s    12.89 MB/s  done
[10.122.41.136] out:

[10.122.41.136] sudo: sed -i -r -e 's/ down;/;/g' blog.conf
[10.122.43.70] out:

[10.122.43.70] sudo: sed -i -r -e 's/ down;/;/g' blog.conf
[10.122.41.136] sudo: sed -i -r -e 's/0;/0 down;/g' blog.conf
[10.122.43.70] sudo: sed -i -r -e 's/0;/0 down;/g' blog.conf
[10.122.41.136] sudo: service nginx reload
[10.122.43.70] sudo: service nginx reload
[10.122.41.136] out: Redirecting to /bin/systemctl reload  nginx.service
[10.122.41.136] out:

[10.122.41.136] sudo: rm -rf $(ls -t | awk 'NR>5')
[10.122.43.70] out: Redirecting to /bin/systemctl reload  nginx.service
[10.122.43.70] out:

[10.122.43.70] sudo: rm -rf $(ls -t | awk 'NR>5')
--- Running the cleanup ---
     - Pausing for 30 seconds to let connections drain from the old replication controller
     - Deleting old replication controller
     - Deleting old pods in the replication controller
     - Setting the current color status for the deployment
-- DEPLOY DONE --

Done.

This process has worked out extremely well for us and has really helped the workflow from development to production. It also gives us a lot of control over deployments, since we have the option to verify a deployment: if blue is currently in production and green comes online, the deployment will pause and notify a Slack room with a URL to test. The team can test the production release and then issue a command to finish the deployment, which tears down the old replication controller and puts the new one into service.
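
The notification itself can be as simple as posting to a Slack incoming webhook; a hypothetical sketch of that step (the webhook URL, function name, and message format are illustrative):

import requests

# hypothetical incoming-webhook URL; the real one would live in the deploy tool's config
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def notify_verify(app, color, test_url):
    # ping the room with the URL for the new (not yet live) color so the team can verify it
    requests.post(SLACK_WEBHOOK, json={
        "text": "{0} {1} is online for verification: {2}".format(app, color, test_url),
    })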

If you have any questions about this process, I encourage you to join the great DevOps Engineers Slack community, hop into the #kubernetes room, and ping Mike Zupan.