Deploying a multi-environment Ruby on Rails site to DigitalOcean Kubernetes

This guide is focused on a small, single-instance server. Initialisation logic in particular is probably not safe against horizontal scaling. Beware! 🙂

This guide will go through:

  • Setting up your development environment for Kubernetes deployment.
  • Dockerising your Rails application.
  • Setting up a container registry for deploying images to.
  • Setting up Kubernetes for your Rails project, with a focus on supporting multiple environments deployed in production (e.g. a staging and production instance).
  • Setting up Postgres instances for each of your environments.
  • Setting up a single Nginx ingress for your cluster which allows ingress to all your environments (which are otherwise isolated).
  • Setting up TLS with cert-manager, as well as a current caveat with the Nginx ingress and how you can avoid it.

Running Ruby on Rails in production can be one of the most hair-pulling steps to getting your new application up and running, especially in contrast to how elegant most of the process of writing a Rails application is.

One of Kubernetes’ biggest benefits is how it allows you to scale applications and leverage the power of the cloud. Similarly nice is that it lets you write declarative (as opposed to imperative) configuration for your services, rather than managing a VPS yourself, with all the trouble that entails. You can free yourself from manual iptables/ufw management, worry less about things like what starts your service and restarts it if it crashes, and develop skills that come in useful in modern cloud-based businesses.

All that said, it presents its own difficulties. I ran into quite a few hold-ups, ranging from certificate issuance to serving static files from Rails through Nginx.


First, you’ll want to make sure you have a local Kubernetes development environment with Kustomize installed. If you’re on macOS that’s as easy as running:

brew install kubectl   # Kubernetes' CLI
brew install kustomize # Fantastic templating engine
brew install doctl     # DigitalOcean CLI

You’ll also need to install Docker itself.

You’ll then want to set up your Kubernetes cluster in DigitalOcean. I went with a simple two-$10-node setup. Keep in mind you’ll also need a load balancer (currently $10/month), a container registry, as well as a bunch of persistent volumes. The latter aren’t hugely expensive, but will likely add up to a couple of dollars a month.

Dockerising Rails

If you don’t know much about Docker, it’s worth having a quick read up on it. But in short, Docker allows you to generate portable images of your application with batteries included, which can then be pushed to a container registry, which allows you to run them inside Kubernetes pods.

Docker containers are specified by a Dockerfile. Most commands generate a new layer, and layers are composed together to create the final image. Docker has intelligent caching, which means it’s best to put things that don’t change much (like system library installs for Nokogiri) first, and things that change often (such as your application’s files) further down.

Here’s my Dockerfile, which may help you dockerise your application. Full disclaimer – there may be better ways to do it – but I’ve found this to work quite well. You’ll also need to substitute your application’s ‘name’ where I’ve written <APPLICATION NAME>, in a format that works as a folder name. If you choose to use this, you’ll need to save it as a file called Dockerfile at the root of your Rails project.

FROM ruby:2.7.0

RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
RUN echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list
RUN apt-get update -qq && apt-get install -y build-essential nodejs yarn

# Postgres
RUN apt-get install -y libpq-dev

# Nokogiri
RUN apt-get install -y libxml2-dev libxslt1-dev

# Capybara-webkit
RUN apt-get install -y libqt4-dev xvfb

ENV APP_HOME /<APPLICATION NAME>
WORKDIR $APP_HOME

ADD Gemfile* $APP_HOME/
RUN bundle config build.nokogiri --use-system-libraries
RUN bundle install

ADD . $APP_HOME

# Dummy value to satisfy Rails
ENV SECRET_KEY_BASE=DUMMY

# You can still run non-production environments from
# this Dockerfile, but this makes sure assets are compiled
# targeting production.
ARG RAILS_ENV=production
RUN yarn install --check-files
RUN bundle exec rake assets:precompile RAILS_ENV=production

Notice that we set SECRET_KEY_BASE=DUMMY. We will be deploying our Rails master key as a Kubernetes secret later, but sadly rake assets:precompile currently expects it to be around due to a dependency within that command, even though it doesn’t use it for anything. As a result, setting it to a dummy variable allows everything to run smoothly.

One more thing – notice that we specifically add Gemfile* (i.e. both Gemfile and Gemfile.lock) separately to everything else. This is because our application as a whole probably changes a lot more often than our Gemfile does. By ordering our Dockerfile like this, Docker can cache the layers involved in installing and setting up gems and avoid doing it every time something in your application changes.

Once your Dockerfile is set up, running docker build . should work. If so, you’re ready to continue (though you may also want to actually run the image to check everything is set up right; that’s out of the scope of this guide).

Setting up a container registry

Just like source code is best pushed to a source control repository, containers are best served col–.. er, I mean, in a container registry. This allows Kubernetes to pull them down and centralises your application’s runnable images.

DigitalOcean has a private container registry system in beta right now. You can set one up under Images -> Container Registry. Once that’s done, you’ll need to run doctl registry login in a terminal, which will set your Docker CLI up to be able to push to your container registry.

Once done, try it out. Your previous docker build (or just run docker build . now if you haven’t already) should have given you a hash at the end, for example it might look like:

Successfully built b952cefba0ac

You can use that hash to tag and push an image, as follows (fill in your registry and image names, and your own image ID):

DOCKER_IMAGE_ID="b952cefba0ac"
IMAGE="registry.digitalocean.com/<REGISTRY>/<IMAGE>:0.0.0"

docker tag "$DOCKER_IMAGE_ID" "$IMAGE"
docker push "$IMAGE"

Setting up your Kubernetes cluster

You can set up your Kubernetes cluster using Terraform, but for this guide I suggest doing it in the UI. Note that currently during the early access, DigitalOcean seems to limit container registries to Amsterdam (AMS3). If so, it’s probably worth colocating your Kubernetes cluster in the same region if you don’t have a good reason not to. Use the latest Kubernetes version, and customise your Kubernetes cluster however you like. Personally, I went with two small ($10) nodes.

Then you’ll want to set up your kubectl CLI to be able to access the cluster. That’s pretty easy:

doctl kubernetes cluster kubeconfig save <CLUSTER NAME>

Deploying your Rails application

Now we’ll deploy our Rails application. While setting up, you’ll probably want to hard-code your ApplicationController to show a maintenance page, and perhaps even use a subdomain for the time being.

First, you’ll need to set up your Kubernetes configuration. Make a folder structure as follows (with empty files for now):

k8s/
├── certificate.yaml
├── base/
│   ├── application.yaml
│   ├── database.yaml
│   └── kustomization.yaml
└── overlays/
    └── prod/
        ├── application.yaml
        ├── ingress.yaml
        ├── namespace.yaml
        └── kustomization.yaml
Recall your application name you used earlier for your Docker folder name. You needn’t use the same name for your Kubernetes labels, but it’s probably best to be consistent so I’ll be assuming you are doing so.

Within the base folder, you set up the basics shared between all of your deployed environments, so we’ll start with application.yaml:

apiVersion: v1
kind: Service
metadata:
  name: <APPLICATION_NAME>
spec:
  type: ClusterIP
  selector:
    app: <APPLICATION_NAME>
  ports:
  - name: rails
    port: 80
    targetPort: 8080
  - name: assets
    port: 81
    targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <APPLICATION_NAME>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <APPLICATION_NAME>
  template:
    metadata:
      labels:
        app: <APPLICATION_NAME>
    spec:
      volumes:
      - name: public-assets
        emptyDir: {}
      initContainers:
      - name: init-static-files
        image: registry.digitalocean.com/<REGISTRY>/<IMAGE>
        # Copy the image's built public/ folder onto the shared volume
        command: ["sh", "-c", "cp -r /<APPLICATION_NAME>/public /public/"]
        volumeMounts:
        - name: public-assets
          mountPath: /public
      - name: db-migrate
        image: registry.digitalocean.com/<REGISTRY>/<IMAGE>
        command: ["bin/rails"]
        args: ["db:migrate"]
      - name: db-seed
        image: registry.digitalocean.com/<REGISTRY>/<IMAGE>
        command: ["bin/rails"]
        args: ["db:seed"]
      containers:
      - name: <APPLICATION_NAME>
        image: registry.digitalocean.com/<REGISTRY>/<IMAGE>
        command: ["bin/rails"]
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: public-assets
          mountPath: /<APPLICATION_NAME>/public
          subPath: public
        env:
        - name: RAILS_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: rails-master-key
              key: key
        - name: DATABASE_USERNAME
          value: postgres
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: rails-db-key
              key: key
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - name: public-assets
          mountPath: /usr/share/nginx/html
          subPath: public
Replace <APPLICATION_NAME> with your application name throughout, and <REGISTRY> and <IMAGE> with your DigitalOcean registry name and image name throughout. Do not include a version on your images – Kustomize will handle that for us later on.

There’s a lot to unpack here. We’ve included both a Service and a Deployment in the same file, although you can split it into two files if you so wish. The triple hyphen in YAML separating the two definitions is essentially a “file break”.

First off, the deployment. We’re running our Rails server on port 8080, and an Nginx server on port 80. These are pod-specific ports and won’t be exposed to the internet, don’t worry. They’ll be used in our networking within the cluster.

The most confusing thing going on here is how we’re managing public asset serving. There are certainly better ways to do this than what I’ve done here – such as pushing your static assets to a CDN, an S3 bucket, or a DigitalOcean Space – but this is a fairly simple approach that works pretty well. We make use of the fact that our built image has all our public assets sitting nicely in the public/ folder. We create a volume called public-assets, which is mounted to both our Nginx container (which actually serves the static assets) and our application container. We then abuse Kubernetes’ support for init containers – containers which run sequentially before your application’s containers – by adding one that runs your application’s image and copies all the public files onto the public volume mount.

This trick actually works slightly better in docker-compose instead of Kubernetes, as you can mount a shared volume onto an existing folder to automatically include the files in that folder. Sadly, it doesn’t appear to be possible in Kubernetes, but this gets around that limitation, albeit not incredibly elegantly.

We also run two other init containers: one to migrate our database, and another to seed it. I’m assuming your db:seed operation is coded to be idempotent – that is to say, running it multiple times has the same effect as running it once. This is generally good practice because it means new seed data (such as for a table you add in a later migration) gets applied automatically. If your seeding is not idempotent, you’ll want to remove the relevant init container and seed manually, the same way we do the database setup below.
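To make “idempotent” concrete, here’s a toy sketch in plain Ruby (not Rails – in a real db/seeds.rb you’d reach for something like Model.find_or_create_by! to get the same property): running seed! any number of times leaves the “table” in the same state as running it once.

```ruby
# Rows we always want present. Names here are purely illustrative.
SEED_ROWS = [
  { name: "admin" },
  { name: "guest" },
].freeze

# Idempotent seeding: only insert rows that are missing, so a re-run
# (as our db-seed init container does on every deploy) is harmless.
def seed!(table)
  SEED_ROWS.each do |row|
    table << row unless table.include?(row)
  end
  table
end

table = []
seed!(table)
seed!(table) # running again changes nothing
```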

Note that we do not set up arguments to our Rails command to set the environment and port; don’t worry, that will be in the environment-specific configuration to come.

Next we set up database.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: do-block-storage
---
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  selector:
    app: postgres
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:12.4
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: rails-db-key
                  key: key
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: postgredb
              subPath: postgres
      volumes:
        - name: postgredb
          persistentVolumeClaim:
            claimName: pvc

This sets up a persistent volume claim which will automatically set up a 5Gi DigitalOcean volume for you, and attaches it to the Postgres database which it also sets up. Nice and simple.

This is a good time to note that this guide does not cover exporting metrics and logs – you won’t get any warning when your database is getting full, or when it’s erroring. That’s something you’ll want to set up afterwards as part of productionising.

We’ll refer to a cluster issuer from our ingress shortly, but first let’s fill in the kustomization.yaml within base/:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - application.yaml
  - database.yaml

We’re getting close now, but there are still a few more pieces to slide into place. Next, we set up a ClusterIssuer, which is one of the resources provided by cert-manager (which we’ll install into our cluster shortly), inside certificate.yaml at the top level of k8s/:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Email address used for ACME registration
    email: <YOUR EMAIL>
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Name of a secret used to store the ACME account private key
      name: tls-key
    solvers:
    # Add a single challenge solver, HTTP01 using nginx
    - http01:
        ingress:
          class: nginx

Make sure you replace your email. cert-manager automatically manages our TLS certificate renewal for us. The ingress we’ll write shortly references the cluster issuer above in its annotations, which will cause certificates to be issued for its hosts automatically.

Notice that this file is not contained within the base folder. This is because you only need a single ClusterIssuer in a cluster, and it will work across all Kubernetes namespaces. If you prefer to have an issuer per environment, you can instead move it in, add it to the Kustomization file, and change it from a ClusterIssuer to an Issuer (the rest of the file can remain the same).
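If you do go the namespaced route, only the kind changes; the file would begin like this sketch, with the rest of the spec identical to the ClusterIssuer above:

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
# spec: ... as before
```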

Next, we set up our individual environments.

First, ingress.yaml (inside overlays/prod/):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - <DOMAIN NAME>
    secretName: tls-key
  rules:
  - host: <DOMAIN NAME>
    http:
      paths:
      - path: /assets
        backend:
          serviceName: <APPLICATION_NAME>
          servicePort: 81
      - path: /packs
        backend:
          serviceName: <APPLICATION_NAME>
          servicePort: 81
      - path: /
        backend:
          serviceName: <APPLICATION_NAME>
          servicePort: 80

Notice that our ingress rules set up the public folders to forward to port 81 (the Nginx file server) on our application service, and everything else to our Rails backend on port 80.

Next, the environment-specific application.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: <APPLICATION_NAME>
spec:
  template:
    spec:
      containers:
        - name: <APPLICATION_NAME>
          args: ["server", "--environment", "production", "--port", "8080"]

Kustomize will merge this with our top-level base deployment; all we’re doing here is adding the argument list to set the environment. You may prefer to do this through an environment variable instead.

Next, namespace.yaml – which is pretty simple, it just sets the namespace up for this environment of our application:

apiVersion: v1
kind: Namespace
metadata:
  name: <APPLICATION_SHORT_NAME>-prod

You’ll want to switch out -prod accordingly for each environment. You’ll probably want <APPLICATION_SHORT_NAME> to be something quick and easy to type, like the initials of your website.

And, finally, kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: <APPLICATION_SHORT_NAME>-prod

resources:
- namespace.yaml
- ingress.yaml
- ../../base

patchesStrategicMerge:
- application.yaml

Make sure your namespace matches what you previously created.

Now we’re done setting up our configuration! Onto preparing our cluster…

Preparing your cluster for deployment

There are two things you’ll need to set up in your cluster: an Nginx ingress controller, and cert-manager. These commands should get them both set up nicely:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install ingress-nginx ingress-nginx/ingress-nginx \
           --set controller.publishService.enabled=true \
           --set-string controller.config.use-forwarded-headers=true,controller.config.compute-full-forwarded-for=true,controller.config.use-proxy-protocol=true \
           --set-string controller.service.annotations."service\.beta\.kubernetes\.io/do-loadbalancer-enable-proxy-protocol"=true

helm install cert-manager jetstack/cert-manager \
           --namespace cert-manager \
           --create-namespace \
           --version v1.0.1 \
           --set installCRDs=true

This will set up a LoadBalancer on DigitalOcean which you will be automatically billed for. There is no good way around this that gives you a reliable static IP that I am aware of, even if you don’t think you need the full power of a load balancer. That said, it’s reasonably affordable – currently $10/month – and should allow you to scale quite a bit before causing any problems.

Go to your load balancer in DigitalOcean, and in its settings turn on PROXY protocol. We’ve set up the Nginx ingress above so that it will use PROXY protocol, which means your Rails app will be able to see your users’ real IPs. Otherwise, all of your connections will appear to be coming from your load balancer… Not ideal! And it might make for some very interesting demographic conclusions: all of our users seem to live in the same house in Amsterdam!

Finally, you also need to deploy your certificate.yaml file, which sits outside the environment overlays (unless you decided not to use a ClusterIssuer). You can do that as follows, from your Rails root:

kubectl apply -f k8s/certificate.yaml

The first deploy

Now you’re ready to deploy your application for the first time.

First, you’ll want to set your image version. We discussed earlier how to tag and push an image, and I gave commands for pushing version 0.0.0. If you didn’t do that, go back and do it now. Then, you can run the following commands within the overlays/prod directory – make sure you fill in the three variables at the top first:



DOCKER_IMAGE="registry.digitalocean.com/<REGISTRY>/<IMAGE>"
VERSION="0.0.0"
VERSIONED_DOCKER_IMAGE="${DOCKER_IMAGE}:${VERSION}"

kustomize edit set image "${DOCKER_IMAGE}=${VERSIONED_DOCKER_IMAGE}"

The interesting thing here is kustomize edit set image. What it does is add an entry to your kustomization.yaml so that the image version is set to 0.0.0 everywhere your image is referenced, which makes it super easy to change version later – just this one kustomize command. You can also add or configure the relevant kustomize configuration by hand, but this command is super useful for building more reliable automation flows.
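For instance, after running the command above with version 0.0.0, your kustomization.yaml gains a stanza along these lines (registry and image names are placeholders):

```yaml
images:
- name: registry.digitalocean.com/<REGISTRY>/<IMAGE>
  newTag: 0.0.0
```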

Once you’ve done that to set it to version 0.0.0, or whatever version you’ve chosen to deploy first, you’re finally ready to deploy your application to Kubernetes.

Run this from your Rails root (or anywhere else and adjust the path accordingly):

kustomize build k8s/overlays/prod/ | kubectl apply -f -

And, boom! Your application is deployed. But you won’t be able to access it just yet. First things first, run kubectl get services to find your load balancer’s external IP. If you visit that IP, you should get an Nginx error: it doesn’t know what to do with you, because all it knows how to do is route your domain name. So we’ll set that up next. You may have noticed your application’s service is not visible in the results of that command. That’s because it’s deployed to a separate namespace – don’t worry.

Take that external IP, and configure your DNS’ A record to point to it. It might take a little while to propagate. If you use your ISP’s default DNS (if you don’t know what that means, you probably do), consider setting up Cloudflare’s or Google’s. They’re free, easy to set up, and will likely make your browsing faster and more reliable, as well as stopping your ISP hijacking your DNS. In this case, it also means you should see your domain update instantly!

There’s still some things left to do: your database isn’t set up yet, so your initialisation containers will be failing to run migrate and seed, and your TLS certificate won’t be working yet, but more on that soon…

Secret setup

You need to set up two secrets: a database secret, and your Rails master key.

The files above assume these are stored in rails-db-key and rails-master-key. You need to push these to the right namespace, which I recommended calling <APPLICATION_SHORT_NAME>-prod, but you may have called it something else. Run the commands as follows, using a random password for your DB key:

kubectl create secret generic rails-db-key --namespace <NAMESPACE> --from-literal="key=<RANDOM PASSWORD>"
kubectl create secret generic rails-master-key --namespace <NAMESPACE> --from-literal="key=<YOUR RAILS MASTER KEY>"

And your database needs a first-time setup. That’s an easy fix:

kubectl run -it --rm db-setup --namespace <NAMESPACE> --image=<YOUR RAILS IMAGE PATH WITH VERSION> -- bash

This will give you a bash terminal inside a container running your Rails image. Just run the usual:

RAILS_ENV=production bin/rails db:setup

Quit the container with exit, and it will automatically get recycled (since we passed the --rm flag). Now your application service should automatically boot up, connect to your database fine, and be working… In HTTP at least…

About those certificate errors…

Now, cert-manager automatically sets up TLS certificates – but it won’t be working right now. For reasons that seem to be under active work by the Kubernetes folks in collaboration with the various cloud providers, cert-manager cannot complete its self-check on ACME challenges while the PROXY protocol is in place. As I understand it, the self-check traffic never leaves the cluster, so it doesn’t pass through the load balancer, arrives without the PROXY protocol headers set up, and gets rejected by the ingress (I may be misunderstanding, but I think this is the gist of it).

It’s a pretty easy fix, but it’s potentially disruptive: disable the PROXY protocol, delete the certificate to prompt cert-manager to try again (it would do so in due course anyway, but it’s faster to force it), then re-enable once TLS is working. This means that for a small period of time once every 90 days (the default certificate lifetime) you will need to either have scheduled downtime or accept the loss of client IP address resolution in your Rails app.

If you truly don’t care about client IP resolution, you can avoid using the PROXY protocol altogether, but I don’t recommend this: IP addresses can be very useful for all sorts of things, not least of all post-incident security analysis.

Anyway, you can do that as follows:

NAMESPACE="<APPLICATION_SHORT_NAME>-prod" # Or whatever you named your namespace

echo "Disabling proxy protocol, must also be disabled on DigitalOcean load balancer"
helm upgrade ingress-nginx ingress-nginx/ingress-nginx --set controller.publishService.enabled=true --set-string controller.config.use-forwarded-headers=true,controller.config.compute-full-forwarded-for=true

echo "Deleting existing certificate"
kubectl delete certificate --all --namespace "${NAMESPACE}"
echo "Sleeping while certificates refresh..."
sleep 15

echo "Re-enabling proxy protocol"
helm upgrade ingress-nginx ingress-nginx/ingress-nginx --set controller.publishService.enabled=true --set-string controller.config.use-forwarded-headers=true,controller.config.compute-full-forwarded-for=true,controller.config.use-proxy-protocol=true --set-string controller.service.annotations."service\.beta\.kubernetes\.io/do-loadbalancer-enable-proxy-protocol"=true

Make sure you disable PROXY protocol on your DigitalOcean load balancer settings (on the DigitalOcean website) beforehand and re-enable it afterwards. The sleep 15 is likely to be far more time than is actually necessary; you can refresh your website in HTTPS and run the final helm command and adjust the load balancer to re-enable PROXY protocol after the certificate has been issued if you like.


You can add new environments super simply – just copy the overlays/prod folder to, for example, overlays/staging, and then adjust the files within it accordingly: the namespace (both in namespace.yaml and in kustomization.yaml), the Rails environment flag, and the hostnames in the ingress settings. You’ll need to repeat everything from secret setup onwards for the new environment, but it should mostly be familiar by now.

Note that because database migration and seeding are done in init containers, they probably aren’t safe to run concurrently, so you can’t just increase the replica count as you’d normally want to with Kubernetes. You’ll need to configure something more careful to be safe against this, sadly. You could have your deployment script shell into the cluster and run db:migrate whenever you run a deploy, for example, or have some fancy CI/CD solution doing it all for you.
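As a rough sketch of that first option (names and versions are placeholders; this assumes your image contains bin/rails and the secrets already exist in the namespace):

```shell
#!/bin/sh
# Hypothetical deploy step: run migrations exactly once per deploy,
# from outside the cluster, instead of in every pod's init containers.
set -e

NAMESPACE="<APPLICATION_SHORT_NAME>-prod"
IMAGE="registry.digitalocean.com/<REGISTRY>/<IMAGE>:<VERSION>"

kubectl run db-migrate --rm --attach --restart=Never \
  --namespace "$NAMESPACE" --image "$IMAGE" \
  -- bin/rails db:migrate RAILS_ENV=production
```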


I hope this guide has been of use to someone – it took a lot of trial and error to get this working properly and I thought it might be valuable to share. However, I’m very open to feedback to improve this! Please feel free to drop comments with any problems you ran into, constructive criticism, or even just a hello if it helped :).

Graphick – Simple(r) graphing

For one of my courses, I’ve been drawing a load of graphs of a program’s performance; something I also had to do in a course last year. To say the least, it’s a bit of a nightmare.

What stood out to me was that most of the time, the process I was following was almost mechanical — run an application a bunch of times with different inputs, save a portion of the output somewhere, and then either write a script to parse the CSV and generate a graph, or throw it into Excel and generate one by hand. Sometimes I’d throw together a quick shell script to generate the initial data too, but either way it was a lot of context switching between different languages and if I wanted to regenerate the data after a change I also had to mess around to make sure the graph was redrawn.

As we know, though, everything is improved with large configuration files! When in history has a project started with a configuration file, and gradually become more and more complicated? Never, of course!

As a result, I thought it would be fun and helpful to develop a reasonably general utility for creating line graphs to analyse program data – whether it be temporal data for performance analysis, or just plotting the output with varying inputs.

Motivating spoiler: A graph which I generated for my coursework using Graphick

I set out with a few goals, and picked up a few more along the way:

  • The syntax should be easy to write; preferably nicer than gnuplot’s
  • It should be able to handle series of data — multiple lines
  • Any output data or input variable should be able to be on the X axis, the Y axis, or part of the series
  • The data series should be able to be varied by more than one variable – i.e. you might have, as depicted in the picture above, four lines which represent varying two different variables.
  • It should be extensible, so it can support new ways of data processing and rendering easily.
  • Data should be cached, so if the same graph is drawn it can be redrawn without re-running the program
    • Ideally, cache per-data-point, but the current implementation of Graphick just caches per-graph based on the variables and program. This can definitely be implemented in future though.

After a week of hacking, Graphick is the result. Graphick parses a simple file and generates graph(s) based on it.

Here’s a simple Graphick file:

command echo "6 * $num" | bc
title Multiples of six

varying envvar num sequence 1 to 10
data output

When you run Graphick with a file like this, it will proceed to generate the data to graph (or load it from a cache if it has previously generated it) by running the program for every combination of inputs.

Each line of a Graphick file, besides blank lines and comments (beginning with a %), represents some kind of directive to the Graphick engine. The most important directive is the command directive, which begins a new graph based on the command following it.

The text after command is what is executed. In this case, it’s actually a short shell script which pipes a short mathematical expression into bc, a standard calculator utility on Unix. Most of the time, you’ll probably write something more like command ./myApplication $variable.

There are a number of ‘aesthetic’ directives – title, x_label, y_label, series_label. The only complicated one is series_label, which I’ll go into later. For the rest, the text following the directive is simply put where you’d expect on the graph.

The varying and data directives are the most important. varying allows you to specify which variables to run the program for. If you have two variables which each have six values, the program will be run with every combination of them — thirty-six times. Right now, only environment variables are supported. You write varying envvar <name> <values>. Values can either be a numeric sequence (as in the above example) or a set of values; for example, sequence 1 to 5 or vals 1 2 4 8 15.
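For instance, these two varying directives (the variable names are hypothetical) give six values each, so the command runs thirty-six times, once per combination:

```
command ./myApplication
varying envvar threads vals 1 2 4 8 16 32
varying envvar size sequence 1 to 6
data output
```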

Data is the other important one. Only output is supported, currently, which corresponds to lines of stdout. You can also filter for columns, by adding a selection after the directive – for example, data output column 2 separator ,. This would get the second comma-separated column.

Another type of directive, which isn’t featured in this example, is filtering. If you have a program which outputs lots of lines, and you only care about a certain subset of them, you can filter them. There is more detail on this in the repository README, but suffice to say you can filter for columns of output data to be either in or not in a set of data, which can be defined either as a sequence or a set of values. The columns you filter on need not be selected as data, which means you can filter on data which isn’t presented on the graph.

Graphick files can contain multiple graphs by just adding more command directives. Currently, there is no way to share directives between them, so properties like title need to be set for each graph. Here’s an example of two graphs in a single file:

command echo "6 * $num" | bc
title Multiples of six
output six.svg

varying envvar num sequence 1 to 10
data output

command echo "12 * $num" | bc
title Multiples of twelve
output twelve.svg

varying envvar num sequence 1 to 10
data output

As you can see, there’s no need to do anything except add a new command directive – Graphick automatically associates each directive with the most recent command.

As an example of generating more complicated graphs, the graph I featured at the start of this post, which was for my coursework, was generated as follows:

command bin/time_fourier_transform "hpce.ad5615.fast_fourier_transform_combined"
title Comparison of varying both types of parallelism in FFT combined
output results/fast_fourier_recursion_versus_iteration.svg
x_label n
y_label Time (s)
series_label Loop K %d, Recursion K %d

varying series envvar HPCE_FFT_LOOP_K vals 1 16
varying series envvar HPCE_FFT_RECURSION_K vals 1 16

data output column 4 separator ,
data output column 6 separator ,

As you can see, adding the series modifier allows you to turn the variable into data which is used to plot lines, rather than as part of the X/Y axis. There must always be two non-series data sources (where a data source is either a data directive or a varying directive), and the first one always represents the X axis (the second the Y axis). You can have any number of series data sources, which combine in all combinations to create lines. In this graph, both variables take the values one and sixteen, to create four lines in total. The series_label directive takes a format string. The n-th formatting template (both %d in the string) indicates to put the value used for the n-th series variable at that position in the label.

Finally, there is one more directive which is useful: postprocessing. Postprocessing directives allow you to run arbitrary Ruby code to process the resultant data before it is rendered on the graph. Currently, only postprocessing the y axis is supported, but it would be straightforward to add support for postprocessing the x axis and series data. Postprocessing directives are fed three variables – x, y, and s. x and y are the corresponding values for each data point, and s is an array of all series values at that point, ordered by their definition in the file. For example, if you wanted to normalise the y axis by a certain value, you might do this:

postprocess_y y / 2

Or, you might want to divide it by x and add a constant:

postprocess_y y / x + 5

Imagine the postprocess_y directive as being prefixed with "y =" and this syntax should be reasonably intuitive.
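One way to picture this is each postprocessing expression being wrapped in a Ruby lambda over x, y and s – a sketch of the idea, not Graphick's actual implementation:

```ruby
# Hypothetical evaluation of "postprocess_y y / x + 5": each data point
# is passed through a lambda taking x, y and the series values s.
postprocess_y = ->(x, y, s) { y / x + 5 }

points    = [[1, 10], [2, 10], [5, 10]]            # raw (x, y) pairs
processed = points.map { |x, y| [x, postprocess_y.call(x, y, [])] }
# Each y has been divided by its x and had 5 added.
```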

So, in summary, Graphick is a reasonably powerful tool for generating graphs of program output. You can plot multiple columns of the output, or run the program multiple times to generate multiple outputs — or even a combination of both! Graphick should handle whatever you throw at it much as you’d expect.

If you come across this and have feature requests, open a GitHub issue on the repository, or leave a comment on this post, and I’ll definitely consider implementing them – especially if they seem widely useful.

Array length in high-level languages

While pondering writing a standard library for a language I’ve written for my RPG engine, I’ve been stuck on a question I find pretty interesting – how can one allow the standard library of a language, and ideally only the standard library, to perform special operations, like direct memory accesses? In my case the specific operation is array length accesses, but I was hoping for a more general philosophy.

First, let’s consider the problem of getting the length of an array. In my language, as in many, arrays are a first-class construct, and their allocated length is written in the first word of the array in memory. As a result, getting the length of the array involves reading the word of memory pointed at by the array’s reference.

For example, suppose we had an array of five integers. This would occupy six words of memory: the initial word set to the length (5), and then the five elements, all initialised to zero.
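A toy Ruby model of that layout, using an array as pretend word-addressed memory (alloc_array is a made-up name for illustration):

```ruby
# Toy model of the array layout described above: word 0 holds the length,
# words 1..5 hold the five zero-initialised elements.
def alloc_array(memory, length)
  base = memory.size           # "address" of the new array
  memory << length             # first word: the length
  length.times { memory << 0 } # then the elements themselves
  base
end

memory = []
arr = alloc_array(memory, 5)
memory[arr]   # the hidden length word, 5
memory.size   # six words in total
```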

This is where the problem arises for my standard library: the language allows users to index the array in a fairly typical manner – x[0] through x[4] – but these do not correspond directly to the memory accesses (in fact, the memory accesses are all shifted upwards by one to skip the length). As a result, it is not possible to access the array’s length directly through the language.

A trivial solution would be to implement some special new syntax – length x – which generates a direct memory read at the array’s address, hence returning the length. But that’s no fun – it makes the parser more complicated, adds a special case to code generation, and causes what I would call a “break in immersion” when coding – it’s one more thing that isn’t intuitive and natural to users, who can do array.sort() but not array.length(). Taking this vein of thought further, we could instead parse it as normal, and hijack it during code generation – if we’re generating code for a method call named length on an array, we don’t generate a classic method call, but instead directly output memory-access code.

This approach has many benefits: it’s trivial to implement, doesn’t add any special cases to parsing (just code generation), and doesn’t increase the mental load for users too much. Essentially, to end users, this is a fairly seamless approach, but it still leaves something to be desired – now some array logic, such as sorting, is encoded in higher-level code, but some is hard-coded into the compiler.
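A minimal sketch of that hijack inside a hypothetical code generator – the function name, register names and label scheme are all invented for illustration:

```ruby
# Sketch of special-casing `length` during code generation: a length call
# on an array becomes a direct load of the word at the array's address,
# while every other call falls through to the normal dispatch path.
def generate_method_call(receiver_type, method_name, receiver_reg)
  if receiver_type == :array && method_name == "length"
    ["LDR r0, [#{receiver_reg}]"]            # read the length word directly
  else
    ["BL o_#{receiver_type}_#{method_name}"] # ordinary method call
  end
end

generate_method_call(:array, "length", "r4") # => ["LDR r0, [r4]"]
generate_method_call(:array, "sort", "r4")   # => ["BL o_array_sort"]
```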

Maybe that’s an acceptable loss, but I was still interested in how other languages had solved this problem, so I looked into how Java and C# solved this issue.


Java seems to solve this with a dedicated JVM instruction called arraylength – this is along the lines of what I described above, where the compiler hijacks what syntactically looks like a field access. Syntactically it is next to identical to an ordinary field, but you can use reflection to prove that it’s not actually a field.


C# seems to take a very similar approach to Java (unsurprisingly, given the similarity between the two), with a CIL instruction ldlen (this article is a goldmine for related information).


I had really intended to look into quite a few more languages – specifically Python, Ruby and Lua – but didn’t really have time; digging through the Python compiler to find the answer was taking me quite a long time. If anyone stumbles upon this and happens to know how they handle it, I’d love a comment.

It does seem like the mainstream approach is just a special case in code generation, though. Personally, I was expecting an approach where verified library code could contain lower-level code (like inline assembly in C) to avoid this, but in retrospect that seems like quite an overkill feature.

RPG engine development blog 1 – Inventory

I thought I’d start posting updates on my RPG engine’s development, as it’s nearing a point where a simple game could actually be built on it, and it’s nice to discuss it somewhere. While many features have already been implemented – dialog, maps and movement, NPCs, among others – there’s still a lot to do: dialog is still not as dynamic as I’d like, the scripting language isn’t great so far, combat doesn’t exist, items barely exist, there’s no way to change area, etc. If you’re interested, more detailed documentation is available in its README on GitHub, or on my projects page.

My first goal was to get some way of visualising the inventory. I’d already built a GUI system for the dialog, so I leveraged it and improved it where necessary. There wasn’t a way of having a GUI element render at a fixed position, so I added a container that offsets its children by a fixed amount, which is used to get the 30 item slots aligned nicely.

Initial stages of inventory development – moving items

After this, I hooked it up to the player’s inventory, and started displaying items from the inventory in the slots. After that, some trivial changes made it so that you can click an item to pick it up and then click elsewhere to drop it. This is visible in the image above.

Next, I added some inventory slots for nearby items on the ground, modified the map and hooked it up to allow the player to drop items where they are standing and pick up close-by items.

Some entity mis-rendering is visible in this video – you can see them clipping on top of the tree and over each other. Currently, entity rendering is done after the entire map is rendered, rather than being interleaved with it, which leads to this issue. However, dropping items and collecting them works as one would expect.

Next, some minor polish: I hid the mouse cursor and rendered held items in the correct place while the player is moving an item.

I’ll be updating this post as I progress through the inventory system, hopefully completing it sometime in the coming weeks.

IC Hack 17 – My first hackathon!

I recently attended IC Hack 17, Imperial College’s Department of Computing Society’s annual hackathon, where I and three others grouped together to create a game.

IC Hack Projection on Queen’s Tower – Picture from @ICHackUK (By Paul Balaji @thepaulbalaji)

The general atmosphere at the event was great – the team had done a fantastic job of making Imperial’s campus feel like an event centre. Gratuitous numbers of plugs (for laptops!), posters, signs, and even a gigantic IC Hack logo projected onto the resident Queen’s Tower gave the venue a strong feeling of organisation, and, for the briefest of times, I managed to forget that I was just sitting in a cafeteria.

The catering deserves a second, more thorough mention – whenever snacks were even remotely near running out, a brand-new batch would appear almost like magic (though perhaps some credit should also go to the volunteers); not to mention the Domino’s, the dinners provided, and the breakfast of sausages in buns. (Hotdogs?)

As for my hack: to show for our 30+ sleepless hours, my team and I created a zombie-shooting game, which we rather imaginatively called “Zombie Hack”, built in Unity with C#. It’s a classic wave-based zombie game, where the waves progressively get bigger and tougher, but with a twist – once you save up enough money, you can buy walls and towers.

The walls and towers would completely block the zombies, but they’re not outsmarted that easily – with some help from Unity (and our resident Unity expert, Marek Beseda), we added pretty awesome path-finding so they’d find their way to you; a bit like a tower defence game!

Unfortunately, as we suspected, playtesting revealed a bit of a flaw in our design: players would build their towers into an impenetrable square defence, meaning zombies just walked up to the walls and hung around until the towers killed them.

Fighting a small wave

We initially decided to solve this by making zombies damage structures, but a combination of zombies dying before they got close, and some difficulty persuading the pathfinding engine to walk into walls, made this a difficult path. Instead, we created a new variant of zombie to complement the existing seven (stupid zombies, with low stats in general; slow zombies, which walk a bit slowly; normal zombies, which are relatively balanced; fast zombies, which move faster than usual; fighter zombies, which move a tiny bit faster but do a lot more damage; tank zombies, with tens of times more health; and boss zombies, with loads of health, speed and damage), which got colloquially (and semi-officially) named kamikaze zombies. These zombies would spawn alongside the others in their wave, but had a special quirk: unlike other zombies, which only chased the player, these zombies would raycast towards the player when they spawned.


If this raycast hit the player, the zombie acted normally, except that it blew up when it got close, immediately killing both itself and the player. But – and the real quirk is here – if the raycast hit a wall or tower, the zombie went into charge mode: it targeted the specific building the ray hit, quintupled its speed, and charged at it. If it succeeded (which we found happened around 60–70% of the time), it spawned two more zombies of this variant. This means that even with structures made entirely of towers (which we thought were slightly overpowered), eventually you’d get a few unlucky combinations of tanks and kamikazes in a row, and before you know it there are explosions on all sides of your base and the towers quickly get overwhelmed.
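The spawn-time decision boils down to a small branch on what the raycast first hits; a hypothetical sketch in Ruby (our actual implementation was C# in Unity, and these names are invented):

```ruby
# Kamikaze zombie behaviour chosen once, at spawn time, based on the
# first thing a ray cast towards the player hits.
def kamikaze_behaviour(first_thing_raycast_hits)
  case first_thing_raycast_hits
  when :player        then :chase_and_explode # act normally, detonate up close
  when :wall, :tower  then :charge_building   # 5x speed, target that building;
  end                                         # on success, spawn two more
end

kamikaze_behaviour(:player) # => :chase_and_explode
kamikaze_behaviour(:tower)  # => :charge_building
```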

A much bigger horde!

We thought that after this the game-play was fairly exciting and balanced, and we were pretty excited to demo it to other participants and the judges. While we didn’t win anything, it was great to see everyone’s reactions, and even better to see how long we managed to keep some of the volunteers occupied!

On the topic of winning, the winning team in our category (games) definitely deserves a mention – Karma, a horror game; you can see it here. The polish they managed to produce in just a weekend was incredible.

Another great submission I particularly liked – which unfortunately hasn’t got a video or photographs on DevPost – was Emotional Rollercoaster, a Kinect-based game. It would show you the name of an expression (such as disgust) and a photograph of someone (usually Trump or Clinton) making that expression, which you would then have to try to make yourself. If you managed to convince the Microsoft API their data was sent to, you’d get some points – which were displayed in a pretty awesome fashion: they’d built a small wooden roller-coaster, with a car that drove forward a bit when you made the correct expression (varying based on how well) and went back when you didn’t. While their balancing seemed a little off – they had to hold the coaster car back to demo it sufficiently – that’s a pretty small and easily fixed issue, and it still looked pretty fun to play. I’m sure there’s a lot of untapped potential in analysing the expressions of players in a video game.

If you’d like to give my team’s game a go, you can clone it here. It was built with Unity, so you’ll need that too. I’ve also built it for Windows users here. Let me know if you beat my high score of 32 waves! 🙂

High Scores:

Alberto Spina: 110 waves [I do not recommend attempting to beat this score if you want to achieve anything productive in your life]

vTables and runtime method resolution

My first second-year university group project recently came to an end, in which we had to implement a fully functional compiler that generated machine code from a relatively simplistic procedural language. It had functions, nested expressions, and a strong typing system. In addition to typical stack-stored primitive types such as int, char, and bool, it also had two dynamically allocated types: arrays and pairs.

For the final milestone of the compiler, we had two weeks to implement our choice of an extension. We chose, fairly ambitiously, to implement a number of extensions, one of which was a fully working Object Oriented system which I took most of the responsibility for implementing.

In the end, our OO system supported three main kinds of user-defined type: structs, interfaces, and classes. Structs were relatively simplistic – little more than a fixed-size piece of memory on the heap. Interfaces let you declare methods which an implementing class would have to define, and classes let you define methods and fields together.

Our classes supported extending a single other class, and implementing any number of interfaces. Semantically, this worked nearly identically to Java – in fact, our implementation ended up creating a language that was essentially a subset of Java. The implementation had what are often referred to as the three pillars of object-oriented programming – inheritance, encapsulation (public, private and protected, where private was class-only and protected was class-and-subclass-only), and polymorphism (run-time method dispatch).

This post is primarily about how I structured objects on the heap to support run-time method dispatch – this was something we found difficult to research, as most resources about vTables and the like tend to focus on C++ and rarely cover the low-level implementation. We found that C++ compilers would often optimise away the vTables, even with optimisations turned off, making it very difficult to analyse the ones they generated. As a result, I decided to write a summary of how I went about implementing it, in the hope that it is useful to others.

The system I ended up settling on adds a tiny amount of overhead to method calls on classes, but a significant overhead to calls through interfaces. I am sure this is not done optimally, and as such I certainly do not wholeheartedly recommend it as a perfect way of laying out objects on the heap.

First, to consider why this is not a trivial problem, consider a basic example:

class A { void printAString() { print "A string" } }
class B extends A { void printAString() { print "Another string" } void anotherFunction() { print "Hello, World!" } }
void main() {
    A a = new B();
    a.printAString();
}

A naïve implementation of this would print something that may be unexpected if you come from a background that leads you to expect the kind of behaviour run-time dispatch allows: “A string”, rather than “Another string”. This comes about because the call on a seems, to the compiler, to be acting on an A, not a B. It could theoretically be avoided by the compiler intelligently noticing that a is instantiated as a B, but this causes some problems – what if you have a function accepting an A as a parameter, or you instantiate the A differently in two mutually exclusive branches (for example an “if” that sets it to a B, but an “else” that sets it to an A)? These problems, so far as I am aware, make it nigh-on impossible to implement with compile-time code analysis (though, theoretically, you could generate a huge file that accounts for every branching possibility and expands them cleverly, this would certainly not be viable for large programs).

So the way I solved this – and indeed, I believe, the way it is typically solved – is to create a table of methods for each class at compile time, which is included in the generated assembly code (and hence in the runtime, in some form). I implemented it such that every class has a table for itself, and an additional one for each type it “is” – that is, each type it extends or implements. In the example above, the following might be generated:
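Sketched as a mapping from each type to its ordered list of method labels, the tables would look something like this (a reconstruction for illustration, not our compiler's literal output):

```ruby
# vTables for A and B: B's table starts with its override of the
# inherited method, so the shared method sits at the same offset (0)
# in both tables.
VTABLES = {
  "A" => ["o_A_printAString"],
  "B" => ["o_B_printAString", "o_B_anotherFunction"],
}

VTABLES["A"].index("o_A_printAString") # => 0
VTABLES["B"].index("o_B_printAString") # => 0, the same slot
```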


Function labels here are named “o_ImplementingType_functionName”. As you can see, the superclass’s function is in the same location in both tables. This means that, were a function calling a method on an object typed as A to use the table as a layer of indirection to access A’s method, it would be “tricked” into calling B’s version instead.

We then stored objects on the heap as follows, illustrated here for an instance of B:

B's type identification number
B's vTable address
A's fields
B's fields
Interface information about B's implemented interfaces

To call a method on an instance of a class, you navigate one word down from the object pointer, load the vTable pointer stored there, then add the offset of the method (this is known at compile time – for example, “printAString” will always be the zeroth method in the vTable). Then, load the value stored at that position into the program counter to jump to the method.
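Those lookup steps can be mimicked with a toy Ruby “heap” – all the addresses, type ids and offsets below are made up for illustration:

```ruby
# Word-addressed toy heap following the layout above:
# object_ptr   -> type identification number
# object_ptr+1 -> vTable pointer
# Method labels stand in for jump targets.
vtable_b = ["o_B_printAString", "o_B_anotherFunction"]
heap     = { 100 => 7,          # B's (made-up) type id
             101 => vtable_b }  # B's vTable, one word down

def call_method(heap, object_ptr, method_offset)
  vtable = heap[object_ptr + 1] # load the vTable pointer
  vtable[method_offset]         # the label we'd jump to
end

call_method(heap, 100, 0) # => "o_B_printAString"
```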

This is trickier for interfaces – since an object can implement many, we can’t just have a fixed position in the vTable for each method. There is undoubtedly a better way to do this, but I chose to put a few pieces of data about them at the end of the object. For each interface, the following was stored:

The interface's type identification number
The address of a custom version of this interface's vTable with correctly overloaded methods

Additionally, a zero is stored at the end of the object, denoting its end. Interface vTables are looked up by a simple but fairly overhead-heavy algorithm: go to the interface section (its offset is stored at the beginning of the object), then repeatedly jump two words until you find either the interface identification number you want, or a zero (the zero indicating an explicit cast has failed – it should never be reached on an implicit cast). Once found, jump a single word to get the interface’s vTable, then look down a fixed offset to find the method you are interested in calling. The overhead added is somewhere in the region of 6 instructions, plus potentially up to four times the total number of interfaces (depending on how far down the list the one you want is). This is clearly suboptimal; an approach I considered was, when an object is cast to an interface, moving its pointer down to the start of the relevant interface entry, but this would have been considerably more difficult to integrate into the codebase, as all implicit cast locations would need to be considered.
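The scan itself can be sketched in a few lines of Ruby – the interface ids and vTable contents below are invented for illustration:

```ruby
# Scan the two-word (id, vtable) entries at the end of the object until
# we find the wanted interface id, or hit the terminating zero.
def find_interface_vtable(interface_section, wanted_id)
  i = 0
  while interface_section[i] != 0
    return interface_section[i + 1] if interface_section[i] == wanted_id
    i += 2 # jump two words to the next (id, vtable) pair
  end
  nil      # reached the zero: the (explicit) cast has failed
end

section = [3, ["o_B_compareTo"], 9, ["o_B_run"], 0]
find_interface_vtable(section, 9)  # => ["o_B_run"]
find_interface_vtable(section, 42) # => nil
```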

There is undoubtedly a better way to handle interfaces – I am very interested in hearing it; feel free to contact me if you know of a more optimal way or are wondering about anything I have explained above.