Deploying a multi-environment Ruby on Rails site to DigitalOcean Kubernetes

This guide is focused on a small, single-instance server. Initialisation logic in particular is probably not safe against horizontal scaling. Beware! 🙂

This guide will go through:

  • Setting up your development environment for Kubernetes deployment.
  • Dockerising your Rails application.
  • Setting up a container registry for deploying images to.
  • Setting up Kubernetes for your Rails project, with a focus on supporting multiple environments deployed in production (e.g. a staging and production instance).
  • Setting up Postgres instances for each of your environments.
  • Setting up a single Nginx ingress for your cluster which allows ingress to all your environments (which are otherwise isolated).
  • Setting up TLS with cert-manager, as well as a current caveat with the Nginx ingress and how you can avoid it.

Running Ruby on Rails in production can be one of the most hair-pulling steps to getting your new application up and running, especially in contrast to how elegant most of the process of writing a Rails application is.

One of Kubernetes’ biggest benefits is that it lets you scale applications and leverage the power of the cloud. Just as nice is that it lets you write declarative (as opposed to imperative) configuration for your services, instead of managing a VPS yourself with all the trouble that entails. You can free yourself from manual iptables / ufw management and worry less about things like what starts your service and restarts it if it crashes. You also develop skills that are useful in modern cloud-based businesses.

All that said, it presents its own difficulties. I ran into quite a few hold-ups, ranging from certificate issuance to serving static files from Rails through Nginx.

Setup

First, you’ll want to make sure you have a local Kubernetes development environment with Kustomize and Helm installed. If you’re on macOS that’s as easy as running:

brew install kubectl   # Kubernetes' CLI
brew install kustomize # Fantastic templating engine
brew install doctl     # DigitalOcean CLI
brew install helm      # Kubernetes package manager, used to install the ingress and cert-manager

You’ll also need Docker, which you can currently install by following https://docs.docker.com/engine/install/.

You’ll then want to set up your Kubernetes cluster in DigitalOcean. I went with a simple two-$10-node setup. Keep in mind you’ll also need a load balancer (currently $10/month), a container registry, as well as a bunch of persistent volumes. The latter aren’t hugely expensive, but will likely add up to a couple of dollars a month.

Dockerising Rails

If you don’t know much about Docker, it’s worth having a quick read up on it. In short, Docker lets you build portable images of your application with batteries included. These images can be pushed to a container registry, and Kubernetes pulls them from there to run inside pods.

Docker containers are specified by a Dockerfile. Most instructions create a new layer, and layers are composed together to create the final image. Docker caches layers intelligently. That means it’s best to put things that don’t change much (like system library installs for Nokogiri) first, and things that change often (such as your application’s files) further down.

Here’s my Dockerfile, which may help you dockerise your application. Full disclaimer: there may be better ways to do it, but I’ve found this works quite well. You’ll need to substitute your application’s ‘name’ where I’ve written <APPLICATION NAME>, in a form that works as a folder name. If you choose to use this, save it as a file called Dockerfile in your Rails root.

FROM ruby:2.7.0

RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add -
RUN echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list
RUN apt-get update -qq && apt-get install -y build-essential nodejs yarn

# Postgres
RUN apt-get install -y libpq-dev

# Nokogiri
RUN apt-get install -y libxml2-dev libxslt1-dev

# Capybara-webkit
RUN apt-get install -y libqt4-dev xvfb

ENV APP_HOME /<APPLICATION NAME>
RUN mkdir $APP_HOME
WORKDIR $APP_HOME

ADD Gemfile* $APP_HOME/
RUN bundle config build.nokogiri --use-system-libraries
RUN bundle install

ADD . $APP_HOME
# Dummy value to satisfy Rails
ARG SECRET_KEY_BASE=DUMMY
# You can still run non-production environments from 
# this Dockerfile, but this makes sure assets are compiled
# targeting production.
ARG RAILS_ENV=production
RUN yarn install --check-files
RUN bundle exec rake assets:precompile RAILS_ENV=production

Notice that we set SECRET_KEY_BASE=DUMMY. We will deploy our Rails master key as a Kubernetes secret later. Sadly, rake assets:precompile currently expects a secret key to be available because of a dependency within that command, even though it doesn’t use it for anything. Setting it to a dummy value lets everything run smoothly.

One more thing – notice that we specifically add Gemfile* (i.e. both Gemfile and Gemfile.lock) separately to everything else. This is because our application as a whole probably changes a lot more often than our Gemfile does. By ordering our Dockerfile like this, Docker can cache the layers involved in installing and setting up gems and avoid doing it every time something in your application changes.

Once your Dockerfile is set up, running docker build . should work. If so, you’re ready to continue. You may want to actually run the image to check that everything is set up right, but that’s out of the scope of this guide.

Setting up a container registry

Just like source code is best pushed to a source control repository, containers are best served col–.. er, I mean, in a container registry. This allows Kubernetes to pull them down and centralises your application’s runnable images.

DigitalOcean has a private container registry system in beta right now. You can set one up under Images -> Container Registry. Once that’s done, you’ll need to run doctl registry login in a terminal, which will set your Docker CLI up to be able to push to your container registry.

Once done, try it out. Your previous docker build (or just run docker build . now if you haven’t already) should have given you a hash at the end, for example it might look like:

Successfully built b952cefba0ac

You can use that image ID to tag and push the image, as follows:

REGISTRY_NAME="YOUR_REGISTRY_NAME_HERE"
IMAGE="YOUR_APPLICATION_NAME_HERE"
DOCKER_IMAGE_ID="YOUR_HASH_HERE"
VERSION="0.0.0"

DOCKER_REGISTRY="registry.digitalocean.com/${REGISTRY_NAME}"
IMAGE="${DOCKER_REGISTRY}/${IMAGE}:${VERSION}"

docker tag "$DOCKER_IMAGE_ID" "$IMAGE"
docker push "$IMAGE"
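If you end up scripting this step, a small helper can build the full image reference in one place and catch an invalid tag before Docker does. The Ruby below is a hypothetical sketch: image_reference and push_commands are my names, not part of doctl or Docker. The tag rule it checks is Docker's documented tag format.

```ruby
# Hypothetical helper for the tag-and-push step (not part of doctl or Docker).
# Docker tags are up to 128 characters of letters, digits, '_', '.' and '-',
# and may not start with '.' or '-'.
TAG_FORMAT = /\A\w[\w.-]{0,127}\z/

def image_reference(registry_name, image, version)
  unless TAG_FORMAT.match?(version)
    raise ArgumentError, "invalid Docker tag: #{version.inspect}"
  end

  "registry.digitalocean.com/#{registry_name}/#{image}:#{version}"
end

# The same two commands as the shell version, as argument arrays. Print them
# for a dry run, or run each with system(*cmd), which avoids shell quoting.
def push_commands(image_id, reference)
  [
    ["docker", "tag", image_id, reference],
    ["docker", "push", reference]
  ]
end
```

For example, push_commands("b952cefba0ac", image_reference("my-registry", "my-app", "0.0.0")).each { |cmd| system(*cmd) } runs the same two commands as the shell snippet above.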

Setting up your Kubernetes cluster

You can set up your Kubernetes cluster using Terraform, but for this guide I suggest doing it in the UI. Note that currently during the early access, DigitalOcean seems to limit container registries to Amsterdam (AMS3). If so, it’s probably worth colocating your Kubernetes cluster in the same region if you don’t have a good reason not to. Use the latest Kubernetes version, and customise your Kubernetes cluster however you like. Personally, I went with two small ($10) nodes.

Then you’ll want to set up your kubectl CLI to be able to access the cluster. That’s pretty easy:

doctl kubernetes cluster kubeconfig save <CLUSTER NAME>

Deploying your Rails application

Now we’ll deploy our Rails application. While setting up, you’ll probably want to hard-code your application controller to show a maintenance page, and perhaps even use a subdomain for the time being.
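One alternative to hard-coding the controller is a tiny Rack middleware that you switch on with an environment variable. Turning the maintenance page off then becomes a config change rather than a code change. This is a sketch of that swapped-in approach, and nothing later in the guide depends on it. MaintenanceMode and MAINTENANCE_MODE are hypothetical names.

```ruby
# A tiny Rack middleware. Rack only needs an object whose call(env) returns
# [status, headers, body], so no gems are required. MAINTENANCE_MODE is a
# hypothetical variable you would set in the Deployment's env section.
class MaintenanceMode
  PAGE = "<h1>Back soon</h1><p>We're doing some maintenance.</p>".freeze

  def initialize(app, enabled: ENV["MAINTENANCE_MODE"] == "true")
    @app = app
    @enabled = enabled
  end

  def call(env)
    return @app.call(env) unless @enabled

    # 503 plus Retry-After tells clients and crawlers this is temporary.
    [503, { "content-type" => "text/html", "retry-after" => "3600" }, [PAGE]]
  end
end
```

To register it, you would require it in config/application.rb and add config.middleware.insert_before 0, MaintenanceMode.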

First, you’ll need to set up your Kubernetes configuration. Make a folder structure as follows (with empty files for now):

k8s/
  certificate_issuer.yaml
  base/
    database.yaml
    application.yaml
    kustomization.yaml
  overlays/
    prod/
      application.yaml
      ingress.yaml
      kustomization.yaml
      namespace.yaml

Recall your application name you used earlier for your Docker folder name. You needn’t use the same name for your Kubernetes labels, but it’s probably best to be consistent so I’ll be assuming you are doing so.

Within the base folder, you set up the basics shared between all of your deployed environments, so we’ll start with application.yaml:

apiVersion: v1
kind: Service
metadata:
  name: <APPLICATION_NAME>
spec:
  type: ClusterIP
  ports:
  - name: rails
    port: 80
    targetPort: 8080
  - name: assets
    port: 81
    targetPort: 80
  selector:
    app: <APPLICATION_NAME>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <APPLICATION_NAME>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <APPLICATION_NAME>
  template:
    metadata:
      labels:
        app: <APPLICATION_NAME>
    spec:
      volumes:
      - name: public-assets 
        emptyDir: {}
      initContainers:
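      - name: init-static-files
        image: registry.digitalocean.com/<REGISTRY>/<IMAGE>
        # Copy the image's public/ folder into the shared volume. It lands at
        # /public/public, which the other containers mount via subPath: public.
        command: ["cp", "-r", "/<APPLICATION_NAME>/public", "/public"]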
        volumeMounts:
        - name: public-assets
          mountPath: /public
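      - name: db-migrate
        image: registry.digitalocean.com/<REGISTRY>/<IMAGE>
        command: ["bin/rails"]
        args: ["db:migrate"]
        # Init containers don't inherit the main container's env, so they
        # need the Rails environment and credentials to reach the database.
        env:
        - name: RAILS_ENV
          value: production
        - name: RAILS_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: rails-master-key
              key: key
        - name: DATABASE_USERNAME
          value: postgres
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: rails-db-key
              key: key
      - name: db-seed
        image: registry.digitalocean.com/<REGISTRY>/<IMAGE>
        command: ["bin/rails"]
        args: ["db:seed"]
        env:
        - name: RAILS_ENV
          value: production
        - name: RAILS_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: rails-master-key
              key: key
        - name: DATABASE_USERNAME
          value: postgres
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: rails-db-key
              key: key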
      containers:
      - name: <APPLICATION_NAME>
        image: registry.digitalocean.com/<REGISTRY>/<IMAGE>
        command: ["bin/rails"]
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: public-assets
          mountPath: /<APPLICATION_NAME>/public
          subPath: public
        env:
        - name: RAILS_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: rails-master-key
              key: key
        - name: DATABASE_USERNAME
          value: postgres
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: rails-db-key
              key: key
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - name: public-assets
          mountPath: /usr/share/nginx/html
          subPath: public

Replace <APPLICATION_NAME> with your application name throughout, and <REGISTRY> and <IMAGE> with your DigitalOcean registry name and image name throughout. Do not include a version on your images – Kustomize will handle that for us later on.

There’s a lot to unpack here. We’ve included both a Service and a Deployment in the same file, although you can split it into two files if you so wish. The triple hyphen in YAML separating the two definitions is essentially a “file break”.

First off, the deployment. We’re running our Rails server on port 8080, and an Nginx server on port 80. These are pod-specific ports and won’t be exposed to the internet, don’t worry. They’ll be used in our networking within the cluster.

The most confusing thing going on here is how we’re managing public asset serving. There are certainly better ways to do it than what I’ve done here, like pushing your static assets to a CDN, S3 bucket or DigitalOcean Space, but this is a fairly simple approach that works pretty well.

What we do is make use of the fact that our built image has all our public assets sitting nicely in the public/ folder. We create a volume called public-assets, which is mounted into both our Nginx container (which actually serves the static assets) and our application container. We abuse Kubernetes’ support for init containers, which run sequentially before your application’s containers start. We make one that runs your application’s image and copies all the public files onto the public volume mount.

This trick actually works slightly better in docker-compose instead of Kubernetes, as you can mount a shared volume onto an existing folder to automatically include the files in that folder. Sadly, it doesn’t appear to be possible in Kubernetes, but this gets around that limitation, albeit not incredibly elegantly.
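The directory layout is the subtle part. Copying the app’s public/ folder into the volume’s mount point creates a public/ directory inside the volume. That directory is what the Rails and Nginx containers mount via subPath: public. This Ruby sketch reproduces the copy against ordinary directories to show the resulting layout. copy_public_assets is a made-up name standing in for cp -r in the init container.

```ruby
require "fileutils"

# Mimics the init container's `cp -r /<app>/public /public`. Because the
# mount point already exists, FileUtils.cp_r (like cp -r) creates a public/
# directory inside it rather than copying the files into its root.
def copy_public_assets(app_root, mount_point)
  FileUtils.cp_r(File.join(app_root, "public"), mount_point)
  # The directory that a `subPath: public` mount of this volume exposes.
  File.join(mount_point, "public")
end
```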

We also run two other init containers: one to migrate our database and another to seed it. I’m assuming your db:seed operation is coded to be idempotent; that is to say, running it multiple times has the same effect as running it once. This is generally good practice because it means you can seed new data (such as when you add a new table in a migration) as soon as it’s added. If your seeding is not idempotent, you will want to remove the relevant init container and seed manually, the same way we do the database setup below.
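In a Rails seeds.rb, idempotency usually comes from ActiveRecord’s find_or_create_by! rather than create!. So that it runs without a database, the sketch below models the same property with a Hash standing in for a table. SeedStore is a made-up class whose method only loosely mirrors find_or_create_by!. Running seed twice leaves exactly the same rows as running it once.

```ruby
# Stand-in for a database table keyed by a natural key (e.g. a role name).
class SeedStore
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Loosely mirrors find_or_create_by!: return the existing row for this key,
  # or insert one with the given attributes if it isn't there yet.
  def find_or_create_by(key, **attrs)
    @rows[key] ||= attrs.merge(key: key)
  end
end

def seed(store)
  # Safe to run on every deploy: existing rows are found, not duplicated.
  store.find_or_create_by("admin", description: "Full access")
  store.find_or_create_by("editor", description: "Can edit content")
end
```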

Note that we do not set up arguments to our Rails command to set the environment and port; don’t worry, that will be in the environment-specific configuration to come.

Next we set up database.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: do-block-storage
---
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  selector:
    app: postgres
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:12.4
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: rails-db-key
                  key: key
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: postgredb
              subPath: postgres
      volumes:
        - name: postgredb
          persistentVolumeClaim:
            claimName: pvc

This sets up a persistent volume claim which will automatically set up a 5Gi DigitalOcean volume for you, and attaches it to the Postgres database which it also sets up. Nice and simple.
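The guide doesn’t show config/database.yml. The Deployment’s DATABASE_USERNAME and DATABASE_PASSWORD variables only help if the production section reads them and points at the db Service. A possible version is below, wrapped in a Ruby sketch that renders it the way Rails does (ERB first, then YAML) so you can check the result. The database name and the fallback username are my assumptions.

```ruby
require "erb"
require "yaml"

# A possible production section of config/database.yml. "db" is the name of
# the Postgres Service in database.yaml; each environment has its own
# namespace, so the short name resolves to that environment's database.
# The database name is an assumption: use whatever your app already uses.
DATABASE_YML = <<~YAML
  production:
    adapter: postgresql
    encoding: unicode
    host: db
    port: 5432
    database: app_production
    username: <%= ENV.fetch("DATABASE_USERNAME", "postgres") %>
    password: <%= ENV["DATABASE_PASSWORD"] %>
YAML

# Rails runs database.yml through ERB before parsing it as YAML; this does
# the same so you can check what the Deployment's env vars turn into.
def production_db_config(source = DATABASE_YML)
  YAML.safe_load(ERB.new(source).result)["production"]
end
```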

This is a good time to note that this guide does not cover exporting metrics and logs. You won’t get any warning when your database is getting full, or when it’s erroring. That’s something you’ll want to set up afterwards as part of productionising.

Our ingress will refer to a cluster issuer, which we’ll set up soon, but first let’s fill in the kustomization.yaml within base/:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - application.yaml
  - database.yaml

We’re getting close now, but there are still a few more pieces to slide into place. Next, we set up a ClusterIssuer inside certificate_issuer.yaml. It is one of the resources provided by cert-manager, which we’ll install into our cluster shortly:

---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Email address used for ACME registration
    email: <YOUR EMAIL>
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # Name of a secret used to store the ACME account private key
      name: tls-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
    - http01:
        ingress:
          class: nginx

Make sure you replace the email address. Cert-manager automatically manages TLS certificate renewal for us. The ingress we’ll write shortly references this cluster issuer in its annotations, which prompts cert-manager to issue certificates for its hosts.

Notice that this file is not contained within the base folder. This is because you only need a single ClusterIssuer in a cluster, and it will work across all Kubernetes namespaces. If you prefer to have an issuer per environment, you can instead move it in, add it to the Kustomization file, and change it from a ClusterIssuer to an Issuer (the rest of the file can remain the same).

Next, we set up our individual environments.

First, ingress.yaml:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - <DOMAIN NAME>
    secretName: tls-key
  rules:
  - host: <DOMAIN NAME>
    http:
      paths:
      - path: /assets
        backend:
          serviceName: <APPLICATION NAME>
          servicePort: 81
      - path: /packs
        backend:
          serviceName: <APPLICATION NAME>
          servicePort: 81
      - path: /
        backend:
          serviceName: <APPLICATION NAME>
          servicePort: 80

Notice that our ingress rules set up the public folders to forward to port 81 (the Nginx file server) on our application service, and everything else to our Rails backend on port 80.
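To make the routing concrete, here is a simplified model of how those three rules split traffic. The most specific matching path wins, so /assets/… and /packs/… go to the Nginx file server on port 81, and everything else falls through to Rails on port 80. The sketch matches whole path segments. That approximates, but isn’t identical to, the ingress controller’s own matching rules. RULES and backend_port are illustrative names.

```ruby
# Simplified model of the ingress rules: path prefix => service port.
RULES = { "/assets" => 81, "/packs" => 81, "/" => 80 }.freeze

# True if `path` is `prefix` or lies beneath it, segment by segment
# ("/assets/app.css" matches "/assets"; "/assetsfoo" does not).
def prefix_match?(prefix, path)
  return true if prefix == "/"

  path == prefix || path.start_with?("#{prefix}/")
end

# The longest matching prefix wins, so "/" only catches what's left over.
def backend_port(path)
  RULES.select { |prefix, _| prefix_match?(prefix, path) }
       .max_by { |prefix, _| prefix.length }
       .last
end
```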

Next, the environment-specific application.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: <APPLICATION_NAME>
spec:
  template:
    spec:
      containers:
        - name: <APPLICATION_NAME>
          args: ["server", "--environment", "production", "--port", "8080"]

Kustomize will merge this with our top-level base deployment; all we’re doing here is adding the argument list to set the environment. You may prefer to do this through an environment variable instead.
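Kustomize’s strategic merge matches entries in a containers list by their name. That is why this patch only needs the container name and the field being changed. Below is a deliberately simplified Ruby model of that behaviour for a single containers list; merge_containers is my own name. Real strategic merge also merges nested fields (such as env) instead of replacing them, and it supports other list types and directives.

```ruby
# Merge a patch's containers into the base's, matching entries by "name".
# Top-level fields in the patch replace the base's (so a list like args is
# replaced wholesale); containers the patch doesn't mention are kept as-is.
def merge_containers(base, patch)
  patched = patch.to_h { |c| [c.fetch("name"), c] }
  merged = base.map do |container|
    override = patched.delete(container.fetch("name"))
    override ? container.merge(override) : container
  end
  merged + patched.values # containers only in the patch are appended
end
```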

Next, namespace.yaml – which is pretty simple, it just sets the namespace up for this environment of our application:

apiVersion: v1
kind: Namespace
metadata:
  name: <APPLICATION_SHORT_NAME>-prod

You’ll want to switch out -prod accordingly. You’ll probably want APPLICATION_SHORT_NAME to be something quick and easy to type, like initials of your website.

And, finally, kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: <APPLICATION_SHORT_NAME>-prod

resources:
- namespace.yaml
- ingress.yaml
- ../../base

patchesStrategicMerge:
- application.yaml

Make sure the namespace here matches the one in namespace.yaml.

Now we’re done setting up our configuration! Onto preparing our cluster…

Preparing your cluster for deployment

There are two things you’ll need set up in your cluster: an Nginx ingress controller and cert-manager. These commands should get them both set up nicely:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install ingress-nginx ingress-nginx/ingress-nginx \
           --set controller.publishService.enabled=true \
           --set-string controller.config.use-forwarded-headers=true,controller.config.compute-full-forwarded-for=true,controller.config.use-proxy-protocol=true \
           --set-string 'controller.service.annotations.service\.beta\.kubernetes\.io/do-loadbalancer-enable-proxy-protocol=true'

helm install cert-manager jetstack/cert-manager \
           --namespace cert-manager \
           --create-namespace \
           --version v1.0.1 \
           --set installCRDs=true

This will set up a LoadBalancer on DigitalOcean which you will be automatically billed for. There is no good way around this that gives you a reliable static IP that I am aware of, even if you don’t think you need the full power of a load balancer. That said, it’s reasonably affordable – currently $10/month – and should allow you to scale quite a bit before causing any problems.

Go to your load balancer in DigitalOcean and turn on PROXY protocol in its settings. We’ve set up the Nginx ingress above to use PROXY protocol, which means your Rails app will be able to see your users’ IP addresses correctly. Otherwise, all of your connections will appear to come from your load balancer… Not ideal! And it might make for some very interesting demographic conclusions: all of our users seem to live in the same house in Amsterdam!
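For a sense of what PROXY protocol actually adds: the load balancer prefixes each TCP connection with a short header naming the original client, and the ingress reads it before passing the request on. This Ruby sketch parses the human-readable version 1 header described in HAProxy’s PROXY protocol specification. It is purely illustrative: ingress-nginx does this parsing for you, and a load balancer may send the binary version 2 instead. parse_proxy_v1 is a made-up name.

```ruby
# Parses a PROXY protocol v1 header line, e.g.
# "PROXY TCP4 203.0.113.7 10.0.0.5 51234 443\r\n".
# Returns the original client's address and port, or nil for "PROXY UNKNOWN"
# (where the receiver falls back to the connection's own address).
ProxyHeader = Struct.new(:protocol, :client_ip, :client_port)

def parse_proxy_v1(line)
  raise ArgumentError, "header too long" if line.bytesize > 107
  raise ArgumentError, "missing CRLF" unless line.end_with?("\r\n")

  fields = line.chomp("\r\n").split(" ")
  raise ArgumentError, "not a PROXY header" unless fields.first == "PROXY"
  return nil if fields[1] == "UNKNOWN"
  raise ArgumentError, "wrong field count" unless fields.size == 6

  protocol, src_ip, _dst_ip, src_port, _dst_port = fields.drop(1)
  raise ArgumentError, "bad protocol" unless %w[TCP4 TCP6].include?(protocol)

  ProxyHeader.new(protocol, src_ip, Integer(src_port, 10))
end
```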

Finally, you also need to apply your certificate_issuer.yaml file, which is independent of your application versions (unless you decided not to use a ClusterIssuer). You can do that as follows, from your Rails root:

kubectl apply -f k8s/certificate_issuer.yaml

The first deploy

Now you’re ready to deploy your application for the first time.

First, you’ll want to set your image version. We discussed earlier how to tag and push an image, and I gave commands for pushing version 0.0.0. If you didn’t do that, go back and do it now. Then, you can run the following commands within the overlays/prod directory – make sure you fill in the three variables at the top first:

REGISTRY="<YOUR REGISTRY NAME>"
IMAGE="<YOUR IMAGE NAME>"
VERSION="<YOUR VERSION>"

DOCKER_IMAGE="registry.digitalocean.com/${REGISTRY}/${IMAGE}"
VERSIONED_DOCKER_IMAGE="${DOCKER_IMAGE}:${VERSION}"

kustomize edit set image "${DOCKER_IMAGE}=${VERSIONED_DOCKER_IMAGE}"

The interesting thing here is kustomize edit set image. It adds an images entry to your kustomization.yaml so that Kustomize sets the image version to 0.0.0 (or whichever version you chose) everywhere your image is referenced. That makes it super easy to change version later: just run this one kustomize command. You can also write the relevant Kustomize configuration by hand, but this command is super useful for building more reliable automation flows.
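Conceptually, the entry that command writes tells Kustomize to rewrite every container image with a matching name so that it carries the new tag. Here is a rough Ruby model of that rewrite. retag is a made-up helper, not Kustomize’s code, and it ignores digests and the newName option.

```ruby
# Rough model of Kustomize's image override: any image reference whose name
# (the part before an optional ":tag") equals `name` gets `new_tag`.
# Ignores digests (@sha256:...) and registries with ports, for brevity.
def retag(images, name, new_tag)
  images.map do |ref|
    image_name = ref.split(":", 2).first
    image_name == name ? "#{image_name}:#{new_tag}" : ref
  end
end
```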

Once you’ve done that to set it to version 0.0.0, or whatever version you’ve chosen to deploy first, you’re finally ready to deploy your application to Kubernetes.

Run this from your Rails root (or anywhere else and adjust the path accordingly):

kustomize build k8s/overlays/prod/ | kubectl apply -f -

And, boom! Your application is deployed. But you won’t be able to access it right now. First things first, run kubectl get services to find your load balancer’s external IP. If you visit that IP, you should get an Nginx error: it doesn’t know what to do with you, because all it knows to do is route your domain name. So we’ll set that up next. You may have noticed your application’s service is not visible in the results of that command. That’s because it’s deployed to a separate namespace, don’t worry.

Take that external IP and point your domain’s DNS A record at it. It might take a little while to propagate. If you use your ISP’s default DNS resolver (if you don’t know what that means, you probably do), consider switching to Cloudflare’s or Google’s. They’re free and easy to set up. They will likely make your browsing faster and more reliable, and they stop your ISP from hijacking your DNS. In this case, it should also mean you see your domain update quickly.

There are still some things left to do. Your database isn’t set up yet, so your init containers will fail to run migrate and seed, and your TLS certificate won’t be working yet. More on that soon…

Secret setup

You need to set up two secrets: a database secret, and your Rails master key.

The files above assume these are stored in rails-db-key and rails-master-key. You need to push these to the right namespace, which I recommended calling <APPLICATION_SHORT_NAME>-prod, but you may have called it something else. Run the commands as follows, using a random password for your DB key:

kubectl create secret generic rails-db-key --namespace <NAMESPACE> --from-literal="key=<RANDOM PASSWORD>"
kubectl create secret generic rails-master-key --namespace <NAMESPACE> --from-literal="key=<YOUR RAILS MASTER KEY>"
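To generate a suitably random database password and avoid typos in the two commands, something like this Ruby sketch works. secret_commands is a made-up helper, and your master key is the contents of config/master.key.

```ruby
require "securerandom"

# 32 random bytes, hex encoded (64 characters). Hex avoids characters that
# would need escaping in a shell or a database URL.
def random_db_password
  SecureRandom.hex(32)
end

# Builds the two `kubectl create secret` commands as argument arrays; run
# each with system(*cmd). The namespace is whatever you chose earlier.
def secret_commands(namespace, db_password, master_key)
  { "rails-db-key" => db_password, "rails-master-key" => master_key }.map do |name, value|
    ["kubectl", "create", "secret", "generic", name,
     "--namespace", namespace, "--from-literal=key=#{value}"]
  end
end
```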

And your database needs a first-time setup. That’s an easy fix:

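kubectl run -it --rm db-setup --namespace <NAMESPACE> --image=<YOUR RAILS IMAGE PATH WITH VERSION> \
  --env="RAILS_MASTER_KEY=$(kubectl get secret rails-master-key --namespace <NAMESPACE> -o jsonpath='{.data.key}' | base64 --decode)" \
  --env="DATABASE_USERNAME=postgres" \
  --env="DATABASE_PASSWORD=$(kubectl get secret rails-db-key --namespace <NAMESPACE> -o jsonpath='{.data.key}' | base64 --decode)" \
  -- bash

This will give you a bash terminal inside a container running your Rails image, with the same credentials your application uses (read back from the secrets you just created). Just run the usual: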

RAILS_ENV=production bin/rails db:setup

Quit the container with exit, and it will automatically get recycled (since we passed the --rm flag). Now your application service should automatically boot up, connect to your database fine, and be working… In HTTP at least…

About those certificate errors…

Now, cert-manager sets up TLS certificates automatically, but it won’t be working right now. cert-manager can’t complete its self-check of ACME challenges while PROXY protocol is enabled, something the Kubernetes folks seem to be working on with the various cloud providers. As I understand it, the self-check request never leaves the cluster. It therefore doesn’t pass through the load balancer, arrives without the PROXY protocol header, and gets rejected by the ingress. I may be misunderstanding, but I think this is the gist of it…

It’s a pretty easy fix, but it’s potentially disruptive. First, disable PROXY protocol. Then delete the certificate to prompt cert-manager to try again (it will do so in due course, but it’s faster to force it). Finally, re-enable PROXY protocol once TLS is working. Let’s Encrypt certificates last 90 days, and cert-manager renews them ahead of expiry (30 days before, by default). So at each renewal, roughly every 60 days, you will need to either schedule a short downtime or accept losing client IP address resolution in your Rails app for a while.

If you truly don’t care about client IP resolution, you can avoid using the PROXY protocol altogether, but I don’t recommend this: IP addresses can be very useful for all sorts of things, not least of all post-incident security analysis.

Anyway, you can do that as follows:

# Or whatever you named your namespace
NAMESPACE="<APPLICATION_SHORT_NAME>-prod"

echo "Disabling proxy protocol, must also be disabled on DigitalOcean load balancer"

helm upgrade ingress-nginx ingress-nginx/ingress-nginx --set controller.publishService.enabled=true --set-string controller.config.use-forwarded-headers=true,controller.config.compute-full-forwarded-for=true

echo "Deleting existing certificates"
kubectl delete certificate --all --namespace "${NAMESPACE}"
echo "Sleeping while certificates refresh..."
sleep 15

echo "Re-enabling proxy protocol"
helm upgrade ingress-nginx ingress-nginx/ingress-nginx --set controller.publishService.enabled=true --set-string controller.config.use-forwarded-headers=true,controller.config.compute-full-forwarded-for=true,controller.config.use-proxy-protocol=true --set-string 'controller.service.annotations.service\.beta\.kubernetes\.io/do-loadbalancer-enable-proxy-protocol=true'

Make sure you disable PROXY protocol on your DigitalOcean load balancer settings (on the DigitalOcean website) beforehand and re-enable it afterwards. The sleep 15 is likely to be far more time than is actually necessary; you can refresh your website in HTTPS and run the final helm command and adjust the load balancer to re-enable PROXY protocol after the certificate has been issued if you like.
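If you’d rather not guess at the sleep, you can poll until cert-manager reports the certificate as ready before re-enabling PROXY protocol. This is a Ruby sketch, assuming kubectl is on your PATH and pointed at the cluster. wait_until and certificates_ready? are made-up names, and the check relies on the Ready condition that cert-manager sets on Certificate resources.

```ruby
# Polls the block until it returns true or `timeout` seconds pass, and
# reports whether it succeeded: a replacement for a fixed `sleep 15`.
# The clock is injectable so the logic can be tested without waiting.
def wait_until(timeout:, interval: 2,
               clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
  deadline = clock.call + timeout
  loop do
    return true if yield
    return false if clock.call >= deadline

    sleep interval
  end
end

# One possible check: every Certificate in the namespace reports Ready=True.
# Returns false if kubectl fails or the Certificate hasn't been recreated yet.
def certificates_ready?(namespace)
  jsonpath = 'jsonpath={.items[*].status.conditions[?(@.type=="Ready")].status}'
  out = IO.popen(["kubectl", "get", "certificate", "--namespace", namespace, "-o", jsonpath],
                 err: File::NULL, &:read)
  statuses = out.split
  $?.success? && !statuses.empty? && statuses.all?("True")
rescue Errno::ENOENT # kubectl isn't installed
  false
end
```

For example, wait_until(timeout: 300, interval: 5) { certificates_ready?("ab-prod") } could take the place of sleep 15.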

Notes

You can add new environments super simply. Just copy the overlays/prod folder to, for example, overlays/staging. Then adjust the files within it to fix the namespace (both in namespace.yaml and in kustomization.yaml), the Rails environment flag, and the hostnames in the ingress settings. You’ll need to repeat everything from secret setup onwards for that new environment, but it should mostly be familiar to you.

Note that because database migration and seeding are done in init containers, they probably aren’t safe to run from several pods at once. As a result, you can’t just increase the replica count as you’d normally want to with Kubernetes. You’ll need to configure something more complicated to be safe against this, sadly. You could have your deployment script automatically shell into the cluster and run db:migrate whenever you run a deploy, for example, or have some fancy CI/CD solution doing it all for you.
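Here is a sketch of the “run db:migrate from your deployment script” idea, assuming you remove the db-migrate init container. It applies the new manifests, waits for the rollout, then runs the migration once from a single pod. deploy_steps and run_steps are hypothetical names. One trade-off: new code briefly serves traffic before its migrations run, so migrations need to be backwards compatible.

```ruby
require "shellwords"

# Hypothetical deploy sequence that migrates once per deploy instead of in an
# init container. Assumes the Deployment and its Rails container share the
# name `app`, as in the manifests above.
def deploy_steps(namespace:, app:, overlay:)
  [
    # 1. Roll out the new image (the same pipeline as the first deploy).
    ["sh", "-c", "kustomize build #{Shellwords.escape(overlay)} | kubectl apply -f -"],
    # 2. Wait until the new pods are up.
    ["kubectl", "rollout", "status", "deployment/#{app}", "--namespace", namespace],
    # 3. Migrate exactly once, from one pod running the new code. RAILS_ENV is
    #    passed explicitly because the server only gets it via --environment.
    ["kubectl", "exec", "--namespace", namespace, "deployment/#{app}", "-c", app,
     "--", "env", "RAILS_ENV=production", "bin/rails", "db:migrate"]
  ]
end

# Runs the steps in order, stopping at the first failure. The runner is
# injectable so the sequence can be dry-run or tested without a cluster.
def run_steps(steps, runner: ->(cmd) { system(*cmd) })
  steps.each do |cmd|
    return false unless runner.call(cmd)
  end
  true
end
```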

End

I hope this guide has been of use to someone – it took a lot of trial and error to get this working properly and I thought it might be valuable to share. However, I’m very open to feedback to improve this! Please feel free to drop comments with any problems you ran into, constructive criticism, or even just a hello if it helped :).

Graphick – Simple(r) graphing

For one of my courses, I’ve been drawing a load of graphs of a program’s performance; something I also had to do in a course last year. To say the least, it’s a bit of a nightmare.

What stood out to me was that most of the time, the process I was following was almost mechanical — run an application a bunch of times with different inputs, save a portion of the output somewhere, and then either write a script to parse the CSV and generate a graph, or throw it into Excel and generate one by hand. Sometimes I’d throw together a quick shell script to generate the initial data too, but either way it was a lot of context switching between different languages and if I wanted to regenerate the data after a change I also had to mess around to make sure the graph was redrawn.

As we know, though, everything is improved with large configuration files! When in history has a project started with a configuration file and gradually become more and more complicated? Never, of course!

As a result, I thought it would be fun and helpful to develop a reasonably general utility for creating line graphs to analyse program data – whether it be temporal data for performance analysis, or just plotting the output with varying inputs.

Motivating spoiler: A graph which I generated for my coursework using Graphick

I set out with a few goals, and picked up a few more along the way:

  • It should be easy to write the syntax; preferably nicer than gnuplot’s
  • It should be able to handle series of data — multiple lines
  • Any output data or input variable should be able to be on the X axis, the Y axis, or part of the series
  • The data series should be able to be varied by more than one variable – i.e. you might have, as depicted in the picture above, four lines which represent varying two different variables.
  • It should be extensible, so it can support new ways of data processing and rendering easily.
  • Data should be cached, so if the same graph is drawn it can be redrawn without re-running the program
    • Ideally, cache per-data-point, but the current implementation of Graphick just caches per-graph based on the variables and program. This can definitely be implemented in future though.

After a week of hacking, Graphick is the result. Graphick parses a simple file and generates graph(s) based on it.

Here’s a simple Graphick file:

command echo "6 * $num" | bc
title Multiples of six

varying envvar num sequence 1 to 10
data output

When you run Graphick with a file like this, it will proceed to generate the data to graph (or load it from a cache if it has previously generated it) by running the program for every combination of inputs.

Each line of a Graphick file, besides blank lines and comments (beginning with a %), represents some kind of directive to the Graphick engine. The most important directive is the command directive, which begins a new graph based on the command following it.

The text after command is what is executed. In this case, it’s actually a short shell script which pipes a simple arithmetic expression into bc, the standard command-line calculator on Unix-like systems. Most of the time, you’ll probably write something more like command ./myApplication $variable.

There are a number of ‘aesthetic’ directives – title, x_label, y_label, series_label. The only complicated one is series_label, which I’ll go into later. For the rest, the text following the directive is simply put where you’d expect on the graph.

The varying and data directives are the most important. varying lets you specify which variables to vary and the values to run the program with. If you have two variables which each have six values, the program will be run with every combination of them: thirty-six times. Right now, only environment variables are supported. You write varying envvar <name> <values>. Values can either be a numeric sequence (as in the above example) or a set of values, for example sequence 1 to 5 or vals 1 2 4 8 15.
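Running “every combination” amounts to taking the Cartesian product of the varying values. The Ruby below sketches that idea and is not Graphick’s actual implementation. Here sequence 1 to 10 is represented as the range 1..10 and vals as an array. Each run goes through sh -c so that pipes and $variables in the command work.

```ruby
require "open3"

# Every combination of the varying values, as env-var hashes (values become
# strings, since environment variables are strings).
def combinations(varying)
  return [{}] if varying.empty?

  names = varying.keys
  first, *rest = varying.values.map { |vals| vals.map(&:to_s) }
  first.product(*rest).map { |combo| names.zip(combo).to_h }
end

# Runs the command once per combination via `sh -c`, as a Graphick command
# line would be, and collects each run's stdout lines.
def run_all(command, varying)
  combinations(varying).map do |env|
    out, status = Open3.capture2(env, "sh", "-c", command)
    raise "command failed for #{env.inspect}" unless status.success?

    [env, out.lines.map(&:chomp)]
  end
end
```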

Data is the other important one. Only output is supported, currently, which corresponds to lines of stdout. You can also filter for columns, by adding a selection after the directive – for example, data output column 2 separator ,. This would get the second comma-separated column.

Another type of directive, which isn’t featured in this example, is filtering. If you have a program which outputs lots of lines, and you only care about a certain subset of them, you can filter them. There is more detail on this in the repository README, but suffice to say you can filter for columns of output data to be either in or not in a set of data, which can be defined either as a sequence or a set of values. The columns you filter on need not be selected as data, which means you can filter on data which isn’t presented on the graph.
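Here is a rough Ruby sketch of both column selection and filtering. The function names, the 1-based column numbering (implied by “column 2” meaning the second column) and the in/not-in switch are my reading of the description above. The repository README documents the real syntax.

```ruby
# Picks the 1-based `column` from each line split on `separator`, like
# `data output column 2 separator ,`.
def select_column(lines, column, separator)
  lines.map { |line| line.split(separator)[column - 1]&.strip }
end

# Keeps lines whose `column` value is in `values` (or, with keep: false, is
# not). The filtered column doesn't have to be one that's later selected.
def filter_lines(lines, column:, separator:, values:, keep: true)
  allowed = values.map(&:to_s)
  lines.select do |line|
    value = line.split(separator)[column - 1]&.strip
    allowed.include?(value) == keep
  end
end
```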

Graphick files can contain multiple graphs by just adding more command directives. Currently, there is no way to share directives between them, so properties like title need to be set for each graph. Here’s an example of two graphs in a single file:

command echo "6 * $num" | bc
title Multiples of six
output six.svg

varying envvar num sequence 1 to 10
data output

command echo "12 * $num" | bc
title Multiples of twelve
output twelve.svg

varying envvar num sequence 1 to 10
data output

As you can see, there’s no need to do anything except add a new command directive: Graphick automatically assigns each line to the most recent command.
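One way to picture this grouping is a single pass over the file that opens a new graph at each command line. This is a minimal sketch under that assumption, not Graphick's parser; split_graphs is a hypothetical name.

```ruby
# Hypothetical sketch: split a Graphick file into one directive list per
# graph, attaching every directive to the most recent `command` line.
def split_graphs(source)
  graphs = []
  source.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?("%") # blank lines and comments

    directive, _, rest = line.partition(" ")
    if directive == "command"
      graphs << { command: rest, directives: [] }
    elsif graphs.empty?
      raise ArgumentError, "directive before any command: #{line}"
    else
      graphs.last[:directives] << [directive, rest]
    end
  end
  graphs
end

src = <<~GRAPHICK
  command echo "6 * $num" | bc
  title Multiples of six
  command echo "12 * $num" | bc
  title Multiples of twelve
GRAPHICK
p split_graphs(src).length # 2
```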

As an example of generating more complicated graphs, the graph I featured at the start of this post, which was for my coursework, was generated as follows:

command bin/time_fourier_transform "hpce.ad5615.fast_fourier_transform_combined"
title Comparison of varying both types of parallelism in FFT combined
output results/fast_fourier_recursion_versus_iteration.svg
x_label n
y_label Time (s)
series_label Loop K %d, Recursion K %d

varying series envvar HPCE_FFT_LOOP_K vals 1 16
varying series envvar HPCE_FFT_RECURSION_K vals 1 16

data output column 4 separator ,
data output column 6 separator ,

As you can see, adding the series modifier turns the variable into data which is used to plot separate lines, rather than being part of the X/Y axis. There must always be two non-series data sources (where a data source is either a data directive or a varying directive); the first always represents the X axis and the second the Y axis. You can have any number of series data sources, whose values combine in every combination to create lines. In this graph, both variables take the values one and sixteen, creating four lines in total. The series_label directive takes a format string: the n-th %d placeholder is replaced with the value of the n-th series variable.
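For the FFT example above, the four lines and their labels can be reproduced with Ruby's Array#product and Kernel#format. This is a sketch of the idea only; the variable names are illustrative.

```ruby
# Sketch: every combination of series values becomes one line on the
# graph, labelled by filling in the series_label format string.
series_values = [[1, 16], [1, 16]] # HPCE_FFT_LOOP_K, HPCE_FFT_RECURSION_K
label_format = "Loop K %d, Recursion K %d"

first, *rest = series_values
labels = first.product(*rest).map { |combo| format(label_format, *combo) }
puts labels
# Loop K 1, Recursion K 1
# Loop K 1, Recursion K 16
# Loop K 16, Recursion K 1
# Loop K 16, Recursion K 16
```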

Finally, there is one more useful directive: postprocessing. Postprocessing directives allow you to run arbitrary Ruby code to process the resultant data before it is rendered on the graph. Currently, only postprocessing the y axis is supported, but it would be straightforward to add support for postprocessing the x axis and series data. Postprocessing expressions are given three variables – x, y, and s. x and y are the corresponding values for each data point, and s is an array of all series values at that point, ordered by their definition in the file. For example, if you wanted to normalise the y axis by a certain value, you might do this:

postprocess_y y / 2

Or, you might want to divide it by x and add a constant:

postprocess_y y / x + 5

Read the postprocess_y directive as y = followed by the expression, and the syntax should be reasonably intuitive.
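A minimal sketch of how such an expression could be turned into a callable, assuming it is simply wrapped in a lambda over x, y and s. The method name is hypothetical and this is not necessarily how Graphick does it.

```ruby
# Hypothetical sketch: turn a postprocess_y expression into a lambda over
# x, y and s. This evaluates arbitrary Ruby, so only use trusted files.
def compile_postprocess_y(expression)
  eval("lambda { |x, y, s| #{expression} }")
end

normalise = compile_postprocess_y("y / 2")
shifted = compile_postprocess_y("y / x + 5")

p normalise.call(1, 10.0, [])   # 5.0
p shifted.call(4, 8.0, [1, 16]) # 7.0
```

If the parsed values are Integers rather than Floats, y / 2 performs integer division, which is worth keeping in mind when normalising.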

So, in summary, Graphick is a somewhat powerful tool for graphing program output. You can plot multiple columns of the output, run the program multiple times to generate multiple outputs, or even combine both! Graphick should handle what you throw at it much as you’d expect.

If you come across this, and have feature requests, drop a GitHub issue on the repository, or a comment on this post, and I’ll definitely consider implementing it – especially if it seems like something which is widely useful.

Array length in high-level languages

While pondering writing a standard library for a language I’ve written for my RPG engine, I’ve been stuck considering a question I find pretty interesting – how can one allow a standard library of a language, and ideally only the standard library, to do special operations, like direct memory accesses? Specifically, in my instance, array length accesses, but I was hoping for a more general philosophy.

First, let’s consider the problem of getting the length of an array. In my language, as in many, arrays are a first-class construct, and their allocated length is written in the first word of the array in memory. As a result, getting the length of the array involves reading the word of memory pointed at by the array’s reference.

For example, suppose we had an array of five integers. This would occupy six words of memory: the first set to the length (5), followed by the five elements, all initialised to zero.

This is where the problem arises for my standard library: the language allows users to index the array in a fairly typical manner – x[0] through x[4] – but these indices do not correspond directly to the memory accesses (in fact, the memory accesses are all shifted upwards by one word to skip the length). As a result, it is not possible to access the array’s length directly through the language.
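The layout can be modelled in a few lines of Ruby, treating memory as a flat array of words. The class and method names are hypothetical, and the bounds check is added for illustration only; the post doesn't say whether the language checks bounds.

```ruby
# Hypothetical model: memory is a flat array of words; an array reference
# points at a length word, followed by the elements.
class Memory
  def initialize
    @words = []
  end

  # Allocate `length` zeroed elements after a length word; return the address.
  def allocate_array(length)
    address = @words.length
    @words << length
    @words.concat([0] * length)
    address
  end

  # The length is a single load from the reference itself.
  def array_length(address)
    @words[address]
  end

  # User-visible index i lives at address + 1 + i (shifted past the length).
  def load_element(address, index)
    check_bounds(address, index)
    @words[address + 1 + index]
  end

  def store_element(address, index, value)
    check_bounds(address, index)
    @words[address + 1 + index] = value
  end

  def word_count
    @words.length
  end

  private

  def check_bounds(address, index)
    return if index.between?(0, array_length(address) - 1)

    raise IndexError, "index #{index} out of bounds"
  end
end

mem = Memory.new
x = mem.allocate_array(5)
p mem.word_count      # 6
p mem.array_length(x) # 5
```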

A trivial solution would be to implement some special new syntax – length x – which generates a direct memory read at the array’s address, hence returning the length. But that’s no fun: it makes the parser more complicated, adds a special case to code generation, and causes what I would call a “break in immersion” when coding – it’s one less thing that is intuitive and natural to users, who can write array.sort() but not array.length(). Following this line of thought further, we could instead parse it as normal and hijack it during code generation: if we’re generating code for a length call on an array, then instead of generating a classic method call, we directly output memory access code.

This approach has many benefits, in that it’s trivial to implement, and doesn’t add any special cases to parsing (just code generation), or increase the mental load for users too much. Essentially, to end users, this is a fairly seamless approach, but it still leaves something to be desired – now some array logic, such as sorting, is encoded in higher level code, but some is just hard-coded into the compiler.
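The code-generation special case might look roughly like the following Ruby sketch. The AST node, the register names and the ARM-style instruction strings are all assumptions for illustration, and the ordinary call path is heavily simplified (no argument passing).

```ruby
# Hypothetical sketch of the special case: a `length` call on an array
# becomes a single load instead of an ordinary method call.
MethodCall = Struct.new(:receiver_type, :receiver_register, :name)

def generate_call(node, target)
  if node.receiver_type == :array && node.name == "length"
    # The length is stored in the word the array reference points at.
    ["LDR #{target}, [#{node.receiver_register}]"]
  else
    # Ordinary path (simplified): branch to the method's label.
    ["BL #{node.receiver_type}_#{node.name}", "MOV #{target}, r0"]
  end
end

p generate_call(MethodCall.new(:array, "r4", "length"), "r5")
# ["LDR r5, [r4]"]
```

Everything else about the call, from parsing to type checking, stays exactly as it is for any other method, which is what keeps this approach cheap.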

Maybe that’s an acceptable loss, but I was still interested in how other languages had solved this problem, so I looked into how Java and C# solved this issue.

Java

Java seems to solve this by having a dedicated JVM instruction called arraylength – this is along the lines of what I was saying above, where the compiler hijacks what syntactically looks like a field access. Syntactically, it is next to identical to your average field, but you can use reflection to prove that it’s not actually a field.

C#

C# seems to take a very similar approach to Java (unsurprisingly, given the similarity between the two), with a CIL instruction, ldlen. This article is a goldmine for related information: http://mattwarren.org/2017/05/08/Arrays-and-the-CLR-a-Very-Special-Relationship/

Summary

I really intended to look into quite a lot more languages – specifically Python, Ruby and Lua, but didn’t really have time. Digging through the Python compiler to find the answer was taking me quite a long time. If anyone stumbles upon this and happens to know how they handle it, I’d love a comment.

It does seem like the mainstream approach is just a special case in code generation, though. Personally, I was expecting an approach where verified library code could contain lower-level code (like inline assembly in C) to avoid this, but in retrospect that seems like overkill.

Film review – Hereditary

This post contains fairly mild spoilers for the film Hereditary. If you hate spoilers, look away!

As a pretty avid horror viewer, I’ve been pretty excited about the release of Hereditary for quite a while. Besides A Quiet Place, which I thought was fantastic, I haven’t seen a good horror film in the cinema for quite a long time, and the reviews I’d seen leading up to the release were all very positive.

That in mind, I went in with fairly high expectations, and coming out I can see pretty well why the reviews are so torn – most viewers seem fairly disappointed, and most critics are raving. The film was certainly great from an artistic perspective, and it told a story without spoon-feeding you what was going on, which is something the horror genre sorely lacks in my view – though I feel it’s quite a borderline horror film and does coast more towards being a mystery drama, which is what Google reports as its genre. Unfortunately, and I’m not sure if this hurt it for all viewers or just me, it was one I feel you have to put serious concentration into, but which doesn’t necessarily invite that throughout.

For example, there were scenes where very serious issues were brought up, but in which the dialog was less than stellar and delivered slightly ineffectually. In fairness, the most memorable of these occurred in a dream, so I do wonder if this was intentional on some level; however, it caused the audience to laugh, which broke the tension the scene had otherwise been building.

Actually, audience reaction was a problem throughout. Perhaps it’s my fault for heading over at 7PM on a weekday night, but the cinema had groups of people who laughed during tense moments, made audible jokes, and imitated the characteristic tongue click of the daughter. This was pretty distracting and caused me to lose focus on moments which, in retrospect, were important to pick up on to get the full experience and piece together what was going on during the film. I’ve read a few reviews that mentioned similar problems, and think that in part this is enabled by the film – the aforementioned clicking is repeated just enough to make it memorable (intentionally, of course), and there were plenty of slow, dramatic pans of the camera enabling people to take full advantage of loudly copying this in the cinema.

I’ve focused on this so much because I don’t think other suspenseful films suffer from this problem nearly as badly. In A Quiet Place, for example, the tense scenes really are edge-of-your-seat moments, with a constant fear that any second could be the last for the characters. The precedent of serious danger is set up immediately at the beginning of the film, which ensures the audience has no misconception as to how dangerous and tense the setting is: there’s real danger from the start. In contrast, there were many tense scenes in Hereditary which didn’t have a very strong payoff. I don’t think this, in and of itself, is a bad thing, but it can definitely lead to some of the tension being taken less seriously.


Small spoilers follow

Putting that aside, the story was pretty great, but I think the characters could’ve done with some work. The father was the cliché supernatural skeptic who, despite having evidence placed directly in his lap, refuses to believe anything out of the ordinary is going on. The mother was the cliché main character who alternates between seeming fairly put together and completely fallen apart. I think her character was one of the more believable and realistic, but with scenes leading me to not know for sure whether she was a protagonist or an antagonist, it was hard to empathise with her. Altogether, the cast had very few empathetic characters – the son and daughter both lacked a strong personality and seemed detached, the mother and father felt too predictable, and there weren’t enough recurring cast members to contrast them with, besides a few students who aren’t fleshed out beyond the fact that they take drugs. That said, Joan, one of the supporting characters, who appears about halfway through the film, is played brilliantly and was, I felt, one of the better characters – it would’ve been great if she’d had more screentime.

To be clear, I don’t think these characters on their own are written badly or lazily – these clichés are such because they’re, I imagine, fairly realistic reactions to what the characters are going through – but they made too many of the suspenseful scenes too predictable and hence detracted from a lot of the film for me.


More major spoilers follow

Focusing on the story, there were a few issues which really stuck out to me. The first was that, around mid-way through, the son, apparently high on marijuana, accidentally kills his sister while frantically driving her to the hospital: he swerves past a post while she has her head out of the window. When it came to the police investigation of this, the believability was ruined for me – because there… wasn’t one? There is a scene for the funeral, so enough time has definitely passed that there should’ve been one, but apparently this film exists in a universe where manslaughter and reckless, intoxicated driving aren’t crimes worth investigating.

Another scene that bothered me, albeit to nowhere near the same degree, is when the mother’s sleeve caught fire: not only did she only notice ten to twenty seconds after the fire started, but she then attempted to put it out by ineffectually brushing part of the sleeve that wasn’t visibly on fire. Sure, it turns out that the fire was supernatural in some way and hence probably not directly extinguishable, but she didn’t know this at the time, and I feel she reacted very unnaturally given that.

I feel like a film like this works best when it’s got you concentrating on the edge of your seat, and ruining immersion even for a second can throw the whole atmosphere out the window. Despite this, I think the film really did redeem itself with a fantastically presented story – it’s been a while since I’ve seen a horror which had a story you really had to pay attention for, and I really think it deserves props for that. I’ve certainly focused more on the negatives in this review, but I think that putting these aside it’s definitely worth watching, and one of the better films I’ve seen recently.

All things considered, if I had to rate it, I’d give it around a 4/5, but I would definitely recommend viewing it in a quieter showing or, in future, in your own home. I think that in a better setting, the film would be a lot more suspenseful and enjoyable, but that without a respectful audience, its tension is severely detracted from and a lot of the power of the film is lost.

RPG engine development blog 1 – Inventory

I thought I’d start posting updates on my RPG engine’s development, as it’s nearing the point where a simple game could actually be built on it, and it’s nice to discuss it somewhere. While many features have already been implemented – dialog, maps and movement, NPCs, among others – there’s still a lot to do: dialog is still not as dynamic as I’d like, the scripting language isn’t great so far, combat doesn’t exist, items barely exist, there’s no way to change area, etc. If you’re interested, more detailed documentation is available in its README on GitHub, or on my projects page.


My first goal was to find some way of visualising the inventory. I’d already built a GUI system for the dialog, so I leveraged that existing system and improved it where necessary. There wasn’t a way of having a GUI element render at a fixed position, so I added a container that just offsets its children by a fixed amount, which is used to get the 30 item slots aligning nicely.

Initial stages of inventory development – moving items

After this, I hooked it up to the player’s inventory, and started displaying items from the inventory in the slots. After that, some trivial changes made it so that you can click an item to pick it up and then click elsewhere to drop it. This is visible in the image above.
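The fixed-offset container mentioned earlier can be sketched in a few lines of Ruby. The class name, the 40-pixel slot size and the 10×3 arrangement of the 30 slots are assumptions for illustration; the post doesn't say how the engine arranges them.

```ruby
# Hypothetical sketch of a container that shifts its children by a fixed
# offset, so slot positions can be laid out relative to the container.
Point = Struct.new(:x, :y)

class OffsetContainer
  def initialize(offset)
    @offset = offset
    @children = []
  end

  def add(local_position)
    @children << local_position
    self
  end

  # Screen positions are the children's local positions plus the offset.
  def absolute_positions
    @children.map { |c| Point.new(c.x + @offset.x, c.y + @offset.y) }
  end
end

SLOT_SIZE = 40 # assumed pixel size of one slot
COLUMNS = 10   # assumed: 30 slots as 3 rows of 10

grid = OffsetContainer.new(Point.new(100, 50))
30.times { |i| grid.add(Point.new((i % COLUMNS) * SLOT_SIZE, (i / COLUMNS) * SLOT_SIZE)) }
p grid.absolute_positions.last # #<struct Point x=460, y=130>
```

Moving the whole inventory then only means changing the container's offset, rather than every slot.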

Next, I added some inventory slots for nearby items on the ground, modified the map and hooked it up to allow the player to drop items where they are standing and pick up close-by items.

Some entity mis-rendering is visible in this video – you can see entities clipping on top of the tree and over each other. Currently, entity rendering is done after the entire map is rendered, rather than being interlaced with it, which leads to this issue. However, dropping and collecting items works as one might expect.

Next, I made some minor polish to hide the mouse cursor and render held items in the correct place when the player is moving an item.


I’ll be updating this post as I progress through the inventory system, hopefully completing it sometime in the coming weeks.

IC Hack 17 – My first hackathon!

I recently attended IC Hack 17, Imperial College’s Department of Computing Society’s annual hackathon, where I and three others grouped together to create a game.

IC Hack Projection on Queen’s Tower – Picture from @ICHackUK (By Paul Balaji @thepaulbalaji)

The general atmosphere at the event was great – the team had done a fantastic job of making Imperial’s campus feel like an event centre, and gratuitous amounts of plugs (for laptops!), posters, signs, and even a gigantic IC Hack logo projected onto the resident Queen’s Tower gave the venue a strong feeling of organisation and, for the briefest of times, I managed to suspend the belief that I was just sitting in a cafeteria.

The catering deserves a second, more thorough mention – whenever snacks were even remotely near running out, a brand new bunch would appear almost like magic (but perhaps some credit should also be given to the volunteers); not to mention the Domino’s, the dinners provided, and the breakfast of sausages in buns. (Hotdogs?)

In terms of my hack, to show for our 30+ sleepless hours, my team and I created a zombie shooting game which we rather imaginatively called “Zombie Hack,” in Unity with C#. It’s a classic wave-based zombie game, where the waves progressively get bigger and tougher, but with a twist – after you save up enough money, you can buy walls and towers.

The walls and towers would completely block the zombies, but they’re not outsmarted that easily – with some help from Unity (and our resident Unity expert, Marek Beseda), we added pretty awesome path-finding so they’d find their way to you; a bit like a tower defence game!

Unfortunately, as we suspected, playtesting showed a bit of a flaw in our strategy: players would build up towers and create an impenetrable square defence, meaning zombies just walk up to your defences and hang around until they get killed by them.

Fighting a small wave

We initially decided to solve this by just making zombies damage structures, but a combination of zombies dying before they get close, and also some difficulties in making the pathfinding engine happy to walk into walls made this a difficult path. Instead, we created a new variant of zombies to complement the existing 7 (Stupid zombies who have low stats in general, Slow zombies which just walk a bit slowly, Normal zombies that are relatively balanced, Fast zombies who move faster than usual, Fighter zombies who move a tiny bit faster but do a lot more damage, tank zombies with tens of times more health, and boss zombies with loads of health, speed and damage), which got colloquially (and semi-officially) named kamikaze zombies. These zombies would spawn with other zombies in their wave, but had a special quirk: Unlike other zombies, who only chased the player, these zombies would raycast towards the player when they spawned.


If this raycast hit the player, the zombie acts normally, except that it blows up when it gets close, immediately killing both itself and the player. But, and here is the real quirk, if the raycast hit a wall or tower, the zombie goes into charge mode: it targets the specific building the ray hit, quintuples its speed, and charges at it. If it succeeds (which we found happened around 60-70% of the time), it spawns 2 more zombies of this variant. This meant that even against structures made entirely of towers (which we thought were slightly overpowered), you’d eventually get a few unlucky combinations of tank and kamikaze zombies in a row, and before you knew it there were explosions on all sides of your base and the towers were quickly overwhelmed.

A much bigger horde!
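The kamikaze decision described above boils down to a small branch on what the spawn-time raycast hit. Here it is as a Ruby sketch. The real game was C# in Unity, and these names, the symbol-based raycast result and the base speed are illustrative assumptions.

```ruby
# Hypothetical sketch of the kamikaze zombie's choice when it spawns.
# The raycast result is passed in as a symbol rather than computed.
BASE_SPEED = 1.0
CHARGE_MULTIPLIER = 5  # "5x their speed"
SPAWNED_ON_SUCCESS = 2 # extra kamikazes after a successful charge

def kamikaze_plan(raycast_hit)
  case raycast_hit
  when :player
    # Chase as normal, then explode on reaching the player.
    { target: :player, speed: BASE_SPEED, on_reach: :explode }
  when :wall, :tower
    # Charge at the specific structure the ray hit.
    { target: raycast_hit, speed: BASE_SPEED * CHARGE_MULTIPLIER,
      on_reach: [:spawn_kamikazes, SPAWNED_ON_SUCCESS] }
  else
    raise ArgumentError, "unexpected raycast result: #{raycast_hit.inspect}"
  end
end

p kamikaze_plan(:tower)[:speed] # 5.0
```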

We thought after this that the game-play was fairly exciting and balanced, and we were fairly excited to demo it to other participants and the judges. Whilst we didn’t win anything, it was great to see everyone’s reactions and even better to see how long we managed to keep some of the volunteers occupied!

On the topic of winning, I think the winning team in our category (games) definitely deserves a mention: Karma, a horror game. You can see it here: https://devpost.com/software/karma-lsyi81. The polish they managed to produce in just a weekend was incredible.

Another great submission I particularly liked, which unfortunately doesn’t have a video or photographs on DevPost, was Emotional Rollercoaster, a Kinect-based game which would show you the name of an expression (such as disgust) and a photograph of someone (usually Trump or Clinton) making that expression, which you then had to try to make yourself. The game sent its data to a Microsoft API, and if you managed to convince it, you’d get some points, which were displayed in a pretty awesome fashion: they’d built a small wooden roller coaster with a car that drove forward a bit when you made the correct expression (varying based on how well you did) and went back if you didn’t. While their balancing seemed a little off – they had to hold the coaster car back to demo it sufficiently – that’s a pretty small issue and easy to fix, and it still looked pretty fun to play. I’m sure there’s a lot of untapped potential in analysing the expressions of players in a video game.

If you’d like to give my team’s game a go, you can clone it here: https://github.com/MarekBeseda/ICHack. It was built with Unity, so you’ll need that too. I’ve built it for Windows users here: http://davies.me.uk/ZombieHackWin64.zip. Let me know if you beat my high score of 32 waves! 🙂

High Scores:

Alberto Spina: 110 waves [I do not recommend attempting to beat this score if you want to achieve anything productive in your life]

vTables and runtime method resolution

My first second-year university group project recently came to an end. In it, we had to implement a fully functional compiler that generated machine code from a relatively simplistic procedural language. It had functions, nested expressions, and a strong dynamic typing system. In addition to typical stack-stored primitive types such as int, char, and bool, it also had two dynamically allocated types: arrays and pairs.

For the final milestone of the compiler, we had two weeks to implement our choice of an extension. We chose, fairly ambitiously, to implement a number of extensions, one of which was a fully working Object Oriented system which I took most of the responsibility for implementing.

In the end, our OO system supported three kinds of user-defined types: structs, interfaces, and classes. Structs were relatively simplistic – little more than fixed-size pieces of memory on the heap. Interfaces allowed you to declare methods which a class implementing the interface would have to implement, and classes allowed you to define methods and fields together.

Our classes supported extending a single other class and implementing any number of interfaces. Semantically, this worked nearly identically to Java – in fact, our implementation ended up creating a language that was essentially a subset of Java. The implementation had what are often referred to as the three pillars of object-oriented programming: inheritance, encapsulation (public, private and protected, where private was class-only and protected was class-and-subclass-only), and polymorphism (run-time method dispatch).

This post is primarily about how I structured objects on the heap to allow them to support run-time method dispatch. This was something we found difficult to research, as most resources about vTables and the like tend to be about C++ and rarely cover the low-level implementation. We found that C++ compilers would often optimise away the vTables, even with optimisations turned off, making it very difficult to analyse the ones they generated. As a result, I decided to write a summary of how I went about implementing it, in the hope that it is useful to others.

The system I ended up settling with results in a tiny amount of extra overhead for method calls on classes, but adds a significant overhead on calls to interfaces. This is something that I am sure is not done optimally, and as such I certainly do not wholeheartedly recommend this as a perfect way of laying out objects on the heap.

To see why this is not a trivial problem, consider a basic example:

class A {
    void printAString() { print "A string" }
}
class B extends A {
    void printAString() { print "Another string" }
    void anotherFunction() { print "Hello, World!" }
}
void main() {
    A a = new B();
    a.printAString();
}

A naïve implementation of this would print something that may be unexpected if you’re used to the behaviour that run-time dispatch allows: “A string”, rather than “Another string”. This happens because the call on a seems, to the compiler, to be acting on an A, not a B. This could theoretically be avoided by intelligently (as far as a compiler is concerned) noticing that it’s instantiated as a B, but this causes some problems – what if you have a function accepting an A as a parameter, or you instantiate the A differently in two mutually exclusive branches (for example an “if” that sets it to a B, but an “else” that sets it to an A)? These problems, so far as I am aware, make it nigh-on impossible to implement with compile-time code analysis alone (though, theoretically, you could generate a huge file that accounts for every branching possibility and expands them cleverly, this would certainly not be viable for large programs).

So the way I solved this, and indeed, I believe, the way it is typically solved, is to create a table of methods for each class at compile time, which is included in the generated assembly code (and hence, in some form, at runtime). I implemented it such that every class would have a table for itself, and an additional one for each type it “is” – that is, each class it extends or interface it implements. In the example above, the following might be generated:

A:
 o_A_printAString
B:
 o_B_printAString
 o_B_anotherFunction

Function labels here are named as “o_ImplementingType_functionName”. As you can see, the superclass method is in the same position in both tables. This means that if code calling a method on an object typed as A uses the table as a layer of indirection to find A’s method, it will be “tricked” into calling B’s version instead.
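One way to get that "same position" property is to build each subclass table by copying its superclass's table, overriding entries in place and appending new methods. This Ruby sketch is an assumption about how such tables could be built, not the compiler's actual code, but it reproduces the two tables listed above.

```ruby
# Hypothetical sketch: build a subclass vTable from its superclass's,
# so inherited methods keep the same offset in both tables.
VTableEntry = Struct.new(:method_name, :label)

def build_vtable(class_name, method_names, parent_table = [])
  table = parent_table.map(&:dup)
  method_names.each do |name|
    entry = VTableEntry.new(name, "o_#{class_name}_#{name}")
    slot = table.index { |e| e.method_name == name }
    if slot
      table[slot] = entry # override: reuse the superclass's slot
    else
      table << entry      # new method: append at the next free slot
    end
  end
  table
end

a_table = build_vtable("A", ["printAString"])
b_table = build_vtable("B", ["printAString", "anotherFunction"], a_table)
p a_table.map(&:label) # ["o_A_printAString"]
p b_table.map(&:label) # ["o_B_printAString", "o_B_anotherFunction"]
```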

We then stored objects on the heap similarly to the instance of B that follows:

B's type identification number
B's vTable address
A's fields
B's fields
Interface information about B's implemented interfaces

To call a method on an instance of a class, you navigate one word down from the object pointer, load the vTable pointer stored there, use it to locate the vTable, then add the offset of the method (which is known at compile time – for example, “printAString” will always be the zeroth method in the vTable). Then, load the value stored at that address into the program counter to jump to the method.
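That call sequence can be simulated in Ruby by treating memory as an array of words and mapping labels to lambdas that stand in for method code. The type id and addresses below are made up for illustration; this is a model of the steps, not generated assembly.

```ruby
# Hypothetical model: memory is a flat array of words, and labels map to
# lambdas standing in for the methods' code.
CODE = {
  "o_A_printAString"    => -> { "A string" },
  "o_B_printAString"    => -> { "Another string" },
  "o_B_anotherFunction" => -> { "Hello, World!" }
}.freeze

memory = []

# B's vTable, stored somewhere in memory.
b_vtable_address = memory.length
memory.concat(["o_B_printAString", "o_B_anotherFunction"])

# An instance of B: type id (made up), vTable address, no fields,
# no interfaces, then the terminating zero.
B_TYPE_ID = 2
object_pointer = memory.length
memory.concat([B_TYPE_ID, b_vtable_address, 0])

# Read the vTable address one word down from the object pointer, add the
# compile-time method offset, then "jump" to the label stored there.
def call_method(memory, object_pointer, method_offset)
  vtable_address = memory[object_pointer + 1]
  label = memory[vtable_address + method_offset]
  CODE.fetch(label).call
end

# a.printAString() with `a` typed as A: printAString is offset 0.
puts call_method(memory, object_pointer, 0) # Another string
```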

This is trickier for interfaces – since an object can implement many, we can’t just have a fixed position in the vTable for each method. There is undoubtedly a better way to do this, but I chose to put a few pieces of data about them at the end of the object. For each interface, the following was stored:

The interface's type identification number
The address of a custom version of this interface's vTable with correctly overridden methods

Additionally, a zero is stored at the end of an object to denote its end. Interface vTables are looked up by a simple but fairly overhead-heavy algorithm: go to the interface section using the interface offset information stored at the beginning of the object, then repeatedly jump two words until either the interface identification number you want is found, or you reach a zero (the zero indicating that an explicit cast has failed – it should never occur on an implicit cast). Once found, jump a single word to get the interface’s vTable, then look down a fixed offset to find the method you are interested in calling. The overhead added by this is somewhere in the region of 6 instructions, plus potentially up to four times the total number of interfaces (depending on how far down the list the one you are interested in is). This is clearly suboptimal. An approach I considered is, when an object is cast to an interface, moving its pointer down to the start of the relevant interface information, but this would have been considerably more difficult to integrate into the codebase, as all implicit cast locations would need to be considered.
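The lookup loop described above, modelled in Ruby over an array of words. The method name and the interface ids are made up, and the sketch assumes ids are never zero, since zero marks the end.

```ruby
# Hypothetical model of the interface lookup: walk (interface id, vTable
# address) pairs until the wanted id or the terminating zero is found.
def find_interface_vtable(memory, interface_section, interface_id)
  position = interface_section
  loop do
    current = memory[position]
    return nil if current.zero?                         # explicit cast failed
    return memory[position + 1] if current == interface_id
    position += 2                                       # next (id, vTable) pair
  end
end

# An interface section for ids 7 and 9 (made-up numbers), then the zero.
memory = [7, 100, 9, 200, 0]
p find_interface_vtable(memory, 0, 9) # 200
p find_interface_vtable(memory, 0, 5) # nil
```

Each step of the loop costs a load, a compare, a branch and an increment, which is roughly where the "four times the number of interfaces" overhead comes from.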

There is undoubtedly a better way to handle interfaces – I am very interested in finding it out, so feel free to contact me if you know of a more optimal way or are wondering about anything I have explained above.

Ruby, a few months on

Sidenote: I’ve pretty much dumped my Thing of the Month plans, because they proved too difficult to balance with university work and general life, where I’m trying to branch out more and be more active in game development. That said, I’m still always trying to learn new things in the software engineering field, as I always have, just in a less forced and artificial way, which I think does not work as well for me. I’m looking into Kotlin right now, and may put up a post about it sometime soon.

Since I posted about learning Ruby, I think I’m getting rather good at it. My most recent project was a hundred-or-so-line integration test runner for a university compiler project written in Java. It runs the project against different test files, checking that the output is as expected, all the while producing a nice output which overwrites a line in the terminal to give an updating appearance without spamming it. It then allows manual test verification, where you can see the source code and the error the compiler produced, to manually verify whether the error looks sensible and understandable.
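The line-overwriting trick is usually just a carriage return. Here is a minimal Ruby sketch; the message format is made up, and the "\e[K" clear-to-end-of-line escape assumes an ANSI-compatible terminal.

```ruby
# Minimal sketch of a self-updating status line: "\r" returns the cursor
# to the start of the line and "\e[K" (ANSI) clears any leftover text.
def progress_line(done, total, failures)
  "\r\e[KTests: #{done}/#{total} run, #{failures} failed"
end

total = 5
total.times do |i|
  $stdout.print progress_line(i + 1, total, 0)
  $stdout.flush
  sleep 0.1 # stand-in for running one test
end
puts # finish on a fresh line
```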

Soon, we realised that running such a complicated Java program over 250 times was slow, so I looked into multithreading the script. I was pleasantly surprised by just how easy it was to add concurrency to my test runner. It essentially consisted of two additions: wrapping the main logic in Thread.new’s block, and storing the threads in a list, running up to 8 of them at once (though this can be changed with a command-line parameter) and waiting for the oldest one to finish before starting a new one.
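The scheme just described can be sketched in Ruby as follows. run_all is a hypothetical name, and the block stands in for running one test. Since each real test mostly waits on an external Java process, Ruby's global VM lock shouldn't be much of a bottleneck here.

```ruby
# Sketch of the scheme described above: one thread per test, at most
# `max_threads` alive, joining the oldest before starting a new one.
def run_all(tests, max_threads: 8, &run_test)
  threads = []
  results = Queue.new # thread-safe collection of [test, result] pairs

  tests.each do |test|
    # At the limit: wait for the oldest thread to finish first.
    threads.shift.join if threads.length >= max_threads
    threads << Thread.new(test) do |t|
      results << [t, run_test.call(t)]
    end
  end

  threads.each(&:join) # wait for the remaining threads
  Array.new(results.size) { results.pop }
end

squares = run_all((1..20).to_a, max_threads: 4) { |n| n * n }
p squares.map(&:last).sort.first(3) # [1, 4, 9]
```

Joining the oldest thread rather than whichever finishes first is simpler, at the cost of occasionally waiting on a slow test while others have already finished.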

I’ve also started coding a little game akin to how I remember The Hobbit, a lovely game I used to play on a ZX Spectrum emulator when I was pretty young. I emphasise how I remember it, because I recently watched a playthrough and my memories weren’t very reliable, but the main thing my younger self found attractive was the method of input – you would type something, like “light the fire” or “kick Gandalf”, and it was like magic: it seemed to have a well-written reaction to anything you could imagine, and I remember being really interested in how it worked. I think I’ve got a rather sensible approach to mimicking it, but I don’t think I could ever hope to achieve the same kind of magic I felt playing that game. Wikipedia is fairly complimentary about its approach:

The parser was very advanced for the time and used a subset of English called Inglish.[5][6] When it was released most adventure games used simple verb-noun parsers (allowing for simple phrases like ‘get lamp’), but Inglish allowed one to type advanced sentences such as “ask Gandalf about the curious map then take sword and kill troll with it”. The parser was complex and intuitive, introducing pronouns, adverbs (“viciously attack the goblin”), punctuation and prepositions and allowing the player to interact with the game world in ways not previously possible.

https://en.wikipedia.org/wiki/The_Hobbit_(1982_video_game)
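To make the contrast in that quote concrete, here’s a toy verb-noun parser of the simple kind it describes, with one small extension for “with”. It’s a hypothetical Ruby sketch. It is neither Inglish nor the approach used in the game above, and the vocabulary and the Command struct are made up for illustration.

```ruby
# Toy verb-noun parser: drops articles, reads a verb, a noun phrase and an
# optional "with <instrument>". Vocabulary and Command are hypothetical.

Command = Struct.new(:verb, :noun, :instrument)

ARTICLES = %w[the a an].freeze
VERBS = %w[get take light kick kill ask].freeze

def parse(input)
  words = input.downcase.scan(/[a-z]+/) - ARTICLES
  verb = words.shift
  return nil unless VERBS.include?(verb) # unknown verb: the parser gives up

  # Everything before "with" is the noun phrase; anything after is the instrument.
  split_at = words.index("with") || words.length
  noun = words[0...split_at].join(" ")
  instrument = Array(words[(split_at + 1)..-1]).join(" ")

  Command.new(verb, noun.empty? ? nil : noun, instrument.empty? ? nil : instrument)
end

p parse("Light the fire")
p parse("kill the troll with the sword")
```

Anything outside the fixed vocabulary fails to parse. “Viciously attack the goblin” is one example. That is exactly the limitation Inglish was praised for getting past.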

If you’re interested in retro games, I’d heavily recommend it. I truly wouldn’t have believed something like it existed at the time if I didn’t already know about it.

All this is to say that, despite my initial doubts about how long it would last, I still really do love the language, and it’ll definitely be one of my first choices for future projects. I’d probably lean towards C#, C++ or Java for any game that requires better performance, but for most other things I think Ruby is going to indefinitely be one of my favourite choices.

Ruby, and why it quickly became my favourite language

I’d be taken by surprise if someone told me that their favourite programming language was one they’d never written more than a few lines of code in, but that’s my situation right now. Due to a number of unrelated circumstances, I’ve been unable to install and use Ruby. However, I’ve been reading a book I obtained recently – Eloquent Ruby, by Russ Olsen – which I’ve found to be a fantastic read. I’m currently about halfway through, and am almost certainly going to pick up some other books in the series – Design Patterns in Ruby, also by Russ, Practical Object-Oriented Design in Ruby, and (though this isn’t in the same series), when it’s released, Agile Web Development with Rails 5.

So far, I’ve learnt that Ruby seems to be exactly how I want a programming language to be – very consistent, intuitive, expressive, and clean. As a short history, I began programming in Lua. At the time, I was pretty young – either eight or nine – and didn’t quite grasp the fundamentals of how a programming language is structured. I could write code, but it was a while before I realised that essentially everything was an expression, which could be nested and used in funky ways. That meant I could write lines like this (not that I’m advocating this style, of course):

tab[index + 3] = get_variable(get_function()({ ["a"] = 5 }));

Or to realise that the functions provided to me by my environment (at the time, Roblox), such as its event system, where you’d subscribe a listener in a manner similar to:

object.event:connect(function() --[[ … code … ]] end);

Were often something I could manufacture myself, by making a table (vaguely similar to a hash and an array butchered and stitched together) with a function called “connect” that accepts a function as its parameter. These kinds of complex nested expressions, closures, and anonymous functions hadn’t really made much of an impression on me, and the higher-level constructs I was using merely felt like black magic that just worked. Once I realised this, I gradually drifted towards feeling that Lua wasn’t ideal for many things, both in terms of speed and its limited syntax (which allows for some incredible OO systems such as MiddleClass, but still falls short of true OO languages).
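The same trick carries over to Ruby. Here is a rough sketch of an event object whose connect method stores listener blocks, which is all a basic event system needs. The Event class and its fire method are made-up names, not Roblox’s API.

```ruby
# A hand-rolled event in the spirit of the Lua/Roblox pattern: an object
# with a `connect` method that stores listener blocks. Event and `fire`
# are illustrative names only.

class Event
  def initialize
    @listeners = []
  end

  # Register a listener block; returns it so the caller can keep a reference.
  def connect(&listener)
    @listeners << listener
    listener
  end

  # Call every listener, in registration order, with the given arguments.
  def fire(*args)
    @listeners.each { |listener| listener.call(*args) }
  end
end

clicked = Event.new
clicked.connect { |player| puts "#{player} clicked the button" }
clicked.fire("Alice") # prints "Alice clicked the button"
```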

I then transitioned to writing object-oriented code with C# and Java. In those languages, of course, methods are coupled with data, so I could write code that kept functionality with its associated data, which felt sensible and correct. I still wasn’t completely satisfied, though. While “everything” (for the most part) was an object, there were still things I felt I should be able to do but couldn’t. Primitives, for example, are essentially special cases, and although autoboxing is nice, it’s a bit clunky. While C# hides the detail better than Java (ArrayList<int>, anyone?), it’s still got its own problems.

Another two languages I’ve used widely are PHP (boo!) and Python. With these, I loved how they were object oriented but still let you flexibly pass objects around. I still prefer static typing, and I do think it’s often better for larger projects. If nothing else, it lets your editor be a lot more intelligent; some of these languages have type annotations, but they always seemed like a poor man’s static typing to me. Even so, I think dynamic typing can, when used well, be a great convenience.

I had a placement this summer at Netcraft, an internet services company in Bath. It introduced me to Perl, a language which I’d heard about but never really been interested in – my first year of university was my first serious venture into the Unix world, and I’d spent most of that trying to hide from calculus and trigonometry, while trying to improve my ability with Haskell, a language we used in our first term.

At first, I really didn’t like Perl. I’m still not overly fussed, but it managed to persuade me and I ended up writing a few scripts in it at home. I find a few things about it rather annoying – it’s inconsistent, too many things have unexpected side effects or set special variables, there are too many ways of expressing the same idea, and the object system is not just unintuitive, but feels completely hacked on (though, in fairness, Moose fixes this, I disagree with the principle that you should have to use a library for something like this). I find it ridiculous that it took decades to add method signatures, and even now they’re considered experimental! I don’t like how I have to think for ages before I can even begin to get the length of an array in a data structure, and when I do, I end up producing code that looks a bit like this:

scalar @{$structure->[{[@@{$}->{a}->$@{ } ]]] ) }->{key}->[3]->{3}}

I jest, but this is certainly how it feels. Even if the speed of thinking comes with practice, it’s still a bit gross how many extra symbols I need to access children of arrayrefs and hashrefs, compared to other languages where you can just nest these kinds of structures effortlessly without thought. Even some of the dedicated Perl community seems to agree here – Perl 6 eliminates a lot of the variation in the symbols to make them at the very least more consistent.
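For comparison, here’s a small Ruby sketch with a hypothetical nested structure. Getting the length of an inner array, which in Perl would be something like scalar @{ $structure->[1]{scores} }, is just a chain of [] calls and a method call:

```ruby
# Hypothetical nested data: an array of hashes, each holding an array.
structure = [
  { name: "first", scores: [1, 2, 3] },
  { name: "second", scores: [4, 5, 6, 7] }
]

# No sigils or explicit dereferencing: chain [] and call a method.
count = structure[1][:scores].length # => 4

# Nested arrays can be changed in place just as easily.
structure[0][:scores] << 4

# Sum the lengths of every inner array.
total = structure.sum { |entry| entry[:scores].length } # => 8

puts count, total
```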

But there are also a lot of things I love about it, and wish more of the other languages I use had: statement modifiers are a big one. For context, a statement modifier lets you suffix any statement with a little condition, like:

print "hello" if should_print_hello;

For some reason, this seems infinitely more elegant than a faux statement modifier in C#, which would be:

if (should_print_hello) Console.Write("hello");

Realistically, they’re similar, but the latter feels more bulky, doesn’t read as well, and I’m not particularly keen on if statements with omitted curly braces.
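For comparison, Ruby accepts the same style and adds unless and loop modifiers. Here is a small illustrative sketch; greetings is a made-up method:

```ruby
# Ruby's statement modifiers mirror Perl's: a condition can trail a statement.
def greetings(should_print_hello)
  lines = []
  lines << "hello" if should_print_hello
  lines << "quiet mode" unless should_print_hello # `unless` reads as "if not"
  lines
end

# Loops can be modifiers too: keep doubling while the value is small.
value = 1
value *= 2 while value < 100

puts greetings(true)
puts value # prints 128
```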

All that said, whenever I have a basic task to do, my first thought is “Hey, I could write a 10-line Perl script to do this!”. The Perl community isn’t lying when it says Perl makes the “easy things easy” – it really does. This is something I’ve heard almost unanimously from my fellow interns; most of us seem to harbour some level of disdain for Perl, but still want to use it a lot, because it’s just that damn easy. It’s like an infection that grows on you, presumably eventually turning you into a fully fledged Perl monk before you go to live in a monastery and dedicate your life to answering questions on http://perlmonks.org/.

Ruby is a language I’ve wanted to learn ever since I got pretty good at Lua and decided to move to greener pastures. It was for a completely superficial reason: I thought its website was really well designed. Looking at http://lua.org/ and comparing it to http://ruby-lang.org/, you can probably see why I thought this.

For some reason, I slowly came under the impression that I didn’t like the look of Ruby, despite never taking a decent look at it, and so I avoided it. That lasted until I decided to learn something new and chose Ruby, after seeing a colleague writing some Rails code.

Soon after looking into it, I realised something: Ruby seemed to be just as good as Perl for writing quick scripts to solve problems, and a trivially superior one for web development thanks to Rails and Sinatra. It seemed to take all the nice features, like statement modifiers, and wrap them in a very consistent object-oriented approach, where literally everything behaves like an object. “hello”.upcase? Well, “HELLO”, of course – no syntax errors to be found here!
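A few core-library one-liners show what “everything behaves like an object” means in practice. Even operators turn out to be ordinary method calls:

```ruby
shout   = "hello".upcase # => "HELLO"
sum     = 1.+(2)         # => 3, since + is just a method on Integer
indices = 3.times.to_a   # => [0, 1, 2]
empty   = nil.to_a       # => [], even nil is an object with methods

puts shout, sum, indices.inspect, empty.inspect
```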

I love the loop structures, and how elegantly enumeration is handled. I love how there’s a culture of writing DSLs (though the term is used very loosely) for all sorts of things, from testing to make-style build tools. Everything I’ve read about the language makes me itch to rewrite all of my code-bases in it, but I think I’ll just settle for using it for personal systems administration and future website development.
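Here’s an illustrative sketch with made-up data. Enumerable methods take blocks rather than relying on index-juggling loops. The sketch also shows instance_eval, which is one common way (not the only one) that Ruby libraries build small DSLs. TaskList is a toy class, not how Rake is actually implemented.

```ruby
# Hypothetical data for the enumeration examples.
scores = { "alice" => 92, "bob" => 67, "carol" => 78 }

# Enumerable methods take blocks instead of index-juggling loops.
passed  = scores.select { |_name, score| score >= 70 }.keys # => ["alice", "carol"]
average = scores.values.sum / scores.size.to_f             # => 79.0
scores.each_with_index { |(name, score), i| puts "#{i + 1}. #{name}: #{score}" }

# A toy DSL: instance_eval runs the block with the TaskList as self,
# so bare calls like `task :compile` configure the object.
class TaskList
  attr_reader :tasks

  def initialize(&block)
    @tasks = []
    instance_eval(&block)
  end

  def task(name)
    @tasks << name
  end
end

list = TaskList.new do
  task :compile
  task :test
end

p list.tasks
```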

No doubt this is just some kind of initial language infatuation, and it may pass, but right now, Ruby is my favourite language, and I’ve yet to even use it properly.

Thing of the Month 1: Ruby on Rails

I’m trying to learn a new thing (language or framework) every month. Each time, I’d like to begin by answering What, Why, Prior Experience, What, Why, and Compromises. Respectively, those are: what language and why, previous experience I have that I think is relevant, what I hope to build in the process and why, and any compromises I expect I may have to make to succeed. I’m open to varying what I plan to build through the month if I decide what I chose was too optimistic (or even not optimistic enough), or if I have a particularly busy month and don’t find enough time to learn my Thing of the Month.


Month 1: September 2016

What?

I’m going to try to learn two things: Ruby and Rails. I’m cheating a little, because it’s technically still August, but I think I can forgive myself.

Why?

I’ve seen a lot of Ruby and always wanted to give it a try, and I think that it’s important for me to vary my server-side technologies more, as I’ve not used ASP.NET for a long time, and so am mostly limited to PHP, which is something I would like to change going forward.

Prior Experience?

I’ve written MVC code on top of ASP.NET and CakePHP before; in fact, my current main web project, Gamer-Island, is written on top of CakePHP. This should make learning the Rails aspect much simpler. A strong background in scripting languages should assist in learning Ruby. Overall, I think prior experience will make it easier, but certainly not easy, to gain a degree of fluency in Ruby on Rails.

What?

I’d like to remake an old project of mine, which was a Minecraft server administration panel. Servers would get their own subdomain, and it’d monitor users, chat, and logs, which could then be accessed by the server’s staff. With this, they could then, for example, issue time bans on users and associate the bans with chat messages, meaning server owners and admins can keep track of bans to ensure they are all fair.
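To pin down what associating bans with chat messages might mean, here’s a hypothetical plain-Ruby sketch of the data involved. In the Rails version these would more likely become ActiveRecord models with an association between them. Every name below is an assumption.

```ruby
# Hypothetical sketch: a time ban records who was banned, who issued it,
# when it expires, and the chat messages offered as evidence.
ChatMessage = Struct.new(:player, :text, :sent_at)

Ban = Struct.new(:player, :issued_by, :expires_at, :evidence) do
  # The ban applies until its expiry time.
  def active?(now = Time.now)
    now < expires_at
  end

  # Staff must attach at least one chat message to justify the ban.
  def justified?
    !evidence.empty?
  end
end

message = ChatMessage.new("griefer42", "spam spam spam", Time.utc(2016, 9, 1, 12, 0))
ban = Ban.new("griefer42", "mod_alice", message.sent_at + 24 * 60 * 60, [message])

puts ban.active?(Time.utc(2016, 9, 1, 18, 0)) # prints true
puts ban.justified?                            # prints true
```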

Why?

I used to run a Minecraft server (running the Tekkit modpack). I stopped (due to a change in the EULA not allowing donations in exchange for in-game items on servers, which previously made the server self-sustaining), but I still think there is potential in this idea. I found, during my time running it, that it was difficult to find ‘staff’ who could be trusted to be cool-headed and fair in all situations. Initially, the panel was to ensure my own server’s staff had to provide evidence with their actions, but soon I realised other servers would likely be suffering the same issues. Additionally, Minecraft server plugins all tend to log in their own funky ways. If you don’t capture their messages at run-time, and parse them into a standard form, then the information is dumped in a log file full of a jumble of all different logging formats.
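The standard-form idea could look something like the sketch below. Both log formats are invented for illustration, and real plugins will differ. LogEntry is also a hypothetical shape. Each pattern uses the same named captures, so a single code path builds the normalised entry.

```ruby
# Hypothetical normalised shape for any plugin's log line.
LogEntry = Struct.new(:source, :level, :message)

# Two invented plugin formats; both use the same named captures.
PATTERNS = [
  /\A\[(?<source>[^\]]+)\] (?<level>[A-Z]+): (?<message>.*)\z/, # "[PluginA] WARN: text"
  /\A(?<level>[A-Z]+) <(?<source>[^>]+)> (?<message>.*)\z/       # "INFO <PluginB> text"
].freeze

def normalise(line)
  text = line.chomp
  PATTERNS.each do |pattern|
    match = pattern.match(text)
    next unless match

    return LogEntry.new(match[:source], match[:level].downcase.to_sym, match[:message])
  end
  # Unrecognised format: keep the raw line rather than dropping it.
  LogEntry.new("unknown", :info, text)
end

p normalise("[PluginA] WARN: player kicked")
p normalise("INFO <PluginB> server started")
```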

Additionally, I think this provides an ample challenge, as it will require well configured routing, lots of AJAX while maintaining a secure front against CSRF attacks, and configurable levels of access.

Compromises?

I suspect I will have to compromise on the core of the application: I do not believe I will have time to write a Java plugin to hook into servers and securely communicate with the admin panel, uploading user chat, logged information, and others. Instead, I will focus on writing the Ruby end, which would be the web front to the data, and the API for uploading data.

Then, in a future Thing of the Month, I have the option of writing a Minecraft server plugin in Java to upload this data, and create a fully-functioning product.