In Search of a Brain
I am not an expert in machine learning by any means. I do not understand the mathematics behind a neural network, much less how to implement one. My preferred way of teaching computers what to do is writing code, not training machine learning models. Yet, strangely, today I find myself in need of a machine learning model.
Last week I dreamed up an idea for a video game. Visually, it’s similar to slither.io, but it’s not a massively multiplayer game: instead of every snake being controlled by a human player, only one is. The rest are computer-controlled. Accordingly, the objective of the game changes. Instead of trying to survive the longest without running into another snake and dying, the human player’s goal is to train (or trick) the AI snakes into doing certain tasks, which get progressively more advanced and difficult with each level. I called the game “Parasite” because instead of the environment determining what the AIs learn, the player observes the snakes and makes that decision, essentially controlling the snakes’ minds like a mind-controlling parasite. The player would also be able to muck around with each snake’s proprioception and perception coefficients, changing how the snake responds to its environment.
I quickly sketched out a minimalist user interface using my preferred web page framework (vanilla HTML and CSS, yum) and then got to work on the snakes that would live in the game’s environment. I based the snakes on an old Processing demo I made in 8th grade. The snakes in that demo were extremely stupid: unlike a proper Snake game or something like slither.io, they could cross back over themselves and each other with no ill effects, and the tail followed the exact same path as the head. (It was implemented with a ring buffer of previous head positions that were used to draw short, heavy line segments that made up the snake’s body.) This was not something I wanted for this game, so I started thinking about how to implement something more advanced.
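For contrast with what came next, here is roughly what that ring-buffer approach looks like, as a minimal JavaScript sketch rather than the original Processing code (the class and method names are hypothetical). Every body point is just a past head position, which is exactly why the tail retraces the head’s path.

```javascript
// A minimal "ring-buffer snake": the body is just the last N head positions,
// so the tail replays the head's exact path. RingSnake, push, and body are
// illustrative names, not taken from the original Processing demo.
class RingSnake {
  constructor(length) {
    this.buffer = new Array(length); // fixed-size storage for past head positions
    this.start = 0;                  // index of the oldest stored position
    this.count = 0;                  // number of slots filled so far
  }

  // Record a new head position, overwriting the oldest one once the buffer is full.
  push(x, y) {
    const n = this.buffer.length;
    this.buffer[(this.start + this.count) % n] = { x, y };
    if (this.count < n) {
      this.count++;
    } else {
      this.start = (this.start + 1) % n; // the oldest entry was just overwritten
    }
  }

  // Body points from tail (oldest) to head (newest). Drawing a short, thick
  // line between each consecutive pair gives the snake's body.
  body() {
    const points = [];
    for (let i = 0; i < this.count; i++) {
      points.push(this.buffer[(this.start + i) % this.buffer.length]);
    }
    return points;
  }
}

// Usage: a 3-point snake moving along the x-axis keeps only its last 3 positions.
const snake = new RingSnake(3);
for (let t = 0; t < 5; t++) snake.push(t, 0);
console.log(snake.body().map((p) => p.x)); // [ 2, 3, 4 ]
```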
After a lot of head-scratching and reviewing my old trigonometry and geometry notes, I got bored, went on YouTube, and eventually landed on the Brick Experiment Channel’s 100-wheel LEGO car. It sure ended up looking like a snake. At 3:34 I realized I had struck upon something good: the tail of the car-snake pulls inward around bends, exactly the behavior I wanted for the snakes in my game. The solution was dead simple: since I’m trying to mimic a physical phenomenon, I can just use a physics engine! So I added Matter.js to my project and made each snake out of a chain of circles. That worked extremely well, and after rigging up a bit of dummy code to make the snakes move, they looked very lifelike.
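Why does a chain cut corners when a ring buffer doesn’t? The sketch below is not Matter.js (which simulates real bodies, collisions, and constraints); it’s a hypothetical, much simpler “follow-the-leader” chain in plain JavaScript, used only to illustrate the effect. Each segment is pulled straight toward the segment ahead of it until it sits a fixed distance away, instead of replaying that segment’s old path, so the body takes the inside line around a bend.

```javascript
// A minimal "follow-the-leader" chain: a lightweight stand-in for the
// Matter.js chain of circles, used only to show why the tail cuts corners.
// Nothing here is Matter.js API; makeChain and moveHead are hypothetical.
function makeChain(count, spacing) {
  const segments = [];
  for (let i = 0; i < count; i++) {
    segments.push({ x: -i * spacing, y: 0 }); // start laid out along the x-axis
  }
  return segments;
}

// Move the head to (x, y), then drag every following segment behind it.
function moveHead(segments, x, y, spacing) {
  segments[0].x = x;
  segments[0].y = y;
  for (let i = 1; i < segments.length; i++) {
    const lead = segments[i - 1];
    const seg = segments[i];
    const dx = seg.x - lead.x;
    const dy = seg.y - lead.y;
    const dist = Math.hypot(dx, dy) || 1e-9; // avoid division by zero
    // Place this segment exactly `spacing` from its leader, along the
    // current line between them (not along the leader's old path).
    seg.x = lead.x + (dx / dist) * spacing;
    seg.y = lead.y + (dy / dist) * spacing;
  }
}
```

Driving the head around a circle with `moveHead` and then measuring each segment’s distance from the circle’s center should show the trailing segments settling on progressively smaller circles, the same inward pull as the LEGO car’s tail.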
Now I moved on to part 2 of the snakes’ code: the machine-learning black box that is the AI brain of each snake. After a bit of googling around I learned that the approach I was going for is known as reinforcement learning (deep reinforcement learning when the model is a deep neural network). The model, usually called the “agent”, is given information about the state of the system it is trying to learn and produces some action to be performed. Something outside the agent observes the result of the action and produces a reward value; in Parasite, that’s the human player/trainer. The agent then learns to produce actions that lead to higher reward values. (The terms “actor” and “critic” come from actor-critic methods, where the critic is a second learned model that estimates how good the actor’s actions are, rather than the source of the reward itself.)
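Stripped of the “deep” part, the act/reward/learn loop looks something like this. It’s a hypothetical sketch, not REINFORCEjs or any other library: a tiny epsilon-greedy learner that keeps a running average reward per discrete action, with the human trainer stood in for by a `trainer` callback that returns a reward. A real deep RL agent would also condition its choice on the state vector through a neural network; this sketch ignores state entirely, so it is really a multi-armed bandit.

```javascript
// A hypothetical, minimal action/reward loop. The "agent" is an
// epsilon-greedy multi-armed bandit: it ignores the state entirely and just
// tracks the average reward each discrete action has earned.
class BanditAgent {
  constructor(numActions, epsilon, random = Math.random) {
    this.values = new Array(numActions).fill(0); // running average reward per action
    this.counts = new Array(numActions).fill(0); // how often each action was tried
    this.epsilon = epsilon;                       // probability of exploring
    this.random = random;                         // injectable RNG for reproducibility
  }

  act() {
    if (this.random() < this.epsilon) {
      return Math.floor(this.random() * this.values.length); // explore
    }
    // Exploit: pick the action with the highest average reward so far.
    let best = 0;
    for (let a = 1; a < this.values.length; a++) {
      if (this.values[a] > this.values[best]) best = a;
    }
    return best;
  }

  learn(action, reward) {
    this.counts[action]++;
    // Incremental mean: v += (r - v) / n
    this.values[action] += (reward - this.values[action]) / this.counts[action];
  }
}

// The loop: the agent acts, the trainer (the human player in Parasite)
// judges the outcome, and the agent updates its estimates.
function train(agent, trainer, steps) {
  for (let t = 0; t < steps; t++) {
    const action = agent.act();
    agent.learn(action, trainer(action));
  }
}
```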
Because each snake is going to have its own AI model (they’re not going to have a shared “hive mind” brain), I started looking for a small, fast, lightweight JavaScript reinforcement-learning library. I initially found Andrej Karpathy’s REINFORCEjs, but none of its models fit the snakes in Parasite very well.
The snakes’ input is a large, high-dimensional vector containing information about the snake itself (energy level, length, etc.) as well as the environment around it (positions of food particles close to the snake, distance to the wall, etc.). The snakes’ outputs, which control what the snake does, are a bit different: the output (action) space is partly continuous and partly discrete. For example, two of the continuous outputs are the angle and length of the snake’s tongue. For the snake to eat, the tongue has to physically overlap a food particle, and the AI also has to switch a separate discrete (boolean) output from “full” to “hungry”. Only if the snake is hungry and its tongue is touching a food particle will the snake gain energy.
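The eating rule can be written down as a small function. This is a hypothetical sketch: the field names (`headX`, `headY`, `heading`, `tongueAngle`, `tongueLength`, `hungry`, `energy`), the choice to measure the tongue angle relative to the snake’s heading, and treating the tongue as a line segment and food as a circle are all assumptions for illustration, not the game’s actual code.

```javascript
// Hypothetical eating rule: the snake gains energy only when its discrete
// "hungry" output is on AND its tongue (a line segment from the head, set by
// the continuous angle and length outputs) overlaps a circular food particle.
function tongueTouchesFood(snake, food) {
  // Tongue runs from the head in direction (heading + tongueAngle).
  const angle = snake.heading + snake.tongueAngle;
  const segX = Math.cos(angle) * snake.tongueLength;
  const segY = Math.sin(angle) * snake.tongueLength;

  // Project the food's center onto the tongue segment, clamped to its ends.
  const lenSq = segX * segX + segY * segY;
  let t = lenSq === 0 ? 0 : ((food.x - snake.headX) * segX + (food.y - snake.headY) * segY) / lenSq;
  t = Math.max(0, Math.min(1, t));
  const closestX = snake.headX + t * segX;
  const closestY = snake.headY + t * segY;

  // Overlap if the closest point on the tongue lies inside the food circle.
  return Math.hypot(food.x - closestX, food.y - closestY) <= food.radius;
}

function tryToEat(snake, food) {
  if (snake.hungry && tongueTouchesFood(snake, food)) {
    snake.energy += food.energy;
    return true;
  }
  return false;
}
```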
The REINFORCEjs model that best fit this situation was the DQNAgent (DQN stands for Deep Q-Network), and you might be convinced that it does indeed work with continuous outputs by looking at the Waterworld demo that uses it. In that demo the agent is a small ball in a box filled with other small balls that all bounce around randomly. The agent is fed information about the position and velocity of balls close to it as well as its own velocity, and is rewarded positively for contacting red balls and negatively for contacting green ones. It then learns to move itself around the box, eating all the red balls while avoiding the green ones.
It turns out the Waterworld agent still had discrete outputs: the four actions were to apply force to the agent up, down, left, or right. The reason it worked is the same reason that electronic pulse-width modulation, which only ever takes two values (on or off), can dim a light to any brightness between off and fully on. By rapidly varying the proportion of “up” outputs to “down” outputs, and of “left” to “right”, the Waterworld agent can produce what amounts to a continuously variable force in any direction, because it has mass and its velocity can’t change instantly.
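Here is a toy version of that effect. It’s a hypothetical one-dimensional sketch, not the Waterworld code: a body with mass and linear drag receives either a +1 (“right”) or −1 (“left”) push every tick, chosen by a simple duty-cycle counter, and the constants are arbitrary. With these constants a constant force F would settle at a velocity of F / drag, so a duty cycle of 0.75 (average force 0.5) should average out to a velocity of about 1, even though the force is never 0.5 at any instant.

```javascript
// Toy 1-D model of the Waterworld trick: the agent can only push with a
// force of +1 or -1 each tick, but because the body has mass and drag, the
// fraction of +1 pushes (the duty cycle) sets an effectively continuous
// average velocity. All constants here are arbitrary illustrative choices.
function simulateDutyCycle(duty, steps, { dt = 0.1, mass = 1, drag = 0.5 } = {}) {
  let velocity = 0;
  let accumulator = 0; // spreads the +1 pushes evenly, like a PWM counter
  let sum = 0;
  let samples = 0;
  for (let i = 0; i < steps; i++) {
    accumulator += duty;
    let force = -1; // default action: push "left"
    if (accumulator >= 1) {
      accumulator -= 1;
      force = 1; // push "right"
    }
    // Newton's second law with linear drag, integrated with an Euler step.
    velocity += ((force - drag * velocity) / mass) * dt;
    // Average only the second half so the start-up transient is ignored.
    if (i >= steps / 2) {
      sum += velocity;
      samples++;
    }
  }
  return sum / samples;
}
```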
Unfortunately, in the case of the snakes’ tongues, a modulation scheme just won’t work. If the outputs set the tongue’s angle directly, there is no mass to smooth anything out: even if the proportion of $-\frac{\pi}{2}$ to $\frac{\pi}{2}$ outputs is correct, when the food particle is somewhere in the middle, the snake’s tongue will simply dance all around it and never actually touch the food!
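A quick way to see the problem: this hypothetical sketch assumes a tongue whose angle can only be set to $-\frac{\pi}{2}$ or $\frac{\pi}{2}$ and food straight ahead at angle 0, then alternates the two settings perfectly. The average angle comes out exactly on target, but the number of ticks on which the tongue actually points at the food is zero.

```javascript
// Absolute outputs: each action *sets* the tongue angle to one of two
// extremes. A perfect 50/50 alternation averages to straight ahead, but the
// tongue never actually points there. The ±π/2 settings, the food direction,
// and the tolerance are illustrative assumptions.
function absoluteTongue(steps, foodAngle, tolerance) {
  const settings = [-Math.PI / 2, Math.PI / 2];
  let sum = 0;
  let touches = 0; // ticks on which the tongue points at the food
  for (let i = 0; i < steps; i++) {
    const angle = settings[i % 2]; // alternate left/right every tick
    sum += angle;
    if (Math.abs(angle - foodAngle) <= tolerance) touches++;
  }
  return { averageAngle: sum / steps, touches };
}

console.log(absoluteTongue(1000, 0, 0.1)); // { averageAngle: 0, touches: 0 }
```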
My best kludge for this is to take all of the outputs that have to be continuous and make them integrating processes. In the case of the snakes’ tongues, that would mean an output action of “tongue right” would not instantaneously move the tongue to an angle of $\frac{\pi}{2}$; instead, it would turn the tongue by a small angle relative to where it already is. I’m not sure how well this will work, but considering that control schemes like this are already common (e.g. a sous-vide cooker that can only turn its heating element on or off but can hold any set temperature), it’s worth a shot.
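Here is that integrating scheme in miniature. It’s a hypothetical sketch: the step size, the clamp to $\pm\frac{\pi}{2}$, the convention that “right” means a positive angle, and the simple “turn toward the food” policy standing in for a trained agent are all assumptions. Because the discrete actions only nudge the angle, the tongue walks over to the food and then dithers within one step of it, much like the sous-vide heater cycling around its set point.

```javascript
// Integrating outputs: the discrete actions "tongue left" / "tongue right"
// nudge the current angle by a small step instead of setting it outright.
const TONGUE_STEP = 0.05; // radians per action (arbitrary choice)
const TONGUE_LIMIT = Math.PI / 2;

function applyTongueAction(angle, action) {
  const delta = action === "right" ? TONGUE_STEP : -TONGUE_STEP;
  // Integrate the action into the angle, clamped to the tongue's range.
  return Math.max(-TONGUE_LIMIT, Math.min(TONGUE_LIMIT, angle + delta));
}

// Stand-in for a trained policy: always turn toward the food.
function greedyPolicy(angle, foodAngle) {
  return angle < foodAngle ? "right" : "left";
}

function runIntegratingTongue(steps, foodAngle) {
  let angle = -TONGUE_LIMIT; // start pointed hard left
  for (let i = 0; i < steps; i++) {
    angle = applyTongueAction(angle, greedyPolicy(angle, foodAngle));
  }
  return angle;
}
```

The design trade-off is the step size: a smaller `TONGUE_STEP` keeps the dithering tighter around the food, at the cost of the tongue taking more ticks to swing across.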
The next hard part is designing all the levels…