Going Multi-Node with Elixir

Posted on 17 May 2018 by Eric Oestrich

As of the latest update, ExVenture can be configured to run across multiple nodes. In this post I’ll show you the basics of how I did that.

What is ExVenture?

ExVenture is a server for a multiplayer text-based game, also known as a Multi-User Dungeon (MUD).

You can see more information about ExVenture at exventure.org and GitHub. You can see it running on MidMUD. There is also a public Trello board.

Starting point

ExVenture was heavily geared towards running as a single node to start out. I relied heavily on Registry, which only works within a single node. I also had a few local cache processes that wouldn’t work well out of the gate when spanning multiple nodes.

My first thoughts were about splitting the app into an umbrella app and figuring out how to boot the web on one node, the telnet connection on another, etc. I started talking about this in the MUD Coders Guild, and I got a question of “why?”

This was a great question and got me thinking about what else I could do.

I eventually settled on trying to get the same application booting on all nodes and a leader node starting the processes that can only exist once in the cluster, e.g. the world.

Clustering

First up was connecting the nodes automatically. Because my app was still built to run as a single node, clustering at this stage was harmless: the nodes wouldn’t talk to each other yet, and the entire world would be spun up on each node.

This was extremely simple with libcluster. All it took was installing the hex package and adding this to my configuration files:

config :libcluster,
  topologies: [
    local: [
      strategy: Cluster.Strategy.Epmd,
      config: [hosts: [:"world1@host", :"world2@host"]]
    ]
  ]

Then I switched to booting my app with iex so that the --sname flag was available.

iex --sname world1 -S mix
iex --sname world2 -S mix
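
Once both nodes are up, it is worth confirming that they actually found each other. Here is a minimal sketch, assuming the placeholder host names from the config above. `ClusterCheck` is a hypothetical helper, not part of ExVenture or libcluster; it only relies on `Node.self/0` and `Node.list/0` from the standard library.

```elixir
defmodule ClusterCheck do
  @moduledoc "Hypothetical helper: reports which expected nodes are not connected."

  # `connected` defaults to this node plus every node it is currently connected to.
  # It can be passed explicitly, which makes the function easy to try out in isolation.
  def missing(expected, connected \\ [Node.self() | Node.list()]) do
    Enum.reject(expected, fn node -> node in connected end)
  end
end
```

Calling `ClusterCheck.missing([:"world1@host", :"world2@host"])` in either iex session should return an empty list once the two nodes are connected.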

Picking a Leader

Once the world was clustered, I started on picking a leader. I had heard about Raft before, but never really looked into it.

If you are interested in clustering at all, I’d highly recommend giving the paper a read. It is very easy to follow, which was the main point of creating Raft: a consensus algorithm that is simple to understand.

For ExVenture, I went with implementing my own Raft module because I only wanted the leader election part of Raft. I don’t need (at least yet!) the rest of Raft.

You can see that in the Raft module. There is a lot in this module that doesn’t need to be covered here, but it boils down to this: the group picks a leader, and that leader calls the subscribers that care about who the leader is.
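
To give a feel for the voting rule at the heart of Raft's leader election, here is a simplified sketch. It is not the ExVenture Raft module; `ElectionSketch` and its functions are hypothetical names, and it leaves out timeouts, heartbeats and message passing. Because only leader election is modelled, the log comparison from the full Raft vote check is also left out.

```elixir
defmodule ElectionSketch do
  @moduledoc "Hypothetical, simplified view of Raft leader election bookkeeping."

  # State kept by each node: the latest term it has seen and who it voted for in that term.
  def new, do: %{term: 0, voted_for: nil}

  # Handle a vote request from `candidate` for `term`.
  # Returns {granted?, new_state}.
  def handle_vote_request(state, candidate, term) do
    cond do
      # Stale request from an older term: reject.
      term < state.term ->
        {false, state}

      # Newer term: forget any earlier vote and vote for this candidate.
      term > state.term ->
        {true, %{state | term: term, voted_for: candidate}}

      # Same term: grant only if we have not voted yet, or already voted for this candidate.
      state.voted_for in [nil, candidate] ->
        {true, %{state | voted_for: candidate}}

      # Same term, but the vote already went to someone else.
      true ->
        {false, state}
    end
  end

  # A candidate wins once it holds votes from a strict majority of the cluster.
  def won?(vote_count, cluster_size), do: vote_count > div(cluster_size, 2)
end
```

Since each node grants at most one vote per term and a win needs a strict majority, two candidates cannot both win the same term.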

A leader is picked

Once a leader is picked, the Game.World.Master process uses pg2 to find all of the other Game.World.Master processes and sees which zones are alive in the cluster.

After finding out what nodes are alive, it spins up the zones that are not online across the cluster, using a simple rebalancing algorithm.
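
The actual algorithm lives in the ExVenture source and is not reproduced here. As an illustration of what a simple rebalancing step could look like, the sketch below hands each offline zone to whichever node currently runs the fewest zones. `RebalanceSketch`, the shape of the `running` map and the zone ids are all assumptions made for this example.

```elixir
defmodule RebalanceSketch do
  @moduledoc "Hypothetical sketch: hand each offline zone to the least-loaded node."

  # `running` maps node => list of zone ids already alive on that node.
  # `all_zones` is every zone id that should be alive somewhere in the cluster.
  # Returns an updated node => zone ids map with the offline zones assigned.
  # Raises Enum.EmptyError if zones are offline and `running` has no nodes at all.
  def assign(running, all_zones) do
    online = running |> Map.values() |> List.flatten()
    offline = all_zones -- online

    Enum.reduce(offline, running, fn zone, acc ->
      # Pick the node currently holding the fewest zones.
      {node, _zones} = Enum.min_by(acc, fn {_node, zones} -> length(zones) end)
      Map.update!(acc, node, fn zones -> [zone | zones] end)
    end)
  end
end
```

In a real cluster the leader would then ask each node to start the zones assigned to it. That part is left out of the sketch.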

Global process registry

I also switched to using the global process registry as part of this. I started looking at swarm but it seemed to be something different than what I was looking for.

I will most likely end up changing this in the future, but switching {:via, Registry, {Game.NPC, id}} to {:global, {Game.NPC, id}} was an extremely simple change that worked, so I went with it and haven’t looked back.
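
To make the change concrete, here is a small self-contained sketch of a GenServer registered under a :global name. `NPCSketch` is a hypothetical stand-in for Game.NPC, and the `whoami` call exists only for this example.

```elixir
defmodule NPCSketch do
  @moduledoc "Hypothetical stand-in for an NPC process registered under a global name."
  use GenServer

  # The same name tuple is used to register the process and to look it up later.
  def name(id), do: {:global, {__MODULE__, id}}

  def start_link(id), do: GenServer.start_link(__MODULE__, id, name: name(id))

  # Works from any connected node, because :global names are cluster-wide.
  def whoami(id), do: GenServer.call(name(id), :whoami)

  @impl true
  def init(id), do: {:ok, id}

  @impl true
  def handle_call(:whoami, _from, id), do: {:reply, {:npc, id, node()}, id}
end
```

Starting a second process under the same id returns `{:error, {:already_started, pid}}`, which is what makes a name unique across the cluster.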

Messages spanning the cluster

The game was now officially multi-node, and you could play on either iex server and see players and NPCs on the other one.

There was only one final step and that was setting up pg2 groups for each of my caches and my communication layer.

This is very simple: join the group in the init function, then make a slight change to how you send messages to your GenServers so that every member of the group receives them.

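@key :items

def init(_) do
  # Create the group if it doesn't exist yet, then add this process to it
  :ok = :pg2.create(@key)
  :ok = :pg2.join(@key, self())

  #...
end

# Send the item to every cache process in the cluster, not only the local one
def insert(item) do
  members = :pg2.get_members(@key)

  Enum.map(members, fn member ->
    GenServer.call(member, {:insert, item})
  end)
end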
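
The snippet above leaves out the receiving side. Note also that pg2 was removed in OTP 24 in favor of the pg module. The sketch below shows the same pattern as a complete GenServer using :pg. `ItemCacheSketch`, the map-based state and the `get/2` function are assumptions for illustration and not the ExVenture cache.

```elixir
defmodule ItemCacheSketch do
  @moduledoc "Hypothetical cache that replicates inserts to every group member via :pg."
  use GenServer

  @key :items

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Send the item to every cache process that joined the group, on any node.
  def insert(item) do
    @key
    |> :pg.get_members()
    |> Enum.each(fn member -> GenServer.call(member, {:insert, item}) end)
  end

  # Read from one specific cache process.
  def get(server, id), do: GenServer.call(server, {:get, id})

  @impl true
  def init(:ok) do
    # Unlike pg2, :pg groups do not need to be created before joining.
    :ok = :pg.join(@key, self())
    {:ok, %{}}
  end

  @impl true
  def handle_call({:insert, item}, _from, items) do
    {:reply, :ok, Map.put(items, item.id, item)}
  end

  def handle_call({:get, id}, _from, items) do
    {:reply, Map.fetch(items, id), items}
  end
end
```

The default :pg scope has to be running before any cache starts, for example by calling `:pg.start_link()` or adding it to the supervision tree. After that, an insert through `ItemCacheSketch.insert/1` is visible in every cache process that has joined the group.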

Next Steps

With this up and running, I was able to get MidMUD into a multi-node setup (3 world servers) for production. It has been working out pretty well so far, and I have found only one large bug: channels could not communicate across nodes.

I will post more updates as I continue enhancing the distributed nature of ExVenture.

You can see everything described here in these two pull requests, #37 and #39.

This site's content is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License unless otherwise specified. Code on this site is licensed under the MIT License unless otherwise specified.