I was watching The Hitchhiker’s Guide to the Unexpected (YouTube link) by Fred Hebert, which includes a neat exercise: write out your supervision tree on a whiteboard and walk through how each part would fail. This helps you determine what happens to your application as things go wrong.
I decided this would be a good exercise to do on ExVenture. This is a fairly long post that goes through ExVenture’s full supervision tree.
You can see ExVenture in action on MidMUD.
Supervision Tree
This is the supervision tree that ExVenture ships with now. There are roughly 3 levels in the photo.
First Level
This is the top level directly underneath the application. It contains, in startup order:
- Data.Repo - the Ecto repo
- Web.Supervisor - the Phoenix supervisor
- Game.Registries - a collection of Registry processes
- Game.Supervisor - the top-level supervisor of the game
- A ranch listener is also started at this level, but it spins off into the ranch application
At this level the supervision strategy is rest_for_one. This is fine: if the Repo dies, something went wrong and the rest of the app should be rebooted. As we’ll see later on, the game loads processes with an ID and fetches their state from the database, which ensures a clean state is loaded when a process restarts after a crash.
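To make the whiteboard reasoning concrete, here is a toy Python model of which first-level children come back when one of them exits under rest_for_one. It is not ExVenture code (Python is used only so the sketch runs on its own); it assumes the startup order listed above and leaves out the ranch listener, which lives in the ranch application.

```python
# Toy model of the :rest_for_one strategy, using ExVenture's first-level
# children in startup order. Names are labels only; nothing is started.
FIRST_LEVEL = ["Data.Repo", "Web.Supervisor", "Game.Registries", "Game.Supervisor"]


def rest_for_one(children: list, crashed: str) -> list:
    """Return the children restarted when `crashed` exits abnormally.

    Under rest_for_one the supervisor terminates every child started after
    the crashed one, then restarts the crashed child and those later children.
    """
    index = children.index(crashed)
    return children[index:]


print(rest_for_one(FIRST_LEVEL, "Data.Repo"))        # the whole first level
print(rest_for_one(FIRST_LEVEL, "Game.Registries"))  # registries and the game
```

A Repo crash restarts the whole first level. The same rule means that if Web.Supervisor itself exits, Game.Registries and Game.Supervisor are restarted too, simply because they start after it.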
Second Level - Web.Supervisor
This supervisor mostly sits on top of the Phoenix Endpoint, along with a few process monitors for the TelnetChannel and a Cachex cache. It uses a one_for_one strategy. This is fine because none of these children really depend on each other; this supervision level mostly breaks the app into sections for my benefit.
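A small comparison of the three supervision strategies on Web.Supervisor’s children may help show why one_for_one fits here. This is a toy Python model rather than OTP itself, and the child labels and their order are assumptions for illustration, not ExVenture’s actual module names.

```python
# Toy comparison of the three OTP supervision strategies. The labels stand in
# for Web.Supervisor's children; their order is assumed for illustration.
WEB_CHILDREN = ["Phoenix Endpoint", "TelnetChannel monitor", "Cachex cache"]


def restart_set(strategy: str, children: list, crashed: str) -> list:
    """Return which children a supervisor restarts when `crashed` dies."""
    if strategy == "one_for_one":
        return [crashed]                          # only the failed child
    if strategy == "one_for_all":
        return list(children)                     # every sibling
    if strategy == "rest_for_one":
        return children[children.index(crashed):]  # failed child and later ones
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("one_for_one", "rest_for_one", "one_for_all"):
    print(strategy, restart_set(strategy, WEB_CHILDREN, "Cachex cache"))
```

Only one_for_one (and, with this assumed order, rest_for_one) keeps a Cachex crash from also bouncing the Phoenix Endpoint; one_for_one does so regardless of order.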
Second Level - Game.Supervisor
This supervisor contains the “world” along with supporting processes. In startup order:
- Game.Config - an agent that caches game configuration
- Game.Caches - a supervisor of Cachex caches along with GenServer processes that are related to caching
- Game.Server - a tiny process that used to do more, but now keeps player telemetry up to date
- Game.Session.Supervisor - the supervisor for player sessions
- Game.Channel - a GenServer that tracks player sessions and which channels they have joined, inspired by Phoenix Channels
- Game.World - the supervisor of the game world, see more below
- Game.Insight - a small GenServer that tracks bad command parsing
- Game.Help.Agent - an agent that loads internal game help
This level uses one_for_one as its strategy. Most sub-trees here are fairly separate and, to my knowledge, can reboot without interfering with other sub-trees.
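one_for_one keeps a failure local, but only up to a point. An Elixir supervisor also enforces a restart intensity, `:max_restarts` within `:max_seconds` (3 and 5 by default); once children restart more often than that, the supervisor terminates its children and exits, so the failure escalates to its parent. The Python sketch below is a simplified model of that rule (exact window boundaries may differ from OTP), using a hypothetical crash-looping Game.Insight as the example.

```python
from collections import deque


class RestartIntensity:
    """Simplified model of a supervisor's restart intensity limit.

    Defaults mirror Elixir's Supervisor defaults (max_restarts: 3,
    max_seconds: 5). When more than max_restarts restarts fall inside the
    window, the supervisor gives up and exits, escalating to its parent.
    """

    def __init__(self, max_restarts: int = 3, max_seconds: float = 5.0) -> None:
        self.max_restarts = max_restarts
        self.max_seconds = max_seconds
        self.restarts = deque()  # timestamps of recent restarts

    def on_child_crash(self, now: float) -> str:
        """Record a crash at time `now`; return "restart" or "escalate"."""
        self.restarts.append(now)
        # Forget restarts that have slid out of the time window.
        while self.restarts and now - self.restarts[0] > self.max_seconds:
            self.restarts.popleft()
        if len(self.restarts) > self.max_restarts:
            return "escalate"
        return "restart"


# Hypothetical: Game.Insight crashes once per second.
insight = RestartIntensity()
print([insight.on_child_crash(t) for t in (0, 1, 2, 3)])
# ['restart', 'restart', 'restart', 'escalate']
```

Given the first-level order above, an escalation out of Game.Supervisor would only restart Game.Supervisor itself at the top level, since nothing starts after it under rest_for_one.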
Third Level - Game.World
This is the heart of the app. It contains everything the user interacts with in the game. Its direct children are Zone.Supervisor supervisors. This level has a strategy of one_for_one. This is fine because each zone is self-contained and can reboot on its own.
Zone.Supervisor
This level contains, in startup order:
- Game.Zone - the zone’s state, which tracks which rooms/NPCs/shops are online
- Game.Room.Supervisor - a supervisor of rooms that belong to the zone
- Game.NPC.Supervisor - a supervisor of NPCs that belong to the zone
- Game.Shop.Supervisor - a supervisor of shops that belong to the zone
The restart strategy here is one_for_all. If any of these processes dies, something bad happened and the whole zone should restart. To go further: the Zone process tracks the processes inside its sibling supervisors, so if it dies the rest should go as well. And if one of the sibling supervisors at this level dies, something really bad happened beneath it, so the rest should be restarted too.
The sibling supervisors are started with the zone ID, which they use to figure out which children to load at boot. These supervisors start their processes as transient because a process may be terminated normally and should not be restarted; for example, if someone deletes the spawner for an NPC, that NPC’s process is terminated cleanly.
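The transient rule can be sketched as a small decision function. It follows Elixir’s documented semantics: `:permanent` children are always restarted, `:temporary` ones never, and `:transient` ones only when they exit with a reason other than `:normal`, `:shutdown`, or `{:shutdown, term}`. Atoms are modeled as Python strings and Elixir tuples as Python tuples; the exit reasons in the example are illustrative, since the post does not say exactly how ExVenture stops an NPC.

```python
def is_normal_exit(reason: object) -> bool:
    """True for :normal, :shutdown and {:shutdown, term}, modeled as Python values."""
    if isinstance(reason, tuple):
        return len(reason) == 2 and reason[0] == "shutdown"
    return reason in ("normal", "shutdown")


def should_restart(restart: str, reason: object) -> bool:
    """Whether a supervisor restarts a child with this restart value after `reason`."""
    if restart == "permanent":
        return True
    if restart == "temporary":
        return False
    if restart == "transient":
        return not is_normal_exit(reason)
    raise ValueError(f"unknown restart value: {restart}")


# An NPC whose spawner was deleted stops cleanly and stays down...
print(should_restart("transient", "normal"))          # False
# ...while an NPC that crashes is brought back.
print(should_restart("transient", ("badmatch", 42)))  # True
```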
The sibling supervisors also use a one_for_one strategy. This is fine because each process under them is fairly self-contained. They are separated mostly for programmer benefit; this could probably be one big bag of processes directly under the Zone.Supervisor.
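The whole zone can be modeled as a small tree to check which processes come back for a given crash, the same way the whiteboard exercise does. This Python sketch is a toy model: it assumes each supervisor stays within its restart intensity, treats everything beneath a restarted supervisor as restarted too, and uses hypothetical room, NPC, and shop names. The world holds two zones to show that Game.World’s one_for_one keeps them apart.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A process in the tree; supervisors carry a strategy and children."""

    name: str
    strategy: Optional[str] = None  # None marks a worker process
    children: List["Node"] = field(default_factory=list)


def subtree(node: Node) -> List[str]:
    """A node plus every process beneath it; they all stop together."""
    names = [node.name]
    for child in node.children:
        names.extend(subtree(child))
    return names


def restarted_by(parent: Node, crashed: str) -> List[str]:
    """Processes restarted when `crashed`, a direct child of `parent`, dies."""
    index = [child.name for child in parent.children].index(crashed)
    if parent.strategy == "one_for_all":
        affected = parent.children
    elif parent.strategy == "rest_for_one":
        affected = parent.children[index:]
    else:  # one_for_one
        affected = [parent.children[index]]
    return [name for child in affected for name in subtree(child)]


def zone(zone_id: int) -> Node:
    """A hypothetical zone with two rooms, one NPC and one shop."""
    z = str(zone_id)
    return Node(f"Zone.Supervisor {z}", "one_for_all", [
        Node(f"Game.Zone {z}"),
        Node(f"Game.Room.Supervisor {z}", "one_for_one",
             [Node(f"room {z}a"), Node(f"room {z}b")]),
        Node(f"Game.NPC.Supervisor {z}", "one_for_one", [Node(f"npc {z}")]),
        Node(f"Game.Shop.Supervisor {z}", "one_for_one", [Node(f"shop {z}")]),
    ])


world = Node("Game.World", "one_for_one", [zone(1), zone(2)])
zone_1 = world.children[0]
rooms_1 = zone_1.children[1]

print(restarted_by(rooms_1, "room 1a"))          # only the room
print(restarted_by(zone_1, "Game.Zone 1"))       # every process in zone 1
print(restarted_by(world, "Zone.Supervisor 1"))  # zone 1's subtree, not zone 2

# The "big bag" alternative: rooms directly under a one_for_all zone supervisor.
flat = Node("Zone.Supervisor 1", "one_for_all",
            [Node("Game.Zone 1"), Node("room 1a"), Node("room 1b"),
             Node("npc 1"), Node("shop 1")])
print(restarted_by(flat, "room 1a"))             # the whole zone
```

With this tree, a room crash restarts only that room, a Game.Zone crash restarts its whole zone, and a Zone.Supervisor crash leaves the other zone alone. The last check shows one trade-off of the big-bag idea: if the rooms were moved directly under a Zone.Supervisor that kept one_for_all, a single room crash would restart the entire zone.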
Takeaways
While doing this exercise I was able to rework some of the tree. I moved Game.Config further up the tree since it seems important. I also moved more GenServers into the Game.Caches sub-tree since they were similar.
Another reason I did this was to figure out how to split the app across separate nodes. The exercise taught me that this is currently not as easy as I had hoped. I figure the Web tree could be pulled off without much work, but I found that the Game tree is connected in a few spots that prevent it from being pulled off right away. That would have been an annoying lesson to learn in the middle of the split; now I know beforehand and can fix those problems first.
For going multi-node, each first-level sub-tree would work well as a separate OTP app inside an umbrella app. I had previously started with that structure, but the application was too new for it to be useful. If I split them up again, I can boot nodes that are just for web, just for telnet connections, or just for the world. I think this is the next step toward going multi-node.
I hope it was useful to read through why I picked what I did, and to see that I had a few things ordered wrong. I hope you go through your own apps and try a similar exercise on them.