In the week or two leading up to ElixirConf I started trying to see how far I could push ExVenture with concurrent players. These are the three fairly simple tweaks I took to go from maxing out at 230 concurrent players to maxing out at 3500 concurrent players on the same machine.
Tests were performed on a desktop with an Intel Core i7-6700K and 64GB of RAM. I used VentureBot to connect to my local instance across the local network, including Wi-Fi. I used a copy of MidMUD as the world.
Single process overloaded by messages
The first time I ran VentureBot pointed at my local development version of ExVenture, I was able to get to around 230 players before room processes started falling over. The room process is a router of sorts for players in the same place of the world.
Anything a player does in a room is broadcast to every other player in the room. Player processes also call the room to get its current state before acting on it through commands. The combination of the two meant the room process could not keep up with the volume of messages.
Enter a side process event bus. Any notifications that the room process wants to send are pushed into this side process. This frees the room process from performing the relatively slow notification of characters in the room.
This pushed concurrent players up to about 600. You can see this in Pull Request #72.
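A minimal sketch of the side-process idea, not the code from the pull request. The module names `Room` and `Room.EventBus`, the `{:room_event, event}` message shape, and every function name here are assumptions for illustration. The room hands the fan-out to the bus with a cast and goes straight back to answering calls.

```elixir
defmodule Room.EventBus do
  # Hypothetical side process that fans notifications out to characters.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Asynchronous: the caller returns as soon as the message is queued.
  def notify(bus, recipients, event) do
    GenServer.cast(bus, {:notify, recipients, event})
  end

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_cast({:notify, recipients, event}, state) do
    # The relatively slow fan-out happens here, outside the room process.
    Enum.each(recipients, fn pid -> send(pid, {:room_event, event}) end)
    {:noreply, state}
  end
end

defmodule Room do
  # Hypothetical, heavily reduced room process.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def enter(room, pid), do: GenServer.call(room, {:enter, pid})
  def look(room), do: GenServer.call(room, :look)
  def broadcast(room, event), do: GenServer.cast(room, {:broadcast, event})

  @impl true
  def init(:ok) do
    {:ok, bus} = Room.EventBus.start_link()
    {:ok, %{bus: bus, players: []}}
  end

  @impl true
  def handle_call({:enter, pid}, _from, state) do
    {:reply, :ok, %{state | players: [pid | state.players]}}
  end

  def handle_call(:look, _from, state) do
    {:reply, %{player_count: length(state.players)}, state}
  end

  @impl true
  def handle_cast({:broadcast, event}, state) do
    # Hand off the notification and return to serving look calls.
    Room.EventBus.notify(state.bus, state.players, event)
    {:noreply, state}
  end
end
```

The trade-off in this sketch is that notifications are now delivered slightly later than the room state changes, which is usually fine for chat-style events.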
Single process overloaded by state size
ExVenture has a player session registry that keeps track of connected players. It is very similar to the Elixir Registry, but is a custom process on each node. This registry was the next thing that could not keep up with the player count.
When a player connects, their full user, class, skills, race, etc are all preloaded from the database. This is then pushed into the session registry. The session registry process was being slowed down by the state it held.
Almost all of this data was not required by other characters in the world. The answer then was to massively cut down what was stored in the session registry. I created a new Character.Simple struct that the session actually stored.
This pushed concurrent players up to about 1200. You can see this in Pull Request #73.
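A sketch of what a trimmed-down struct along these lines could look like. `Character.Simple` is the name from the post, but the fields and the `from_user/1` helper are assumptions for illustration; the real struct may carry different data.

```elixir
defmodule Character.Simple do
  # Assumed field set: only what other characters plausibly need to see.
  @enforce_keys [:id, :name]
  defstruct [:id, :name, :level]

  # Hypothetical helper: build the small struct from a fully preloaded user.
  def from_user(user) do
    %__MODULE__{id: user.id, name: user.name, level: user.level}
  end
end
```

Anything stored in another process's state or sent in a message is copied, so converting once at the boundary keeps the preloaded class, skills, and race data inside the player's own session process.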
Processes overloaded by inbox size
At this point something I really was not expecting happened: I ran out of RAM. Mind you, this is on a desktop with 64GB of it.
The cause was almost exactly the same as in the previous tweak. I had fixed the registry, but the player struct still gets pushed around a lot in messages and held in other process state (like the room process).
MidMUD has about 250 rooms, which meant that there was an average of 5 players per room, but every player spawns in the same room on start. This meant that a few rooms were getting up to 30-40 players at once. Each action a player took was then pushed to 40 other processes. Because the player struct was preloaded to the brim, every one of those messages made another copy of that data, which blew through my RAM.
This isn’t a memory leak per se, but it wasn’t good.
The way around this was to use the same simple character struct for all messages. After this change RAM usage dropped substantially: 1200 players took about 1GB of RAM versus the previous 50GB.
This is the final gauge I was able to get from Grafana on concurrent players.
For reference, most MUDs these days are over the moon with anything above 20, the top one getting ~800 players. ExVenture pushed into MMO territory with these changes.
The current reason ExVenture cannot take more players is that the session registry can’t keep up with new connections: each connecting player fetches the full list to see if they are already in the game. To get around this I would change the session registry to act more like the Elixir Registry and create a worker pool to manage incoming calls.
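For illustration only, here is one way the "already in the game" check could avoid fetching the full list, using the standard library `Registry` with unique keys. This is not something ExVenture does; `SessionSketch` and its functions are hypothetical names, and partitioning stands in loosely for the worker pool idea.

```elixir
defmodule SessionSketch do
  # Hypothetical registry name.
  @registry SessionSketch.Registry

  def start_link do
    # Partitions spread registrations across several processes.
    Registry.start_link(
      keys: :unique,
      name: @registry,
      partitions: System.schedulers_online()
    )
  end

  # Called from the player's session process when they sign in.
  # The value stored alongside the key should stay small.
  def register(user_id, character) do
    case Registry.register(@registry, user_id, character) do
      {:ok, _owner} -> :ok
      {:error, {:already_registered, pid}} -> {:error, {:already_signed_in, pid}}
    end
  end

  # A keyed lookup instead of scanning every connected player.
  def signed_in?(user_id) do
    Registry.lookup(@registry, user_id) != []
  end
end
```

Entries are removed automatically when the registering process exits, so a crashed session does not linger in the list.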
But, there’s almost no reason to try and fix this. 3500 real players is a very, very far off problem in the real world. I’ll settle for 15x performance for now.
I hope these small tweaks are actionable in your own Elixir applications, especially message size between processes.
ExVenture kicked back into action over the last month now that Gossip is mostly stable. I tried to stick to a general theme of bug fixes and world building additions though. I wanted to get MidMUD ready in advance of ElixirConf.
Just because I let Gossip “sit”, doesn’t mean I didn’t work on it! There were a few minor features that got touched on.
The home page now features a random connected game to be highlighted. Any game that is connected and has a home page url might show up on the front page.
Gossip also got a cleaned up README since a lot of people mistakenly thought you needed to install NodeJS to connect. NodeJS was listed in the README as a requirement, but only as a requirement to start the server itself.
There is a media page that contains a footer you can add to your homepage if you’re a part of the network and want to show it off.
Gossip Elixir Client
The Elixir client got some updates as well. There were a few bugs hanging out related to the player list. If a game went offline completely with players attached, they would never go away from your games list. The list now gets swept regularly for games that haven’t been seen in a while.
There were a few multi-node bugs that got cleaned out in preparation for my ElixirConf talk. I wanted to make sure that this was very stable before getting on stage to talk about it. The raft leader selection had a few small bugs in it. If a node went offline, zone rebalancing wasn’t actually happening (which is bad).
Every time a node came back online, a new election was triggered, no matter what. This one wasn’t a horrible bug per se, but having the leader force itself should have been enough.
The world leader also now periodically checks that all of the zones are online. I had noticed once that MidMUD had a zone down after a node died. I don’t really know what happened, but scanning for all zones being online shouldn’t be a bad thing.
I also found a slight issue with using :pg2 as a world leader list. Occasionally :pg2 hadn’t caught up with the node dying before the world leader tried to rebalance, which resulted in the leader calling a dead node. I got around this by first filtering the members list against the connected node list.
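The filtering step can be sketched as a pure function over a list of member pids. `WorldLeaderSketch` and `live_members/2` are hypothetical names; in practice the members list would come from the process group lookup.

```elixir
defmodule WorldLeaderSketch do
  # Keep only members whose node is currently connected.
  # The default is evaluated at call time, so it reflects the
  # cluster as seen right now by the calling node.
  def live_members(members, connected_nodes \\ [Node.self() | Node.list()]) do
    Enum.filter(members, fn pid -> node(pid) in connected_nodes end)
  end
end
```

Calling `WorldLeaderSketch.live_members(members)` before rebalancing drops any pid that belongs to a node the local node no longer sees, even if the group membership has not caught up yet.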
Performance Enhancements
I was curious about the possible performance of MidMUD during ElixirConf. Of course it will be a huge hit and everyone will be signing into it, so I wanted to make sure it could stand up to the brunt of that.
It’s a good thing I did look into this, as before any changes ExVenture completely fell over at about 230 connected players. After a few tweaks I was able to push this to 1200 connected players before it fell over due to RAM usage (how crazy is that?).
I will expand upon the changes I did in a future blog post. I don’t want to give it all away in this!
Until then, you can check out Venture Bot, which is the bot framework I set up to connect as a player to ExVenture.
General UX Improvements
No one could figure out how to complete a quest, which is my bad. I updated the hint system to let the player know whenever a quest is completable and to display exactly how to complete that quest.
HINT: You have a quest that you can complete, use {command}quest complete [quest_id]{/command}. See {command}help quests{/command} for more information.
In addition to completing quests, picking them up was also a tad confusing. The hint message for NPCs talking to you was also updated. This will display only once per sign in.
HINT: You got a tell from an NPC. This might be the lead up to a quest. Please read carefully what they are asking about and you can {command}reply{/command} with your response.
With any luck these two hint additions should make questing much more accessible.
Templating Enhancements
The templating system got a decent set of additions. There is now a context struct that is very similar to a Phoenix conn struct. You can pipe it through a set of assigns and then finally “render” it through a template string. This makes the code nicer to read and, I think, removes some complications in calling templates.
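To show the shape of that pipeline, here is a small sketch. `ContextSketch`, its function names, and the `[key]` placeholder syntax are assumptions for illustration and are not taken from the ExVenture source.

```elixir
defmodule ContextSketch do
  # Hypothetical context struct, loosely modelled on a Phoenix conn.
  defstruct assigns: %{}

  def context, do: %__MODULE__{}

  def assign(context, key, value) do
    %{context | assigns: Map.put(context.assigns, key, value)}
  end

  # Replaces each [key] placeholder (assumed syntax) with its assign.
  # Unknown placeholders are left untouched.
  def render(context, template) do
    Regex.replace(~r/\[(\w+)\]/, template, fn match, key ->
      case Enum.find(context.assigns, fn {k, _value} -> to_string(k) == key end) do
        {_key, value} -> to_string(value)
        nil -> match
      end
    end)
  end
end
```

Usage reads top to bottom, much like a conn pipeline:

```elixir
ContextSketch.context()
|> ContextSketch.assign(:name, "Elias")
|> ContextSketch.render("Hello, [name]!")
```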
A few extra variables are available in room descriptions, such as the name of the room and the zone.
Every NPC, Item, Room, and Zone are also available to template in a lot of game strings via a new global resource template system. I was referring to a lot of these resources in quests and room descriptions on MidMUD and finally got annoyed enough while constantly renaming things to fix the problem.
Now you can refer to any of those 4 resources with [[room:1]] in game strings (the admin will say if it’s available) and the templating system will find room 1 and print its name.
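A rough sketch of how a `[[room:1]]` style reference could be resolved. The module name, the lowercase type names, and the `lookup` function are hypothetical; the real system presumably loads resources from the game's own data rather than from a function passed in.

```elixir
defmodule ResourceTemplateSketch do
  # Matches references such as [[room:1]] or [[npc:12]].
  defp pattern, do: ~r/\[\[(npc|item|room|zone):(\d+)\]\]/

  # `lookup` is a hypothetical function of (type, id) that returns
  # {:ok, name} or :error. Unresolved references are left as-is.
  def render(string, lookup) do
    Regex.replace(pattern(), string, fn match, type, id ->
      case lookup.(type, String.to_integer(id)) do
        {:ok, name} -> name
        :error -> match
      end
    end)
  end
end
```

Because the reference carries an id rather than a name, renaming room 1 changes every string that mentions it without editing those strings.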
Room descriptions can also template room features in specific spots now. The feature key is available as a template item. This lets you weave in the features into specific spots of the description. All features will still be appended to the end of the description if none are used inside the text.
Script Additions
The final “big” thing for the month was an addition to quest scripts. You can now mark a line as triggering to another line with a delay. This lets you break up huge blocks of text with multiple tells, eventually leading to a line that has listeners or triggers a quest.
This was fun to add as the outcome was much more realistic in chatting with an NPC.
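As an illustration of chained lines, the sketch below walks a script and works out when each tell would be sent. The script shape (`:message`, `:trigger`, `:line`, `:delay`), the millisecond unit, and the module name are all assumptions; this is not the format ExVenture stores.

```elixir
defmodule ScriptSketch do
  # Returns [{offset_in_ms, message}] starting from the line at `key`.
  # Assumes the chain of triggers eventually ends (no cycles).
  def schedule(script, key, offset \\ 0) do
    line = Map.fetch!(script, key)

    case line do
      %{trigger: %{line: next, delay: delay}} ->
        [{offset, line.message} | schedule(script, next, offset + delay)]

      _ ->
        # Final line: in the game this is where listeners would attach.
        [{offset, line.message}]
    end
  end
end
```

Each `{offset, message}` pair could then be handed to something like `Process.send_after/3` so the NPC's tells arrive spaced out instead of as one block.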
Smaller Tweaks
Global room features
Telnet login link is a link
Admin displays inline help for quest steps
Runs could parse poorly and cause a crash
Duplicate users in the who list due to issues in Session.Registry
Players going AFK at the login prompt could show as signed in
Channel command by itself was bugged
Gossip: API to view currently connected games
Multiple messages of the day and after sign in messages
Updating many dependencies, we’re on Elixir 1.7
Disable skills so they are neither displayed nor usable
Strip colors from notification text
Add any flag to users, to add “Patron” text to patrons
Older saves did not migrate cleanly when some stats were no longer defaulting
Social Updates
This was a pretty big month for ExVenture and Gossip. The cowboy websockets blog post was picked up in a few places and a lot of people found out about both. The discord server has gotten a few new people, some of whom are new to Elixir and looking forward to learning it through ExVenture, which is great!
I also have stickers if anyone spots me at ElixirConf next week.
Next Month
With ElixirConf over after the first week of September, I might get back to bigger features. I would like to split characters apart from users, which is a huge refactor. But a refactor that has been waiting for a while. Some of this move was started with the tweaks that pushed ExVenture to 1200 players.
For my side project Gossip I wanted to have a websocket connection for non-Phoenix connections. I did this by going straight to Cowboy and using a handler at that level. This explains how I did this for Gossip.
Gossip is a cross game chat service for MUDs, check it out.
This is a snipped version of the full file, but the basics are here. This shows the websocket upgrading, a cowboy and websockets requirement. The init function upgrades to websockets and the websocket_init function is called after the upgrade.
The other function shown handles a newly received message. The message is JSON (or should be), so it gets parsed and then run through an implementation module elsewhere. Depending on the response from that submodule, different responses will get sent back.
There are a few other cool things in the real module, so I encourage you to check it out.
Ping/Pong
One thing I had to add that I thought was included as part of the cowboy handler was a pong response to a client side ping. This actually crashed the websocket process a few times so I needed to add this as per the websocket spec:
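The original snippets are not reproduced here, so the following is a minimal sketch assuming the Cowboy 2.x `:cowboy_websocket` callbacks. `SocketHandlerSketch` is a hypothetical module, the behaviour line is left as a comment so the block compiles without the cowboy dependency, and the catch-all clause stands in for the real message handling.

```elixir
defmodule SocketHandlerSketch do
  # In a real project: @behaviour :cowboy_websocket

  # Returning :cowboy_websocket asks Cowboy to upgrade the connection.
  def init(req, state), do: {:cowboy_websocket, req, state}

  # Called once, after the upgrade has completed.
  def websocket_init(state), do: {:ok, state}

  # Answer a client ping with a pong carrying the same payload.
  def websocket_handle({:ping, payload}, state) do
    {:reply, {:pong, payload}, state}
  end

  # Pings may also arrive without a payload.
  def websocket_handle(:ping, state) do
    {:reply, :pong, state}
  end

  # Placeholder: ignore every other frame in this sketch.
  def websocket_handle(_frame, state), do: {:ok, state}
end
```

Without a clause that matches ping frames, the call raises a `FunctionClauseError`, which is consistent with the crashes described above.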
Since we’re using a lower level websocket (than Phoenix) we have to manually set up the cowboy dispatcher. This configuration shows the cowboy websocket handler along with a separate Phoenix channel, since I want to have both options.
If you go this route, you need to manually specify any Phoenix channels from here on out.
Setting up your own lower level websocket in Elixir/Phoenix turned out to be pretty simple. I am happy I went this route so I didn’t have to worry about forcing the higher level Phoenix channels protocol on top of external clients.
The last month of ExVenture has been pretty sparse compared to previous months. This has been due to me starting a new side service called Gossip. Gossip is a cross game chat network for MUDs.
Gossip has been rolling around in my head for some time while working on ExVenture, but there have always been more pressing matters. About a month ago we were talking on the MUD Coders Guild about cross game chat and that inspired me to kick this off.
Gossip is similar to the I3 Network except it uses more standardized technologies. Secure WebSockets are the transport layer and all events are in JSON. You can see the documentation on Gossip.
ExVenture Remote Channels
The first big feature of Gossip is remote channels. You can flag a channel as a Gossip channel and it will try to send all communications up to the network.
ExVenture Remote Player Status
When your game is configured for Gossip, all signed in players are pushed up to Gossip. This lets other connected games see your players sign in and out. Right now all notifications are displayed to users on ExVenture. This is an optional but highly suggested feature for games that are not based on ExVenture.
This should help make your game feel more alive by letting your small pool (maybe just 1!) of players see others on the network.
The local who list will also display remote players, so users can see who is on the network.
ExVenture Remote Tells
The final big feature for Gossip right now is remote tells. If you’re syncing your local players up to Gossip (and ExVenture does) then remote games can send tells to your local players. ExVenture lets you initiate tells and also handles reply for remote players.
This degrades nicely for remote games that do not support tells. As part of connecting a Gossip client says what features they support. For games that are not built on ExVenture remote tells are optional, but highly suggested.
Gossip Games List
I definitely suggest you check out the Gossip Games list. So far we have 5 games connected, more seem to be checking out Gossip every day!
Gossip Clients
The Gossip client from ExVenture has been pulled out into its own Elixir hex package. If you’re developing your own Elixir MUD (and there are a lot of you out there) then come join the fun and add the package.
Right now you do need to implement the full set of callbacks for all of the features of Gossip, but I would like to let you dictate which features your game supports and slowly add them in.
There is also a Ranvier bundle that supports remote channels and player status updates.
Next Month
Next month I hope to start back on ExVenture more and leave Gossip to sit for a bit. I think Gossip has a good enough feature set to let other games implement what’s there and get some more feedback from the community.
I got to try out a new OTP Supervision tree pattern in ExVenture, that I am going to call a tether.
Background
I started a new service, named Gossip, that ExVenture creates a persistent websocket connection to.
When this service dies or is not alive when the ExVenture server starts, the websocket connection will crash or refuse to start entirely. It crashes immediately and then cannot reconnect. This ripples up the supervision tree taking the entire application down in short order.
To get around this, I added a layer of supervision trees for just that socket process. My tree looks like this when fully booted:
The tether supervisor starts with no child specs. After boot the Monitor (or the cluster leader) will start the Socket inside the tether. When the socket process dies, it will eventually crash the tether supervisor, causing it to restart.
When the tether supervisor restarts, it will restart with no children breaking the restart loop (“cutting the tether”.)
Eventually the monitor process will try to restart the Gossip socket which may crash, causing the tether to crash and so on. Either way the crashing process was contained and the application stayed up.
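A minimal sketch of the pattern with the standard `Supervisor` module. `TetherSketch` and `attach/1` are hypothetical names and the restart limits are just the usual defaults written out; the code in the gossip folder may be organised differently.

```elixir
defmodule TetherSketch do
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Called by the monitor (or cluster leader) after boot, and again
  # whenever it wants to retry the connection.
  def attach(child_spec) do
    Supervisor.start_child(__MODULE__, child_spec)
  end

  @impl true
  def init(_opts) do
    # No children at boot. A child added later with attach/1 is not
    # part of this list, so it is gone after the tether restarts.
    Supervisor.init([], strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end
```

When the attached child crashes more than `max_restarts` times within `max_seconds`, the tether itself exits. Its parent restarts it through `init/1`, which comes back empty, so the crash loop stops there instead of climbing further up the tree.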
Conclusion
The code for this was pretty simple, you can see it on GitHub in the gossip folder.
I learned at least part of this technique from Adam, over at the MUD Coders Guild. If you are interested in Elixir and multiplayer game programming, come check out the Slack and say Hi.