Saturday, March 23, 2019

How to Make Accessible HTML5 Games that Work with Screen Readers, Part Three

This is the third of a three-part series.  Links to the other two articles will be added to this page as the other parts become available.

In the first two parts of this series, we discussed an overview of screen reader capabilities in HTML5 games and some of the technical implementation details of setting up a system that allows playing games with a screen reader.  In this part, we will discuss some of the more esoteric, design-level considerations that the previous two parts bring to the fore.

In many ways, these are the most important elements to get right.  All the technical details mentioned earlier, even if executed flawlessly, will fall down if the design side of things does not leverage them properly.  So this is what we will discuss in this part.

Writing Considerations

First, let's talk about writing considerations.  When players interact with a game solely through the text you write for it, the manner in which you write the utterances that will be spoken matters.

Brevity

One thing I struggled with in Battle Weary was how succinct to make the text.

On one hand, I wanted the language used in the game to be as short as possible, to reduce the amount of time that players had to sit there waiting through exposition to get to the meat of what's being described so they can make decisions.

On the other hand, stripping all the "flavor text" from the game would rob the game world of all of its immersiveness.  The sighted users get nicely-decorated maps that (try to) evoke the game world with visual decorations.  I felt that screen reader users might want to have the flavor text to give them an equivalent feel for the world their character was exploring.

I have to admit, I am at a bit of a loss as to how to reconcile those two concerns.  The best balance between brevity and flavor is a subjective choice, but it still should probably be driven by some guiding philosophy so it doesn't go all over the place over the course of a game.

I'm pretty sure the best approach is not to default to only speaking the bare minimum and having the player hit a key for the flavor text.  And I'm pretty sure it's a bad idea to have paragraphs of florid text for every interaction.

What I settled on for Battle Weary was trying to be very succinct, but still having at least a little flavor to the succinct statements.  Instead of "The goblin hits you for 3 damage," I say, "The goblin stabs you with a crude spear for 3 damage."  The latter is longer, but not so long that it is tedious to listen to, yet still helps evoke a mental image of what your hero is facing.

Punctuation and phrasing

The reality is that text-to-speech technology, while very good, still isn't anywhere near as good as "normal" human speech, so you may have to help the text-to-speech parser along.

One thing I encountered during development was the problem of speaking about cards.  For instance, consider the following text:
Play the attack card.
You, as a human, might speak this text in one of two ways, and the intonation you'd use would make a difference in what you mean.  If there are multiple types of cards, and "attack cards" are one type of card, the above text might mean, "choose a card of class attack and play it".  In this case, you'd speak "attack card" as a contiguous text phrase without pauses, connecting the two words.

But if the title of the card is "attack", you'd use a different intonation, adding slight pauses around the word "attack" to convey where the title begins and ends.

Unfortunately, the text-to-speech parser has no idea about these nuances, so you have to help it out a bit.  The default speech would be the first approach, but if you needed the second approach – the title of the card is "Attack" – then you'd ask the text-to-speech parser to speak:
Play the "Attack" card.
Placing quote marks around "Attack" causes the text-to-speech parser to intone that part of the phrase differently, with the desired pauses.

Luckily, this is very intuitive and straightforward syntax, and in most cases, it can even be displayed as screen text, since it matches up nicely with what you'd normally display visually.  Unfortunately, this is not always the case.  Sometimes you need to add punctuation to the "spoken" text that you wouldn't want to display visually.

A good example is headings.  If you handed the text-to-speech parser the text for this section of this article, it would speak:
Punctuation and phrasing the reality...
 ...without a significant pause or concluding intonation between the end of the heading and the first paragraph, because headings don't typically include punctuation.  The solution is to place a period at the end of the heading to help the text-to-speech parser understand that the content is disparate:
Punctuation and phrasing.  The reality...
...but of course you probably don't want to display that period.

There are also some phrasings that TTS simply doesn't handle gracefully.  For instance, the following phrase simply doesn't render well:
Where you sample the water matters.
...because the "where" gets lost in the shuffle at the beginning of the sentence.  The word "where" doesn't start many sentences, so it comes off sounding awkward.  But if we were to rewrite this sentence to:
The location where you sample the water matters.
...then the TTS system handles it elegantly and it is much more understandable.

The end result of all of this is that you often need to keep "what gets spoken" and "what gets displayed" separate.  It's wise to implement that in your systems from the start when possible; don't just store text.  Store text that will be presented to the user as two components: the spoken aspect and the display aspect.
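For example, a minimal sketch of that kind of storage might look like the following (the Utterance name and its fields are hypothetical, not taken from Battle Weary's actual code):

 // Minimal sketch: store each piece of player-facing text as two
 // components.  Names here are illustrative only.
 class Utterance {
  constructor( display, spoken ) {
   this.display = display;           // what sighted users see on screen
   this.spoken = spoken || display;  // what gets handed to the screen reader
  }
 }

 // The heading example from above: the period exists only in the spoken copy.
 var heading = new Utterance(
  "Punctuation and phrasing",
  "Punctuation and phrasing."
 );

 // heading.display goes into the DOM; heading.spoken goes to the speech system.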

Add Whitespace Generously

Another good best practice is to always add a space at the end of your text content, or be sure that when you concatenate strings, you add a space.  I encountered a few places where I'd have my game speak two sentences, concatenated from two sources, and failed to put a space between them, so what was given to the TTS parser was something like:
You walk west.You are in the dark cave.
This causes the TTS parser to speak something that sounds like this:
You walk west dot you are in the dark cave.
...all as one contiguous sentence, which is of course not what I intended.  Adding a space between the two sentences fixes it.  Since TTS parsers seem to behave much like HTML, where whitespace is "collapsed", you can add as many spaces between items as you need, so getting in the habit of having whitespace around your stored strings and inserting it when you concatenate them for speech is fine.
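One small, hypothetical helper in that spirit (not from Battle Weary's code) makes the separating whitespace automatic when concatenating text for speech:

 // Hypothetical helper: join text fragments for speech with guaranteed
 // whitespace between them.  Extra spaces are harmless, since the TTS
 // parser collapses whitespace much like HTML does.
 function joinForSpeech() {
  return Array.prototype.slice.call( arguments ).join( "  " );
 }

 // joinForSpeech( "You walk west.", "You are in the dark cave." )
 // yields "You walk west.  You are in the dark cave."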

Help versus Status

In the previous part of this series, we talked a bit about the idea of always providing a way for the player to call up the game's Help and Status.  I'd like to talk about that in a little more detail now.

These two things are different in character, but are also closely related.  Help describes how to interact with the game, but does not convey much about its state beyond describing the current space of commands you have available.  Status describes the game state that you are making decisions to act upon, but doesn't describe how to actually act on it.  Taken together, they let you understand the game's status and how you can respond to it with your own decisions.

So why separate them?  It's a good question, and it took me a while to arrive at the conclusion that they need to be explicitly and consistently separated.  It boils down to the fact that this is what is required to streamline the play experience and maximize immersion.

Ideally, when you play the game, you've already memorized the key commands, you have them down to "muscle memory", and you aren't even aware that you are physically pressing that up arrow key in order to move your character north.  So to achieve that "flow" state for a screen reader user, we need to omit, to the extent possible, the exposition about how to move around and just let them move around.  The status, then, is what we emit to give the player what they need to know to make decisions, but not anything that explains what controls to use.

This is especially true since the player will likely need to hear the game's status over and over.  The more succinct we can make it, the better, and controls which they should internalize after only a few moves are prime candidates for culling verbiage.

But of course the reality is that not everyone who plays your game comes in with that familiarity with the game controls, and those players will need help.  That's why you also need to have the help-like content.

It was that inherent push-and-pull that led me to the realization that we need two affordances here, not just one.  Originally, I was envisioning a single "status" key that the user could hit at any time to get all that information, but it was cumbersome and often made the game feel more didactic than it needed to.  Once I separated the two, the feel of the game improved significantly.

Additionally, since users not reliant on screen readers could also benefit from a "Press H at any time to get help" prompt for the key commands they use, it made sense to separate the two on that front as well.  You don't need to tell a sighted user where the X's and O's are in Tic-Tac-Toe, but you do need to tell them the key commands.  You can kill two birds with one stone by exposing the "accessible" Help to all users.

Now, all the above said, you may find that there are some places where you can "inline" help to make it less likely that people will need to call up the help at all.  For instance, if you're going to tell players that they are currently deciding where to move, you can very succinctly mention that the arrow keys do so.  Since we automatically emit the status, that spares people from having to refer to the help to learn the game.  Just use this sparingly; every help item that appears in the status will be heard multiple times.

Consider Order

In many games, it's not important what order you present content in.  Indeed, in many games, much of the content is presented concurrently.  In Tic-Tac-Toe, for instance, most players just look at the board directly, seeing all the cells as they relate to each other at once.  While their eyes may focus on one cell at a time, the presentation is all concurrent.

This poses a difficulty when it comes to translating a game state like that to the spoken word.  How do you take concurrently-presented information and turn it into something sequential?  Obviously, you will have to just list everything out one by one, but the order in which we do that matters.

This is because the user can strike a key at any time to take an action.  We cannot measure when the user is done listening to the screen reader's spoken text, so we have to assume that content can be interrupted at any time with a player decision.

So, we need to put the most pertinent information first in any given utterance.  The goal is to let the user who has already "mind mapped" the game wait the least amount of time to get the information they need to move on to the next game state.

But this is counterbalanced by the fact that the easiest way to get lost is to assume you're one place when you're not, and continue taking actions.  Orienting information must come first, especially if your game can interrupt the "normal" flow of things with new questions.

In other words, the most important thing to know about a cell in Tic-Tac-Toe is whether it is an X or O.  But to understand its significance, we need to be firmly aware of where that X or O is.  So the state for a Tic-Tac-Toe game would have to lead with the cell you are in currently, and then say whether it's an X or O.  A person very familiar with the game and able to envision where they are on the grid might prefer to hear X or O first...until they get lost, screw up their mental map of the game, and make a bad decision as a result.

So instead of:

"X.  You are in the left center cell."

...we should write:

"Left center cell.  X."

Things get even more complicated when you announce changes to the game state.  If I hit the up arrow key while the cursor is on the left center cell, I need to know that it accepted the change to the game status first.  So now we're looking at:

"Moved up. Top left cell. O."

Now, the most important thing about the cell is in the third position, but it's always clear and reliable, to veterans and newbies alike, how they are navigating, where they are, what they're doing, and what the status is.

Note that a veteran can still navigate this quickly.  If they are in the top left cell, they can hit the down arrow twice in quick succession to go check the bottom left cell.  They'll miss the status for the intervening cell, but will end up in the right place, and, importantly, they'll know they landed where they intended to "zoom" to before hearing the cell status:

"Mov...Moved down.  Bottom left cell.  Empty."

This "Action > Position > Status" hierarchy seems to work pretty well, especially when the full game status can get rather long.  Most people playing your game will try to build a mental model of the game, which will be supported by the recurring status utterances, but they won't need to listen to it every time.  It's crucial information, but if they know what its state was and they are familiar with the ways in which their actions can permute the game state (or leave it alone), they don't often need to hear the status unless something unexpected happens in the game...at which point the "Position" part of the game comes into play early.
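To make the ordering concrete, here is a rough sketch of how a Tic-Tac-Toe move announcement might be assembled in "Action > Position > Status" order (the function and field names are made up for illustration):

 // Illustrative sketch: build a move announcement in
 // Action > Position > Status order.
 function describeMove( action, cell ) {
  var parts = [];
  if (action) parts.push( action + "." );                 // "Moved up."
  parts.push( cell.name + "." );                          // "Top left cell."
  parts.push( cell.mark ? cell.mark + "." : "Empty." );   // "O." or "Empty."
  return parts.join( "  " );
 }

 // describeMove( "Moved up", { name: "Top left cell", mark: "O" } )
 // yields "Moved up.  Top left cell.  O."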

Explicit Distinctions

You may have noticed in the above discussion that I had to come up with names for cells in the hypothetical Tic-Tac-Toe game.  "Bottom left cell".  "Left center cell".  And so on.

This illustrates an important point.  A lot of the distinctiveness we get for free in a visual medium is not available in a text-based format.  In Battle Weary, I wanted the ability to have multiple enemies in one location, but choosing between "goblin" and "goblin" to attack renders them indistinguishable.  In a typical roguelike, you could have dozens of similar enemies with the same name, but they are distinguishable simply by their position.  One is north of you and one is west of you.

In a text format, this is often no longer the case.  I solved this by giving each goblin a specific adjective.  "Slimy goblin".  "Smelly goblin".  "Dirty goblin".  Now, not only are they distinguishable, but it also adds a little bit of character and world-building to the game as a result.
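One simple way to do that, sketched below with made-up names, is to hand out adjectives from a pool as duplicate enemies are placed in a location:

 // Illustrative sketch: give each enemy in a location a distinct
 // adjective so spoken references are unambiguous.
 var GOBLIN_ADJECTIVES = [ "slimy", "smelly", "dirty", "hunched", "scarred" ];

 function nameEnemies( enemies ) {
  var pool = GOBLIN_ADJECTIVES.slice();
  enemies.forEach( function( enemy ) {
   var adjective = pool.length ? pool.shift() : "another";
   enemy.displayName = adjective + " " + enemy.baseName;   // e.g. "slimy goblin"
  });
 }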

I did a similar trick with room names.  In the maze-like forest level that Battle Weary generates, a sighted user has a distinct advantage over the screen reader user because they can see the relative placement of their character amidst the fuller environment, whereas describing all the rooms adjacent to the current one with spoken audio would quickly get tiresome.  So to give the screen reader user a hook to hang their mental model of the game world on, I tried giving each map cell a unique, evocative name, like "Dark Hollow", "Shadowy Clearing", etc.  Now, at least, the player has a way of distinguishing the rooms and has a chance of recognizing that they're back in known territory if they get lost.

Of course, sometimes, distinction is worthless.  If you have three "Attack" cards in your deck, it doesn't really matter which one is in your hand if they're all the same, so there's no reason to differentiate them.  Obviously, you need to use your judgement on which things warrant extra distinction and which don't.

Additional Details for Spoken Content

Early on in development, I had a function that could display a pop-up choice for the player.  It quickly became clear during testing that sometimes, it is easier to connect different pieces of content when presented in a visual medium.  If you see a dialog box that says, "How many gold pieces do you want to spend?" and the options are "One", "Two", "Three", it is easy to connect the question with the answers when seeing it visually, but more difficult if it is spoken – especially if the player "skips over" the spoken part by quickly pressing a keyboard key.

What I needed was a way to add extra exposition for choices selected by the screen reader.  For the sighted user, it would show "One", but for the screen reader, it would speak "spend one gold piece" to make clear what was happening.

I also found that, when showing help, I could omit some parts for the sighted user; for instance, there is no need to tell a sighted user they can press "S" to have the screen reader announce a summary of what they can already see.  So some affordance for defining different text for announcing and for displaying is generally needed, and as you write, you need to keep in mind which parts are relevant to which medium – spoken versus displayed – to ensure that the text delivers the best possible experience.
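One way to support that, sketched here with hypothetical field names and using the announce() function described in Part Two, is to let every choice carry both a display label and an optional spoken label, falling back to the display label when no spoken version is provided:

 // Hypothetical shape of the choices handed to a dialog box.
 var goldChoices = [
  { display: "One",   spoken: "Spend one gold piece." },
  { display: "Two",   spoken: "Spend two gold pieces." },
  { display: "Three", spoken: "Spend three gold pieces." }
 ];

 // When the selection changes, announce the spoken label if present.
 function announceChoice( choice ) {
  GAME.announce( choice.spoken || choice.display );
 }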

Structural Considerations

The design considerations implied by screen readers do not end at how we phrase things or the order in which we utter them.  There are also other design constraints that we need to put on the game itself in order to make it enjoyable – or even playable – by screen reader users.

Rethink Gameplay Features for Flow

As I mentioned earlier, the impetus for this project was a desire to make the roguelike game genre readily playable by screen reader users.  But the reality is that if I were to attempt to bring over the full roguelike experience, it would be a slog.

It could be done.  All traditional roguelikes are turn-based games on a grid, and that could be exposed to be fully discoverable.  (Indeed, any roguelike that runs in a Unix shell is playable in this sort of context because it's all just text.)  That doesn't mean it would be pleasant, enjoyable, and appropriately designed for the task, though.

Roguelikes often have a heavy emphasis on the map, which is a large, tile-based affair, usually with hundreds of tiles per level.  Rooms can often have dozens of elements of interest in them, like masses of white worms or "monster zoos".  They can have odd, irregular shapes with multiple doors in walls at odd intervals and locations.  And they can have bottlenecks and open spaces that play into how many enemies can attack you at once, so not understanding the geography can, in edge cases, get you killed, and yet the vast majority of the time, positioning is irrelevant.  Most of the time, you can just constantly hit the arrow keys to wander around a large dungeon filled with mostly-empty tiles.  All this is technically describable in text, but I wouldn't want to sit there listening to something like this every time I moved one tile:
There is a white worm mass two tiles west and one tile north.  There is a white worm mass two tiles west.  There is a white worm mass two tiles west and one tile south.  There is a sword three tiles west and four tiles north.  There is a white worm mass four tiles west and one tile north.  It is wounded.  This room is rectangular, twelve tiles wide and four tiles tall.  You are seven tiles from the west wall and three tiles from the north wall.  There is a door five tiles west and four tiles north.  It is locked.
Nor would I want to have to drop out of my movement actions every turn to move a cursor around to check everything I can see to make sure nothing surprising or dangerous is going on.

So how did I handle it?  I decided to replace the largely irrelevant tile-based movement with room-by-room movement.  This preserves a lot of the roguelike activity (roaming around, killing monsters, and taking their stuff) while streamlining away the parts that would be exceedingly dull as a narrative piece.  Enemies that would have normally been masses of multiple tiles now just require a single mention.  So instead of having to hear that above description a dozen times or more to get through a room, you just hear something like this once:
There is a white worm mass here, slightly wounded.  There is a sword here.  There is a locked door in the north wall.
A huge difference!  It also makes things more streamlined from a UI perspective, since you can just choose from three things rather than having to choose something at a position.

We do lose a small amount of nuance – we lose that element of using the geography to take advantage of bottlenecks, for instance.  But it's a good trade-off, and in any case, the important thing is whether the game is fun to play, not whether it "ticks the boxes" of the Berlin Interpretation.

Parallel Mechanics and Interfaces where Possible

One thing to keep in mind is that because there are so few screen reader accessible games on the web currently, you'll probably be asking the user to learn some entirely new UI conventions for your game, especially if you go the route I'm describing in these articles.

That means that users will come in unfamiliar with your game's controls.  If you turn around and fill your game with many interaction sub-modes, that's just going to make it that much harder for your players to learn your game well enough to seamlessly play it.

To a certain extent, that's unavoidable.  Your "Main Menu" is going to have different interactions than your core gameplay, probably.  But you can still try to minimize it by looking for places where you could unify the gameplay interaction.

You can do a lot with arrow keys and space bar, for instance.  Battle Weary only has three real interaction modes: moving around the map (arrow keys move you and space bar interacts with something there), playing cards (arrow keys select a card and space bar plays it), and dialog boxes (arrow keys select options and space bar confirms the choice).  In all cases, it's arrow keys and space, so learning the game is quick, and even when the player gets into a new place they're not familiar with, trying what they already know will lead them in the right direction.

There will be some games out there which cannot be boiled down to arrow keys and space bar (plus the "S" and "H" keys), but even then, I suspect the interaction paradigm can be made consistent across all game modes.

But even in cases where there is a new interaction paradigm needed (say, a fishing mini-game or something that is totally different than the core gameplay), ensuring that some of the interaction elements are consistent – like the persistent availability of the "S" and "H" keys – will help orient users and smooth transitions between interaction schemes.

The more similar and unified your UI control scheme is, the faster a screen reader user can internalize the controls.

Games that Lend Themselves to Screen Reader Play

Obviously, some games are going to lend themselves to screen reader play more than others.  A "choose your own adventure" style game is going to be easier to play with a screen reader than a "Bullet Hell" shooter.  If you're looking to make games for a screen reader, keeping in mind the limitations of funneling all gameplay through textual descriptions is paramount.

One thing that I struggled with in Battle Weary was the fact that sighted players have an advantage over non-sighted users because the layout of the map is so visual.  A sighted user can see at a glance not only their current room, but also the rooms around them.  If they've gotten lost in the forest, and they see the Forest Guide in an adjacent room, it's suddenly clear where they need to go.  Screen reader users do not have that information, so they're going to have to keep wandering around.

I could address this by having a way for screen reader users to "look around".  I actually thought about adding a third persistent key – "L" for "look" – that would let the player get a very detailed breakdown of everything a sighted user would see.

But I wasn't sure that would actually assist gameplay.  It would reintroduce the same problem mentioned above, where, for optimal play, the player would have to stop after every move and use an interface for "looking around", and I didn't want that.  Ultimately, the brevity of the game jam's seven days made the decision for me, but even if I hadn't been under time pressure, I'm not sure I would have implemented it.

Given time, I think I may have instead opted to address that disparity in challenge levels a different way, such as changing the procedural generation of the forest mazes to be less "twisty" and long, adding a "backtrack" command to head back towards the exit, or letting the player purchase items that could teleport them out of the forest to remove that challenge for those stymied by it.  Addressing the underlying problem that screen reader play makes difficult is a better plan than "kludging" a solution that approximates the advantage of a user who can see the screen.

Conclusion

All in all, it was an interesting experiment, and I believe I've been able to deliver a solid game that is genuinely playable with a screen reader.  With some crucial design choices and a little effort, the game experience is quite comparable whether you can see it or not, and it opens a genre that has very little in the way of screen reader support to players who might be interested in it.

The principles and best practices outlined in this article series can be applied to just about any HTML5 game, and could also be used for interactive educational applications and other "managed" interactive experiences.  There are still a few weak spots and places for possible improvement, but the interaction experience is so vastly improved over the tedious and clumsy method that would otherwise be used (by exposing the entire DOM to screen readers in a confusing way), that it appears to be the best approach for heavily interactive experiences like a game.

However, all the above has only been tested with VoiceOver.  We need to do more testing with more screen readers to see if any other problems crop up or any other best practices become needed.  Until then, I'm going to use this structure for future projects in an attempt to make them more playable by a wider audience (at least, for the projects that lend themselves to text-based play, or which are required by law to be accessible due to receiving federal funding).












Friday, March 22, 2019

How to Make Accessible HTML5 Games that Work with Screen Readers, Part Two

This is the second of a three-part series.  Links to the other two articles will be added to this page as the other parts become available.

In this article, I will talk about the technical aspects of how I exposed the gameplay of Battle Weary to work with screen readers.  Battle Weary is an HTML5 roguelike game written in Javascript, and it uses the WAI-ARIA specification to expose its behavior to screen readers.

Before we begin, be aware that much of what follows builds upon (and in some cases, goes against) the WAI-ARIA specification for exposing web applications to screen readers.  It would probably be good for anyone looking to make HTML5 games accessible to at least skim the important parts of the specification to understand the discussion below.  (Or at least bookmark it so that you can come back to it later once you start diving into it yourself.)

Overview

This section will summarize the components we need to have in place at a high level, and then we'll go into detail for each below.

The model for how we structure the interactive game experience is that of a conversation.  Rather than trying to expose all of the UI elements as navigable items that must be individually discovered, understood, and activated, we instead manage the gameplay experience in a more linear format that expects a sort of call-and-response interaction.

To achieve this, we use an ARIA-live region that will serve as our mouthpiece to speak things to the user.  We'll call this the "Emcee", because it will essentially serve as our "master of ceremonies", giving announcements and framing the context of the presentation.

We then, essentially, shut down the normal navigation and exploration features of the screen reader so that they don't "clutter up" the experience of playing the game.  We do this so that the user doesn't have to navigate around, doesn't get off-focus to interrupt gameplay, and doesn't accidentally break out of the game when they don't expect it.

And finally, we structure our activity to ensure that the conversation is intelligible, and add affordances that allow recovery if something else interrupts our conversation (such as an alert from an email message coming in).

In the following sections, we will look at each of these elements and how to achieve them.

The Emcee

The Emcee is going to serve as our mouthpiece for communicating textual content back to the user.  It's an ARIA-live region placed outside of the main game display area which is both atomic and assertive.  Here's the HTML for it:

 <div id="EMCEE"
  class="screen-reader-only"
  aria-atomic="true"
  aria-live="assertive"
  tabindex="-1"
  ></div>

This will cause any text that you inject into the #EMCEE div to be spoken aloud (because it is an ARIA-live region), in its entirety (because it is atomic), at the earliest possible opportunity (because it is assertive).

For instance, if you have jQuery available on the page, you can have it speak "Hello World" by typing into the console:

 $('#EMCEE').html( "Hello world!" );

In Battle Weary, I went ahead and showed this div for debugging purposes and to provide a sort of automatic subtitling, but it could be hidden from view for non-screen-reader users.  However, you cannot simply use "display: none" to do so, because screen readers are smart enough to know that this means it's not in the flow of the document and therefore ignore it, which would prevent your Emcee from doing its job.  So you have to hide the Emcee div without actually hiding it.  There are many techniques for doing so; here is one example:

 .screen-reader-only {
  position: absolute;
  height: 1px;
  width: 1px;
  clip: rect(1px 1px 1px 1px); /* IE 6 and 7 */
  clip: rect(1px,1px,1px,1px);
  clip-path: polygon(0px 0px, 0px 0px, 0px 0px);
  -webkit-clip-path: polygon(0px 0px, 0px 0px, 0px 0px);
  overflow: hidden !important;
 }

This approach keeps the item in the active page flow without actually showing it to sighted users.

Narrating the Gameplay

The Emcee is the key for exposing the game to the player in a way that removes a lot of cumbersome UI navigation, but for it to do its job, we need to take over one side of the conversation and make it intelligible to the player by emitting relevant information to it whenever the user takes an action.

To accomplish this, I made an announce() function that could be called from Javascript to gather text to be spoken aloud, and optionally flush the content to the live region.  Why?

Consider a case where you are making a card game (like Battle Weary's combat system).  You want to announce the activity that the player does, and you want to announce the new state that this card play has led you to.  The code for the card playing is almost certainly separate from the code for the new state that you're going to.

But if you send these two utterances to the Emcee separately, the second one would override the first one, cutting it off.  You would never hear the first one.  So what you want is a model where we can send utterances to the Emcee, but have the Emcee only utter them when things are "settled down" and it is ready to speak them all in their entirety.

To accomplish this, we add the concept of accruing all the accumulated to-be-spoken content and then "flushing" it all to the Emcee at once when we're done talking and need the player to make a choice, much in the same way that CB-radio users say, "over" when they're done talking and the next person can respond.

We keep a variable called currentMessage and accrue spoken content in it until we are ready to flush it to be spoken.

 announce( text, flush ) {
  // Accrue the text; nothing is spoken yet.
  this.currentMessage += "  " + text;
  if (!flush) return;
  // Flush: strip any HTML tags and hand the whole message to the Emcee.
  var safeMessage = this.currentMessage.stripTags();
  $('#EMCEE').html( safeMessage );
  this.currentMessage = "";
 }

This uses an extension to the String function to add the stripTags() function:

 String.prototype.stripTags = function() {
  var tmp = document.createElement("DIV");
  tmp.innerHTML = this;
  return tmp.textContent || tmp.innerText || "";
 };

We strip the tags so that we can give the announce() function an HTML string without the screen reader saying things like, "less than bold greater than important thing less than slash bold greater than".  Since we will often want to display and speak the same text, this comes in exceedingly handy, especially if we want to have some text that is spoken but not shown – in that case, we just wrap that content in a span that is given a class with a display: none; CSS rule on it.  (The tags are stripped before the text reaches the Emcee, so that rule only affects the displayed copy.)

Once you have the above in place, you can do this when the user plays a card:

 GAME.announce( 'You played the "Attack" card.' );

...and then in the code that shows the result, you can do this:

 GAME.announce( 'Enemy died. Press space to continue.', true );

This will eventually speak, "You played the Attack card.  Enemy died. Press space to continue."  You have the ability to cleanly annotate the action that leads out of the previous state and prompt the user for the next state.

Interaction Organization

Which brings us to the mechanics for organizing the interaction with the game.  For this system to work, the game has to be, essentially, a well-defined state machine that always has a codified set of interactions that can lead to other states.  This is because, at any time, the user needs to be able to have the game re-speak the current game status.

If you imagine a game of Uno, if something interrupts the game when it is telling you what card is on top of the discard pile, you're out of luck if you can't get it to speak that again.  At any time, the user needs to be able to have the game elucidate the current status of the game and what it expects from the user.

In Battle Weary, the game switches between several different contexts: traveling around the map, battling monsters with cards, responding to choices or notices, etc.  Because games are complex, the keyboard commands that are meaningful in one part of the game are not meaningful in other parts, or they may have different meanings.  For instance, when traveling around a map, the arrow keys move the player to adjacent rooms – pressing the up arrow key moves the player north.  But when making a choice in a choice dialog, the arrow keys select different options.

So, there should always be a key that allows the player to get a spoken declaration of the current status and context of the game.  In Battle Weary, I used the "S" key, for "Status", but it could be anything as long as it's consistent throughout the game and announced up-front.

This means that you will probably want to organize your game structure in such a way that your game engine can query the current game state for a string that describes the state.  I used the concept of a "current controller" object that represents each of the possible game states, and which could be queried for this string as needed.

Similarly, you should also have a key for getting help.  In Battle Weary, I used the "H" key, for "Help".  This would do a similar task, only instead of speaking the current game status, it speaks the current list of key commands and interaction context.
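As a sketch (the exact shape of Battle Weary's controller objects isn't reproduced here), each controller might simply expose status() and help() methods that return the strings to be spoken:

 // Illustrative controller object for the map-traveling game state.
 var mapController = {
  status: function() {
   return "Dark Hollow.  There is a slimy goblin here.";
  },
  help: function() {
   return "Use the arrow keys to move.  Press space to interact.  " +
          "Press S for status, or H for help.";
  },
  keyDown: function( e ) {
   // Handle arrow keys, space bar, and so on for this state.
   return false;
  }
 };

 // The game engine then queries GAME.controller.status() or
 // GAME.controller.help() whenever the player presses S or H.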

The difference between status and help

"Wait, what's the difference there?" you might be asking.  There is definitely some nuance here that merits discussion.

When we interact with a game, there are two levels we do it at.  There's the conceptual model of the game - the interesting part of the game - that we deal with using our brain.  It's the "mind's eye" that looks into the game and understands it as an experience.  If we were playing a game of Tic-Tac-Toe, this is the part of the game that considers things like where the X's and O's are, whether we're winning or losing, noting where our opponent just played, and figuring out where we'll play next.

But there's also the interaction model for the game - the part of the game that allows us to interact with that higher-level conceptual model.  Do we click a square to add our 'X'?  Do we use the arrow keys to highlight a square and press space bar?  Do we drag an "X" symbol into a grid?  Or what?  There's a rote, mechanical element that we need to learn in order to express ourselves and participate in the conceptual model.

Once we internalize that second category of game information - the part that tells us how to interact with the game - we typically don't need to reference it again.  Or we may only need to refer to it if we want to do something "weird" that we don't normally do while playing, like splitting or taking insurance when playing Blackjack or announcing "Uno!" when we are playing our second-to-last card in a game of Uno.  We will seldom trigger the help text ourselves, and the game won't automatically speak it - it will only speak it on our request.

The conceptual information, though, is something we may need to refer to over and over again, and it will likely be emitted automatically, since it is crucial information that helps orient the player on a turn-by-turn basis.

Luckily, both are simple to implement, since it's really just storing two HTML strings whenever the game state changes: one for the status and one for the help.  When the user issues a command that changes the game state, you just update them again.  And when the user hits the given key command, you simply emit the associated string.  Easy peasy!

(Well, it's "Easy Peasy" from a technical perspective.  The hard part is authoring those strings so they are useful, intelligible, and terse.  We'll talk more about that in Part Three.)

Also, one little optional improvement you could make to the above: Since the status is not something sighted users need, but the help could benefit them, you might consider actually showing a dialog box with the help when the user hits "H", while only announcing the text when the user hits "S".  This is what I did with Battle Weary, and it worked quite well.  Making a game accessible often helps all users, not just the ones who must rely on those affordances.

Switch Support

This section is a draft recommendation, and should be considered experimental and untested.

In addition to screen readers, you can make it so that switch users can also play your game by adopting a model where all options for a given game state can be tabbed through and shown to the user.  To accomplish this, you'd listen for the TAB key and show a new menu of choices.  As the user continues pressing the TAB key, it cycles through this menu, and pressing SPACE would select that choice as if the user had pressed the corresponding key command.  (To support non-switch users, you could also support shift-TAB to move backwards through choices.)

The way I handled this is that the "normal" controller would push a new "linear" controller onto the stack, which would present the choices to the user, and then pass the selected choice back to its calling controller.  That way, a single controller handles all of the activity; you just have a new controller that can present the choices in a different way.
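Since this is still experimental, the following is only a rough, untested outline of what such a "linear" controller might look like:

 // Rough outline: a "linear" controller for switch users.  It cycles
 // through the calling controller's choices on TAB (shift-TAB to go
 // backwards) and confirms the current choice with SPACE.
 function LinearController( choices, onChoose ) {
  this.choices = choices;    // e.g. [ { label: "Attack" }, { label: "Flee" } ]
  this.index = 0;
  this.onChoose = onChoose;  // callback back into the calling controller
 }

 LinearController.prototype.keyDown = function( e ) {
  if (e.key == 'Tab') {
   var step = e.shiftKey ? -1 : 1;
   this.index = (this.index + step + this.choices.length) % this.choices.length;
   GAME.announce( this.choices[ this.index ].label, true );
   return true;
  }
  if (e.key == ' ') {
   this.onChoose( this.choices[ this.index ] );
   return true;
  }
  return false;
 };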

Managing the Focus

The other sticky problem when testing with VoiceOver was that it was easy to focus an element that wasn't the game.

The way the WAI-ARIA specification works, keyboard commands are understood to behave in different ways depending on what has the keyboard focus.  Pressing space while focusing on a button clicks the button, while pressing space while focusing on a text area adds a space character to the text area, for instance.

For an HTML5 game, then, the game grinds to a halt if it loses focus.

Typically, this is fine and intended behavior if the game loses focus because the user decided to navigate to a different web page.

But it's bad if the game loses focus because some sub-element has gained focus, like the letter "f" in one of your dialog boxes.

So, we take steps to aggressively manage the focus.  The goal is to ensure that the only thing on the page that the screen reader sees is the game itself.  If you're on the page, you're playing the game.

Unfortunately, in my testing, that appears impossible.  But we can approach it by implementing a few tricks.

First, we set up the page's HTML content so that it makes the game one big "widget".  We set the game's div to have the ARIA-role of "application" and make it the only object that can naturally receive focus:

 <div id="GAME"
  role="application"
  aria-roledescription="game"
  aria-activedescendant=""
  aria-label=''
  tabindex="0"
  >

Then, we mark all of the sub-elements, and any other web page elements on the page (except the game itself and the live region mentioned above), as being ARIA-hidden with a tabindex of -1:

 <div id="GAME-UI"
  aria-hidden="true"
  tabindex="-1"
 >

This (usually) causes the screen reader to only see one big "application" object on the page and prevents it from indexing those other parts of the page and exposing them as navigable options to the user.  This is exactly what we want, because we don't want two places for the user to go to do things.  Since the game is going to be considered one big widget, basically, we want the application focus to go there when we load the page, and stay there.

(Note that the specification for the ARIA-hidden attribute states that, in general, we should only use it for elements that are truly hidden from all users, but it carves out an exception for cases where we hide elements from screen readers for expediency and improving the screen reader user's experience.  The canonical example is a headline with an image; the image is ancillary and may be hidden in order to prevent the user from having to listen to (and navigate between) two different linked elements.  In our case, we are hiding the elements because we want the screen reader to think of the game itself as atomic.  I think this satisfies the requirement, since we truly are replacing the otherwise-cumbersome UI with something much more streamlined, but a purist may object to the level at which this intervenes.  To that argument, I can only respond that the WAI-ARIA spec is otherwise unsuited to the task, so no matter what we do, there will be concessions, so we might as well go the route that makes the most compelling and immersive game experience.)

Even with the above, it is possible to "sneak" into the descendants of the game, say by clicking on them directly, such as might happen when initially trying to give the application focus.  So we also set up a sentinel to watch for cases where the current focus is placed on something in the game's hierarchy and then instead bump the focus back up to the game element itself.  I do this with a focus manager.

Unfortunately, as far as I am aware, there is no event you can add a listener for that tells you when the focus changes, so until a better way comes to light, we brute-force it and just check several times a second that the focus is in a valid place, and if not, push it into one.  I do this with a FocusManager object:

class FocusManager {

 constructor() {
  this.gameHasFocus = false;
  setInterval( function() {
   this.manageFocus();
  }.bind(this), 100 );
  var game = document.getElementById('GAME');
  game.focus();
 }
 
 manageFocus() {
  var current = document.activeElement;
  if (current.id == 'GAME') {
   this.gainFocus();
   return;
  }
  while( true ) {
   /*   The EMCEE is part of the game, so if it has focus,
        go ahead and set it back to the main game. */
   if (current.id == 'EMCEE') {
     var game = document.getElementById('GAME');
     game.focus();
     this.gainFocus();
     return;
   }
   if (current.id == 'GAME') {
    current.focus();
    this.gainFocus();
    return;
   }
   if (!current.parentNode) break;
   current = current.parentNode;
  }
  this.loseFocus();
 }
 
 loseFocus() {
  if (this.gameHasFocus == false) return;
  this.gameHasFocus = false;
  if (GAME.controller.unfocus) GAME.controller.unfocus();
 }
 
 gainFocus() {
  if (this.gameHasFocus == true) return;
  this.gameHasFocus = true;
  if (GAME.controller.refocus) {
   GAME.controller.refocus();
  } else {
   if (GAME.controller.status) {
    GAME.announce( GAME.controller.status() );
   }
  }
 }
 
}

This code just watches the focus, and if it is ever a descendant of the game div, or the "Emcee", it refocuses the primary game div.  (In other words, if the current focus is any game-related DOM element, it refocuses the main game DOM element.)

It also triggers some hooks into the game engine, allowing the current game state to respond to when the game has lost focus and regained focus.  By default, it simply announces the current controller's status when the game refocuses (i.e., if you come back to the Tic-Tac-Toe game, it will tell you the state of the board when you left off).

Now, people familiar with the ARIA spec may be crying foul right now.  You'll note that the ARIA-label for the GAME div is empty.  That is against the specification; there should always be an ARIA-label or an ARIA-labelledby so that the screen reader knows what to say when it is highlighted if the content itself is not emittable as a description.

But here's the thing – that label gets spoken automatically and often while the game has focus, but not in a predictable and reliable fashion.  Whatever we put in there is going to be randomly echoed into the game stream at unpredictable times, and may not be uttered in its entirety at all.  We cannot stop it and we cannot rely on it.  So we leave it blank.  Otherwise, it will make the gameplay cumbersome and confusing.   (This is the same reasoning that leads to the valid exception for the usage of ARIA-hidden properties when the content isn't actually hidden from sighted users.)

Now, taking this route means we have a responsibility to do our due diligence and make sure that the lack of information in the ARIA-label never leaves the player hanging.  That's why we implemented the gainFocus() function above; when the game receives focus, we make sure to always emit something useful.

(One possible further improvement: If the user is idle long enough, announce() a notice that the player can always press 'H' for help.  Choosing a good idle length might be difficult and require testing with actual players, though.)

Edit: Since the above code was written, I identified an issue with it.  Depending on how you structure your game, you may wish to keep a flag and only issue that initial GAME.announce() if it's not the first time the game has gained focus.  Otherwise, this announcement could override a more fulsome first utterance that your game may emit upon startup.
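A minimal version of that fix might look like this (adapted from the gainFocus() method above; hasFocusedOnce is a new, illustrative flag):

 gainFocus() {
  if (this.gameHasFocus == true) return;
  this.gameHasFocus = true;
  // Skip the automatic status announcement the very first time the game
  // gains focus, so it doesn't stomp on the game's own opening utterance.
  if (!this.hasFocusedOnce) {
   this.hasFocusedOnce = true;
   return;
  }
  if (GAME.controller.refocus) {
   GAME.controller.refocus();
  } else if (GAME.controller.status) {
   GAME.announce( GAME.controller.status() );
  }
 }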

Handling Key Input

The last little piece is handling keyboard input.  When the game has focus, we need to be able to respond to keyboard events.  But when the game does not have focus, we should not interfere with the keyboard commands to prevent stepping on the toes of screen reader users trying to navigate away.

Here is the code I am currently using for this:
didKeyDown( e ) {

 // First, we check to make sure the game
 // is what has focus.
 // If not, we ignore key presses, to allow
 // screen readers to do their thing unimpeded.

 // You can add other valid focus targets here
 // if you need to.
 var validFocus = [ 'GAME' ];

 if (validFocus.indexOf( document.activeElement.id ) == -1) {
  return false;
 }

 // Otherwise, we're going to handle the keystroke.
 e.preventDefault();
 
 // Give the controller "first shot" at the key.
 if (this.controller.keyDown) {
  if (this.controller.keyDown( e ) == true) return true;
 }
 // Any other default key actions for your app may go here.
 switch( e.key ) {
  // Player asks for status.
  case 's':
   var status = this.controller.status();
   GAME.announce( status );
   return true;
  case 'h':
   this.showHelp();
   return true;
 }
 return true;
}

As you can see from this code, the key commands from the player are completely ignored if the game does not have focus.  This way, if the user is navigating away from the game, it doesn't interfere or speak while the player is doing other things.  And if it does have focus, it kills keystrokes to prevent things like accidental selection of characters when pressing arrow keys, even if the current game context doesn't explicitly look for those keys.

But how does the user navigate away from the game if we kill all key events?  Well, certain "escape focus" key commands take precedence over what is given to the web page.  So those crucial navigation elements that allow a user to leave a web page and go to another tab or another application are not affected.  Only when it comes to navigation within the web page does this level of control kick in.  So we're safe!  (At least this is true with VoiceOver; it has not been tested in other screen readers such as JAWS, so I may have to amend this approach in the future if those other screen readers have different behavior.  I can't imagine that they would, though, because otherwise, a web page could "capture" a screen reader user and never let them go.)

Onboarding and Sentinel Pages

One thing I was not satisfied with in Battle Weary was the onboarding.  Once you're playing, Battle Weary is streamlined, fun, and immersive when used with a screen reader.  But getting to the point where you have the game in focus, the key commands internalized, and the game ready to play is still a bit rocky.  Browsers do not have any affordances that streamline that process, so we're going to need a way to assist with it.

A screen reader user following a link to the game will get dumped into an itch.io page with the game framed in an iframe.  If they do nothing, the game will start with the correct focus and they can start playing, but if they start navigating around the page, they'll break out of the game's focus and it can get pretty difficult to get back to the game.

Honestly, I don't see a lot of ways around this other than to prep players with a "sentinel page" – a page that comes before the game itself, using standard, plain-jane HTML markup.  This would be a good place to give an overview of the game's context, rules, and the ubiquitous "H" and "S" keys, too.

This is quite close to what QuentinC's Playroom does.  The benefits for it are clear, and it seems to be battle-tested, so I think it's safe to call sentinel pages a best practice for screen reader accessible games.

One problem, though, is that this approach could be easily and accidentally subverted by someone sharing the link to the game page itself rather than to this "sentinel" page.  That would circumvent the gentle onboarding.

What I am experimenting with now is to include the game in the sentinel page itself, but hidden, and add a button on the page that kills the sentinel page content, reveals the game, and forces focus on it.  If the player is navigating the sentinel page, it already has focus, and pressing a button on the page to kill the sentinel content and start the game should work flawlessly.

I've experimented with this approach, and it seems to, indeed, work well, dumping you straight into the game with focus.  It works well enough that I'm going to consider it a best practice for now, but it will need more testing.
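In case it's useful, here is the rough shape of that "start game" button handler (the element IDs and the 'hidden' class are hypothetical):

 // Rough sketch: the start button on the sentinel page removes the
 // onboarding content, reveals the hidden game, and forces focus onto it.
 document.getElementById('START-GAME').addEventListener( 'click', function() {
  var sentinel = document.getElementById('SENTINEL-CONTENT');
  sentinel.parentNode.removeChild( sentinel );
  var game = document.getElementById('GAME');
  game.classList.remove( 'hidden' );  // 'hidden' is whatever class conceals the game
  game.focus();
 });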

Conclusion

Once you have the above elements in place, you have the building blocks of an accessible HTML5 game.  You can:
  • Capture the focus and keep it
  • Announce text to the user to reflect game state
  • Accept keyboard commands to navigate the game space
...all in a manner that screen readers can work well with.

In Part Three of this series, I'll talk about the design elements that can help you avoid pitfalls and improve the gameplay quality for the screen reader user.




Wednesday, March 20, 2019

How to Make Accessible HTML5 Games that Work with Screen Readers, Part One

This is the first of a three-part series.  Links to the other two articles will be added to this page as the other parts become available.



This year, for the 2019 Seven Day Roguelike Game Jam, I decided to do something a little different.  Instead of focusing on new game mechanics, I focused on widening the audience for roguelikes by attempting to make a screen reader friendly roguelike.

The result is Battle Weary, an HTML5 roguelike that is playable with a screen reader.  (It has been tested with VoiceOver, and it appears to work with NVDA, and hopefully it will work with others.)

History

When I embarked on this project, I did a lot of searching online for examples of HTML5 games that work with VoiceOver, and for advice on how to technically create them, and I came up with very little to learn from.  I was unable to find a single web-accessible game that seemed to work well with screen readers, and I found very little information on making HTML5 games work with them, either.

I did find a small amount of advice for making web applications accessible, but it turns out that the vast majority of it works under the assumption that what is being presented is, basically, an elaborate web form.

To a certain extent, one could consider an online HTML5 game to be a very complex web widget, so I tried writing a series of "hello world" type implementations of an interactive item that would respond to keyboard commands and act like a game, using the standard WAI-ARIA recommendations for creating web applications.

The result was, to say the least, disappointing.

In a traditional web application, the screen reader exposes affordances to help the user navigate from item to item, learn its role, and activate it. But the sheer verbosity of what results was very confusing.  Users accustomed to using screen readers would probably not find it so, but it still destroyed immersion.  A player would spend far more time navigating the UI than playing the game.

The larger problem, though, was the core model by which web applications are exposed to the screen reader: the screen reader scans the page to understand its structure, builds a parallel model of the UI from that, and then exposes that model to the user.  It does this only once, at page load, and largely considers the content static and unchanging, offering the user the ability to step through it all linearly, learn about each control, and optionally activate it.

That's great for a web application page, but not so great for something like a game where the game is expected to change state, in very fundamental ways, with every player input.  There are a few affordances in the WAI-ARIA spec to accommodate changing page content (the "live regions"), but for the most part, I found that changing the game structure dramatically during play – even something as simple as switching between a "main menu" and the core gameplay – sowed confusion and errors, and often put the screen reader in a state that precluded sensible navigation.  I was getting very frustrated, and almost gave up on the effort.  It appeared that the WAI-ARIA spec just wasn't up to the task of exposing deep, complex, interactive games in any way that would be satisfying or understandable, let alone enjoyable.

The Turning Point

It was about this time that a fellow Twitter user pointed me to QuentinC's Playroom.  Somehow, in all my searching, I had missed this site.  QuentinC's Playroom is a site specifically made for blind players to play multiplayer games like Uno, Poker, Spades, Yahtzee, etc.  And it had been online for over a decade, with millions of gameplays under its belt.  Suddenly, I had something I could look at to learn from!  I immediately registered and played a game of Uno, and it was an eye-opener.


The entire game is essentially played in a single ARIA-live region, a DOM element that simply announces changes to its content.  While the player can use the screen reader affordances to navigate over to other menu items that let them choose options to play cards, draw cards, roll dice, etc., it was clear that the intent was for players to not do that.  Instead, they were expected to internalize a few keyboard commands and just use those to play the game.  The game would react to each keyboard command by adding commentary into the live region, which in turn would be spoken to yield the current game state.

Suddenly, the path forward seemed clear.  Instead of trying to dynamically expose all the options of every game state as individual WAI-ARIA items that could be navigated between to understand the current game state and issue commands to change it, I could expose none of the game controls as targetable objects, react to keyboard commands directly, and announce the results, turning the game into, essentially, a conversation, much like a tabletop roleplaying game.  The web application would serve as the "dungeon master", acting as the guide for the player, and the player would respond by making choices assigned to keyboard commands.  The "dungeon master" would then describe the results.  The user wouldn't have to navigate around to see the results of their activity; the game would announce them.

It would require diligence on the part of the programmer.  Every game state would have to make its key commands clear, and design considerations like consistency and clarity would be imperative, because we'd now be responsible for all of the discoverability and usability of the page content.  But it would work, and it would be far more pleasant and enjoyable than navigating web forms, making it conducive to actual gameplay.

Before long, I had a reference "hello world" implementation that could be interacted with, and it felt natural, discoverable, and best of all, enjoyable.  The nut had been cracked!  In the following days, I was able to produce an entire roguelike game that was easily and comfortably playable with only a screen reader, and it didn't require untenable amounts of extra work to make it happen.

So this series of articles is going to talk about how to actually achieve this, in the hopes of encouraging other people to make their HTML5 games work with screen readers, and also talk about some of the things I learned about how to design games that lend themselves well to this approach.

Part Two will get into the nitty-gritty technical details of making this stuff work.

Part Three will talk about the design considerations and other "best practices" I've identified while doing this work.

An Important Caveat

I still consider most of this largely unproven.  It has not had the benefit of being hammered by thousands of screen reader users playing millions of games.  I'm still a novice at this stuff, so it's entirely possible there are deep, fundamental flaws with the way I'm doing this.  It could be that some screen reader users would prefer a series of non-interactive web pages to an interactive widget that commandeers the page activity and does its own, nonstandard thing.  And it could be fundamentally incompatible with some brands of screen readers it is as yet untested with.  I cannot vouch that this approach will work for everyone, nor can I vouch that this will satisfy legal requirements for accessibility standards (say, for entities that receive federal funding).

I have heard from a regular screen reader user that this approach worked well for them, so hopefully, the approach I outline in this article series is worthwhile and helpful and does, indeed, meet all these goals.  But even if it turns out that this approach is flawed for some reason, at least it should serve as a jumping-off point that will get us to where we need to go, because as it stands, it seems there isn't even really a conversation happening about making HTML5 games accessible.  At least I can help get that conversation going!

Saturday, March 11, 2017

"Arkham After Midnight" is now playable



Arkham After Midnight is now playable!

It doesn't have nearly the scope and diversity of plot elements I was hoping for, but it certainly serves as a strong tech demo for what I was thinking of.  If the game gains a lot of traction, I can fairly easily extend it with new content so that new mysteries can be generated (although I have since had some better ideas as to how to structure mysteries, so once the time pressure is done, I'd like to revisit the mystery creation process).

I'd also like to add in support for fan-created ploxels.  Since Lua has built-in support for loading scripted content in a sandbox, this seems like the perfect sort of game to support it.  Fans could make their own "ploxels" and submit them to a central archive, and then you could download them, drop them into your "mysteries" folder, and get whole new adventures generated for you.
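To give a rough idea of what I mean, here's a minimal sketch of how a downloaded ploxel file might be loaded in a sandbox.  The loadPloxelFile name, the whitelist, and the idea that each file returns a single ploxel table are all just illustrative assumptions rather than the game's actual API, and it assumes the Lua 5.2-style load() that takes an environment (under Lua 5.1 or LuaJIT you'd use setfenv instead):

-- Illustrative sketch only: load a fan-made ploxel file in a sandbox.
-- Assumes Lua 5.2+ load() with an explicit environment.
local function loadPloxelFile( path )
  local file = assert( io.open( path, "r" ) )
  local code = file:read( "*a" )
  file:close()

  -- expose only a whitelist of safe functions to the downloaded script
  local sandbox = {
    pairs = pairs, ipairs = ipairs,
    string = string, math = math, table = table,
  }

  -- mode "t" means text chunks only, so precompiled bytecode is rejected
  local chunk, err = load( code, "@" .. path, "t", sandbox )
  if not chunk then return nil, err end

  -- the file is expected to return a ploxel table (id, plugs, sockets, ...)
  local ok, ploxel = pcall( chunk )
  if not ok then return nil, ploxel end
  return ploxel
end

The game could then scan the "mysteries" folder and call something like this for each file it finds before handing the results to the generator.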

But for now, head on over to itch.io and check it out.  It's short and sweet, but hopefully you will enjoy it!

Thursday, March 9, 2017

Mid-week progress on "Arkham After Midnight"

Sadly, I haven't gotten as much done on Arkham After Midnight as I was hoping.  Despite taking several days off of work, I have still found myself with precious little time thanks to other pressures.  But it hasn't stopped me completely - I am making progress!

Here are some screenies:

  

Part of what ate up my time was having to go back to the drawing board a few times on the "ploxel" engine that undergirds the mystery generation.  I think I have a pretty workable, modular system now, but it took some doing to get it right.

It's taking me longer to create content than I had hoped, as well.  The sprite sheets are going quickly, and I think they look good, but the coding behind them gets a little complicated in order to let them "play nice" with the generator.  For instance, I can't just create a gate to R'Lyeh that is opened with a ritual at the Standing Stones, because I want it so that, as I add more ploxels, the gate can lead to a different realm, the ritual can come from other places, and the location where you open the gate can change.  That means absolutely everything has a layer of abstraction on it, which makes it a little hard to wrap your head around sometimes.

But it's working!  And it's powerful.  I can do things like spawn entire set pieces into maps.  For instance, I can have a ploxel add a whole ship to the harbor location, complete with bad guys, loot drops, etc., embedded in the sub-map in the same way I draw regular maps.
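As a purely hypothetical example, a set-piece ploxel for that ship might look something like this.  Aside from the "map" field, which appears in the ploxel definitions in the March 5 post below, the field names here are my own invention and not necessarily what the engine actually uses:

-- Hypothetical set-piece ploxel, written in the style of the March 5 examples.
{
  id = "GhostShip",
  name = "Ghost Ship",
  ilk = "setpiece",
  description = "a derelict ship moored in the harbor fog",
  map = "maps/ghost_ship.lua",    -- sub-map stamped into the harbor location
  describe = function(self)
    return self.description
  end,
  -- hypothetical field: content embedded along with the sub-map
  spawns = {
    { ilk = "enemy", id = "DeepOneSailor", count = 3 },
    { ilk = "loot",  id = "CaptainsLog" },
  },
  plugs = {},
  sockets = { "harbor" }
}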

The main problem is that content is taking a long time to craft.  I think I'll have a playable mystery by the end of the seven days, but the mysteries it generates will be pretty uniform.  But with the modular setup, I should be able to flesh out the system with many more ploxels in the weeks following the 7DRL.  Like Lone Spelunker, I think I have a pretty good game on my hands that is passable as a 7DRL, but will really shine if I put some more work into it post-compo.  We'll see what the response is like - if people seem to be enjoying it and see its potential, then I will probably flesh it out more.






Monday, March 6, 2017

Monday Progress on "Arkham After Midnight"

Wow did I get a lot done today.

The ploxel engine is spitting out vignettes that actually hook up to each other.  For instance, I have an old shack in the woods where you can find mundane items, and an old tomb where you can find arcane items.  When I add mundane and arcane items into the mix, they show up appropriately in those locations.  There's a "thread" of a mystery going from scene to scene, and it's all working pretty well.

The downside is that it's a lot more complicated to make these pieces modular and yet coherent.  Things that would be straightforward if I didn't want the mystery to be procedurally generated become tricky, which gets in the way of producing ploxels quickly, but I was still able to generate several ploxels tonight, and the sprites to go with them, in fairly short order, so I'm pretty happy.

Here's what the game looks like currently:






On the docket for tomorrow:

  • Add in hand-to-hand combat support.  Currently, there's only ranged combat.
  • Add in a character creation screen.  I've got several investigator sprites already made, but I need a way to let the player choose one.  I also want to give the player a little agency over what they start out with, although I have yet to come up with good differentiators because there's not a lot of mechanical depth to the gameplay yet.  So far, about all I can do is adjust the balance among first aid resources, bullets, gun accuracy, and gun damage.
  • More ploxels!  More sprites!  More enemies!
  • Figure out how I'm going to handle "finale" encounters.  I doubt I'll get to the implementation phase tomorrow, but it would be good to get some planning in for that.
But now, it's time for a break.

Sunday, March 5, 2017

Ploxel engine working

After some debugging difficulties, I think I have a workable ploxel engine constructed.  It currently has three ploxels, one with only a plug, one with only a socket, and one with a plug and a socket that match the sockets and plugs of the other two.

The engine successfully finds and connects them into a chain.  For instance, if I start with these three ploxels in the "open" list:

-- The "big bad": its "lair" plug means the generator must find another
-- ploxel with a matching "lair" socket to connect to it.
{
  id = "Cthulhu",
  name = "Cthulhu",
  ilk = "enemy",
  description = "Cthulhu lies dreaming, and must be defeated.",
  describe = function(self)
    -- after generation, each satisfied plug carries a .connection that
    -- identifies the ploxel it was matched with
    print( "Connection = " .. self.plugs[1].connection )  -- debug output
    return self.description .. "  He sleeps in " ..
      mystery:ploxel( self.plugs[1].connection ):describe()
  end,
  plugs = { "lair" }
}

-- A middle ploxel: its "lair" socket can satisfy the big bad's plug, and
-- its own "gate" plug needs yet another ploxel to connect to.
{
  id = "R'Lyeh",
  name = "R'Lyeh",
  ilk = "middle",
  description = "the sunken city of R'Lyeh",
  describe = function(self)
    return self.description .. ", which can only be reached by means of " ..
      mystery:ploxel( self.plugs[1].connection ):describe()
  end,
  plugs = { "gate" },
  sockets = { "lair" }
}

-- A leaf ploxel: no plugs of its own; its "gate" socket satisfies R'Lyeh's plug.
{
  id = "Passage",
  name = "Passage",
  ilk = "trivial",
  map = "maps/manor.lua",    -- map associated with this ploxel
  description = "a passage to another world",
  describe = function(self)
    return self.description
  end,
  plugs = {},
  sockets = { "gate" }
}

Then, when the mystery is generated and describe() is called on the topmost enemy ploxel, it will spit out: "Cthulhu lies dreaming, and must be defeated.  He sleeps in the sunken city of R'Lyeh, which can only be reached by means of a passage to another world."

So the engine successfully starts with a "big bad" and connects ploxels to it to flesh out the mystery, based on their plugs and sockets.  A "plug" is an aspect of this thing that I will need to generate something for.  A "socket" is an aspect of this thing that can fulfill a requirement for another ploxel.  In the above example, Cthulhu has a "lair" plug, because it needs to find a "lair" socket on some other ploxel to connect to.  R'Lyeh has a "lair" socket, because it can act as a lair for a big bad.  Thus, they can be connected, giving Cthulhu a lair.  But the lair itself has requirements of its own; a gate to that mysterious location is needed, so a "gate" plug is created for it.  Eventually, that plug finds its matching socket on the Passage ploxel, and the gate to R'Lyeh is established.
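To make the matching idea concrete, here's a rough sketch of how a generator like this could wire plugs to sockets.  The function names (findSocket, generateMystery) and the bookkeeping are my own guesses for illustration, not the engine's actual code; the only thing taken from the real ploxels above is that each satisfied plug ends up carrying a connection that describe() can follow:

-- Rough illustrative sketch of the plug/socket matching described above;
-- function names and bookkeeping are guesses, not the engine's actual code.

-- Find a not-yet-used ploxel in the open list offering the requested socket.
local function findSocket( openList, plugName )
  for _, candidate in ipairs( openList ) do
    if not candidate.used then
      for _, socketName in ipairs( candidate.sockets or {} ) do
        if socketName == plugName then return candidate end
      end
    end
  end
end

-- Starting from the big bad, keep satisfying plugs until none remain.
local function generateMystery( bigBad, openList )
  local mystery, frontier = { bigBad }, { bigBad }
  while #frontier > 0 do
    local current = table.remove( frontier )
    for i, plugName in ipairs( current.plugs ) do
      local match = findSocket( openList, plugName )
      if match then
        -- record the connection so describe() can follow it later
        current.plugs[i] = { name = plugName, connection = match.id }
        match.used = true
        table.insert( mystery, match )
        table.insert( frontier, match )
      end
    end
  end
  return mystery
end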

It took a bit of doing to allow the ploxels to "play nice" with things like functions in the ploxel objects.  I wanted to be able to serialize the mystery structure without having to save all the ploxel data in the save file (especially difficult-to-serialize things like functions).
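I won't claim this is exactly how the game does it, but one way to get that separation is to save only each ploxel's id and its plug connections, and rebuild the live objects from the code-defined prototypes when the save is loaded.  A sketch, where "prototypes" is an assumed table mapping ids to the full ploxel definitions, and plugs are assumed to be tables with name and connection fields after generation:

-- Sketch of save/load that avoids serializing functions: only ids and
-- plug connections go into the save file.
local function mysteryToSaveData( mystery )
  local data = {}
  for _, ploxel in ipairs( mystery ) do
    local entry = { id = ploxel.id, plugs = {} }
    for _, plug in ipairs( ploxel.plugs ) do
      table.insert( entry.plugs, { name = plug.name, connection = plug.connection } )
    end
    table.insert( data, entry )
  end
  return data  -- plain strings and tables; trivial to serialize
end

local function mysteryFromSaveData( data, prototypes )
  local mystery = {}
  for _, entry in ipairs( data ) do
    local ploxel = prototypes[entry.id]  -- full definition, functions and all
    ploxel.plugs = entry.plugs           -- restore the generated connections
    table.insert( mystery, ploxel )
  end
  return mystery
end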

My goal for tomorrow is to get maps generating in a way that is informed by the ploxel representing the map in the mystery.