Watson Isn’t the Future of the Interactive Fiction Parser

Watson, the IBM program that beat two top Jeopardy! champions, is an eye-catching advance in natural language processing and machine reasoning. It crushed its two human competitors at reading answers, teasing out the clues in those answers, and responding with the appropriate question.

It’s also not where text-based interactive fiction parsers should be going.

Interactive fiction, at least as I’m talking about it here, is a type of text-based game in which you type imperative sentences to move around in and interact with a game world. You type the sentences at a “>” prompt, which in theory promises, “Type in anything and the game will understand you.”

That’s a lie, of course. What the prompt really means is, “Type a sentence that matches the pattern of commands the game understands and it might respond appropriately.” The natural language processing behind interactive fiction hasn’t changed much since Infocom set the standard back in the 1980s even though computers have become much more powerful. Could you make interactive fiction better by improving its natural language processing capabilities? Brian Moriarty, former Infocom implementor, sees NLP as a near-necessity for IF to be better. And in the wake of Watson’s victory, others have wondered why IF parsers don’t take advantage of computers’ increased processing power to do better parsing.

Would Watson, or something similar, make IF better? Watson’s requirements of 90 IBM Power 750 servers with some 2,880 processor cores and 15 terabytes of memory put it out of the reach of the general IF audience, but you could certainly improve the natural language processing capabilities of interactive fiction without going to those lengths. Other games with text input, like Façade, respond to any typed input without requiring you to follow IF’s established imperative sentence structure. Why shouldn’t IF?

Many other genres of games, from first-person shooters to role-playing games, have a limited interface. Xbox and PS3 game controllers have eight buttons, two joysticks, and a D-pad. That limits the number of actions you can perform with one button press, and to get more you either have to make certain buttons context-sensitive, like the “use” button common to many games, or ask users to chain together a long string of button presses, like in old-style fighting games. Mainstream PC games use a mouse, arrow keys, and perhaps a set of function keys.

IF, on the other hand, has a much larger interface. Few works of IF let you type in “USE DOOR”. Instead it’s “OPEN DOOR” and “SEARCH DRESSER” and “PUT THE BOX ON THE TABLE”. That gives you a wide range of possible actions at the cost of a complex interface. To help guide players, authors adopted a standard set of commands. If your game requires new or less-usual commands, you have to spend time guiding players to learn and use those commands.
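To make that template concrete, here is a minimal sketch in Python of the sort of pattern matching a traditional parser performs. The verb list and regular expression are invented for illustration; real systems like Inform and TADS use a proper grammar and a world model rather than a single regex, but the shape of the accepted input is the same.

```python
import re

# A toy illustration -- not Inform, TADS, or any shipping parser -- of the
# standard command shape: VERB [THE] [ADJECTIVE] NOUN [PREPOSITION [THE] NOUN].
VERBS = {"open", "close", "take", "drop", "search", "put", "look"}

COMMAND = re.compile(
    r"^(?P<verb>\w+)\s+(?:the\s+)?(?P<noun>[\w ]+?)"
    r"(?:\s+(?P<prep>on|in|under|with)\s+(?:the\s+)?(?P<second>[\w ]+))?$",
    re.IGNORECASE,
)

def parse(command: str):
    """Return (verb, noun, preposition, second noun), or None for the
    stock "I don't understand that" response."""
    match = COMMAND.match(command.strip())
    if not match or match["verb"].lower() not in VERBS:
        return None
    return (match["verb"].lower(), match["noun"].lower(),
            match["prep"], match["second"])

print(parse("PUT THE BOX ON THE TABLE"))  # ('put', 'box', 'ON', 'TABLE')
print(parse("USE DOOR"))                  # None -- "use" isn't a known verb
```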

Now imagine a game that accepts any input. You can type any English sentence you want and the game will attempt to parse it. What should you type?

Game interfaces are about expressing agency in the game world. They’re how players communicate their intentions to the game and affect what’s going on inside the game. Modern videogames spend the first part of the game teaching players the available game mechanics and how to use the interface, helping them climb up the game’s learning curve. They guide players explicitly. If your game accepts any text input, then you have to work much harder to teach players what to type. To overcome option paralysis, you have to narrow those options.

Even if you had a perfect parser that could understand everything you typed, the game has to know what to do with it. Parsing is no good if you don’t do something with the results. Watson’s processing power let it parse text input and, based on that and its knowledge of how Jeopardy! answers are structured, make inferences about what related question fit the input. How much power would a game need to respond appropriately to sentences like “What have I been doing?” or “Measure out my life in coffee spoons”?

Take the case of an IF parser that accepted adverbs. Current IF parsers accept commands that are of the form VERB THE ADJECTIVE NOUN, occasionally with an added preposition and second noun: “PUT THE BOX ON THE TABLE”, “OPEN THE RED DOOR”, and similar. Now add in adverbs, so that you can “OPEN THE RED DOOR SLOWLY” or “PUT THE COFFEE CUP DOWN QUICKLY”. Now the game must decide the difference between putting something down quickly and putting it down slowly. What does it mean in game terms to TURN THE KNOB ANGRILY? You’ve added more nuance to a player’s interaction with the game world, and the IF author has to handle that nuance. It’s more work for the IF author; does it add enough to the game to be worth that work?
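To get a feel for that extra work, here is a hypothetical sketch, again in plain Python rather than any real IF system, of the per-adverb branching an author would have to write for a single verb applied to a single object:

```python
from dataclasses import dataclass
from typing import Optional

# A hypothetical sketch: recognizing the adverb is the easy part; deciding
# what each adverb means for each verb and each object is the author's job.

@dataclass
class Door:
    locked: bool = False
    is_open: bool = False

def open_door(door: Door, adverb: Optional[str] = None) -> str:
    if door.locked:
        return "The red door is locked."
    door.is_open = True
    if adverb == "slowly":
        return "You ease the red door open, wincing at every creak."
    if adverb == "quickly":
        return "You fling the red door open, and it bangs against the wall."
    if adverb == "angrily":
        # What does anger change in game terms? The author has to decide,
        # then do the same for every other verb the game understands.
        return "You yank the red door open."
    return "You open the red door."

print(open_door(Door(), "slowly"))
```

Multiply that branching by every verb and every object the game knows about, and the cost of the added nuance becomes clear.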

To rein in that increase in complexity, previous and current attempts have restricted this kind of accept-anything natural language processing to conversation with characters in the game instead of to affecting the game world as a whole. In those games, it quickly becomes apparent that the characters you’re conversing with don’t really understand what you’re saying. Worse, sometimes they misunderstand you in ways that mar the game experience. Player agency is reduced, and you soon get the feeling that you can’t know what effect your actions will have on the world at all.

That’s why I don’t see Watson-style NLP taking IF by storm. The promise that you can type anything and the game will understand and respond appropriately has not yet been fulfilled, but even if it were, I don’t think it would make better IF.


9 Comments

  1. Tombstone
    on February 24, 2011 at 10:19 am

    Having only an extremely limited experience with IF, please excuse me if I’m just completely off-base.

    However, I’m not sure that I agree with your statement that, “To overcome option paralysis, you have to narrow those options.”

    One thing that I just absolutely love about Pen-and-Paper Role Playing Games is the fact that you literally can do anything. The GameMaster takes the place of the IF parser you’re describing and allows the player to fully explore the GM’s world to whatever extent they desire. This can lead to a level of immersion that I’ve never found in any video game.

    So when you ask, “It’s more work for the IF author; does it add enough to the game to be worth that work?”, my answer is a resounding, “Yes!”

    If an IF parser could fully replicate the interactivity of a GM in a PnP RPG, I’d find it amazing. Of course, you probably really are talking about a Watson-level of processing power to achieve anything like this, and it’s (therefore) not currently viable. But my point, I guess, is that truly universal interactivity in IF wouldn’t necessarily lead to option overload. When you can do anything in the world, then the player can effectively make their own game.

    The IF game’s point may have originally been to figure out who the murderer was or whatever. So? The player’s goal is to have fun. If the player would rather sit in his apartment and build a Lego castle, then so be it. If the game can really allow interactivity with anything, then *anything* can be the goal.

  2. Jason Dyer
    on February 24, 2011 at 12:11 pm

    Tombstone, you seem to be mixing together an unlimited natural language parser with unlimited world modeling.

    An unlimited world model isn’t necessarily a bad thing, although the only actual IF attempts (like Amnesia) have not had anything close to the capability you’re talking about. The only IF game I’d say had genuinely emergent game play is The Hobbit from the mid-80s. There’s certainly a lot of room to grow.

    Unlimited parser is more of a ‘what would we do with this?’ situation. Unlimited dialogue possibilities, sure, but: what regular command wouldn’t include a verb somewhere? How many different ways can you say to throw the rock at somebody? Could you possibly make a sample mock-up transcript of what you’d like to see?

  3. on February 24, 2011 at 12:39 pm

    In pen-and-paper RPGs, the game master can guide the game, take player input, and help narrow player options. The rules similarly narrow options. You could play freeform games of “let’s pretend!”, but the popularity of RPGs with rules over completely free-form play makes me think that we need that structure.

    Parsing just lets you recognize what’s being typed. Responding to it is where it gets hard. If we had a full AI behind the curtain then, yeah, I think you’re right. Absent that, though, you run into the problem of the author having to deal with ever-increasing interaction possibilities and the need to guide players through the experience. If a player would rather just play “let’s pretend!” then why use the computer at all?

  4. Tombstone
    on February 24, 2011 at 3:12 pm

    Jason and Stephen, I see your points. And I do realize that we’re talking about an extreme amount of back-end requirements to have anything like what I’m describing.

    Stephen, you’re at least partially talking about the level of work required for the IF author versus the actual implementation of the story that the author is wanting to tell. I certainly consider that a valid point.

    If the IF author is wanting to tell a story about escaping a derelict space station (/wink), how much time does the author really want to spend modeling interactions with the leftover food rations?

    What almost seems to be needed is an “IF engine” similar to generic video game engines like “Unreal” or what-have-you. If the video game author is wanting to tell a story about zombie-hunting, maybe he doesn’t want to worry about every last physics interaction, so he uses a basic engine he bought in order to get the common interactions like “gravity” already implemented.

    I dunno. Maybe somebody starts an open-source IF-backend engine that handles common world interactions. And then the IF author could use that engine to handle most out-of-the-ordinary requests. The author then grafts on additional commands to handle the specifics of his story line.

  5. on February 24, 2011 at 3:26 pm

    We’ve actually got the start of what you’re describing. Modern programming tools like Inform 7 and TADS 3 give you a library that implements a world model. That world model handles dividing the game’s space into rooms, moving between the rooms, containment (putting things on top of or inside other things), and the like. They also have a lot of hooks for you to add new behaviors that aren’t covered in the library.

    The thing is, that’s pretty boring stuff, so you spend a lot of time adding new behaviors or coding exceptions to the base behavior. When you eat something, does it disappear? Does it add health? Does it poison you? Most of the time you won’t be able to eat things, so you only add exceptions for the things you do eat.
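    Sketched in plain Python, purely as an illustration of the pattern rather than actual Inform 7 or TADS 3 library code, the division of labor looks something like this:

    ```python
    # The library supplies a boring default response; the author overrides
    # it only for the few things that should actually be edible.

    class Player:
        def __init__(self):
            self.health = 100
            self.poisoned = False

    class Thing:
        def eat(self, player: Player) -> str:
            return "That's plainly inedible."      # the library's default

    class Ration(Thing):
        def eat(self, player: Player) -> str:
            player.health += 10                    # the author's exception
            return "Stale, but filling."

    class GlowingMushroom(Thing):
        def eat(self, player: Player) -> str:
            player.poisoned = True                 # another exception
            return "That was probably a mistake."

    player = Player()
    print(Thing().eat(player))    # That's plainly inedible.
    print(Ration().eat(player))   # Stale, but filling.
    ```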

  6. on February 25, 2011 at 12:51 am

    Stephen, have you looked at the 2-D graphical language stuff Chris Crawford did with Trust & Betrayal: The Legacy of Siboot?

  7. on February 25, 2011 at 8:17 am

    I haven’t, though I’ve looked at various incarnations of Erasmatron/Storytron. One of the early versions used a graphical programming approach. Do you know if that’s similar to what he did with Trust & Betrayal?

  8. on March 6, 2011 at 8:36 pm

    Trust & Betrayal certainly relates to his Erasmatron/Storytron work, but I’m not sure of the exact link. He has a great chapter on it in “Chris Crawford on Game Design.” (You should read the whole book if you haven’t already.) The main innovation of the 2-D aspect of the language was that the sentences sort of diagrammed themselves.

    Chris Crawford and Brian Moriarty each gave an *amazing* talk at GDC this year, BTW; I am very glad to have seen both. Many people were calling them the “real” keynotes of the conference (the “official” talk was more or less a Nintendo press conference).

  9. on March 7, 2011 at 5:35 pm

    Both Crawford’s and Moriarty’s talks sounded really cool, and I wish I could have seen them.