Watson, the IBM program that beat two top Jeopardy! champions, is an eye-catching advance in natural language and reason-engine-style processing. It crushed its two human competitors at reading answers, teasing out the clues in those answers, and responding with the appropriate question.
It’s also not where text-based interactive fiction parsers should be going.
Interactive fiction, at least as I’m talking about it here, is a type of text-based game in which you type imperative sentences to move around in and interact with a game world. You type the sentences at a “>” prompt which in theory promises, “Type in anything and the game will understand you.”
That’s a lie, of course. What the prompt really means is, “Type a sentence that matches the pattern of commands the game understands and it might respond appropriately.” The natural language processing behind interactive fiction hasn’t changed much since Infocom set the standard back in the 1980s even though computers have become much more powerful. Could you make interactive fiction better by improving its natural language processing capabilities? Brian Moriarty, former Infocom implementor, sees NLP as a near-necessity for IF to be better. And in the wake of Watson’s victory, others have wondered why IF parsers don’t take advantage of computers’ increased processing power to do better parsing.
Would Watson, or something similar, make IF better? Watson’s requirements of 90 IBM Power 750 servers with some 2,880 processor cores and 15 terabytes of memory put it out of the reach of the general IF audience, but you could certainly improve the natural language processing capabilities of interactive fiction without going to those lengths. Other games with text input, like Façade, respond to any typed input without requiring you to follow IF’s established imperative sentence structure. Why shouldn’t IF?
Many other genres of games, from first-person shooters to role-playing games, have a limited interface. Xbox and PS3 game controllers have eight buttons, two joysticks, and a D-pad. That limits the number of actions you can perform with one button press, and to get more you either have to make certain buttons context-sensitive, like the “use” button common to many games, or ask users to chain together a long string of button presses, as in old-style fighting games. Mainstream PC games use a mouse, arrow keys, and perhaps a set of function keys.
IF, on the other hand, has a much larger interface. Few works of IF let you type in “USE DOOR”. Instead it’s “OPEN DOOR” and “SEARCH DRESSER” and “PUT THE BOX ON THE TABLE”. That gives you a wide range of possible actions at the cost of a complex interface. To help guide players, authors adopted a standard set of commands. If your game requires new or less-usual commands, you have to spend time guiding players to learn and use those commands.
Now imagine a game that accepts any input. You can type any English sentence you want and the game will attempt to parse it. What should you type?
Game interfaces are about expressing agency in the game world. They’re how players communicate their intentions to the game and affect what’s going on inside the game. Modern videogames spend the first part of the game teaching players the available game mechanics and how to use the interface, helping them climb up the game’s learning curve. They guide players explicitly. If your game accepts any text input, then you have to work much harder to teach players what to type. To overcome option paralysis, you have to narrow those options.
Even if you had a perfect parser that could understand everything you typed, the game has to know what to do with it. Parsing is no good if you don’t do something with the results. Watson’s processing power let it parse text input and, based on that and its knowledge of how Jeopardy! answers are structured, make inferences about what related question fit the input. How much power would a game need to respond appropriately to sentences like “What have I been doing?” or “Measure out my life in coffee spoons”?
Take the case of an IF parser that accepted adverbs. Current IF parsers accept commands of the form VERB THE ADJECTIVE NOUN, occasionally with an added preposition and second noun: “PUT THE BOX ON THE TABLE”, “OPEN THE RED DOOR”, and similar. Now add in adverbs, so that you can “OPEN THE RED DOOR SLOWLY” or “PUT THE COFFEE CUP DOWN QUICKLY”. Now the game must decide what the difference is between putting something down quickly and putting it down slowly. What does it mean in game terms to TURN THE KNOB ANGRILY? You’ve added more nuance to a player’s interaction with the game world, and the IF author has to handle that nuance. It’s more work for the IF author; does it add enough to the game to be worth that work?
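To make that command pattern concrete, here is a rough sketch in Python of a parser that matches VERB THE ADJECTIVE NOUN, with an optional preposition and second noun, and an optional trailing adverb behind a switch. It is an illustration only, not how Infocom’s parser or any current IF system is actually implemented. The vocabulary lists, the Command and NounPhrase types, and the allow_adverbs flag are all made up for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary, invented for this sketch; a real game defines its own.
VERBS = {"OPEN", "PUT", "SEARCH", "TURN"}
PREPOSITIONS = {"ON", "IN", "UNDER"}
ADJECTIVES = {"RED", "COFFEE"}
NOUNS = {"DOOR", "BOX", "TABLE", "DRESSER", "KNOB", "CUP"}
ADVERBS = {"SLOWLY", "QUICKLY", "ANGRILY"}
ARTICLES = {"THE", "A", "AN"}


@dataclass
class NounPhrase:
    noun: str
    adjective: Optional[str] = None


@dataclass
class Command:
    verb: str
    direct: NounPhrase
    preposition: Optional[str] = None
    indirect: Optional[NounPhrase] = None
    adverb: Optional[str] = None


def _noun_phrase(words, i):
    """Read [ARTICLE] [ADJECTIVE] NOUN at words[i]; return (phrase, next index) or None."""
    if i < len(words) and words[i] in ARTICLES:
        i += 1
    adjective = None
    if i < len(words) and words[i] in ADJECTIVES:
        adjective = words[i]
        i += 1
    if i < len(words) and words[i] in NOUNS:
        return NounPhrase(words[i], adjective), i + 1
    return None


def parse(text, allow_adverbs=False):
    """Return a Command if the text matches the pattern, otherwise None."""
    words = text.upper().split()
    if not words or words[0] not in VERBS:
        return None
    adverb = None
    # Only strip a trailing adverb when the game has opted in to handling them.
    if allow_adverbs and len(words) > 1 and words[-1] in ADVERBS:
        adverb = words.pop()
    result = _noun_phrase(words, 1)
    if result is None:
        return None
    direct, i = result
    preposition = indirect = None
    if i < len(words):
        if words[i] not in PREPOSITIONS:
            return None
        preposition = words[i]
        result = _noun_phrase(words, i + 1)
        if result is None:
            return None
        indirect, i = result
    if i != len(words):  # leftover words mean the sentence did not fit the pattern
        return None
    return Command(words[0], direct, preposition, indirect, adverb)
```

With this sketch, parse("PUT THE BOX ON THE TABLE") matches, parse("What have I been doing?") does not, and “OPEN THE RED DOOR SLOWLY” only matches when allow_adverbs is turned on. Recognizing the adverb takes a few lines. The cost is that every action in the game now receives an adverb field and has to decide what, if anything, to do with it.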
To rein in that increase in complexity, previous and current attempts have restricted this kind of accept-anything natural language processing to conversation with characters in the game instead of to affecting the game world as a whole. In those games, it quickly becomes apparent that the characters you’re conversing with don’t really understand what you’re saying. Worse, sometimes they misunderstand you in ways that mar the game experience. Player agency is reduced, and you soon get the feeling that you can’t know what effect your actions will have on the world at all.
That’s why I don’t see Watson-style NLP taking IF by storm. The promise that you can type anything and the game will understand and respond appropriately has not yet been fulfilled, but even if it were, I don’t think it would make better IF.