NPC Goals

When I first started using Quest, I had an idea for a game that I called “What Will Be”. It was loosely based on a story I had started writing but never finished, involving a group of people brought together by government forces for (initially) unknown purposes. It was going to be a parser game, and I wanted it to have multiple autonomous NPCs. It was during my attempt to create the infrastructure for this game that I first came up with the idea of “goals”.

A “goal” is conceptually similar to its real life counterpart, though expressed in terms of the game world: a goal is, roughly speaking, a desired world state. Perhaps it’s an NPC wanting to be somewhere. Perhaps it’s a door being opened or an object given. The idea would have to be extended to more internal things as well (e.g. speaking to another character with the goal of conveying information), but I figured I’d get to that once I got the more mundane situations out of the way. Trying to bite off too much at once can lead to either indecision or madness.

I chose some initial goal situations to implement. They were these:

  1. An elevator
  2. NPCs getting from point A to point B in the game world (a three story building, in this case), including riding the elevator and using key cards to enter rooms.
  3. An initial scene where an NPC leads the PC to a meeting.

With respect to number 1, I seem to have this thing for elevators. Perhaps it’s because they have straightforward, well-defined behavior but with multiple parts (e.g. the car itself, buttons, doors, lights). And NPCs moving around and pursuing agendas was something I really wanted as well.

My first stab at code for goals had a form which I realize now was incorrect. I’ll briefly describe it and then get into where that led me, which is to where I am today.

A goal had three main pieces:

  1. a “try” action,
  2. an “achieve” action, and
  3. code to work out whether either of those was possible (Can the goal be achieved? Can the world be changed – can I try – in order to create the conditions where the goal can be achieved?)

If the necessary conditions for a goal existed, then the goal could be achieved. A goal had behavior when the goal was achieved. It might be an NPC transitioning to a new room. It might be some other change in world state.

If the world conditions were not such that the goal could be achieved, then there was code to try to get the world into that state. And the “try” section had conditions as well.

Let’s give an example.

An NPC wishing to enter the elevator would acquire an “enter elevator” goal. The conditions for entering the elevator were that the NPC had to be in the elevator foyer, and the elevator doors had to be open. In that case, with those conditions satisfied, the “achieve” action moved the NPC into the elevator car.

If the doors were not open (but the NPC was in the elevator foyer), the NPC had an action to try to change the world to achieve the goal: pushing the elevator button, if it wasn’t already lit up.

So we have this:

  • achieve condition = NPC in foyer and door open
  • achieve behavior = NPC moves into elevator
  • try condition: NPC in foyer and elevator button not pressed yet
  • try behavior: press elevator button

If the NPC was in the foyer and the button was already pressed, the NPC had nothing to do. It effectively “waited”. Once the elevator showed up and the doors opened, the NPC could achieve its goal by entering the elevator.

The elevator itself had two goals: “close door” and “arrive at floor”. The close door goal’s achieve behavior was to close the elevator doors. The one for the “arrive at floor” goal was to open them. So they were mutually exclusive goals, with mutually exclusive conditions. The “try” action for “close door” was to count down a timer set when the doors had opened. When it reached zero, the doors could be closed. The “try” behavior for the “arrive at floor” goal was to move the elevator to a floor that has been requested by an NPC or PC.

If the elevator doors were closed and no buttons were pressed (either inside or outside the elevator), it did nothing.

The initial “lead player” sequence was a complex mix of path following (both to the player and to the target room) as well some canned dialogue meant to coax the player to follow. There was also a “hold meeting” goal sequence, which was really canned and really unsatisfying to me.

What I found most unworkable about this method of doing goals was the need to manually string them together. For example, any path following (move from A to B) was explicitly programmed. There was nothing in the NPC that decided on a room or worked out how to get there. Plus, I wanted it to be possible to “interrupt” an NPC’s goal chasing. They might be heading to their room, but if you started talking to them, I wanted that goal to be put on hold (if it wasn’t too pressing) to take part in the conversation, with moving toward their room to resume once the conversation was over – unless some other more pressing goal had come up. The key here is that each step along the way in path following needed to be its own goal, to be evaluated and next steps considered at each turn.

To the extent that it worked, it worked nicely. But something wasn’t right with it.

Fast forward to my work with ResponsIF, and I found myself once again trying to implement an elevator. For one thing, I already had done it in Quest, so it was a sort of known quantity. The other was that if I couldn’t implement that, then I probably couldn’t implement much of anything I wanted to do.

Right away, I ran into the same problem I had had before with the Quest “goal” code: I was having to program every little detail and hook everything together. There was no way to connect goals.

After much thought, I had a sort of epiphany. Not only did I realize what needed to be done, I also realized why that original goal code seemed awkward.

First the original code’s flaw: the “try” and “achieve” sections were actually two separate goals! For example, the “enter elevator” goal included not only that goal but the goal that immediately preceded it. In order to enter the elevator (the desired state being the NPC in the elevator), the doors had to be open. But the doors being open is also a world state! And the “try” code was attempting to set that state. Strictly speaking, they should be two separate goals, chained together. I had unconsciously realized their connection, but I had implemented it in the wrong way. And that left me unable to chain anything else together, except in a manual way.

In this case, we have a goal (be inside the elevator) with two world state requirements: the NPC needs to be in the foyer, and the door needs to be open. Each of these is a goal (world state condition) in its own right, with its own requirements. In order for the NPC to be in the foyer, it must move there. In order for the doors to be open, the button must be pressed. I’ll break this down a bit in a followup post, to keep this one from getting too large.

So what needs to be done?

What needs to be done is to connect the “needs” of a goal (or, more specifically, the action that satisfies a goal) with the outputs of other actions. We need to know what world state an action changes. And there is where we run into a problem.

“Needs” in ResponsIF are just expressions that are evaluated against the world state. The game designer writes them in a way that reads naturally (e.g. ‘.needs state=”open”’), but they are strictly functional. They are parsed with the intent of evaluating them. There is no higher level view of them in a semantic sense.

In order to have a true goal-solving system, we need to know 1) what world state will satisfy goals, and 2) what world state other goal actions cause. The goal processing methodology then is, roughly, to find other goals that satisfy the goal in question. Then we iterate or recurse: what conditions do those goals need? Hopefully, by working things back enough, we can find actions that satisfy some of the subgoals which are actually able to be processed.

It’s a bit more complex than that, but the first coding addition needed is clear: we have to be able to hook up the effects of actions with the needs of other actions in a way that the code can do meaningful comparisons and searches and make connections. We need to be able to chain them together. Once we have a way to do that, then the code can do itself what I had been doing by hand before – creating sequences of goals and actions to solve problems and bring to a life a game’s overall design.

Oscillation

The universe is filled with things that oscillate.

The earth rotates – we have the shift from day to night and back again.

The moon orbits the earth, causing the rise and fall of tides.

The earth orbits the sun. We experience the recurrence of the seasons.

Light and sound have frequencies.

Electromagnetic waves have frequencies.

The device you’re using now to read this (assuming it’s being viewed electronically) has a tiny crystal inside that generates a train of impulses: on / off / on /off. Those impulses drive the rest of the system.

String theory postulates that the universe itself is composed of tiny strings, vibrating, oscillating.

We breathe in and out.

Our hearts beat.

The fundamental operations of our cells may have a feedback oscillation component to them that drives the chemical reactions.

Even the human brain has a frequency, a frequency that changes depending on what we’re doing, slowing when we’re asleep or meditative, faster when we’re deep in thought or otherwise using our gray matter in an active way.

The last one is of particular interest to me. When you look at neural nets in computing, they’re often, at least initially, set up as a sort of function: data comes in, data goes out. You train the network to respond to the inputs, and then you feed it input and see what is gives you for output. For example, a neural net could be trained for the identification of letters in an optical character recognition (OCR) application, where the “read” letter is output based on images presented as input.

But the human brain is more than that.

Part of what makes the human brain incredible is its massive size, in terms of connections. Depending on where you read, the estimates are 100 trillion up to 1000 trillion connections. What is equally critical for me is the fact the it’s not just a one-way network. The human brain is interconnected. It not only receives stimulation from the outside world; it also stimulates itself, continuously. It feeds back. It influences itself. It oscillates.

You see images play before your eyes. You hear that damn song in your head that won’t go away. (“But it’s my head!”) You relive scenes both pleasurable and horrific. You dwell on things that affect your emotional state, even though, for that moment, the stimulus is your own mind.

What does that have to do with IF? Perhaps nothing. But it is an interesting topic for me, and there is a higher level sort of recurrence that might be applicable, which is the recurrence of thoughts and feelings in our mental spaces. You can see an example of this in an earlier blog about associations.

ResponsIF‘s “fuzzy” design with weighted topics, decaying topics, and responses keyed off of those topics seemed to lend itself to experimentation with oscillation.

The first attempt, which was not topic based, was a miserable failure. I tried to simulate feedback based solely on values approaching each other. Even with layers of variables in between, the values kept achieving a steady state, where nothing changed. Not a very interesting state of affairs.

I achieved better success by having both varying state and target values which flip-flopped based on the direction it was going. Not ideal, and not really feedback as such, but it did oscillate. There are a couple of samples in the ResponsIF repository that illustrate these, one being a traffic light and one being an oscillation sample.

I ended up discovering a different approach to recurrence which may hold promise. I think of it as “ripples in the pond”.

The basic setup is to have a long-term topic associated with a lesser-weighted response. The lesser weighting places the response at a lower priority, which means it won’t trigger as long as higher priority responses can. This behavior for this response is to add in a decaying topic. That topic then triggers a response which may itself add in a decaying topic. The fact the topics decay and that their responses add in new fresh topics causes a ripple effect. Topic A triggers topic B which triggers topic C, on and on, to the level desired. Each triggering topic decays, causing its effect to lessen over time. Once the temporary topics decay sufficiently, the original, lower-priority response triggers, and the process begins all over again.

From a normal mental process point of view, this is equivalent to having long-term topics triggering transient short-term topics, with the long-term responses recurring until satisfying conditions are met. A bit like nagging reminders…

This might have simple uses, but it can also have more advanced uses if you have multiples of these all going off at once. Which responses actually get triggered depends on the state of the system: what the various response needs and weights are. This means that the internal state of an NPC can vary over time, both in terms of what has been experienced by the NPC and what its long-term goals are, as well as any short-term influences. And the NPC can influence itself.

Association

ResponsIF supports associations among topics, which is something I consider important. What does this mean? Let’s give an example from real life.

Say it’s morning, and you go out for a walk with the dog. The sky is gray, but your mind is pondering the day ahead of you. There’s a problem you’ve been assigned to solve, and it’s important that it be done soon. You walk absently, your mind exploring possibilities.

As you approach the corner shop, the sight triggers a memory – you need bread. For a moment, your thoughts of the problem at work have been displaced. You think about when you might be able to get some bread as you continue on, and soon that thought fades. Re-gathering, you return to your original pressing thought.

Waiting to cross the road, your thoughts are displaced again by the need to be aware of the traffic and the proximity to your dog. An old fashioned car cruises by, and you’re reminded of that film that you saw when you were younger, about the car that could fly.You think about the actor who was in it, and of how you used to like to watch his TV show. It was quite an old-fashioned TV show, black and white, but your parents liked to have it on in the evenings in the living room.

That living room was what you entered when you were done playing with your friends outside. You remember the cool evening air when you used to ride your bike and play ball. You remember the time you went off that insanely high jump on your bike, and that time you fell. You also remember that time you feel off your skateboard and fractured a rib.You can almost feel the pain, and how you almost couldn’t breathe.

You register that the traffic has cleared, and you head across, alert to any oncoming cars you may not have seen coming or that may be coming quickly.

Having crossed the road, you follow a path beside the grass. There are flowers growing there, and delicate purple ones remind you of – you almost can’t even bear the name, you miss the person so. Suddenly, you feel the loss all over again. You can’t help but recall images of the last time you were together, of the words spoken, of the tears shed. The heartbreak has faded to a dull ache, but it doesn’t take much to flare it up.

An elderly man approaches you along the path. He smiles and waves as he passes. He told you his name once, but it didn’t stick.

For a moment, you’re in tune with what’s going on around you. You hear birds chatting in the trees. You hear cars drifting by. You smell the remnants of last night’s rain, which you remember made the windows shake and the dog jump. It was impossible sleeping during the downpour, but it was ok, because you like the sound of rain.

The smell of fresh, hot food greets you as you approach a small shop. You remember you need bread. You’re not sure when you’ll be able to get it, because of the time pressure at work…

And you’re back to  your original thoughts.

I’m not sure if that seems familiar or not, but it certainly is for me. I find my head is an ongoing stream of thoughts, steered left and right and up and down by ongoing associations, either to other thoughts or to external stimuli.

Assuming that is how a human mind works, on some level, the obvious question is whether or not you’d actually want that going in your NPC’s “minds”. The interesting thing about what happened above is that there was no external indication of what was happening. There was no speaking or obvious signs of emotion. It was all internal. Does it make sense for NPCs to have an inner life? Traditionally, IF has been about what the character does: what it says, where it moves, whether it attacks you , etc. The invisible inner workings have taken a back seat or not been bothered with at all. After all, does the player care?

I can only say that I’m interested in giving it a try. The reason why is that the inner state influences the outer. Let’s say that someone were to approach you along your walk and attempt to engage you in conversation. The response you give might well be determined to some extent by what’s happening inside. If you’re thinking about what you have to do and how you have no time and how you don’t know what to do and how you really have to work out the problem, your response might be annoyance at any intrusion. If you are forced to respond while you’re contemplating your lost love, the response might be considerably different.

Having this sort of NPC inner turmoil (or at least ongoing inner state fluctuations) could help provide a more varied response during game play, depending on what’s happening or has happened in the past.

One thing you may have noticed in the narrative above was the continual return to thinking about the pressing long term goal. Little distractions came in, some even causing ripples that lasted a while, but eventually, when the water calmed again, the long-term goal came to the fore again. This points to several key things:

  1. Long vs short term topics. Short-term topics have a short lifespan. They are the introduction of topics into the thought stream that cause ripples or even a splash, but they have no sustaining energy. They’re fleeting, disappearing as quickly as they come, replaced by either their own descendants or simply fading away unless sustained. Long-term goals, on the other hand, are constantly there, although perhaps at a lower level.
  2. Higher and lower priorities to topics. Thoughts of work took precedence over thoughts of buying bread, ultimately. But external stimuli were able to temporarily overpower long-term thinking. Responses occurred or not depending on which topics had current weight, and those weights changed over time.
  3. The internal topic patterns oscillated or could oscillate. This is a powerful concept for me, one I’d like to explore further in a separate blog post.

Without delving too deeply into it, there are two primary mechanisms ResponsIF uses to associate topics. First, a response can trigger via multiple topics. This allows topics to be grouped in terms of the effect they have. Second, topics can trigger responses which can then add possibly different (but related) topics either back into the responder or out to other responders via suggestions. And the topics can be weighted appropriately. So, for example, the topic “family” might introduce topics for “mother”, “father”, and “Sarah” with relative strong weights, but it might also introduce the topic “farm” and “Pine Valley” with lesser, but still existent, weights. And if the discussion just prior also had weights for one of those lesser topics, it might be enough to push it up enough in priority to cause a response.

Topics are the glue that link response together. Responses tell you what happens in a riff. Topics tell you what the riff is all about.

Design Goals and Uber Philosophy

I have some fairly specific design goals I intend to explore in my IF “adventures”. Some of them can be seen in my first (and, so far, only) game spondre. That game not only gave me my first experience trying to realize these, it showed me some problems that need to be addressed. Not all of them are easily solvable. But you have to start somewhere!

If you don’t fail at least 90 percent of the time, you’re not aiming high enough. – Alan Kay

First, and not surprisingly, I want the game to be responsive. I don’t mean that just in the sense that you will get a response to your input. That goes without saying. What I mean is that the game should respond dynamically to not only what you are doing but to what you have done in the past. (Wouldn’t it be something if it could respond as well to what you might be about to do in the future?)

Now that’s not unheard of in games, even IF games, but it’s also not unheard of in IF games where, if you provide the same input, even after having advanced the game, you get the same response. To me, that just feels robotic.

Ideally, the game should respond as if it were someone there with you. This isn’t going to be Turing Test stuff, but the responses should make it feel that the game is in some form a conversation with you.

To that extent, I see a certain amount of AI involved in this, not just as far as any in-game characters are concerned but the overall game experience as well. All the text you see should have some sort of intelligence behind (with “intelligence” in quotes, of course) as much as possible and necessary.

I guess what I mean is, I want there to be a unified experience. Just as each of our brain cells is a part of a larger brain, and just as each of us in turn is part of a larger realization which is the collective activity of all of us together, all the elements of a game – rooms (if any), characters, plot, events, etc – are just parts of the combined, collective game. In other words, the game itself could be a considered a collective “non-player character”, working both with and against you to create a resulting story unique to you, using components of itself along the way.

A single “mind” instead of a disparate group of individual ones.

Hopefully, when I read this again in a month or two, it won’t seem like gibberish!

 

50-50

I’ve been learning about and experimenting a bit of late with fuzzy logic. In fact, I decided to implement a variant of it in ResponsIF. For those who don’t know, unlike in “crisp” logic where you have only two values (“true” or “false”), in fuzzy logic you have a continuum of values ranging from 0 (“false”) to 1 (“true”). A value represents a level of certainty or degree of something. For example, if you’re happy, your “happy” value might be 1. If you’re almost perfectly happy, but not quite, your “happy” value might be only 0.85. If you’re only a little bit happy, you might be down to 0.1.

Such variability comes in handy for attributes that don’t have hard edges. A person who is six feet tall would be considered tall, but how about someone only five-nine? And how about down to five-seven or five-five? As you can see, there is no clear line for “tall”, where you can say everyone above this value is tall and those below are not. There is not “you must be this tall” in life. The “tallness” attribute tapers of as height decreases. As it decreases, there is less certainty. Things become iffy. Things become fuzzy.

This blog entry came to be because I happened upon something interesting in relation to that.

In normal logic, you have certain tautologies. One of them is that “A and not A” is always false. This is so because “A” and “not A” can never both be true at the same time, and “and” requires both arguments to be true for it to be true. Similarly, “B or not B” is always true, as either “B” or “not B” will always be true, which makes the “or” true as well.

In fuzzy logic, things aren’t quite so simple.

Fuzzy logic uses “min” for “and” and “max” for “or”. That is, to compute “A and B”, you take the smaller value of “A” and “B”. To compute “A or B”, you take the larger value of “A” and “B”. These fuzzy operations happen to have the same behavior as the standard logic ones when “A” and “B” have extreme values (0 or 1). Check out the numerous truth tables on the web if you’re unsure.

In between the extremes, things aren’t so clear.

I came to think about this because, in ResponsIF, a “need” was initially considered met if the final value evaluated to 0.5 or greater. I began to think about the 0.5 case. Should 0.5 be considered met or not?

First, it occurred to me that “A and not A” when A is 0.5 evaluates to 0.5 So that means that an expression like that would actually satisfy the need at 0.5. That seemed wrong to me. “A and not A” is always false! (Or should be.)

But then I thought of “A or not A” when A is 0.5, and I realized it too is 0.5. Hmm. But “A or not A” should always be a true value. Or should it?

So when A is 0.5, both “A and not A” and “A or not A” have the same middling value.

I had a choice to make: do I consider 0.5 satisfying a need or not? I was getting no help from “and” and “or”. They were both ambiguous at 0.5. And that’s what led to my final decision: 0.5 does not satisfy a need. That’s the 50-50 case, the “it could be or it couldn’t be”. It’s just shy of a preponderance of the evidence. It isn’t anything definite.

I made the change. It might seem a bit strange to someone reading this that I pondered such a small thing at all, or for any length of time, but such tiny decisions are what go into designs like this. You have to work out the minutiae. You have to make the little decisions that sometimes end up to having large ramifications.

Sometimes pondering this level of things brings unexpected thoughts: if 0.5 is on that edge between false and true, if it’s a no-mans land, perhaps it becomes a coin toss. Perhaps when evaluating a need, if it’s 0.5, then it generates a random 50-50 result, sometimes true, sometimes false. That has some interesting ramifications which I might explore someday. In particular, I began thinking perhaps it should always be evaluated that way, as a chance. Not an “above or below this limit” but a random coin toss, where the coin is weighted by the need expression result. For example, if it evaluates to 0.8, you’d have an 80% chance of the need being considered met. Similarly, 0.1 would only give you a 10% chance. The extremes would still work as expected.

I don’t know if that makes sense at all. But it is fun to think about. It did have the result of making me start to consider whether “needs” should stop being a crisp choice at all, whether the “needs” value should just be an additional weight. That goes against its original emphasis, which is that a response allows you to configure it in both crisp and fuzzy ways. But I have to have these design crises on occasion to make sure I’m doing the right thing. Sometimes “the right thing” only becomes clear after doing something else for a while.

Like having 0.5 satisfy a need. It did and now it doesn’t.

Though sometimes I wonder if I made the right choice. It could have gone either way, really…

Oh, I don’t know.

Response-based IF

This will be a quick overview of what I’m calling “response-based” IF, which is what the ResponsIF engine is designed to do. It won’t get too in-depth into the nuances of the engine itself but will strive more to give the overall design considerations.

In general terms, a “response” is a unit of behavior. A response can output text (or other elements), change variables in the world state, move objects around, invoke other responses, etc. or any combination of the above. Anything that occurs in the course of interaction is done through some sort of response.

Responses are triggered by “topics”. A topic is anything you can think of. Nouns, verbs, ideas, emotions, pretty much anything you can think of can be a topic. If you can express it, then it can be a topic. A topic is “called” to generate responses.

Responses do not exist on their own. Rather, responses are provided by “responders”. A responder provides one type of context for when a response can be triggered – a response can only trigger when its responder is part of the active scene. (There are other sorts of context for responses as well, including the topic(s) to match and any necessary world state conditions.)

Let’s give a simple example of a response: the ubiquitous “Hello world” response.

.responses player
    .response START
        .does .says Hello, world!
.end

The key elements are here. The first line (“.responses player”) states that we are defining the responses for the “player” responder. The “player” responder is the only built-in responder in ResponsIF, and it is active in any response context. It is meant to correspond to the person playing (or reading or whatever) the current IF piece.

The next line (“.response START”) begins a definition for a response, responding to the topic “START”. The “START” topic is called by default when a ResponsIF piece begins.

The subsequent line defines what the response does: it outputs the string “Hello, world!”

Finally, “.end” terminates the set of responses for the player.

This is just a single response, and it doesn’t do much. But it should give an idea about how a ResponsIF game or work (termed a “riff”) is structured. An entire riff contains statements to set up the game world along with a collection of responders and their responses which will occur when various topics are called. Responders can be arranged in a hierarchical structure (parent/child) to establish the necessary contexts or even model “world” structures like rooms, hallways, caverns, etc.

How are topics called? Sometimes they’re called internally, as the “START” example shows. They can also be called by other responses. The vast majority of topics will be called due to player action, typically through UI elements such as buttons, hyper-links and textual input.

For example, the following rather pointless example shows how hyper-links can be created to call topics.

.responses player
    .response START
        .does .says Once upon a time, there was a {!princess!} who lived in a tall {!tower!}.
    .response princess
        .does .says She had long luxurious hair.
    .response tower
        .does .says The tower had smooth, stone walls that not even ivy could scale.
.end

The notation “{!princess!}” is called “link markup”. It creates a link that will call the specified topic (“princess”) when clicked.

The exact ResponsIF details are not terribly relevant here, beyond showing the basic concepts. (More details can be found at the ResponsIF github site: https://github.com/jaynabonne/responsif.)

A key part of ResponsIF is its use of HTML, CSS and JavaScript, which provides great freedom in how a work looks and feels. Due to this, ResponsIF can be used to create a large number of possible works. As a general-purpose engine, it is not restricted to any particular form of IF. Of course, being a general-purpose engine, it’s also not specialized for any particular form of IF, which means it may make more sense to use other tools more dedicated to a particular form, depending on what end result is desired. As always, choose the tool that best matches the task at hand!

In upcoming posts, I will go over my own personal goals and design choices for my IF work(s).

Conversations

(These are a series of postings I made in the textadventures.com (Quest) forums almost three years ago. The ideas put forth here eventually led to my Quest “Response Library”, which was then rewritten in JavaScript as ResponsIF. Not everything in it is still relevant for me, but I’m including here because it has some nascent ideas I think are worth preserving as well as including in this sequence of blog posts. I’m concatenating the posts together, as they’re all related.)

(Intro)

I wanted to start a dialogue (no pun intended) about conversations. Part of this is philosophical/theoretical, which can be fun on its own, but I also wanted to possibly work toward the creation of something new – a Quest conversation engine.

First, the idea bantering phase… So, what exactly do I mean by a conversation?

Part of that should be fairly obvious: the player having communication with other characters (also known as “non-player characters” or “NPCs”). There is already code in Quest – plus an extended such library written by The Pixie in the “Libraries and Code Samples” forum – which supports basic “ask” and “tell”. This takes the form of something like “ask Mary about George” or “tell janitor about fire”. You have a target character and what’s known as a “topic”. Typically, this basic implementation responds with a fixed response (e.g. look up string “topic” in character’s “ask” string table), with an occasional implementation of selecting randomly from a set of fixed responses.

But we usually want more…

There have always been questions, wishes, requests for NPCs to be more interactive. Now, this is black art at this point, and I don’t propose to have a definitive answer (maybe we can move toward that through this), but the point is, game author’s want more dynamic responses, based on any number of variables including previous topics, the time of day, what has occurred in the game, the state of play, and possibly even the emotional state of the character.

My own (limited) game design and wishes took me further.

Consider related desires. It has often come up that author’s want, say, room descriptions to change based on, well, more or less the same things listed above (though emotions come into it less). Similar design choices could be made for object descriptions, NPC descriptions, etc.

I had this revelation one day, as I began to see patterns in this. It finally occurred to me:

In a text adventure (or “interactive fiction”), everything is a conversation.

Let’s list some examples.

1) NPC conversation. The player wishes to ask characters in the game about things, requesting responses to topics. The character responds with information (or some non-committal shrugs or “I don’t know”, if not).
2) Room descriptions. The player makes queries to the game about the room. We ask the room, “Tell me about yourself,” and it responds. This extends as well to scenery. We often will put scenery objects into a room just to hang descriptions off of. These are not interactive objects as such – they exist in the game framework just so the user can type something like “x carpeting” and get a response. In the mindset of conversations, this could be thought of as asking the room about its carpeting. (In other words, the room is the character we’re conversing with, and the “carpeting” is the topic.)
3) Object descriptions. Similarly, when we “x” an object, we are asking it to describe itself.
4) NPC descriptions. Sort of like object descriptions but with a different feel.
5) Hints. Hints are really a bit of a one-way conversation, a running commentary by the game on what you’re doing. Certainly, there could be a “hint” command that forces a response, but there is also the system whereby hints are offered unbidden at the appropriate times. This, in a real sense, is the game conversing with you based on the state of play.

So you might be wondering why I’m trying to lump everything into conversation. The reason why was mentioned before: because we want all of the above things to have a certain dynamic to them, and that dynamic is common across them. We want them to not be static. We want them to change over time, not only randomly but in response to the game world. In a very real sense, as we are playing a game, we are in a conversation with the game itself, and we want this conversation to be alive and engaging.

Currently in Quest, the above areas are handled separately. Room, character and object descriptions (well, rooms and NPCs are objects, after all) are handled via a “description” attribute, which can be a string or a script. Ask/tell are handled via string dictionaries. And hints are basically “roll your own.” Which means that if you wanted to add dynamic responses for all of them, you’d need to implement separate code in separate places.

I’d like to change that. I know in my games (largely contemplated and only minimally implemented, mainly due to wanting too much), I have made stabs at implementing the above pieces.

One example: I wanted my rooms to have “directional descriptions”. By that, I mean that you would get a different description depending on which way you came into a room (or “area”). As an example, let’s say you’re outside in a clearing and you see a path to the north, so you type “n”. A stock description once you get there would be “You’re on a path. To the north, you can see a pond. A clearing lies to the south.” That bugged me a bit. Of course a clearing lies to the south – I was just there. I have walked north. In my head, I’m facing north.

What I wanted was that if you came north onto the path, it would say something like, “The path continues north. You can see a pond in the distance.” Whereas, if you came onto the path from the north, heading south, it would say, “The path winds further south, leading to a clearing.” Now, we can get into debates about where the full set of exits is listed. My option was to have a ubiquitous “exits” command that would dump the out (e.g. “You can go north and south.”) There are other ways to do it. But the point is: the game responds in context. To me, that is much more immersive than “push level B and get response Q”.

Now even if directional descriptions don’t make sense to you, other things undoubtedly will. For example, if you have a game that spans days within the same environment, you might want the descriptions to change based time of day or what the weather is. A whole host of possibilities come up. The same might be true even of NPC descriptions: in the beginning, the sidekick is immaculately dressed in blue. After the flight from the enemy agents, her top is torn and her hair is askew. The next day, having changed, she’s in black jeans, and she has donned a graying wig and wrinkled skin prosthetics like in “Mission Impossible.”

How would you do all that? Lots of “if” statements? I must confess – I love writing code, but it makes my skin crawl to write lots of special-case code, because it’s incredibly brittle and requires all sorts of mental gymnastics to keep straight. I’d rather write some nice general, data-driven code that seemingly by magic does the right thing.

So now we’re down to the proposal part – I’d like to begin working on a community Quest “conversation” engine. Not a library. Not “ask/tell”. I mean a full-on, topic-based, evolving over time sort of engine that anyone can use and enjoy. To that end, I’d like to get any thoughts that people have on the subject, any design considerations, possibly even any code designs that you’ve tried yourself. For anyone who’s interested, let’s put our heads together (and chickens are welcome as well as pigs – if you don’t know what that means, either skip it or look it up).

There are different approaches to NPC conversation, but I want this engine to be something that can be used universally in a game as well as (on some level) for all those things listed above. Perhaps that’s too big an ask, but I want to set the bar high going in. Then we can see where we get to.

I’m going to follow this post up with some of my own (soon), but I wanted to get this started.

(First of several parts)

What we will call “conversing” (I’m going to stick with that, as perhaps it will someday become more like that) is in reality selecting text to display to the user – choosing something for a character to “say” – either in response to a query (direct trigger) or based on conditions (indirect trigger).The trigger might have a topic or topics associated with it (e.g. a query about a topic), or the trigger might be neutral (e.g. a hint engine that runs on each turn searching for phrases to present, or a particularly chatty NPC who offers up information in his/her own when the time is right).

How do you select what to say? For things like object/room/character descriptions, the standard is either a hard-coded string or some code which typically selects and assembles the output based on whatever specific logic is programmed in. For something like ask/tell, the implementation is a string lookup in a dictionary.

That is the current state-of-the-art in Quest. There’s any number of ways to go as far as how we can implement string selection. I’ll cover a couple of directions I have explored.

Boolean Logic

The most straightforward is to implement some sort of hard “yes or no” selection. That is, we look through the available phrases and see which ones match. Either a phrase matches or it doesn’t. What to do if more than one matches depends on the situation (and so complicates things a little bit). For interactions with an NPC, you’d probably want it to spit out a phrase at a time in response. You could randomly select one, for example. Another possibility is to show all matching phrases.

The latter is what I did in my room description implementation. Each room could have various “topics”. A topic would then have conditions. Standard topics included “base” (the base description), “first” (text to show the first time), “later” (text to show when not the first time), “north”/”south”/”west”, etc (text to show based on arrival direction), etc. Custom topics could also be added, depending on conditions and needs.

A sample room might have text like:

<baseDescription>"This is the second floor landing."</baseDescription>
<westDescription>"Further to the west you can see an archway leading to another hallway. A broad staircase leads up and down."</westDescription>
<eastDescription>"The hallway continues east. A broad staircase leads up and down."</eastDescription>
<upDescription>"The stairs continue up. A hallway heads off to the east, and through an archway to the west, you can see another hallway."</upDescription>
<downDescription>"The stairs continue down. A hallway heads off to the east, and through an archway to the west, you can see another hallway."</downDescription>

The description would be a combination of the base description and any directional ones. So, for example, if you move west to get the landing it will show:

This is the second floor landing. Further to the west you can see an archway leading to another hallway. A broad staircase leads up and down.

which is a combination of the “base” and “west” descriptions. (The fact that strings are surrounded by quotes is due to the fact that I would “eval” the attribute to get its real value. This allowed me to put expressions in the strings.)

Here are some of the conditions (in the base room type) that trigger the various pieces:

<baseCondition>true</baseCondition>
<northCondition>lastdir="northdir"</northCondition>
<northwestCondition>lastdir="northwestdir"</northwestCondition>
<westCondition>lastdir="westdir"</westCondition>
<firstCondition>not GetBoolean(room, "visited")</firstCondition>
<laterCondition>GetBoolean(room, "visited")</laterCondition>

Now, this is all well and good and works fine for rooms. The conditions can vary, and the text shown will vary accordingly. But it is very static. What it’s missing is any sense of history, any inclusion of what has been shown (to some extent, what has been “discussed”). For NPC interactions, I wanted more.

Fuzzy Logic

Moving beyond the yes/no, hot/cold, show/don’t show of binary, Boolean logic, we arrive at another alternative – that of fuzzy logic. I have only toyed with a test implementation of this; with no real world uses, it might end up needing some more design to make it all work properly. I’ll describe what I have considered so far.

The basic idea behind this is the notion of a current “conversation context”. This context holds information about the current set of topics. This set of topics changes over time as different topics are explored.

Each NPC would have its own conversation context. Each possible phrase would also have related topics. Triggering a phrase will update the conversation context, which will then influence subsequent phrase selection. Let’s see how this might work with an example.

Here are the phrases. The first part is the string. After that is a list of topic weights associated with that phrase. Weights range from 0 to 100. (There might be a use for negative weights as well.) I hope this isn’t too contrived…

[id=1] “My father was a farmer.”, father=100, farming=100, family=50, history=80
[id=2] “I grew up on a farm.”, farming=100,history=80,home=100
[id=3] “My brother’s name is John”, brother=100, family=50, john=100

Let’s look at these a bit. For the first one (“My father was a farmer.”), we have the topics “father” and “farming” being a strong match. The topic “family” matches 50%. The weights have two purposes:

1) They help in matching by showing how much a topic matches the phrase.
2) They are used in modifying the conversation context when they are used. (More on this below.)

The idea behind 2) is that our minds are very associative – when we discuss topic “A”, it brings in other related topics (“B”, “C”, and “F”). We want to have a similar behavior in our phrase selection.

Initially, the conversation context is empty. Nothing has been discussed.

Context = {}
Let’s say the player enters “ask about father”. Due to this, the “father” topic gets injected into the current conversation context:

{ father = 100 }

Now, we search. The search is done by generating scores for each phrase via a sort of “dot product”. (If you don’t know what that is, don’t worry.) Basically, we multiply the context with each phrase’s topics and generate a score. In the following, the first number multiplied is the value in the context; the second number is the number in the phrase weights. A missing weight has value 0.

[id=1] score = (father) 100*100 + (farming) 0*100 + (family) 0*50 + (history) 0*80 = 10000
[id=2] score = (father) 100*0 + (farming) 0*100 + (history) 0*80 + (home) 0*100 = 0
[id=3] score = (father) 100*0 + (brother) 0*100 + (family) 0*50 + (John) 0*100 = 0

In this case, phrase 1 is the clear winner. If the topic were “brother”, I hope it’s clear that phrase 3 would be the winner.

Let’s see what happens if we have “ask about family” as the topic. In this case, the context would be {family=100}. Running scores, we get:

[id=1] score = (family) 100*50 + (father) 0*100 + (farming) 0*100 + (history) 0*80 = 5000
[id=2] score = (family) 0*0 + (farming) 0*100 + (history) 0*80 + (home) 0*100 = 0
[id=3] score = (father) 100*50 + (brother) 0*100 + (John) 0*100 = 5000

In this case, both phrases 1 and 3 match “family” about the same. What happens is to be defined. Either it could spit out one phrase (chosen randomly perhaps, or in order), or it could spit the both out (“My father was a farmer. My brother’s name is John.”).

Let’s go back to the “father” case. In that case, we would have a result of phrase 1. So the NPC would say, “My father was a farmer.” Based on standard human conversational associations, we would want to update the conversation context with any related topics brought in due to that phrase. I’m not sure of the ideal algorithm for that. For this case, let’s assume we generate averages. (A better weighting scheme might make more sense.)

Context = {father = 100}
Phrase = {father = 100, farming = 100, family=50, history = 80 }

new “father” = (100 + 100)/2 = 100
new “farming” = (0 + 100)/2 = 50
new “family” = (0 + 50)/2 = 25
new “history = (0 + 80)/2 = 40

New Context = {father = 100, farming = 50, family = 25, history = 40)

This is now the current conversation state. What I had wanted was for NPCs to be able to initiate conversation as well, not just respond to player queries. If the player were idle in this case, the NPC might decide to come back with its own search for something to say. Performing the search again with the current conversation context (let’s assume we won’t repeat a phrase), we get these weights:
[id=2] score = (father) 100*0 + (farming) 50*100 + (family) 25*0 + (history) 80*40 + (home) 0*100 = 8200
[id=3] score = (father) 100*0 + (farming) 50*0 + (brother) 0*100 + (family) 25*50 + (John) 0*100 + (history) 40*0 = 1250

In this case, phrase 2 matches, so the NPC will follow up with: “I grew up on a farm.” After this phrase is uttered, the conversation context will be updated to:

New Context = {father = 50, farming = 75, family = 12.5, history = 60, home = 50)

Note that we didn’t discuss “father” again, and so its weight has decreased. Also note that farming has been emphasized. And we now have a new topic (“home”) which may trigger additional, subsequent phrases.

There are many variants to this, possibilities for manipulating the parameters. Perhaps the new conversation context should not be a simple average but a weighted one. Perhaps the conversation context should “decay” over time, if there is no conversation (that is, if you cease talking, the weights gradually move to 0 and disappear with each command). Perhaps a new topic injected should enter into the context with some lower weight than 100. As far as not repeating phrases, perhaps the “memory” of a phrase being spoken decreases over time (possibly at a rate determined by the age of the NPC), such that it will eventually be repeated. There would also need to be some determination of what to do if either no phrases match (“I don’t know about that”), or all matching phrases have already been spoken (“I don’t have any more to say about that.”)

There is one downside to these weights which needs to be addressed. A queried subject might want to be searched more forcefully than one where the NPC follows up on its own. For example, after “ask about father”, if the player enters “ask about the moon”, it wouldn’t make sense for the NPC to come back with “I grew up on a farm” (which it would if straight weights were used and the priority of the topic wasn’t taken into account). One way to work that out is that a new query from the player generates a new temporary search context with just that topic, with the weights from any chosen phrase adding into the current context afterwards.

Next: Bringing in the world state, the best of both worlds, and some emotions.

(Continued)

(Note – this article is referenced below: http://emshort.wordpress.com/how-to-play/writing-if/my-articles/conversation/)

Assuming what we have been discussing so far actually works, then we now have some sort of scheme for modeling an evolving conversation. It does mean that you as the author have to actually go in and create all the response phrases – and decide all the topic weights. While the thought of having NPCs actually generating utterances on their own (putting words together to form coherent sentences) is a wild pipe dream, that is not what we’re talking about here.

(Aside: In Emily Short’s article listed above and in her other ones, she uses the word “quip” for what a character says. While that is a bit of a fun word, it had a connotation to me of levity which I didn’t think was generally applicable. So, being the programmer that I am, I am going with the more generic “phrase”. Even that might not be entirely accurate, but we need something.)

What we have gone over attempts to address selecting phrases based on topics, but it’s fairly self-contained. The only inputs are topics. We also want to interact with the game world, to somehow have the world state influence phrase selection.

By “world state,” I mean pretty much anything of interest in the game world. It can be things like:
– where the player is
– where they have been
– what they possess
– what they have possessed
– whether a murder (or some other significant event) has occurred
– the time of day
– the presence or absence of other NPCs
– the emotions of the various NPCs
– who has just walked past in the corridor
– choices the player has made in the past
– anything else relevant to the story being told (or the puzzles therein)

You could, in theory, use weights for that, and inject state in as topics. I tried that for my room descriptions at first but quickly abandoned it. There are two problems:
1) You need to create a topic for each piece of world state (“has_stake”, “visited_crypt”, “saw_vampire”, “killed_vampire”, etc or, in my case, “east”, “west”, “north”, etc ), which can quickly become unwieldy.
2) Such state is typically binary – we only want certain phrases to be eligible when certain conditions are true. In order to do binary in a weighted scheme, you have to not only add weights for what you want but also (short of some notation which I tried and gave up on) add negative weights for what you don’t want, to prevent the phrase from being selected when those topics are present. It’s possible but, again, quickly becomes painful.

What I have come to is that we want to somehow utilize both schemes, “binary” and “fuzzy”. The “binary” part would be used for phrase eligibility, based on world state; the “fuzzy” part would be used for selecting from among the eligible phrases, based on topic or other fuzzy considerations. This is the next area of thought for me, so I don’t have any ready answers yet. Perhaps some sort of uniform notation could be adopted. I’m open to that if it would work; in fact, I’ll probably revisit it myself, as I like to unify things.

One piece of “world state” we might want to consider is the emotions of the NPC. For example, if I ask a character about her father, if she loves me, she might respond, “I can’t wait for you to meet Papa!” whereas if she is feeling antagonistic towards me, such queries might be met with, “That’s none of your business!” or something even more venomous. Whether such state is “hard” (binary) or “fuzzy” is really a question of design, depending on how the emotions are modeled. That’s another discussion to be had. It could be implemented as simply (but crudely) as a boolean “inLove”, if that’s all you care about, or as a variable “loveState” ranging from 0 to some upper number, or as a scale ranging from hate to love. You can also have different ranges for different kinds of emotions. The point is: you will need to determine how that is going to be modeled before you can decide how to integrate it with a conversation model. Hopefully, if this all comes together, you’ll have plenty of flexibility to do it the way you want.

(Continued)

A random, somewhat related thought: this one’s about room descriptions.

Typically, room descriptions are centralized. When I say “look”, it goes to the room object (in Quest) and says, “Give me your description”, and that description is either a string or a script. And there is boilerplate code that dumps out the objects in the room, the exits, etc.

Now, let’s say that conditions can change in the room. Maybe a chair is there sometimes and other times not. How do you handle it?

The typical way in Quest is via an “if”. You make your room description a script, you dump out the stock part of the room description, and then you put in an “if”: “if the chair is in the room, also print *this*”. It’s all a bit clunky, and it’s also rather centralized. The room has to know about the possibility of the chair. It’s all very static, very hard-wired, very brittle.

Let’s turn the notion on its head a bit. What if we look at the command “look” as an invitation from the player for description, which goes not just to the room but to all within it. Now, instead of the room having to know that a chair is possibly there in order to add in custom text, we will have the room respond with its description and we’ll let the chair chime in as well with its own description (with such description being the minimal “There is a chair in the corner” sort of thing, with a full description forthcoming if the player directly queries the chair further). And if the chair isn’t present, well, then nothing is there to say it. Imagine the room saying in a booming voice, “You’re in the library” and then you hear a small voice from the corner say, “There’s a plush chair in the corner.” They get written out together in one block of text. Now, instead of a single voice, we have mutiple voices all responding together. Instead of having to put “To the east, there is a foyer”, the east exit will itself add in the text to alert the player to its presence – if it wishes. “Hey look! You can go this way.”

A bit bizarre on first thought, perhaps, but I really like this sort of thing…

(Finally)

I’m in the process of working on a general “response” library, which I’m using in a fledgling game. The game is forcing me to actually use it in a real-world way, and it’s helping to point out where the design has flaws or can be expanded. Lots of uses cases being folded in…

To back up, I had a bit of turn of thought, based on (what else?) terminology. I came to realize that the word “phrase” was not very good, as it just didn’t fit with what it actually was, since a phrase is a piece of a sentence. So I went in search of a better word. I came across “response” and it stuck. But that word opened up new directions in thought. A response need not be verbal. It could be a reply, but it could also be a shrug, a smile, or even a punch in the face.

Explorations in Interactive Fiction

This is my first post before diving deeper into both my thoughts and efforts around interactive fiction. It will provide an overview of where I am and where I’d like to go.

Let me begin by saying that my definition of “interactive fiction” may or may not be the same as anyone else’s. The term has been applied to such disparate concepts as parser-based “text adventures” and hyperlink-based “choose your own adventure” or “gamebook” creations. In fact, if you look at the Wikipedia entry for “Interactive Fiction”, it begins with the definition that IF “use[s] text commands to control characters and influence the environment.” It states later that the term “can also be used to refer to” gamebooks. There has always been a bit of a divide between those two camps.

For my own purposes, I consider “interactive fiction” as hewing closely to the name – fiction that is interactive. That’s it. It can include all of the above and more – where “more” can include new ways to make fiction interactive that might not even have been conceived yet. If it’s not clear yet, I have my own ideas about the types of IF works (“games”? “pieces”?) that I’d like to create.

I won’t define IF for you, but I hope the things explored under this umbrella can provide food for thought, if nothing else. The majority of ideas discussed here will be in relation to my “ResponsIF” system, a response-based interactive fiction engine written in JavaScript, but that doesn’t mean that they can’t be applied to other systems or just considered in general. You don’t have to use ResponsIF or even have a desire to do so in order to take something from all of this.

With that, I’ll end this general overview, and we can get down to business.

(For those interested, the latest version of “ResponsIF” can be found here: https://github.com/jaynabonne/responsif)