Using State Machines in Alexa Skills

Using the Alexa Skills SDK you have the ability to save state for a particular session. A session is one continuous use of the skill by a user. If you're familiar with web development then you've probably used sessions before. Here sessions are more about saving state information in a back-and-forth conversation with a user.

For example, a user might ask for spicy meal recommendations. You might then follow up by asking whether they like pork. In this scenario you want to remember that the user cares about spicy food. If you have more follow-up questions, you might also need to remember that you already asked about pork.

Keep in mind that users might change any of the above arguments (called slots in Alexa parlance) so you have to be sure to check for that each time your skill code runs.

The Imperative Way

The following is pseudo-code, to skip a lot of the ceremony involved in building Alexa skills, but it goes roughly like this:

const MealRecommendationIntent = {
  handler(handlerInput) {
    let am = handlerInput.attributesManager;

    let sessionAttributes = am.getSessionAttributes();

    let { temperature, meat } = sessionAttributes;

    if(temperature === 'spicy' && !meat) {
      am.setSessionAttributes(sessionAttributes);
      return recommendPork();
    } else if(temperature === 'mild' && !meat) {
      am.setSessionAttributes(sessionAttributes);
      return recommendChicken();
    } else if(temperature === 'spicy' && meat) {
      return makeRecommendation(temperature, meat);
    } else {
      am.setSessionAttributes(sessionAttributes);
      return askIfTheyLikeSpicy();
    }
  }
};

This is simplified and doesn't handle every combination that can occur. It also doesn't account for users changing their mind about a previous answer. Just imagine how complex this code would get if we added a few more slots.

Solving with a State Machine

Given the scenario we can see that there's a lot of state to manage and different pathways that the logic can take. Of course a Finite State Machine is perfect for such a scenario.

The following is roughly the state flow for this skill:

[Figure: A state machine showing the flow of this meal recommendation skill]

This is complex! Even with so few possible states and pieces of data. The state machine provides a mechanism for modeling the above flow in a more organized fashion.

Building the State Machine

Here I'm going to use Robot as my state machine library. Robot allows you to model the skill's interactions so that any time an intent is called you go directly into the state machine and let its logic handle everything else.

Let's start off by adding our states and basic transitions. The above diagram shows most of that.

import {
  state,
  createMachine,
  interpret,
  immediate
} from 'robot3';

const recommendations = createMachine({
  idle: state(immediate('validate')),
  validate: state(),
  missingTemp: state(),
  missingMeat: state(),
  notReady: state(),
  ready: state()
}, ctx => ctx);

Now within the intent let's pass in the existing session variables and arguments as the initial context. This is again pseudo-code but it will look like this:

const MealRecommendationIntent = {
  handler(handlerInput) {
    let am = handlerInput.attributesManager;
    let env = handlerInput.requestEnvelope;
      
    let session = am.getSessionAttributes();
    let args = {
      temp: Alexa.getSlotValue(env, 'temp'),
      meat: Alexa.getSlotValue(env, 'meat')
    };
      
    let context = {
      ...session,
      ...args,
      response: handlerInput.responseBuilder,
      attrs: am
    };
      
    interpret(recommendations, () => {}, context);
      
    return handlerInput.responseBuilder.getResponse();
  }
};

OK, in case it's not clear, the code above:

Creates a context object to hold the values the machine needs to read or modify: session holds values from previous runs, args holds the new slot values, response is the response builder, which we need to speak back to the user, and attrs is the attributes manager.
Uses Robot's interpret function to start the state machine (recommendations) and passes the context in.
Doesn't provide a meaningful callback for state changes (it passes an empty function), since we are going to handle everything inside the machine.
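
One detail in the context merge is worth a closer look. Object spread copies a property even when its value is undefined, so if a slot lookup comes back empty for a slot the user did not fill on this turn, spreading args would overwrite the value remembered in session. The sketch below shows the effect with plain objects and one possible way around it. definedOnly is a hypothetical helper for illustration, not part of the Alexa SDK or Robot, and the empty slot value is an assumption about how unfilled slots arrive.

```javascript
// Hypothetical helper: drop keys whose value is undefined or null.
function definedOnly(obj) {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => value != null)
  );
}

// Saved on a previous turn.
const session = { temp: 'spicy' };
// Assumption: on this turn the user only answered the meat question.
const args = { temp: undefined, meat: 'pork' };

// Plain spread lets the empty slot clobber the remembered value.
const naive = { ...session, ...args };
// Filtering first keeps the remembered value and adds the new one.
const merged = { ...session, ...definedOnly(args) };

console.log(naive.temp); // undefined
console.log(merged);     // { temp: 'spicy', meat: 'pork' }
```

A slot the user does fill still wins over the session value, so changing their mind about an earlier answer keeps working.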

OK, back to the state machine. We still need to do the following:

Move to the missingTemp or missingMeat states if those values are not provided.
If we need to request temp or meat, assign the session values back to the Alexa session.
Add actions to speak back to the user for all of our scenarios.

Validating state

To validate state we'll use two of Robot's guards. A guard only allows a transition to happen if its condition is true. This is a simple addition:

import {
  state,
  createMachine,
  interpret,
  immediate,
  guard,
  reduce
} from 'robot3';

const recommendations = createMachine({
  idle: state(immediate('validate')),
  validate: state(
    immediate('missingTemp',
      guard(ctx => !ctx.temp)
    ),
    immediate('missingMeat',
      guard(ctx => !ctx.meat)
    ),
    immediate('ready')
  ),
  missingTemp: state(),
  missingMeat: state(),
  notReady: state(),
  ready: state()
}, ctx => ctx);

Transitions (of which immediate is a type) are evaluated in top-down order so:

If there is no temp move to the missingTemp state.
If there is no meat move to the missingMeat state.
Otherwise, move to the ready state.
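
The top-down rule can be sketched without Robot at all. The firstMatch helper and the plain-object candidates below are hypothetical stand-ins for illustration, not Robot's internals. They only show that the first transition whose guard passes wins, and that an unguarded transition acts as the fallback.

```javascript
// Hypothetical stand-ins for the three immediate transitions on `validate`.
// A candidate without a guard always matches, so it has to come last.
const candidates = [
  { to: 'missingTemp', guard: ctx => !ctx.temp },
  { to: 'missingMeat', guard: ctx => !ctx.meat },
  { to: 'ready' }
];

// Return the target of the first candidate whose guard passes.
function firstMatch(list, ctx) {
  const hit = list.find(c => !c.guard || c.guard(ctx));
  return hit ? hit.to : undefined;
}

console.log(firstMatch(candidates, {}));                              // missingTemp
console.log(firstMatch(candidates, { temp: 'spicy' }));               // missingMeat
console.log(firstMatch(candidates, { temp: 'spicy', meat: 'pork' })); // ready

// Moving the unguarded candidate to the front shadows the other two.
const reordered = [candidates[2], candidates[0], candidates[1]];
console.log(firstMatch(reordered, {}));                               // ready
```

When both values are missing, the temp question is asked first simply because its transition is listed first.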

Assign to the session

If we reach either the missingTemp or missingMeat states we are going to have to request more info from the user, but we want to remember any values we currently have.

handlerInput.attributesManager.setSessionAttributes is how you do that. Using an action in our machine is how we are able to call this side-effectual method. Below we add this for each of the missing states:

import {
  action,
  state,
  createMachine,
  interpret,
  immediate,
  guard,
  reduce
} from 'robot3';

const assignSession = action(ctx => {
  ctx.attrs.setSessionAttributes({
    temp: ctx.temp,
    meat: ctx.meat
  });
});

const recommendations = createMachine({
  idle: state(immediate('validate')),
  validate: state(
    immediate('missingTemp',
      guard(ctx => !ctx.temp)
    ),
    immediate('missingMeat',
      guard(ctx => !ctx.meat)
    ),
    immediate('ready')
  ),
  missingTemp: state(
    immediate('notReady', assignSession)
  ),
  missingMeat: state(
    immediate('notReady', assignSession)  
  ),
  notReady: state(),
  ready: state()
}, ctx => ctx);

We created an action that is reusable 💪
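
Because the action only touches what is on the context, its body can be exercised without Alexa. The sketch below pulls that function out under the hypothetical name saveSlots and runs it against a hand-rolled fake attributes manager, which is also an assumption made for illustration. Note that it stores only temp and meat rather than the whole context, which also holds the response builder.

```javascript
// Same body as the function passed to action() above, under a hypothetical name.
function saveSlots(ctx) {
  ctx.attrs.setSessionAttributes({
    temp: ctx.temp,
    meat: ctx.meat
  });
}

// Hand-rolled fake exposing the two attributes manager methods used in this post.
function fakeAttributesManager() {
  let saved = {};
  return {
    getSessionAttributes: () => saved,
    setSessionAttributes: attrs => { saved = attrs; }
  };
}

const attrs = fakeAttributesManager();
saveSlots({ temp: 'spicy', meat: undefined, response: {}, attrs });

// JSON.stringify omits the undefined meat value.
console.log(JSON.stringify(attrs.getSessionAttributes())); // {"temp":"spicy"}
```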

Respond to the user

Lastly, we need to respond to the user in each of our scenarios: when the temp or meat is missing, and when we are ready to make our recommendation.

For the first two we add more actions that call a method on the response builder.

import {
  action,
  state,
  createMachine,
  interpret,
  immediate,
  guard,
  reduce
} from 'robot3';

const assignSession = action(ctx => {
  ctx.attrs.setSessionAttributes({
    temp: ctx.temp,
    meat: ctx.meat
  });
});

const recommendations = createMachine({
  idle: state(immediate('validate')),
  validate: state(
    immediate('missingTemp',
      guard(ctx => !ctx.temp)
    ),
    immediate('missingMeat',
      guard(ctx => !ctx.meat)
    ),
    immediate('ready')
  ),
  missingTemp: state(
    immediate('notReady', assignSession,
      action(ctx => ctx.response.speak('Do you like spicy or mild food?'))       
    )
  ),
  missingMeat: state(
    immediate('notReady', assignSession,
      action(ctx => ctx.response.speak('Do you like pork or chicken?'))
    )
  ),
  notReady: state(),
  ready: state()
}, ctx => ctx);

That's simple!

For the last part we probably need to query our API to find the right food recommendation given our parameters. I'll spare writing this function as it will vary depending on where you are getting this information. But we will use invoke in order to call our external service and then finish in the complete state.

import {
  action,
  state,
  createMachine,
  interpret,
  immediate,
  invoke,
  guard,
  reduce,
  transition,
  state as final,
} from 'robot3';

const assignSession = action(ctx => {
  ctx.attrs.setSessionAttributes({
    temp: ctx.temp,
    meat: ctx.meat
  });
});

const recommendations = createMachine({
  idle: state(immediate('validate')),
  validate: state(
    immediate('missingTemp',
      guard(ctx => !ctx.temp)
    ),
    immediate('missingMeat',
      guard(ctx => !ctx.meat)
    ),
    immediate('ready')
  ),
  missingTemp: state(
    immediate('notReady', assignSession,
      action(ctx => ctx.response.speak('Do you like spicy or mild food?'))       
    )
  ),
  missingMeat: state(
    immediate('notReady', assignSession,
      action(ctx => ctx.response.speak('Do you like pork or chicken?'))
    )
  ),
  notReady: final(),
  ready: invoke(callRecommendationAPI,
    transition('done', 'complete'),
    transition('error', 'error')
  ),
  complete: final(),
  error: final()
}, ctx => ctx);

And that's it! We invoke our callRecommendationAPI method. Using transition we go to the complete state when the promise resolves. If there's an error we go to the error state instead.
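
Since the real lookup is left out, here is a hypothetical stand-in for callRecommendationAPI backed by a made-up in-memory table, assuming the function receives the context and returns a promise. The settle helper is not Robot. It only mirrors the two transitions on the ready state, so you can see which outcome leads where.

```javascript
// Made-up data standing in for a real recommendation service.
const DISHES = {
  spicy: { pork: 'mapo tofu with pork', chicken: 'kung pao chicken' },
  mild: { pork: 'pork katsu', chicken: 'chicken pot pie' }
};

// Hypothetical stand-in: resolves with a dish, rejects on an unknown combination.
function callRecommendationAPI(ctx) {
  const dish = DISHES[ctx.temp] && DISHES[ctx.temp][ctx.meat];
  return dish
    ? Promise.resolve(dish)
    : Promise.reject(new Error(`No dish for ${ctx.temp} ${ctx.meat}`));
}

// Mirrors the two transitions on `ready`: resolve -> complete, reject -> error.
function settle(ctx) {
  return callRecommendationAPI(ctx).then(
    dish => ({ state: 'complete', dish }),
    error => ({ state: 'error', message: error.message })
  );
}

settle({ temp: 'spicy', meat: 'pork' }).then(result => console.log(result.state)); // complete
settle({ temp: 'spicy', meat: 'tofu' }).then(result => console.log(result.state)); // error
```

Because the lookup is asynchronous, the intent handler would also need to wait for the machine to reach complete or error before returning its response.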

Writing error conditions is no fun so I'll spare blogging about it here.

Conclusion

And that's it! With a relatively small finite state machine we are able to cover every scenario the user throws at us, and we can scale to more arguments very easily.

If you're building an Alexa skill and are thinking about using state machines, hit me up on Twitter. I'd love to hear about what you're doing.