AWS Alexa
Alexa is Amazon’s cloud-based voice service and the brain behind tens of millions of devices including the Echo family of devices, FireTV, Fire Tablet, and third-party devices with Alexa built-in.
Alexa Design Patterns
- Adaptability – understands and processes what user says. Let users speak in their own words.
- Personalization – remembers user interaction. Individualize entire interaction.
- Availability – keeping all options open. Collapse your menus; make all options top-level.
- Relatability – like having a conversation with an actual person. Talk with them, not at them.
Alexa Interaction Model
Wakeword
- An Echo device is always listening but in a dormant state. It wakes up when it hears a specific word or phrase called the wakeword.
- Amazon offers a choice of wakewords like ‘Alexa’, ‘Amazon’, ‘Echo’, or ‘Computer’, with the default being ‘Alexa’.
- These words are reserved and cannot be changed beyond these four options by users or by developers.
Note: a ‘wakeword’ wakes the assistant but does not trigger your specific skill; that requires an invocation name (covered below).
Skills
- A skill is ‘an app for Alexa’; however, skills are not downloaded, they only need to be enabled.
- A skill can be enabled either within the Alexa app or by asking Alexa to enable it.
- Alexa supports three types of skill:
- Custom Skills – most common type of skill, and gives the most control over the user experience. This type of skill lets you develop just about anything you can imagine.
- Smart Home Skills – specifically for controlling smart home appliances. Provides less control over the user experience, but is simpler to develop.
- Flash Briefing Skills – specifically for compatibility with Alexa’s native ‘Flash Briefing’ ability. This type of skill also gives you reduced experience control, but again is simpler to develop.
Invocation Name
- An ‘invocation name’ is the word or phrase used to trigger the skill.
- An invocation name is required only for custom skills.
- Invocation name cannot be changed after the skill goes live
- Invocation name Requirements
- must not infringe upon the intellectual property rights of an entity or person
- must be composed of two or more words. One-word invocation names are not allowed unless they are unique to the brand/intellectual property.
- must not include names of people or places
- for two-word invocation names, neither word can be a definite article (“the”), indefinite article (“a”, “an”) or preposition (“for”, “to”, “of,” “about,” “up,” “by,” “at,” “off,” “with”).
- must not contain any of the Alexa skill launch phrases and connecting words. Launch phrases include “run,” “start,” “play,” “resume,” “use,” “launch,” “ask,” “open,” “tell,” “load,” “begin,” and “enable.” Connecting words include “to,” “from,” “in,” “using,” “with,” “about,” “for,” “that,” “by,” “if,” “and,” “whether.”
- must not contain the wake words “Alexa,” “Amazon,” “Echo,” or the words “skill” or “app”.
- must contain only lower-case alphabetic characters, spaces between words, and possessive apostrophes
- must spell out characters such as numbers, e.g. twenty one
- can have periods in invocation names containing acronyms or abbreviations that are pronounced as a series of individual letters, e.g. NASA as n. a. s. a.
- cannot spell out phonemes, e.g. a skill titled “AWS Facts” would need “AWS” represented as “a. w. s. ” and NOT “ay double u ess.”
- must not create confusion with existing Alexa features.
- must be written in each supported language
- should be distinctive to ensure users can enable the skill. Invocation names that are too generic may be rejected during the skill certification process, or result in lower discoverability.
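Several of the invocation name requirements above are mechanical enough to check before submitting a skill. The following is a minimal sketch, not Amazon's validator: the function name `check_invocation_name` and the word sets are assumptions built only from the rules listed here, and rules that need human judgment (intellectual property, names of people or places, brand exceptions, distinctiveness) are not covered.

```python
import re

LAUNCH_PHRASES = {"run", "start", "play", "resume", "use", "launch",
                  "ask", "open", "tell", "load", "begin", "enable"}
CONNECTING_WORDS = {"to", "from", "in", "using", "with", "about",
                    "for", "that", "by", "if", "and", "whether"}
RESERVED_WORDS = {"alexa", "amazon", "echo", "skill", "app"}
ARTICLES_AND_PREPOSITIONS = {"the", "a", "an", "for", "to", "of", "about",
                             "up", "by", "at", "off", "with"}
# Lower-case letters, spaces, apostrophes, and periods (for acronyms).
ALLOWED = re.compile(r"[a-z][a-z'. ]*")


def check_invocation_name(name):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if not ALLOWED.fullmatch(name):
        problems.append("use only lower-case letters, spaces, apostrophes, periods")
    # Drop the periods used for spelled-out acronyms such as "a. w. s."
    words = [w.strip(".") for w in name.lower().split()]
    if len(words) < 2:
        problems.append("use two or more words (brand names excepted)")
    if len(words) == 2 and ARTICLES_AND_PREPOSITIONS & set(words):
        problems.append("two-word names cannot include an article or preposition")
    if (LAUNCH_PHRASES | CONNECTING_WORDS) & set(words):
        problems.append("contains a launch phrase or connecting word")
    if RESERVED_WORDS & set(words):
        problems.append("contains a wake word, 'skill', or 'app'")
    return problems
```

For example, `check_invocation_name("daily horoscopes")` returns an empty list, while `check_invocation_name("ask trivia")` reports the launch phrase.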
Intent
- An intent is what a user is trying to accomplish.
- defines an action that fulfills the user’s request
- Intent name requirements
- can only contain case-insensitive alphabetical characters and underscores
- cannot include numbers
- cannot include special characters
- cannot include spaces
Utterance
- Utterances are the specific phrases that people will use when making a request to Alexa.
Slot
- A slot is a variable that relates to an intent, allowing Alexa to understand information about the request, e.g. country to travel to, date to travel from, city to travel to etc.
- A slot can be either an Amazon predefined slot type such as dates, numbers, durations, time, etc. or a custom one specific to the skill.
- Custom values can be added to a subset of the built-in list slot types. Extending a built-in slot type only applies to the specific skill and those changes do not apply to any other skills
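To make the relationship between intents, utterances, and slots concrete, here is a small sketch of one intent as it might appear in an interaction model, written as a Python dict. The intent name `PlanMyTripIntent`, the slot names, and the helper `validate_intent` are hypothetical; the check only covers the intent naming rule above plus a sanity check that every `{slot}` used in an utterance is declared.

```python
import re

INTENT_NAME = re.compile(r"[A-Za-z_]+")   # letters and underscores only
SLOT_REF = re.compile(r"\{(\w+)\}")       # {slotName} inside an utterance

# Hypothetical travel intent: name, slots, and sample utterances.
plan_trip_intent = {
    "name": "PlanMyTripIntent",
    "slots": [
        {"name": "toCity", "type": "AMAZON.US_CITY"},    # built-in slot type
        {"name": "travelDate", "type": "AMAZON.DATE"},   # built-in slot type
    ],
    "samples": [
        "plan a trip to {toCity}",
        "i want to travel to {toCity} on {travelDate}",
    ],
}


def validate_intent(intent):
    """Return a list of problems found in a custom intent definition."""
    problems = []
    if not INTENT_NAME.fullmatch(intent["name"]):
        problems.append("intent name may contain only letters and underscores")
    declared = {slot["name"] for slot in intent.get("slots", [])}
    for sample in intent.get("samples", []):
        for ref in SLOT_REF.findall(sample):
            if ref not in declared:
                problems.append(f"utterance uses undeclared slot: {ref}")
    return problems
```

Here `validate_intent(plan_trip_intent)` returns an empty list, while an intent named "Plan Trip 2" is rejected for its spaces and digit.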
Alexa Skill Architecture
Alexa Voice Service (AVS)
- Cloud-based service that allows device makers to integrate an ever-increasing set of Alexa features and functions into a connected product
- AVS maps the user request to a skill and sends the skill the request in a structured format.
Alexa Skill Kit (ASK)
- Alexa Skills Kit lets you teach Alexa new skills
- provides APIs, tools, documentation and code samples
Progressive Responses
- Progressive responses allow you to keep the user engaged while the skill prepares a full response to the user’s request.
- Progressive responses can also reduce the user’s perception of latency in the skill’s response.
- A progressive response is interstitial SSML content (including text-to-speech and short audio) that Alexa plays while waiting for the full skill response
- Progressive response can be used to
- Send text-to-speech confirmations that your skill has received the request and is processing an answer.
- Play short soundmarks associated with your skill.
- Provide other engaging content to the users while waiting on the full response.
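As a rough illustration of how a skill might send a progressive response, the sketch below builds (but does not send) the HTTP request. It assumes the directive is posted to the `/v1/directives` path of the `apiEndpoint` found in the request's `context.System` object, authorized with the `apiAccessToken`; verify both against the Progressive Response API reference. The function name `build_progressive_response` is hypothetical.

```python
import json
import urllib.request


def build_progressive_response(envelope, speech):
    """Build the POST request for a progressive response (not sent here)."""
    system = envelope["context"]["System"]
    body = {
        # Ties the progressive response to the request being processed.
        "header": {"requestId": envelope["request"]["requestId"]},
        "directive": {"type": "VoicePlayer.Speak", "speech": speech},
    }
    return urllib.request.Request(
        url=system["apiEndpoint"] + "/v1/directives",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + system["apiAccessToken"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the skill then carries on preparing its full response.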
Speech Synthesis Markup Language (SSML)
- Alexa Skills Kit supports Speech Synthesis Markup Language (SSML) to control how Alexa interprets the speech from the text in the response
- SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech.
- Alexa Skills Kit supports a subset of the tags defined in the SSML specification
- amazon:effect
  - only supports whispered
- audio
  - allows playing an MP3 file while rendering a response
  - must be hosted at an Internet-accessible HTTPS endpoint. Self-signed certificates cannot be used.
  - must not contain any customer-specific or other sensitive information
  - must be a valid MP3 file (MPEG version 2).
  - cannot be longer than 240 seconds.
  - bit rate must be 48 kbps.
  - sample rate must be 22050Hz, 24000Hz, or 16000Hz.
- phoneme
  - provides a phonemic/phonetic pronunciation for the contained text
- prosody
  - modifies the volume, pitch, and rate of the tagged speech.
- say-as
  - describes how the text should be interpreted, using the interpret-as attribute, e.g. date, time, telephone, digits etc.
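The tags above can be composed with plain string helpers. This is a minimal sketch: the helper names (`whisper`, `say_as`, `audio`, `speak`, `ssml_output_speech`) are hypothetical, and only the HTTPS requirement for audio is checked, not the MP3 format limits.

```python
from xml.sax.saxutils import escape, quoteattr


def whisper(text):
    return '<amazon:effect name="whispered">' + escape(text) + "</amazon:effect>"


def say_as(text, interpret_as):
    return "<say-as interpret-as=" + quoteattr(interpret_as) + ">" + escape(text) + "</say-as>"


def audio(url):
    # Audio files must be served from an HTTPS endpoint.
    if not url.startswith("https://"):
        raise ValueError("audio src must be an HTTPS URL")
    return "<audio src=" + quoteattr(url) + "/>"


def speak(*parts):
    """Wrap SSML fragments in the root <speak> element."""
    return "<speak>" + " ".join(parts) + "</speak>"


def ssml_output_speech(ssml):
    """Shape of an outputSpeech object that carries SSML instead of plain text."""
    return {"type": "SSML", "ssml": ssml}
```

For example, `speak(whisper("hi"))` produces `<speak><amazon:effect name="whispered">hi</amazon:effect></speak>`.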
Personalization
- Skill personalization enables a skill to distinguish an individual user who has a voice profile.
- Skill personalization can be provided using userId or personId
- Alexa Settings APIs allow developers to retrieve customer preferences for settings like time zone, distance measuring unit, and temperature measurement unit
- With Device services, a skill can request the customer’s permission to access their address information, which is static data filled in by the customer and includes the country/region, postal code and full address
- With Customer Profile services, a skill can request the customer’s permission to access their contact information, which includes name, email address and phone number
- With Location services, a skill can ask a user’s permission to obtain the real-time location of their Alexa-enabled device, specifically at the time of the user’s request to Alexa, so that the skill can provide enhanced services.
- Requirements
- must include a link to the Privacy Policy that applies to the skill,
- if the skill is child-directed, it cannot use personalization.
- if the skill uses information protected by HIPAA (Health Insurance Portability and Accountability Act), it cannot use personalization.
- Do not use personalization to handle sensitive user information.
- Personalization is not authentication.
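A skill can choose between personId and userId per request. The sketch below assumes the recognized speaker, when there is one, appears as a `person` object under `context.System` alongside `user`; the function name `resolve_speaker` is hypothetical. As the notes above stress, the result is a personalization key, not proof of identity.

```python
def resolve_speaker(envelope):
    """Return ("person", id) for a recognized voice profile, else ("user", id)."""
    system = envelope.get("context", {}).get("System", {})
    person = system.get("person") or {}
    user = system.get("user") or {}
    if person.get("personId"):
        return ("person", person["personId"])
    # Fall back to the account-level identifier.
    return ("user", user.get("userId"))
```

The returned pair can key per-speaker preferences, falling back to account-level data when no voice profile is recognized.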
Service Endpoint
- Requests can be processed using AWS Lambda or any Webservice hosted on cloud or on premises.
- Requirements for custom service endpoints
- must be accessible over the internet.
- must accept HTTP requests on port 443.
- must support HTTP over SSL/TLS, using an Amazon-trusted certificate.
- must verify that incoming requests come from Alexa.
- must adhere to the Alexa Skills Kit interface.
AWS Other components
- Service endpoints can be implemented using AWS Lambda
- Lambda has a default timeout of 3 seconds and a maximum of 15 minutes
- Lambda has a default memory of 128 MB
- Lambda has a soft concurrency limit of 1000, which can be increased by raising an AWS support ticket.
- CloudWatch can be used for monitoring and logs
- Lambda logs are stored in CloudWatch
- DynamoDB can be used for state persistence
Alexa Skill Lifecycle
Development
- ASK Command Line Interface (CLI)
- provides command line interface to test the skill with ASK CLI commands such as invoke-skill and simulate-skill.
- Skill Management API – SMAPI
- provides RESTful HTTP endpoints for testing
- helps manage skills programmatically
- Alexa Developer Console
- provides a user interface for skill development
- Access can be shared across multiple users for collaboration
- First user associated with an Alexa developer account is considered the owner and will retain full rights to administer the developer account.
- Additional users can be invited to have access to the developer account and will have the rights associated with the role(s) assigned to the user.
- All roles will grant users the full access to create, modify and delete Alexa skills using the developer console.
- Administrator: This role grants complete access to all sections of the developer account, including reporting and payment information. Most importantly, any account administrator has the ability to manage user permissions, including inviting or removing users from the account.
- Developer: Outside of an Administrator, this is the only role that gives users the ability to submit and adjust application files.
- Marketer: Outside of an Administrator, this is the only role that gives users the ability to edit the content associated with apps (i.e. Descriptions, Images & Multimedia) and IAPs. Like the Analyst, this role also gives access to sales reports.
- Analyst: Outside of an Administrator, this is the only role that gives users the ability to view earnings reports. Like the Marketer, this role also gives users access to sales reports.
Build
- Use Build to set up the skill, configure the interaction model, and specify the endpoints for your service.
Test
- Use Test to test the skill with either text or voice.
- Utterance profiles
- Use utterance profiles to test the custom interaction model.
- enter utterances to see how they resolve to the intents and slots before you write the code for your service.
- Alexa Skill simulator
- Use the simulator provided on the Test page in the developer console.
- provides an ability to Interact with Alexa with either your voice or text, without an actual device.
- maintains the skill session with the skill just as a device would, so the interaction model and dialog flow can be tested.
- can send any cards that the skill returns to the Alexa app the same way a device would.
- supports testing in multiple languages by selecting the desired language from the drop-down list.
- Manual JSON
- enter a JSON request directly and see the JSON response the skill returns
- does not maintain the skill session and is similar to testing a JSON request in the Lambda console.
- Voice & Tone
- enter plain text or SSML and hear how Alexa speaks the text in a selected language
- Alexa device
- Test with an Alexa-enabled device.
- Alexa app
- Test the skill with the Alexa app for Android/iOS
Distribution
- Use Distribution to preview how the skill will appear in the skill store.
- Distribution lets you provide more information about the skill before it is published, including
- information about the skill, description, icons, keywords, categories etc.
- privacy policy URL if the skill requires account linking or collects user information
- skill availability: whether it is public, for business organizations, or in beta test
- country, region and locale information
- Skill beta testing tool
- used to test the Alexa skill in beta before releasing it to production
- test changes to an existing skill, while still keeping the currently live version of the skill available for the general public.
- members can be invited using their Alexa email address. Alexa device used by the beta tester must be associated with the email address in the tester’s invitation.
- can help increase your chances of skill success.
Certification & Publish
- Use Certification to validate the skill, run pre-certification tests, and then submit the skill for certification.
- Alexa skill must pass the certification process, when submitted to the Alexa skill store, before it’s published live to Amazon customers.
- Before submitting the new skill for certification, proper quality assurance testing and if required beta testing must be done to ensure customers have a good experience.
- The skill can be validated, and a set of functional pre-certification tests can be run on it; these provide immediate feedback on common certification failures
- Certification may fail for reasons like
- Child directed skills cannot sell any products and cannot collect any personal information
- Cannot collect any information related to health
Status | Description | Stage |
---|---|---|
In Development | The skill is available to you and any potential beta testers that you have added to skill beta testing. If you have enabled your skill for testing, a user can invoke your skill on any devices registered to your developer account, or on any devices registered to your beta testers’ accounts. | development |
In Review | A certification review is in progress. During this time, you cannot edit the skill configuration. | development |
Certified | The skill has passed certification review, and is not yet available to users. To make the skill available to users, publish the skill. If you have not published the skill and want to start a new certification review, you must first withdraw the certified version of the skill. | certified |
Live | The skill has been published and is available to users. You cannot edit the configuration for live skills. To start development on an updated version, make your changes on the development version instead. | live |
Hidden | The skill was previously published, but has since been hidden. Users who enabled the skill before it was hidden can continue to use the skill. The skill is no longer available when users search or browse the Alexa Skills Store. | live |
Removed | The skill was previously published, but has since been removed. Users cannot enable or use the skill. | live |
Analytics
- Use Analytics to review metrics for the skill such as number of utterances, customers, and intents invoked.
- Intent History – View aggregated, anonymized frequent utterances and the resolved intents.
- Available Skill Metrics
- Actions – Unique customers per action, total actions, and total utterances per action.
- Customers – Total number of unique customers who accessed the skill.
- Intents – Unique customers per intent, total utterances per intent, total intents, and failed intents.
- Interaction Path – Paths users take when interacting with the skill.
- Plays – Total number of times that a user played the skill content.
- Retention (live skills only) – Usage of the skill over time by groups of customers or cohorts. View the number or percentage of customers who returned to your skill over a 12-week period.
- Sessions – Total sessions, successful session types (sessions that didn’t end due to an error), average sessions per customer. Includes a breakdown of successful, failed, and no-response sessions as a percentage of total sessions.
- Utterances – Metrics for utterances depend on the skill category.
Edit and Recertify
- Once a skill is published to users, it is considered live.
- A development version is automatically created as a copy of the live version and has the same information as the original live version
- Live skill configuration cannot be edited
- For updates it is recommended to update the development version, test, apply for re-certification and publish it.
- Once the new version is published, it becomes live and replaces the previous live version.
Alexa Account Linking
- Account linking enables the skill to connect the skill user’s Amazon identity with an identity in a different system, e.g. Uber, Twitter etc.
- Alexa Skills Kit uses the OAuth 2.0 authorization framework for account linking, which defines a means by which the service can allow Alexa, with the user’s permission, to access information from the account that the user has set up with you.
- “Link accounts” means “to get the user’s permission to obtain an access token” so that the skill can use the access token in API calls to the server that contains the user data
- Grants are ways for a client application (in this case, an Alexa skill) to authorize the user and obtain an access token that it can use to authenticate a request to the resource server.
- OAuth 2.0 defines a number of grant types.
  - Authorization code grant type
    - works as a two-step process
      - Alexa gets an authorization code from the authorization server,
      - exchanges it for an access token, and then passes the access token in requests to your skill.
    - is the recommended grant type for security and usability reasons.
  - Implicit grant type
    - authorization server returns the access token once the user logs in
    - is less secure
    - Only custom skills can use the implicit grant type.
  - Authorization code grant is applicable for the vast majority of cases and the implicit grant is for limited use.
- Account linking is not supported for all skill types, e.g. flash briefing skills
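Inside the skill, account linking usually comes down to checking whether the request carries an access token and prompting the user to link if it does not. The following is a minimal sketch under that assumption; the function name `require_linked_account` and the prompt wording are hypothetical.

```python
def require_linked_account(envelope):
    """Return None when an access token is present, else a link-account response."""
    user = envelope.get("context", {}).get("System", {}).get("user", {})
    if user.get("accessToken"):
        return None  # linked: the caller can use the token in API calls
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": "Please link your account using the Alexa app.",
            },
            # Card that lets the user start account linking in the Alexa app.
            "card": {"type": "LinkAccount"},
            "shouldEndSession": True,
        },
    }
```

A handler would call this first and return the result immediately when it is not `None`.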
Alexa In-Skill Purchasing
- In-skill purchasing enables selling premium content such as game features and interactive stories in skills with a custom interaction model.
- Customers pay for products using the payment options associated with their Amazon account.
- Alexa In-Skill purchasing is handled by Alexa and the skill session ends when the purchase flow starts
- Amazon handles the voice interaction model and all the mechanics of the purchase, as well as obtaining the product description and list price from the product’s schema. The message and price are automatically adjusted for Prime customers.
- When the purchase completes the skill will be re-launched, and a purchase result is supplied to the skill.
- Because the skill session ends when an Upsell directive is sent, be sure to save any relevant user data in a persistent data store so that the skill can continue where the user left off after the purchase flow is completed and the endpoint is back in control of the user experience.
- The skill can handle the Connections.Response request, which indicates the result of a purchase flow, and resume the skill.
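The hand-off and resume steps can be sketched as two small functions. The directive shape (a `Connections.SendRequest` named `Upsell`) and the `purchaseResult` field in the `Connections.Response` payload are written from memory of the in-skill purchasing interface and should be checked against the reference; the function names and the in-memory `saved_state` dict (standing in for a persistent store such as DynamoDB) are assumptions for illustration.

```python
saved_state = {}  # stand-in for a persistent store; hypothetical


def start_upsell(user_id, progress, product_id, upsell_message):
    """Save the user's progress, then hand control to Alexa's purchase flow."""
    saved_state[user_id] = progress  # the session ends once the directive is sent
    return {
        "version": "1.0",
        "response": {
            "directives": [{
                "type": "Connections.SendRequest",
                "name": "Upsell",
                "payload": {
                    "InSkillProduct": {"productId": product_id},
                    "upsellMessage": upsell_message,
                },
                "token": user_id,  # echoed back in the Connections.Response
            }],
            "shouldEndSession": True,
        },
    }


def resume_after_purchase(user_id, request):
    """Return (purchase result, restored progress) for a Connections.Response."""
    if request.get("type") != "Connections.Response":
        raise ValueError("not a Connections.Response request")
    result = request.get("payload", {}).get("purchaseResult")
    return result, saved_state.get(user_id)
```

The skill can branch on the returned result (for example ACCEPTED or DECLINED) and continue from the restored progress.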
Alexa Built-in Intents
- Standard built-in intents cannot include any slots. If slots are needed, create a custom intent and write your own sample utterances.
- Amazon recommends using built-in intents, extended with additional skill-specific utterances where needed, as they provide better coverage than sample utterances written manually.
AMAZON.CancelIntent
- should normally just exit the skill.
- can be mapped to return to the skill instead of exiting, if need be
AMAZON.StopIntent
- must be implemented by the skill
- shouldEndSession must be true or null in the response
AMAZON.HelpIntent
- provides help about how to use the skill.
- can be extended by adding custom sample utterances
AMAZON.FallbackIntent
- provides a fallback for user utterances that do not match any of the skill’s defined intents
- is considered when the user’s spoken input cannot be matched with confidence to any of the other intents in the skill
- is designed as an out-of-domain model that can pick up user input that does not fit into your skill’s intended design.
- can help the skill handle many utterances that may not confidently map to the skill’s defined sample utterances and intents.
- is not normally triggered if the dialog is delegated to Alexa
AMAZON.PauseIntent and AMAZON.ResumeIntent
- must be implemented if the skill streams audio using the AudioPlayer interface.
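A skill's handler typically routes these built-in intents by name; in the request JSON they arrive with the AMAZON. prefix. The sketch below shows one possible routing, with hypothetical helper names (`respond`, `handle_builtin_intent`) and placeholder wording.

```python
def respond(text, end_session, reprompt=None):
    """Build a minimal response envelope with plain-text speech."""
    response = {
        "outputSpeech": {"type": "PlainText", "text": text},
        "shouldEndSession": end_session,
    }
    if reprompt:
        response["reprompt"] = {"outputSpeech": {"type": "PlainText", "text": reprompt}}
    return {"version": "1.0", "response": response}


def handle_builtin_intent(intent_name):
    """Return a response for the built-in intents handled here, else None."""
    if intent_name in ("AMAZON.StopIntent", "AMAZON.CancelIntent"):
        # Stop must end the session; Cancel is treated the same way here.
        return respond("Goodbye.", end_session=True)
    if intent_name == "AMAZON.HelpIntent":
        return respond("You can ask me to plan a trip.", False,
                       reprompt="What would you like to do?")
    if intent_name == "AMAZON.FallbackIntent":
        return respond("Sorry, I can't help with that.", False,
                       reprompt="Try asking for help.")
    return None  # leave custom intents to the skill's own handlers
```

Keeping the session open after Help and Fallback gives the user a chance to rephrase, while Stop and Cancel end it.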
Alexa Card Types
- A Simple card displays plain text and can be provided with text for the card title and content.
- A Standard card also displays plain text, but can include an image; it can be provided with text for the title and content, and the URL of the image to display.
- A LinkAccount card is a special card type only used with account linking. This card lets users start the account linking process.
- An AskForPermissionsConsent card is sent to the Alexa app when a skill requires the customer to grant specific permissions.
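In the response JSON each card type is a small object under the `card` key. The builders below are a sketch with hypothetical function names; the field names follow the card formats as commonly documented (`content` for Simple, `text` and `image` for Standard, `permissions` for AskForPermissionsConsent) and are worth confirming against the response reference.

```python
def simple_card(title, content):
    return {"type": "Simple", "title": title, "content": content}


def standard_card(title, text, small_image_url, large_image_url):
    return {
        "type": "Standard",
        "title": title,
        "text": text,
        "image": {"smallImageUrl": small_image_url, "largeImageUrl": large_image_url},
    }


def link_account_card():
    return {"type": "LinkAccount"}


def permissions_card(permissions):
    # permissions: scope strings the skill needs the customer to grant
    return {"type": "AskForPermissionsConsent", "permissions": list(permissions)}
```

For instance, `permissions_card(["alexa::profile:email:read"])` asks the customer to grant access to their email address, assuming that scope is configured for the skill.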
Alexa AudioPlayer Interface
- Requires AMAZON.PauseIntent and AMAZON.ResumeIntent to be implemented
- Uses PlaybackController events to track AudioPlayer status changes initiated from the device buttons
Alexa Dialog Management
- Alexa Dialog management model identifies the prompts and utterances to collect, validate, and confirm the slot values and intents.
- When the dialog is delegated to Alexa, Alexa determines the next step in the conversation and uses prompts to ask the user for the information.
- Two ways to delegate the dialog
  - Enable auto delegation, either for the entire skill or for specific intents.
    - Alexa completes all of the dialog steps based on the dialog model.
    - Alexa sends the skill a single IntentRequest when the dialog is complete.
  - Delegate manually with the Dialog.Delegate directive.
    - Alexa sends the skill an IntentRequest for each turn of the conversation
    - Skill returns the Dialog.Delegate directive for an incomplete dialog, telling Alexa to check the dialog model for the next step and use a prompt to ask the user for more information as needed.
    - Once all the steps are complete, the skill receives the final IntentRequest with dialogState set to COMPLETED.
    - provides flexibility as the skill can make run-time decisions such as defaulting values.
    - can be used in combination with other Dialog directives to take complete control over the dialog
- Dialog management requires shouldEndSession to be set to false
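Manual delegation can be sketched as a handler that keeps returning Dialog.Delegate until dialogState is COMPLETED, defaulting a slot value along the way. The intent, the slot names (`toCity`, `cabinClass`), and the default value are hypothetical, and passing the modified intent back as `updatedIntent` is an assumption about the directive that should be checked against the Dialog interface reference.

```python
import copy


def handle_trip_intent(request):
    """Delegate the dialog to Alexa until complete, defaulting cabinClass."""
    intent = copy.deepcopy(request["intent"])  # avoid mutating the incoming request
    slots = intent.setdefault("slots", {})
    if request.get("dialogState") != "COMPLETED":
        directive = {"type": "Dialog.Delegate"}
        cabin = slots.setdefault("cabinClass", {"name": "cabinClass"})
        if not cabin.get("value"):
            # Run-time decision: fill a default so Alexa does not ask for it.
            cabin["value"] = "economy"
            directive["updatedIntent"] = intent
        return {
            "version": "1.0",
            "response": {"directives": [directive], "shouldEndSession": False},
        }
    text = "Booking a trip to {} in {}.".format(
        slots["toCity"]["value"], slots["cabinClass"]["value"])
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```

With auto delegation enabled instead, only the final branch would be needed, since the skill receives a single completed IntentRequest.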
Alexa Request and Response
- Request includes the session (optional), context, and request objects at the top level.
  - session object provides additional context associated with the request.
    - session attributes can be used to store data
    - user contains the userId to uniquely identify a user and the accessToken to access other services.
  - context object provides the skill with information about the current state of the Alexa service and device at the time the request is sent to the service.
    - System object provides apiAccessToken, and its device object provides deviceId, to access ASK APIs
    - application provides applicationId
    - device object provides supportedInterfaces to list each interface that the device supports
    - user contains the userId to uniquely identify a user and the accessToken to access other services.
  - request object provides the details of the user’s request.
- Response includes
  - outputSpeech contains the speech to render to the user.
  - reprompt contains the outputSpeech to use if a re-prompt is necessary.
  - shouldEndSession provides a boolean value that indicates what should happen after Alexa speaks the response.
    - true – the session ends.
    - false – Alexa opens the microphone for a few seconds to listen for the user’s response and plays the reprompt, if included, to give the user a second chance to respond.
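Putting the request and response shapes together, a bare-bones AWS Lambda handler might look like the sketch below. It reads session attributes from the incoming envelope, counts turns as an example of in-session state, and returns them under the top-level `sessionAttributes` key. The `turns` counter and the wording are hypothetical; a real skill would more likely use the ASK SDK than hand-built dicts.

```python
def build_response(text, attributes, reprompt=None, end_session=True):
    """Assemble a response envelope with speech and session attributes."""
    response = {
        "outputSpeech": {"type": "PlainText", "text": text},
        "shouldEndSession": end_session,
    }
    if reprompt:
        response["reprompt"] = {"outputSpeech": {"type": "PlainText", "text": reprompt}}
    return {"version": "1.0", "sessionAttributes": attributes, "response": response}


def lambda_handler(event, context=None):
    session = event.get("session") or {}  # session is optional in the envelope
    attributes = dict(session.get("attributes") or {})
    attributes["turns"] = attributes.get("turns", 0) + 1
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # Keep the session open and re-prompt if the user stays silent.
        return build_response("Welcome. What would you like to do?", attributes,
                              reprompt="You can say help.", end_session=False)
    if request["type"] == "IntentRequest":
        name = request["intent"]["name"]
        return build_response("You asked for " + name + ".", attributes)
    # Other request types (e.g. SessionEndedRequest) get an empty response.
    return {"version": "1.0", "response": {}}
```

Because the attributes are echoed back on every turn, the next request in the same session arrives with the updated `turns` value.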
Alexa Best Practices
- Alexa Skill state persistence can be handled using session attributes during the session and externally using services like DynamoDB, S3 (for hosted skills), and RDS across sessions.
- Verify that incoming requests come from Alexa using Skill ID verification to ensure request came from the intended skill.
- prevents a malicious developer from configuring a skill with your endpoint and then using that skill to send requests to your service.
- To do this validation, every request sent by Alexa includes a unique skill ID. Skill ID in the request can be checked against the actual skill ID to ensure that the request was intended for your service.
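The skill ID check described above takes only a few lines. This sketch assumes the skill ID arrives as `applicationId` in the `application` object (under `context.System`, and under `session` when present); the constant `EXPECTED_SKILL_ID` is a placeholder and the function names are hypothetical.

```python
# Placeholder: replace with the skill ID shown in the developer console.
EXPECTED_SKILL_ID = "amzn1.ask.skill.00000000-0000-0000-0000-000000000000"


def request_skill_id(envelope):
    """Read the skill ID from context.System, falling back to session."""
    context_app = envelope.get("context", {}).get("System", {}).get("application", {})
    session_app = (envelope.get("session") or {}).get("application", {})
    return context_app.get("applicationId") or session_app.get("applicationId")


def is_from_expected_skill(envelope, expected=EXPECTED_SKILL_ID):
    """True only when the request was intended for this skill."""
    return request_skill_id(envelope) == expected
```

A handler would reject the request, for example by raising an error, when `is_from_expected_skill` returns False.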