Improving Bot Performance
Master techniques for monitoring, analyzing, and enhancing your conversational AI's performance through better training data, testing, and feedback loops.
Learning Objectives
- Apply best practices for training data preparation
- Implement utterance expansion and synonym recognition
- Design and execute A/B testing for conversational interfaces
- Create effective feedback loops for continuous improvement
- Integrate Lex with Amazon Kendra for enhanced knowledge retrieval
Training Data Best Practices
The quality of your training data directly impacts the performance of your conversational AI. Well-prepared training data leads to better intent recognition, more accurate slot filling, and ultimately a more satisfying user experience.
Data Collection Strategies
Effective training data collection involves gathering diverse, representative examples of how users might express their intents:
- User Research: Conduct interviews and surveys to understand how users naturally express their needs
- Wizard of Oz Testing: Simulate bot interactions with human operators to gather realistic conversations
- Log Analysis: Analyze logs from existing systems or customer service interactions
- Competitor Analysis: Study how users interact with similar conversational interfaces
- Crowdsourcing: Use platforms like Mechanical Turk to gather diverse expressions
Training Data Collection Methods
Proactive Methods
- User interviews and surveys
- Wizard of Oz testing
- Guided data generation sessions
- Crowdsourcing platforms
Reactive Methods
- Production system logs
- Missed utterance analysis
- Customer service transcripts
- User feedback collection
Utterance Diversity
Diverse training utterances help your bot understand various ways users might express the same intent. Ensure your training data includes:
- Linguistic Variations: Different sentence structures and phrasings
- Vocabulary Differences: Various synonyms and terminology
- Length Variations: Both short commands and longer, more conversational requests
- Question vs. Statement Forms: Both interrogative and declarative forms
- Formal vs. Informal Language: Different levels of formality
For example, for a "CheckBalance" intent, include variations like:
- "What's my account balance?"
- "Show me how much money I have"
- "Balance please"
- "I need to check my balance"
- "Can you tell me my current account balance?"
Handling Regional Variations
If your bot serves users across different regions, consider regional language variations:
- Dialect Differences: Include utterances reflecting different regional dialects
- Regional Terminology: Account for region-specific terms (e.g., "soda" vs. "pop")
- Spelling Variations: Include different spelling conventions (e.g., "color" vs. "colour")
- Date and Number Formats: Consider different formats for dates, times, and numbers
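One lightweight way to handle spelling variations in practice is to normalize regional spellings to a single canonical form before training. Below is a minimal Python sketch; the variant map is illustrative, not exhaustive:

Normalizing Regional Spellings

import re

# Hypothetical map of regional spellings to a canonical (US) form
SPELLING_VARIANTS = {
    'colour': 'color',
    'favourite': 'favorite',
    'cancelled': 'canceled',
    'cheque': 'check',
}

def normalize_spelling(utterance):
    # Replace whole-word regional variants with their canonical spelling
    def replace(match):
        return SPELLING_VARIANTS[match.group(0).lower()]
    pattern = r'\b(' + '|'.join(SPELLING_VARIANTS) + r')\b'
    return re.sub(pattern, replace, utterance, flags=re.IGNORECASE)

print(normalize_spelling("What is my favourite colour?"))
# -> "What is my favorite color?"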
Data Cleaning and Preparation
Before using collected data for training, it's important to clean and prepare it:
- Remove Duplicates: Eliminate exact duplicate utterances
- Fix Errors: Correct obvious spelling and grammatical errors
- Normalize Format: Ensure consistent formatting
- Remove Personally Identifiable Information (PII): Protect user privacy
- Balance Intent Distribution: Ensure adequate examples for each intent
Python Script for Training Data Preparation
# Example Python script for training data preparation
import re
from collections import Counter

import nltk
import pandas as pd
from nltk.corpus import stopwords

# Load raw training data
df = pd.read_csv('raw_training_data.csv')

# Remove duplicates
df = df.drop_duplicates(subset=['utterance'])

# Define PII patterns up front: redaction must run before other cleaning,
# while separators such as '@', '.', and '-' are still intact
pii_patterns = [
    r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b',                   # Phone numbers
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',  # Emails
    r'\b\d{3}-?\d{2}-?\d{4}\b'                              # SSNs
]

def remove_pii(text):
    for pattern in pii_patterns:
        text = re.sub(pattern, '[REDACTED]', text)
    return text

# Basic cleaning
def clean_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    # Remove special characters (but keep question marks)
    text = re.sub(r'[^\w\s\?]', '', text)
    return text

# Redact PII first, then clean
df['cleaned_utterance'] = df['utterance'].apply(remove_pii).apply(clean_text)

# Analyze intent distribution
intent_counts = df['intent'].value_counts()
print("Intent distribution:")
print(intent_counts)

# Identify intents with too few examples
min_examples = 10
low_data_intents = intent_counts[intent_counts < min_examples].index.tolist()
print(f"Intents with fewer than {min_examples} examples: {low_data_intents}")

# Analyze utterance length distribution
df['word_count'] = df['cleaned_utterance'].apply(lambda x: len(x.split()))
print("Utterance length statistics:")
print(df['word_count'].describe())

# Check for common words by intent
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

def get_common_words(intent_name):
    intent_utterances = df[df['intent'] == intent_name]['cleaned_utterance']
    words = []
    for utterance in intent_utterances:
        words.extend(word for word in utterance.split() if word not in stop_words)
    return Counter(words).most_common(10)

for intent in df['intent'].unique():
    print(f"\nMost common words for intent '{intent}':")
    print(get_common_words(intent))

# Save cleaned data
df.to_csv('cleaned_training_data.csv', index=False)
Utterance Expansion & Synonyms
Even with thorough data collection, it's challenging to anticipate all the ways users might express their intents. Utterance expansion techniques can help broaden your bot's understanding.
Techniques for Utterance Expansion
Several approaches can help you systematically expand your training utterances:
- Pattern-Based Generation: Create templates with variable components
- Synonym Substitution: Replace key words with synonyms
- Word Order Variation: Rearrange sentence elements while preserving meaning
- Contraction/Expansion: Add or remove contractions (e.g., "I am" vs. "I'm")
- Paraphrasing Tools: Use NLP tools to generate paraphrases
Utterance Expansion Example
Original Utterance: "I want to book a flight to New York"
Pattern-Based
- "I want to book a flight to [CITY]"
- "I need to book a flight to [CITY]"
Synonym Substitution
- "I want to reserve a flight to New York"
- "I want to purchase a ticket to New York"
Word Order Variation
- "To New York I want to book a flight"
- "A flight to New York is what I want to book"
Question Form
- "Can I book a flight to New York?"
- "How do I book a flight to New York?"
Implementing Synonym Recognition
Synonym recognition helps your bot understand variations in terminology. In Amazon Lex, you can implement this through:
- Slot Synonyms: Define synonyms for slot values
- Multiple Utterances: Include utterances with different synonyms
- Custom Slot Types: Define custom slot types with synonym values
For example, for a "PaymentMethod" slot, you might define synonyms like:
- "credit card" → "card", "visa", "mastercard", "amex"
- "bank transfer" → "wire transfer", "direct deposit", "ach"
- "paypal" → "online payment", "digital wallet"
Custom Slot Type with Synonyms in Lex
{
  "slotTypes": [
    {
      "name": "PaymentMethod",
      "description": "Types of payment methods",
      "valueSelectionStrategy": "TOP_RESOLUTION",
      "slotTypeValues": [
        {
          "sampleValue": {
            "value": "credit card"
          },
          "synonyms": [
            "card",
            "visa",
            "mastercard",
            "amex",
            "credit",
            "plastic"
          ]
        },
        {
          "sampleValue": {
            "value": "bank transfer"
          },
          "synonyms": [
            "wire transfer",
            "direct deposit",
            "ach",
            "wire",
            "bank payment"
          ]
        },
        {
          "sampleValue": {
            "value": "paypal"
          },
          "synonyms": [
            "online payment",
            "digital wallet",
            "electronic payment",
            "online wallet"
          ]
        }
      ]
    }
  ]
}
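The JSON above uses the Lex V1 format. If you build with Lex V2, an equivalent slot type can be created through the model-building API; the sketch below uses boto3, with the bot ID as a placeholder (note that V2 represents each synonym as an object with a value key):

Creating the Slot Type with boto3 (Lex V2)

import boto3

# Lex V2 model-building client
lex_models = boto3.client('lexv2-models')

response = lex_models.create_slot_type(
    botId='YOUR_BOT_ID',  # placeholder
    botVersion='DRAFT',
    localeId='en_US',
    slotTypeName='PaymentMethod',
    description='Types of payment methods',
    valueSelectionSetting={'resolutionStrategy': 'TopResolution'},
    slotTypeValues=[
        {
            'sampleValue': {'value': 'credit card'},
            'synonyms': [{'value': s} for s in
                         ['card', 'visa', 'mastercard', 'amex', 'credit', 'plastic']],
        },
        {
            'sampleValue': {'value': 'bank transfer'},
            'synonyms': [{'value': s} for s in
                         ['wire transfer', 'direct deposit', 'ach', 'wire', 'bank payment']],
        },
        {
            'sampleValue': {'value': 'paypal'},
            'synonyms': [{'value': s} for s in
                         ['online payment', 'digital wallet', 'electronic payment', 'online wallet']],
        },
    ],
)
print(response['slotTypeId'])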
Using Slot Catalogs Effectively
Slot catalogs in Amazon Lex provide pre-built slot types for common entities. To use them effectively:
- Leverage built-in slot types for common entities (dates, numbers, cities, etc.)
- Customize built-in slot types with additional values when needed
- Use slot resolution strategies appropriate for your use case
- Test slot recognition thoroughly with various inputs
Balancing Precision and Recall
When expanding utterances and implementing synonyms, it's important to balance precision (accuracy of intent matching) and recall (ability to recognize all relevant utterances):
- Too Few Utterances/Synonyms: Poor recall, many missed intents
- Too Many Broad Utterances/Synonyms: Poor precision, intent confusion
Strategies for finding the right balance include:
- Start with core, unambiguous utterances
- Gradually expand with clear variations
- Test regularly to identify confusion between intents
- Use confidence scores to identify borderline cases
- Implement fallback strategies for low-confidence matches
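Confidence scores can be inspected at runtime to surface borderline cases, as in the sketch below. It uses the Lex V2 runtime API via boto3; the bot identifiers are placeholders and the threshold is an assumption to tune against your own test data:

Flagging Low-Confidence Matches

import boto3

lex_runtime = boto3.client('lexv2-runtime')

CONFIDENCE_THRESHOLD = 0.7  # assumed value; tune against your own test set

def classify_with_fallback(text, session_id):
    response = lex_runtime.recognize_text(
        botId='YOUR_BOT_ID',             # placeholder
        botAliasId='YOUR_BOT_ALIAS_ID',  # placeholder
        localeId='en_US',
        sessionId=session_id,
        text=text,
    )
    interpretations = response.get('interpretations', [])
    if not interpretations:
        return None, 0.0

    # Interpretations are ordered by confidence; inspect the top one
    top = interpretations[0]
    score = top.get('nluConfidence', {}).get('score', 0.0)
    intent_name = top['intent']['intentName']

    # Log borderline matches for review instead of silently accepting them
    if score < CONFIDENCE_THRESHOLD:
        print(f"Low confidence ({score:.2f}) for '{text}' -> {intent_name}")
    return intent_name, score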
A/B Testing & Experimentation
A/B testing allows you to compare different versions of your conversational interface to determine which performs better. This data-driven approach is essential for continuous improvement.
Setting up A/B Tests for Conversations
To set up effective A/B tests for conversational interfaces:
- Define Clear Hypotheses: Specify what you're testing and why
- Create Variants: Develop different versions with specific changes
- Implement Traffic Splitting: Randomly assign users to variants
- Determine Sample Size: Ensure sufficient data for statistical significance
- Set Test Duration: Run tests long enough to gather reliable data
In Amazon Lex, you can implement A/B testing using:
- Different bot versions with aliases
- Traffic distribution across aliases
- Lambda routing logic for more complex scenarios
Lambda Function for A/B Test Routing
// Example Lambda function for A/B test routing
exports.handler = async (event) => {
    // Extract user ID or session ID
    const userId = event.userId || event.sessionId || generateRandomId();

    // Determine which variant to use (A or B)
    // Using a hash of the user ID for consistent assignment
    const variant = determineVariant(userId);

    // Log the assignment for analytics
    console.log(`User ${userId} assigned to variant ${variant}`);

    // Route to the appropriate bot alias based on variant
    if (variant === 'A') {
        // Route to variant A (e.g., original version)
        return routeToBotAlias(event, 'VariantA');
    } else {
        // Route to variant B (e.g., new version)
        return routeToBotAlias(event, 'VariantB');
    }
};

// Fallback ID generator for events with no user or session ID
function generateRandomId() {
    return Math.random().toString(36).slice(2);
}

// Function to consistently assign users to variants
function determineVariant(userId) {
    // Simple hash function to convert userId to a number
    let hash = 0;
    for (let i = 0; i < userId.length; i++) {
        hash = ((hash << 5) - hash) + userId.charCodeAt(i);
        hash |= 0; // Convert to 32-bit integer
    }
    // Use hash to determine variant (50/50 split)
    return (Math.abs(hash) % 2 === 0) ? 'A' : 'B';
}

// Function to route to specific bot alias
function routeToBotAlias(event, alias) {
    // Implementation depends on your architecture: it could call the Lex
    // runtime API with the specified alias, or return information that the
    // client uses to route appropriately. For this example, we just record
    // the alias in the session attributes.
    const sessionAttributes = event.sessionAttributes || {};
    sessionAttributes.testVariant = alias;
    return {
        sessionAttributes: sessionAttributes
        // Other response elements would go here
    };
}
Defining Success Metrics
Clear success metrics are essential for evaluating A/B test results. Common metrics for conversational interfaces include:
- Task Completion Rate: Percentage of conversations that successfully complete the intended task
- Conversation Length: Number of turns required to complete tasks
- Error Rate: Frequency of misunderstood inputs or fallbacks
- User Satisfaction: Explicit feedback or satisfaction scores
- Retention: Rate at which users return to use the bot again
- Conversion Rate: Percentage of conversations that lead to desired business outcomes
A/B Testing Scenarios
Common scenarios to test include prompt wording, conversation flow, and error handling.
Analyzing Results
When analyzing A/B test results:
- Check Statistical Significance: Ensure differences aren't due to random chance
- Consider Multiple Metrics: Look at the full picture, not just primary metrics
- Segment Results: Analyze performance across different user groups
- Look for Unexpected Effects: Check for unintended consequences
- Document Learnings: Record insights for future reference
Tools for analyzing results include:
- Statistical analysis libraries (e.g., SciPy, StatsModels)
- A/B testing platforms (e.g., Optimizely, VWO)
- Custom analytics dashboards
- Conversation analytics tools
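As a concrete illustration, a two-proportion z-test can check whether a difference in task completion rates between variants is statistically significant. This sketch uses StatsModels, and the counts are made up:

Significance Test for Completion Rates

from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: completed conversations / total conversations per variant
completions = [412, 458]      # variant A, variant B
conversations = [1000, 1000]

stat, p_value = proportions_ztest(count=completions, nobs=conversations)
print(f"Variant A completion rate: {completions[0] / conversations[0]:.1%}")
print(f"Variant B completion rate: {completions[1] / conversations[1]:.1%}")
print(f"z = {stat:.2f}, p = {p_value:.4f}")

# Common convention: treat p < 0.05 as statistically significant
if p_value < 0.05:
    print("The difference is unlikely to be due to random chance.")
else:
    print("Not enough evidence that the variants differ.")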
User Feedback Loops
Establishing effective feedback loops is crucial for continuously improving your conversational interface based on real user interactions.
Collecting Explicit Feedback
Explicit feedback involves directly asking users about their experience. Approaches include:
- End-of-Conversation Ratings: Simple thumbs up/down or star ratings
- Follow-up Questions: "Did I answer your question?" or "Was this helpful?"
- Short Surveys: Brief questions about specific aspects of the experience
- Feedback Commands: Allow users to provide feedback at any time
Best practices for collecting explicit feedback:
- Keep it simple and quick
- Ask at appropriate moments (usually after task completion)
- Make feedback optional
- Thank users for their feedback
- Follow up on negative feedback when possible
Analyzing Implicit Feedback
Implicit feedback involves analyzing user behavior without directly asking for feedback. Key indicators include:
- Conversation Abandonment: Users leaving conversations before completion
- Repeated Attempts: Users trying multiple times to express the same intent
- Correction Patterns: Users correcting the bot's understanding
- Escalation Requests: Users asking for human assistance
- Sentiment Changes: Shifts in user sentiment during conversations
Tools and techniques for analyzing implicit feedback:
- Conversation flow analysis
- Sentiment analysis
- Pattern recognition in conversation logs
- User session analysis
- Cohort analysis
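These signals can be computed directly from conversation logs. The sketch below assumes a hypothetical log schema with one row per turn (session_id, intent, and a completed flag on successful final turns) and an escalation intent named TalkToAgent; adapt the column and intent names to your own logging:

Implicit Feedback Signals from Logs

import pandas as pd

# Assumed schema: one row per turn with session_id, intent, completed
logs = pd.read_csv('conversation_logs.csv')
sessions = logs.groupby('session_id')

# Conversation abandonment: sessions that never reached completion
abandonment_rate = 1 - sessions['completed'].max().mean()

# Repeated attempts: sessions where the same intent fired three or more times
def has_repeated_attempts(group, threshold=3):
    return (group['intent'].value_counts() >= threshold).any()

repeat_rate = sessions.apply(has_repeated_attempts).mean()

# Escalation requests: sessions that triggered the assumed 'TalkToAgent' intent
escalation_rate = sessions['intent'].apply(lambda s: (s == 'TalkToAgent').any()).mean()

print(f"Abandonment rate: {abandonment_rate:.1%}")
print(f"Sessions with repeated attempts: {repeat_rate:.1%}")
print(f"Escalation rate: {escalation_rate:.1%}")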
Acting on Feedback Data
Collecting feedback is only valuable if you act on it. Effective approaches include:
- Prioritize Issues: Focus on high-impact, frequently occurring problems
- Root Cause Analysis: Identify underlying causes, not just symptoms
- Targeted Improvements: Make specific changes to address identified issues
- Measure Impact: Track metrics before and after changes
- Continuous Cycle: Establish an ongoing process of feedback and improvement
Feedback Loop Cycle
1. Collect Feedback: Gather explicit and implicit feedback from users
2. Analyze Patterns: Identify trends, issues, and opportunities
3. Prioritize Changes: Focus on high-impact improvements
4. Implement Updates: Make targeted changes to the bot
5. Measure Results: Evaluate the impact of changes
Using Lex with Kendra
Amazon Kendra is an intelligent search service that can significantly enhance your Lex bot's ability to answer questions by providing access to a knowledge base.
Introduction to Amazon Kendra
Amazon Kendra is a machine learning-powered search service that:
- Uses natural language processing to understand questions
- Indexes and searches across multiple document types and sources
- Returns precise answers, not just document links
- Learns and improves from user interactions
- Supports enterprise-grade security and access controls
Integrating Kendra with Lex allows your bot to:
- Answer questions beyond predefined intents
- Provide information from documents, FAQs, and knowledge bases
- Handle complex, information-seeking queries
- Reduce the need for human escalation
Setting up a Kendra Index
To use Kendra with Lex, you first need to set up a Kendra index:
- Create an Index: Set up a new Kendra index in the AWS console
- Configure Data Sources: Connect to your content repositories (S3, SharePoint, Salesforce, etc.)
- Add FAQs: Upload FAQ documents for direct question-answer matching
- Set Up Access Control: Configure security settings if needed
- Sync Data: Run initial synchronization to index your content
Best practices for Kendra index setup:
- Organize content logically by topic or domain
- Use metadata to enhance search relevance
- Include variations of common questions in FAQs
- Set up regular sync schedules to keep content fresh
- Monitor index performance and adjust as needed
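Index creation and FAQ upload can also be scripted. A sketch using boto3 follows; the role ARNs and bucket names are placeholders, and since index creation is asynchronous, in practice you would poll describe_index until the status is ACTIVE before adding FAQs:

Creating a Kendra Index and FAQ with boto3

import boto3

kendra = boto3.client('kendra')

# Create a development-edition index (the role ARN is a placeholder)
index = kendra.create_index(
    Name='support-knowledge-base',
    Edition='DEVELOPER_EDITION',
    RoleArn='arn:aws:iam::123456789012:role/KendraIndexRole',
)
index_id = index['Id']

# Index creation is asynchronous: poll describe_index until the
# Status is ACTIVE before attaching data sources or FAQs

# Upload an FAQ document from S3 for direct question-answer matching
kendra.create_faq(
    IndexId=index_id,
    Name='support-faqs',
    S3Path={'Bucket': 'my-kendra-content', 'Key': 'faqs/support-faq.csv'},
    RoleArn='arn:aws:iam::123456789012:role/KendraFaqRole',
)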
Connecting Lex and Kendra
There are two main approaches to integrating Lex with Kendra:
- AMAZON.KendraSearchIntent: A built-in intent type that automatically queries Kendra
- Custom Lambda Integration: More flexible approach using Lambda to query Kendra
Using AMAZON.KendraSearchIntent:
- Create a new intent with the AMAZON.KendraSearchIntent type
- Configure the Kendra index ID and query text
- Set up response templates for different result types
- Configure fallback behavior
Custom Lambda for Lex-Kendra Integration
// Example Lambda function for custom Lex-Kendra integration
const AWS = require('aws-sdk');
const kendra = new AWS.Kendra();

exports.handler = async (event) => {
    // Extract session attributes
    const sessionAttributes = event.sessionAttributes || {};

    // Get the user's question
    const question = event.inputTranscript;

    // Configure Kendra query parameters
    const params = {
        IndexId: process.env.KENDRA_INDEX_ID, // Set in Lambda environment variables
        QueryText: question,
        // Optional: add an AttributeFilter here to narrow results by document
        // attributes (e.g., document type or category). Omit it entirely when
        // unused; Kendra rejects an empty filter object.
        PageSize: 3 // Number of results to return
    };

    try {
        // Query Kendra
        const kendraResponse = await kendra.query(params).promise();

        // Process the response
        if (kendraResponse.ResultItems && kendraResponse.ResultItems.length > 0) {
            // Find the best answer
            const answer = findBestAnswer(kendraResponse.ResultItems);

            if (answer) {
                // Return the answer to the user
                return {
                    sessionAttributes: sessionAttributes,
                    dialogAction: {
                        type: 'Close',
                        fulfillmentState: 'Fulfilled',
                        message: {
                            contentType: 'PlainText',
                            content: formatKendraResponse(answer)
                        }
                    }
                };
            }
        }

        // No good answer found, provide a fallback response
        return {
            sessionAttributes: sessionAttributes,
            dialogAction: {
                type: 'Close',
                fulfillmentState: 'Fulfilled',
                message: {
                    contentType: 'PlainText',
                    content: "I'm sorry, I couldn't find a specific answer to your question. Would you like to try rephrasing or ask something else?"
                }
            }
        };
    } catch (error) {
        console.error('Error querying Kendra:', error);

        // Return error response
        return {
            sessionAttributes: sessionAttributes,
            dialogAction: {
                type: 'Close',
                fulfillmentState: 'Fulfilled',
                message: {
                    contentType: 'PlainText',
                    content: "I'm sorry, I encountered an error while searching for information. Please try again later."
                }
            }
        };
    }
};

// Helper function to find the best answer from Kendra results,
// preferring result types in descending order of confidence
function findBestAnswer(resultItems) {
    // First, check for ANSWER type results
    const answers = resultItems.filter(item => item.Type === 'ANSWER');
    if (answers.length > 0) {
        return answers[0]; // Return the top answer
    }

    // Next, check for QUESTION_ANSWER type results
    const qaResults = resultItems.filter(item => item.Type === 'QUESTION_ANSWER');
    if (qaResults.length > 0) {
        return qaResults[0]; // Return the top Q&A result
    }

    // Finally, check for DOCUMENT type results
    const documents = resultItems.filter(item => item.Type === 'DOCUMENT');
    if (documents.length > 0) {
        return documents[0]; // Return the top document result
    }

    return null; // No suitable results found
}

// Helper function to format a Kendra result for the user
function formatKendraResponse(result) {
    let response = '';

    switch (result.Type) {
        case 'ANSWER':
        case 'QUESTION_ANSWER':
            response = result.DocumentExcerpt.Text;
            break;
        case 'DOCUMENT':
            response = `I found this information that might help: ${result.DocumentExcerpt.Text}`;
            break;
    }

    // Add source attribution if available
    if (result.DocumentTitle && result.DocumentTitle.Text) {
        response += `\n\nSource: ${result.DocumentTitle.Text}`;
    }

    return response;
}
Optimizing Search Results
To improve the quality of Kendra search results in your Lex bot:
- Use Attribute Filters: Narrow search scope based on metadata
- Implement Query Preprocessing: Clean and enhance user queries before sending to Kendra
- Result Ranking: Develop custom logic to rank and select the best results
- Response Formatting: Present information in a conversational, digestible format
- Feedback Collection: Gather user feedback on search results to improve over time
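Several of these ideas fit in a few lines of code. The sketch below preprocesses the query and optionally narrows scope with an attribute filter on Kendra's built-in _category document attribute; the filler-word list is illustrative:

Query Preprocessing and Attribute Filtering

import boto3

kendra = boto3.client('kendra')

# Illustrative chat filler that adds noise to a search query
FILLER_PREFIXES = ('hey', 'hi', 'please', 'can you tell me', 'i want to know')

def preprocess_query(text):
    # Strip leading filler so Kendra sees the substantive question
    cleaned = text.strip().lower()
    for prefix in FILLER_PREFIXES:
        if cleaned.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip(' ,')
    return cleaned

def search(index_id, user_text, category=None):
    params = {
        'IndexId': index_id,
        'QueryText': preprocess_query(user_text),
        'PageSize': 3,
    }
    # Narrow scope with an attribute filter when a category is known
    if category:
        params['AttributeFilter'] = {
            'EqualsTo': {
                'Key': '_category',
                'Value': {'StringValue': category},
            }
        }
    return kendra.query(**params)['ResultItems']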
Kendra Result Types
- ANSWER: Direct answers extracted from documents; highest confidence. Example: "Our standard return policy allows returns within 30 days of purchase with original receipt."
- QUESTION_ANSWER: Matches from FAQ documents; high confidence for exact question matches. Example: "To reset your password, click on the 'Forgot Password' link on the login page and follow the instructions sent to your email."
- DOCUMENT: Relevant document excerpts; useful when direct answers aren't available. Example: "From 'Cloud Migration Guide': Begin with an assessment of your current infrastructure. Identify applications that are good candidates for early migration..."
Performance Measurement
Comprehensive performance measurement is essential for understanding how well your conversational interface is serving users and identifying areas for improvement.
Defining KPIs for Conversational Interfaces
Key Performance Indicators (KPIs) for conversational interfaces typically fall into several categories:
- Technical Performance: System uptime, response time, error rates
- Conversation Quality: Intent recognition accuracy, slot filling success, context maintenance
- User Experience: Task completion rate, conversation length, user satisfaction
- Business Impact: Conversion rates, cost savings, ROI
Specific KPIs might include:
- Intent recognition rate
- Slot filling accuracy
- Task completion rate
- Average turns per conversation
- Fallback/escalation rate
- User satisfaction score
- Retention rate
- Cost per conversation
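KPIs that Lex does not emit natively, such as task completion, can be published as custom CloudWatch metrics from your fulfillment code. A minimal sketch follows; the namespace and dimension names are assumptions:

Publishing a Custom KPI to CloudWatch

import boto3

cloudwatch = boto3.client('cloudwatch')

def record_task_completion(bot_name, intent_name, completed):
    # Publish a 1/0 data point; averaging these yields a completion rate
    cloudwatch.put_metric_data(
        Namespace='ConversationalAI',  # assumed namespace
        MetricData=[{
            'MetricName': 'TaskCompleted',
            'Dimensions': [
                {'Name': 'BotName', 'Value': bot_name},
                {'Name': 'Intent', 'Value': intent_name},
            ],
            'Value': 1.0 if completed else 0.0,
            'Unit': 'Count',
        }],
    )

# Example: called from a fulfillment Lambda after an intent finishes
record_task_completion('SupportBot', 'CheckBalance', completed=True)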
Measuring User Satisfaction
User satisfaction can be measured through:
- Explicit Ratings: Direct feedback from users
- Conversation Completion: Whether users complete their intended tasks
- Return Rate: How often users come back to use the bot
- Sentiment Analysis: Analyzing the emotional tone of user messages
- Escalation Rate: How often users ask for human assistance
Techniques for measuring satisfaction include:
- Post-conversation surveys
- In-conversation feedback requests
- User behavior analysis
- Sentiment analysis of conversations
- Focus groups and user interviews
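For the sentiment analysis technique in particular, Amazon Comprehend can score each user turn so you can track how sentiment shifts across a conversation. A minimal sketch (the sample turns are made up):

Per-Turn Sentiment with Amazon Comprehend

import boto3

comprehend = boto3.client('comprehend')

def conversation_sentiment(user_messages):
    # Score each user turn to track sentiment over the conversation
    scores = []
    for message in user_messages:
        result = comprehend.detect_sentiment(Text=message, LanguageCode='en')
        # Positive minus negative gives a rough per-turn score in [-1, 1]
        score = (result['SentimentScore']['Positive']
                 - result['SentimentScore']['Negative'])
        scores.append(score)
    return scores

turns = [
    "I need to check my balance",
    "That's not what I asked for",
    "Perfect, thanks!",
]
print(conversation_sentiment(turns))  # e.g., dipping negative, then recovering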
Creating Performance Dashboards
Performance dashboards provide at-a-glance visibility into your bot's performance. Effective dashboards typically include:
- High-Level KPI Summary: Key metrics at a glance
- Trend Analysis: Performance over time
- Intent and Slot Performance: Recognition rates and common issues
- Conversation Flow Visualization: Common paths and drop-off points
- User Feedback Summary: Aggregated user ratings and comments
Tools for creating dashboards include:
- Amazon CloudWatch Dashboards
- Amazon QuickSight
- Tableau, Power BI, or other BI tools
- Custom web dashboards
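Dashboards can also be created programmatically. The sketch below publishes a simple CloudWatch dashboard for the custom TaskCompleted metric from the earlier example; the namespace, bot name, and region are assumptions:

Creating a CloudWatch Dashboard with boto3

import json

import boto3

cloudwatch = boto3.client('cloudwatch')

# Widget definition follows the CloudWatch dashboard body JSON format
dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "title": "Task Completion Rate",
                "metrics": [
                    ["ConversationalAI", "TaskCompleted", "BotName", "SupportBot"]
                ],
                "stat": "Average",  # average of 1/0 data points = completion rate
                "period": 3600,
                "region": "us-east-1"
            }
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName='bot-performance',
    DashboardBody=json.dumps(dashboard_body)
)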