Module 07

Improving Bot Performance

Master techniques for monitoring, analyzing, and enhancing your conversational AI's performance through better training data, testing, and feedback loops.

Learning Objectives

  • Apply best practices for training data preparation
  • Implement utterance expansion and synonym recognition
  • Design and execute A/B testing for conversational interfaces
  • Create effective feedback loops for continuous improvement
  • Integrate Lex with Amazon Kendra for enhanced knowledge retrieval

Training Data Best Practices

The quality of your training data directly impacts the performance of your conversational AI. Well-prepared training data leads to better intent recognition, more accurate slot filling, and ultimately a more satisfying user experience.

Data Collection Strategies

Effective training data collection involves gathering diverse, representative examples of how users might express their intents:

  • User Research: Conduct interviews and surveys to understand how users naturally express their needs
  • Wizard of Oz Testing: Simulate bot interactions with human operators to gather realistic conversations
  • Log Analysis: Analyze logs from existing systems or customer service interactions
  • Competitor Analysis: Study how users interact with similar conversational interfaces
  • Crowdsourcing: Use platforms like Mechanical Turk to gather diverse expressions

Training Data Collection Methods

Proactive Methods

  • User interviews and surveys
  • Wizard of Oz testing
  • Guided data generation sessions
  • Crowdsourcing platforms

Reactive Methods

  • Production system logs
  • Missed utterance analysis
  • Customer service transcripts
  • User feedback collection

Utterance Diversity

Diverse training utterances help your bot understand various ways users might express the same intent. Ensure your training data includes:

  • Linguistic Variations: Different sentence structures and phrasings
  • Vocabulary Differences: Various synonyms and terminology
  • Length Variations: Both short commands and longer, more conversational requests
  • Question vs. Statement Forms: Both interrogative and declarative forms
  • Formal vs. Informal Language: Different levels of formality

For example, for a "CheckBalance" intent, include variations like:

  • "What's my account balance?"
  • "Show me how much money I have"
  • "Balance please"
  • "I need to check my balance"
  • "Can you tell me my current account balance?"

Handling Regional Variations

If your bot serves users across different regions, consider regional language variations:

  • Dialect Differences: Include utterances reflecting different regional dialects
  • Regional Terminology: Account for region-specific terms (e.g., "soda" vs. "pop")
  • Spelling Variations: Include different spelling conventions (e.g., "color" vs. "colour")
  • Date and Number Formats: Consider different formats for dates, times, and numbers

Data Cleaning and Preparation

Before using collected data for training, it's important to clean and prepare it:

  1. Remove Duplicates: Eliminate exact duplicate utterances
  2. Fix Errors: Correct obvious spelling and grammatical errors
  3. Normalize Format: Ensure consistent formatting
  4. Remove Personally Identifiable Information (PII): Protect user privacy
  5. Balance Intent Distribution: Ensure adequate examples for each intent

Python Script for Training Data Preparation

# Example Python script for training data preparation
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from collections import Counter

# Load raw training data
df = pd.read_csv('raw_training_data.csv')

# Remove duplicates
df = df.drop_duplicates(subset=['utterance'])

# Basic cleaning
def clean_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    # Remove special characters (but keep question marks)
    text = re.sub(r'[^\w\s\?]', '', text)
    return text

# Check for and remove PII before other cleaning (simplified example);
# cleaning first would strip the separators these patterns depend on
pii_patterns = [
    r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b',  # Phone numbers
    r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',  # Emails
    r'\b\d{3}[-]?\d{2}[-]?\d{4}\b'  # SSNs
]

def remove_pii(text):
    for pattern in pii_patterns:
        text = re.sub(pattern, '[REDACTED]', text)
    return text

df['cleaned_utterance'] = df['utterance'].apply(remove_pii).apply(clean_text)

# Analyze intent distribution
intent_counts = df['intent'].value_counts()
print("Intent distribution:")
print(intent_counts)

# Identify intents with too few examples
min_examples = 10
low_data_intents = intent_counts[intent_counts < min_examples].index.tolist()
print(f"Intents with fewer than {min_examples} examples: {low_data_intents}")

# Analyze utterance length distribution
df['word_count'] = df['cleaned_utterance'].apply(lambda x: len(x.split()))
print("Utterance length statistics:")
print(df['word_count'].describe())

# Check for common words by intent
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

def get_common_words(intent_name):
    intent_utterances = df[df['intent'] == intent_name]['cleaned_utterance']
    words = []
    for utterance in intent_utterances:
        words.extend([word for word in utterance.split() if word not in stop_words])
    return Counter(words).most_common(10)

for intent in df['intent'].unique():
    print(f"\nMost common words for intent '{intent}':")
    print(get_common_words(intent))

# Save cleaned data
df.to_csv('cleaned_training_data.csv', index=False)

Utterance Expansion & Synonyms

Even with thorough data collection, it's challenging to anticipate all the ways users might express their intents. Utterance expansion techniques can help broaden your bot's understanding.

Techniques for Utterance Expansion

Several approaches can help you systematically expand your training utterances:

  1. Pattern-Based Generation: Create templates with variable components
  2. Synonym Substitution: Replace key words with synonyms
  3. Word Order Variation: Rearrange sentence elements while preserving meaning
  4. Contraction/Expansion: Add or remove contractions (e.g., "I am" vs. "I'm")
  5. Paraphrasing Tools: Use NLP tools to generate paraphrases
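As a concrete illustration, the sketch below combines pattern-based generation with synonym substitution to expand a small seed set. The templates and synonym map are hypothetical examples, not from a real bot:

# Minimal utterance expansion sketch: pattern templates plus synonym substitution.
# The templates and synonym map below are illustrative placeholders.
from itertools import product

templates = [
    "I want to {verb} a flight to {city}",
    "Can I {verb} a flight to {city}?",
    "{verb} a flight to {city} please",
]

synonyms = {
    "verb": ["book", "reserve", "get"],
    "city": ["New York", "Boston", "Chicago"],
}

def expand(templates, synonyms):
    utterances = set()
    for template in templates:
        # Generate every combination of synonym substitutions
        for combo in product(*synonyms.values()):
            values = dict(zip(synonyms.keys(), combo))
            utterances.add(template.format(**values))
    return sorted(utterances)

for utterance in expand(templates, synonyms):
    print(utterance)

Three templates and two three-value slots already yield 27 distinct utterances; review the output before training, since mechanical expansion can produce unnatural phrasings.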

Utterance Expansion Example

Original Utterance

"I want to book a flight to New York"
Pattern-Based
  • "I want to book a flight to [CITY]"
  • "I need to book a flight to [CITY]"
Synonym Substitution
  • "I want to reserve a flight to New York"
  • "I want to purchase a ticket to New York"
Word Order Variation
  • "To New York I want to book a flight"
  • "A flight to New York is what I want to book"
Question Form
  • "Can I book a flight to New York?"
  • "How do I book a flight to New York?"

Implementing Synonym Recognition

Synonym recognition helps your bot understand variations in terminology. In Amazon Lex, you can implement this through:

  1. Slot Synonyms: Define synonyms for slot values
  2. Multiple Utterances: Include utterances with different synonyms
  3. Custom Slot Types: Define custom slot types with synonym values

For example, for a "PaymentMethod" slot, you might define synonyms like:

  • "credit card" → "card", "visa", "mastercard", "amex"
  • "bank transfer" → "wire transfer", "direct deposit", "ach"
  • "paypal" → "online payment", "digital wallet"

Custom Slot Type with Synonyms in Lex (V2 CreateSlotType format)

{
  "slotTypeName": "PaymentMethod",
  "description": "Types of payment methods",
  "valueSelectionSetting": {
    "resolutionStrategy": "TopResolution"
  },
  "slotTypeValues": [
    {
      "sampleValue": { "value": "credit card" },
      "synonyms": [
        { "value": "card" },
        { "value": "visa" },
        { "value": "mastercard" },
        { "value": "amex" },
        { "value": "credit" },
        { "value": "plastic" }
      ]
    },
    {
      "sampleValue": { "value": "bank transfer" },
      "synonyms": [
        { "value": "wire transfer" },
        { "value": "direct deposit" },
        { "value": "ach" },
        { "value": "wire" },
        { "value": "bank payment" }
      ]
    },
    {
      "sampleValue": { "value": "paypal" },
      "synonyms": [
        { "value": "online payment" },
        { "value": "digital wallet" },
        { "value": "electronic payment" },
        { "value": "online wallet" }
      ]
    }
  ]
}

Using Slot Catalogs Effectively

Slot catalogs in Amazon Lex provide pre-built slot types for common entities. To use them effectively:

  • Leverage built-in slot types for common entities (dates, numbers, cities, etc.)
  • Customize built-in slot types with additional values when needed
  • Use slot resolution strategies appropriate for your use case
  • Test slot recognition thoroughly with various inputs

Balancing Precision and Recall

When expanding utterances and implementing synonyms, it's important to balance precision (the share of matched utterances that map to the correct intent) and recall (the share of relevant utterances the bot recognizes at all):

  • Too Few Utterances/Synonyms: Poor recall, many missed intents
  • Too Many Broad Utterances/Synonyms: Poor precision, intent confusion

Strategies for finding the right balance include:

  1. Start with core, unambiguous utterances
  2. Gradually expand with clear variations
  3. Test regularly to identify confusion between intents
  4. Use confidence scores to identify borderline cases
  5. Implement fallback strategies for low-confidence matches
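One way to monitor this balance is to run a labeled test set through the bot and compute per-intent precision and recall. A minimal sketch, assuming you have exported results to a CSV with utterance, expected_intent, and predicted_intent columns (the file name and schema are assumptions):

# Per-intent precision/recall from a labeled test set (illustrative sketch).
# Assumes a CSV with columns: utterance, expected_intent, predicted_intent.
import pandas as pd

df = pd.read_csv('intent_test_results.csv')

for intent in df['expected_intent'].unique():
    predicted = df['predicted_intent'] == intent
    expected = df['expected_intent'] == intent
    true_positives = (predicted & expected).sum()
    precision = true_positives / predicted.sum() if predicted.sum() else 0.0
    recall = true_positives / expected.sum() if expected.sum() else 0.0
    print(f"{intent}: precision={precision:.2f}, recall={recall:.2f}")

A drop in precision after adding utterances or synonyms is a signal that two intents have started to overlap.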

A/B Testing & Experimentation

A/B testing allows you to compare different versions of your conversational interface to determine which performs better. This data-driven approach is essential for continuous improvement.

Setting up A/B Tests for Conversations

To set up effective A/B tests for conversational interfaces:

  1. Define Clear Hypotheses: Specify what you're testing and why
  2. Create Variants: Develop different versions with specific changes
  3. Implement Traffic Splitting: Randomly assign users to variants
  4. Determine Sample Size: Ensure sufficient data for statistical significance
  5. Set Test Duration: Run tests long enough to gather reliable data

In Amazon Lex, you can implement A/B testing using:

  • Different bot versions with aliases
  • Traffic distribution across aliases
  • Lambda routing logic for more complex scenarios

Lambda Function for A/B Test Routing

// Example Lambda function for A/B test routing
exports.handler = async (event) => {
    // Extract user ID or session ID
    const userId = event.userId || event.sessionId || generateRandomId();
    
    // Determine which variant to use (A or B)
    // Using a hash of the user ID for consistent assignment
    const variant = determineVariant(userId);
    
    // Log the assignment for analytics
    console.log(`User ${userId} assigned to variant ${variant}`);
    
    // Route to the appropriate bot alias based on variant
    if (variant === 'A') {
        // Route to variant A (e.g., original version)
        return routeToBotAlias(event, 'VariantA');
    } else {
        // Route to variant B (e.g., new version)
        return routeToBotAlias(event, 'VariantB');
    }
};

// Function to consistently assign users to variants
function determineVariant(userId) {
    // Simple hash function to convert userId to a number
    let hash = 0;
    for (let i = 0; i < userId.length; i++) {
        hash = ((hash << 5) - hash) + userId.charCodeAt(i);
        hash |= 0; // Convert to 32bit integer
    }
    
    // Use hash to determine variant (50/50 split)
    return (Math.abs(hash) % 2 === 0) ? 'A' : 'B';
}

// Fallback ID generator for events that carry no user or session ID
function generateRandomId() {
    return Math.random().toString(36).substring(2, 12);
}

// Function to route to specific bot alias
function routeToBotAlias(event, alias) {
    // Implementation would depend on your architecture
    // This could involve calling the Lex API with the specified alias
    // or returning information that your client can use to route appropriately
    
    // For this example, we'll just return the alias in the session attributes
    const sessionAttributes = event.sessionAttributes || {};
    sessionAttributes.testVariant = alias;
    
    return {
        sessionAttributes: sessionAttributes,
        // Other response elements would go here
    };
}

Defining Success Metrics

Clear success metrics are essential for evaluating A/B test results. Common metrics for conversational interfaces include:

  • Task Completion Rate: Percentage of conversations that successfully complete the intended task
  • Conversation Length: Number of turns required to complete tasks
  • Error Rate: Frequency of misunderstood inputs or fallbacks
  • User Satisfaction: Explicit feedback or satisfaction scores
  • Retention: Rate at which users return to use the bot again
  • Conversion Rate: Percentage of conversations that lead to desired business outcomes

A/B Testing Scenarios

Prompt Wording

Variant A: "What city would you like to fly to?"
Variant B: "Please tell me your destination city."
Key Metrics: Slot filling success rate, need for reprompts

Conversation Flow

Variant A: Collect all information first, then confirm
Variant B: Confirm each piece of information as it's collected
Key Metrics: Task completion rate, conversation length, user satisfaction

Error Handling

Variant A: Generic error messages
Variant B: Specific error messages with examples
Key Metrics: Recovery rate after errors, abandonment rate

Analyzing Results

When analyzing A/B test results:

  1. Check Statistical Significance: Ensure differences aren't due to random chance
  2. Consider Multiple Metrics: Look at the full picture, not just primary metrics
  3. Segment Results: Analyze performance across different user groups
  4. Look for Unexpected Effects: Check for unintended consequences
  5. Document Learnings: Record insights for future reference

Tools for analyzing results include:

  • Statistical analysis libraries (e.g., SciPy, StatsModels)
  • A/B testing platforms (e.g., Optimizely, VWO)
  • Custom analytics dashboards
  • Conversation analytics tools
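As a concrete example of the statistical step, a two-proportion z-test can check whether a difference in task completion rates between variants is larger than random chance would explain. A sketch using SciPy (the session and completion counts below are made up):

# Two-proportion z-test for A/B task completion rates (illustrative numbers).
import math
from scipy.stats import norm

completions_a, sessions_a = 420, 1000  # variant A: 42.0% completion
completions_b, sessions_b = 465, 1000  # variant B: 46.5% completion

p_a = completions_a / sessions_a
p_b = completions_b / sessions_b
p_pooled = (completions_a + completions_b) / (sessions_a + sessions_b)

# Standard error of the difference under the pooled null hypothesis
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / sessions_a + 1 / sessions_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-tailed test

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
# A p-value below your chosen threshold (commonly 0.05) suggests
# the difference is unlikely to be due to random chance alone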

User Feedback Loops

Establishing effective feedback loops is crucial for continuously improving your conversational interface based on real user interactions.

Collecting Explicit Feedback

Explicit feedback involves directly asking users about their experience. Approaches include:

  • End-of-Conversation Ratings: Simple thumbs up/down or star ratings
  • Follow-up Questions: "Did I answer your question?" or "Was this helpful?"
  • Short Surveys: Brief questions about specific aspects of the experience
  • Feedback Commands: Allow users to provide feedback at any time

Best practices for collecting explicit feedback:

  1. Keep it simple and quick
  2. Ask at appropriate moments (usually after task completion)
  3. Make feedback optional
  4. Thank users for their feedback
  5. Follow up on negative feedback when possible
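To act on ratings later, persist them with enough context to analyze. Below is a minimal sketch of a Python fulfillment Lambda (Lex V1 event format) that writes an end-of-conversation rating to DynamoDB; the ProvideFeedback intent, Rating slot, and bot_feedback table are hypothetical names:

# Sketch: store a rating from a hypothetical 'ProvideFeedback' intent.
# Assumes a DynamoDB table named 'bot_feedback' with partition key 'session_id'.
import time
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('bot_feedback')

def lambda_handler(event, context):
    # 'Rating' is a hypothetical slot collecting a 1-5 score
    rating = event['currentIntent']['slots'].get('Rating')
    table.put_item(Item={
        'session_id': event['userId'],
        'timestamp': int(time.time()),
        'rating': rating,
        'bot_version': event['bot']['version'],
    })
    return {
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {
                'contentType': 'PlainText',
                'content': 'Thank you for your feedback!'
            }
        }
    }

Storing the bot version alongside each rating lets you compare satisfaction across releases or A/B variants later.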

Feedback Collection Simulator

An example conversation showing one approach to collecting user feedback:

Bot: I've booked your flight from New York to London on June 15th. Your confirmation number is ABC123.
Bot: Was I able to help you today?
User: Yes, thank you
Bot: Great! I'm glad I could help. Is there anything else you need assistance with today?
User: No, that's all
Bot: Thank you for using our service. On a scale of 1-5, how would you rate your experience today?
User: 4
Bot: Thank you for your feedback! Have a great day.

Analyzing Implicit Feedback

Implicit feedback involves analyzing user behavior without directly asking for feedback. Key indicators include:

  • Conversation Abandonment: Users leaving conversations before completion
  • Repeated Attempts: Users trying multiple times to express the same intent
  • Correction Patterns: Users correcting the bot's understanding
  • Escalation Requests: Users asking for human assistance
  • Sentiment Changes: Shifts in user sentiment during conversations

Tools and techniques for analyzing implicit feedback:

  1. Conversation flow analysis
  2. Sentiment analysis
  3. Pattern recognition in conversation logs
  4. User session analysis
  5. Cohort analysis
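Several of these signals can be computed straight from exported conversation logs. A sketch, assuming a CSV with session_id, intent, and completed columns (this schema is an assumption; adapt it to your log format):

# Mining implicit feedback signals from conversation logs (illustrative sketch).
# Assumes a CSV with columns: session_id, intent, completed (True/False).
import pandas as pd

logs = pd.read_csv('conversation_logs.csv')

# Repeated attempts: the same intent triggered more than twice in one session
attempts = logs.groupby(['session_id', 'intent']).size()
repeated = attempts[attempts > 2]
print(f"Sessions with repeated attempts:\n{repeated}")

# Abandonment: sessions that never reached a completed task
completed_sessions = logs[logs['completed']]['session_id'].unique()
abandonment_rate = 1 - len(completed_sessions) / logs['session_id'].nunique()
print(f"Abandonment rate: {abandonment_rate:.1%}")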

Acting on Feedback Data

Collecting feedback is only valuable if you act on it. Effective approaches include:

  1. Prioritize Issues: Focus on high-impact, frequently occurring problems
  2. Root Cause Analysis: Identify underlying causes, not just symptoms
  3. Targeted Improvements: Make specific changes to address identified issues
  4. Measure Impact: Track metrics before and after changes
  5. Continuous Cycle: Establish an ongoing process of feedback and improvement
Feedback Loop Cycle

(Diagram: collect feedback → analyze root causes → prioritize issues → implement improvements → measure impact → repeat.)

Using Lex with Kendra

Amazon Kendra is an intelligent search service that can significantly enhance your Lex bot's ability to answer questions by providing access to a knowledge base.

Introduction to Amazon Kendra

Amazon Kendra is a machine learning-powered search service that:

  • Uses natural language processing to understand questions
  • Indexes and searches across multiple document types and sources
  • Returns precise answers, not just document links
  • Learns and improves from user interactions
  • Supports enterprise-grade security and access controls

Integrating Kendra with Lex allows your bot to:

  • Answer questions beyond predefined intents
  • Provide information from documents, FAQs, and knowledge bases
  • Handle complex, information-seeking queries
  • Reduce the need for human escalation

Setting up a Kendra Index

To use Kendra with Lex, you first need to set up a Kendra index:

  1. Create an Index: Set up a new Kendra index in the AWS console
  2. Configure Data Sources: Connect to your content repositories (S3, SharePoint, Salesforce, etc.)
  3. Add FAQs: Upload FAQ documents for direct question-answer matching
  4. Set Up Access Control: Configure security settings if needed
  5. Sync Data: Run initial synchronization to index your content

Best practices for Kendra index setup:

  • Organize content logically by topic or domain
  • Use metadata to enhance search relevance
  • Include variations of common questions in FAQs
  • Set up regular sync schedules to keep content fresh
  • Monitor index performance and adjust as needed
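The same setup can be scripted. A hedged boto3 sketch that creates an index and attaches an FAQ file (the role ARNs, bucket, and names are placeholders; index creation is asynchronous, so wait for the index to become ACTIVE before adding sources):

# Sketch: create a Kendra index and attach an FAQ document via boto3.
# The role ARNs, bucket, and resource names below are placeholders.
import boto3

kendra = boto3.client('kendra')

index = kendra.create_index(
    Name='support-knowledge-base',
    RoleArn='arn:aws:iam::123456789012:role/KendraIndexRole',
    Edition='DEVELOPER_EDITION',
)
index_id = index['Id']

# Index creation takes time; poll describe_index until Status is ACTIVE
# before running the call below.
kendra.create_faq(
    IndexId=index_id,
    Name='support-faqs',
    RoleArn='arn:aws:iam::123456789012:role/KendraFaqRole',
    S3Path={'Bucket': 'my-kendra-content', 'Key': 'faqs/support_faq.csv'},
)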

Connecting Lex and Kendra

There are two main approaches to integrating Lex with Kendra:

  1. AMAZON.KendraSearchIntent: A built-in intent type that automatically queries Kendra
  2. Custom Lambda Integration: More flexible approach using Lambda to query Kendra

Using AMAZON.KendraSearchIntent:

  1. Create a new intent with the AMAZON.KendraSearchIntent type
  2. Configure the Kendra index ID and query text
  3. Set up response templates for different result types
  4. Configure fallback behavior
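For illustration, a minimal sketch of such an intent in Lex V1's PutIntent format (the intent name and ARNs are placeholders):

{
  "name": "KendraSearchFallback",
  "parentIntentSignature": "AMAZON.KendraSearchIntent",
  "kendraConfiguration": {
    "kendraIndex": "arn:aws:kendra:us-east-1:123456789012:index/index-id-placeholder",
    "role": "arn:aws:iam::123456789012:role/LexKendraSearchRole"
  },
  "fulfillmentActivity": {
    "type": "ReturnIntent"
  }
}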

Custom Lambda for Lex-Kendra Integration

// Example Lambda function for custom Lex-Kendra integration
const AWS = require('aws-sdk');
const kendra = new AWS.Kendra();

exports.handler = async (event) => {
    // Extract session attributes
    const sessionAttributes = event.sessionAttributes || {};
    
    // Get the user's question
    const question = event.inputTranscript;
    
    // Configure Kendra query parameters
    const params = {
        IndexId: process.env.KENDRA_INDEX_ID, // Set in Lambda environment variables
        QueryText: question,
        // Optionally add an AttributeFilter here to narrow results
        // (e.g., by document type or category); Kendra rejects an empty filter object
        PageSize: 3 // Number of results to return
    };
    
    try {
        // Query Kendra
        const kendraResponse = await kendra.query(params).promise();
        
        // Process the response
        if (kendraResponse.ResultItems && kendraResponse.ResultItems.length > 0) {
            // Find the best answer
            const answer = findBestAnswer(kendraResponse.ResultItems);
            
            if (answer) {
                // Return the answer to the user
                return {
                    sessionAttributes: sessionAttributes,
                    dialogAction: {
                        type: 'Close',
                        fulfillmentState: 'Fulfilled',
                        message: {
                            contentType: 'PlainText',
                            content: formatKendraResponse(answer)
                        }
                    }
                };
            }
        }
        
        // No good answer found, provide a fallback response
        return {
            sessionAttributes: sessionAttributes,
            dialogAction: {
                type: 'Close',
                fulfillmentState: 'Fulfilled',
                message: {
                    contentType: 'PlainText',
                    content: "I'm sorry, I couldn't find a specific answer to your question. Would you like to try rephrasing or ask something else?"
                }
            }
        };
    } catch (error) {
        console.error('Error querying Kendra:', error);
        
        // Return error response
        return {
            sessionAttributes: sessionAttributes,
            dialogAction: {
                type: 'Close',
                fulfillmentState: 'Fulfilled',
                message: {
                    contentType: 'PlainText',
                    content: "I'm sorry, I encountered an error while searching for information. Please try again later."
                }
            }
        };
    }
};

// Helper function to find the best answer from Kendra results
function findBestAnswer(resultItems) {
    // First, check for ANSWER type results
    const answers = resultItems.filter(item => item.Type === 'ANSWER');
    if (answers.length > 0) {
        return answers[0]; // Return the top answer
    }
    
    // Next, check for QUESTION_ANSWER type results
    const qaResults = resultItems.filter(item => item.Type === 'QUESTION_ANSWER');
    if (qaResults.length > 0) {
        return qaResults[0]; // Return the top Q&A result
    }
    
    // Finally, check for DOCUMENT type results
    const documents = resultItems.filter(item => item.Type === 'DOCUMENT');
    if (documents.length > 0) {
        return documents[0]; // Return the top document result
    }
    
    return null; // No suitable results found
}

// Helper function to format Kendra response for user
function formatKendraResponse(result) {
    let response = '';
    
    switch (result.Type) {
        case 'ANSWER':
            response = result.DocumentExcerpt.Text;
            break;
        case 'QUESTION_ANSWER':
            response = result.DocumentExcerpt.Text;
            break;
        case 'DOCUMENT':
            response = `I found this information that might help: ${result.DocumentExcerpt.Text}`;
            break;
    }
    
    // Add source attribution if available
    if (result.DocumentTitle && result.DocumentTitle.Text) {
        response += `\n\nSource: ${result.DocumentTitle.Text}`;
    }
    
    return response;
}

Optimizing Search Results

To improve the quality of Kendra search results in your Lex bot:

  1. Use Attribute Filters: Narrow search scope based on metadata
  2. Implement Query Preprocessing: Clean and enhance user queries before sending to Kendra
  3. Result Ranking: Develop custom logic to rank and select the best results
  4. Response Formatting: Present information in a conversational, digestible format
  5. Feedback Collection: Gather user feedback on search results to improve over time
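As an example of query preprocessing (item 2 above), a lightweight filter can strip conversational framing before the query reaches Kendra. The filler phrases below are illustrative; extend them from your own conversation logs:

# Sketch: strip conversational filler before sending a query to Kendra.
# The filler patterns below are illustrative placeholders.
import re

FILLER_PATTERNS = [
    r'^(can you|could you|please)\s+',
    r'^(tell me|i want to know|i need to know)\s+',
    r'\s*(please|thanks|thank you)\s*$',
]

def preprocess_query(text):
    query = text.strip().lower()
    for pattern in FILLER_PATTERNS:
        query = re.sub(pattern, '', query)
    return query.strip()

print(preprocess_query("Can you tell me what the return policy is"))
# -> "what the return policy is"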

Kendra Result Types

ANSWER

Direct answers extracted from documents, highest confidence

Query: "What is the return policy?"
Result: "Our standard return policy allows returns within 30 days of purchase with original receipt."

QUESTION_ANSWER

Matches from FAQ documents, high confidence for exact question matches

Query: "How do I reset my password?"
Result: "To reset your password, click on the 'Forgot Password' link on the login page and follow the instructions sent to your email."

DOCUMENT

Relevant document excerpts, useful when direct answers aren't available

Query: "Cloud migration best practices"
Result: "From 'Cloud Migration Guide': Begin with an assessment of your current infrastructure. Identify applications that are good candidates for early migration..."

Performance Measurement

Comprehensive performance measurement is essential for understanding how well your conversational interface is serving users and identifying areas for improvement.

Defining KPIs for Conversational Interfaces

Key Performance Indicators (KPIs) for conversational interfaces typically fall into several categories:

  • Technical Performance: System uptime, response time, error rates
  • Conversation Quality: Intent recognition accuracy, slot filling success, context maintenance
  • User Experience: Task completion rate, conversation length, user satisfaction
  • Business Impact: Conversion rates, cost savings, ROI

Specific KPIs might include:

  • Intent recognition rate
  • Slot filling accuracy
  • Task completion rate
  • Average turns per conversation
  • Fallback/escalation rate
  • User satisfaction score
  • Retention rate
  • Cost per conversation
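Many of these KPIs can be published as custom CloudWatch metrics from your fulfillment Lambda, making them easy to chart and alarm on. A sketch using boto3 (the namespace, metric, and dimension names are assumptions):

# Sketch: publish a custom bot KPI to CloudWatch.
# The namespace, metric, and dimension names are illustrative choices.
import boto3

cloudwatch = boto3.client('cloudwatch')

def record_task_completion(intent_name, completed):
    cloudwatch.put_metric_data(
        Namespace='BotPerformance',
        MetricData=[{
            'MetricName': 'TaskCompleted',
            'Dimensions': [{'Name': 'Intent', 'Value': intent_name}],
            'Value': 1.0 if completed else 0.0,
            'Unit': 'Count',
        }]
    )

record_task_completion('BookFlight', completed=True)

Emitting 1/0 values per conversation lets CloudWatch's Average statistic read directly as a completion rate per intent.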

Measuring User Satisfaction

User satisfaction can be measured through:

  1. Explicit Ratings: Direct feedback from users
  2. Conversation Completion: Whether users complete their intended tasks
  3. Return Rate: How often users come back to use the bot
  4. Sentiment Analysis: Analyzing the emotional tone of user messages
  5. Escalation Rate: How often users ask for human assistance

Techniques for measuring satisfaction include:

  • Post-conversation surveys
  • In-conversation feedback requests
  • User behavior analysis
  • Sentiment analysis of conversations
  • Focus groups and user interviews
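Amazon Comprehend can automate the sentiment-analysis piece. A minimal sketch that scores a single user message (the example text is made up):

# Sketch: score user-message sentiment with Amazon Comprehend.
import boto3

comprehend = boto3.client('comprehend')

response = comprehend.detect_sentiment(
    Text="This took way too long and I still don't have an answer.",
    LanguageCode='en'
)
print(response['Sentiment'])       # e.g. NEGATIVE
print(response['SentimentScore'])  # confidence score per sentiment class

Tracking the share of NEGATIVE messages over time, or sentiment shifts within a single conversation, can surface problem flows before explicit complaints arrive.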

Creating Performance Dashboards

Performance dashboards provide at-a-glance visibility into your bot's performance. Effective dashboards typically include:

  1. High-Level KPI Summary: Key metrics at a glance
  2. Trend Analysis: Performance over time
  3. Intent and Slot Performance: Recognition rates and common issues
  4. Conversation Flow Visualization: Common paths and drop-off points
  5. User Feedback Summary: Aggregated user ratings and comments

Tools for creating dashboards include:

  • Amazon CloudWatch Dashboards
  • Amazon QuickSight
  • Tableau, Power BI, or other BI tools
  • Custom web dashboards
