AI Product Design | Rookie Tutorial

If you design AI features like ordinary software features, the result will likely disappoint users.

  • Ordinary software is deterministic: you click save, it saves, and it's the same every time.
  • AI products are probabilistic: you ask it to write copy, it does well this time, but may not next time.

When ordinary software makes an error, it shows an error message. When AI products make an error, they may confidently talk nonsense, making you think they're right.

Because of this fundamental difference, AI product design requires a different approach from traditional software design.

Traditional product design pursues zero defects, where users get the same result every time. AI product design must accept probabilistic output and help users understand and deal with the uncertainty of results.

AI Product Thinking

To design good AI products, you must first understand the uniqueness of AI and the psychological changes it brings to users.

Probabilistic Output

The output of a large model is essentially sampled from a probability distribution. The same question may yield a different answer each time: sometimes good, sometimes bad, sometimes mediocre. This is not a bug; it is a characteristic of AI.

For designers, this means several key principles:

| Design Principle | Specific Approach | Why It Matters |
|---|---|---|
| Provide retry mechanism | Let users "generate again" | Gives users a second chance when dissatisfied |
| Show multiple options | Generate 3-5 versions simultaneously for selection | Increases probability of users finding satisfactory results |
| Allow editing and modification | AI-generated content should be easy to edit | Users can fix imperfect parts |
| Set quality expectations | Tell users "results may need adjustment" | Avoids excessively high expectations |

Excellent AI products don't try to hide this uncertainty; they turn it into an advantage.
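
The retry and multiple-options principles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fake_model` is a hypothetical stand-in that samples from canned drafts rather than calling a real model, and `generate_candidates` is an assumed helper name, not part of any library.

```python
import random
from typing import Callable, List


def fake_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model call: samples one canned draft."""
    drafts = [
        f"Short take on: {prompt}",
        f"Detailed take on: {prompt}",
        f"Playful take on: {prompt}",
        f"Formal take on: {prompt}",
    ]
    return rng.choice(drafts)


def generate_candidates(
    prompt: str,
    model: Callable[[str, random.Random], str],
    n: int = 3,
    seed: int = 0,
) -> List[str]:
    """Sample the model up to n * 3 times and keep up to n distinct outputs."""
    rng = random.Random(seed)
    candidates: List[str] = []
    for _ in range(n * 3):  # bounded attempts so the loop always ends
        text = model(prompt, rng)
        if text not in candidates:
            candidates.append(text)
        if len(candidates) == n:
            break
    return candidates


options = generate_candidates("spring sale announcement", fake_model, n=3)
for i, option in enumerate(options, start=1):
    print(f"Option {i}: {option}")
```

A "generate again" button would amount to calling `generate_candidates` once more with a different `seed`, so the user sees a fresh set of options instead of the same ones.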

User Psychology: Expectation Management

Users' expectations of AI often swing between two extremes: either too high or completely dismissive.

People using ChatGPT for the first time often exclaim how amazing it is, thinking AI can do anything.

When AI makes a silly mistake, they may immediately conclude AI is nothing special.

The product designer's task is to guide users' expectations to a reasonable range.

How?

  • First, clearly tell users what AI can and cannot do.
  • Second, when AI makes mistakes, don't try to hide them; face them honestly.
  • Third, give users control: let them adjust, edit, and veto AI output.

Good AI products make users feel: AI is my assistant, not my boss.

Failure Modes of AI Products

AI products fail differently from traditional software.

Traditional software either works or doesn't—it crashes, shows errors, or features stop working.

AI product failures are more subtle:

| Failure Mode | Manifestation | Response Strategy |
|---|---|---|
| Hallucination | Fabricating non-existent facts, citations, data | Add fact-checking, provide source annotations, allow user verification |
| Alignment failure | Answer doesn't match user intent | Provide clarifying questions, let users confirm intent, multi-turn dialogue optimization |
| Unstable output quality | Sometimes good, sometimes poor | Provide multiple options, allow retries, let users rate and give feedback |
| Overconfidence | Confidently stating wrong information | Add confidence display, use more cautious phrasing, encourage questioning |
| Context loss | Forgetting key information from earlier conversation | Show context summary, allow referencing history, provide conversation memory management |

Understanding these failure modes is the first step to designing good AI products.

AI Feature Design Principles

Three core principles: progressive disclosure, user control, and transparent communication.

Progressive Disclosure

Don't dump all features on users at once.

The ideal flow is: give users a simple starting point first, then gradually show more options as needed.

For example, when users write emails:

  • Step 1: Enter a topic or keywords, and AI generates a draft.
  • Step 2: After seeing the draft, users can adjust tone, length, and style.
  • Step 3: If needed, further modify specific paragraphs or have AI provide several different versions.

The benefit of this design is that beginners are not intimidated by complex options, while experts can still find the control they need.

| Stage | User Sees | User Action |
|---|---|---|
| Initial interface | Simple input box | Enter basic requirements |
| After generation | Result + basic adjustment options | Select "more formal," "more concise," etc. |
| Expanded advanced options | Detailed parameter controls | Adjust temperature, role, format, etc. |

Controllability: Let Users Adjust AI Output

Users need to feel that they are the final decision-makers.

Provide control knobs, not "take it or leave it" one-time results.

Common control dimensions:

| Control Dimension | Typical Options | Applicable Scenarios |
|---|---|---|
| Style/Tone | Formal, casual, humorous, professional | Writing, emails, copy |
| Length | Brief, medium, detailed, long | Summaries, articles, reports |
| Complexity | Simple and easy to understand, medium, professional depth | Explanations, tutorials, technical documents |
| Creativity | Conservative, balanced, boldly innovative | Brainstorming, creative writing |
| Format | List, paragraph, table, outline | Notes, planning, documents |

More advanced control: let users directly edit AI prompts, or save their own commonly used prompt templates.
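
One way to wire user-facing knobs to generation settings is a small lookup layer. The sketch below is an assumption-laden illustration: the preset names, the temperature values, and the word limits are hypothetical and not tuned, and `settings_from_controls` is an invented helper rather than any real API.

```python
from dataclasses import dataclass

# Hypothetical presets: the numeric values are illustrative, not tuned.
CREATIVITY_TO_TEMPERATURE = {"conservative": 0.2, "balanced": 0.7, "bold": 1.0}
LENGTH_TO_MAX_WORDS = {"brief": 80, "medium": 200, "detailed": 500}


@dataclass
class GenerationSettings:
    temperature: float
    instruction: str


def settings_from_controls(tone: str, length: str, creativity: str) -> GenerationSettings:
    """Translate user-facing knobs into a temperature and an instruction string."""
    if creativity not in CREATIVITY_TO_TEMPERATURE:
        raise ValueError(f"Unknown creativity level: {creativity}")
    if length not in LENGTH_TO_MAX_WORDS:
        raise ValueError(f"Unknown length: {length}")
    instruction = (
        f"Write in a {tone} tone. "
        f"Keep the answer under {LENGTH_TO_MAX_WORDS[length]} words."
    )
    return GenerationSettings(
        temperature=CREATIVITY_TO_TEMPERATURE[creativity],
        instruction=instruction,
    )


settings = settings_from_controls(tone="formal", length="brief", creativity="balanced")
print(settings.temperature)
print(settings.instruction)
```

Keeping the mapping in one place means the interface can show friendly labels while the numeric parameters stay adjustable behind the scenes.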

Transparency: Inform Users This is AI-Generated

Letting users know they are interacting with AI is not only an ethical issue but also a product experience issue.

If users think it's written by a human, they will feel deceived when they discover it's AI.

If it's stated from the beginning that it's AI-generated, users will be more forgiving and more willing to participate in improvement.

Several approaches to transparency:

  • First, clear identification: use visual elements to distinguish AI content from user content.
  • Second, show the process: let users see how AI generates results (for example, its thinking process and retrieved sources).
  • Third, explain limitations: tell users what types of errors AI might make and how to identify them.

Transparency builds trust. When users know this is AI, they will use it the right way—as a reference rather than blindly trusting it.
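
As a rough sketch of the identification and source-display ideas, the snippet below labels AI messages and lists their sources when rendering a conversation. The `Message` type, the label text, and the sample source are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    author: str   # "user" or "ai"
    text: str
    sources: List[str] = field(default_factory=list)  # retrieved sources, if any


def render(message: Message) -> str:
    """Render a message with an explicit AI label, its sources, and a caveat."""
    if message.author != "ai":
        return f"You: {message.text}"
    lines = [f"[AI-generated] {message.text}"]
    for i, source in enumerate(message.sources, start=1):
        lines.append(f"  Source {i}: {source}")
    lines.append("  AI can make mistakes. Please verify important facts.")
    return "\n".join(lines)


reply = Message("ai", "The report is due on Friday.", ["team calendar (hypothetical)"])
print(render(Message("user", "When is the report due?")))
print(render(reply))
```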

User Experience Design

AI products have several key experience touchpoints: loading states, error handling, and feedback mechanisms.

Loading State Design: Streaming Output

Large models take time to generate content. Making users wait 10 seconds doing nothing creates a poor experience.

The best approach is streaming output—content appears on screen word by word.

Why is this good?

  • First, users feel the system is working, not frozen.
  • Second, users can start reading early and know halfway through whether it's heading in the right direction.
  • Third, if the direction is wrong, they can interrupt at any time without waiting for full generation.

Besides streaming output, you can also:

  • Show progress indicators: "Thinking," "Retrieving," "Generating."
  • Give estimated time: "Approximately 10 seconds needed."
  • Provide a cancel button: let users stop at any time.
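
Streaming with a cancel option can be sketched with a plain Python generator. This is a simplified illustration: `stream_words` is a hypothetical stand-in for a streaming model API, and a `threading.Event` plays the role of the cancel button.

```python
import threading
from typing import Iterator, List


def stream_words(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model API: yields one word at a time."""
    for word in text.split():
        yield word + " "


def render_stream(chunks: Iterator[str], cancel: threading.Event) -> str:
    """Show chunks as they arrive; stop early once the user cancels."""
    shown: List[str] = []
    for chunk in chunks:
        if cancel.is_set():
            break
        shown.append(chunk)
        print(chunk, end="", flush=True)  # a real UI would update incrementally here
    print()
    return "".join(shown)


# Uninterrupted run: everything is displayed.
full = render_stream(
    stream_words("Streaming lets users read while the model writes"),
    threading.Event(),
)

# Simulated cancel: the event is set just before the fourth chunk is delivered.
cancel_event = threading.Event()


def cancelling_source(text: str, stop_after: int) -> Iterator[str]:
    for i, chunk in enumerate(stream_words(text)):
        if i == stop_after:
            cancel_event.set()
        yield chunk


partial = render_stream(cancelling_source("one two three four five", 3), cancel_event)
```

Because the consumer checks the cancel flag before displaying each chunk, a cancelled run keeps only what the user had already seen.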

Error Handling and Degradation Plans

AI will definitely make mistakes. Good product design makes errors less frightening.

When AI output is obviously problematic:

  • First, give users a simple "dissatisfied" or "retry" button.
  • Second, provide alternatives: "How about trying this angle?"
  • Third, allow users to easily roll back to the previous state.

For more serious errors, such as the model not working at all, have a degradation plan. For example, when AI is unavailable, provide a template library that users can select from manually.

| Error Severity | User Manifestation | Product Response |
|---|---|---|
| Minor issue | Result has some flaws but is usable | Provide editing functionality for users to fine-tune |
| Moderate issue | Result is wrong, needs regeneration | Provide retry button, or suggest adjusting input |
| Severe issue | AI completely not working | Provide degradation plan, such as template library |
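
A degradation plan for the severe case might look like the sketch below: retry the model a bounded number of times, then fall back to a template. The template text, the retry count, and the choice of which exceptions count as "unavailable" are assumptions made for illustration only.

```python
from typing import Callable, Dict, Tuple

# Hypothetical template library used when the model is unavailable.
TEMPLATES: Dict[str, str] = {
    "business_email": "Dear [Name],\n\n[Body]\n\nBest regards,\n[Your name]",
}


def generate_with_fallback(
    task: str,
    request: str,
    model_call: Callable[[str], str],
    max_attempts: int = 2,
) -> Tuple[str, str]:
    """Return (source, text), where source is "ai" or "template"."""
    for _ in range(max_attempts):
        try:
            return "ai", model_call(request)
        except (TimeoutError, ConnectionError):
            continue  # transient failure: retry before degrading
    template = TEMPLATES.get(task)
    if template is None:
        raise RuntimeError(f"No fallback template for task: {task}")
    return "template", template


def broken_model(_: str) -> str:
    """Simulates a model that is completely unavailable."""
    raise TimeoutError("model unavailable")


source, text = generate_with_fallback("business_email", "Invite Alex to a meeting", broken_model)
print(source)
```

Returning the source alongside the text lets the interface tell users honestly that they are seeing a template rather than AI output.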

Feedback Mechanism

User feedback is a valuable resource for improving AI products.

But feedback functionality shouldn't be too complex, or users won't use it.

Simple and effective feedback design:

  • First, thumbs up/thumbs down: one-click expression of satisfaction or dissatisfaction.
  • Second, brief multiple choice: "What's wrong?" (Too long, too short, off-topic, factual error...).
  • Third, optional text box: let users explain problems in detail, but not required.
  • Fourth, tell users what feedback is for: "Your feedback will help us improve."

After feedback is collected, it's best to give users confirmation on the interface—"Thank you for your feedback, we received it."
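
The feedback design described here maps naturally onto a small record type. The following is a minimal sketch, assuming hypothetical field names and reason codes derived from the multiple-choice options in this section; storage is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional

# Reason codes mirroring the multiple-choice options (names are assumptions).
REASONS = {"too_long", "too_short", "off_topic", "factual_error"}


@dataclass
class Feedback:
    response_id: str
    thumbs_up: bool
    reason: Optional[str] = None   # only meaningful for thumbs-down
    comment: Optional[str] = None  # free text, never required

    def __post_init__(self) -> None:
        if self.reason is not None and self.reason not in REASONS:
            raise ValueError(f"Unknown reason: {self.reason}")
        if self.thumbs_up and self.reason is not None:
            raise ValueError("A reason only applies to negative feedback")


def acknowledge(feedback: Feedback) -> str:
    """Message the interface could show once the feedback has been stored."""
    return "Thank you for your feedback, we received it."


fb = Feedback(response_id="resp-1", thumbs_up=False, reason="off_topic")
print(acknowledge(fb))
```

Only the thumbs value is mandatory, which keeps the one-click path as cheap as possible for users.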

Prompt Productization

Good prompts are the core competitive advantage of AI products. Turn prompts from "magic" into manageable product features.

Encapsulating Prompts as Product Features

Ordinary users don't need to know what a "system prompt" is.

They just need to know: click this button, and get a draft of a formal email.

So, the designer's job is to package complex prompts into simple feature buttons.

For example:

The original prompt might be very long:

"You are a professional email writing assistant. Please help the user write a formal, polite, and concise business email. Requirements: 1. Start with an appropriate salutation; 2. Express the body clearly; 3. End with a polite closing. The tone should be professional but not rigid, friendly but not overly casual."

After productization, users see only:

  • A button: "Write Business Email."
  • A few simple options: "Formal/Neutral/Friendly," "Brief/Detailed."

This is the core of prompt productization: keep complexity for yourself, keep simplicity for users.
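
A minimal sketch of this packaging, assuming a hypothetical `PromptFeature` type: the user only ever sees the button label and two options, while the prompt text stays inside the product. The template wording is a shortened paraphrase of the email prompt above.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PromptFeature:
    """A button in the UI backed by a hidden system prompt template."""
    label: str
    template: Template

    def render(self, tone: str, detail: str) -> str:
        """Fill the hidden template with the options the user picked."""
        return self.template.substitute(tone=tone, detail=detail)


WRITE_BUSINESS_EMAIL = PromptFeature(
    label="Write Business Email",
    template=Template(
        "You are a professional email writing assistant. "
        "Write a $tone, $detail business email with a salutation, "
        "a clear body, and a polite closing."
    ),
)

system_prompt = WRITE_BUSINESS_EMAIL.render(tone="formal", detail="brief")
print(system_prompt)
```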

System Prompt Version Management

Prompts aren't done once written; they need continuous iteration.

You may find: the new version of the prompt works better for scenario A but worse for scenario B.

So, prompts need version management like code.

Key practices:

  • First, give each version a number or name: "v1.0," "v1.1," "Experimental-MoreFriendly."
  • Second, record changes for each version: what was changed, why it was changed, and the expected effect.
  • Third, be able to run multiple versions simultaneously for A/B testing.
  • Fourth, be able to roll back quickly to previous versions.

When iterating prompts, always retain the ability to roll back. New versions may bring unexpected problems.
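
These practices can be sketched as a small in-memory registry. The class and method names below are hypothetical, and a real product would persist versions in a database or a version control system rather than a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PromptVersion:
    name: str         # e.g. "v1.0"
    text: str
    change_note: str  # what changed, why, and the expected effect


@dataclass
class PromptRegistry:
    versions: Dict[str, PromptVersion] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)  # activation order

    def register(self, version: PromptVersion) -> None:
        if version.name in self.versions:
            raise ValueError(f"Version already exists: {version.name}")
        self.versions[version.name] = version

    def activate(self, name: str) -> None:
        if name not in self.versions:
            raise KeyError(name)
        self.history.append(name)

    def active(self) -> PromptVersion:
        return self.versions[self.history[-1]]

    def rollback(self) -> PromptVersion:
        """Return to the previously active version."""
        if len(self.history) < 2:
            raise RuntimeError("No earlier version to roll back to")
        self.history.pop()
        return self.active()


registry = PromptRegistry()
registry.register(PromptVersion("v1.0", "Write a formal email.", "Initial version"))
registry.register(PromptVersion("v1.1", "Write a formal, concise email.", "Added conciseness"))
registry.activate("v1.0")
registry.activate("v1.1")
print(registry.rollback().name)
```

Old versions are never deleted, so rolling back is just a matter of changing which one is active.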

A/B Testing Prompts

Which is better, version A or version B of the prompt? Don't guess—let data speak.

A/B testing approach:

  • Some users use version A, some use version B.
  • See which version has higher user satisfaction, lower retry rate, and better completion rate.
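
Splitting users between the two versions is often done with a stable hash, so each user keeps seeing the same version across sessions. The sketch below is one possible approach, not a prescribed one: the function name, the experiment label, and the 50/50 split are assumptions.

```python
import hashlib


def assign_version(user_id: str, experiment: str = "prompt-test", share_b: float = 0.5) -> str:
    """Deterministically map a user to "A" or "B" so they always see the same version."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "B" if bucket < share_b else "A"


assignments = [assign_version(f"user-{i}") for i in range(1000)]
print(assignments.count("A"), assignments.count("B"))
```

Including the experiment name in the hash input means a user who lands in group B for one experiment is not automatically in group B for the next.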

Below is a simple Python script demonstrating how to do A/B test analysis for prompts:

Example

```python
# ============================================
# Prompt A/B Test Analysis Script
# Compares the effects of two prompt versions
# ============================================

from dataclasses import dataclass
from typing import Dict, List
import statistics


@dataclass
class TestResult:
    """Data structure for a single test result."""
    prompt_version: str       # "A" or "B"
    user_satisfaction: int    # User satisfaction, 1-5
    retry_count: int          # Number of user retries
    task_completed: bool      # Whether the task was completed
    time_spent_seconds: int   # Time spent
    test_case_id: str         # Test case identifier (e.g., query type)


def analyze_ab_test(results: List[TestResult]) -> Dict:
    """Analyze A/B test results and return comparison statistics."""
    # Group by version
    group_a = [r for r in results if r.prompt_version == "A"]
    group_b = [r for r in results if r.prompt_version == "B"]

    if not group_a or not group_b:
        return {"error": "Need test data for both versions"}

    def calc_stats(group: List[TestResult]) -> Dict:
        """Calculate statistics for a single group."""
        satisfaction_scores = [r.user_satisfaction for r in group]
        return {
            "sample_size": len(group),
            "avg_satisfaction": statistics.mean(satisfaction_scores),
            "median_satisfaction": statistics.median(satisfaction_scores),
            "avg_retry": statistics.mean([r.retry_count for r in group]),
            "completion_rate": sum(1 for r in group if r.task_completed) / len(group),
            "avg_time": statistics.mean([r.time_spent_seconds for r in group]),
        }

    stats_a = calc_stats(group_a)
    stats_b = calc_stats(group_b)

    # Compare key metrics (B minus A)
    comparison = {
        "satisfaction_diff": stats_b["avg_satisfaction"] - stats_a["avg_satisfaction"],
        "retry_diff": stats_b["avg_retry"] - stats_a["avg_retry"],
        "completion_diff": stats_b["completion_rate"] - stats_a["completion_rate"],
    }

    # Break down by test case (e.g., different types of queries)
    by_test_case = {}
    for case_id in sorted({r.test_case_id for r in results}):
        case_results = [r for r in results if r.test_case_id == case_id]
        case_a = [r for r in case_results if r.prompt_version == "A"]
        case_b = [r for r in case_results if r.prompt_version == "B"]

        if case_a and case_b:
            by_test_case[case_id] = {
                "a_score": statistics.mean(r.user_satisfaction for r in case_a),
                "b_score": statistics.mean(r.user_satisfaction for r in case_b),
            }

    return {
        "version_a": stats_a,
        "version_b": stats_b,
        "comparison": comparison,
        "by_test_case": by_test_case,
        "winner": _determine_winner(stats_a, stats_b),
    }


def _weighted_score(stats: Dict) -> float:
    """Combine satisfaction (50%), completion rate scaled to 0-10 (30%), and few retries (20%)."""
    return (
        stats["avg_satisfaction"] * 0.5
        + stats["completion_rate"] * 10 * 0.3
        + (5 - stats["avg_retry"]) * 0.2
    )


def _determine_winner(stats_a: Dict, stats_b: Dict) -> str:
    """Determine which version is better: returns "A", "B", or "tie"."""
    a_score = _weighted_score(stats_a)
    b_score = _weighted_score(stats_b)

    # If the difference is too small, treat the result as inconclusive
    if abs(a_score - b_score) < 0.1:
        return "tie"

    return "A" if a_score > b_score else "B"


def print_report(analysis: Dict) -> None:
    """Print a readable A/B test report."""
    print("=" * 60)
    print("Prompt A/B Test Report")
    print("=" * 60)

    if "error" in analysis:
        print(f"Error: {analysis['error']}")
        return

    for title, key in (("Version A (Baseline)", "version_a"),
                       ("Version B (New Solution)", "version_b")):
        stats = analysis[key]
        print(f"\n{title}:")
        print(f"   Sample size: {stats['sample_size']}")
        print(f"   Average satisfaction: {stats['avg_satisfaction']:.2f}/5")
        print(f"   Completion rate: {stats['completion_rate']:.1%}")
        print(f"   Average retry count: {stats['avg_retry']:.2f}")

    print("\nComparison Results:")
    diff = analysis["comparison"]
    print(f"   Satisfaction change: {diff['satisfaction_diff']:+.2f}")
    print(f"   Completion rate change: {diff['completion_diff']:+.1%}")
    print(f"   Retry count change: {diff['retry_diff']:+.2f}")

    winner = analysis["winner"]
    if winner == "tie":
        print("\nConclusion: Both versions perform similarly, need more data or adjustment")
    else:
        print(f"\nConclusion: Version {winner} performs better")
```