
PRD-STD-015: Multilingual AI Quality & Safety

Standard ID: PRD-STD-015
Version: 1.0
Status: Active
Compliance Level: Level 2 (Managed)
Effective Date: 2026-02-22
Last Reviewed: 2026-02-22

How To Use This Standard

This page is the normative source of requirements for this control area. Use it to define policy, evidence expectations, and audit/compliance criteria.

For implementation and rollout support:

Use the Compliance Level metadata on this page to sequence adoption with other PRD-STDs.

1. Purpose

This standard defines mandatory quality and safety controls for AI products that operate across multiple languages, dialects, or scripts. AI models exhibit significant performance variance across languages — safety filters calibrated for English often fail for Arabic, code-switching inputs produce unpredictable outputs, and bias manifests differently across linguistic and cultural contexts.

Without explicit multilingual controls, organizations risk deploying AI features that are safe in one language but harmful, inaccurate, or unusable in others.

2. Scope

This standard applies to:

  • Any AI product feature that supports more than one language, processes multilingual input, or serves users across linguistic communities
  • Conversational AI, content generation, classification, moderation, search, and recommendation features operating in multilingual contexts
  • Single-language products serving dialect-diverse populations (e.g., Arabic dialects: MSA, Egyptian, Gulf, Levantine, Maghrebi)

This standard does not replace PRD-STD-010 or PRD-STD-001. It adds language-specific controls required for multilingual AI product operation.

3. Definitions

Term | Definition
Supported Language | A language for which the AI product claims functional coverage, including quality, safety, and performance guarantees
Language Coverage Matrix | A documented mapping of supported languages to evaluated quality metrics, safety test results, and known limitations per language
Code-Switching | The practice of alternating between two or more languages within a single conversation, sentence, or input — common in multilingual user populations
Dialect Variant | A regional or social variation of a language with distinct vocabulary, grammar, or pragmatic norms that may affect AI model performance
Cross-Language Parity | The degree to which AI product quality, safety, and fairness metrics are consistent across supported languages
Multilingual Safety Evaluation | Structured testing of harmful output, policy violations, and abuse patterns across all supported languages
Script Normalization | The process of standardizing text encoding, directionality (LTR/RTL), and character representations to ensure consistent AI processing

4. Requirements

4.1 Multilingual Evaluation Standards

MANDATORY

REQ-015-01: Every AI product MUST maintain a Language Coverage Matrix documenting all supported languages with evaluated quality benchmarks, safety test status, and known limitations.

REQ-015-02: Quality evaluation MUST be performed independently for each supported language. Aggregate cross-language metrics MUST NOT be used as the sole indicator of per-language quality.

REQ-015-03: Minimum evaluation coverage MUST include task accuracy, response relevance, fluency, and factual consistency per supported language.

RECOMMENDED

REQ-015-04: Organizations SHOULD maintain language-specific evaluation datasets curated with native-speaker review, refreshed at least annually.

4.2 Cross-Language Safety Testing

MANDATORY

REQ-015-05: Safety evaluation MUST be executed independently for every supported language before release. A feature MUST NOT launch in a language that has not passed safety evaluation.
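The launch gate in REQ-015-05 can be enforced mechanically against the Language Coverage Matrix. The sketch below is illustrative only: the matrix shape and the status vocabulary ("Passed"/"Conditional") are assumptions for the example, not normative text of this standard.

```python
# Illustrative pre-release gate for REQ-015-05: a language may ship only
# if its recorded safety evaluation status is "Passed". Status values
# and matrix layout are example assumptions.
SAFETY_STATUS = {
    "en": "Passed",
    "ar-MSA": "Passed",
    "ur": "Conditional",  # safety tests incomplete for 2 categories
}

def launchable_languages(requested: list[str]) -> list[str]:
    """Return only the requested languages cleared for launch."""
    return [lang for lang in requested if SAFETY_STATUS.get(lang) == "Passed"]
```

A language with no recorded evaluation is treated the same as a failing one: `launchable_languages(["en", "ur", "fr"])` blocks both `ur` (Conditional) and `fr` (never evaluated).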

REQ-015-06: Adversarial abuse testing MUST cover language-specific attack patterns, including culturally specific harmful content, script-based obfuscation, and transliteration-based policy evasion.

REQ-015-07: Cross-lingual transfer attacks — where harmful prompts in one language exploit model behavior in another — MUST be included in Tier 2 and Tier 3 safety evaluation.

RECOMMENDED

REQ-015-08: Organizations SHOULD maintain per-language harmful content taxonomies that account for culturally specific sensitivities, taboo topics, and regulatory differences.

4.3 Dialect & Code-Switching Handling

MANDATORY

REQ-015-09: When an AI product serves dialect-diverse populations, evaluation MUST include the major dialect variants relevant to the user population with documented coverage and known limitations.

REQ-015-10: AI features MUST handle code-switching input without producing errors, truncated responses, or language confusion. Graceful degradation to a dominant language is acceptable if documented.
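Detecting code-switched input, and choosing a dominant language to degrade to, can be approximated by counting Unicode scripts in the input. The sketch below is a simplified heuristic, not a production language-identification method: the two-script ARABIC/LATIN split and the "most letters wins" rule are assumptions for illustration.

```python
import unicodedata

# Simplified sketch for REQ-015-10: flag mixed-script (likely
# code-switched) input and pick a dominant script for graceful
# degradation. Script buckets are illustrative assumptions.
def script_counts(text: str) -> dict[str, int]:
    counts: dict[str, int] = {}
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        script = "ARABIC" if "ARABIC" in name else ("LATIN" if "LATIN" in name else "OTHER")
        counts[script] = counts.get(script, 0) + 1
    return counts

def is_code_switched(text: str) -> bool:
    return len(script_counts(text)) > 1

def dominant_script(text: str) -> str:
    counts = script_counts(text)
    return max(counts, key=counts.get) if counts else "UNKNOWN"
```

Per the requirement, a product that degrades to `dominant_script(...)` on mixed input must document that behavior in the Language Coverage Matrix.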

RECOMMENDED

REQ-015-11: Organizations SHOULD implement script normalization for languages with multiple encoding standards (e.g., Arabic Unicode normalization forms, CJK unified ideographs) to ensure consistent AI processing.
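For Arabic, the script normalization in REQ-015-11 is largely a matter of applying a Unicode normalization form before model processing. The minimal sketch below uses NFKC, which folds compatibility characters such as the isolated lam-alef ligature (U+FEFB) into the canonical two-letter sequence (U+0644 U+0627), so visually identical inputs tokenize alike; whether NFKC or NFC is appropriate depends on the product's handling of presentation forms.

```python
import unicodedata

# REQ-015-11 sketch: normalize text before it reaches the model so
# compatibility characters (e.g., Arabic presentation-form ligatures)
# collapse to their canonical letter sequences.
def normalize_for_model(text: str) -> str:
    return unicodedata.normalize("NFKC", text)
```

After normalization, the ligature form and the spelled-out form of lam-alef compare equal, which keeps caching, deduplication, and safety matching consistent.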

4.4 Multilingual Bias & Fairness Assessment

MANDATORY

REQ-015-12: Fairness evaluation MUST be conducted per supported language, not only on aggregated cross-language results.

REQ-015-13: AI products MUST test for and document cross-language quality parity gaps where performance in one supported language is materially worse than others.

REQ-015-14: When significant cross-language parity gaps are detected, the organization MUST either remediate before launch, restrict the affected language to a lower capability tier with user disclosure, or document the gap as a known limitation with a remediation timeline.

RECOMMENDED

REQ-015-15: Organizations SHOULD evaluate demographic fairness within each supported language (e.g., gender bias in Arabic vs. English may manifest differently due to grammatical gender systems).

4.5 Language-Specific Prompt Engineering

MANDATORY

REQ-015-16: System prompts and safety instructions MUST be validated in each supported language. Direct translation of English-language prompts without validation is prohibited.

REQ-015-17: Prompt libraries MUST include language-specific variants where prompt effectiveness varies by language (e.g., instruction-following patterns, formatting conventions, politeness norms).

RECOMMENDED

REQ-015-18: Organizations SHOULD implement language detection and routing to direct inputs to language-optimized model configurations or prompt variants.
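The routing recommended in REQ-015-18 amounts to a lookup from a detected language code to a language-optimized configuration, with a documented fallback. In the sketch below, the variant names and the base-language fallback policy are illustrative assumptions; language detection itself is assumed to happen upstream.

```python
# Sketch of REQ-015-18 routing: map a detected language code to a
# language-specific system prompt variant. Variant names and the
# fallback chain (exact code -> base language -> default) are
# illustrative assumptions.
PROMPT_VARIANTS = {
    "en": "system_prompt_en_v3",
    "ar-MSA": "system_prompt_ar_msa_v2",
    "ar-EG": "system_prompt_ar_eg_v1",
}
DEFAULT_VARIANT = "system_prompt_en_v3"

def route_prompt(lang_code: str) -> str:
    if lang_code in PROMPT_VARIANTS:
        return PROMPT_VARIANTS[lang_code]
    base = lang_code.split("-")[0]  # e.g., "en-GB" -> "en"
    return PROMPT_VARIANTS.get(base, DEFAULT_VARIANT)
```

Note that falling through to `DEFAULT_VARIANT` for an unsupported language is only acceptable when that degradation is documented per REQ-015-10.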

5. Implementation Guidance

Minimum Multilingual Governance Pack

Teams SHOULD establish:

  1. Language Coverage Matrix template
  2. Per-language safety evaluation protocol
  3. Dialect coverage assessment for primary user populations
  4. Cross-language parity dashboard
  5. Language-specific prompt validation checklist
  6. Multilingual adversarial test suite

Example Language Coverage Matrix

Language | Quality Score | Safety Status | Dialect Coverage | Known Limitations | Last Evaluated
English (en) | 92/100 | Passed | N/A | None | 2026-02-15
Arabic (ar-MSA) | 87/100 | Passed | MSA baseline | Reduced accuracy for technical domains | 2026-02-15
Arabic (ar-EG) | 79/100 | Passed | Egyptian dialect | Code-switching with English degrades quality by ~8% | 2026-02-15
Arabic (ar-SA) | 81/100 | Passed | Gulf dialect | Limited Najdi sub-dialect coverage | 2026-02-15
French (fr) | 85/100 | Passed | Metropolitan French | Quebec French not evaluated | 2026-02-15
Urdu (ur) | 68/100 | Conditional | Standard Urdu | Script rendering issues; safety tests incomplete for 2 categories | 2026-01-30
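Keeping the matrix machine-readable also lets teams automate the annual refresh recommended in REQ-015-04. The sketch below flags languages whose last evaluation has gone stale; the dictionary layout and 365-day window are illustrative assumptions drawn from the example table above.

```python
from datetime import date

# Illustrative staleness check for REQ-015-04: flag languages whose
# last evaluation is older than the refresh window. Dates mirror the
# example Language Coverage Matrix.
LAST_EVALUATED = {
    "en": date(2026, 2, 15),
    "ur": date(2026, 1, 30),
}

def stale_languages(matrix: dict, today: date, max_age_days: int = 365) -> list[str]:
    return [lang for lang, evaluated in matrix.items()
            if (today - evaluated).days > max_age_days]
```

Run daily from a dashboard job, this turns the "refreshed at least annually" recommendation into an actionable alert rather than an audit-time discovery.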

Minimum Operational Metrics

Track at least:

  • per-language quality score trend
  • cross-language parity gap (max/min quality ratio)
  • per-language safety evaluation pass rate
  • code-switching error rate
  • dialect coverage percentage for primary markets
  • language-specific user satisfaction scores
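The cross-language parity gap in the list above can be computed directly from per-language quality scores as the max/min ratio. The sketch below uses the scores from the example matrix; the function shape is an assumption, and the 20% figure referenced in Section 6 would be checked against `gap - 1.0`.

```python
# Sketch of the "cross-language parity gap" metric: max/min quality
# ratio across supported languages. 1.0 means perfect parity; scores
# are taken from the example Language Coverage Matrix.
def parity_gap(scores: dict[str, float]) -> float:
    """Return the max/min quality ratio across languages."""
    return max(scores.values()) / min(scores.values())

scores = {"en": 92, "ar-MSA": 87, "ar-EG": 79, "fr": 85, "ur": 68}
gap = parity_gap(scores)  # 92 / 68, roughly a 35% gap driven by Urdu
```

A gap this size would exceed the 20% waiver floor in Section 6, so it would require a documented remediation plan rather than a waiver.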

6. Exceptions & Waiver Process

Waivers are limited to non-safety procedural controls and MUST include:

  • business justification
  • compensating controls
  • named approver
  • expiration date (maximum 30 days)

No waivers are permitted for:

  • launching in a language without safety evaluation
  • ignoring cross-language parity gaps exceeding 20% without a documented remediation plan
  • deploying untranslated English safety instructions in non-English language surfaces

7. Revision History

Version | Date | Author | Changes
1.0 | 2026-02-22 | AEEF Standards Committee | Initial release