Stirling-PDF/scripts/translations
Anthony Stirling 0fa53185f2
Add translations for ar-AR, de-DE, fr-FR, it-IT, pt-BR, ru-RU and enable in frontend (#4572)

- Updated ar-AR (Arabic) to 98.7% completion (1088 entries)
- Updated fr-FR (French) to 97.3% completion (1296 entries)
- Updated pt-BR (Portuguese Brazil) to 98.6% completion (1294 entries)
- Updated ru-RU (Russian) to 98.1% completion (1277 entries)
- Updated ja-JP (Japanese) to 73.4% completion (796 entries, batches
1-2)
- Updated es-ES minor corrections
- Enabled 8 languages with >90% completion in LanguageSelector
- Added JSON validation scripts for translation quality assurance
- RTL support already enabled for ar-AR

Enabled languages: en-GB, ar-AR, de-DE, es-ES, fr-FR, it-IT, pt-BR,
ru-RU, zh-CN

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-01 12:58:41 +01:00
ai_translation_helper.py
compact_translator.py
json_beautifier.py
json_validator.py
README.md
translation_analyzer.py
translation_merger.py
validate_json_structure.py
validate_placeholders.py

Translation Management Scripts

This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, validate, and manage translations against the en-GB file, which serves as the source of truth.

Scripts Overview

0. Validation Scripts (Run First!)

json_validator.py

Validates JSON syntax in translation files with detailed error reporting.

Usage:

# Validate single file
python scripts/translations/json_validator.py ar_AR_batch_1_of_3.json

# Validate all batches for a language
python scripts/translations/json_validator.py --all-batches ar_AR

# Validate pattern with wildcards
python scripts/translations/json_validator.py "ar_AR_batch_*.json"

# Brief output (no context)
python scripts/translations/json_validator.py --all-batches ar_AR --brief

# Only show files with errors
python scripts/translations/json_validator.py --all-batches ar_AR --quiet

Features:

  • Validates JSON syntax with detailed error messages
  • Shows exact line, column, and character position of errors
  • Displays context around errors for easy fixing
  • Suggests common fixes based on error type
  • Detects unescaped quotes and backslashes
  • Reports entry counts for valid files
  • Exit code 1 if any file is invalid (good for CI/CD)

Common Issues Detected:

  • Unescaped quotes inside strings: "text with "quotes"" → "text with \"quotes\""
  • Invalid backslash escapes: \d{4} → \\d{4}
  • Missing commas between entries
  • Trailing commas before closing braces
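The kind of report described above can be approximated with the standard library alone. The following is a minimal sketch, not the actual json_validator.py implementation; the function name describe_json_error and the sample strings are hypothetical, and it relies only on the lineno, colno, pos, and msg attributes of json.JSONDecodeError.

```python
import json
from typing import Optional


def describe_json_error(text: str, context_lines: int = 1) -> Optional[str]:
    """Return a readable error report, or None when text is valid JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        lines = text.splitlines()
        # Show a few lines around the failing line to make the fix easier.
        start = max(err.lineno - 1 - context_lines, 0)
        end = min(err.lineno + context_lines, len(lines))
        context = "\n".join(f"{n + 1:>4}: {lines[n]}" for n in range(start, end))
        return (
            f"{err.msg} at line {err.lineno}, column {err.colno} "
            f"(char {err.pos})\n{context}"
        )


# Hypothetical samples mirroring the common issues listed above.
bad_quotes = '{"a": "text with "quotes""}'
bad_escape = r'{"pattern": "\d{4}"}'
good = r'{"a": "text with \"quotes\"", "pattern": "\\d{4}"}'

for sample in (bad_quotes, bad_escape, good):
    print(describe_json_error(sample) or "valid")
```

The exact wording of the messages comes from Python's json module and can differ slightly between Python versions.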

validate_placeholders.py

Validates that translation files have correct placeholders matching en-GB (source of truth).

Usage:

# Validate all languages
python scripts/translations/validate_placeholders.py

# Validate specific language
python scripts/translations/validate_placeholders.py --language es-ES

# Show detailed text samples
python scripts/translations/validate_placeholders.py --verbose

# Output as JSON
python scripts/translations/validate_placeholders.py --json

Features:

  • Detects missing placeholders (e.g., {n}, {total}, {filename})
  • Detects extra placeholders not in en-GB
  • Shows exact keys and text where issues occur
  • Exit code 1 if issues found (good for CI/CD)
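To illustrate what a placeholder check involves, here is a small sketch. It assumes placeholders take the two forms shown in this README, {name} and {{name}}; the regular expression and the function placeholder_diff are hypothetical and may not match what validate_placeholders.py does internally.

```python
import re

# Try the double-brace form first so "{{toolName}}" is not read as "{toolName}".
PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}|\{\w+\}")


def placeholder_diff(source: str, translated: str):
    """Return (missing, extra) placeholders of translated relative to source."""
    src = set(PLACEHOLDER.findall(source))
    dst = set(PLACEHOLDER.findall(translated))
    return sorted(src - dst), sorted(dst - src)


# Hypothetical en-GB text and a translation that renamed a placeholder.
missing, extra = placeholder_diff("Page {n} of {total}", "Seite {n} von {gesamt}")
print("missing:", missing)
print("extra:", extra)
```

A set comparison ignores how often a placeholder occurs; a stricter check could compare counts as well.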

validate_json_structure.py

Validates JSON structure and key consistency with en-GB.

Usage:

# Validate all languages
python scripts/translations/validate_json_structure.py

# Validate specific language
python scripts/translations/validate_json_structure.py --language de-DE

# Show all missing/extra keys
python scripts/translations/validate_json_structure.py --verbose

# Output as JSON
python scripts/translations/validate_json_structure.py --json

Features:

  • Validates JSON syntax
  • Detects missing keys (not translated yet)
  • Detects extra keys (not in en-GB, should be removed)
  • Reports key counts and structure differences
  • Exit code 1 if issues found (good for CI/CD)
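Key consistency boils down to comparing the set of leaf paths in two nested dictionaries. The sketch below shows that idea under the assumption that every leaf is a string and every intermediate node is an object; key_paths and compare_keys are illustrative names, not functions from validate_json_structure.py.

```python
def key_paths(node: dict, prefix: str = ""):
    """Yield the dotted path of every leaf value in a nested dict."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from key_paths(value, path)
        else:
            yield path


def compare_keys(golden: dict, target: dict):
    """Return (missing, extra) dotted keys of target relative to golden."""
    g, t = set(key_paths(golden)), set(key_paths(target))
    return sorted(g - t), sorted(t - g)


# Hypothetical data: the target lacks one key and carries one stale key.
golden = {"addPageNumbers": {"title": "Add Page Numbers",
                             "selectText": {"1": "Select PDF file:", "2": "Margin Size"}}}
target = {"addPageNumbers": {"title": "Seitenzahlen hinzufügen",
                             "selectText": {"1": "PDF-Datei auswählen:"},
                             "obsolete": "alt"}}

missing, extra = compare_keys(golden, target)
print("missing:", missing)
print("extra:", extra)
```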

1. translation_analyzer.py

Analyzes translation files to find missing translations, untranslated entries, and provides completion statistics.

Usage:

# Analyze all languages
python scripts/translations/translation_analyzer.py

# Analyze specific language
python scripts/translations/translation_analyzer.py --language fr-FR

# Show only missing translations
python scripts/translations/translation_analyzer.py --missing-only

# Show only untranslated entries
python scripts/translations/translation_analyzer.py --untranslated-only

# Show summary only
python scripts/translations/translation_analyzer.py --summary

# JSON output format
python scripts/translations/translation_analyzer.py --format json

Features:

  • Finds missing translation keys
  • Identifies untranslated entries (values identical to en-GB or carrying an [UNTRANSLATED] marker)
  • Shows accurate completion percentages using ignore patterns
  • Identifies extra keys not in en-GB
  • Supports JSON and text output formats
  • Uses scripts/ignore_translation.toml for language-specific exclusions
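One plausible way to compute a completion percentage that honours ignore patterns is sketched below. This is an assumption about the approach, not the analyzer's actual code: it works on flat {dotted_key: text} dictionaries, treats ignore entries as shell-style globs, and assumes the [UNTRANSLATED] marker appears as a prefix of the value.

```python
from fnmatch import fnmatchcase

MARKER = "[UNTRANSLATED]"  # assumed to prefix values that still need work


def completion(golden: dict, target: dict, ignore=()) -> float:
    """Percentage of non-ignored golden keys that have a real translation."""
    def ignored(key: str) -> bool:
        return any(fnmatchcase(key, pattern) for pattern in ignore)

    relevant = [k for k in golden if not ignored(k)]
    done = [
        k for k in relevant
        if k in target
        and target[k] != golden[k]              # identical text counts as untranslated
        and not target[k].startswith(MARKER)    # marked entries count as untranslated
    ]
    return 100.0 * len(done) / len(relevant) if relevant else 100.0


# Hypothetical flat dictionaries for illustration only.
golden = {"home.title": "Home", "home.upload": "Upload", "lang.afr": "Afrikaans",
          "language.direction": "ltr", "pdf.save": "Save", "pdf.open": "Open"}
target = {"home.title": "Startseite", "home.upload": "[UNTRANSLATED] Upload",
          "lang.afr": "Afrikaans", "language.direction": "ltr", "pdf.save": "Speichern"}

print(completion(golden, target, ignore=["language.direction", "lang.*"]))
```

Without the ignore list, keys such as lang.afr would count as untranslated even though they are intentionally left identical to en-GB.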

2. translation_merger.py

Merges missing translations from en-GB into target language files and manages translation workflows.

Usage:

# Add missing translations from en-GB to French
python scripts/translations/translation_merger.py fr-FR add-missing

# Add without marking as [UNTRANSLATED]
python scripts/translations/translation_merger.py fr-FR add-missing --no-mark-untranslated

# Extract untranslated entries to a file
python scripts/translations/translation_merger.py fr-FR extract-untranslated --output fr_untranslated.json

# Create a template for AI translation
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json

# Apply translations from a file
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json

Features:

  • Adds missing keys from en-GB with optional [UNTRANSLATED] markers
  • Extracts untranslated entries for external translation
  • Creates structured templates for AI translation
  • Applies translated content back to language files
  • Automatic backup creation

3. ai_translation_helper.py

Specialized tool for AI-assisted translation workflows with batch processing and validation.

Usage:

# Create batch file for AI translation (multiple languages)
python scripts/translations/ai_translation_helper.py create-batch --languages fr-FR de-DE es-ES --output batch.json --max-entries 50

# Validate AI translations
python scripts/translations/ai_translation_helper.py validate batch.json

# Apply validated AI translations
python scripts/translations/ai_translation_helper.py apply-batch batch.json

# Export for external translation services
python scripts/translations/ai_translation_helper.py export --languages fr-FR de-DE --format csv

Features:

  • Creates batch files for AI translation of multiple languages
  • Prioritizes important translation keys
  • Validates translations for placeholders and artifacts
  • Applies batch translations with validation
  • Exports to CSV/JSON for external translation services

4. compact_translator.py

Extracts untranslated entries in minimal JSON format for character-limited AI services.

Usage:

# Extract all untranslated entries
python scripts/translations/compact_translator.py it-IT --output to_translate.json

Features:

  • Produces minimal JSON output with no extra whitespace
  • Automatic ignore patterns for cleaner output
  • Batch size control for manageable chunks
  • 50-80% fewer characters than other extraction methods
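The "no extra whitespace" output can be reproduced with json.dumps and explicit separators. This sketch only illustrates the serialization choice and makes no claim about how compact_translator.py is written; the sample entries are placeholders.

```python
import json

# Placeholder entries standing in for untranslated en-GB strings.
entries = {"key1": "English text", "key2": "Another text", "key3": "More text"}

# separators=(",", ":") drops the spaces json.dumps inserts by default;
# ensure_ascii=False keeps non-Latin text readable instead of \uXXXX escapes.
compact = json.dumps(entries, ensure_ascii=False, separators=(",", ":"))
pretty = json.dumps(entries, ensure_ascii=False, indent=2)

print(compact)
print(len(compact), "characters compact vs", len(pretty), "characters indented")
```

The saving in this toy example is not representative of the 50-80% figure, which compares against other extraction formats rather than against indented JSON alone.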

5. json_beautifier.py

Restructures and beautifies translation JSON files to match en-GB structure exactly.

Usage:

# Restructure single language to match en-GB structure
python scripts/translations/json_beautifier.py --language de-DE

# Restructure all languages
python scripts/translations/json_beautifier.py --all-languages

# Validate structure without modifying files
python scripts/translations/json_beautifier.py --language de-DE --validate-only

# Skip backup creation
python scripts/translations/json_beautifier.py --language de-DE --no-backup

Features:

  • Restructures JSON to match en-GB nested structure exactly
  • Preserves key ordering for line-by-line comparison
  • Creates automatic backups before modification
  • Validates structure and key ordering
  • Handles flattened dot-notation keys (e.g., "key.subkey") properly
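Restructuring can be thought of as two steps: expand dot-notation keys into nested objects, then rebuild the result in the key order of en-GB. The sketch below shows both steps with hypothetical helper names; it drops keys that are absent from the golden file, which is an assumption and may differ from what json_beautifier.py does with extra keys.

```python
def unflatten(flat: dict) -> dict:
    """Expand {"a.b.c": value} style keys into nested dictionaries."""
    nested: dict = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def reorder_like(golden: dict, target: dict) -> dict:
    """Return target's values arranged in golden's key order (extras dropped)."""
    ordered = {}
    for key, g_value in golden.items():
        if key not in target:
            continue
        t_value = target[key]
        if isinstance(g_value, dict) and isinstance(t_value, dict):
            ordered[key] = reorder_like(g_value, t_value)
        else:
            ordered[key] = t_value
    return ordered


golden = {"addPageNumbers": {"title": "Add Page Numbers",
                             "selectText": {"1": "Select PDF file:", "2": "Margin Size"}}}
# Hypothetical flattened, out-of-order translation.
flat_target = {"addPageNumbers.selectText.2": "Randgröße",
               "addPageNumbers.title": "Seitenzahlen hinzufügen",
               "addPageNumbers.selectText.1": "PDF-Datei auswählen:"}

result = reorder_like(golden, unflatten(flat_target))
print(result)
```

Matching key order is what makes a line-by-line diff against en-GB meaningful.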

Translation Workflows

Method 1: Compact Translation Workflow

Best for character-limited AI services like Claude or ChatGPT

Step 1: Check Current Status

python scripts/translations/translation_analyzer.py --language it-IT --summary

Step 2: Extract Untranslated Entries

# For small files (< 1200 entries)
python scripts/translations/compact_translator.py it-IT --output to_translate.json

# For large files, split into batches
python scripts/translations/compact_translator.py it-IT --output it_IT_batch --batch-size 400
# Creates: it_IT_batch_1_of_N.json, it_IT_batch_2_of_N.json, etc.
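The batch naming shown above (stem_1_of_N.json and so on) follows from simple chunking. The sketch below is an illustration of that arithmetic with a hypothetical split_batches function; it returns the batches in memory rather than writing files.

```python
import math


def split_batches(entries: dict, batch_size: int, stem: str) -> dict:
    """Map batch file names to slices of entries, preserving entry order."""
    items = list(entries.items())
    total = max(math.ceil(len(items) / batch_size), 1)
    return {
        f"{stem}_{i + 1}_of_{total}.json": dict(items[i * batch_size:(i + 1) * batch_size])
        for i in range(total)
    }


# 1088 dummy entries split into batches of 400.
entries = {f"key{i}": "text" for i in range(1088)}
batches = split_batches(entries, 400, "ar_AR_batch")

for name, chunk in batches.items():
    print(name, len(chunk))
```

With 1088 entries and a batch size of 400 this yields three files holding 400, 400, and 288 entries.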

Output format: Compact JSON with minimal whitespace

{"key1":"English text","key2":"Another text","key3":"More text"}

Step 2.5: Validate JSON (if using batches)

# Run this after the AI has translated the batches (Step 3) and before merging them
python scripts/translations/json_validator.py --all-batches it_IT

# Fix any errors reported (common issues: unescaped quotes, backslashes)

Step 3: AI Translation

  1. Copy the compact JSON output
  2. Give it to your AI with instructions:
    Translate this JSON to Italian. Keep the same structure, translate only the values.
    Preserve placeholders like {n}, {total}, {filename}, {{variable}}.
    
  3. Save the AI's response as translated.json

Step 4: Apply Translations

python scripts/translations/translation_merger.py it-IT apply-translations --translations-file translated.json

Step 5: Verify Results

python scripts/translations/translation_analyzer.py --language it-IT --summary

Method 2: Batch Translation Workflow

For complete language translation from scratch or major updates

Step 1: Analyze Current State

python scripts/translations/translation_analyzer.py --language de-DE --summary

Step 2: Create Translation Batches

# Create batches of 100 entries each for systematic translation
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100

Step 3: Translate Batch with AI

Edit the batch file and fill in ALL translated fields:

  • Preserve all placeholders like {n}, {total}, {filename}, {{toolName}}
  • Keep technical terms consistent
  • Maintain JSON structure exactly
  • Consider context provided for each entry

Step 4: Apply Translations

# Skip validation if using legitimate placeholders ({{variable}})
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation

Step 5: Check Progress and Continue

python scripts/translations/translation_analyzer.py --language de-DE --summary

Repeat steps 2-5 until 100% complete.

Method 3: Quick Translation Workflow (Legacy)

For small updates or existing translations

Step 1: Add Missing Translations

python scripts/translations/translation_merger.py fr-FR add-missing --mark-untranslated

Step 2: Create AI Template

python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json

Step 3: Apply Translations

python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json

Translation File Structure

Translation files are located in frontend/public/locales/{language}/translation.json with nested JSON structure:

{
  "addPageNumbers": {
    "title": "Add Page Numbers",
    "selectText": {
      "1": "Select PDF file:",
      "2": "Margin Size"
    }
  }
}

Keys use dot notation internally (e.g., addPageNumbers.selectText.1).
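Resolving a dotted key against the nested structure is a short walk through the dictionary. A minimal sketch, using a hypothetical helper get_by_path and the sample structure shown above:

```python
from functools import reduce


def get_by_path(tree: dict, dotted: str):
    """Follow each dot-separated segment one level down into the tree."""
    return reduce(lambda node, part: node[part], dotted.split("."), tree)


translations = {
    "addPageNumbers": {
        "title": "Add Page Numbers",
        "selectText": {"1": "Select PDF file:", "2": "Margin Size"},
    }
}

print(get_by_path(translations, "addPageNumbers.selectText.1"))
```

A missing segment raises KeyError here, which is often the desired behaviour when checking that a key exists.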

Key Features

Placeholder Preservation

All scripts preserve placeholders like {n}, {total}, {filename} in translations:

"customNumberDesc": "Defaults to {n}, also accepts 'Page {n} of {total}'"

Automatic Backups

Scripts create timestamped backups before modifying files:

translation.backup.20241201_143022.json
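A file name of this shape can be produced with strftime. The sketch below assumes the pattern is stem.backup.YYYYMMDD_HHMMSS.suffix, inferred from the single example above; backup_path and create_backup are hypothetical helpers, not the scripts' own functions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_path(original: Path, now: datetime) -> Path:
    """Build a sibling path such as translation.backup.20241201_143022.json."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return original.with_name(f"{original.stem}.backup.{stamp}{original.suffix}")


def create_backup(original: Path) -> Path:
    """Copy the file next to itself under a timestamped name and return that path."""
    target = backup_path(original, datetime.now())
    shutil.copy2(original, target)  # copy2 also preserves modification times
    return target


example = backup_path(
    Path("frontend/public/locales/de-DE/translation.json"),
    datetime(2024, 12, 1, 14, 30, 22),
)
print(example.name)
```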

Context-Aware Translation

Scripts provide context information to help with accurate translations:

{
  "addPageNumbers.title": {
    "original": "Add Page Numbers",
    "context": "Feature for adding page numbers to PDFs"
  }
}

Priority-Based Translation

Important keys (title, submit, error messages) are prioritized when limiting translation batch sizes.
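Prioritization under a size limit can be expressed as a stable sort followed by a slice. The hint list and the matching rule in this sketch are assumptions drawn from the examples named above (title, submit, error); the real scripts may rank keys differently.

```python
# Assumed priority hints; the actual list used by the scripts is not documented here.
PRIORITY_HINTS = ("title", "submit", "error")


def pick_batch(keys, max_entries: int):
    """Return up to max_entries keys, priority keys first, original order otherwise."""
    def rank(key: str) -> int:
        lowered = key.lower()
        return 0 if any(hint in lowered for hint in PRIORITY_HINTS) else 1

    # sorted() is stable, so keys with equal rank keep their file order.
    return sorted(keys, key=rank)[:max_entries]


# Hypothetical keys for illustration.
keys = ["home.desc", "home.title", "merge.tags", "merge.submit",
        "split.error.failed", "split.help"]
print(pick_batch(keys, 4))
```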

Ignore Patterns System

The scripts/ignore_translation.toml file defines keys that should be ignored for each language, improving completion accuracy.

Common ignore patterns:

  • language.direction: Text direction (ltr/rtl) - universal
  • lang.*: Language code entries not relevant to specific locales
  • pipeline.title, home.devApi.title: Technical terms kept in English
  • Specific technical IDs, version numbers, and system identifiers

Format:

[de_DE]
ignore = [
    'language.direction',
    'pipeline.title',
    'lang.afr',
    'lang.ceb',
    # ... more patterns
]

Best Practices & Lessons Learned

Critical Rules for Translation

  1. NEVER skip entries: Translate ALL entries in each batch to avoid [UNTRANSLATED] pollution
  2. Use appropriate batch sizes: 100 entries for systematic translation, unlimited for compact method
  3. Skip validation for placeholders: Use --skip-validation when batch contains {{variable}} patterns
  4. Check progress between batches: Use --summary flag to track completion percentage
  5. Preserve all placeholders: Keep {n}, {total}, {filename}, {{toolName}} exactly as-is

Workflow Comparison

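| Method  | Best For               | Character Usage       | Complexity | Speed   |
|---------|------------------------|-----------------------|------------|---------|
| Compact | AI services            | Minimal (50-80% less) | Simple     | Fastest |
| Batch   | Systematic translation | Moderate              | Medium     | Medium  |
| Quick   | Small updates          | High                  | Low        | Slow    |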

Common Issues and Solutions

JSON Syntax Errors in AI Translations

Problem: AI-translated batch files have JSON syntax errors

Symptoms:

  • JSONDecodeError: Expecting ',' delimiter
  • JSONDecodeError: Invalid \escape

Solution:

# 1. Validate all batches to find errors
python scripts/translations/json_validator.py --all-batches ar_AR

# 2. Check detailed error with context
python scripts/translations/json_validator.py ar_AR_batch_2_of_3.json

# 3. Fix the reported issues:
#    - Unescaped quotes: "text with "quotes"" → "text with \"quotes\""
#    - Backslashes in regex: "\d{4}" → "\\d{4}"
#    - Missing commas between entries

# 4. Validate again until all pass
python scripts/translations/json_validator.py --all-batches ar_AR

Common fixes:

  • Arabic/RTL text with embedded quotes: Always escape with backslash
  • Regex patterns: Double all backslashes (\d → \\d)
  • Check for missing/extra commas at line reported in error

[UNTRANSLATED] Pollution

Problem: Hundreds of [UNTRANSLATED] markers from incomplete translation attempts

Solution:

  • Only translate complete batches of manageable size
  • Use analyzer that counts [UNTRANSLATED] as missing translations
  • Restore from backup if pollution occurs

Validation False Positives

Problem: Validator flags legitimate {{variable}} placeholders as artifacts

Solution: Use the --skip-validation flag when applying batches with template variables

JSON Structure Mismatches

Problem: Flattened dot-notation keys instead of proper nested objects

Solution: Use json_beautifier.py to restructure files to match en-GB exactly

Real-World Examples

Complete Arabic Translation with Validation (Batch Method)

# Check status
python scripts/translations/translation_analyzer.py --language ar-AR --summary
# Result: 50% complete, 1088 missing

# Extract in batches due to AI token limits
python scripts/translations/compact_translator.py ar-AR --output ar_AR_batch --batch-size 400
# Created: ar_AR_batch_1_of_3.json (400 entries)
#          ar_AR_batch_2_of_3.json (400 entries)
#          ar_AR_batch_3_of_3.json (288 entries)

# [Send each batch to AI for translation]

# Validate translated batches before merging
python scripts/translations/json_validator.py --all-batches ar_AR
# Found errors in batch 1 and 2:
#   - Line 263: Unescaped quotes in "انقر "إضافة ملفات""
#   - Line 132: Unescaped quotes in "أو "and""
#   - Line 213: Invalid escape "\d{4}"

# Fix errors manually or with sed, then validate again
python scripts/translations/json_validator.py --all-batches ar_AR
# All valid!

# Merge all batches
python3 << 'EOF'
import json
merged = {}
for i in range(1, 4):
    with open(f'ar_AR_batch_{i}_of_3.json', 'r', encoding='utf-8') as f:
        merged.update(json.load(f))
with open('ar_AR_merged.json', 'w', encoding='utf-8') as f:
    json.dump(merged, f, ensure_ascii=False, indent=2)
EOF

# Apply merged translations
python scripts/translations/translation_merger.py ar-AR apply-translations --translations-file ar_AR_merged.json
# Result: Applied 1088 translations

# Beautify to match en-GB structure
python scripts/translations/json_beautifier.py --language ar-AR

# Check final progress
python scripts/translations/translation_analyzer.py --language ar-AR --summary
# Result: 98.7% complete, 9 missing, 20 untranslated

Complete Italian Translation (Compact Method)

# Check status
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 46.8% complete, 1147 missing

# Extract all entries for translation
python scripts/translations/compact_translator.py it-IT --output batch1.json

# [Translate batch1.json with AI, save as batch1_translated.json]

# Apply translations
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file batch1_translated.json
# Result: Applied 1147 translations

# Check progress
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 100% complete, 0 missing

German Translation (Batch Method)

Starting from 46.3% completion, reaching 60.3% with batch method:

# Initial analysis
python scripts/translations/translation_analyzer.py --language de-DE --summary
# Result: 46.3% complete, 1142 missing entries

# Batch 1 (100 entries)
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
# [Translate all 100 entries in batch file]
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
# Progress: 46.3% → 51.2%

# Continue with more batches until 100% complete

Error Handling

  • Missing Files: Scripts create new files when language directories don't exist
  • Invalid JSON: Clear error messages with line numbers
  • Placeholder Mismatches: Validation warnings for missing or extra placeholders
  • [UNTRANSLATED] Entries: Counted as missing translations to prevent pollution
  • Backup Failures: Graceful handling with user notification

Integration with Development

These scripts integrate with the existing translation system:

  • Works with the current frontend/public/locales/ structure
  • Compatible with the i18n system used in the React frontend
  • Respects the JSON format expected by the translation loader
  • Maintains the nested structure required by the UI components

Language-Specific Notes

German Translation Notes

  • Technical terms: Keep terms that German uses unchanged (PDF → PDF, API → API)
  • UI actions: "hochladen" (upload), "herunterladen" (download), "speichern" (save)
  • Error messages: Consistent pattern "Ein Fehler ist beim [action] aufgetreten"
  • Formal address: Use "Sie" form for user-facing text

Italian Translation Notes

  • Keep technical terms in English when commonly used (PDF, API, URL)
  • Use formal address ("Lei" form) for user-facing text
  • Error messages: "Si è verificato un errore durante [action]"
  • UI actions: "carica" (upload), "scarica" (download), "salva" (save)

Common Use Cases

  1. Complete Language Translation: Use Compact Workflow for fastest AI-assisted translation
  2. New Language Addition: Start with compact workflow for comprehensive coverage
  3. Updating Existing Language: Use analyzer to find gaps, then compact or batch method
  4. Quality Assurance: Use analyzer with --summary for completion metrics and issue detection
  5. External Translation Services: Use export functionality to generate CSV files for translators
  6. Structure Maintenance: Use json_beautifier to keep files aligned with en-GB structure