Stirling-PDF/scripts/translations
Anthony Stirling 1219cebd07
Language stuff (#4490)
2025-09-25 12:50:19 +01:00

Translation Management Scripts

This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools analyze, merge, and manage translations against the en-GB file, which serves as the source of truth.

Scripts Overview

1. translation_analyzer.py

Analyzes translation files to find missing translations, untranslated entries, and provides completion statistics.

Usage:

# Analyze all languages
python scripts/translations/translation_analyzer.py

# Analyze specific language
python scripts/translations/translation_analyzer.py --language fr-FR

# Show only missing translations
python scripts/translations/translation_analyzer.py --missing-only

# Show only untranslated entries
python scripts/translations/translation_analyzer.py --untranslated-only

# Show summary only
python scripts/translations/translation_analyzer.py --summary

# JSON output format
python scripts/translations/translation_analyzer.py --format json

Features:

  • Finds missing translation keys
  • Identifies untranslated entries (values identical to en-GB, or values carrying an [UNTRANSLATED] marker)
  • Shows accurate completion percentages using ignore patterns
  • Identifies extra keys not in en-GB
  • Supports JSON and text output formats
  • Uses scripts/ignore_translation.toml for language-specific exclusions
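
The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the function name `analyze`, the flat dot-notation input, and the choice to drop ignored keys from the percentage denominator are all assumptions.

```python
from collections.abc import Collection

UNTRANSLATED_MARKER = "[UNTRANSLATED]"  # marker text as written in this README


def analyze(golden: dict[str, str], target: dict[str, str],
            ignore: Collection[str] = ()) -> dict:
    """Compare flat key -> text maps (hypothetical helper, not the script's API)."""
    missing = sorted(k for k in golden if k not in target)
    untranslated = sorted(
        k for k in golden
        if k in target and k not in ignore
        and (target[k] == golden[k] or target[k].startswith(UNTRANSLATED_MARKER))
    )
    extra = sorted(k for k in target if k not in golden)
    # Assumption: ignored keys are left out of the denominator entirely.
    relevant = [k for k in golden if k not in ignore]
    done = [k for k in relevant if k in target and k not in untranslated]
    percent = 100.0 * len(done) / len(relevant) if relevant else 100.0
    return {"missing": missing, "untranslated": untranslated,
            "extra": extra, "percent": round(percent, 1)}
```

Passing `ignore={"language.direction"}` keeps an entry that is legitimately identical to en-GB from being reported as untranslated.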

2. translation_merger.py

Merges missing translations from en-GB into target language files and manages translation workflows.

Usage:

# Add missing translations from en-GB to French
python scripts/translations/translation_merger.py fr-FR add-missing

# Add without marking as [UNTRANSLATED]
python scripts/translations/translation_merger.py fr-FR add-missing --no-mark-untranslated

# Extract untranslated entries to a file
python scripts/translations/translation_merger.py fr-FR extract-untranslated --output fr_untranslated.json

# Create a template for AI translation
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json

# Apply translations from a file
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json

Features:

  • Adds missing keys from en-GB with optional [UNTRANSLATED] markers
  • Extracts untranslated entries for external translation
  • Creates structured templates for AI translation
  • Applies translated content back to language files
  • Automatic backup creation
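
As a rough sketch of what add-missing does conceptually, the snippet below copies keys that exist only in en-GB into the target tree. The helper name `add_missing` and the exact marker format (marker, a space, then the English text) are assumptions; the real script may format marked entries differently.

```python
import copy

UNTRANSLATED_MARKER = "[UNTRANSLATED]"  # marker text as written in this README


def add_missing(golden: dict, target: dict, mark: bool = True) -> dict:
    """Return a copy of target with keys missing relative to golden filled in."""
    merged = copy.deepcopy(target)
    for key, value in golden.items():
        if isinstance(value, dict):
            existing = merged.get(key)
            # Recurse into nested sections; a non-dict value here is replaced.
            merged[key] = add_missing(
                value, existing if isinstance(existing, dict) else {}, mark
            )
        elif key not in merged:
            # Assumed marker format: "[UNTRANSLATED] <English text>".
            merged[key] = f"{UNTRANSLATED_MARKER} {value}" if mark else value
    return merged
```

Calling it with `mark=False` corresponds to the idea behind --no-mark-untranslated: the English text is copied in unchanged.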

3. ai_translation_helper.py

Specialized tool for AI-assisted translation workflows with batch processing and validation.

Usage:

# Create batch file for AI translation (multiple languages)
python scripts/translations/ai_translation_helper.py create-batch --languages fr-FR de-DE es-ES --output batch.json --max-entries 50

# Validate AI translations
python scripts/translations/ai_translation_helper.py validate batch.json

# Apply validated AI translations
python scripts/translations/ai_translation_helper.py apply-batch batch.json

# Export for external translation services
python scripts/translations/ai_translation_helper.py export --languages fr-FR de-DE --format csv

Features:

  • Creates batch files for AI translation of multiple languages
  • Prioritizes important translation keys
  • Validates translations for placeholders and artifacts
  • Applies batch translations with validation
  • Exports to CSV/JSON for external translation services
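
To illustrate the CSV export idea, here is a small sketch built on Python's standard csv module. The column names (`key`, `en-GB`, `translation`) and the helper name `export_csv` are assumptions for illustration; the layout produced by the actual export command is not documented here.

```python
import csv
import io


def export_csv(golden: dict[str, str], target: dict[str, str]) -> str:
    """Render flat key -> text maps as CSV text (column names are assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "en-GB", "translation"])
    for key in sorted(golden):
        # Keys without a translation get an empty cell for the translator to fill.
        writer.writerow([key, golden[key], target.get(key, "")])
    return buffer.getvalue()
```

The csv module quotes values containing commas or quotes, so English source strings survive a round trip through a spreadsheet.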

4. compact_translator.py

Extracts untranslated entries in minimal JSON format for character-limited AI services.

Usage:

# Extract all untranslated entries
python scripts/translations/compact_translator.py it-IT --output to_translate.json

Features:

  • Produces minimal JSON output with no extra whitespace
  • Automatic ignore patterns for cleaner output
  • Batch size control for manageable chunks
  • 50-80% fewer characters than other extraction methods

5. json_beautifier.py

Restructures and beautifies translation JSON files to match en-GB structure exactly.

Usage:

# Restructure single language to match en-GB structure
python scripts/translations/json_beautifier.py --language de-DE

# Restructure all languages
python scripts/translations/json_beautifier.py --all-languages

# Validate structure without modifying files
python scripts/translations/json_beautifier.py --language de-DE --validate-only

# Skip backup creation
python scripts/translations/json_beautifier.py --language de-DE --no-backup

Features:

  • Restructures JSON to match en-GB nested structure exactly
  • Preserves key ordering for line-by-line comparison
  • Creates automatic backups before modification
  • Validates structure and key ordering
  • Handles flattened dot-notation keys (e.g., "key.subkey") properly
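
The key-ordering part of this restructuring can be sketched as below. It is an illustration under assumptions: the name `match_structure` is hypothetical, and keeping keys that en-GB does not know at the end (rather than dropping them) is a guess at sensible behavior, not a description of the script.

```python
def match_structure(golden: dict, target: dict) -> dict:
    """Rebuild target so its key order follows golden at every nesting level."""
    ordered = {}
    for key, value in golden.items():
        if key not in target:
            continue  # missing keys are left for the merger to add
        if isinstance(value, dict) and isinstance(target[key], dict):
            ordered[key] = match_structure(value, target[key])
        else:
            ordered[key] = target[key]
    # Assumption: keys unknown to en-GB are appended, not discarded.
    for key, value in target.items():
        if key not in golden:
            ordered[key] = value
    return ordered
```

Because Python dictionaries preserve insertion order, dumping the result with `json.dumps` yields a file that lines up with en-GB for line-by-line comparison.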

Translation Workflows

Method 1: Compact Translation Workflow

Best for character-limited AI services like Claude or ChatGPT

Step 1: Check Current Status

python scripts/translations/translation_analyzer.py --language it-IT --summary

Step 2: Extract Untranslated Entries

python scripts/translations/compact_translator.py it-IT --output to_translate.json

Output format: Compact JSON with minimal whitespace

{"key1":"English text","key2":"Another text","key3":"More text"}
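
For reference, output in this shape can be produced with Python's standard json module by overriding the default separators. This is a sketch of the serialization step only, with made-up sample entries; it does not claim to mirror how compact_translator.py is written.

```python
import json

entries = {"key1": "English text", "key2": "Another text"}

# Default pretty-printing spends characters on indentation and spaces.
pretty = json.dumps(entries, indent=2, ensure_ascii=False)

# separators=(",", ":") removes the space after each comma and colon.
compact = json.dumps(entries, separators=(",", ":"), ensure_ascii=False)

print(compact)  # {"key1":"English text","key2":"Another text"}
print(len(pretty), len(compact))
```

`ensure_ascii=False` keeps accented characters as-is instead of `\uXXXX` escapes, which also saves characters in non-English text.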

Step 3: AI Translation

  1. Copy the compact JSON output
  2. Give it to your AI with instructions:
    Translate this JSON to Italian. Keep the same structure, translate only the values.
    Preserve placeholders like {n}, {total}, {filename}, {{variable}}.
    
  3. Save the AI's response as translated.json

Step 4: Apply Translations

python scripts/translations/translation_merger.py it-IT apply-translations --translations-file translated.json

Step 5: Verify Results

python scripts/translations/translation_analyzer.py --language it-IT --summary

Method 2: Batch Translation Workflow

For complete language translation from scratch or major updates

Step 1: Analyze Current State

python scripts/translations/translation_analyzer.py --language de-DE --summary

Step 2: Create Translation Batches

# Create batches of 100 entries each for systematic translation
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100

Step 3: Translate Batch with AI

Edit the batch file and fill in ALL translated fields:

  • Preserve all placeholders like {n}, {total}, {filename}, {{toolName}}
  • Keep technical terms consistent
  • Maintain JSON structure exactly
  • Consider context provided for each entry

Step 4: Apply Translations

# Skip validation if using legitimate placeholders ({{variable}})
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation

Step 5: Check Progress and Continue

python scripts/translations/translation_analyzer.py --language de-DE --summary

Repeat steps 2-5 until 100% complete.

Method 3: Quick Translation Workflow (Legacy)

For small updates or existing translations

Step 1: Add Missing Translations

python scripts/translations/translation_merger.py fr-FR add-missing --mark-untranslated

Step 2: Create AI Template

python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json

Step 3: Apply Translations

python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json

Translation File Structure

Translation files are located in frontend/public/locales/{language}/translation.json and use a nested JSON structure:

{
  "addPageNumbers": {
    "title": "Add Page Numbers",
    "selectText": {
      "1": "Select PDF file:",
      "2": "Margin Size"
    }
  }
}

Keys use dot notation internally (e.g., addPageNumbers.selectText.1).
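
The mapping from the nested file to dot-notation keys can be illustrated with a short recursive helper. The function name `flatten` is an assumption for this sketch; the scripts may implement the conversion differently.

```python
def flatten(tree: dict, prefix: str = "") -> dict[str, str]:
    """Turn nested translation objects into dot-notation key -> text pairs."""
    flat: dict[str, str] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


nested = {
    "addPageNumbers": {
        "title": "Add Page Numbers",
        "selectText": {"1": "Select PDF file:", "2": "Margin Size"},
    }
}
print(flatten(nested)["addPageNumbers.selectText.1"])  # Select PDF file:
```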

Key Features

Placeholder Preservation

All scripts preserve placeholders like {n}, {total}, {filename} in translations:

"customNumberDesc": "Defaults to {n}, also accepts 'Page {n} of {total}'"
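
A placeholder check of this kind can be sketched by comparing the placeholders found in the English text with those in the translation. The regular expression and the name `placeholder_diff` are assumptions; the validator in ai_translation_helper.py may use different rules.

```python
import re
from collections import Counter

# Matches {{variable}} first, then single-brace placeholders such as {n}.
PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}|\{\w+\}")


def placeholder_diff(original: str, translated: str) -> tuple[list[str], list[str]]:
    """Return (missing, extra) placeholders in the translation."""
    want = Counter(PLACEHOLDER.findall(original))
    have = Counter(PLACEHOLDER.findall(translated))
    # Counter subtraction keeps only positive counts, so repeats are respected.
    missing = sorted((want - have).elements())
    extra = sorted((have - want).elements())
    return missing, extra


source = "Defaults to {n}, also accepts 'Page {n} of {total}'"
print(placeholder_diff(source, "Standard ist {n}, auch 'Seite {n} von {gesamt}'"))
# (['{total}'], ['{gesamt}'])
```

Using a `Counter` rather than a set means a translation that drops one of two `{n}` occurrences is still reported.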

Automatic Backups

Scripts create timestamped backups before modifying files:

translation.backup.20241201_143022.json
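
A backup name in this format can be derived with `strftime`. The sketch below assumes the backup is written next to the original file and uses hypothetical helper names; only the filename pattern comes from the example above.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_path(source: Path, now: datetime) -> Path:
    """Build e.g. translation.backup.20241201_143022.json beside the source."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return source.with_name(f"{source.stem}.backup.{stamp}{source.suffix}")


def create_backup(source: Path) -> Path:
    """Copy the file to its timestamped backup name and return that path."""
    destination = backup_path(source, datetime.now())
    shutil.copy2(source, destination)  # copy2 also preserves file metadata
    return destination
```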

Context-Aware Translation

Scripts provide context information to help with accurate translations:

{
  "addPageNumbers.title": {
    "original": "Add Page Numbers",
    "context": "Feature for adding page numbers to PDFs"
  }
}

Priority-Based Translation

Important keys (title, submit, error messages) are prioritized when limiting translation batch sizes.
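
One way such prioritization could work is to rank keys by their last dot-notation segment before truncating the batch. The hint list and matching rule below are assumptions drawn from the examples in the sentence above, not the script's actual scoring.

```python
# Hints taken from the README's examples; the matching rule is assumed.
PRIORITY_HINTS = ("title", "submit", "error")


def priority(key: str) -> int:
    """Lower numbers sort first; keys matching no hint come last."""
    last_segment = key.rsplit(".", 1)[-1].lower()
    for rank, hint in enumerate(PRIORITY_HINTS):
        if hint in last_segment:
            return rank
    return len(PRIORITY_HINTS)


def select_batch(keys: list[str], max_entries: int) -> list[str]:
    """Pick the highest-priority keys, breaking ties alphabetically."""
    return sorted(keys, key=lambda k: (priority(k), k))[:max_entries]
```

With a limit smaller than the number of untranslated keys, titles and submit labels are selected before body text.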

Ignore Patterns System

The scripts/ignore_translation.toml file defines keys that should be ignored for each language, improving completion accuracy.

Common ignore patterns:

  • language.direction: Text direction (ltr/rtl) - universal
  • lang.*: Language code entries not relevant to specific locales
  • pipeline.title, home.devApi.title: Technical terms kept in English
  • Specific technical IDs, version numbers, and system identifiers

Format:

[de_DE]
ignore = [
    'language.direction',
    'pipeline.title',
    'lang.afr',
    'lang.ceb',
    # ... more patterns
]

Best Practices & Lessons Learned

Critical Rules for Translation

  1. NEVER skip entries: Translate ALL entries in each batch to avoid [UNTRANSLATED] pollution
  2. Use appropriate batch sizes: 100 entries for systematic translation, no limit for the compact method
  3. Skip validation for placeholders: Use --skip-validation when a batch contains {{variable}} patterns
  4. Check progress between batches: Use --summary flag to track completion percentage
  5. Preserve all placeholders: Keep {n}, {total}, {filename}, {{toolName}} exactly as-is

Workflow Comparison

| Method | Best For | Character Usage | Complexity | Speed |
|---|---|---|---|---|
| Compact | AI services | Minimal (50-80% less) | Simple | Fastest |
| Batch | Systematic translation | Moderate | Medium | Medium |
| Quick | Small updates | High | Low | Slow |

Common Issues and Solutions

[UNTRANSLATED] Pollution

Problem: Hundreds of [UNTRANSLATED] markers from incomplete translation attempts

Solution:

  • Only translate complete batches of manageable size
  • Use the analyzer, which counts [UNTRANSLATED] entries as missing translations
  • Restore from backup if pollution occurs

Validation False Positives

Problem: Validator flags legitimate {{variable}} placeholders as artifacts

Solution: Use the --skip-validation flag when applying batches with template variables

JSON Structure Mismatches

Problem: Flattened dot-notation keys instead of proper nested objects

Solution: Use json_beautifier.py to restructure files to match en-GB exactly

Real-World Examples

Complete Italian Translation (Compact Method)

# Check status
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 46.8% complete, 1147 missing

# Extract all entries for translation
python scripts/translations/compact_translator.py it-IT --output batch1.json

# [Translate batch1.json with AI, save as batch1_translated.json]

# Apply translations
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file batch1_translated.json
# Result: Applied 1147 translations

# Check progress
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 100% complete, 0 missing

German Translation (Batch Method)

Starting from 46.3% completion, reaching 60.3% with batch method:

# Initial analysis
python scripts/translations/translation_analyzer.py --language de-DE --summary
# Result: 46.3% complete, 1142 missing entries

# Batch 1 (100 entries)
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
# [Translate all 100 entries in batch file]
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
# Progress: 46.3% → 51.2%

# Continue with more batches until 100% complete

Error Handling

  • Missing Files: Scripts create new files when language directories don't exist
  • Invalid JSON: Clear error messages with line numbers
  • Placeholder Mismatches: Validation warnings for missing or extra placeholders
  • [UNTRANSLATED] Entries: Counted as missing translations to prevent pollution
  • Backup Failures: Graceful handling with user notification

Integration with Development

These scripts integrate with the existing translation system:

  • Works with the current frontend/public/locales/ structure
  • Compatible with the i18n system used in the React frontend
  • Respects the JSON format expected by the translation loader
  • Maintains the nested structure required by the UI components

Language-Specific Notes

German Translation Notes

  • Technical terms: Use German equivalents where they exist; established abbreviations stay unchanged (PDF → PDF, API → API)
  • UI actions: "hochladen" (upload), "herunterladen" (download), "speichern" (save)
  • Error messages: Consistent pattern "Ein Fehler ist beim [action] aufgetreten"
  • Formal address: Use "Sie" form for user-facing text

Italian Translation Notes

  • Keep technical terms in English when commonly used (PDF, API, URL)
  • Use formal address ("Lei" form) for user-facing text
  • Error messages: "Si è verificato un errore durante [action]"
  • UI actions: "carica" (upload), "scarica" (download), "salva" (save)

Common Use Cases

  1. Complete Language Translation: Use the compact workflow for the fastest AI-assisted translation
  2. New Language Addition: Start with the compact workflow for comprehensive coverage
  3. Updating Existing Language: Use analyzer to find gaps, then compact or batch method
  4. Quality Assurance: Use analyzer with --summary for completion metrics and issue detection
  5. External Translation Services: Use export functionality to generate CSV files for translators
  6. Structure Maintenance: Use json_beautifier to keep files aligned with en-GB structure