# Description of Changes <!-- Please provide a summary of the changes, including: - What was changed - Why the change was made - Any challenges encountered Closes #(issue_number) --> --- ## Checklist ### General - [ ] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md) - [ ] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable) - [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable) - [ ] I have performed a self-review of my own code - [ ] My changes generate no new warnings ### Documentation - [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed) - [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only) ### UI Changes (if applicable) - [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR) ### Testing (if applicable) - [ ] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.
24 KiB
Translation Management Scripts
This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, validate, and manage translations against the en-GB golden truth file.
Current Format: TOML
Stirling PDF uses TOML format for translations in frontend/public/locales/{lang}/translation.toml.
All scripts now support TOML format!
Quick Start - Automated Translation (RECOMMENDED)
The fastest and easiest way to translate a language is using the automated pipeline:
# Set your OpenAI API key
export OPENAI_API_KEY=your_openai_api_key_here
# Translate a language automatically (extract → translate → merge → beautify → verify)
python3 scripts/translations/auto_translate.py es-ES
# With custom batch size (default: 500 entries per batch)
python3 scripts/translations/auto_translate.py es-ES --batch-size 600
# Keep temporary files for inspection
python3 scripts/translations/auto_translate.py es-ES --no-cleanup
What it does:
- Extracts untranslated entries from the language file
- Splits into batches (default 500 entries each)
- Translates each batch using GPT-5 with specialized prompts
- Validates placeholders are preserved
- Merges translated batches
- Applies translations to language file
- Beautifies structure to match en-GB
- Cleans up temporary files
- Reports final completion percentage
Time: ~8-10 minutes per language with 1200+ untranslated entries
Cost: ~$2-4 per language using GPT-5 (or use gpt-5-mini for lower cost)
See auto_translate.py for full details.
Scripts Overview
0. Validation Scripts (Run First!)
json_validator.py
Validates JSON syntax in translation files with detailed error reporting.
Usage:
# Validate single file
python scripts/translations/json_validator.py ar_AR_batch_1_of_3.json
# Validate all batches for a language
python scripts/translations/json_validator.py --all-batches ar_AR
# Validate pattern with wildcards
python scripts/translations/json_validator.py "ar_AR_batch_*.json"
# Brief output (no context)
python scripts/translations/json_validator.py --all-batches ar_AR --brief
# Only show files with errors
python scripts/translations/json_validator.py --all-batches ar_AR --quiet
Features:
- Validates JSON syntax with detailed error messages
- Shows exact line, column, and character position of errors
- Displays context around errors for easy fixing
- Suggests common fixes based on error type
- Detects unescaped quotes and backslashes
- Reports entry counts for valid files
- Exit code 1 if any files invalid (good for CI/CD)
Common Issues Detected:
- Unescaped quotes inside strings:
"text with "quotes""→"text with \"quotes\"" - Invalid backslash escapes:
\d{4}→\\d{4} - Missing commas between entries
- Trailing commas before closing braces
validate_placeholders.py
Validates that translation files have correct placeholders matching en-GB (source of truth).
Usage:
# Validate all languages
python scripts/translations/validate_placeholders.py
# Validate specific language
python scripts/translations/validate_placeholders.py --language es-ES
# Show detailed text samples
python scripts/translations/validate_placeholders.py --verbose
# Output as JSON
python scripts/translations/validate_placeholders.py --json
Features:
- Detects missing placeholders (e.g., {n}, {total}, {filename})
- Detects extra placeholders not in en-GB
- Shows exact keys and text where issues occur
- Exit code 1 if issues found (good for CI/CD)
validate_json_structure.py
Validates JSON structure and key consistency with en-GB.
Usage:
# Validate all languages
python scripts/translations/validate_json_structure.py
# Validate specific language
python scripts/translations/validate_json_structure.py --language de-DE
# Show all missing/extra keys
python scripts/translations/validate_json_structure.py --verbose
# Output as JSON
python scripts/translations/validate_json_structure.py --json
Features:
- Validates JSON syntax
- Detects missing keys (not translated yet)
- Detects extra keys (not in en-GB, should be removed)
- Reports key counts and structure differences
- Exit code 1 if issues found (good for CI/CD)
1. translation_analyzer.py
Analyzes translation files to find missing translations, untranslated entries, and provides completion statistics.
Usage:
# Analyze all languages
python scripts/translations/translation_analyzer.py
# Analyze specific language
python scripts/translations/translation_analyzer.py --language fr-FR
# Show only missing translations
python scripts/translations/translation_analyzer.py --missing-only
# Show only untranslated entries
python scripts/translations/translation_analyzer.py --untranslated-only
# Show summary only
python scripts/translations/translation_analyzer.py --summary
# JSON output format
python scripts/translations/translation_analyzer.py --format json
Features:
- Finds missing translation keys
- Identifies untranslated entries (identical to en-GB and [UNTRANSLATED] markers)
- Shows accurate completion percentages using ignore patterns
- Identifies extra keys not in en-GB
- Supports JSON and text output formats
- Uses
scripts/ignore_translation.tomlfor language-specific exclusions
2. translation_merger.py
Merges missing translations from en-GB into target language files and manages translation workflows.
Usage:
# Add missing translations from en-GB to French
python scripts/translations/translation_merger.py fr-FR add-missing
# Extract untranslated entries to a file
python scripts/translations/translation_merger.py fr-FR extract-untranslated --output fr_untranslated.json
# Create a template for AI translation
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json
# Apply translations from a file
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json
Features:
- Adds missing keys from en-GB (copies English text directly)
- Extracts untranslated entries for external translation
- Creates structured templates for AI translation
- Applies translated content back to language files
- Automatic backup creation
3. ai_translation_helper.py
Specialized tool for AI-assisted translation workflows with batch processing and validation.
Usage:
# Create batch file for AI translation (multiple languages)
python scripts/translations/ai_translation_helper.py create-batch --languages fr-FR de-DE es-ES --output batch.json --max-entries 50
# Validate AI translations
python scripts/translations/ai_translation_helper.py validate batch.json
# Apply validated AI translations
python scripts/translations/ai_translation_helper.py apply-batch batch.json
# Export for external translation services
python scripts/translations/ai_translation_helper.py export --languages fr-FR de-DE --format csv
Features:
- Creates batch files for AI translation of multiple languages
- Prioritizes important translation keys
- Validates translations for placeholders and artifacts
- Applies batch translations with validation
- Exports to CSV/JSON for external translation services
4. compact_translator.py
Extracts untranslated entries in minimal JSON format for character-limited AI services.
Usage:
# Extract all untranslated entries
python scripts/translations/compact_translator.py it-IT --output to_translate.json
Features:
- Produces minimal JSON output with no extra whitespace
- Automatic ignore patterns for cleaner output
- Batch size control for manageable chunks
- 50-80% fewer characters than other extraction methods
5. auto_translate.py - Automated Translation Pipeline
NEW: Fully automated translation workflow using GPT-5.
Combines all translation steps into a single command that handles everything from extraction to verification.
Usage:
# Basic usage (requires OPENAI_API_KEY environment variable)
export OPENAI_API_KEY=your_api_key
python3 scripts/translations/auto_translate.py es-ES
# With inline API key
python3 scripts/translations/auto_translate.py es-ES --api-key YOUR_KEY
# Custom batch size (default: 500 entries)
python3 scripts/translations/auto_translate.py es-ES --batch-size 600
# Custom timeout per batch (default: 600 seconds / 10 minutes)
python3 scripts/translations/auto_translate.py es-ES --timeout 900
# Keep temporary files for debugging
python3 scripts/translations/auto_translate.py es-ES --no-cleanup
# Skip final verification
python3 scripts/translations/auto_translate.py es-ES --skip-verification
Features:
- Fully automated end-to-end translation pipeline
- Uses GPT-5 with specialized prompts for Stirling PDF
- Preserves all placeholders ({n}, {{variable}}, etc.)
- Maintains consistent terminology
- Validates translations automatically
- Creates backups before modifying files
- Reports detailed progress and final completion %
Pipeline Steps:
- Extract: Finds all untranslated entries
- Split: Divides into manageable batches (default: 500 entries)
- Translate: Uses GPT-5 to translate each batch with specialized prompts
- Validate: Ensures placeholders are preserved
- Merge: Combines all translated batches
- Apply: Updates the language file
- Beautify: Restructures to match en-GB format
- Cleanup: Removes temporary files
- Verify: Reports final completion percentage
Translation Quality:
- Preserves ALL placeholders exactly as-is
- Keeps HTML tags intact (,
, etc.) - Doesn't translate technical terms (PDF, API, OAuth2, etc.)
- Maintains consistent terminology throughout
- Uses appropriate formal/informal tone per language
Supported Languages:
All language codes from frontend/public/locales/ (e.g., es-ES, de-DE, fr-FR, zh-CN, ar-AR, etc.)
6. batch_translator.py - GPT-5 Translation Engine
Low-level translation script used by auto_translate.py. Can be used standalone for manual batch translation.
Usage:
# Translate single batch file
python3 scripts/translations/batch_translator.py my_batch.json --language es-ES --api-key YOUR_KEY
# Translate multiple batches
python3 scripts/translations/batch_translator.py batch_*.json --language de-DE --api-key YOUR_KEY
# Use different GPT model
python3 scripts/translations/batch_translator.py batch.json --language fr-FR --model gpt-5-mini
# Skip validation
python3 scripts/translations/batch_translator.py batch.json --language it-IT --skip-validation
Features:
- Translates JSON batch files using OpenAI GPT-5
- Specialized system prompts for Stirling PDF translations
- Automatic placeholder validation
- Supports pattern matching for multiple files
- Configurable model selection (gpt-5, gpt-5-mini, gpt-5-nano)
- Rate limiting with configurable delays
Models:
gpt-5(default): Best quality, $1.25/1M input, $10/1M outputgpt-5-mini: Balanced quality/costgpt-5-nano: Fastest, most economical
7. json_beautifier.py
Restructures and beautifies translation JSON files to match en-GB structure exactly.
Usage:
# Restructure single language to match en-GB structure
python scripts/translations/json_beautifier.py --language de-DE
# Restructure all languages
python scripts/translations/json_beautifier.py --all-languages
# Validate structure without modifying files
python scripts/translations/json_beautifier.py --language de-DE --validate-only
# Skip backup creation
python scripts/translations/json_beautifier.py --language de-DE --no-backup
Features:
- Restructures JSON to match en-GB nested structure exactly
- Preserves key ordering for line-by-line comparison
- Creates automatic backups before modification
- Validates structure and key ordering
- Handles flattened dot-notation keys (e.g., "key.subkey") properly
Translation Workflows
Method 1: Compact Translation Workflow (RECOMMENDED for AI)
Best for character-limited AI services like Claude or ChatGPT
Step 1: Check Current Status
python scripts/translations/translation_analyzer.py --language it-IT --summary
Step 2: Extract Untranslated Entries
# For small files (< 1200 entries)
python scripts/translations/compact_translator.py it-IT --output to_translate.json
# For large files, split into batches
python scripts/translations/compact_translator.py it-IT --output it_IT_batch --batch-size 400
# Creates: it_IT_batch_1_of_N.json, it_IT_batch_2_of_N.json, etc.
Step 2.5: Validate JSON (if using batches)
# After AI translates the batches, validate them before merging
python scripts/translations/json_validator.py --all-batches it_IT
# Fix any errors reported (common issues: unescaped quotes, backslashes)
Output format: Compact JSON with minimal whitespace
{"key1":"English text","key2":"Another text","key3":"More text"}
Step 3: AI Translation
- Copy the compact JSON output
- Give it to your AI with instructions:
Translate this JSON to Italian. Keep the same structure, translate only the values. Preserve placeholders like {n}, {total}, {filename}, {{variable}}. - Save the AI's response as
translated.json
Step 4: Apply Translations
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file translated.json
Step 5: Verify Results
python scripts/translations/translation_analyzer.py --language it-IT --summary
Method 2: Batch Translation Workflow
For complete language translation from scratch or major updates
Step 1: Analyze Current State
python scripts/translations/translation_analyzer.py --language de-DE --summary
Step 2: Create Translation Batches
# Create batches of 100 entries each for systematic translation
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
Step 3: Translate Batch with AI
Edit the batch file and fill in ALL translated fields:
- Preserve all placeholders like
{n},{total},{filename},{{toolName}} - Keep technical terms consistent
- Maintain JSON structure exactly
- Consider context provided for each entry
Step 4: Apply Translations
# Skip validation if using legitimate placeholders ({{variable}})
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
Step 5: Check Progress and Continue
python scripts/translations/translation_analyzer.py --language de-DE --summary
Repeat steps 2-5 until 100% complete.
Method 3: Quick Translation Workflow (Legacy)
For small updates or existing translations
Step 1: Add Missing Translations
python scripts/translations/translation_merger.py fr-FR add-missing
Step 2: Create AI Template
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json
Step 3: Apply Translations
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json
Translation File Structure
Translation files are located in frontend/public/locales/{language}/translation.toml with TOML structure:
[addPageNumbers]
title = "Add Page Numbers"
[addPageNumbers.selectText]
"1" = "Select PDF file:"
"2" = "Margin Size"
Keys use dot notation internally (e.g., addPageNumbers.selectText.1).
Key Features
Placeholder Preservation
All scripts preserve placeholders like {n}, {total}, {filename} in translations:
"customNumberDesc": "Defaults to {n}, also accepts 'Page {n} of {total}'"
Automatic Backups
Scripts create timestamped backups before modifying files:
translation.backup.20241201_143022.toml
Context-Aware Translation
Scripts provide context information to help with accurate translations:
{
"addPageNumbers.title": {
"original": "Add Page Numbers",
"context": "Feature for adding page numbers to PDFs"
}
}
Priority-Based Translation
Important keys (title, submit, error messages) are prioritized when limiting translation batch sizes.
Ignore Patterns System
The scripts/ignore_translation.toml file defines keys that should be ignored for each language, improving completion accuracy.
Common ignore patterns:
language.direction: Text direction (ltr/rtl) - universallang.*: Language code entries not relevant to specific localespipeline.title,home.devApi.title: Technical terms kept in English- Specific technical IDs, version numbers, and system identifiers
Format:
[de_DE]
ignore = [
'language.direction',
'pipeline.title',
'lang.afr',
'lang.ceb',
# ... more patterns
]
Best Practices & Lessons Learned
Critical Rules for Translation
- NEVER skip entries: Translate ALL entries in each batch to ensure completeness
- Use appropriate batch sizes: 100 entries for systematic translation, unlimited for compact method
- Skip validation for placeholders: Use
--skip-validationwhen batch contains{{variable}}patterns - Check progress between batches: Use
--summaryflag to track completion percentage - Preserve all placeholders: Keep
{n},{total},{filename},{{toolName}}exactly as-is
Workflow Comparison
| Method | Best For | Character Usage | Complexity | Speed |
|---|---|---|---|---|
| Compact | AI services | Minimal (50-80% less) | Simple | Fastest |
| Batch | Systematic translation | Moderate | Medium | Medium |
| Quick | Small updates | High | Low | Slow |
Common Issues and Solutions
JSON Syntax Errors in AI Translations
Problem: AI-translated batch files have JSON syntax errors Symptoms:
JSONDecodeError: Expecting ',' delimiterJSONDecodeError: Invalid \escape
Solution:
# 1. Validate all batches to find errors
python scripts/translations/json_validator.py --all-batches ar_AR
# 2. Check detailed error with context
python scripts/translations/json_validator.py ar_AR_batch_2_of_3.json
# 3. Fix the reported issues:
# - Unescaped quotes: "text with "quotes"" → "text with \"quotes\""
# - Backslashes in regex: "\d{4}" → "\\d{4}"
# - Missing commas between entries
# 4. Validate again until all pass
python scripts/translations/json_validator.py --all-batches ar_AR
Common fixes:
- Arabic/RTL text with embedded quotes: Always escape with backslash
- Regex patterns: Double all backslashes (
\d→\\d) - Check for missing/extra commas at line reported in error
Validation False Positives
Problem: Validator flags legitimate {{variable}} placeholders as artifacts
Solution: Use --skip-validation flag when applying batches with template variables
JSON Structure Mismatches
Problem: Flattened dot-notation keys instead of proper nested objects
Solution: Use json_beautifier.py to restructure files to match en-GB exactly
Real-World Examples
Complete Arabic Translation with Validation (Batch Method)
# Check status
python scripts/translations/translation_analyzer.py --language ar-AR --summary
# Result: 50% complete, 1088 missing
# Extract in batches due to AI token limits
python scripts/translations/compact_translator.py ar-AR --output ar_AR_batch --batch-size 400
# Created: ar_AR_batch_1_of_3.json (400 entries)
# ar_AR_batch_2_of_3.json (400 entries)
# ar_AR_batch_3_of_3.json (288 entries)
# [Send each batch to AI for translation]
# Validate translated batches before merging
python scripts/translations/json_validator.py --all-batches ar_AR
# Found errors in batch 1 and 2:
# - Line 263: Unescaped quotes in "انقر "إضافة ملفات""
# - Line 132: Unescaped quotes in "أو "and""
# - Line 213: Invalid escape "\d{4}"
# Fix errors manually or with sed, then validate again
python scripts/translations/json_validator.py --all-batches ar_AR
# All valid!
# Merge all batches
python3 << 'EOF'
import json
merged = {}
for i in range(1, 4):
with open(f'ar_AR_batch_{i}_of_3.json', 'r', encoding='utf-8') as f:
merged.update(json.load(f))
with open('ar_AR_merged.json', 'w', encoding='utf-8') as f:
json.dump(merged, f, ensure_ascii=False, indent=2)
EOF
# Apply merged translations
python scripts/translations/translation_merger.py ar-AR apply-translations --translations-file ar_AR_merged.json
# Result: Applied 1088 translations
# Beautify to match en-GB structure
python scripts/translations/json_beautifier.py --language ar-AR
# Check final progress
python scripts/translations/translation_analyzer.py --language ar-AR --summary
# Result: 98.7% complete, 9 missing, 20 untranslated
Complete Italian Translation (Compact Method)
# Check status
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 46.8% complete, 1147 missing
# Extract all entries for translation
python scripts/translations/compact_translator.py it-IT --output batch1.json
# [Translate batch1.json with AI, save as batch1_translated.json]
# Apply translations
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file batch1_translated.json
# Result: Applied 1147 translations
# Check progress
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 100% complete, 0 missing
German Translation (Batch Method)
Starting from 46.3% completion, reaching 60.3% with batch method:
# Initial analysis
python scripts/translations/translation_analyzer.py --language de-DE --summary
# Result: 46.3% complete, 1142 missing entries
# Batch 1 (100 entries)
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
# [Translate all 100 entries in batch file]
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
# Progress: 46.6% → 51.2%
# Continue with more batches until 100% complete
Error Handling
- Missing Files: Scripts create new files when language directories don't exist
- Invalid JSON: Clear error messages with line numbers
- Placeholder Mismatches: Validation warnings for missing or extra placeholders
- Legacy [UNTRANSLATED] Markers: Detected and stripped for backwards compatibility
- Backup Failures: Graceful handling with user notification
Integration with Development
These scripts integrate with the existing translation system:
- Works with the current
frontend/public/locales/structure - Compatible with the i18n system used in the React frontend
- Respects the JSON format expected by the translation loader
- Maintains the nested structure required by the UI components
Language-Specific Notes
German Translation Notes
- Technical terms: Use German equivalents (PDF → PDF, API → API)
- UI actions: "hochladen" (upload), "herunterladen" (download), "speichern" (save)
- Error messages: Consistent pattern "Ein Fehler ist beim [action] aufgetreten"
- Formal address: Use "Sie" form for user-facing text
Italian Translation Notes
- Keep technical terms in English when commonly used (PDF, API, URL)
- Use formal address ("Lei" form) for user-facing text
- Error messages: "Si è verificato un errore durante [action]"
- UI actions: "carica" (upload), "scarica" (download), "salva" (save)
Common Use Cases
- Complete Language Translation: Use Compact Workflow for fastest AI-assisted translation
- New Language Addition: Start with compact workflow for comprehensive coverage
- Updating Existing Language: Use analyzer to find gaps, then compact or batch method
- Quality Assurance: Use analyzer with
--summaryfor completion metrics and issue detection - External Translation Services: Use export functionality to generate CSV files for translators
- Structure Maintenance: Use json_beautifier to keep files aligned with en-GB structure