.. | ||
ai_translation_helper.py | ||
compact_translator.py | ||
json_beautifier.py | ||
README.md | ||
translation_analyzer.py | ||
translation_merger.py |
Translation Management Scripts
This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, and manage translations against the en-GB golden truth file.
Scripts Overview
1. translation_analyzer.py
Analyzes translation files to find missing translations, untranslated entries, and provides completion statistics.
Usage:
# Analyze all languages
python scripts/translations/translation_analyzer.py
# Analyze specific language
python scripts/translations/translation_analyzer.py --language fr-FR
# Show only missing translations
python scripts/translations/translation_analyzer.py --missing-only
# Show only untranslated entries
python scripts/translations/translation_analyzer.py --untranslated-only
# Show summary only
python scripts/translations/translation_analyzer.py --summary
# JSON output format
python scripts/translations/translation_analyzer.py --format json
Features:
- Finds missing translation keys
- Identifies untranslated entries (identical to en-GB and [UNTRANSLATED] markers)
- Shows accurate completion percentages using ignore patterns
- Identifies extra keys not in en-GB
- Supports JSON and text output formats
- Uses
scripts/ignore_translation.toml
for language-specific exclusions
2. translation_merger.py
Merges missing translations from en-GB into target language files and manages translation workflows.
Usage:
# Add missing translations from en-GB to French
python scripts/translations/translation_merger.py fr-FR add-missing
# Add without marking as [UNTRANSLATED]
python scripts/translations/translation_merger.py fr-FR add-missing --no-mark-untranslated
# Extract untranslated entries to a file
python scripts/translations/translation_merger.py fr-FR extract-untranslated --output fr_untranslated.json
# Create a template for AI translation
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json
# Apply translations from a file
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json
Features:
- Adds missing keys from en-GB with optional [UNTRANSLATED] markers
- Extracts untranslated entries for external translation
- Creates structured templates for AI translation
- Applies translated content back to language files
- Automatic backup creation
3. ai_translation_helper.py
Specialized tool for AI-assisted translation workflows with batch processing and validation.
Usage:
# Create batch file for AI translation (multiple languages)
python scripts/translations/ai_translation_helper.py create-batch --languages fr-FR de-DE es-ES --output batch.json --max-entries 50
# Validate AI translations
python scripts/translations/ai_translation_helper.py validate batch.json
# Apply validated AI translations
python scripts/translations/ai_translation_helper.py apply-batch batch.json
# Export for external translation services
python scripts/translations/ai_translation_helper.py export --languages fr-FR de-DE --format csv
Features:
- Creates batch files for AI translation of multiple languages
- Prioritizes important translation keys
- Validates translations for placeholders and artifacts
- Applies batch translations with validation
- Exports to CSV/JSON for external translation services
4. compact_translator.py
Extracts untranslated entries in minimal JSON format for character-limited AI services.
Usage:
# Extract all untranslated entries
python scripts/translations/compact_translator.py it-IT --output to_translate.json
Features:
- Produces minimal JSON output with no extra whitespace
- Automatic ignore patterns for cleaner output
- Batch size control for manageable chunks
- 50-80% fewer characters than other extraction methods
5. json_beautifier.py
Restructures and beautifies translation JSON files to match en-GB structure exactly.
Usage:
# Restructure single language to match en-GB structure
python scripts/translations/json_beautifier.py --language de-DE
# Restructure all languages
python scripts/translations/json_beautifier.py --all-languages
# Validate structure without modifying files
python scripts/translations/json_beautifier.py --language de-DE --validate-only
# Skip backup creation
python scripts/translations/json_beautifier.py --language de-DE --no-backup
Features:
- Restructures JSON to match en-GB nested structure exactly
- Preserves key ordering for line-by-line comparison
- Creates automatic backups before modification
- Validates structure and key ordering
- Handles flattened dot-notation keys (e.g., "key.subkey") properly
Translation Workflows
Method 1: Compact Translation Workflow (RECOMMENDED for AI)
Best for character-limited AI services like Claude or ChatGPT
Step 1: Check Current Status
python scripts/translations/translation_analyzer.py --language it-IT --summary
Step 2: Extract Untranslated Entries
python scripts/translations/compact_translator.py it-IT --output to_translate.json
Output format: Compact JSON with minimal whitespace
{"key1":"English text","key2":"Another text","key3":"More text"}
Step 3: AI Translation
- Copy the compact JSON output
- Give it to your AI with instructions:
Translate this JSON to Italian. Keep the same structure, translate only the values. Preserve placeholders like {n}, {total}, {filename}, {{variable}}.
- Save the AI's response as
translated.json
Step 4: Apply Translations
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file translated.json
Step 5: Verify Results
python scripts/translations/translation_analyzer.py --language it-IT --summary
Method 2: Batch Translation Workflow
For complete language translation from scratch or major updates
Step 1: Analyze Current State
python scripts/translations/translation_analyzer.py --language de-DE --summary
Step 2: Create Translation Batches
# Create batches of 100 entries each for systematic translation
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
Step 3: Translate Batch with AI
Edit the batch file and fill in ALL translated
fields:
- Preserve all placeholders like
{n}
,{total}
,{filename}
,{{toolName}}
- Keep technical terms consistent
- Maintain JSON structure exactly
- Consider context provided for each entry
Step 4: Apply Translations
# Skip validation if using legitimate placeholders ({{variable}})
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
Step 5: Check Progress and Continue
python scripts/translations/translation_analyzer.py --language de-DE --summary
Repeat steps 2-5 until 100% complete.
Method 3: Quick Translation Workflow (Legacy)
For small updates or existing translations
Step 1: Add Missing Translations
python scripts/translations/translation_merger.py fr-FR add-missing --mark-untranslated
Step 2: Create AI Template
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json
Step 3: Apply Translations
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json
Translation File Structure
Translation files are located in frontend/public/locales/{language}/translation.json
with nested JSON structure:
{
"addPageNumbers": {
"title": "Add Page Numbers",
"selectText": {
"1": "Select PDF file:",
"2": "Margin Size"
}
}
}
Keys use dot notation internally (e.g., addPageNumbers.selectText.1
).
Key Features
Placeholder Preservation
All scripts preserve placeholders like {n}
, {total}
, {filename}
in translations:
"customNumberDesc": "Defaults to {n}, also accepts 'Page {n} of {total}'"
Automatic Backups
Scripts create timestamped backups before modifying files:
translation.backup.20241201_143022.json
Context-Aware Translation
Scripts provide context information to help with accurate translations:
{
"addPageNumbers.title": {
"original": "Add Page Numbers",
"context": "Feature for adding page numbers to PDFs"
}
}
Priority-Based Translation
Important keys (title, submit, error messages) are prioritized when limiting translation batch sizes.
Ignore Patterns System
The scripts/ignore_translation.toml
file defines keys that should be ignored for each language, improving completion accuracy.
Common ignore patterns:
language.direction
: Text direction (ltr/rtl) - universallang.*
: Language code entries not relevant to specific localespipeline.title
,home.devApi.title
: Technical terms kept in English- Specific technical IDs, version numbers, and system identifiers
Format:
[de_DE]
ignore = [
'language.direction',
'pipeline.title',
'lang.afr',
'lang.ceb',
# ... more patterns
]
Best Practices & Lessons Learned
Critical Rules for Translation
- NEVER skip entries: Translate ALL entries in each batch to avoid [UNTRANSLATED] pollution
- Use appropriate batch sizes: 100 entries for systematic translation, unlimited for compact method
- Skip validation for placeholders: Use
--skip-validation
when batch contains{{variable}}
patterns - Check progress between batches: Use
--summary
flag to track completion percentage - Preserve all placeholders: Keep
{n}
,{total}
,{filename}
,{{toolName}}
exactly as-is
Workflow Comparison
Method | Best For | Character Usage | Complexity | Speed |
---|---|---|---|---|
Compact | AI services | Minimal (50-80% less) | Simple | Fastest |
Batch | Systematic translation | Moderate | Medium | Medium |
Quick | Small updates | High | Low | Slow |
Common Issues and Solutions
[UNTRANSLATED] Pollution
Problem: Hundreds of [UNTRANSLATED] markers from incomplete translation attempts Solution:
- Only translate complete batches of manageable size
- Use analyzer that counts [UNTRANSLATED] as missing translations
- Restore from backup if pollution occurs
Validation False Positives
Problem: Validator flags legitimate {{variable}}
placeholders as artifacts
Solution: Use --skip-validation
flag when applying batches with template variables
JSON Structure Mismatches
Problem: Flattened dot-notation keys instead of proper nested objects
Solution: Use json_beautifier.py
to restructure files to match en-GB exactly
Real-World Examples
Complete Italian Translation (Compact Method)
# Check status
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 46.8% complete, 1147 missing
# Extract all entries for translation
python scripts/translations/compact_translator.py it-IT --output batch1.json
# [Translate batch1.json with AI, save as batch1_translated.json]
# Apply translations
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file batch1_translated.json
# Result: Applied 1147 translations
# Check progress
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 100% complete, 0 missing
German Translation (Batch Method)
Starting from 46.3% completion, reaching 60.3% with batch method:
# Initial analysis
python scripts/translations/translation_analyzer.py --language de-DE --summary
# Result: 46.3% complete, 1142 missing entries
# Batch 1 (100 entries)
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
# [Translate all 100 entries in batch file]
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
# Progress: 46.6% → 51.2%
# Continue with more batches until 100% complete
Error Handling
- Missing Files: Scripts create new files when language directories don't exist
- Invalid JSON: Clear error messages with line numbers
- Placeholder Mismatches: Validation warnings for missing or extra placeholders
- [UNTRANSLATED] Entries: Counted as missing translations to prevent pollution
- Backup Failures: Graceful handling with user notification
Integration with Development
These scripts integrate with the existing translation system:
- Works with the current
frontend/public/locales/
structure - Compatible with the i18n system used in the React frontend
- Respects the JSON format expected by the translation loader
- Maintains the nested structure required by the UI components
Language-Specific Notes
German Translation Notes
- Technical terms: Use German equivalents (PDF → PDF, API → API)
- UI actions: "hochladen" (upload), "herunterladen" (download), "speichern" (save)
- Error messages: Consistent pattern "Ein Fehler ist beim [action] aufgetreten"
- Formal address: Use "Sie" form for user-facing text
Italian Translation Notes
- Keep technical terms in English when commonly used (PDF, API, URL)
- Use formal address ("Lei" form) for user-facing text
- Error messages: "Si è verificato un errore durante [action]"
- UI actions: "carica" (upload), "scarica" (download), "salva" (save)
Common Use Cases
- Complete Language Translation: Use Compact Workflow for fastest AI-assisted translation
- New Language Addition: Start with compact workflow for comprehensive coverage
- Updating Existing Language: Use analyzer to find gaps, then compact or batch method
- Quality Assurance: Use analyzer with
--summary
for completion metrics and issue detection - External Translation Services: Use export functionality to generate CSV files for translators
- Structure Maintenance: Use json_beautifier to keep files aligned with en-GB structure