mirror of
https://github.com/Frooodle/Stirling-PDF.git
synced 2025-09-26 17:52:59 +02:00
# Translation Management Scripts

This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, and manage translations against the en-GB reference file, which is treated as the source of truth.

## Scripts Overview

### 1. `translation_analyzer.py`
Analyzes translation files to find missing translations and untranslated entries, and reports completion statistics.

**Usage:**
```bash
# Analyze all languages
python scripts/translations/translation_analyzer.py

# Analyze specific language
python scripts/translations/translation_analyzer.py --language fr-FR

# Show only missing translations
python scripts/translations/translation_analyzer.py --missing-only

# Show only untranslated entries
python scripts/translations/translation_analyzer.py --untranslated-only

# Show summary only
python scripts/translations/translation_analyzer.py --summary

# JSON output format
python scripts/translations/translation_analyzer.py --format json
```

**Features:**
- Finds missing translation keys
- Identifies untranslated entries (values identical to en-GB, or values carrying an [UNTRANSLATED] marker)
- Shows accurate completion percentages using ignore patterns
- Identifies extra keys not in en-GB
- Supports JSON and text output formats
- Uses `scripts/ignore_translation.toml` for language-specific exclusions

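
The checks listed above can be sketched in a few lines. This is an illustration only, not the code in `translation_analyzer.py`: the function name `analyze`, the report keys, and the assumption that both files are already flattened to dot-notation keys are all hypothetical.

```python
UNTRANSLATED_MARKER = "[UNTRANSLATED]"


def analyze(reference, target, ignored=frozenset()):
    """Compare flattened {dotted_key: text} dicts (hypothetical helper)."""
    # Ignored keys are excluded from every statistic.
    counted = [key for key in reference if key not in ignored]
    missing = sorted(key for key in counted if key not in target)
    # Untranslated: same text as the reference, or still carrying the marker.
    untranslated = sorted(
        key
        for key in counted
        if key in target
        and (
            target[key] == reference[key]
            or str(target[key]).startswith(UNTRANSLATED_MARKER)
        )
    )
    extra = sorted(key for key in target if key not in reference)
    done = len(counted) - len(missing) - len(untranslated)
    completion = 100.0 * done / len(counted) if counted else 100.0
    return {
        "missing": missing,
        "untranslated": untranslated,
        "extra": extra,
        "completion": completion,
    }


reference = {
    "a.title": "Add",
    "a.submit": "Submit",
    "language.direction": "ltr",
    "b.x": "X",
}
target = {
    "a.title": "Aggiungi",
    "a.submit": "[UNTRANSLATED] Submit",
    "language.direction": "ltr",
    "old.key": "v",
}
report = analyze(reference, target, ignored={"language.direction"})
print(report)
```

Excluding ignored keys from the denominator is what keeps a value such as `language.direction`, which is legitimately identical to en-GB, from lowering the completion percentage.
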
### 2. `translation_merger.py`
Merges missing translations from en-GB into target language files and manages translation workflows.

**Usage:**
```bash
# Add missing translations from en-GB to French
python scripts/translations/translation_merger.py fr-FR add-missing

# Add without marking as [UNTRANSLATED]
python scripts/translations/translation_merger.py fr-FR add-missing --no-mark-untranslated

# Extract untranslated entries to a file
python scripts/translations/translation_merger.py fr-FR extract-untranslated --output fr_untranslated.json

# Create a template for AI translation
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json

# Apply translations from a file
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json
```

**Features:**
- Adds missing keys from en-GB with optional [UNTRANSLATED] markers
- Extracts untranslated entries for external translation
- Creates structured templates for AI translation
- Applies translated content back to language files
- Automatic backup creation

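
As a rough illustration of what an `add-missing` pass involves, the sketch below copies keys that exist only in the reference into a copy of the target. The function name `add_missing` and the exact marker layout (`[UNTRANSLATED] ` followed by the English text) are assumptions for this example; the real script may store the marker differently.

```python
import copy

UNTRANSLATED_MARKER = "[UNTRANSLATED]"


def add_missing(reference, target, mark_untranslated=True):
    """Return (merged_copy, number_of_keys_added); hypothetical helper."""
    merged = copy.deepcopy(target)  # never mutate the caller's data
    added = 0

    def walk(ref_node, tgt_node):
        nonlocal added
        for key, ref_value in ref_node.items():
            if isinstance(ref_value, dict):
                child = tgt_node.setdefault(key, {})
                if isinstance(child, dict):
                    walk(ref_value, child)
            elif key not in tgt_node:
                # Assumed marker layout: marker, space, English text.
                tgt_node[key] = (
                    f"{UNTRANSLATED_MARKER} {ref_value}"
                    if mark_untranslated
                    else ref_value
                )
                added += 1

    walk(reference, merged)
    return merged, added


reference = {"a": {"title": "Add", "sub": {"1": "One"}}}
target = {"a": {"title": "Ajouter"}}
merged, added = add_missing(reference, target)
print(merged, added)
```

Existing translations are left untouched; only keys absent from the target are filled in.
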
### 3. `ai_translation_helper.py`
Specialized tool for AI-assisted translation workflows with batch processing and validation.

**Usage:**
```bash
# Create batch file for AI translation (multiple languages)
python scripts/translations/ai_translation_helper.py create-batch --languages fr-FR de-DE es-ES --output batch.json --max-entries 50

# Validate AI translations
python scripts/translations/ai_translation_helper.py validate batch.json

# Apply validated AI translations
python scripts/translations/ai_translation_helper.py apply-batch batch.json

# Export for external translation services
python scripts/translations/ai_translation_helper.py export --languages fr-FR de-DE --format csv
```

**Features:**
- Creates batch files for AI translation of multiple languages
- Prioritizes important translation keys
- Validates translations for placeholders and artifacts
- Applies batch translations with validation
- Exports to CSV/JSON for external translation services

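
Placeholder validation can be approximated by comparing the placeholders found in the original and translated strings. The regular expression and the function names below are assumptions made for this sketch; they are not taken from `ai_translation_helper.py`, whose validator may apply different rules.

```python
import re
from collections import Counter

# Matches {{variable}} first, then single-brace forms such as {n}.
PLACEHOLDER_RE = re.compile(r"\{\{\s*\w+\s*\}\}|\{\w+\}")


def check_placeholders(original, translated):
    """Return (missing, extra) placeholder lists; hypothetical helper."""
    source = Counter(PLACEHOLDER_RE.findall(original))
    result = Counter(PLACEHOLDER_RE.findall(translated))
    missing = sorted((source - result).elements())
    extra = sorted((result - source).elements())
    return missing, extra


print(check_placeholders("Page {n} of {total}", "Pagina {n} di {total}"))
print(check_placeholders("Open {{toolName}}", "Apri {toolName}"))
```

Treating `{{variable}}` as a placeholder in its own right, as this sketch does, would avoid flagging it as an artifact.
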
### 4. `compact_translator.py`
Extracts untranslated entries in minimal JSON format for character-limited AI services.

**Usage:**
```bash
# Extract all untranslated entries
python scripts/translations/compact_translator.py it-IT --output to_translate.json
```

**Features:**
- Produces minimal JSON output with no extra whitespace
- Automatic ignore patterns for cleaner output
- Batch size control for manageable chunks
- 50-80% fewer characters than other extraction methods

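
The whitespace-free output comes down to how the JSON is serialized. The sketch below uses the standard `json` module to show the difference, and splits entries into fixed-size chunks; the helper name `chunks` and the sample data are illustrative only, and the savings on real files will differ from this toy example.

```python
import json


def chunks(entries, size):
    """Yield successive dicts holding at most `size` entries each."""
    items = list(entries.items())
    for start in range(0, len(items), size):
        yield dict(items[start:start + size])


entries = {"key1": "English text", "key2": "Another text", "key3": "More text"}

# separators=(",", ":") drops the spaces json.dumps inserts by default.
compact = json.dumps(entries, ensure_ascii=False, separators=(",", ":"))
pretty = json.dumps(entries, ensure_ascii=False, indent=2)

print(compact)
print(len(compact), "characters compact vs", len(pretty), "pretty-printed")
print(list(chunks(entries, 2)))
```

`ensure_ascii=False` keeps accented characters readable instead of escaping them as `\uXXXX` sequences, which also saves characters for non-English text.
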
### 5. `json_beautifier.py`
Restructures and beautifies translation JSON files to match en-GB structure exactly.

**Usage:**
```bash
# Restructure single language to match en-GB structure
python scripts/translations/json_beautifier.py --language de-DE

# Restructure all languages
python scripts/translations/json_beautifier.py --all-languages

# Validate structure without modifying files
python scripts/translations/json_beautifier.py --language de-DE --validate-only

# Skip backup creation
python scripts/translations/json_beautifier.py --language de-DE --no-backup
```

**Features:**
- Restructures JSON to match en-GB nested structure exactly
- Preserves key ordering for line-by-line comparison
- Creates automatic backups before modification
- Validates structure and key ordering
- Handles flattened dot-notation keys (e.g., "key.subkey") properly

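
One way to picture the restructuring: flatten the target (so nested objects and stray `"key.subkey"` entries end up in the same form), then rebuild it by walking the reference in order. The names `flatten` and `restructure` are hypothetical, and this sketch silently drops keys that do not exist in the reference, which the real script may handle differently.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested dicts into {'a.b.c': value}; dotted keys pass through."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def restructure(reference, target):
    """Rebuild target with the nesting and key order of reference."""
    flat = flatten(target)

    def build(ref_node, prefix=""):
        out = {}
        for key, ref_value in ref_node.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(ref_value, dict):
                child = build(ref_value, path)
                if child:
                    out[key] = child
            elif path in flat:
                out[key] = flat[path]
        return out

    return build(reference)


reference = {"a": {"title": "T", "sel": {"1": "x", "2": "y"}}, "b": "B"}
target = {"b": "B-de", "a.sel.2": "y-de", "a": {"title": "T-de"}}
result = restructure(reference, target)
print(json.dumps(result, ensure_ascii=False, indent=2))
```

Because Python dicts preserve insertion order, walking the reference is enough to reproduce its key order in the output.
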
## Translation Workflows

### Method 1: Compact Translation Workflow (RECOMMENDED for AI)

**Best for character-limited AI services like Claude or ChatGPT**

#### Step 1: Check Current Status
```bash
python scripts/translations/translation_analyzer.py --language it-IT --summary
```

#### Step 2: Extract Untranslated Entries
```bash
python scripts/translations/compact_translator.py it-IT --output to_translate.json
```

**Output format**: Compact JSON with minimal whitespace
```json
{"key1":"English text","key2":"Another text","key3":"More text"}
```

#### Step 3: AI Translation
1. Copy the compact JSON output
2. Give it to your AI with instructions:
   ```
   Translate this JSON to Italian. Keep the same structure, translate only the values.
   Preserve placeholders like {n}, {total}, {filename}, {{variable}}.
   ```
3. Save the AI's response as `translated.json`

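
Before applying an AI response, it can be worth confirming that it contains exactly the keys that were sent. The check below is an optional extra, not part of the scripts in this directory; the function name `check_response` and the report keys are hypothetical.

```python
import json


def check_response(source_text, response_text):
    """Compare the keys of the extracted JSON with the AI's reply."""
    source = json.loads(source_text)
    response = json.loads(response_text)
    shared = source.keys() & response.keys()
    return {
        "missing": sorted(source.keys() - response.keys()),
        "unexpected": sorted(response.keys() - source.keys()),
        # Values returned unchanged were probably not translated.
        "unchanged": sorted(key for key in shared if source[key] == response[key]),
    }


sent = '{"key1":"English text","key2":"Another text"}'
received = '{"key1":"Testo inglese","key3":"Altro"}'
print(check_response(sent, received))
```

In practice the two strings would be read from `to_translate.json` and `translated.json`; `json.loads` also raises `json.JSONDecodeError` if the saved response is not valid JSON.
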
#### Step 4: Apply Translations
```bash
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file translated.json
```

#### Step 5: Verify Results
```bash
python scripts/translations/translation_analyzer.py --language it-IT --summary
```

### Method 2: Batch Translation Workflow

**For complete language translation from scratch or major updates**

#### Step 1: Analyze Current State
```bash
python scripts/translations/translation_analyzer.py --language de-DE --summary
```

#### Step 2: Create Translation Batches
```bash
# Create batches of 100 entries each for systematic translation
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
```

#### Step 3: Translate Batch with AI
Edit the batch file and fill in ALL `translated` fields:
- Preserve all placeholders like `{n}`, `{total}`, `{filename}`, `{{toolName}}`
- Keep technical terms consistent
- Maintain JSON structure exactly
- Consider context provided for each entry

#### Step 4: Apply Translations
```bash
# Skip validation if using legitimate placeholders ({{variable}})
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
```

#### Step 5: Check Progress and Continue
```bash
python scripts/translations/translation_analyzer.py --language de-DE --summary
```
Repeat steps 2-5 until 100% complete.

### Method 3: Quick Translation Workflow (Legacy)

**For small updates or existing translations**

#### Step 1: Add Missing Translations
```bash
python scripts/translations/translation_merger.py fr-FR add-missing --mark-untranslated
```

#### Step 2: Create AI Template
```bash
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json
```

#### Step 3: Apply Translations
```bash
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json
```

## Translation File Structure

Translation files are located in `frontend/public/locales/{language}/translation.json` with nested JSON structure:

```json
{
  "addPageNumbers": {
    "title": "Add Page Numbers",
    "selectText": {
      "1": "Select PDF file:",
      "2": "Margin Size"
    }
  }
}
```

Keys use dot notation internally (e.g., `addPageNumbers.selectText.1`).

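
To make the dot notation concrete, the sketch below resolves a dotted key against the nested structure shown above. The helper name `get_by_path` is hypothetical; note that JSON object keys are always strings, so the final segment is `"1"`, not the integer `1`.

```python
from functools import reduce

translation = {
    "addPageNumbers": {
        "title": "Add Page Numbers",
        "selectText": {
            "1": "Select PDF file:",
            "2": "Margin Size",
        },
    }
}


def get_by_path(data, dotted_key):
    """Walk nested dicts one path segment at a time; raises KeyError if absent."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), data)


print(get_by_path(translation, "addPageNumbers.selectText.1"))
```
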
## Key Features

### Placeholder Preservation
All scripts preserve placeholders like `{n}`, `{total}`, `{filename}` in translations:
```
"customNumberDesc": "Defaults to {n}, also accepts 'Page {n} of {total}'"
```

### Automatic Backups
Scripts create timestamped backups before modifying files:
```
translation.backup.20241201_143022.json
```

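
A backup name in that pattern can be produced with the standard library alone. The function `backup_file` below is a hypothetical sketch, not the scripts' implementation; it assumes the timestamp format `%Y%m%d_%H%M%S`, inferred from the example file name.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_file(path, now=None):
    """Copy `path` next to itself as <stem>.backup.<timestamp><suffix>."""
    path = Path(path)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    backup = path.with_name(f"{path.stem}.backup.{stamp}{path.suffix}")
    shutil.copy2(path, backup)  # copy2 also preserves file metadata
    return backup
```

Calling `backup_file("translation.json", now=datetime(2024, 12, 1, 14, 30, 22))` would create `translation.backup.20241201_143022.json` in the same directory.
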
### Context-Aware Translation
Scripts provide context information to help with accurate translations:
```json
{
  "addPageNumbers.title": {
    "original": "Add Page Numbers",
    "context": "Feature for adding page numbers to PDFs"
  }
}
```

### Priority-Based Translation
Important keys (title, submit, error messages) are prioritized when limiting translation batch sizes.

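
Prioritization of this kind can be sketched as a stable sort on a rank derived from the key name. The hint list, the matching rule (substring of the last key segment), and the names `priority` and `select_batch` are assumptions for illustration; the actual ranking used by the scripts is not documented here.

```python
# Assumed hints, in descending order of importance.
PRIORITY_HINTS = ("title", "submit", "error")


def priority(key):
    """Lower rank means more important; unmatched keys rank last."""
    last_segment = key.rsplit(".", 1)[-1].lower()
    for rank, hint in enumerate(PRIORITY_HINTS):
        if hint in last_segment:
            return rank
    return len(PRIORITY_HINTS)


def select_batch(keys, max_entries):
    """Pick the most important keys, keeping input order within a rank."""
    return sorted(keys, key=priority)[:max_entries]


keys = ["a.selectText.1", "a.submit", "b.errorMessage", "a.title"]
print(select_batch(keys, 3))
```

Because `sorted` is stable, keys with the same rank keep their original file order, so batches stay predictable from run to run.
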
### Ignore Patterns System
The `scripts/ignore_translation.toml` file defines keys that should be ignored for each language, improving completion accuracy.

**Common ignore patterns:**
- `language.direction`: Text direction (ltr/rtl) - universal
- `lang.*`: Language code entries not relevant to specific locales
- `pipeline.title`, `home.devApi.title`: Technical terms kept in English
- Specific technical IDs, version numbers, and system identifiers

**Format:**
```toml
[de_DE]
ignore = [
    'language.direction',
    'pipeline.title',
    'lang.afr',
    'lang.ceb',
    # ... more patterns
]
```

## Best Practices & Lessons Learned

### Critical Rules for Translation

1. **NEVER skip entries**: Translate ALL entries in each batch to avoid [UNTRANSLATED] pollution
2. **Use appropriate batch sizes**: 100 entries for systematic translation, unlimited for compact method
3. **Skip validation for placeholders**: Use `--skip-validation` when batch contains `{{variable}}` patterns
4. **Check progress between batches**: Use `--summary` flag to track completion percentage
5. **Preserve all placeholders**: Keep `{n}`, `{total}`, `{filename}`, `{{toolName}}` exactly as-is

### Workflow Comparison

| Method | Best For | Character Usage | Complexity | Speed |
|--------|----------|----------------|------------|-------|
| Compact | AI services | Minimal (50-80% less) | Simple | Fastest |
| Batch | Systematic translation | Moderate | Medium | Medium |
| Quick | Small updates | High | Low | Slow |

### Common Issues and Solutions

#### [UNTRANSLATED] Pollution
**Problem**: Hundreds of [UNTRANSLATED] markers from incomplete translation attempts
**Solution**:
- Only translate complete batches of manageable size
- Use the analyzer, which counts [UNTRANSLATED] entries as missing translations
- Restore from backup if pollution occurs

#### Validation False Positives
**Problem**: Validator flags legitimate `{{variable}}` placeholders as artifacts
**Solution**: Use `--skip-validation` flag when applying batches with template variables

#### JSON Structure Mismatches
**Problem**: Flattened dot-notation keys instead of proper nested objects
**Solution**: Use `json_beautifier.py` to restructure files to match en-GB exactly

## Real-World Examples

### Complete Italian Translation (Compact Method)
```bash
# Check status
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 46.8% complete, 1147 missing

# Extract all entries for translation
python scripts/translations/compact_translator.py it-IT --output batch1.json

# [Translate batch1.json with AI, save as batch1_translated.json]

# Apply translations
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file batch1_translated.json
# Result: Applied 1147 translations

# Check progress
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 100% complete, 0 missing
```

### German Translation (Batch Method)
Starting from 46.3% completion, reaching 60.3% with batch method:

```bash
# Initial analysis
python scripts/translations/translation_analyzer.py --language de-DE --summary
# Result: 46.3% complete, 1142 missing entries

# Batch 1 (100 entries)
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
# [Translate all 100 entries in batch file]
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
# Progress: 46.3% → 51.2%

# Continue with more batches until 100% complete
```

## Error Handling

- **Missing Files**: Scripts create new files when language directories don't exist
- **Invalid JSON**: Clear error messages with line numbers
- **Placeholder Mismatches**: Validation warnings for missing or extra placeholders
- **[UNTRANSLATED] Entries**: Counted as missing translations to prevent pollution
- **Backup Failures**: Graceful handling with user notification

## Integration with Development

These scripts integrate with the existing translation system:
- Works with the current `frontend/public/locales/` structure
- Compatible with the i18n system used in the React frontend
- Respects the JSON format expected by the translation loader
- Maintains the nested structure required by the UI components

## Language-Specific Notes

### German Translation Notes
- Technical terms: Keep established terms unchanged, since German uses the same forms (PDF → PDF, API → API)
- UI actions: "hochladen" (upload), "herunterladen" (download), "speichern" (save)
- Error messages: Consistent pattern "Ein Fehler ist beim [action] aufgetreten"
- Formal address: Use "Sie" form for user-facing text

### Italian Translation Notes
- Keep technical terms in English when commonly used (PDF, API, URL)
- Use formal address ("Lei" form) for user-facing text
- Error messages: "Si è verificato un errore durante [action]"
- UI actions: "carica" (upload), "scarica" (download), "salva" (save)

## Common Use Cases

1. **Complete Language Translation**: Use Compact Workflow for fastest AI-assisted translation
2. **New Language Addition**: Start with compact workflow for comprehensive coverage
3. **Updating Existing Language**: Use analyzer to find gaps, then compact or batch method
4. **Quality Assurance**: Use analyzer with `--summary` for completion metrics and issue detection
5. **External Translation Services**: Use export functionality to generate CSV files for translators
6. **Structure Maintenance**: Use `json_beautifier.py` to keep files aligned with en-GB structure