mirror of
https://github.com/Frooodle/Stirling-PDF.git
synced 2026-02-17 13:52:14 +01:00
715 lines
24 KiB
Markdown
# Translation Management Scripts

This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, validate, and manage translations against the en-GB source-of-truth file.

## Current Format: TOML

**Stirling PDF uses TOML format for translations** in `frontend/public/locales/{lang}/translation.toml`.

**All scripts now support TOML format!**

## Quick Start - Automated Translation (RECOMMENDED)

The **fastest and easiest way** to translate a language is using the automated pipeline:

```bash
# Set your OpenAI API key
export OPENAI_API_KEY=your_openai_api_key_here

# Translate a language automatically (extract → translate → merge → beautify → verify)
python3 scripts/translations/auto_translate.py es-ES

# With custom batch size (default: 500 entries per batch)
python3 scripts/translations/auto_translate.py es-ES --batch-size 600

# Keep temporary files for inspection
python3 scripts/translations/auto_translate.py es-ES --no-cleanup
```

**What it does:**
1. Extracts untranslated entries from the language file
2. Splits into batches (default 500 entries each)
3. Translates each batch using GPT-5 with specialized prompts
4. Validates placeholders are preserved
5. Merges translated batches
6. Applies translations to language file
7. Beautifies structure to match en-GB
8. Cleans up temporary files
9. Reports final completion percentage

**Time:** ~8-10 minutes per language with 1200+ untranslated entries

**Cost:** ~$2-4 per language using GPT-5 (or use `gpt-5-mini` for lower cost)

See [`auto_translate.py`](#auto_translatepy-automated-translation-pipeline) for full details.

---

## Scripts Overview

### 0. Validation Scripts (Run First!)

#### `json_validator.py`
Validates JSON syntax in translation files with detailed error reporting.

**Usage:**
```bash
# Validate single file
python scripts/translations/json_validator.py ar_AR_batch_1_of_3.json

# Validate all batches for a language
python scripts/translations/json_validator.py --all-batches ar_AR

# Validate pattern with wildcards
python scripts/translations/json_validator.py "ar_AR_batch_*.json"

# Brief output (no context)
python scripts/translations/json_validator.py --all-batches ar_AR --brief

# Only show files with errors
python scripts/translations/json_validator.py --all-batches ar_AR --quiet
```

**Features:**
- Validates JSON syntax with detailed error messages
- Shows exact line, column, and character position of errors
- Displays context around errors for easy fixing
- Suggests common fixes based on error type
- Detects unescaped quotes and backslashes
- Reports entry counts for valid files
- Exits with code 1 if any file is invalid (good for CI/CD)

**Common Issues Detected:**
- Unescaped quotes inside strings: `"text with "quotes""` → `"text with \"quotes\""`
- Invalid backslash escapes: `\d{4}` → `\\d{4}`
- Missing commas between entries
- Trailing commas before closing braces

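The line and column reporting described above can be sketched with the standard library alone. This is a minimal illustration, not the code in `json_validator.py`; the helper name `locate_json_error` is hypothetical.

```python
import json


def locate_json_error(text: str):
    """Return None for valid JSON, else (line, column, message) of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno are 1-based positions supplied by the json module
        return exc.lineno, exc.colno, exc.msg
    return None


# A regex pattern with a single backslash is an invalid JSON escape
broken = '{\n  "pattern": "\\d{4}"\n}'
print(locate_json_error(broken))

# Doubling the backslash makes the same entry valid
print(locate_json_error('{"pattern": "\\\\d{4}"}'))
```

The first call reports an error on line 2 of the input, the second returns `None`. Only the first error is reported, because the parser stops at it.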
#### `validate_placeholders.py`
Validates that translation files have correct placeholders matching en-GB (source of truth).

**Usage:**
```bash
# Validate all languages
python scripts/translations/validate_placeholders.py

# Validate specific language
python scripts/translations/validate_placeholders.py --language es-ES

# Show detailed text samples
python scripts/translations/validate_placeholders.py --verbose

# Output as JSON
python scripts/translations/validate_placeholders.py --json
```

**Features:**
- Detects missing placeholders (e.g., `{n}`, `{total}`, `{filename}`)
- Detects extra placeholders not in en-GB
- Shows exact keys and text where issues occur
- Exit code 1 if issues found (good for CI/CD)

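A placeholder check of this kind can be approximated with one regular expression and two set differences. The sketch below is an assumption about how such a comparison might look, not the script's actual logic; `PLACEHOLDER_RE` and `placeholder_diff` are hypothetical names, and the pattern only covers `{name}` and `{{name}}` forms.

```python
import re

# Try the double-brace form first, then single-brace placeholders
PLACEHOLDER_RE = re.compile(r"\{\{\w+\}\}|\{\w+\}")


def placeholder_diff(source: str, translated: str):
    """Return (missing, extra) placeholder sets relative to the source string."""
    expected = set(PLACEHOLDER_RE.findall(source))
    found = set(PLACEHOLDER_RE.findall(translated))
    return expected - found, found - expected


missing, extra = placeholder_diff(
    "Page {n} of {total}",
    "Seite {n} von {gesamt}",  # the translator wrongly renamed {total}
)
print(sorted(missing), sorted(extra))
```

Because sets are used, a placeholder that appears twice in the source but once in the translation would not be flagged; counting occurrences would need `collections.Counter` instead.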
#### `validate_json_structure.py`
Validates JSON structure and key consistency with en-GB.

**Usage:**
```bash
# Validate all languages
python scripts/translations/validate_json_structure.py

# Validate specific language
python scripts/translations/validate_json_structure.py --language de-DE

# Show all missing/extra keys
python scripts/translations/validate_json_structure.py --verbose

# Output as JSON
python scripts/translations/validate_json_structure.py --json
```

**Features:**
- Validates JSON syntax
- Detects missing keys (not translated yet)
- Detects extra keys (not in en-GB, should be removed)
- Reports key counts and structure differences
- Exit code 1 if issues found (good for CI/CD)

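Missing and extra keys reduce to two set differences once both files are expressed as flat dot-notation keys. The following sketch assumes the files have already been flattened; the function name and the sample keys are hypothetical and chosen only for illustration.

```python
def compare_keys(reference: dict, target: dict):
    """Return (missing, extra) flattened keys of target relative to reference."""
    ref_keys, tgt_keys = set(reference), set(target)
    return sorted(ref_keys - tgt_keys), sorted(tgt_keys - ref_keys)


# Hypothetical flattened entries for en-GB and a target language
en_gb = {"home.title": "Home", "home.desc": "Welcome", "merge.submit": "Merge"}
de_de = {"home.title": "Startseite", "merge.submit": "Zusammenführen", "old.key": "Alt"}

missing, extra = compare_keys(en_gb, de_de)
print("missing:", missing)
print("extra:", extra)
```

Here `home.desc` is reported as missing and `old.key` as extra.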
### 1. `translation_analyzer.py`
Analyzes translation files to find missing translations, untranslated entries, and provides completion statistics.

**Usage:**
```bash
# Analyze all languages
python scripts/translations/translation_analyzer.py

# Analyze specific language
python scripts/translations/translation_analyzer.py --language fr-FR

# Show only missing translations
python scripts/translations/translation_analyzer.py --missing-only

# Show only untranslated entries
python scripts/translations/translation_analyzer.py --untranslated-only

# Show summary only
python scripts/translations/translation_analyzer.py --summary

# JSON output format
python scripts/translations/translation_analyzer.py --format json
```

**Features:**
- Finds missing translation keys
- Identifies untranslated entries (identical to en-GB and `[UNTRANSLATED]` markers)
- Shows accurate completion percentages using ignore patterns
- Identifies extra keys not in en-GB
- Supports JSON and text output formats
- Uses `scripts/ignore_translation.toml` for language-specific exclusions

### 2. `translation_merger.py`
Merges missing translations from en-GB into target language files and manages translation workflows.

**Usage:**
```bash
# Operate on all locales (except en-GB) when language is omitted
python scripts/translations/translation_merger.py add-missing

# Add missing translations from en-GB to French
python scripts/translations/translation_merger.py fr-FR add-missing

# Create backups before modifying files
python scripts/translations/translation_merger.py fr-FR add-missing --backup

# Extract untranslated entries to a file
python scripts/translations/translation_merger.py fr-FR extract-untranslated --output fr_untranslated.json

# Create a template for AI translation
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json

# Apply translations from a file
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json

# Override default paths if needed
python scripts/translations/translation_merger.py fr-FR add-missing --locales-dir ./frontend/public/locales --ignore-file ./scripts/ignore_translation.toml

# Remove unused translations not present in en-GB
python scripts/translations/translation_merger.py fr-FR remove-unused
```

**Features:**
- Adds missing keys from en-GB (copies English text directly)
- Runs across all locales for add-missing/remove-unused when language is omitted
- Extracts untranslated entries for external translation
- Creates structured templates for AI translation
- Applies translated content back to language files (template format or plain JSON)
- Supports `--backup` on mutating commands
- Automatic backup creation
- Removes unused translations not present in en-GB

### 3. `ai_translation_helper.py`
Specialized tool for AI-assisted translation workflows with batch processing and validation.

**Usage:**
```bash
# Create batch file for AI translation (multiple languages)
python scripts/translations/ai_translation_helper.py create-batch --languages fr-FR de-DE es-ES --output batch.json --max-entries 50

# Validate AI translations
python scripts/translations/ai_translation_helper.py validate batch.json

# Apply validated AI translations
python scripts/translations/ai_translation_helper.py apply-batch batch.json

# Export for external translation services
python scripts/translations/ai_translation_helper.py export --languages fr-FR de-DE --format csv
```

**Features:**
- Creates batch files for AI translation of multiple languages
- Prioritizes important translation keys
- Validates translations for placeholders and artifacts
- Applies batch translations with validation
- Exports to CSV/JSON for external translation services

### 4. `compact_translator.py`
Extracts untranslated entries in minimal JSON format for character-limited AI services.

**Usage:**
```bash
# Extract all untranslated entries
python scripts/translations/compact_translator.py it-IT --output to_translate.json
```

**Features:**
- Produces minimal JSON output with no extra whitespace
- Automatic ignore patterns for cleaner output
- Batch size control for manageable chunks
- 50-80% fewer characters than other extraction methods

### 5. `auto_translate.py` - Automated Translation Pipeline

**NEW: Fully automated translation workflow using GPT-5.**

Combines all translation steps into a single command that handles everything from extraction to verification.

**Usage:**
```bash
# Basic usage (requires OPENAI_API_KEY environment variable)
export OPENAI_API_KEY=your_api_key
python3 scripts/translations/auto_translate.py es-ES

# With inline API key
python3 scripts/translations/auto_translate.py es-ES --api-key YOUR_KEY

# Custom batch size (default: 500 entries)
python3 scripts/translations/auto_translate.py es-ES --batch-size 600

# Custom timeout per batch (default: 600 seconds / 10 minutes)
python3 scripts/translations/auto_translate.py es-ES --timeout 900

# Keep temporary files for debugging
python3 scripts/translations/auto_translate.py es-ES --no-cleanup

# Skip final verification
python3 scripts/translations/auto_translate.py es-ES --skip-verification
```

**Features:**
- Fully automated end-to-end translation pipeline
- Uses GPT-5 with specialized prompts for Stirling PDF
- Preserves all placeholders (`{n}`, `{{variable}}`, etc.)
- Maintains consistent terminology
- Validates translations automatically
- Creates backups before modifying files
- Reports detailed progress and final completion %

**Pipeline Steps:**
1. **Extract**: Finds all untranslated entries
2. **Split**: Divides into manageable batches (default: 500 entries)
3. **Translate**: Uses GPT-5 to translate each batch with specialized prompts
4. **Validate**: Ensures placeholders are preserved
5. **Merge**: Combines all translated batches
6. **Apply**: Updates the language file
7. **Beautify**: Restructures to match en-GB format
8. **Cleanup**: Removes temporary files
9. **Verify**: Reports final completion percentage

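The **Split** step can be pictured as slicing a flat key/text mapping into consecutive chunks. This is only a sketch of the idea under that assumption, not the pipeline's implementation; `split_into_batches` is a hypothetical name.

```python
def split_into_batches(entries: dict, batch_size: int = 500):
    """Split a flat {key: text} mapping into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = list(entries.items())
    return [dict(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


# 1088 placeholder entries split with a batch size of 400
entries = {f"key{i}": f"text {i}" for i in range(1088)}
batches = split_into_batches(entries, batch_size=400)
print([len(batch) for batch in batches])  # [400, 400, 288]
```

Merging is the reverse operation: updating one dictionary with each translated batch in turn, which is safe as long as no key appears in two batches.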
**Translation Quality:**
- Preserves ALL placeholders exactly as-is
- Keeps HTML tags intact (`<strong>`, `<br>`, etc.)
- Doesn't translate technical terms (PDF, API, OAuth2, etc.)
- Maintains consistent terminology throughout
- Uses appropriate formal/informal tone per language

**Supported Languages:**
All language codes from `frontend/public/locales/` (e.g., es-ES, de-DE, fr-FR, zh-CN, ar-AR, etc.)

### 6. `batch_translator.py` - GPT-5 Translation Engine

Low-level translation script used by `auto_translate.py`. Can be used standalone for manual batch translation.

**Usage:**
```bash
# Translate single batch file
python3 scripts/translations/batch_translator.py my_batch.json --language es-ES --api-key YOUR_KEY

# Translate multiple batches
python3 scripts/translations/batch_translator.py batch_*.json --language de-DE --api-key YOUR_KEY

# Use different GPT model
python3 scripts/translations/batch_translator.py batch.json --language fr-FR --model gpt-5-mini

# Skip validation
python3 scripts/translations/batch_translator.py batch.json --language it-IT --skip-validation
```

**Features:**
- Translates JSON batch files using OpenAI GPT-5
- Specialized system prompts for Stirling PDF translations
- Automatic placeholder validation
- Supports pattern matching for multiple files
- Configurable model selection (gpt-5, gpt-5-mini, gpt-5-nano)
- Rate limiting with configurable delays

**Models:**
- `gpt-5` (default): Best quality, $1.25/1M input, $10/1M output
- `gpt-5-mini`: Balanced quality/cost
- `gpt-5-nano`: Fastest, most economical

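The `gpt-5` prices listed above translate into a simple per-run cost formula. The sketch below applies them to made-up token counts; the token figures are hypothetical and serve only to show the arithmetic, since real usage depends on the language and prompt size.

```python
GPT5_INPUT_USD_PER_M = 1.25   # price per 1M input tokens, from the model list
GPT5_OUTPUT_USD_PER_M = 10.0  # price per 1M output tokens, from the model list


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a run from its token counts."""
    total = input_tokens * GPT5_INPUT_USD_PER_M + output_tokens * GPT5_OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical token counts, chosen only to illustrate the calculation
print(f"${estimate_cost(200_000, 250_000):.2f}")  # $2.75
```

Output tokens dominate the total because they cost eight times as much as input tokens at these rates.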
### 7. `json_beautifier.py`
Restructures and beautifies translation JSON files to match en-GB structure exactly.

**Usage:**
```bash
# Restructure single language to match en-GB structure
python scripts/translations/json_beautifier.py --language de-DE

# Restructure all languages
python scripts/translations/json_beautifier.py --all-languages

# Validate structure without modifying files
python scripts/translations/json_beautifier.py --language de-DE --validate-only

# Skip backup creation
python scripts/translations/json_beautifier.py --language de-DE --no-backup
```

**Features:**
- Restructures JSON to match en-GB nested structure exactly
- Preserves key ordering for line-by-line comparison
- Creates automatic backups before modification
- Validates structure and key ordering
- Handles flattened dot-notation keys (e.g., "key.subkey") properly

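Handling flattened dot-notation keys amounts to expanding each key into nested dictionaries. The following is a minimal sketch of that expansion, not the beautifier's code; `unflatten` is a hypothetical name, and the sketch assumes no key is both a leaf and a parent of other keys.

```python
def unflatten(flat: dict) -> dict:
    """Expand dot-notation keys into nested dictionaries."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate levels on demand
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat = {
    "addPageNumbers.title": "Add Page Numbers",
    "addPageNumbers.selectText.1": "Select PDF file:",
}
print(unflatten(flat))
```

Matching the key order of en-GB would be a separate step, for example by walking the reference structure and copying values in that order.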
## Translation Workflows

### Method 1: Compact Translation Workflow (RECOMMENDED for AI)

**Best for character-limited AI services like Claude or ChatGPT**

#### Step 1: Check Current Status
```bash
python scripts/translations/translation_analyzer.py --language it-IT --summary
```

#### Step 2: Extract Untranslated Entries
```bash
# For small files (< 1200 entries)
python scripts/translations/compact_translator.py it-IT --output to_translate.json

# For large files, split into batches
python scripts/translations/compact_translator.py it-IT --output it_IT_batch --batch-size 400
# Creates: it_IT_batch_1_of_N.json, it_IT_batch_2_of_N.json, etc.
```

**Output format**: Compact JSON with minimal whitespace
```json
{"key1":"English text","key2":"Another text","key3":"More text"}
```

#### Step 2.5: Validate JSON (if using batches)
```bash
# After AI translates the batches, validate them before merging
python scripts/translations/json_validator.py --all-batches it_IT

# Fix any errors reported (common issues: unescaped quotes, backslashes)
```

#### Step 3: AI Translation
1. Copy the compact JSON output
2. Give it to your AI with instructions:
   ```
   Translate this JSON to Italian. Keep the same structure, translate only the values.
   Preserve placeholders like {n}, {total}, {filename}, {{variable}}.
   ```
3. Save the AI's response as `translated.json`

#### Step 4: Apply Translations
```bash
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file translated.json
```

#### Step 5: Verify Results
```bash
python scripts/translations/translation_analyzer.py --language it-IT --summary
```

### Method 2: Batch Translation Workflow

**For complete language translation from scratch or major updates**

#### Step 1: Analyze Current State
```bash
python scripts/translations/translation_analyzer.py --language de-DE --summary
```

#### Step 2: Create Translation Batches
```bash
# Create batches of 100 entries each for systematic translation
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
```

#### Step 3: Translate Batch with AI
Edit the batch file and fill in ALL `translated` fields:
- Preserve all placeholders like `{n}`, `{total}`, `{filename}`, `{{toolName}}`
- Keep technical terms consistent
- Maintain JSON structure exactly
- Consider context provided for each entry

#### Step 4: Apply Translations
```bash
# Skip validation if using legitimate placeholders ({{variable}})
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
```

#### Step 5: Check Progress and Continue
```bash
python scripts/translations/translation_analyzer.py --language de-DE --summary
```

Repeat steps 2-5 until 100% complete.

### Method 3: Quick Translation Workflow (Legacy)

**For small updates or existing translations**

#### Step 1: Add Missing Translations
```bash
python scripts/translations/translation_merger.py fr-FR add-missing
```

#### Step 2: Create AI Template
```bash
python scripts/translations/translation_merger.py fr-FR create-template --output fr_template.json
```

#### Step 3: Apply Translations
```bash
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json
```

## Translation File Structure

Translation files are located in `frontend/public/locales/{language}/translation.toml` with TOML structure:

```toml
[addPageNumbers]
title = "Add Page Numbers"

[addPageNumbers.selectText]
"1" = "Select PDF file:"
"2" = "Margin Size"
```

Keys use dot notation internally (e.g., `addPageNumbers.selectText.1`).

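The relationship between the nested TOML tables and the internal dot-notation keys can be illustrated with a small recursive flattener. This is a sketch under the assumption that the file is first parsed into nested dictionaries (as a TOML parser such as Python's `tomllib` would do); the `flatten` helper is hypothetical.

```python
def flatten(nested: dict, prefix: str = "") -> dict:
    """Collapse nested tables into a flat {dotted.key: value} mapping."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# The same data as the TOML snippet above, in parsed form
parsed = {
    "addPageNumbers": {
        "title": "Add Page Numbers",
        "selectText": {"1": "Select PDF file:", "2": "Margin Size"},
    }
}
print(list(flatten(parsed)))
```

The printed keys are `addPageNumbers.title`, `addPageNumbers.selectText.1`, and `addPageNumbers.selectText.2`.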
## Key Features

### Placeholder Preservation
All scripts preserve placeholders like `{n}`, `{total}`, `{filename}` in translations:
```
"customNumberDesc": "Defaults to {n}, also accepts 'Page {n} of {total}'"
```

### Automatic Backups
Scripts create timestamped backups before modifying files:
```
translation.backup.20241201_143022.toml
```

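The backup file name shown above follows a `YYYYMMDD_HHMMSS` timestamp pattern. A name of that shape could be produced as follows; this is a sketch inferred from the example name, not the scripts' own code, and `backup_path` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def backup_path(original: Path, now: datetime) -> Path:
    """Build a sibling path of the form <stem>.backup.<timestamp><suffix>."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return original.with_name(f"{original.stem}.backup.{stamp}{original.suffix}")


path = Path("frontend/public/locales/de-DE/translation.toml")
print(backup_path(path, datetime(2024, 12, 1, 14, 30, 22)).name)
# translation.backup.20241201_143022.toml
```

Passing the timestamp in as an argument keeps the function deterministic; a caller would normally supply `datetime.now()`.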
### Context-Aware Translation
Scripts provide context information to help with accurate translations:
```json
{
  "addPageNumbers.title": {
    "original": "Add Page Numbers",
    "context": "Feature for adding page numbers to PDFs"
  }
}
```

### Priority-Based Translation
Important keys (title, submit, error messages) are prioritized when limiting translation batch sizes.

### Ignore Patterns System
The `scripts/ignore_translation.toml` file defines keys that should be ignored for each language, improving completion accuracy.

**Common ignore patterns:**
- `language.direction`: Text direction (ltr/rtl) - universal
- `lang.*`: Language code entries not relevant to specific locales
- `pipeline.title`, `home.devApi.title`: Technical terms kept in English
- Specific technical IDs, version numbers, and system identifiers

**Format:**
```toml
[de_DE]
ignore = [
    'language.direction',
    'pipeline.title',
    'lang.afr',
    'lang.ceb',
    # ... more patterns
]
```

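Applying an ignore list when computing completion can be sketched as a filter over the keys. Whether the scripts expand wildcards such as `lang.*` or only match the explicit keys listed in the file is not stated here, so the glob matching below is an assumption; `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_ignored(key: str, patterns: list[str]) -> bool:
    """Return True if the key matches any ignore pattern (glob-style, case-sensitive)."""
    return any(fnmatchcase(key, pattern) for pattern in patterns)


patterns = ["language.direction", "pipeline.title", "lang.*"]
keys = ["lang.afr", "home.title", "pipeline.title", "language.direction"]

# Keys that still count towards the completion percentage
print([key for key in keys if not is_ignored(key, patterns)])  # ['home.title']
```

Exact keys work unchanged because a pattern without wildcard characters only matches itself.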
## Best Practices & Lessons Learned

### Critical Rules for Translation

1. **NEVER skip entries**: Translate ALL entries in each batch to ensure completeness
2. **Use appropriate batch sizes**: 100 entries for systematic translation, unlimited for compact method
3. **Skip validation for placeholders**: Use `--skip-validation` when batch contains `{{variable}}` patterns
4. **Check progress between batches**: Use `--summary` flag to track completion percentage
5. **Preserve all placeholders**: Keep `{n}`, `{total}`, `{filename}`, `{{toolName}}` exactly as-is

### Workflow Comparison

| Method | Best For | Character Usage | Complexity | Speed |
|--------|----------|----------------|------------|-------|
| Compact | AI services | Minimal (50-80% less) | Simple | Fastest |
| Batch | Systematic translation | Moderate | Medium | Medium |
| Quick | Small updates | High | Low | Slow |

### Common Issues and Solutions

#### JSON Syntax Errors in AI Translations
**Problem**: AI-translated batch files have JSON syntax errors

**Symptoms**:
- `JSONDecodeError: Expecting ',' delimiter`
- `JSONDecodeError: Invalid \escape`

**Solution**:
```bash
# 1. Validate all batches to find errors
python scripts/translations/json_validator.py --all-batches ar_AR

# 2. Check detailed error with context
python scripts/translations/json_validator.py ar_AR_batch_2_of_3.json

# 3. Fix the reported issues:
# - Unescaped quotes: "text with "quotes"" → "text with \"quotes\""
# - Backslashes in regex: "\d{4}" → "\\d{4}"
# - Missing commas between entries

# 4. Validate again until all pass
python scripts/translations/json_validator.py --all-batches ar_AR
```

**Common fixes:**
- Arabic/RTL text with embedded quotes: Always escape with backslash
- Regex patterns: Double all backslashes (`\d` → `\\d`)
- Check for missing/extra commas at line reported in error

#### Validation False Positives
**Problem**: Validator flags legitimate `{{variable}}` placeholders as artifacts

**Solution**: Use `--skip-validation` flag when applying batches with template variables

#### JSON Structure Mismatches
**Problem**: Flattened dot-notation keys instead of proper nested objects

**Solution**: Use `json_beautifier.py` to restructure files to match en-GB exactly

## Real-World Examples

### Complete Arabic Translation with Validation (Batch Method)
```bash
# Check status
python scripts/translations/translation_analyzer.py --language ar-AR --summary
# Result: 50% complete, 1088 missing

# Extract in batches due to AI token limits
python scripts/translations/compact_translator.py ar-AR --output ar_AR_batch --batch-size 400
# Created: ar_AR_batch_1_of_3.json (400 entries)
#          ar_AR_batch_2_of_3.json (400 entries)
#          ar_AR_batch_3_of_3.json (288 entries)

# [Send each batch to AI for translation]

# Validate translated batches before merging
python scripts/translations/json_validator.py --all-batches ar_AR
# Found errors in batch 1 and 2:
# - Line 263: Unescaped quotes in "انقر "إضافة ملفات""
# - Line 132: Unescaped quotes in "أو "and""
# - Line 213: Invalid escape "\d{4}"

# Fix errors manually or with sed, then validate again
python scripts/translations/json_validator.py --all-batches ar_AR
# All valid!

# Merge all batches
python3 << 'EOF'
import json

merged = {}
# Load each of the three batch files and fold its entries into one dict
for i in range(1, 4):
    with open(f'ar_AR_batch_{i}_of_3.json', 'r', encoding='utf-8') as f:
        merged.update(json.load(f))

# Write the combined translations, keeping Arabic text unescaped
with open('ar_AR_merged.json', 'w', encoding='utf-8') as f:
    json.dump(merged, f, ensure_ascii=False, indent=2)
EOF

# Apply merged translations
python scripts/translations/translation_merger.py ar-AR apply-translations --translations-file ar_AR_merged.json
# Result: Applied 1088 translations

# Beautify to match en-GB structure
python scripts/translations/json_beautifier.py --language ar-AR

# Check final progress
python scripts/translations/translation_analyzer.py --language ar-AR --summary
# Result: 98.7% complete, 9 missing, 20 untranslated
```

### Complete Italian Translation (Compact Method)
```bash
# Check status
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 46.8% complete, 1147 missing

# Extract all entries for translation
python scripts/translations/compact_translator.py it-IT --output batch1.json

# [Translate batch1.json with AI, save as batch1_translated.json]

# Apply translations
python scripts/translations/translation_merger.py it-IT apply-translations --translations-file batch1_translated.json
# Result: Applied 1147 translations

# Check progress
python scripts/translations/translation_analyzer.py --language it-IT --summary
# Result: 100% complete, 0 missing
```

### German Translation (Batch Method)
Starting from 46.3% completion, reaching 60.3% with batch method:

```bash
# Initial analysis
python scripts/translations/translation_analyzer.py --language de-DE --summary
# Result: 46.3% complete, 1142 missing entries

# Batch 1 (100 entries)
python scripts/translations/ai_translation_helper.py create-batch --languages de-DE --output de_batch_1.json --max-entries 100
# [Translate all 100 entries in batch file]
python scripts/translations/ai_translation_helper.py apply-batch de_batch_1.json --skip-validation
# Progress: 46.6% → 51.2%

# Continue with more batches until 100% complete
```

## Error Handling

- **Missing Files**: Scripts create new files when language directories don't exist
- **Invalid JSON**: Clear error messages with line numbers
- **Placeholder Mismatches**: Validation warnings for missing or extra placeholders
- **Legacy [UNTRANSLATED] Markers**: Detected and stripped for backwards compatibility
- **Backup Failures**: Graceful handling with user notification

## Integration with Development

These scripts integrate with the existing translation system:
- Works with the current `frontend/public/locales/` structure
- Compatible with the i18n system used in the React frontend
- Respects the JSON format expected by the translation loader
- Maintains the nested structure required by the UI components

## Language-Specific Notes

### German Translation Notes
- Technical terms: Keep established terms unchanged (PDF → PDF, API → API)
- UI actions: "hochladen" (upload), "herunterladen" (download), "speichern" (save)
- Error messages: Consistent pattern "Ein Fehler ist beim [action] aufgetreten"
- Formal address: Use "Sie" form for user-facing text

### Italian Translation Notes
- Keep technical terms in English when commonly used (PDF, API, URL)
- Use formal address ("Lei" form) for user-facing text
- Error messages: "Si è verificato un errore durante [action]"
- UI actions: "carica" (upload), "scarica" (download), "salva" (save)

## Common Use Cases

1. **Complete Language Translation**: Use Compact Workflow for fastest AI-assisted translation
2. **New Language Addition**: Start with compact workflow for comprehensive coverage
3. **Updating Existing Language**: Use analyzer to find gaps, then compact or batch method
4. **Quality Assurance**: Use analyzer with `--summary` for completion metrics and issue detection
5. **External Translation Services**: Use export functionality to generate CSV files for translators
6. **Structure Maintenance**: Use `json_beautifier.py` to keep files aligned with en-GB structure