{languageOptions.map((option, index) => {
- // Enable languages with >90% translation completion
- const enabledLanguages = ['en-GB', 'ar-AR', 'de-DE', 'es-ES', 'fr-FR', 'it-IT', 'pt-BR', 'ru-RU', 'zh-CN'];
+ const enabledLanguages = [
+ 'en-GB', 'zh-CN', 'zh-TW', 'ar-AR', 'fa-IR', 'tr-TR', 'uk-UA', 'zh-BO', 'sl-SI',
+ 'ru-RU', 'ja-JP', 'ko-KR', 'hu-HU', 'ga-IE', 'bg-BG', 'es-ES', 'hi-IN', 'hr-HR',
+ 'el-GR', 'ml-ML', 'pt-BR', 'pl-PL', 'pt-PT', 'sk-SK', 'sr-LATN-RS', 'no-NB',
+ 'th-TH', 'vi-VN', 'az-AZ', 'eu-ES', 'de-DE', 'sv-SE', 'it-IT', 'ca-CA', 'id-ID',
+ 'ro-RO', 'fr-FR', 'nl-NL', 'da-DK', 'cs-CZ'
+ ];
const isDisabled = !enabledLanguages.includes(option.value);
return (
diff --git a/scripts/translations/README.md b/scripts/translations/README.md
index a8b91525c..a4f9f53d3 100644
--- a/scripts/translations/README.md
+++ b/scripts/translations/README.md
@@ -2,6 +2,43 @@
This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, validate, and manage translations against the en-GB golden truth file.
+## Quick Start - Automated Translation (RECOMMENDED)
+
+The **fastest and easiest way** to translate a language is to use the automated pipeline:
+
+```bash
+# Set your OpenAI API key
+export OPENAI_API_KEY=your_openai_api_key_here
+
+# Translate a language automatically (extract → translate → merge → beautify → verify)
+python3 scripts/translations/auto_translate.py es-ES
+
+# With custom batch size (default: 500 entries per batch)
+python3 scripts/translations/auto_translate.py es-ES --batch-size 600
+
+# Keep temporary files for inspection
+python3 scripts/translations/auto_translate.py es-ES --no-cleanup
+```
+
+**What it does:**
+1. Extracts untranslated entries from the language file
+2. Splits them into batches (default 500 entries each)
+3. Translates each batch using GPT-5 with specialized prompts
+4. Validates that placeholders are preserved
+5. Merges translated batches
+6. Applies translations to language file
+7. Beautifies structure to match en-GB
+8. Cleans up temporary files
+9. Reports final completion percentage
+
+**Time:** ~8-10 minutes per language with 1200+ untranslated entries
+
+**Cost:** ~$2-4 per language using GPT-5 (or use `gpt-5-mini` for lower cost)
+
+See [`auto_translate.py`](#auto_translatepy-automated-translation-pipeline) for full details.
+
+---
+
## Scripts Overview
### 0. Validation Scripts (Run First!)
@@ -191,7 +228,97 @@ python scripts/translations/compact_translator.py it-IT --output to_translate.js
- Batch size control for manageable chunks
- 50-80% fewer characters than other extraction methods
-### 5. `json_beautifier.py`
+### 5. `auto_translate.py` - Automated Translation Pipeline
+
+**NEW: Fully automated translation workflow using GPT-5.**
+
+Combines all translation steps into a single command that handles everything from extraction to verification.
+
+**Usage:**
+```bash
+# Basic usage (requires OPENAI_API_KEY environment variable)
+export OPENAI_API_KEY=your_api_key
+python3 scripts/translations/auto_translate.py es-ES
+
+# With inline API key
+python3 scripts/translations/auto_translate.py es-ES --api-key YOUR_KEY
+
+# Custom batch size (default: 500 entries)
+python3 scripts/translations/auto_translate.py es-ES --batch-size 600
+
+# Custom timeout per batch (default: 600 seconds / 10 minutes)
+python3 scripts/translations/auto_translate.py es-ES --timeout 900
+
+# Keep temporary files for debugging
+python3 scripts/translations/auto_translate.py es-ES --no-cleanup
+
+# Skip final verification
+python3 scripts/translations/auto_translate.py es-ES --skip-verification
+```
+
+**Features:**
+- Fully automated end-to-end translation pipeline
+- Uses GPT-5 with specialized prompts for Stirling PDF
+- Preserves all placeholders ({n}, {{variable}}, etc.)
+- Maintains consistent terminology
+- Validates translations automatically
+- Creates backups before modifying files
+- Reports detailed progress and the final completion percentage
+
+**Pipeline Steps:**
+1. **Extract**: Finds all untranslated entries
+2. **Split**: Divides into manageable batches (default: 500 entries)
+3. **Translate**: Uses GPT-5 to translate each batch with specialized prompts
+4. **Validate**: Ensures placeholders are preserved
+5. **Merge**: Combines all translated batches
+6. **Apply**: Updates the language file
+7. **Beautify**: Restructures to match en-GB format
+8. **Cleanup**: Removes temporary files
+9. **Verify**: Reports final completion percentage
+
+**Translation Quality:**
+- Preserves ALL placeholders exactly as-is
+- Keeps HTML tags intact
+- Doesn't translate technical terms (PDF, API, OAuth2, etc.)
+- Maintains consistent terminology throughout
+- Uses appropriate formal/informal tone per language
+
+**Supported Languages:**
+All language codes from `frontend/public/locales/` (e.g., es-ES, de-DE, fr-FR, zh-CN, ar-AR)
+
+### 6. `batch_translator.py` - GPT-5 Translation Engine
+
+Low-level translation script used by `auto_translate.py`. Can be used standalone for manual batch translation.
+
+**Usage:**
+```bash
+# Translate single batch file
+python3 scripts/translations/batch_translator.py my_batch.json --language es-ES --api-key YOUR_KEY
+
+# Translate multiple batches
+python3 scripts/translations/batch_translator.py batch_*.json --language de-DE --api-key YOUR_KEY
+
+# Use different GPT model
+python3 scripts/translations/batch_translator.py batch.json --language fr-FR --model gpt-5-mini
+
+# Skip validation
+python3 scripts/translations/batch_translator.py batch.json --language it-IT --skip-validation
+```
+
+**Features:**
+- Translates JSON batch files using OpenAI GPT-5
+- Specialized system prompts for Stirling PDF translations
+- Automatic placeholder validation
+- Supports pattern matching for multiple files
+- Configurable model selection (gpt-5, gpt-5-mini, gpt-5-nano)
+- Configurable delay between API calls (`--delay`, default 1.0 seconds) to stay under rate limits
+
+**Models:**
+- `gpt-5` (default): Best quality, $1.25/1M input, $10/1M output
+- `gpt-5-mini`: Balanced quality/cost
+- `gpt-5-nano`: Fastest, most economical
+
+### 7. `json_beautifier.py`
Restructures and beautifies translation JSON files to match en-GB structure exactly.
**Usage:**
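
Review note: the extract and split steps that the README describes can be sketched in isolation as below. This is a minimal illustration, not the code in this PR. The helper names (`flatten`, `find_untranslated`, `split_batches`) and the sample dictionaries are hypothetical. The "untranslated" rule mirrors the one in `extract_untranslated`: a key counts if it is missing, identical to the en-GB value, or prefixed with `[UNTRANSLATED]`.

```python
def flatten(d, parent=""):
    """Flatten nested dicts into dot-separated keys, keeping leaf values as-is."""
    flat = {}
    for key, value in d.items():
        full_key = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def find_untranslated(golden, lang):
    """Return golden entries that are missing, unchanged, or marked in lang."""
    golden_flat, lang_flat = flatten(golden), flatten(lang)
    untranslated = {}
    for key, value in golden_flat.items():
        current = lang_flat.get(key)
        if (
            key not in lang_flat
            or current == value
            or (isinstance(current, str) and current.startswith("[UNTRANSLATED]"))
        ):
            untranslated[key] = value
    return untranslated


def split_batches(entries, batch_size):
    """Split a flat dict into consecutive chunks of at most batch_size items."""
    items = list(entries.items())
    return [dict(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


# Hypothetical sample data, not taken from the real locale files.
golden = {"home": {"title": "Home", "upload": "Upload {n} files"}, "ok": "OK"}
spanish = {"home": {"title": "Inicio", "upload": "[UNTRANSLATED] Upload {n} files"}}

pending = find_untranslated(golden, spanish)
batches = split_batches(pending, batch_size=1)
```

One consequence of the "identical to en-GB" rule is that strings which are legitimately the same in both languages (for example "OK" or "PDF") are picked up again on every run.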
diff --git a/scripts/translations/auto_translate.py b/scripts/translations/auto_translate.py
new file mode 100644
index 000000000..505823f47
--- /dev/null
+++ b/scripts/translations/auto_translate.py
@@ -0,0 +1,324 @@
+#!/usr/bin/env python3
+"""
+Automated Translation Pipeline
+Extracts, translates, merges, and beautifies translations for a language.
+"""
+
+import json
+import sys
+import argparse
+import os
+import subprocess
+from pathlib import Path
+import time
+
+
+def run_command(cmd, description=""):
+ """Run a shell command and return success status."""
+ if description:
+ print(f"\n{'='*60}")
+ print(f"Step: {description}")
+ print(f"{'='*60}")
+
+ result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
+
+ if result.stdout:
+ print(result.stdout)
+ if result.stderr:
+ print(result.stderr, file=sys.stderr)
+
+ return result.returncode == 0
+
+
+def extract_untranslated(language_code, batch_size=500):
+ """Extract untranslated entries and split into batches."""
+ print(f"\n🔍 Extracting untranslated entries for {language_code}...")
+
+ # Load files
+ golden_path = Path(f'frontend/public/locales/en-GB/translation.json')
+ lang_path = Path(f'frontend/public/locales/{language_code}/translation.json')
+
+ if not golden_path.exists():
+ print(f"Error: Golden truth file not found: {golden_path}")
+ return None
+
+ if not lang_path.exists():
+ print(f"Error: Language file not found: {lang_path}")
+ return None
+
+ def load_json(path):
+ with open(path, 'r', encoding='utf-8') as f:
+ return json.load(f)
+
+ def flatten_dict(d, parent_key='', separator='.'):
+ items = []
+ for k, v in d.items():
+ new_key = f"{parent_key}{separator}{k}" if parent_key else k
+ if isinstance(v, dict):
+ items.extend(flatten_dict(v, new_key, separator).items())
+ else:
+ items.append((new_key, str(v)))
+ return dict(items)
+
+ golden = load_json(golden_path)
+ lang_data = load_json(lang_path)
+
+ golden_flat = flatten_dict(golden)
+ lang_flat = flatten_dict(lang_data)
+
+ # Find untranslated
+ untranslated = {}
+ for key, value in golden_flat.items():
+ if (key not in lang_flat or
+ lang_flat.get(key) == value or
+ (isinstance(lang_flat.get(key), str) and lang_flat.get(key).startswith("[UNTRANSLATED]"))):
+ untranslated[key] = value
+
+ total = len(untranslated)
+ print(f"Found {total} untranslated entries")
+
+ if total == 0:
+ print("✓ Language is already complete!")
+ return []
+
+ # Split into batches
+ entries = list(untranslated.items())
+ num_batches = (total + batch_size - 1) // batch_size
+
+ batch_files = []
+ lang_code_safe = language_code.replace('-', '_')
+
+ for i in range(num_batches):
+ start = i * batch_size
+ end = min((i + 1) * batch_size, total)
+ batch = dict(entries[start:end])
+
+ filename = f'{lang_code_safe}_batch_{i+1}_of_{num_batches}.json'
+ with open(filename, 'w', encoding='utf-8') as f:
+ json.dump(batch, f, ensure_ascii=False, separators=(',', ':'))
+
+ batch_files.append(filename)
+ print(f" Created {filename} with {len(batch)} entries")
+
+ return batch_files
+
+
+def translate_batches(batch_files, language_code, api_key, timeout=600):
+    """Translate all batch files using GPT-5."""
+    if not batch_files:
+        return []
+
+    print(f"\n🤖 Translating {len(batch_files)} batches using GPT-5...")
+    print(f"Timeout: {timeout}s ({timeout//60} minutes) per batch")
+
+    translated_files = []
+
+    # Hand the API key to the child process through its environment so it
+    # never appears in the process list or in a shell command line.
+    # batch_translator.py falls back to OPENAI_API_KEY when --api-key is absent.
+    env = {**os.environ, 'OPENAI_API_KEY': api_key}
+
+    for i, batch_file in enumerate(batch_files, 1):
+        print(f"\n[{i}/{len(batch_files)}] Translating {batch_file}...")
+
+        # An argument list (no shell) avoids quoting problems in file names
+        cmd = [sys.executable, 'scripts/translations/batch_translator.py',
+               batch_file, '--language', language_code]
+
+        # Run with timeout; a timed-out batch fails the step like any other error
+        try:
+            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, env=env)
+        except subprocess.TimeoutExpired:
+            print(f"✗ Timed out after {timeout}s translating {batch_file}")
+            return None
+
+        if result.stdout:
+            print(result.stdout)
+        if result.stderr:
+            print(result.stderr, file=sys.stderr)
+
+        if result.returncode != 0:
+            print(f"✗ Failed to translate {batch_file}")
+            return None
+
+        translated_file = batch_file.replace('.json', '_translated.json')
+        translated_files.append(translated_file)
+
+        # Small delay between batches
+        if i < len(batch_files):
+            time.sleep(1)
+
+    print(f"\n✓ All {len(batch_files)} batches translated successfully")
+    return translated_files
+
+
+def merge_translations(translated_files, language_code):
+ """Merge all translated batch files."""
+ if not translated_files:
+ return None
+
+ print(f"\n🔗 Merging {len(translated_files)} translated batches...")
+
+ merged = {}
+ for filename in translated_files:
+ if not Path(filename).exists():
+ print(f"Error: Translated file not found: {filename}")
+ return None
+
+ with open(filename, 'r', encoding='utf-8') as f:
+ merged.update(json.load(f))
+
+ lang_code_safe = language_code.replace('-', '_')
+ merged_file = f'{lang_code_safe}_merged.json'
+
+ with open(merged_file, 'w', encoding='utf-8') as f:
+ json.dump(merged, f, ensure_ascii=False, separators=(',', ':'))
+
+ print(f"✓ Merged {len(merged)} translations into {merged_file}")
+ return merged_file
+
+
+def apply_translations(merged_file, language_code):
+ """Apply merged translations to the language file."""
+ print(f"\n📝 Applying translations to {language_code}...")
+
+ cmd = f'python3 scripts/translations/translation_merger.py {language_code} apply-translations --translations-file {merged_file}'
+
+ if not run_command(cmd):
+ print(f"✗ Failed to apply translations")
+ return False
+
+ print(f"✓ Translations applied successfully")
+ return True
+
+
+def beautify_translations(language_code):
+ """Beautify translation file to match en-GB structure."""
+ print(f"\n✨ Beautifying {language_code} translation file...")
+
+ cmd = f'python3 scripts/translations/json_beautifier.py --language {language_code}'
+
+ if not run_command(cmd):
+ print(f"✗ Failed to beautify translations")
+ return False
+
+ print(f"✓ Translation file beautified")
+ return True
+
+
+def cleanup_temp_files(language_code):
+ """Remove temporary batch files."""
+ print(f"\n🧹 Cleaning up temporary files...")
+
+ lang_code_safe = language_code.replace('-', '_')
+ patterns = [
+ f'{lang_code_safe}_batch_*.json',
+ f'{lang_code_safe}_merged.json'
+ ]
+
+ import glob
+ removed = 0
+ for pattern in patterns:
+ for file in glob.glob(pattern):
+ Path(file).unlink()
+ removed += 1
+
+ print(f"✓ Removed {removed} temporary files")
+
+
+def verify_completion(language_code):
+ """Check final completion percentage."""
+ print(f"\n📊 Verifying completion...")
+
+ cmd = f'python3 scripts/translations/translation_analyzer.py --language {language_code} --summary'
+ run_command(cmd)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description='Automated translation pipeline for Stirling PDF',
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog="""
+Examples:
+ # Translate Spanish with API key in environment
+ export OPENAI_API_KEY=your_key_here
+ python3 scripts/translations/auto_translate.py es-ES
+
+ # Translate German with inline API key
+ python3 scripts/translations/auto_translate.py de-DE --api-key YOUR_KEY
+
+ # Translate Italian with custom batch size
+ python3 scripts/translations/auto_translate.py it-IT --batch-size 600
+
+ # Skip cleanup (keep temporary files for inspection)
+ python3 scripts/translations/auto_translate.py fr-FR --no-cleanup
+ """
+ )
+
+ parser.add_argument('language', help='Language code (e.g., es-ES, de-DE, zh-CN)')
+ parser.add_argument('--api-key', help='OpenAI API key (or set OPENAI_API_KEY env var)')
+ parser.add_argument('--batch-size', type=int, default=500, help='Entries per batch (default: 500)')
+ parser.add_argument('--no-cleanup', action='store_true', help='Keep temporary batch files')
+ parser.add_argument('--skip-verification', action='store_true', help='Skip final completion check')
+ parser.add_argument('--timeout', type=int, default=600, help='Timeout per batch in seconds (default: 600 = 10 minutes)')
+
+ args = parser.parse_args()
+
+ # Verify API key
+ api_key = args.api_key or os.environ.get('OPENAI_API_KEY')
+ if not api_key:
+ print("Error: OpenAI API key required. Provide via --api-key or OPENAI_API_KEY environment variable")
+ sys.exit(1)
+
+ print("="*60)
+ print(f"Automated Translation Pipeline")
+ print(f"Language: {args.language}")
+ print(f"Batch Size: {args.batch_size} entries")
+ print("="*60)
+
+ start_time = time.time()
+
+ try:
+ # Step 1: Extract and split
+ batch_files = extract_untranslated(args.language, args.batch_size)
+ if batch_files is None:
+ sys.exit(1)
+
+ if len(batch_files) == 0:
+ print("\n✓ Nothing to translate!")
+ sys.exit(0)
+
+ # Step 2: Translate all batches
+ translated_files = translate_batches(batch_files, args.language, api_key, args.timeout)
+ if translated_files is None:
+ sys.exit(1)
+
+ # Step 3: Merge translations
+ merged_file = merge_translations(translated_files, args.language)
+ if merged_file is None:
+ sys.exit(1)
+
+ # Step 4: Apply translations
+ if not apply_translations(merged_file, args.language):
+ sys.exit(1)
+
+ # Step 5: Beautify
+ if not beautify_translations(args.language):
+ sys.exit(1)
+
+ # Step 6: Cleanup
+ if not args.no_cleanup:
+ cleanup_temp_files(args.language)
+
+ # Step 7: Verify
+ if not args.skip_verification:
+ verify_completion(args.language)
+
+ elapsed = time.time() - start_time
+ print("\n" + "="*60)
+ print(f"✅ Translation pipeline completed successfully!")
+ print(f"Time elapsed: {elapsed:.1f} seconds")
+ print("="*60)
+
+ except KeyboardInterrupt:
+ print("\n\n⚠ Translation interrupted by user")
+ sys.exit(1)
+ except Exception as e:
+ print(f"\n\n✗ Error: {e}")
+ import traceback
+ traceback.print_exc()
+ sys.exit(1)
+
+
+if __name__ == "__main__":
+ main()
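
Review note: `translate_batches` puts the API key on a shell command line and lets `subprocess.TimeoutExpired` propagate to the generic handler in `main()`. The sketch below shows one possible alternative, assuming the child script reads `OPENAI_API_KEY` from its environment (as `batch_translator.py` does). The helper name `run_step` and the stand-in child commands are hypothetical and only serve to make the example runnable on its own.

```python
import os
import subprocess
import sys


def run_step(argv, timeout, extra_env=None):
    """Run a child process without a shell; return (ok, stdout)."""
    env = {**os.environ, **(extra_env or {})}
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout, env=env
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return False, ""
    return result.returncode == 0, result.stdout


# Stand-in child: reports whether it received the key through its environment.
child = [sys.executable, "-c",
         "import os; print(bool(os.environ.get('OPENAI_API_KEY')))"]
ok, out = run_step(child, timeout=30, extra_env={"OPENAI_API_KEY": "placeholder-key"})

# A child that overruns its budget is reported as a failure instead of raising.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
timed_out_ok, _ = run_step(slow, timeout=0.5)
```

Passing an argument list instead of a formatted string also removes the need to quote batch file names or language codes by hand.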
diff --git a/scripts/translations/batch_translator.py b/scripts/translations/batch_translator.py
new file mode 100644
index 000000000..aeb99a074
--- /dev/null
+++ b/scripts/translations/batch_translator.py
@@ -0,0 +1,321 @@
+#!/usr/bin/env python3
+"""
+Batch Translation Script using OpenAI API
+Automatically translates JSON batch files to target language while preserving:
+- Placeholders: {n}, {total}, {filename}, {{variable}}
+- HTML tags
+- Technical terms: PDF, API, OAuth2, SAML2, JWT, etc.
+"""
+
+import json
+import sys
+import argparse
+from pathlib import Path
+import time
+
+try:
+ from openai import OpenAI
+except ImportError:
+ print("Error: openai package not installed. Install with: pip install openai")
+ sys.exit(1)
+
+
+class BatchTranslator:
+ def __init__(self, api_key: str, model: str = "gpt-5"):
+ """Initialize translator with OpenAI API key."""
+ self.client = OpenAI(api_key=api_key)
+ self.model = model
+
+ def get_translation_prompt(self, language_name: str, language_code: str) -> str:
+ """Generate the system prompt for translation."""
+ return f"""You are a professional translator for Stirling PDF, an open-source PDF manipulation tool.
+
+Translate the following JSON from English to {language_name} ({language_code}) for the Stirling PDF user interface.
+
+CRITICAL RULES - MUST FOLLOW EXACTLY:
+
+1. PRESERVE ALL PLACEHOLDERS EXACTLY AS-IS:
+   - Single braces: {{n}}, {{total}}, {{filename}}, {{count}}, {{date}}, {{planName}}, {{toolName}}, {{variable}}
+   - Double braces: {{{{variable}}}}
+ - Never translate, modify, or remove these - they are template variables
+
+2. KEEP ALL HTML TAGS INTACT:
+   - Leave every HTML tag exactly as it appears in the source text
+ - Do not translate tag names, only text between tags
+
+3. DO NOT TRANSLATE TECHNICAL TERMS:
+ - File formats: PDF, JSON, CSV, XML, HTML, ZIP, DOCX, XLSX, PNG, JPG
+ - Protocols: API, OAuth2, SAML2, JWT, SMTP, HTTP, HTTPS, SSL, TLS
+ - Technologies: Git, GitHub, Google, PostHog, Scarf, LibreOffice, Ghostscript, Tesseract, OCR
+ - Technical keywords: URL, URI, DPI, RGB, CMYK, QR
+ - "Stirling PDF" - always keep as-is
+
+4. MAINTAIN CONSISTENT TERMINOLOGY:
+ - Use the SAME translation for repeated terms throughout
+ - Do not introduce new terminology or synonyms
+ - Keep UI action words consistent (e.g., "upload", "download", "compress")
+
+5. PRESERVE SPECIAL KEYWORDS IN CONTEXT:
+ - Mathematical expressions: "2n", "2n-1", "3n" (in page selection)
+ - Special keywords: "all", "odd", "even" (in page contexts)
+ - Code examples and technical patterns
+
+6. JSON STRUCTURE:
+ - Translate ONLY the values (text after :), NEVER the keys
+ - Return ONLY valid JSON with exact same structure
+ - Maintain all quotes, commas, and braces
+
+7. TONE & STYLE:
+ - Use appropriate formal/informal tone for {language_name} UI
+ - Keep translations concise and user-friendly
+ - Maintain the professional but accessible tone of the original
+
+8. DO NOT ADD OR REMOVE TEXT:
+ - Do not add explanations, comments, or extra text
+ - Do not remove any part of the original meaning
+ - Keep the same level of detail
+
+Return ONLY the translated JSON. No markdown, no explanations, just the JSON object."""
+
+ def translate_batch(self, batch_data: dict, target_language: str, language_code: str) -> dict:
+ """Translate a batch file using OpenAI API."""
+ # Convert batch to compact JSON for API
+ input_json = json.dumps(batch_data, ensure_ascii=False, separators=(',', ':'))
+
+ print(f"Translating {len(batch_data)} entries to {target_language}...")
+ print(f"Input size: {len(input_json)} characters")
+
+ try:
+ # GPT-5 only supports temperature=1, so we don't include it
+ response = self.client.chat.completions.create(
+ model=self.model,
+ messages=[
+ {
+ "role": "system",
+ "content": self.get_translation_prompt(target_language, language_code)
+ },
+ {
+ "role": "user",
+ "content": f"Translate this JSON:\n\n{input_json}"
+ }
+ ],
+ )
+
+ translated_text = response.choices[0].message.content.strip()
+
+ # Remove markdown code blocks if present
+ if translated_text.startswith("```"):
+ lines = translated_text.split('\n')
+ translated_text = '\n'.join(lines[1:-1])
+
+ # Parse the translated JSON
+ translated_data = json.loads(translated_text)
+
+ print(f"✓ Translation complete")
+ return translated_data
+
+ except json.JSONDecodeError as e:
+ print(f"Error: AI returned invalid JSON: {e}")
+ print(f"Response: {translated_text[:500]}...")
+ raise
+ except Exception as e:
+ print(f"Error during translation: {e}")
+ raise
+
+    def validate_translation(self, original: dict, translated: dict) -> bool:
+        """Validate that translation preserved all placeholders and structure."""
+        issues = []
+
+        # Check that all keys are present
+        if set(original.keys()) != set(translated.keys()):
+            missing = set(original.keys()) - set(translated.keys())
+            extra = set(translated.keys()) - set(original.keys())
+            if missing:
+                issues.append(f"Missing keys: {missing}")
+            if extra:
+                issues.append(f"Extra keys: {extra}")
+
+        # Check placeholders in each value. The double-brace alternative comes
+        # first; otherwise the single-brace branch would consume "{{variable}"
+        # and the double-brace branch could never match.
+        import re
+        placeholder_pattern = r'\{\{[^}]+\}\}|\{[^}]+\}'
+
+        for key in original.keys():
+            if key not in translated:
+                continue
+
+            orig_value = str(original[key])
+            trans_value = str(translated[key])
+
+            # Find all placeholders in both the original and the translation
+            orig_placeholders = set(re.findall(placeholder_pattern, orig_value))
+            trans_placeholders = set(re.findall(placeholder_pattern, trans_value))
+
+            if orig_placeholders != trans_placeholders:
+                issues.append(f"Placeholder mismatch in '{key}': {orig_placeholders} vs {trans_placeholders}")
+
+        if issues:
+            print("\n⚠ Validation warnings:")
+            for issue in issues[:10]:  # Show first 10 issues
+                print(f"  - {issue}")
+            if len(issues) > 10:
+                print(f"  ... and {len(issues) - 10} more issues")
+            return False
+
+        print("✓ Validation passed")
+        return True
+
+
+def get_language_info(language_code: str) -> tuple:
+ """Get full language name from code."""
+ languages = {
+ 'zh-CN': ('Simplified Chinese', 'zh-CN'),
+ 'es-ES': ('Spanish', 'es-ES'),
+ 'it-IT': ('Italian', 'it-IT'),
+ 'de-DE': ('German', 'de-DE'),
+ 'ar-AR': ('Arabic', 'ar-AR'),
+ 'pt-BR': ('Brazilian Portuguese', 'pt-BR'),
+ 'ru-RU': ('Russian', 'ru-RU'),
+ 'fr-FR': ('French', 'fr-FR'),
+ 'ja-JP': ('Japanese', 'ja-JP'),
+ 'ko-KR': ('Korean', 'ko-KR'),
+ 'nl-NL': ('Dutch', 'nl-NL'),
+ 'pl-PL': ('Polish', 'pl-PL'),
+ 'sv-SE': ('Swedish', 'sv-SE'),
+ 'da-DK': ('Danish', 'da-DK'),
+ 'no-NB': ('Norwegian', 'no-NB'),
+ 'fi-FI': ('Finnish', 'fi-FI'),
+ 'tr-TR': ('Turkish', 'tr-TR'),
+ 'vi-VN': ('Vietnamese', 'vi-VN'),
+ 'th-TH': ('Thai', 'th-TH'),
+ 'id-ID': ('Indonesian', 'id-ID'),
+ 'hi-IN': ('Hindi', 'hi-IN'),
+ 'cs-CZ': ('Czech', 'cs-CZ'),
+ 'hu-HU': ('Hungarian', 'hu-HU'),
+ 'ro-RO': ('Romanian', 'ro-RO'),
+ 'uk-UA': ('Ukrainian', 'uk-UA'),
+ 'el-GR': ('Greek', 'el-GR'),
+ 'bg-BG': ('Bulgarian', 'bg-BG'),
+ 'hr-HR': ('Croatian', 'hr-HR'),
+ 'sk-SK': ('Slovak', 'sk-SK'),
+ 'sl-SI': ('Slovenian', 'sl-SI'),
+ 'ca-CA': ('Catalan', 'ca-CA'),
+ }
+
+ return languages.get(language_code, (language_code, language_code))
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description='Translate JSON batch files using OpenAI API',
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog="""
+Examples:
+ # Translate single batch file
+ python batch_translator.py zh_CN_batch_1_of_4.json --api-key YOUR_KEY --language zh-CN
+
+ # Translate all batches for a language (with pattern)
+ python batch_translator.py "zh_CN_batch_*_of_*.json" --api-key YOUR_KEY --language zh-CN
+
+ # Use environment variable for API key
+ export OPENAI_API_KEY=your_key_here
+ python batch_translator.py zh_CN_batch_1_of_4.json --language zh-CN
+
+ # Use different model
+ python batch_translator.py file.json --api-key KEY --language es-ES --model gpt-5-mini
+ """
+ )
+
+ parser.add_argument('input_files', nargs='+', help='Input batch JSON file(s) or pattern')
+ parser.add_argument('--api-key', help='OpenAI API key (or set OPENAI_API_KEY env var)')
+ parser.add_argument('--language', '-l', required=True, help='Target language code (e.g., zh-CN, es-ES)')
+ parser.add_argument('--model', default='gpt-5', help='OpenAI model to use (default: gpt-5, options: gpt-5-mini, gpt-5-nano)')
+ parser.add_argument('--output-suffix', default='_translated', help='Suffix for output files (default: _translated)')
+ parser.add_argument('--skip-validation', action='store_true', help='Skip validation checks')
+ parser.add_argument('--delay', type=float, default=1.0, help='Delay between API calls in seconds (default: 1.0)')
+
+ args = parser.parse_args()
+
+ # Get API key from args or environment
+ import os
+ api_key = args.api_key or os.environ.get('OPENAI_API_KEY')
+ if not api_key:
+ print("Error: OpenAI API key required. Provide via --api-key or OPENAI_API_KEY environment variable")
+ sys.exit(1)
+
+ # Get language info
+ language_name, language_code = get_language_info(args.language)
+
+ # Expand file patterns
+ import glob
+ input_files = []
+ for pattern in args.input_files:
+ matched = glob.glob(pattern)
+ if matched:
+ input_files.extend(matched)
+ else:
+ input_files.append(pattern) # Use as literal filename
+
+ if not input_files:
+ print("Error: No input files found")
+ sys.exit(1)
+
+ print(f"Batch Translator")
+ print(f"Target Language: {language_name} ({language_code})")
+ print(f"Model: {args.model}")
+ print(f"Files to translate: {len(input_files)}")
+ print("=" * 60)
+
+ # Initialize translator
+ translator = BatchTranslator(api_key, args.model)
+
+ # Process each file
+ successful = 0
+ failed = 0
+
+ for i, input_file in enumerate(input_files, 1):
+ print(f"\n[{i}/{len(input_files)}] Processing: {input_file}")
+
+ try:
+ # Load input file
+ with open(input_file, 'r', encoding='utf-8') as f:
+ batch_data = json.load(f)
+
+ # Translate
+ translated_data = translator.translate_batch(batch_data, language_name, language_code)
+
+ # Validate
+ if not args.skip_validation:
+ translator.validate_translation(batch_data, translated_data)
+
+ # Save output
+ input_path = Path(input_file)
+ output_file = input_path.stem + args.output_suffix + input_path.suffix
+
+ with open(output_file, 'w', encoding='utf-8') as f:
+ json.dump(translated_data, f, ensure_ascii=False, separators=(',', ':'))
+
+ print(f"✓ Saved to: {output_file}")
+ successful += 1
+
+ # Delay between API calls to avoid rate limits
+ if i < len(input_files):
+ time.sleep(args.delay)
+
+ except Exception as e:
+ print(f"✗ Failed: {e}")
+ failed += 1
+ continue
+
+ # Summary
+ print("\n" + "=" * 60)
+ print(f"Translation complete!")
+ print(f"Successful: {successful}/{len(input_files)}")
+ if failed > 0:
+ print(f"Failed: {failed}/{len(input_files)}")
+
+ sys.exit(0 if failed == 0 else 1)
+
+
+if __name__ == "__main__":
+ import os
+ main()
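
Review note: the placeholder check in `validate_translation` compares sets, so a translation that drops one placeholder and repeats another still passes. The sketch below is one possible stricter variant, not the PR's implementation: it counts occurrences with `collections.Counter` and tries the double-brace pattern first. The names `PLACEHOLDER`, `placeholders`, and `placeholder_issues`, and the sample strings, are hypothetical.

```python
import re
from collections import Counter

# Double-brace alternative first, so {{name}} is matched as one token.
PLACEHOLDER = re.compile(r"\{\{[^{}]+\}\}|\{[^{}]+\}")


def placeholders(text):
    """Count each placeholder token found in text."""
    return Counter(PLACEHOLDER.findall(text))


def placeholder_issues(original, translated):
    """List keys that are missing or whose placeholder counts differ."""
    issues = []
    for key, source in original.items():
        if key not in translated:
            issues.append(f"missing key: {key}")
            continue
        if placeholders(str(source)) != placeholders(str(translated[key])):
            issues.append(f"placeholder mismatch: {key}")
    return issues


# Hypothetical sample strings for illustration only.
original = {"a": "Page {n} of {total}", "b": "Hello {{name}}", "c": "Done"}
good = {"a": "Página {n} de {total}", "b": "Hola {{name}}", "c": "Hecho"}
bad = {"a": "Página {n} de {n}", "b": "Hola {name}"}

good_issues = placeholder_issues(original, good)
bad_issues = placeholder_issues(original, bad)
```

In this sketch, `bad` is flagged three times: a repeated `{n}` replacing `{total}`, a double-brace placeholder downgraded to single braces, and a missing key.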