translations (#4906)

2026-04-22 23:08:53 +02:00 · 2025-11-18 12:23:11 +00:00
parent a7fc36586a
commit 15b8447626
44 changed files with 91001 additions and 91522 deletions
--- a/scripts/translations/README.md
+++ b/scripts/translations/README.md
@@ -2,6 +2,43 @@

 This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, validate, and manage translations against the en-GB golden truth file.

+## Quick Start - Automated Translation (RECOMMENDED)
+
+The **fastest and easiest way** to translate a language is using the automated pipeline:
+
+```bash
+# Set your OpenAI API key
+export OPENAI_API_KEY=your_openai_api_key_here
+
+# Translate a language automatically (extract → translate → merge → beautify → verify)
+python3 scripts/translations/auto_translate.py es-ES
+
+# With custom batch size (default: 500 entries per batch)
+python3 scripts/translations/auto_translate.py es-ES --batch-size 600
+
+# Keep temporary files for inspection
+python3 scripts/translations/auto_translate.py es-ES --no-cleanup
+```
+
+**What it does:**
+1. Extracts untranslated entries from the language file
+2. Splits into batches (default 500 entries each)
+3. Translates each batch using GPT-5 with specialized prompts
+4. Validates placeholders are preserved
+5. Merges translated batches
+6. Applies translations to language file
+7. Beautifies structure to match en-GB
+8. Cleans up temporary files
+9. Reports final completion percentage
+
+**Time:** ~8-10 minutes per language with 1200+ untranslated entries
+
+**Cost:** ~$2-4 per language using GPT-5. `auto_translate.py` does not expose a model option; for lower cost, run `batch_translator.py` directly with `--model gpt-5-mini`.
+
+See [`auto_translate.py`](#auto_translatepy-automated-translation-pipeline) for full details.
+
+---
+
 ## Scripts Overview

 ### 0. Validation Scripts (Run First!)
@@ -191,7 +228,97 @@ python scripts/translations/compact_translator.py it-IT --output to_translate.js
 - Batch size control for manageable chunks
 - 50-80% fewer characters than other extraction methods

-### 5. `json_beautifier.py`
+### 5. `auto_translate.py` - Automated Translation Pipeline
+
+**NEW: Fully automated translation workflow using GPT-5.**
+
+Combines all translation steps into a single command that handles everything from extraction to verification.
+
+**Usage:**
+```bash
+# Basic usage (requires OPENAI_API_KEY environment variable)
+export OPENAI_API_KEY=your_api_key
+python3 scripts/translations/auto_translate.py es-ES
+
+# With inline API key
+python3 scripts/translations/auto_translate.py es-ES --api-key YOUR_KEY
+
+# Custom batch size (default: 500 entries)
+python3 scripts/translations/auto_translate.py es-ES --batch-size 600
+
+# Custom timeout per batch (default: 600 seconds / 10 minutes)
+python3 scripts/translations/auto_translate.py es-ES --timeout 900
+
+# Keep temporary files for debugging
+python3 scripts/translations/auto_translate.py es-ES --no-cleanup
+
+# Skip final verification
+python3 scripts/translations/auto_translate.py es-ES --skip-verification
+```
+
+**Features:**
+- Fully automated end-to-end translation pipeline
+- Uses GPT-5 with specialized prompts for Stirling PDF
+- Preserves all placeholders (`{n}`, `{{variable}}`, etc.)
+- Maintains consistent terminology
+- Validates translations automatically
+- Creates backups before modifying files
+- Reports detailed progress and final completion %
+
+**Pipeline Steps:**
+1. **Extract**: Finds all untranslated entries
+2. **Split**: Divides into manageable batches (default: 500 entries)
+3. **Translate**: Uses GPT-5 to translate each batch with specialized prompts
+4. **Validate**: Ensures placeholders are preserved
+5. **Merge**: Combines all translated batches
+6. **Apply**: Updates the language file
+7. **Beautify**: Restructures to match en-GB format
+8. **Cleanup**: Removes temporary files
+9. **Verify**: Reports final completion percentage
+
+**Translation Quality:**
+- Preserves ALL placeholders exactly as-is
+- Keeps HTML tags intact (`<strong>`, `<br>`, etc.)
+- Doesn't translate technical terms (PDF, API, OAuth2, etc.)
+- Maintains consistent terminology throughout
+- Uses appropriate formal/informal tone per language
+
+**Supported Languages:**
+All language codes from `frontend/public/locales/` (e.g., es-ES, de-DE, fr-FR, zh-CN, ar-AR, etc.)
+
+### 6. `batch_translator.py` - GPT-5 Translation Engine
+
+Low-level translation script used by `auto_translate.py`. Can be used standalone for manual batch translation.
+
+**Usage:**
+```bash
+# Translate single batch file
+python3 scripts/translations/batch_translator.py my_batch.json --language es-ES --api-key YOUR_KEY
+
+# Translate multiple batches
+python3 scripts/translations/batch_translator.py batch_*.json --language de-DE --api-key YOUR_KEY
+
+# Use different GPT model
+python3 scripts/translations/batch_translator.py batch.json --language fr-FR --model gpt-5-mini
+
+# Skip validation
+python3 scripts/translations/batch_translator.py batch.json --language it-IT --skip-validation
+```
+
+**Features:**
+- Translates JSON batch files using OpenAI GPT-5
+- Specialized system prompts for Stirling PDF translations
+- Automatic placeholder validation
+- Supports pattern matching for multiple files
+- Configurable model selection (gpt-5, gpt-5-mini, gpt-5-nano)
+- Rate limiting with configurable delays
+
+**Models:**
+- `gpt-5` (default): Best quality, $1.25/1M input, $10/1M output
+- `gpt-5-mini`: Balanced quality/cost
+- `gpt-5-nano`: Fastest, most economical
+
+### 7. `json_beautifier.py`
 Restructures and beautifies translation JSON files to match en-GB structure exactly.

 **Usage:**
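
The split step described in the `auto_translate.py` README section above (default 500 entries per batch) can be illustrated with a small standalone sketch. This is not code from the diff: `split_into_batches` is a hypothetical helper name, and the 1200-entry input is made up to mirror the "1200+ untranslated entries" figure quoted in the README.

```python
def split_into_batches(entries, batch_size=500):
    """Split a list of (key, value) pairs into consecutive dict batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Step through the list in strides of batch_size; the last slice may be shorter.
    return [dict(entries[i:i + batch_size])
            for i in range(0, len(entries), batch_size)]


# Hypothetical input: 1200 flattened translation entries.
entries = [(f"key.{i}", f"value {i}") for i in range(1200)]
batches = split_into_batches(entries)
print([len(b) for b in batches])  # [500, 500, 200]
```

With the default batch size this yields three API calls, so raising `--batch-size` to 600 would cut it to two, at the price of larger requests per call.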
--- a/scripts/translations/auto_translate.py
+++ b/scripts/translations/auto_translate.py
@@ -0,0 +1,324 @@
+#!/usr/bin/env python3
+"""
+Automated Translation Pipeline
+Extracts, translates, merges, and beautifies translations for a language.
+"""
+
+import json
+import sys
+import argparse
+import os
+import subprocess
+from pathlib import Path
+import time
+
+
+def run_command(cmd, description=""):
+    """Run a shell command and return success status."""
+    if description:
+        print(f"\n{'='*60}")
+        print(f"Step: {description}")
+        print(f"{'='*60}")
+
+    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
+
+    if result.stdout:
+        print(result.stdout)
+    if result.stderr:
+        print(result.stderr, file=sys.stderr)
+
+    return result.returncode == 0
+
+
+def extract_untranslated(language_code, batch_size=500):
+    """Extract untranslated entries and split into batches."""
+    print(f"\n🔍 Extracting untranslated entries for {language_code}...")
+
+    # Load files
+    golden_path = Path('frontend/public/locales/en-GB/translation.json')
+    lang_path = Path(f'frontend/public/locales/{language_code}/translation.json')
+
+    if not golden_path.exists():
+        print(f"Error: Golden truth file not found: {golden_path}")
+        return None
+
+    if not lang_path.exists():
+        print(f"Error: Language file not found: {lang_path}")
+        return None
+
+    def load_json(path):
+        with open(path, 'r', encoding='utf-8') as f:
+            return json.load(f)
+
+    def flatten_dict(d, parent_key='', separator='.'):
+        items = []
+        for k, v in d.items():
+            new_key = f"{parent_key}{separator}{k}" if parent_key else k
+            if isinstance(v, dict):
+                items.extend(flatten_dict(v, new_key, separator).items())
+            else:
+                items.append((new_key, str(v)))
+        return dict(items)
+
+    golden = load_json(golden_path)
+    lang_data = load_json(lang_path)
+
+    golden_flat = flatten_dict(golden)
+    lang_flat = flatten_dict(lang_data)
+
+    # Find untranslated
+    untranslated = {}
+    for key, value in golden_flat.items():
+        current = lang_flat.get(key)  # flatten_dict stringifies values, so this is a str or None
+        if (current is None or
+            current == value or
+            current.startswith("[UNTRANSLATED]")):
+            untranslated[key] = value
+
+    total = len(untranslated)
+    print(f"Found {total} untranslated entries")
+
+    if total == 0:
+        print("✓ Language is already complete!")
+        return []
+
+    # Split into batches
+    entries = list(untranslated.items())
+    num_batches = (total + batch_size - 1) // batch_size
+
+    batch_files = []
+    lang_code_safe = language_code.replace('-', '_')
+
+    for i in range(num_batches):
+        start = i * batch_size
+        end = min((i + 1) * batch_size, total)
+        batch = dict(entries[start:end])
+
+        filename = f'{lang_code_safe}_batch_{i+1}_of_{num_batches}.json'
+        with open(filename, 'w', encoding='utf-8') as f:
+            json.dump(batch, f, ensure_ascii=False, separators=(',', ':'))
+
+        batch_files.append(filename)
+        print(f"  Created {filename} with {len(batch)} entries")
+
+    return batch_files
+
+
+def translate_batches(batch_files, language_code, api_key, timeout=600):
+    """Translate all batch files using GPT-5."""
+    if not batch_files:
+        return []
+
+    print(f"\n🤖 Translating {len(batch_files)} batches using GPT-5...")
+    print(f"Timeout: {timeout}s ({timeout//60} minutes) per batch")
+
+    translated_files = []
+
+    for i, batch_file in enumerate(batch_files, 1):
+        print(f"\n[{i}/{len(batch_files)}] Translating {batch_file}...")
+
+        # Pass an argument list (no shell) so the filename and API key are never shell-parsed
+        cmd = ['python3', 'scripts/translations/batch_translator.py', batch_file,
+               '--language', language_code, '--api-key', api_key]
+
+        # Run with timeout
+        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
+
+        if result.stdout:
+            print(result.stdout)
+        if result.stderr:
+            print(result.stderr, file=sys.stderr)
+
+        if result.returncode != 0:
+            print(f"✗ Failed to translate {batch_file}")
+            return None
+
+        translated_file = batch_file.replace('.json', '_translated.json')
+        translated_files.append(translated_file)
+
+        # Small delay between batches
+        if i < len(batch_files):
+            time.sleep(1)
+
+    print(f"\n✓ All {len(batch_files)} batches translated successfully")
+    return translated_files
+
+
+def merge_translations(translated_files, language_code):
+    """Merge all translated batch files."""
+    if not translated_files:
+        return None
+
+    print(f"\n🔗 Merging {len(translated_files)} translated batches...")
+
+    merged = {}
+    for filename in translated_files:
+        if not Path(filename).exists():
+            print(f"Error: Translated file not found: {filename}")
+            return None
+
+        with open(filename, 'r', encoding='utf-8') as f:
+            merged.update(json.load(f))
+
+    lang_code_safe = language_code.replace('-', '_')
+    merged_file = f'{lang_code_safe}_merged.json'
+
+    with open(merged_file, 'w', encoding='utf-8') as f:
+        json.dump(merged, f, ensure_ascii=False, separators=(',', ':'))
+
+    print(f"✓ Merged {len(merged)} translations into {merged_file}")
+    return merged_file
+
+
+def apply_translations(merged_file, language_code):
+    """Apply merged translations to the language file."""
+    print(f"\n📝 Applying translations to {language_code}...")
+
+    cmd = f'python3 scripts/translations/translation_merger.py {language_code} apply-translations --translations-file {merged_file}'
+
+    if not run_command(cmd):
+        print(f"✗ Failed to apply translations")
+        return False
+
+    print(f"✓ Translations applied successfully")
+    return True
+
+
+def beautify_translations(language_code):
+    """Beautify translation file to match en-GB structure."""
+    print(f"\n✨ Beautifying {language_code} translation file...")
+
+    cmd = f'python3 scripts/translations/json_beautifier.py --language {language_code}'
+
+    if not run_command(cmd):
+        print(f"✗ Failed to beautify translations")
+        return False
+
+    print(f"✓ Translation file beautified")
+    return True
+
+
+def cleanup_temp_files(language_code):
+    """Remove temporary batch files."""
+    print(f"\n🧹 Cleaning up temporary files...")
+
+    lang_code_safe = language_code.replace('-', '_')
+    patterns = [
+        f'{lang_code_safe}_batch_*.json',
+        f'{lang_code_safe}_merged.json'
+    ]
+
+    import glob
+    removed = 0
+    for pattern in patterns:
+        for file in glob.glob(pattern):
+            Path(file).unlink()
+            removed += 1
+
+    print(f"✓ Removed {removed} temporary files")
+
+
+def verify_completion(language_code):
+    """Check final completion percentage."""
+    print(f"\n📊 Verifying completion...")
+
+    cmd = f'python3 scripts/translations/translation_analyzer.py --language {language_code} --summary'
+    run_command(cmd)
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description='Automated translation pipeline for Stirling PDF',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Translate Spanish with API key in environment
+  export OPENAI_API_KEY=your_key_here
+  python3 scripts/translations/auto_translate.py es-ES
+
+  # Translate German with inline API key
+  python3 scripts/translations/auto_translate.py de-DE --api-key YOUR_KEY
+
+  # Translate Italian with custom batch size
+  python3 scripts/translations/auto_translate.py it-IT --batch-size 600
+
+  # Skip cleanup (keep temporary files for inspection)
+  python3 scripts/translations/auto_translate.py fr-FR --no-cleanup
+        """
+    )
+
+    parser.add_argument('language', help='Language code (e.g., es-ES, de-DE, zh-CN)')
+    parser.add_argument('--api-key', help='OpenAI API key (or set OPENAI_API_KEY env var)')
+    parser.add_argument('--batch-size', type=int, default=500, help='Entries per batch (default: 500)')
+    parser.add_argument('--no-cleanup', action='store_true', help='Keep temporary batch files')
+    parser.add_argument('--skip-verification', action='store_true', help='Skip final completion check')
+    parser.add_argument('--timeout', type=int, default=600, help='Timeout per batch in seconds (default: 600 = 10 minutes)')
+
+    args = parser.parse_args()
+
+    # Verify API key
+    api_key = args.api_key or os.environ.get('OPENAI_API_KEY')
+    if not api_key:
+        print("Error: OpenAI API key required. Provide via --api-key or OPENAI_API_KEY environment variable")
+        sys.exit(1)
+
+    print("="*60)
+    print(f"Automated Translation Pipeline")
+    print(f"Language: {args.language}")
+    print(f"Batch Size: {args.batch_size} entries")
+    print("="*60)
+
+    start_time = time.time()
+
+    try:
+        # Step 1: Extract and split
+        batch_files = extract_untranslated(args.language, args.batch_size)
+        if batch_files is None:
+            sys.exit(1)
+
+        if len(batch_files) == 0:
+            print("\n✓ Nothing to translate!")
+            sys.exit(0)
+
+        # Step 2: Translate all batches
+        translated_files = translate_batches(batch_files, args.language, api_key, args.timeout)
+        if translated_files is None:
+            sys.exit(1)
+
+        # Step 3: Merge translations
+        merged_file = merge_translations(translated_files, args.language)
+        if merged_file is None:
+            sys.exit(1)
+
+        # Step 4: Apply translations
+        if not apply_translations(merged_file, args.language):
+            sys.exit(1)
+
+        # Step 5: Beautify
+        if not beautify_translations(args.language):
+            sys.exit(1)
+
+        # Step 6: Cleanup
+        if not args.no_cleanup:
+            cleanup_temp_files(args.language)
+
+        # Step 7: Verify
+        if not args.skip_verification:
+            verify_completion(args.language)
+
+        elapsed = time.time() - start_time
+        print("\n" + "="*60)
+        print(f"✅ Translation pipeline completed successfully!")
+        print(f"Time elapsed: {elapsed:.1f} seconds")
+        print("="*60)
+
+    except KeyboardInterrupt:
+        print("\n\n⚠ Translation interrupted by user")
+        sys.exit(1)
+    except Exception as e:
+        print(f"\n\n✗ Error: {e}")
+        import traceback
+        traceback.print_exc()
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
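
The extraction rule in `extract_untranslated` above (a key counts as untranslated when it is missing, identical to the en-GB value, or prefixed with `[UNTRANSLATED]`) may be easier to follow on a tiny example. The following is a minimal sketch, not the script itself: `flatten` and `find_untranslated` are illustrative names, and the sample dictionaries are invented.

```python
def flatten(d, parent=""):
    """Flatten nested dicts into dot-separated keys with string values."""
    flat = {}
    for k, v in d.items():
        key = f"{parent}.{k}" if parent else k
        if isinstance(v, dict):
            flat.update(flatten(v, key))
        else:
            flat[key] = str(v)
    return flat


def find_untranslated(golden, lang):
    """Return golden entries that are missing, unchanged, or marked in lang."""
    g, t = flatten(golden), flatten(lang)
    return {
        k: v for k, v in g.items()
        if k not in t or t[k] == v or t[k].startswith("[UNTRANSLATED]")
    }


# Invented sample data for illustration only.
golden = {"home": {"title": "Merge", "desc": "Combine files"}, "ok": "OK", "save": "Save"}
spanish = {"home": {"title": "Unir", "desc": "[UNTRANSLATED] Combine files"}, "ok": "OK"}

print(find_untranslated(golden, spanish))
# {'home.desc': 'Combine files', 'ok': 'OK', 'save': 'Save'}
```

Note that `ok` is reported even though "OK" may be the correct translation: the equality check cannot distinguish an untouched string from one that is legitimately the same in both languages, so such entries would be re-sent on every run.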
--- a/scripts/translations/batch_translator.py
+++ b/scripts/translations/batch_translator.py
@@ -0,0 +1,321 @@
+#!/usr/bin/env python3
+"""
+Batch Translation Script using OpenAI API
+Automatically translates JSON batch files to target language while preserving:
+- Placeholders: {n}, {total}, {filename}, {{variable}}
+- HTML tags: <strong>, </strong>, etc.
+- Technical terms: PDF, API, OAuth2, SAML2, JWT, etc.
+"""
+
+import json
+import sys
+import argparse
+from pathlib import Path
+import time
+
+try:
+    from openai import OpenAI
+except ImportError:
+    print("Error: openai package not installed. Install with: pip install openai")
+    sys.exit(1)
+
+
+class BatchTranslator:
+    def __init__(self, api_key: str, model: str = "gpt-5"):
+        """Initialize translator with OpenAI API key."""
+        self.client = OpenAI(api_key=api_key)
+        self.model = model
+
+    def get_translation_prompt(self, language_name: str, language_code: str) -> str:
+        """Generate the system prompt for translation."""
+        return f"""You are a professional translator for Stirling PDF, an open-source PDF manipulation tool.
+
+Translate the following JSON from English to {language_name} ({language_code}) for the Stirling PDF user interface.
+
+CRITICAL RULES - MUST FOLLOW EXACTLY:
+
+1. PRESERVE ALL PLACEHOLDERS EXACTLY AS-IS:
+   - Single braces: {{n}}, {{total}}, {{filename}}, {{count}}, {{date}}, {{planName}}, {{toolName}}, {{variable}}
+   - Double braces: {{{{variable}}}}
+   - Never translate, modify, or remove these - they are template variables
+
+2. KEEP ALL HTML TAGS INTACT:
+   - <strong>, </strong>, <br>, <code>, </code>, etc.
+   - Do not translate tag names, only text between tags
+
+3. DO NOT TRANSLATE TECHNICAL TERMS:
+   - File formats: PDF, JSON, CSV, XML, HTML, ZIP, DOCX, XLSX, PNG, JPG
+   - Protocols: API, OAuth2, SAML2, JWT, SMTP, HTTP, HTTPS, SSL, TLS
+   - Technologies: Git, GitHub, Google, PostHog, Scarf, LibreOffice, Ghostscript, Tesseract, OCR
+   - Technical keywords: URL, URI, DPI, RGB, CMYK, QR
+   - "Stirling PDF" - always keep as-is
+
+4. MAINTAIN CONSISTENT TERMINOLOGY:
+   - Use the SAME translation for repeated terms throughout
+   - Do not introduce new terminology or synonyms
+   - Keep UI action words consistent (e.g., "upload", "download", "compress")
+
+5. PRESERVE SPECIAL KEYWORDS IN CONTEXT:
+   - Mathematical expressions: "2n", "2n-1", "3n" (in page selection)
+   - Special keywords: "all", "odd", "even" (in page contexts)
+   - Code examples and technical patterns
+
+6. JSON STRUCTURE:
+   - Translate ONLY the values (text after :), NEVER the keys
+   - Return ONLY valid JSON with exact same structure
+   - Maintain all quotes, commas, and braces
+
+7. TONE & STYLE:
+   - Use appropriate formal/informal tone for {language_name} UI
+   - Keep translations concise and user-friendly
+   - Maintain the professional but accessible tone of the original
+
+8. DO NOT ADD OR REMOVE TEXT:
+   - Do not add explanations, comments, or extra text
+   - Do not remove any part of the original meaning
+   - Keep the same level of detail
+
+Return ONLY the translated JSON. No markdown, no explanations, just the JSON object."""
+
+    def translate_batch(self, batch_data: dict, target_language: str, language_code: str) -> dict:
+        """Translate a batch file using OpenAI API."""
+        # Convert batch to compact JSON for API
+        input_json = json.dumps(batch_data, ensure_ascii=False, separators=(',', ':'))
+
+        print(f"Translating {len(batch_data)} entries to {target_language}...")
+        print(f"Input size: {len(input_json)} characters")
+
+        try:
+            # GPT-5 only supports temperature=1, so we don't include it
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=[
+                    {
+                        "role": "system",
+                        "content": self.get_translation_prompt(target_language, language_code)
+                    },
+                    {
+                        "role": "user",
+                        "content": f"Translate this JSON:\n\n{input_json}"
+                    }
+                ],
+            )
+
+            translated_text = response.choices[0].message.content.strip()
+
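+            # Remove markdown code fences if present (the opening fence may carry a language tag)
+            if translated_text.startswith("```"):
+                lines = translated_text.split('\n')[1:]
+                # Drop the closing fence only if it is actually there
+                if lines and lines[-1].strip().startswith("```"):
+                    lines = lines[:-1]
+                translated_text = '\n'.join(lines)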
+
+            # Parse the translated JSON
+            translated_data = json.loads(translated_text)
+
+            print(f"✓ Translation complete")
+            return translated_data
+
+        except json.JSONDecodeError as e:
+            print(f"Error: AI returned invalid JSON: {e}")
+            print(f"Response: {translated_text[:500]}...")
+            raise
+        except Exception as e:
+            print(f"Error during translation: {e}")
+            raise
+
+    def validate_translation(self, original: dict, translated: dict) -> bool:
+        """Validate that translation preserved all placeholders and structure."""
+        issues = []
+
+        # Check that all keys are present
+        if set(original.keys()) != set(translated.keys()):
+            missing = set(original.keys()) - set(translated.keys())
+            extra = set(translated.keys()) - set(original.keys())
+            if missing:
+                issues.append(f"Missing keys: {missing}")
+            if extra:
+                issues.append(f"Extra keys: {extra}")
+
+        # Check placeholders in each value
+        import re
+        # Try {{double}} before {single}; otherwise "{{x}}" is captured as "{{x}"
+        placeholder_pattern = r'\{\{[^{}]+\}\}|\{[^{}]+\}'
+
+        for key in original.keys():
+            if key not in translated:
+                continue
+
+            orig_value = str(original[key])
+            trans_value = str(translated[key])
+
+            # Find all placeholders in original
+            orig_placeholders = set(re.findall(placeholder_pattern, orig_value))
+            trans_placeholders = set(re.findall(placeholder_pattern, trans_value))
+
+            if orig_placeholders != trans_placeholders:
+                issues.append(f"Placeholder mismatch in '{key}': {orig_placeholders} vs {trans_placeholders}")
+
+        if issues:
+            print("\n⚠ Validation warnings:")
+            for issue in issues[:10]:  # Show first 10 issues
+                print(f"  - {issue}")
+            if len(issues) > 10:
+                print(f"  ... and {len(issues) - 10} more issues")
+            return False
+
+        print("✓ Validation passed")
+        return True
+
+
+def get_language_info(language_code: str) -> tuple:
+    """Get full language name from code."""
+    languages = {
+        'zh-CN': ('Simplified Chinese', 'zh-CN'),
+        'es-ES': ('Spanish', 'es-ES'),
+        'it-IT': ('Italian', 'it-IT'),
+        'de-DE': ('German', 'de-DE'),
+        'ar-AR': ('Arabic', 'ar-AR'),
+        'pt-BR': ('Brazilian Portuguese', 'pt-BR'),
+        'ru-RU': ('Russian', 'ru-RU'),
+        'fr-FR': ('French', 'fr-FR'),
+        'ja-JP': ('Japanese', 'ja-JP'),
+        'ko-KR': ('Korean', 'ko-KR'),
+        'nl-NL': ('Dutch', 'nl-NL'),
+        'pl-PL': ('Polish', 'pl-PL'),
+        'sv-SE': ('Swedish', 'sv-SE'),
+        'da-DK': ('Danish', 'da-DK'),
+        'no-NB': ('Norwegian', 'no-NB'),
+        'fi-FI': ('Finnish', 'fi-FI'),
+        'tr-TR': ('Turkish', 'tr-TR'),
+        'vi-VN': ('Vietnamese', 'vi-VN'),
+        'th-TH': ('Thai', 'th-TH'),
+        'id-ID': ('Indonesian', 'id-ID'),
+        'hi-IN': ('Hindi', 'hi-IN'),
+        'cs-CZ': ('Czech', 'cs-CZ'),
+        'hu-HU': ('Hungarian', 'hu-HU'),
+        'ro-RO': ('Romanian', 'ro-RO'),
+        'uk-UA': ('Ukrainian', 'uk-UA'),
+        'el-GR': ('Greek', 'el-GR'),
+        'bg-BG': ('Bulgarian', 'bg-BG'),
+        'hr-HR': ('Croatian', 'hr-HR'),
+        'sk-SK': ('Slovak', 'sk-SK'),
+        'sl-SI': ('Slovenian', 'sl-SI'),
+        'ca-CA': ('Catalan', 'ca-CA'),
+    }
+
+    return languages.get(language_code, (language_code, language_code))
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description='Translate JSON batch files using OpenAI API',
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  # Translate single batch file
+  python batch_translator.py zh_CN_batch_1_of_4.json --api-key YOUR_KEY --language zh-CN
+
+  # Translate all batches for a language (with pattern)
+  python batch_translator.py "zh_CN_batch_*_of_*.json" --api-key YOUR_KEY --language zh-CN
+
+  # Use environment variable for API key
+  export OPENAI_API_KEY=your_key_here
+  python batch_translator.py zh_CN_batch_1_of_4.json --language zh-CN
+
+  # Use different model
+  python batch_translator.py file.json --api-key KEY --language es-ES --model gpt-4-turbo
+        """
+    )
+
+    parser.add_argument('input_files', nargs='+', help='Input batch JSON file(s) or pattern')
+    parser.add_argument('--api-key', help='OpenAI API key (or set OPENAI_API_KEY env var)')
+    parser.add_argument('--language', '-l', required=True, help='Target language code (e.g., zh-CN, es-ES)')
+    parser.add_argument('--model', default='gpt-5', help='OpenAI model to use (default: gpt-5, options: gpt-5-mini, gpt-5-nano)')
+    parser.add_argument('--output-suffix', default='_translated', help='Suffix for output files (default: _translated)')
+    parser.add_argument('--skip-validation', action='store_true', help='Skip validation checks')
+    parser.add_argument('--delay', type=float, default=1.0, help='Delay between API calls in seconds (default: 1.0)')
+
+    args = parser.parse_args()
+
+    # Get API key from args or environment
+    import os
+    api_key = args.api_key or os.environ.get('OPENAI_API_KEY')
+    if not api_key:
+        print("Error: OpenAI API key required. Provide via --api-key or OPENAI_API_KEY environment variable")
+        sys.exit(1)
+
+    # Get language info
+    language_name, language_code = get_language_info(args.language)
+
+    # Expand file patterns
+    import glob
+    input_files = []
+    for pattern in args.input_files:
+        matched = glob.glob(pattern)
+        if matched:
+            input_files.extend(matched)
+        else:
+            input_files.append(pattern)  # Use as literal filename
+
+    if not input_files:
+        print("Error: No input files found")
+        sys.exit(1)
+
+    print(f"Batch Translator")
+    print(f"Target Language: {language_name} ({language_code})")
+    print(f"Model: {args.model}")
+    print(f"Files to translate: {len(input_files)}")
+    print("=" * 60)
+
+    # Initialize translator
+    translator = BatchTranslator(api_key, args.model)
+
+    # Process each file
+    successful = 0
+    failed = 0
+
+    for i, input_file in enumerate(input_files, 1):
+        print(f"\n[{i}/{len(input_files)}] Processing: {input_file}")
+
+        try:
+            # Load input file
+            with open(input_file, 'r', encoding='utf-8') as f:
+                batch_data = json.load(f)
+
+            # Translate
+            translated_data = translator.translate_batch(batch_data, language_name, language_code)
+
+            # Validate
+            if not args.skip_validation:
+                translator.validate_translation(batch_data, translated_data)
+
+            # Save output
+            input_path = Path(input_file)
+            output_file = input_path.stem + args.output_suffix + input_path.suffix
+
+            with open(output_file, 'w', encoding='utf-8') as f:
+                json.dump(translated_data, f, ensure_ascii=False, separators=(',', ':'))
+
+            print(f"✓ Saved to: {output_file}")
+            successful += 1
+
+            # Delay between API calls to avoid rate limits
+            if i < len(input_files):
+                time.sleep(args.delay)
+
+        except Exception as e:
+            print(f"✗ Failed: {e}")
+            failed += 1
+            continue
+
+    # Summary
+    print("\n" + "=" * 60)
+    print(f"Translation complete!")
+    print(f"Successful: {successful}/{len(input_files)}")
+    if failed > 0:
+        print(f"Failed: {failed}/{len(input_files)}")
+
+    sys.exit(0 if failed == 0 else 1)
+
+
+if __name__ == "__main__":
+    main()
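
The placeholder check in `validate_translation` above can be sketched in isolation. This is an illustrative variant under two assumptions that differ from the diff: the regex tries the double-brace form first, and placeholders are compared as sorted lists rather than sets, so a dropped duplicate is also noticed. The names `PLACEHOLDER`, `placeholders`, and `same_placeholders` are hypothetical.

```python
import re

# Match {{double}} before {single} so "{{name}}" is captured whole.
PLACEHOLDER = re.compile(r"\{\{[^{}]+\}\}|\{[^{}]+\}")


def placeholders(text):
    """Return every placeholder in text, sorted, duplicates kept."""
    return sorted(PLACEHOLDER.findall(text))


def same_placeholders(original, translated):
    """True when both strings contain exactly the same placeholders."""
    return placeholders(original) == placeholders(translated)


print(placeholders("Page {n} of {{total}}"))
print(same_placeholders("Page {n} of {total}", "Página {n} de {total}"))
print(same_placeholders("Hello {{name}}", "Hola {{nombre}}"))
```

The first call lists both placeholder styles, the second pair passes, and the third fails because the variable name was translated.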