Mirror of https://github.com/Frooodle/Stirling-PDF.git
Commit c87e34ecf9 (parent 5c9e590856): translations
39 file diffs suppressed because they are too large (not shown).
@@ -2,6 +2,43 @@

This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, validate, and manage translations against the en-GB golden truth file.

## Quick Start - Automated Translation (RECOMMENDED)

The **fastest and easiest way** to translate a language is using the automated pipeline:

```bash
# Set your OpenAI API key
export OPENAI_API_KEY=your_openai_api_key_here

# Translate a language automatically (extract → translate → merge → beautify → verify)
python3 scripts/translations/auto_translate.py es-ES

# With custom batch size (default: 500 entries per batch)
python3 scripts/translations/auto_translate.py es-ES --batch-size 600

# Keep temporary files for inspection
python3 scripts/translations/auto_translate.py es-ES --no-cleanup
```

**What it does:**
1. Extracts untranslated entries from the language file
2. Splits into batches (default 500 entries each); see the sketch after this list
3. Translates each batch using GPT-5 with specialized prompts
4. Validates placeholders are preserved
5. Merges translated batches
6. Applies translations to language file
7. Beautifies structure to match en-GB
8. Cleans up temporary files
9. Reports final completion percentage
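
A quick sketch of the batching arithmetic in step 2, using the same ceiling division that `auto_translate.py` applies; the 1200-entry count is just the example figure from the time estimate below, not a measured value:

```python
# Ceiling division used to size the batches (see extract_untranslated in auto_translate.py).
total_entries = 1200          # example: a language with 1200 untranslated entries
batch_size = 500              # default --batch-size
num_batches = (total_entries + batch_size - 1) // batch_size
print(num_batches)            # 3 batches: 500 + 500 + 200 entries
```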

**Time:** ~8-10 minutes per language with 1200+ untranslated entries

**Cost:** ~$2-4 per language using GPT-5 (or use `gpt-5-mini` for lower cost)

See [`auto_translate.py`](#auto_translatepy-automated-translation-pipeline) for full details.

---

## Scripts Overview

### 0. Validation Scripts (Run First!)
@@ -191,7 +228,97 @@ python scripts/translations/compact_translator.py it-IT --output to_translate.js
- Batch size control for manageable chunks
- 50-80% fewer characters than other extraction methods

### 5. `auto_translate.py` - Automated Translation Pipeline

**NEW: Fully automated translation workflow using GPT-5.**

Combines all translation steps into a single command that handles everything from extraction to verification.

**Usage:**
```bash
# Basic usage (requires OPENAI_API_KEY environment variable)
export OPENAI_API_KEY=your_api_key
python3 scripts/translations/auto_translate.py es-ES

# With inline API key
python3 scripts/translations/auto_translate.py es-ES --api-key YOUR_KEY

# Custom batch size (default: 500 entries)
python3 scripts/translations/auto_translate.py es-ES --batch-size 600

# Custom timeout per batch (default: 600 seconds / 10 minutes)
python3 scripts/translations/auto_translate.py es-ES --timeout 900

# Keep temporary files for debugging
python3 scripts/translations/auto_translate.py es-ES --no-cleanup

# Skip final verification
python3 scripts/translations/auto_translate.py es-ES --skip-verification
```

**Features:**
- Fully automated end-to-end translation pipeline
- Uses GPT-5 with specialized prompts for Stirling PDF
- Preserves all placeholders ({n}, {{variable}}, etc.)
- Maintains consistent terminology
- Validates translations automatically
- Creates backups before modifying files
- Reports detailed progress and final completion %

**Pipeline Steps:**
1. **Extract**: Finds all untranslated entries
2. **Split**: Divides into manageable batches (default: 500 entries)
3. **Translate**: Uses GPT-5 to translate each batch with specialized prompts
4. **Validate**: Ensures placeholders are preserved
5. **Merge**: Combines all translated batches
6. **Apply**: Updates the language file
7. **Beautify**: Restructures to match en-GB format
8. **Cleanup**: Removes temporary files
9. **Verify**: Reports final completion percentage

**Translation Quality:**
- Preserves ALL placeholders exactly as-is (see the sketch after this list)
- Keeps HTML tags intact (`<strong>`, `<br>`, etc.)
- Doesn't translate technical terms (PDF, API, OAuth2, etc.)
- Maintains consistent terminology throughout
- Uses appropriate formal/informal tone per language
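
To make the placeholder rule concrete, here is a minimal sketch with an invented UI string (not a real Stirling PDF key); only the surrounding text changes, and the validator simply checks that the same placeholder tokens appear on both sides:

```python
import re

# Hypothetical UI string before and after translation (invented example).
original = "Processing {n} of {total} files"
translated = "Procesando {n} de {total} archivos"

def placeholders(text):
    # Collect {name} and {{name}} tokens, the same idea batch_translator.py validates.
    return set(re.findall(r'\{\{[^}]+\}\}|\{[^}]+\}', text))

assert placeholders(original) == placeholders(translated)  # passes: both contain {n} and {total}
```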

**Supported Languages:**
All language codes from `frontend/public/locales/` (e.g., es-ES, de-DE, fr-FR, zh-CN, ar-AR, etc.)
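
If you want to check which codes are available locally, a minimal sketch, assuming the layout the scripts expect (one directory per locale under `frontend/public/locales/`, each containing a `translation.json`):

```python
from pathlib import Path

# List locale codes by directory name under the frontend locales folder.
locales_dir = Path("frontend/public/locales")
codes = sorted(p.name for p in locales_dir.iterdir() if p.is_dir())
print(codes)  # e.g. ['ar-AR', 'de-DE', 'en-GB', 'es-ES', ...]
```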

### 6. `batch_translator.py` - GPT-5 Translation Engine

Low-level translation script used by `auto_translate.py`. Can be used standalone for manual batch translation.

**Usage:**
```bash
# Translate single batch file
python3 scripts/translations/batch_translator.py my_batch.json --language es-ES --api-key YOUR_KEY

# Translate multiple batches
python3 scripts/translations/batch_translator.py batch_*.json --language de-DE --api-key YOUR_KEY

# Use different GPT model
python3 scripts/translations/batch_translator.py batch.json --language fr-FR --model gpt-5-mini

# Skip validation
python3 scripts/translations/batch_translator.py batch.json --language it-IT --skip-validation
```
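
For reference, the batch files this script consumes are flat JSON objects whose keys are the dot-joined paths that the extraction step produces, written in compact form; the output file gets the `_translated` suffix. A sketch with invented keys:

```python
import json

# Hypothetical batch contents (keys are dot-joined paths into translation.json;
# these particular keys are invented for illustration).
batch = {
    "addPassword.title": "Add Password",
    "split.header": "Split PDF",
}

# Written compactly, exactly as auto_translate.py does when creating batches.
with open("es_ES_batch_1_of_1.json", "w", encoding="utf-8") as f:
    json.dump(batch, f, ensure_ascii=False, separators=(",", ":"))

# batch_translator.py would then write es_ES_batch_1_of_1_translated.json.
```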

**Features:**
- Translates JSON batch files using OpenAI GPT-5
- Specialized system prompts for Stirling PDF translations
- Automatic placeholder validation
- Supports pattern matching for multiple files
- Configurable model selection (gpt-5, gpt-5-mini, gpt-5-nano)
- Rate limiting with configurable delays

**Models:**
- `gpt-5` (default): Best quality, $1.25/1M input tokens, $10/1M output tokens (see the cost sketch after this list)
- `gpt-5-mini`: Balanced quality/cost
- `gpt-5-nano`: Fastest, most economical
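
A rough back-of-the-envelope check of the ~$2-4 per language figure quoted earlier, using the gpt-5 list prices above; the entry and token counts are assumptions for illustration, not measurements:

```python
# Assumed averages: ~1,500 untranslated entries, ~80 input and ~80 output tokens
# per entry once the system prompt and JSON overhead are spread across a batch.
entries = 1500
input_tokens = entries * 80
output_tokens = entries * 80

cost = input_tokens / 1e6 * 1.25 + output_tokens / 1e6 * 10.0
print(f"~${cost:.2f}")  # ~$1.35, the same order of magnitude as the ~$2-4 quoted above
```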

### 7. `json_beautifier.py`

Restructures and beautifies translation JSON files to match en-GB structure exactly.

**Usage:**

scripts/translations/auto_translate.py (new file, 324 lines)
@@ -0,0 +1,324 @@
#!/usr/bin/env python3
"""
Automated Translation Pipeline
Extracts, translates, merges, and beautifies translations for a language.
"""

import json
import sys
import argparse
import os
import subprocess
from pathlib import Path
import time


def run_command(cmd, description=""):
    """Run a shell command and return success status."""
    if description:
        print(f"\n{'='*60}")
        print(f"Step: {description}")
        print(f"{'='*60}")

    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

    if result.stdout:
        print(result.stdout)
    if result.stderr:
        print(result.stderr, file=sys.stderr)

    return result.returncode == 0


def extract_untranslated(language_code, batch_size=500):
    """Extract untranslated entries and split into batches."""
    print(f"\n🔍 Extracting untranslated entries for {language_code}...")

    # Load files
    golden_path = Path('frontend/public/locales/en-GB/translation.json')
    lang_path = Path(f'frontend/public/locales/{language_code}/translation.json')

    if not golden_path.exists():
        print(f"Error: Golden truth file not found: {golden_path}")
        return None

    if not lang_path.exists():
        print(f"Error: Language file not found: {lang_path}")
        return None

    def load_json(path):
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)

    def flatten_dict(d, parent_key='', separator='.'):
        # e.g. {"home": {"title": "Home"}} -> {"home.title": "Home"}
        items = []
        for k, v in d.items():
            new_key = f"{parent_key}{separator}{k}" if parent_key else k
            if isinstance(v, dict):
                items.extend(flatten_dict(v, new_key, separator).items())
            else:
                items.append((new_key, str(v)))
        return dict(items)

    golden = load_json(golden_path)
    lang_data = load_json(lang_path)

    golden_flat = flatten_dict(golden)
    lang_flat = flatten_dict(lang_data)

    # Find untranslated entries
    untranslated = {}
    for key, value in golden_flat.items():
        if (key not in lang_flat or
                lang_flat.get(key) == value or
                (isinstance(lang_flat.get(key), str) and lang_flat.get(key).startswith("[UNTRANSLATED]"))):
            untranslated[key] = value

    total = len(untranslated)
    print(f"Found {total} untranslated entries")

    if total == 0:
        print("✓ Language is already complete!")
        return []

    # Split into batches
    entries = list(untranslated.items())
    num_batches = (total + batch_size - 1) // batch_size

    batch_files = []
    lang_code_safe = language_code.replace('-', '_')

    for i in range(num_batches):
        start = i * batch_size
        end = min((i + 1) * batch_size, total)
        batch = dict(entries[start:end])

        filename = f'{lang_code_safe}_batch_{i+1}_of_{num_batches}.json'
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(batch, f, ensure_ascii=False, separators=(',', ':'))

        batch_files.append(filename)
        print(f"  Created {filename} with {len(batch)} entries")

    return batch_files


def translate_batches(batch_files, language_code, api_key, timeout=600):
    """Translate all batch files using GPT-5."""
    if not batch_files:
        return []

    print(f"\n🤖 Translating {len(batch_files)} batches using GPT-5...")
    print(f"Timeout: {timeout}s ({timeout//60} minutes) per batch")

    translated_files = []

    for i, batch_file in enumerate(batch_files, 1):
        print(f"\n[{i}/{len(batch_files)}] Translating {batch_file}...")

        # Always pass the API key since it's required
        cmd = f'python3 scripts/translations/batch_translator.py "{batch_file}" --language {language_code} --api-key "{api_key}"'

        # Run with a timeout
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)

        if result.stdout:
            print(result.stdout)
        if result.stderr:
            print(result.stderr, file=sys.stderr)

        if result.returncode != 0:
            print(f"✗ Failed to translate {batch_file}")
            return None

        translated_file = batch_file.replace('.json', '_translated.json')
        translated_files.append(translated_file)

        # Small delay between batches
        if i < len(batch_files):
            time.sleep(1)

    print(f"\n✓ All {len(batch_files)} batches translated successfully")
    return translated_files


def merge_translations(translated_files, language_code):
    """Merge all translated batch files."""
    if not translated_files:
        return None

    print(f"\n🔗 Merging {len(translated_files)} translated batches...")

    merged = {}
    for filename in translated_files:
        if not Path(filename).exists():
            print(f"Error: Translated file not found: {filename}")
            return None

        with open(filename, 'r', encoding='utf-8') as f:
            merged.update(json.load(f))

    lang_code_safe = language_code.replace('-', '_')
    merged_file = f'{lang_code_safe}_merged.json'

    with open(merged_file, 'w', encoding='utf-8') as f:
        json.dump(merged, f, ensure_ascii=False, separators=(',', ':'))

    print(f"✓ Merged {len(merged)} translations into {merged_file}")
    return merged_file


def apply_translations(merged_file, language_code):
    """Apply merged translations to the language file."""
    print(f"\n📝 Applying translations to {language_code}...")

    cmd = f'python3 scripts/translations/translation_merger.py {language_code} apply-translations --translations-file {merged_file}'

    if not run_command(cmd):
        print("✗ Failed to apply translations")
        return False

    print("✓ Translations applied successfully")
    return True


def beautify_translations(language_code):
    """Beautify the translation file to match the en-GB structure."""
    print(f"\n✨ Beautifying {language_code} translation file...")

    cmd = f'python3 scripts/translations/json_beautifier.py --language {language_code}'

    if not run_command(cmd):
        print("✗ Failed to beautify translations")
        return False

    print("✓ Translation file beautified")
    return True


def cleanup_temp_files(language_code):
    """Remove temporary batch files."""
    print("\n🧹 Cleaning up temporary files...")

    lang_code_safe = language_code.replace('-', '_')
    patterns = [
        f'{lang_code_safe}_batch_*.json',
        f'{lang_code_safe}_merged.json'
    ]

    import glob
    removed = 0
    for pattern in patterns:
        for file in glob.glob(pattern):
            Path(file).unlink()
            removed += 1

    print(f"✓ Removed {removed} temporary files")


def verify_completion(language_code):
    """Check the final completion percentage."""
    print("\n📊 Verifying completion...")

    cmd = f'python3 scripts/translations/translation_analyzer.py --language {language_code} --summary'
    run_command(cmd)


def main():
    parser = argparse.ArgumentParser(
        description='Automated translation pipeline for Stirling PDF',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Translate Spanish with API key in environment
  export OPENAI_API_KEY=your_key_here
  python3 scripts/translations/auto_translate.py es-ES

  # Translate German with inline API key
  python3 scripts/translations/auto_translate.py de-DE --api-key YOUR_KEY

  # Translate Italian with custom batch size
  python3 scripts/translations/auto_translate.py it-IT --batch-size 600

  # Skip cleanup (keep temporary files for inspection)
  python3 scripts/translations/auto_translate.py fr-FR --no-cleanup
        """
    )

    parser.add_argument('language', help='Language code (e.g., es-ES, de-DE, zh-CN)')
    parser.add_argument('--api-key', help='OpenAI API key (or set OPENAI_API_KEY env var)')
    parser.add_argument('--batch-size', type=int, default=500, help='Entries per batch (default: 500)')
    parser.add_argument('--no-cleanup', action='store_true', help='Keep temporary batch files')
    parser.add_argument('--skip-verification', action='store_true', help='Skip final completion check')
    parser.add_argument('--timeout', type=int, default=600, help='Timeout per batch in seconds (default: 600 = 10 minutes)')

    args = parser.parse_args()

    # Verify the API key
    api_key = args.api_key or os.environ.get('OPENAI_API_KEY')
    if not api_key:
        print("Error: OpenAI API key required. Provide it via --api-key or the OPENAI_API_KEY environment variable")
        sys.exit(1)

    print("="*60)
    print("Automated Translation Pipeline")
    print(f"Language: {args.language}")
    print(f"Batch Size: {args.batch_size} entries")
    print("="*60)

    start_time = time.time()

    try:
        # Step 1: Extract and split
        batch_files = extract_untranslated(args.language, args.batch_size)
        if batch_files is None:
            sys.exit(1)

        if len(batch_files) == 0:
            print("\n✓ Nothing to translate!")
            sys.exit(0)

        # Step 2: Translate all batches
        translated_files = translate_batches(batch_files, args.language, api_key, args.timeout)
        if translated_files is None:
            sys.exit(1)

        # Step 3: Merge translations
        merged_file = merge_translations(translated_files, args.language)
        if merged_file is None:
            sys.exit(1)

        # Step 4: Apply translations
        if not apply_translations(merged_file, args.language):
            sys.exit(1)

        # Step 5: Beautify
        if not beautify_translations(args.language):
            sys.exit(1)

        # Step 6: Cleanup
        if not args.no_cleanup:
            cleanup_temp_files(args.language)

        # Step 7: Verify
        if not args.skip_verification:
            verify_completion(args.language)

        elapsed = time.time() - start_time
        print("\n" + "="*60)
        print("✅ Translation pipeline completed successfully!")
        print(f"Time elapsed: {elapsed:.1f} seconds")
        print("="*60)

    except KeyboardInterrupt:
        print("\n\n⚠ Translation interrupted by user")
        sys.exit(1)
    except Exception as e:
        print(f"\n\n✗ Error: {e}")
        import traceback
        traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
    main()

scripts/translations/batch_translator.py (new file, 321 lines)
@@ -0,0 +1,321 @@
#!/usr/bin/env python3
"""
Batch Translation Script using OpenAI API
Automatically translates JSON batch files to the target language while preserving:
- Placeholders: {n}, {total}, {filename}, {{variable}}
- HTML tags: <strong>, </strong>, etc.
- Technical terms: PDF, API, OAuth2, SAML2, JWT, etc.
"""

import argparse
import glob
import json
import os
import re
import sys
import time
from pathlib import Path

try:
    from openai import OpenAI
except ImportError:
    print("Error: openai package not installed. Install with: pip install openai")
    sys.exit(1)


class BatchTranslator:
    def __init__(self, api_key: str, model: str = "gpt-5"):
        """Initialize the translator with an OpenAI API key."""
        self.client = OpenAI(api_key=api_key)
        self.model = model

    def get_translation_prompt(self, language_name: str, language_code: str) -> str:
        """Generate the system prompt for translation."""
        return f"""You are a professional translator for Stirling PDF, an open-source PDF manipulation tool.

Translate the following JSON from English to {language_name} ({language_code}) for the Stirling PDF user interface.

CRITICAL RULES - MUST FOLLOW EXACTLY:

1. PRESERVE ALL PLACEHOLDERS EXACTLY AS-IS:
   - Single braces: {{n}}, {{total}}, {{filename}}, {{count}}, {{date}}, {{planName}}, {{toolName}}, {{variable}}
   - Double braces: {{{{variable}}}}
   - Never translate, modify, or remove these - they are template variables

2. KEEP ALL HTML TAGS INTACT:
   - <strong>, </strong>, <br>, <code>, </code>, etc.
   - Do not translate tag names, only text between tags

3. DO NOT TRANSLATE TECHNICAL TERMS:
   - File formats: PDF, JSON, CSV, XML, HTML, ZIP, DOCX, XLSX, PNG, JPG
   - Protocols: API, OAuth2, SAML2, JWT, SMTP, HTTP, HTTPS, SSL, TLS
   - Technologies: Git, GitHub, Google, PostHog, Scarf, LibreOffice, Ghostscript, Tesseract, OCR
   - Technical keywords: URL, URI, DPI, RGB, CMYK, QR
   - "Stirling PDF" - always keep as-is

4. MAINTAIN CONSISTENT TERMINOLOGY:
   - Use the SAME translation for repeated terms throughout
   - Do not introduce new terminology or synonyms
   - Keep UI action words consistent (e.g., "upload", "download", "compress")

5. PRESERVE SPECIAL KEYWORDS IN CONTEXT:
   - Mathematical expressions: "2n", "2n-1", "3n" (in page selection)
   - Special keywords: "all", "odd", "even" (in page contexts)
   - Code examples and technical patterns

6. JSON STRUCTURE:
   - Translate ONLY the values (text after :), NEVER the keys
   - Return ONLY valid JSON with the exact same structure
   - Maintain all quotes, commas, and braces

7. TONE & STYLE:
   - Use an appropriate formal/informal tone for the {language_name} UI
   - Keep translations concise and user-friendly
   - Maintain the professional but accessible tone of the original

8. DO NOT ADD OR REMOVE TEXT:
   - Do not add explanations, comments, or extra text
   - Do not remove any part of the original meaning
   - Keep the same level of detail

Return ONLY the translated JSON. No markdown, no explanations, just the JSON object."""

    def translate_batch(self, batch_data: dict, target_language: str, language_code: str) -> dict:
        """Translate a batch file using the OpenAI API."""
        # Convert the batch to compact JSON for the API
        input_json = json.dumps(batch_data, ensure_ascii=False, separators=(',', ':'))

        print(f"Translating {len(batch_data)} entries to {target_language}...")
        print(f"Input size: {len(input_json)} characters")

        try:
            # GPT-5 only supports temperature=1, so we don't include it
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {
                        "role": "system",
                        "content": self.get_translation_prompt(target_language, language_code)
                    },
                    {
                        "role": "user",
                        "content": f"Translate this JSON:\n\n{input_json}"
                    }
                ],
            )

            translated_text = response.choices[0].message.content.strip()

            # Remove markdown code fences if present
            if translated_text.startswith("```"):
                lines = translated_text.split('\n')
                translated_text = '\n'.join(lines[1:-1])

            # Parse the translated JSON
            translated_data = json.loads(translated_text)

            print("✓ Translation complete")
            return translated_data

        except json.JSONDecodeError as e:
            print(f"Error: AI returned invalid JSON: {e}")
            print(f"Response: {translated_text[:500]}...")
            raise
        except Exception as e:
            print(f"Error during translation: {e}")
            raise

    def validate_translation(self, original: dict, translated: dict) -> bool:
        """Validate that the translation preserved all placeholders and structure."""
        issues = []

        # Check that all keys are present
        if set(original.keys()) != set(translated.keys()):
            missing = set(original.keys()) - set(translated.keys())
            extra = set(translated.keys()) - set(original.keys())
            if missing:
                issues.append(f"Missing keys: {missing}")
            if extra:
                issues.append(f"Extra keys: {extra}")

        # Check placeholders in each value; match double-brace tokens before single-brace ones
        placeholder_pattern = r'\{\{[^}]+\}\}|\{[^}]+\}'

        for key in original.keys():
            if key not in translated:
                continue

            orig_value = str(original[key])
            trans_value = str(translated[key])

            # Find all placeholders in the original and the translation
            orig_placeholders = set(re.findall(placeholder_pattern, orig_value))
            trans_placeholders = set(re.findall(placeholder_pattern, trans_value))

            if orig_placeholders != trans_placeholders:
                issues.append(f"Placeholder mismatch in '{key}': {orig_placeholders} vs {trans_placeholders}")

        if issues:
            print("\n⚠ Validation warnings:")
            for issue in issues[:10]:  # Show the first 10 issues
                print(f"  - {issue}")
            if len(issues) > 10:
                print(f"  ... and {len(issues) - 10} more issues")
            return False

        print("✓ Validation passed")
        return True


def get_language_info(language_code: str) -> tuple:
    """Get the full language name from a code."""
    languages = {
        'zh-CN': ('Simplified Chinese', 'zh-CN'),
        'es-ES': ('Spanish', 'es-ES'),
        'it-IT': ('Italian', 'it-IT'),
        'de-DE': ('German', 'de-DE'),
        'ar-AR': ('Arabic', 'ar-AR'),
        'pt-BR': ('Brazilian Portuguese', 'pt-BR'),
        'ru-RU': ('Russian', 'ru-RU'),
        'fr-FR': ('French', 'fr-FR'),
        'ja-JP': ('Japanese', 'ja-JP'),
        'ko-KR': ('Korean', 'ko-KR'),
        'nl-NL': ('Dutch', 'nl-NL'),
        'pl-PL': ('Polish', 'pl-PL'),
        'sv-SE': ('Swedish', 'sv-SE'),
        'da-DK': ('Danish', 'da-DK'),
        'no-NB': ('Norwegian', 'no-NB'),
        'fi-FI': ('Finnish', 'fi-FI'),
        'tr-TR': ('Turkish', 'tr-TR'),
        'vi-VN': ('Vietnamese', 'vi-VN'),
        'th-TH': ('Thai', 'th-TH'),
        'id-ID': ('Indonesian', 'id-ID'),
        'hi-IN': ('Hindi', 'hi-IN'),
        'cs-CZ': ('Czech', 'cs-CZ'),
        'hu-HU': ('Hungarian', 'hu-HU'),
        'ro-RO': ('Romanian', 'ro-RO'),
        'uk-UA': ('Ukrainian', 'uk-UA'),
        'el-GR': ('Greek', 'el-GR'),
        'bg-BG': ('Bulgarian', 'bg-BG'),
        'hr-HR': ('Croatian', 'hr-HR'),
        'sk-SK': ('Slovak', 'sk-SK'),
        'sl-SI': ('Slovenian', 'sl-SI'),
        'ca-CA': ('Catalan', 'ca-CA'),
    }

    return languages.get(language_code, (language_code, language_code))


def main():
    parser = argparse.ArgumentParser(
        description='Translate JSON batch files using the OpenAI API',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Translate single batch file
  python batch_translator.py zh_CN_batch_1_of_4.json --api-key YOUR_KEY --language zh-CN

  # Translate all batches for a language (with pattern)
  python batch_translator.py "zh_CN_batch_*_of_*.json" --api-key YOUR_KEY --language zh-CN

  # Use environment variable for API key
  export OPENAI_API_KEY=your_key_here
  python batch_translator.py zh_CN_batch_1_of_4.json --language zh-CN

  # Use different model
  python batch_translator.py file.json --api-key KEY --language es-ES --model gpt-4-turbo
        """
    )

    parser.add_argument('input_files', nargs='+', help='Input batch JSON file(s) or pattern')
    parser.add_argument('--api-key', help='OpenAI API key (or set OPENAI_API_KEY env var)')
    parser.add_argument('--language', '-l', required=True, help='Target language code (e.g., zh-CN, es-ES)')
    parser.add_argument('--model', default='gpt-5', help='OpenAI model to use (default: gpt-5, options: gpt-5-mini, gpt-5-nano)')
    parser.add_argument('--output-suffix', default='_translated', help='Suffix for output files (default: _translated)')
    parser.add_argument('--skip-validation', action='store_true', help='Skip validation checks')
    parser.add_argument('--delay', type=float, default=1.0, help='Delay between API calls in seconds (default: 1.0)')

    args = parser.parse_args()

    # Get the API key from the arguments or the environment
    api_key = args.api_key or os.environ.get('OPENAI_API_KEY')
    if not api_key:
        print("Error: OpenAI API key required. Provide it via --api-key or the OPENAI_API_KEY environment variable")
        sys.exit(1)

    # Get the language info
    language_name, language_code = get_language_info(args.language)

    # Expand file patterns
    input_files = []
    for pattern in args.input_files:
        matched = glob.glob(pattern)
        if matched:
            input_files.extend(matched)
        else:
            input_files.append(pattern)  # Use as a literal filename

    if not input_files:
        print("Error: No input files found")
        sys.exit(1)

    print("Batch Translator")
    print(f"Target Language: {language_name} ({language_code})")
    print(f"Model: {args.model}")
    print(f"Files to translate: {len(input_files)}")
    print("=" * 60)

    # Initialize the translator
    translator = BatchTranslator(api_key, args.model)

    # Process each file
    successful = 0
    failed = 0

    for i, input_file in enumerate(input_files, 1):
        print(f"\n[{i}/{len(input_files)}] Processing: {input_file}")

        try:
            # Load the input file
            with open(input_file, 'r', encoding='utf-8') as f:
                batch_data = json.load(f)

            # Translate
            translated_data = translator.translate_batch(batch_data, language_name, language_code)

            # Validate
            if not args.skip_validation:
                translator.validate_translation(batch_data, translated_data)

            # Save the output
            input_path = Path(input_file)
            output_file = input_path.stem + args.output_suffix + input_path.suffix

            with open(output_file, 'w', encoding='utf-8') as f:
                json.dump(translated_data, f, ensure_ascii=False, separators=(',', ':'))

            print(f"✓ Saved to: {output_file}")
            successful += 1

            # Delay between API calls to avoid rate limits
            if i < len(input_files):
                time.sleep(args.delay)

        except Exception as e:
            print(f"✗ Failed: {e}")
            failed += 1
            continue

    # Summary
    print("\n" + "=" * 60)
    print("Translation complete!")
    print(f"Successful: {successful}/{len(input_files)}")
    if failed > 0:
        print(f"Failed: {failed}/{len(input_files)}")

    sys.exit(0 if failed == 0 else 1)


if __name__ == "__main__":
    main()