translations

This commit is contained in:
Anthony Stirling 2025-11-15 14:46:03 +00:00
parent 5c9e590856
commit c87e34ecf9
42 changed files with 90629 additions and 90865 deletions

File diff suppressed because it is too large Load Diff


@ -2,6 +2,43 @@
This directory contains Python scripts for managing frontend translations in Stirling PDF. These tools help analyze, merge, validate, and manage translations against the en-GB golden truth file.
## Quick Start - Automated Translation (RECOMMENDED)
The **fastest and easiest way** to translate a language is using the automated pipeline:
```bash
# Set your OpenAI API key
export OPENAI_API_KEY=your_openai_api_key_here
# Translate a language automatically (extract → translate → merge → beautify → verify)
python3 scripts/translations/auto_translate.py es-ES
# With custom batch size (default: 500 entries per batch)
python3 scripts/translations/auto_translate.py es-ES --batch-size 600
# Keep temporary files for inspection
python3 scripts/translations/auto_translate.py es-ES --no-cleanup
```
**What it does:**
1. Extracts untranslated entries from the language file
2. Splits into batches (default 500 entries each)
3. Translates each batch using GPT-5 with specialized prompts
4. Validates placeholders are preserved
5. Merges translated batches
6. Applies translations to language file
7. Beautifies structure to match en-GB
8. Cleans up temporary files
9. Reports final completion percentage
**Time:** ~8-10 minutes per language with 1200+ untranslated entries
**Cost:** ~$2-4 per language using GPT-5 (or use `gpt-5-mini` for lower cost)
See [`auto_translate.py`](#auto_translatepy-automated-translation-pipeline) for full details.
---
## Scripts Overview
### 0. Validation Scripts (Run First!)
@ -191,7 +228,97 @@ python scripts/translations/compact_translator.py it-IT --output to_translate.js
- Batch size control for manageable chunks
- 50-80% fewer characters than other extraction methods
### 5. `json_beautifier.py`
### 5. `auto_translate.py` - Automated Translation Pipeline
**NEW: Fully automated translation workflow using GPT-5.**
Combines all translation steps into a single command that handles everything from extraction to verification.
**Usage:**
```bash
# Basic usage (requires OPENAI_API_KEY environment variable)
export OPENAI_API_KEY=your_api_key
python3 scripts/translations/auto_translate.py es-ES
# With inline API key
python3 scripts/translations/auto_translate.py es-ES --api-key YOUR_KEY
# Custom batch size (default: 500 entries)
python3 scripts/translations/auto_translate.py es-ES --batch-size 600
# Custom timeout per batch (default: 600 seconds / 10 minutes)
python3 scripts/translations/auto_translate.py es-ES --timeout 900
# Keep temporary files for debugging
python3 scripts/translations/auto_translate.py es-ES --no-cleanup
# Skip final verification
python3 scripts/translations/auto_translate.py es-ES --skip-verification
```
**Features:**
- Fully automated end-to-end translation pipeline
- Uses GPT-5 with specialized prompts for Stirling PDF
- Preserves all placeholders (`{n}`, `{{variable}}`, etc.)
- Maintains consistent terminology
- Validates translations automatically
- Creates backups before modifying files
- Reports detailed progress and final completion %
**Pipeline Steps:**
1. **Extract**: Finds all untranslated entries
2. **Split**: Divides into manageable batches (default: 500 entries)
3. **Translate**: Uses GPT-5 to translate each batch with specialized prompts
4. **Validate**: Ensures placeholders are preserved
5. **Merge**: Combines all translated batches
6. **Apply**: Updates the language file
7. **Beautify**: Restructures to match en-GB format
8. **Cleanup**: Removes temporary files
9. **Verify**: Reports final completion percentage
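
The Extract and Split steps can be illustrated with a small, self-contained sketch. It assumes translation files are nested JSON objects addressed by dotted keys, and that an entry counts as untranslated when it is missing, identical to the English text, or prefixed with `[UNTRANSLATED]`. The helper names `flatten`, `find_untranslated`, and `split_batches` are illustrative only.

```python
def flatten(d, parent=""):
    """Flatten nested dicts into dotted keys: {"a": {"b": "x"}} -> {"a.b": "x"}."""
    flat = {}
    for key, value in d.items():
        full_key = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = str(value)
    return flat


def find_untranslated(golden, lang):
    """Return golden entries that are missing, unchanged, or marked [UNTRANSLATED]."""
    golden_flat, lang_flat = flatten(golden), flatten(lang)
    return {
        key: value
        for key, value in golden_flat.items()
        if key not in lang_flat
        or lang_flat[key] == value
        or lang_flat[key].startswith("[UNTRANSLATED]")
    }


def split_batches(entries, batch_size=500):
    """Split a flat dict into a list of dicts with at most batch_size entries each."""
    items = list(entries.items())
    return [dict(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


# Toy data (hypothetical keys, not taken from the real locale files)
golden = {"home": {"title": "Home", "upload": "Upload file"}, "ok": "OK"}
spanish = {"home": {"title": "Inicio", "upload": "[UNTRANSLATED] Upload file"}}

todo = find_untranslated(golden, spanish)
batches = split_batches(todo, batch_size=1)
print(todo)          # entries still needing translation
print(len(batches))  # number of batch files that would be written
```

One caveat of the "identical to English" rule: strings that are legitimately the same in both languages (for example `PDF` or `OK`) are reported as untranslated on every run.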
**Translation Quality:**
- Preserves ALL placeholders exactly as-is
- Keeps HTML tags intact (`<strong>`, `<br>`, etc.)
- Doesn't translate technical terms (PDF, API, OAuth2, etc.)
- Maintains consistent terminology throughout
- Uses appropriate formal/informal tone per language
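
As a rough illustration of what "preserved" means for the first two points, the sketch below compares the placeholders and HTML tags found in a source string with those in its translation. The regular expressions and the `preserved` helper are assumptions made for this example; the HTML tag comparison in particular goes beyond a placeholder-only check.

```python
import re

# Double-brace alternative comes first so {{variable}} is matched as one token.
PLACEHOLDER = re.compile(r"\{\{[^{}]+\}\}|\{[^{}]+\}")
HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def preserved(original, translated):
    """True if both strings contain the same placeholders and the same HTML tags."""
    same_placeholders = (
        sorted(PLACEHOLDER.findall(original)) == sorted(PLACEHOLDER.findall(translated))
    )
    same_tags = sorted(HTML_TAG.findall(original)) == sorted(HTML_TAG.findall(translated))
    return same_placeholders and same_tags


print(preserved("Page {n} of {{total}}", "Página {n} de {{total}}"))
print(preserved("Hello {name}", "Hola {nombre}"))
```

Comparing sorted lists rather than sets also catches a placeholder that appears twice in the source but only once in the translation.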
**Supported Languages:**
All language codes from `frontend/public/locales/` (e.g., es-ES, de-DE, fr-FR, zh-CN, ar-AR, etc.)
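
The Merge and Apply steps amount to combining flat batch results and writing them back into the nested language file. The sketch below shows one way this could work; `merge_batches` and `apply_flat` are hypothetical helpers, and it assumes that keys never contain a literal dot and that intermediate nodes are always objects. How `translation_merger.py` actually applies translations may differ.

```python
def merge_batches(batches):
    """Combine several flat {dotted_key: text} dicts; later batches win on conflicts."""
    merged = {}
    for batch in batches:
        merged.update(batch)
    return merged


def apply_flat(target, flat):
    """Write flat dotted-key translations into a nested dict, creating parents as needed."""
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = target
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return target


language_file = {"home": {"title": "Home"}}
translated = merge_batches([{"home.title": "Inicio"}, {"ok": "Aceptar"}])
print(apply_flat(language_file, translated))
```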
### 6. `batch_translator.py` - GPT-5 Translation Engine
Low-level translation script used by `auto_translate.py`. Can be used standalone for manual batch translation.
**Usage:**
```bash
# Translate single batch file
python3 scripts/translations/batch_translator.py my_batch.json --language es-ES --api-key YOUR_KEY
# Translate multiple batches
python3 scripts/translations/batch_translator.py batch_*.json --language de-DE --api-key YOUR_KEY
# Use different GPT model
python3 scripts/translations/batch_translator.py batch.json --language fr-FR --model gpt-5-mini
# Skip validation
python3 scripts/translations/batch_translator.py batch.json --language it-IT --skip-validation
```
**Features:**
- Translates JSON batch files using OpenAI GPT-5
- Specialized system prompts for Stirling PDF translations
- Automatic placeholder validation
- Supports pattern matching for multiple files
- Configurable model selection (gpt-5, gpt-5-mini, gpt-5-nano)
- Rate limiting with configurable delays
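
Pattern matching for multiple files can be done with the standard-library `glob` module. A minimal sketch, assuming that an argument matching nothing should be kept as a literal file name so a missing file is reported later instead of being silently dropped (`expand_patterns` is an illustrative name):

```python
import glob
import os
import tempfile


def expand_patterns(patterns):
    """Expand shell-style patterns; keep unmatched arguments as literal file names."""
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        files.extend(matches if matches else [pattern])
    return files


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("batch_1.json", "batch_2.json"):
        open(os.path.join(tmp, name), "w").close()
    found = expand_patterns([os.path.join(tmp, "batch_*.json")])
    print([os.path.basename(path) for path in found])
```

Quoting the pattern on the command line (`"batch_*.json"`) leaves the expansion to the script; unquoted, the shell expands it first and the script simply receives the file list.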
**Models:**
- `gpt-5` (default): Best quality, $1.25/1M input, $10/1M output
- `gpt-5-mini`: Balanced quality/cost
- `gpt-5-nano`: Fastest, most economical
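
For a back-of-the-envelope cost estimate, the `gpt-5` prices quoted above can be plugged into a one-line formula. The token counts in this sketch are made-up illustrative numbers, not measurements; real usage depends on the language, the number of entries, and how many output tokens the model spends per batch.

```python
INPUT_USD_PER_MILLION = 1.25    # gpt-5 input price quoted above
OUTPUT_USD_PER_MILLION = 10.00  # gpt-5 output price quoted above


def estimate_cost(input_tokens, output_tokens):
    """Estimated cost in USD for the given token counts."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical run: 40k input tokens and 200k output tokens in total
print(f"${estimate_cost(40_000, 200_000):.2f}")
```

Because output tokens cost eight times as much as input tokens at these prices, the output side dominates the estimate.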
### 7. `json_beautifier.py`
Restructures and beautifies translation JSON files to match en-GB structure exactly.
**Usage:**


@ -0,0 +1,324 @@
#!/usr/bin/env python3
"""
Automated Translation Pipeline
Extracts, translates, merges, and beautifies translations for a language.
"""
import json
import sys
import argparse
import os
import subprocess
from pathlib import Path
import time


def run_command(cmd, description=""):
    """Run a shell command and return success status."""
    if description:
        print(f"\n{'='*60}")
        print(f"Step: {description}")
        print(f"{'='*60}")

    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)

    if result.stdout:
        print(result.stdout)
    if result.stderr:
        print(result.stderr, file=sys.stderr)

    return result.returncode == 0


def extract_untranslated(language_code, batch_size=500):
    """Extract untranslated entries and split into batches."""
    print(f"\n🔍 Extracting untranslated entries for {language_code}...")

    # Load files
    golden_path = Path('frontend/public/locales/en-GB/translation.json')
    lang_path = Path(f'frontend/public/locales/{language_code}/translation.json')

    if not golden_path.exists():
        print(f"Error: Golden truth file not found: {golden_path}")
        return None

    if not lang_path.exists():
        print(f"Error: Language file not found: {lang_path}")
        return None

    def load_json(path):
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)

    def flatten_dict(d, parent_key='', separator='.'):
        # Nested objects become dotted keys: {"a": {"b": "x"}} -> {"a.b": "x"}
        items = []
        for k, v in d.items():
            new_key = f"{parent_key}{separator}{k}" if parent_key else k
            if isinstance(v, dict):
                items.extend(flatten_dict(v, new_key, separator).items())
            else:
                items.append((new_key, str(v)))
        return dict(items)

    golden = load_json(golden_path)
    lang_data = load_json(lang_path)

    golden_flat = flatten_dict(golden)
    lang_flat = flatten_dict(lang_data)

    # Find untranslated: missing, identical to English, or explicitly marked
    untranslated = {}
    for key, value in golden_flat.items():
        if (key not in lang_flat or
                lang_flat.get(key) == value or
                (isinstance(lang_flat.get(key), str) and lang_flat.get(key).startswith("[UNTRANSLATED]"))):
            untranslated[key] = value

    total = len(untranslated)
    print(f"Found {total} untranslated entries")

    if total == 0:
        print("✓ Language is already complete!")
        return []

    # Split into batches (ceiling division gives the number of batch files)
    entries = list(untranslated.items())
    num_batches = (total + batch_size - 1) // batch_size
    batch_files = []

    lang_code_safe = language_code.replace('-', '_')

    for i in range(num_batches):
        start = i * batch_size
        end = min((i + 1) * batch_size, total)
        batch = dict(entries[start:end])

        filename = f'{lang_code_safe}_batch_{i+1}_of_{num_batches}.json'
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(batch, f, ensure_ascii=False, separators=(',', ':'))

        batch_files.append(filename)
        print(f"  Created {filename} with {len(batch)} entries")

    return batch_files


def translate_batches(batch_files, language_code, api_key, timeout=600):
    """Translate all batch files using GPT-5."""
    if not batch_files:
        return []

    print(f"\n🤖 Translating {len(batch_files)} batches using GPT-5...")
    print(f"Timeout: {timeout}s ({timeout//60} minutes) per batch")

    # Pass the key through the environment instead of the command line so it
    # does not appear in process listings; batch_translator.py falls back to
    # OPENAI_API_KEY when --api-key is not given.
    env = {**os.environ, 'OPENAI_API_KEY': api_key}

    translated_files = []

    for i, batch_file in enumerate(batch_files, 1):
        print(f"\n[{i}/{len(batch_files)}] Translating {batch_file}...")

        # Argument list without a shell avoids quoting and injection problems
        cmd = ['python3', 'scripts/translations/batch_translator.py',
               batch_file, '--language', language_code]

        # Run with timeout
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout, env=env)
        except subprocess.TimeoutExpired:
            print(f"✗ Timed out after {timeout}s translating {batch_file}")
            return None

        if result.stdout:
            print(result.stdout)
        if result.stderr:
            print(result.stderr, file=sys.stderr)

        if result.returncode != 0:
            print(f"✗ Failed to translate {batch_file}")
            return None

        translated_file = batch_file.replace('.json', '_translated.json')
        translated_files.append(translated_file)

        # Small delay between batches
        if i < len(batch_files):
            time.sleep(1)

    print(f"\n✓ All {len(batch_files)} batches translated successfully")
    return translated_files


def merge_translations(translated_files, language_code):
    """Merge all translated batch files."""
    if not translated_files:
        return None

    print(f"\n🔗 Merging {len(translated_files)} translated batches...")

    merged = {}
    for filename in translated_files:
        if not Path(filename).exists():
            print(f"Error: Translated file not found: {filename}")
            return None

        with open(filename, 'r', encoding='utf-8') as f:
            merged.update(json.load(f))

    lang_code_safe = language_code.replace('-', '_')
    merged_file = f'{lang_code_safe}_merged.json'

    with open(merged_file, 'w', encoding='utf-8') as f:
        json.dump(merged, f, ensure_ascii=False, separators=(',', ':'))

    print(f"✓ Merged {len(merged)} translations into {merged_file}")
    return merged_file


def apply_translations(merged_file, language_code):
    """Apply merged translations to the language file."""
    print(f"\n📝 Applying translations to {language_code}...")

    cmd = f'python3 scripts/translations/translation_merger.py {language_code} apply-translations --translations-file {merged_file}'

    if not run_command(cmd):
        print("✗ Failed to apply translations")
        return False

    print("✓ Translations applied successfully")
    return True


def beautify_translations(language_code):
    """Beautify translation file to match en-GB structure."""
    print(f"\n✨ Beautifying {language_code} translation file...")

    cmd = f'python3 scripts/translations/json_beautifier.py --language {language_code}'

    if not run_command(cmd):
        print("✗ Failed to beautify translations")
        return False

    print("✓ Translation file beautified")
    return True


def cleanup_temp_files(language_code):
    """Remove temporary batch files."""
    print("\n🧹 Cleaning up temporary files...")

    lang_code_safe = language_code.replace('-', '_')
    # The batch pattern also matches the *_translated.json outputs
    patterns = [
        f'{lang_code_safe}_batch_*.json',
        f'{lang_code_safe}_merged.json'
    ]

    import glob
    removed = 0
    for pattern in patterns:
        for file in glob.glob(pattern):
            Path(file).unlink()
            removed += 1

    print(f"✓ Removed {removed} temporary files")


def verify_completion(language_code):
    """Check final completion percentage."""
    print("\n📊 Verifying completion...")

    cmd = f'python3 scripts/translations/translation_analyzer.py --language {language_code} --summary'
    run_command(cmd)


def main():
    parser = argparse.ArgumentParser(
        description='Automated translation pipeline for Stirling PDF',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
# Translate Spanish with API key in environment
export OPENAI_API_KEY=your_key_here
python3 scripts/translations/auto_translate.py es-ES
# Translate German with inline API key
python3 scripts/translations/auto_translate.py de-DE --api-key YOUR_KEY
# Translate Italian with custom batch size
python3 scripts/translations/auto_translate.py it-IT --batch-size 600
# Skip cleanup (keep temporary files for inspection)
python3 scripts/translations/auto_translate.py fr-FR --no-cleanup
"""
    )

    parser.add_argument('language', help='Language code (e.g., es-ES, de-DE, zh-CN)')
    parser.add_argument('--api-key', help='OpenAI API key (or set OPENAI_API_KEY env var)')
    parser.add_argument('--batch-size', type=int, default=500, help='Entries per batch (default: 500)')
    parser.add_argument('--no-cleanup', action='store_true', help='Keep temporary batch files')
    parser.add_argument('--skip-verification', action='store_true', help='Skip final completion check')
    parser.add_argument('--timeout', type=int, default=600, help='Timeout per batch in seconds (default: 600 = 10 minutes)')

    args = parser.parse_args()

    # Verify API key
    api_key = args.api_key or os.environ.get('OPENAI_API_KEY')
    if not api_key:
        print("Error: OpenAI API key required. Provide via --api-key or OPENAI_API_KEY environment variable")
        sys.exit(1)

    print("="*60)
    print("Automated Translation Pipeline")
    print(f"Language: {args.language}")
    print(f"Batch Size: {args.batch_size} entries")
    print("="*60)

    start_time = time.time()

    try:
        # Step 1: Extract and split
        batch_files = extract_untranslated(args.language, args.batch_size)
        if batch_files is None:
            sys.exit(1)

        if len(batch_files) == 0:
            print("\n✓ Nothing to translate!")
            sys.exit(0)

        # Step 2: Translate all batches
        translated_files = translate_batches(batch_files, args.language, api_key, args.timeout)
        if translated_files is None:
            sys.exit(1)

        # Step 3: Merge translations
        merged_file = merge_translations(translated_files, args.language)
        if merged_file is None:
            sys.exit(1)

        # Step 4: Apply translations
        if not apply_translations(merged_file, args.language):
            sys.exit(1)

        # Step 5: Beautify
        if not beautify_translations(args.language):
            sys.exit(1)

        # Step 6: Cleanup
        if not args.no_cleanup:
            cleanup_temp_files(args.language)

        # Step 7: Verify
        if not args.skip_verification:
            verify_completion(args.language)

        elapsed = time.time() - start_time
        print("\n" + "="*60)
        print("✅ Translation pipeline completed successfully!")
        print(f"Time elapsed: {elapsed:.1f} seconds")
        print("="*60)

    except KeyboardInterrupt:
        print("\n\n⚠ Translation interrupted by user")
        sys.exit(1)
    except Exception as e:
        print(f"\n\n✗ Error: {e}")
        import traceback
        traceback.print_exc()
        sys.exit(1)


if __name__ == "__main__":
    main()


@ -0,0 +1,321 @@
#!/usr/bin/env python3
"""
Batch Translation Script using OpenAI API
Automatically translates JSON batch files to target language while preserving:
- Placeholders: {n}, {total}, {filename}, {{variable}}
- HTML tags: <strong>, </strong>, etc.
- Technical terms: PDF, API, OAuth2, SAML2, JWT, etc.
"""
import json
import sys
import argparse
from pathlib import Path
import time

try:
    from openai import OpenAI
except ImportError:
    print("Error: openai package not installed. Install with: pip install openai")
    sys.exit(1)


class BatchTranslator:
    def __init__(self, api_key: str, model: str = "gpt-5"):
        """Initialize translator with OpenAI API key."""
        self.client = OpenAI(api_key=api_key)
        self.model = model

    def get_translation_prompt(self, language_name: str, language_code: str) -> str:
        """Generate the system prompt for translation."""
        # This is an f-string, so a literal brace is written doubled:
        # {{n}} renders as {n}, and {{{{variable}}}} renders as {{variable}}.
        return f"""You are a professional translator for Stirling PDF, an open-source PDF manipulation tool.
Translate the following JSON from English to {language_name} ({language_code}) for the Stirling PDF user interface.
CRITICAL RULES - MUST FOLLOW EXACTLY:
1. PRESERVE ALL PLACEHOLDERS EXACTLY AS-IS:
- Single braces: {{n}}, {{total}}, {{filename}}, {{count}}, {{date}}, {{planName}}, {{toolName}}, {{variable}}
- Double braces: {{{{variable}}}}
- Never translate, modify, or remove these - they are template variables
2. KEEP ALL HTML TAGS INTACT:
- <strong>, </strong>, <br>, <code>, </code>, etc.
- Do not translate tag names, only text between tags
3. DO NOT TRANSLATE TECHNICAL TERMS:
- File formats: PDF, JSON, CSV, XML, HTML, ZIP, DOCX, XLSX, PNG, JPG
- Protocols: API, OAuth2, SAML2, JWT, SMTP, HTTP, HTTPS, SSL, TLS
- Technologies: Git, GitHub, Google, PostHog, Scarf, LibreOffice, Ghostscript, Tesseract, OCR
- Technical keywords: URL, URI, DPI, RGB, CMYK, QR
- "Stirling PDF" - always keep as-is
4. MAINTAIN CONSISTENT TERMINOLOGY:
- Use the SAME translation for repeated terms throughout
- Do not introduce new terminology or synonyms
- Keep UI action words consistent (e.g., "upload", "download", "compress")
5. PRESERVE SPECIAL KEYWORDS IN CONTEXT:
- Mathematical expressions: "2n", "2n-1", "3n" (in page selection)
- Special keywords: "all", "odd", "even" (in page contexts)
- Code examples and technical patterns
6. JSON STRUCTURE:
- Translate ONLY the values (text after :), NEVER the keys
- Return ONLY valid JSON with exact same structure
- Maintain all quotes, commas, and braces
7. TONE & STYLE:
- Use appropriate formal/informal tone for {language_name} UI
- Keep translations concise and user-friendly
- Maintain the professional but accessible tone of the original
8. DO NOT ADD OR REMOVE TEXT:
- Do not add explanations, comments, or extra text
- Do not remove any part of the original meaning
- Keep the same level of detail
Return ONLY the translated JSON. No markdown, no explanations, just the JSON object."""

    def translate_batch(self, batch_data: dict, target_language: str, language_code: str) -> dict:
        """Translate a batch file using OpenAI API."""
        # Convert batch to compact JSON for API
        input_json = json.dumps(batch_data, ensure_ascii=False, separators=(',', ':'))

        print(f"Translating {len(batch_data)} entries to {target_language}...")
        print(f"Input size: {len(input_json)} characters")

        try:
            # GPT-5 only supports temperature=1, so we don't include it
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {
                        "role": "system",
                        "content": self.get_translation_prompt(target_language, language_code)
                    },
                    {
                        "role": "user",
                        "content": f"Translate this JSON:\n\n{input_json}"
                    }
                ],
            )

            translated_text = response.choices[0].message.content.strip()

            # Remove markdown code blocks if present
            if translated_text.startswith("```"):
                lines = translated_text.split('\n')
                translated_text = '\n'.join(lines[1:-1])

            # Parse the translated JSON
            translated_data = json.loads(translated_text)

            print("✓ Translation complete")
            return translated_data

        except json.JSONDecodeError as e:
            print(f"Error: AI returned invalid JSON: {e}")
            print(f"Response: {translated_text[:500]}...")
            raise
        except Exception as e:
            print(f"Error during translation: {e}")
            raise

    def validate_translation(self, original: dict, translated: dict) -> bool:
        """Validate that translation preserved all placeholders and structure."""
        issues = []

        # Check that all keys are present
        if set(original.keys()) != set(translated.keys()):
            missing = set(original.keys()) - set(translated.keys())
            extra = set(translated.keys()) - set(original.keys())
            if missing:
                issues.append(f"Missing keys: {missing}")
            if extra:
                issues.append(f"Extra keys: {extra}")

        # Check placeholders in each value
        import re
        # Try the double-brace form first; otherwise the single-brace
        # alternative matches "{{variable}" and drops the final brace.
        placeholder_pattern = r'\{\{[^}]+\}\}|\{[^}]+\}'

        for key in original.keys():
            if key not in translated:
                continue

            orig_value = str(original[key])
            trans_value = str(translated[key])

            # Find all placeholders in original and translation
            orig_placeholders = set(re.findall(placeholder_pattern, orig_value))
            trans_placeholders = set(re.findall(placeholder_pattern, trans_value))

            if orig_placeholders != trans_placeholders:
                issues.append(f"Placeholder mismatch in '{key}': {orig_placeholders} vs {trans_placeholders}")

        if issues:
            print("\n⚠ Validation warnings:")
            for issue in issues[:10]:  # Show first 10 issues
                print(f"  - {issue}")
            if len(issues) > 10:
                print(f"  ... and {len(issues) - 10} more issues")
            return False

        print("✓ Validation passed")
        return True


def get_language_info(language_code: str) -> tuple:
    """Get full language name from code."""
    languages = {
        'zh-CN': ('Simplified Chinese', 'zh-CN'),
        'es-ES': ('Spanish', 'es-ES'),
        'it-IT': ('Italian', 'it-IT'),
        'de-DE': ('German', 'de-DE'),
        'ar-AR': ('Arabic', 'ar-AR'),
        'pt-BR': ('Brazilian Portuguese', 'pt-BR'),
        'ru-RU': ('Russian', 'ru-RU'),
        'fr-FR': ('French', 'fr-FR'),
        'ja-JP': ('Japanese', 'ja-JP'),
        'ko-KR': ('Korean', 'ko-KR'),
        'nl-NL': ('Dutch', 'nl-NL'),
        'pl-PL': ('Polish', 'pl-PL'),
        'sv-SE': ('Swedish', 'sv-SE'),
        'da-DK': ('Danish', 'da-DK'),
        'no-NB': ('Norwegian', 'no-NB'),
        'fi-FI': ('Finnish', 'fi-FI'),
        'tr-TR': ('Turkish', 'tr-TR'),
        'vi-VN': ('Vietnamese', 'vi-VN'),
        'th-TH': ('Thai', 'th-TH'),
        'id-ID': ('Indonesian', 'id-ID'),
        'hi-IN': ('Hindi', 'hi-IN'),
        'cs-CZ': ('Czech', 'cs-CZ'),
        'hu-HU': ('Hungarian', 'hu-HU'),
        'ro-RO': ('Romanian', 'ro-RO'),
        'uk-UA': ('Ukrainian', 'uk-UA'),
        'el-GR': ('Greek', 'el-GR'),
        'bg-BG': ('Bulgarian', 'bg-BG'),
        'hr-HR': ('Croatian', 'hr-HR'),
        'sk-SK': ('Slovak', 'sk-SK'),
        'sl-SI': ('Slovenian', 'sl-SI'),
        'ca-CA': ('Catalan', 'ca-CA'),
    }

    # Unknown codes fall back to using the code itself as the language name
    return languages.get(language_code, (language_code, language_code))


def main():
    parser = argparse.ArgumentParser(
        description='Translate JSON batch files using OpenAI API',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
# Translate single batch file
python batch_translator.py zh_CN_batch_1_of_4.json --api-key YOUR_KEY --language zh-CN
# Translate all batches for a language (with pattern)
python batch_translator.py "zh_CN_batch_*_of_*.json" --api-key YOUR_KEY --language zh-CN
# Use environment variable for API key
export OPENAI_API_KEY=your_key_here
python batch_translator.py zh_CN_batch_1_of_4.json --language zh-CN
# Use different model
python batch_translator.py file.json --api-key KEY --language es-ES --model gpt-5-mini
"""
    )

    parser.add_argument('input_files', nargs='+', help='Input batch JSON file(s) or pattern')
    parser.add_argument('--api-key', help='OpenAI API key (or set OPENAI_API_KEY env var)')
    parser.add_argument('--language', '-l', required=True, help='Target language code (e.g., zh-CN, es-ES)')
    parser.add_argument('--model', default='gpt-5', help='OpenAI model to use (default: gpt-5, options: gpt-5-mini, gpt-5-nano)')
    parser.add_argument('--output-suffix', default='_translated', help='Suffix for output files (default: _translated)')
    parser.add_argument('--skip-validation', action='store_true', help='Skip validation checks')
    parser.add_argument('--delay', type=float, default=1.0, help='Delay between API calls in seconds (default: 1.0)')

    args = parser.parse_args()

    # Get API key from args or environment
    import os
    api_key = args.api_key or os.environ.get('OPENAI_API_KEY')
    if not api_key:
        print("Error: OpenAI API key required. Provide via --api-key or OPENAI_API_KEY environment variable")
        sys.exit(1)

    # Get language info
    language_name, language_code = get_language_info(args.language)

    # Expand file patterns
    import glob
    input_files = []
    for pattern in args.input_files:
        matched = glob.glob(pattern)
        if matched:
            input_files.extend(matched)
        else:
            input_files.append(pattern)  # Use as literal filename

    if not input_files:
        print("Error: No input files found")
        sys.exit(1)

    print("Batch Translator")
    print(f"Target Language: {language_name} ({language_code})")
    print(f"Model: {args.model}")
    print(f"Files to translate: {len(input_files)}")
    print("=" * 60)

    # Initialize translator
    translator = BatchTranslator(api_key, args.model)

    # Process each file
    successful = 0
    failed = 0

    for i, input_file in enumerate(input_files, 1):
        print(f"\n[{i}/{len(input_files)}] Processing: {input_file}")

        try:
            # Load input file
            with open(input_file, 'r', encoding='utf-8') as f:
                batch_data = json.load(f)

            # Translate
            translated_data = translator.translate_batch(batch_data, language_name, language_code)

            # Validate (problems are printed as warnings; the output is still saved)
            if not args.skip_validation:
                translator.validate_translation(batch_data, translated_data)

            # Save output to the current directory, e.g. batch.json -> batch_translated.json
            input_path = Path(input_file)
            output_file = input_path.stem + args.output_suffix + input_path.suffix

            with open(output_file, 'w', encoding='utf-8') as f:
                json.dump(translated_data, f, ensure_ascii=False, separators=(',', ':'))

            print(f"✓ Saved to: {output_file}")
            successful += 1

            # Delay between API calls to avoid rate limits
            if i < len(input_files):
                time.sleep(args.delay)

        except Exception as e:
            print(f"✗ Failed: {e}")
            failed += 1
            continue

    # Summary
    print("\n" + "=" * 60)
    print("Translation complete!")
    print(f"Successful: {successful}/{len(input_files)}")
    if failed > 0:
        print(f"Failed: {failed}/{len(input_files)}")

    sys.exit(0 if failed == 0 else 1)


if __name__ == "__main__":
    main()