fix(translations): improve translation merger CLI and sync missing UI strings across locales (#5309)

# Description of Changes

This pull request updates the Arabic translation file
(`frontend/public/locales/ar-AR/translation.toml`) with a large number
of new and improved strings, adding support for new features and
enhancing clarity and coverage across the application. Additionally, it
makes several improvements to the TOML language check script
(`.github/scripts/check_language_toml.py`) and updates the corresponding
GitHub Actions workflow to better track and validate translation
changes.

**Translation updates and enhancements:**

* Added translations for new features and UI elements, including
annotation tools, PDF/A-3b conversion, line art compression, background
removal, split modes, onboarding tours, and more.
[[1]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR343-R346)
[[2]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR442-R460)
[[3]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR514-R523)
[[4]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR739-R743)
[[5]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR1281-R1295)
[[6]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR1412-R1416)
[[7]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR2362-R2365)
[[8]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR2411-R2415)
[[9]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR2990)
[[10]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR3408-R3420)
[[11]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR3782-R3794)
[[12]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR3812-R3815)
[[13]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR3828-R3832)
[[14]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effL3974-R4157)
[[15]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR4208-R4221)
[[16]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5247)
[[17]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5414-R5423)
[[18]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5444-R5447)
* Improved and expanded coverage for settings, security, onboarding, and
help menus, including detailed descriptions and tooltips for new and
existing features.
[[1]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR442-R460)
[[2]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5247)
[[3]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5414-R5423)
[[4]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5444-R5447)

**TOML language check script improvements:**

* Increased the maximum allowed TOML file size from 500 KB to 570 KB to
accommodate larger translation files.
* Improved file validation logic to more accurately skip or process
files based on directory structure and file type, and added informative
print statements for skipped files.
* Enhanced reporting in the difference check: instead of raising
exceptions for unsafe or oversized files, the script now logs warnings
and continues processing, making CI reports more robust and easier to
read.
* Adjusted the placement of file check report lines for clarity in the
generated report.

**Workflow and CI improvements:**

* Updated the GitHub Actions workflow
(`.github/workflows/check_toml.yml`) to trigger on changes to the
translation script and workflow files, in addition to translation TOMLs,
ensuring all relevant changes are validated.

These changes collectively improve the translation quality and coverage
for Arabic users, enhance the reliability and clarity of the translation
validation process, and ensure smoother CI/CD workflows for localization
updates.

<img width="654" height="133" alt="image"
src="https://github.com/user-attachments/assets/9f3e505d-927f-4dc0-9098-cee70bbe85ca"
/>


---

## Checklist

### General

- [ ] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### Translations (if applicable)

- [ ] I ran
[`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.
Author: Ludy
Committed: 2026-01-14 01:31:05 +01:00 (via GitHub)
Parent: db049a3467
Commit: 472ee54098
46 changed files with 16654 additions and 2060 deletions


@@ -1,16 +1,16 @@
"""
A script to update language progress status in README.md based on
properties file comparison.
frontend locale TOML file comparisons.
This script compares the default (reference) properties file, usually
`messages_en_GB.properties`, with other translation files in the
`app/core/src/main/resources/` directory.
It determines how many lines are fully translated and automatically updates
This script compares the default (reference) TOML file,
`frontend/public/locales/en-GB/translation.toml`, with other translation
files in `frontend/public/locales/*/translation.toml`.
It determines how many keys are fully translated and automatically updates
progress badges in the `README.md`.
Additionally, it maintains a TOML configuration file
(`scripts/ignore_translation.toml`) that defines which keys are ignored
during comparison (e.g., static values like `language.direction`).
during comparison (e.g., values intentionally matching English).
Author: Ludy87
@@ -18,31 +18,31 @@ Usage:
Run this script directly from the project root.
# --- Compare all translation files and update README.md ---
$ python scripts/counter_translation.py
$ python scripts/counter_translation_v3.py
This will:
• Compare all files matching messages_*.properties
• Compare all files matching frontend/public/locales/*/translation.toml
• Update progress badges in README.md
• Update/format ignore_translation.toml automatically
# --- Check a single language file ---
$ python scripts/counter_translation.py --lang messages_fr_FR.properties
$ python scripts/counter_translation_v3.py --lang fr-FR
This will:
• Compare the French translation file against the English reference
• Print the translation percentage in the console
# --- Print ONLY the percentage (for CI pipelines or automation) ---
$ python scripts/counter_translation.py --lang messages_fr_FR.properties --show-percentage
$ python scripts/counter_translation_v3.py --lang fr-FR --show-percentage
Example output:
87
Arguments:
-l, --lang <file> Specific properties file to check
(relative or absolute path).
--show-percentage Print only the percentage (no formatting, ideal for CI/CD).
--show-missing-keys Show the list of missing keys when checking a single language file.
-l, --lang <locale or file> Specific locale to check (e.g. 'de-DE'),
a directory, or a full path to translation.toml.
--show-percentage Print only the percentage (no formatting, ideal for CI/CD).
--show-missing-keys Show the list of missing keys when checking a single language file.
"""
import argparse
@@ -50,10 +50,18 @@ import glob
import os
import re
import sys
from collections.abc import Mapping
from typing import Iterable
import tomlkit
import tomlkit.toml_file
# Ensure tomlkit is installed before importing
try:
import tomlkit
except ImportError:
raise ImportError(
"The 'tomlkit' library is not installed. Please install it using 'pip install tomlkit'."
)
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
def convert_to_multiline(data: tomlkit.TOMLDocument) -> tomlkit.TOMLDocument:
@@ -102,7 +110,10 @@ def write_readme(progress_list: list[tuple[str, int]]) -> None:
Returns:
None
"""
with open("README.md", encoding="utf-8") as file:
with open(
os.path.join(os.getcwd(), "devGuide", "HowToAddNewLanguage.md"),
encoding="utf-8",
) as file:
content = file.readlines()
for i, line in enumerate(content[2:], start=2):
@@ -115,56 +126,62 @@ def write_readme(progress_list: list[tuple[str, int]]) -> None:
f"![{value}%](https://geps.dev/progress/{value})",
)
with open("README.md", "w", encoding="utf-8", newline="\n") as file:
with open(
os.path.join(os.getcwd(), "devGuide", "HowToAddNewLanguage.md"),
"w",
encoding="utf-8",
newline="\n",
) as file:
file.writelines(content)
def load_reference_keys(default_file_path: str) -> set[str]:
"""Reads all keys from the reference properties file (excluding comments and empty lines).
This function skips the first 5 lines (assumed to be headers or metadata) and then
extracts keys from lines containing '=' separators, ignoring comments (#) and empty lines.
It also handles potential BOM (Byte Order Mark) characters.
def _flatten_toml(data: Mapping[str, object], prefix: str = "") -> dict[str, object]:
"""Flattens a TOML document into dotted keys for comparison.
Args:
default_file_path (str): The path to the default (reference) properties file.
data (Mapping[str, object]): TOML content loaded into a mapping.
prefix (str): Prefix for nested keys.
Returns:
set[str]: A set of unique keys found in the reference file.
dict[str, object]: Flattened key/value mapping.
"""
keys: set[str] = set()
with open(default_file_path, encoding="utf-8") as f:
# Skip the first 5 lines (headers)
for _ in range(5):
try:
next(f)
except StopIteration:
break
flattened: dict[str, object] = {}
for key, value in data.items():
combined_key = f"{prefix}{key}"
if isinstance(value, Mapping):
flattened.update(_flatten_toml(value, f"{combined_key}."))
else:
flattened[combined_key] = value
return flattened
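The flattening step above can be exercised in isolation. A minimal re-implementation follows, using plain dicts in place of a `tomlkit` document (a `dict` is itself a `Mapping`, so the same recursion applies):

```python
from collections.abc import Mapping

def flatten_toml(data, prefix=""):
    # Nested tables become dotted keys, mirroring the function above.
    flattened = {}
    for key, value in data.items():
        combined_key = f"{prefix}{key}"
        if isinstance(value, Mapping):
            flattened.update(flatten_toml(value, f"{combined_key}."))
        else:
            flattened[combined_key] = value
    return flattened

doc = {"settings": {"appearance": {"theme": "dark"}}, "language": {"direction": "rtl"}}
flat = flatten_toml(doc)
# flat == {"settings.appearance.theme": "dark", "language.direction": "rtl"}
```

Dotted keys make the reference/translation comparison a plain set operation on key names, regardless of how deeply the TOML tables nest.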
for line in f:
s = line.strip()
if not s or s.startswith("#") or "=" not in s:
continue
k, _ = s.split("=", 1)
keys.add(k.strip().replace("\ufeff", "")) # BOM protection
return keys
def load_translation_entries(file_path: str) -> dict[str, object]:
"""Reads and flattens translation entries from a TOML file.
Args:
file_path (str): Path to translation.toml.
Returns:
dict[str, object]: Flattened key/value entries.
"""
with open(file_path, encoding="utf-8") as f:
document = tomlkit.parse(f.read())
return _flatten_toml(document)
def _lang_from_path(file_path: str) -> str:
"""Extracts the language code from a properties file path.
"""Extracts the language code from a locale TOML file path.
Assumes the filename format is 'messages_<language>.properties', where <language>
is the code like 'fr_FR'.
Assumes the filename format is '<locale>/translation.toml', where <locale>
is the code like 'fr-FR'.
Args:
file_path (str): The full path to the properties file.
file_path (str): The full path to the TOML translation file.
Returns:
str: The extracted language code.
"""
return (
os.path.basename(file_path).split("messages_", 1)[1].split(".properties", 1)[0]
)
return os.path.basename(os.path.dirname(file_path))
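The new path-based locale extraction is small enough to verify directly; a standalone sketch mirroring the one-liner above:

```python
import os

def lang_from_path(file_path: str) -> str:
    # In the new layout the locale code is the parent directory
    # of translation.toml, e.g. frontend/public/locales/fr-FR/translation.toml.
    return os.path.basename(os.path.dirname(file_path))

lang_from_path("frontend/public/locales/fr-FR/translation.toml")  # → 'fr-FR'
```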
def compare_files(
@@ -174,16 +191,16 @@ def compare_files(
show_missing_keys: bool = False,
show_percentage: bool = False,
) -> list[tuple[str, int]]:
"""Compares the default properties file with other properties files in the directory.
"""Compares the default TOML file with other locale TOML files in the directory.
This function calculates translation progress for each language file by comparing
keys and values line-by-line, skipping headers. It accounts for ignored keys defined
in a TOML configuration file and updates that file with cleaned ignore lists.
English variants (en_GB, en_US) are hardcoded to 100% progress.
keys and values. It accounts for ignored keys defined in a TOML configuration file
and updates that file with cleaned ignore lists. English variants (en-GB, en-US)
are hardcoded to 100% progress.
Args:
default_file_path (str): The path to the default properties file (reference).
file_paths (Iterable[str]): Iterable of paths to properties files to compare.
default_file_path (str): The path to the default TOML file (reference).
file_paths (Iterable[str]): Iterable of paths to TOML files to compare.
ignore_translation_file (str): Path to the TOML file with ignore/missing configurations per language.
show_missing_keys (bool, optional): If True, prints the list of missing keys for each file. Defaults to False.
show_percentage (bool, optional): If True, suppresses detailed output and focuses on percentage calculation. Defaults to False.
@@ -192,14 +209,9 @@ def compare_files(
list[tuple[str, int]]: A sorted list of tuples containing language codes and progress percentages
(descending order by percentage). Duplicates are removed.
"""
# Count total translatable lines in reference (excluding empty and comments)
num_lines = sum(
1
for line in open(default_file_path, encoding="utf-8")
if line.strip() and not line.strip().startswith("#")
)
ref_keys: set[str] = load_reference_keys(default_file_path)
reference_entries = load_translation_entries(default_file_path)
ref_keys = set(reference_entries.keys())
num_lines = len(ref_keys)
result_list: list[tuple[str, int]] = []
sort_ignore_translation: tomlkit.TOMLDocument
@@ -215,10 +227,12 @@ def compare_files(
language = _lang_from_path(file_path)
# Hardcode English variants to 100%
if "en_GB" in language or "en_US" in language:
if language in {"en-GB", "en-US"}:
result_list.append((language, 100))
continue
language = language.replace("-", "_")
# Initialize language table in TOML if missing
if language not in sort_ignore_translation:
sort_ignore_translation[language] = tomlkit.table()
@@ -239,58 +253,30 @@ def compare_files(
if key in ref_keys or key == "language.direction"
]
translation_entries = load_translation_entries(file_path)
fails = 0
missing_str_keys: list[str] = []
with (
open(default_file_path, encoding="utf-8") as default_file,
open(file_path, encoding="utf-8") as file,
):
# Skip headers (first 5 lines) in both files
for _ in range(5):
next(default_file)
try:
next(file)
except StopIteration:
fails = num_lines
break
for line_num, (line_default, line_file) in enumerate(
zip(default_file, file), start=6
for default_key, default_value in reference_entries.items():
if default_key not in translation_entries:
fails += 1
missing_str_keys.append(default_key)
continue
file_value = translation_entries[default_key]
if (
default_value == file_value
and default_key not in sort_ignore_translation[language]["ignore"]
):
try:
# Ignoring empty lines and lines starting with #
if line_default.strip() == "" or line_default.startswith("#"):
# Missing translation (same as default and not ignored)
fails += 1
missing_str_keys.append(default_key)
if default_value != file_value:
if default_key in sort_ignore_translation[language]["ignore"]:
if default_key == "language.direction":
continue
default_key, default_value = line_default.split("=", 1)
file_key, file_value = line_file.split("=", 1)
default_key = default_key.strip()
default_value = default_value.strip()
file_key = file_key.strip()
file_value = file_value.strip()
if (
default_value == file_value
and default_key
not in sort_ignore_translation[language]["ignore"]
):
# Missing translation (same as default and not ignored)
fails += 1
missing_str_keys.append(default_key)
if default_value != file_value:
if default_key in sort_ignore_translation[language]["ignore"]:
# Remove from ignore if actually translated
sort_ignore_translation[language]["ignore"].remove(
default_key
)
except ValueError as e:
print(f"Error processing line {line_num} in {file_path}: {e}")
print(f"{line_default}|{line_file}")
sys.exit(1)
except IndexError:
# Handle mismatched line counts
fails += 1
continue
# Remove from ignore if actually translated
sort_ignore_translation[language]["ignore"].remove(default_key)
if show_missing_keys:
if len(missing_str_keys) > 0:
@@ -327,19 +313,19 @@ def main() -> None:
(with optional percentage output) or all files and updates the README.md.
Command-line options:
--lang, -l <file>: Specific properties file to check (e.g., 'messages_fr_FR.properties').
--lang, -l <file>: Specific locale to check, e.g. 'fr-FR'
--show-percentage: Print only the translation percentage for --lang and exit.
--show-missing-keys: Show the list of missing keys when checking a single language file.
"""
parser = argparse.ArgumentParser(
description="Compare i18n property files and optionally update README badges."
description="Compare frontend i18n TOML files and optionally update README badges."
)
parser.add_argument(
"--lang",
"-l",
help=(
"Specific properties file to check, e.g. 'messages_fr_FR.properties'. "
"If a relative filename is given, it is resolved against the resources directory."
"Specific locale to check, e.g. 'fr-FR'. "
"If a relative filename is given, it is resolved against the locales directory."
),
)
parser.add_argument(
@@ -359,8 +345,8 @@ def main() -> None:
# Project layout assumptions
cwd = os.getcwd()
resources_dir = os.path.join(cwd, "app", "core", "src", "main", "resources")
reference_file = os.path.join(resources_dir, "messages_en_GB.properties")
locales_dir = os.path.join(cwd, "frontend", "public", "locales")
reference_file = os.path.join(locales_dir, "en-GB", "translation.toml")
scripts_directory = os.path.join(cwd, "scripts")
translation_state_file = os.path.join(scripts_directory, "ignore_translation.toml")
@@ -370,7 +356,19 @@ def main() -> None:
if os.path.isabs(lang_input) or os.path.exists(lang_input):
lang_file = lang_input
else:
lang_file = os.path.join(resources_dir, lang_input)
candidate = os.path.join(locales_dir, lang_input)
candidate_with_file = os.path.join(
locales_dir, lang_input, "translation.toml"
)
if os.path.exists(candidate):
if os.path.isdir(candidate):
lang_file = candidate_with_file
else:
lang_file = candidate
elif os.path.exists(candidate_with_file):
lang_file = candidate_with_file
else:
lang_file = lang_input
if not os.path.exists(lang_file):
print(f"ERROR: Could not find language file: {lang_file}")
@@ -384,7 +382,7 @@ def main() -> None:
args.show_percentage,
)
# Find the exact tuple for the requested language
wanted_key = _lang_from_path(lang_file)
wanted_key = _lang_from_path(lang_file).replace("-", "_")
for lang, pct in results:
if lang == wanted_key:
if args.show_percentage:
@@ -400,13 +398,11 @@ def main() -> None:
sys.exit(3)
# Default behavior (no --lang): process all and update README
messages_file_paths = glob.glob(
os.path.join(resources_dir, "messages_*.properties")
)
messages_file_paths = glob.glob(os.path.join(locales_dir, "*", "translation.toml"))
progress = compare_files(
reference_file, messages_file_paths, translation_state_file
)
write_readme(progress)
# write_readme(progress)
if __name__ == "__main__":

File diff suppressed because it is too large


@@ -171,9 +171,15 @@ Merges missing translations from en-GB into target language files and manages tr
**Usage:**
```bash
# Operate on all locales (except en-GB) when language is omitted
python scripts/translations/translation_merger.py add-missing
# Add missing translations from en-GB to French
python scripts/translations/translation_merger.py fr-FR add-missing
# Create backups before modifying files
python scripts/translations/translation_merger.py fr-FR add-missing --backup
# Extract untranslated entries to a file
python scripts/translations/translation_merger.py fr-FR extract-untranslated --output fr_untranslated.json
@@ -183,15 +189,20 @@ python scripts/translations/translation_merger.py fr-FR create-template --output
# Apply translations from a file
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json
# Override default paths if needed
python scripts/translations/translation_merger.py fr-FR add-missing --locales-dir ./frontend/public/locales --ignore-file ./scripts/ignore_translation.toml
# Remove unused translations not present in en-GB
python scripts/translations/translation_merger.py fr-FR remove-unused
```
**Features:**
- Adds missing keys from en-GB (copies English text directly)
- Runs across all locales for add-missing/remove-unused when language is omitted
- Extracts untranslated entries for external translation
- Creates structured templates for AI translation
- Applies translated content back to language files
- Applies translated content back to language files (template format or plain JSON)
- Supports `--backup` on mutating commands
- Automatic backup creation
- Removes unused translations not present in en-GB


@@ -6,13 +6,14 @@ Useful for AI-assisted translation workflows.
TOML format only.
"""
import json
import sys
from pathlib import Path
from typing import Dict, List, Set, Any
import os
import argparse
import json
import shutil
import sys
from datetime import datetime
from pathlib import Path
from typing import Any
import tomllib
import tomli_w
@@ -21,8 +22,10 @@ import tomli_w
class TranslationMerger:
def __init__(
self,
locales_dir: str = "frontend/public/locales",
ignore_file: str = "scripts/ignore_translation.toml",
locales_dir: str = os.path.join(os.getcwd(), "frontend", "public", "locales"),
ignore_file: str = os.path.join(
os.getcwd(), "scripts", "ignore_translation.toml"
),
):
self.locales_dir = Path(locales_dir)
self.golden_truth_file = self.locales_dir / "en-GB" / "translation.toml"
@@ -30,7 +33,7 @@ class TranslationMerger:
self.ignore_file = Path(ignore_file)
self.ignore_patterns = self._load_ignore_patterns()
def _load_translation_file(self, file_path: Path) -> Dict:
def _load_translation_file(self, file_path: Path) -> dict[str, Any]:
"""Load TOML translation file."""
try:
with open(file_path, "rb") as f:
@@ -43,7 +46,7 @@ class TranslationMerger:
sys.exit(1)
def _save_translation_file(
self, data: Dict, file_path: Path, backup: bool = False
self, data: dict[str, Any], file_path: Path, backup: bool = False
) -> None:
"""Save TOML translation file with backup option."""
if backup and file_path.exists():
@@ -56,7 +59,7 @@ class TranslationMerger:
with open(file_path, "wb") as f:
tomli_w.dump(data, f)
def _load_ignore_patterns(self) -> Dict[str, Set[str]]:
def _load_ignore_patterns(self) -> dict[str, set[str]]:
"""Load ignore patterns from TOML file."""
if not self.ignore_file.exists():
return {}
@@ -73,7 +76,7 @@ class TranslationMerger:
print(f"Warning: Could not load ignore file {self.ignore_file}: {e}")
return {}
def _get_nested_value(self, data: Dict, key_path: str) -> Any:
def _get_nested_value(self, data: dict[str, Any], key_path: str) -> Any:
"""Get value from nested dict using dot notation."""
keys = key_path.split(".")
current = data
@@ -84,7 +87,9 @@ class TranslationMerger:
return None
return current
def _set_nested_value(self, data: Dict, key_path: str, value: Any) -> None:
def _set_nested_value(
self, data: dict[str, Any], key_path: str, value: Any
) -> None:
"""Set value in nested dict using dot notation."""
keys = key_path.split(".")
current = data
@@ -102,8 +107,8 @@ class TranslationMerger:
current[keys[-1]] = value
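The dot-notation helpers can be sketched as simplified standalone functions (names are illustrative stand-ins for the class's private `_get_nested_value`/`_set_nested_value` methods; unlike the real code, this sketch assumes intermediate values are always dicts):

```python
from typing import Any

def set_nested(data: dict[str, Any], key_path: str, value: Any) -> None:
    # Walk (and create) intermediate tables, then assign the leaf value.
    keys = key_path.split(".")
    current = data
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value

def get_nested(data: dict[str, Any], key_path: str) -> Any:
    # Return None when any segment of the dotted path is absent.
    current = data
    for key in key_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current
```

Together with flattening, these helpers let the merger copy a dotted key like `settings.appearance.theme` from en-GB into the correct nested table of a target locale file.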
def _flatten_dict(
self, d: Dict, parent_key: str = "", separator: str = "."
) -> Dict[str, Any]:
self, d: dict[str, Any], parent_key: str = "", separator: str = "."
) -> dict[str, Any]:
"""Flatten nested dictionary into dot-notation keys."""
items = []
for k, v in d.items():
@@ -114,10 +119,10 @@ class TranslationMerger:
items.append((new_key, v))
return dict(items)
def _delete_nested_key(self, data: Dict, key_path: str) -> bool:
def _delete_nested_key(self, data: dict[str, Any], key_path: str) -> bool:
"""Delete a nested key using dot notation and clean up empty branches."""
def _delete(current: Dict, keys: List[str]) -> bool:
def _delete(current: dict[str, Any], keys: list[str]) -> bool:
key = keys[0]
if key not in current:
@@ -137,7 +142,7 @@ class TranslationMerger:
return _delete(data, key_path.split("."))
def get_missing_keys(self, target_file: Path) -> List[str]:
def get_missing_keys(self, target_file: Path) -> list[str]:
"""Get list of missing keys in target file."""
lang_code = target_file.parent.name.replace("-", "_")
ignore_set = self.ignore_patterns.get(lang_code, set())
@@ -153,7 +158,7 @@ class TranslationMerger:
missing = set(golden_flat.keys()) - set(target_flat.keys())
return sorted(missing - ignore_set)
def get_unused_keys(self, target_file: Path) -> List[str]:
def get_unused_keys(self, target_file: Path) -> list[str]:
"""Get list of keys that are not present in the golden truth file."""
if not target_file.exists():
return []
@@ -165,13 +170,20 @@ class TranslationMerger:
return sorted(set(target_flat.keys()) - set(golden_flat.keys()))
def add_missing_translations(
self, target_file: Path, keys_to_add: List[str] = None
) -> Dict:
"""Add missing translations from en-GB to target file."""
if not target_file.exists():
self,
target_file: Path,
keys_to_add: list[str] | None = None,
save: bool = True,
backup: bool = False,
) -> dict[str, Any]:
"""Add missing translations from en-GB to target file and optionally save."""
if not target_file.parent.exists():
target_file.parent.mkdir(parents=True, exist_ok=True)
target_data = {}
else:
elif target_file.exists():
target_data = self._load_translation_file(target_file)
else:
target_data = {}
golden_flat = self._flatten_dict(self.golden_truth)
missing_keys = keys_to_add or self.get_missing_keys(target_file)
@@ -184,6 +196,9 @@ class TranslationMerger:
self._set_nested_value(target_data, key, value)
added_count += 1
if added_count > 0 and save:
self._save_translation_file(target_data, target_file, backup)
return {
"added_count": added_count,
"missing_keys": missing_keys,
@@ -191,8 +206,8 @@ class TranslationMerger:
}
def extract_untranslated_entries(
self, target_file: Path, output_file: Path = None
) -> Dict:
self, target_file: Path, output_file: Path | None = None
) -> dict[str, Any]:
"""Extract entries marked as untranslated or identical to en-GB for AI translation."""
if not target_file.exists():
print(f"Error: Target file does not exist: {target_file}")
@@ -233,9 +248,7 @@ class TranslationMerger:
def _is_expected_identical(self, key: str, value: str) -> bool:
"""Check if a key-value pair is expected to be identical across languages."""
identical_patterns = [
"language.direction",
]
identical_patterns = ["language.direction"]
if str(value).strip() in ["ltr", "rtl", "True", "False", "true", "false"]:
return True
@@ -247,8 +260,11 @@ class TranslationMerger:
return False
def apply_translations(
self, target_file: Path, translations: Dict[str, str], backup: bool = False
) -> Dict:
self,
target_file: Path,
translations: dict[str, str],
backup: bool = False,
) -> dict[str, Any]:
"""Apply provided translations to target file."""
if not target_file.exists():
print(f"Error: Target file does not exist: {target_file}")
@@ -261,7 +277,9 @@ class TranslationMerger:
for key, translation in translations.items():
try:
# Remove [UNTRANSLATED] marker if present
if translation.startswith("[UNTRANSLATED]"):
if isinstance(translation, str) and translation.startswith(
"[UNTRANSLATED]"
):
translation = translation.replace("[UNTRANSLATED]", "").strip()
self._set_nested_value(target_data, key, translation)
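The marker-stripping guard above (now type-checked so non-string values from plain JSON pass through untouched) can be isolated as a small helper (name is illustrative):

```python
def strip_untranslated_marker(translation):
    # Only strings can carry the "[UNTRANSLATED]" prefix; booleans and
    # other JSON values are returned unchanged.
    if isinstance(translation, str) and translation.startswith("[UNTRANSLATED]"):
        return translation.replace("[UNTRANSLATED]", "").strip()
    return translation

strip_untranslated_marker("[UNTRANSLATED] Bonjour")  # → 'Bonjour'
```

The `isinstance` check is what makes plain-JSON inputs safe: previously a boolean value would have raised `AttributeError` on `.startswith`.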
@@ -273,15 +291,19 @@ class TranslationMerger:
self._save_translation_file(target_data, target_file, backup)
return {
"success": True,
"success": applied_count > 0,
"applied_count": applied_count,
"errors": errors,
"data": target_data,
}
def remove_unused_translations(
self, target_file: Path, keys_to_remove: List[str] = None, backup: bool = False
) -> Dict:
self,
target_file: Path,
keys_to_remove: list[str] | None = None,
save: bool = True,
backup: bool = False,
) -> dict[str, Any]:
"""Remove translations that are not present in the golden truth file."""
if not target_file.exists():
print(f"Error: Target file does not exist: {target_file}")
@@ -296,11 +318,11 @@ class TranslationMerger:
if self._delete_nested_key(target_data, key):
removed_count += 1
if removed_count > 0:
if removed_count > 0 and save:
self._save_translation_file(target_data, target_file, backup)
return {
"success": True,
"success": removed_count > 0,
"removed_count": removed_count,
"data": target_data,
}
@@ -349,15 +371,19 @@ def main():
)
parser.add_argument(
"--locales-dir",
default="frontend/public/locales",
default=os.path.join(os.getcwd(), "frontend", "public", "locales"),
help="Path to locales directory",
)
parser.add_argument(
"--ignore-file",
default="scripts/ignore_translation.toml",
default=os.path.join(os.getcwd(), "scripts", "ignore_translation.toml"),
help="Path to ignore patterns TOML file",
)
parser.add_argument("language", help="Target language code (e.g., fr-FR)")
parser.add_argument(
"language",
nargs="?",
help="Target language code (e.g., fr-FR). If omitted, add-missing and remove-unused run for all locales except en-GB.",
)
subparsers = parser.add_subparsers(dest="command", help="Available commands")
@@ -410,18 +436,57 @@ def main():
merger = TranslationMerger(args.locales_dir, args.ignore_file)
# Find translation file
lang_dir = Path(args.locales_dir) / args.language
target_file = lang_dir / "translation.toml"
if args.command == "add-missing":
print(f"Adding missing translations to {args.language}...")
result = merger.add_missing_translations(target_file)
if args.language:
# Find translation file
lang_dir = Path(args.locales_dir) / args.language
target_file = lang_dir / "translation.toml"
print(f"Processing {args.language}...")
result = merger.add_missing_translations(target_file, backup=args.backup)
print(f"Added {result['added_count']} missing translations")
else:
total_added = 0
for lang_dir in sorted(Path(args.locales_dir).iterdir()):
if not lang_dir.is_dir() or lang_dir.name == "en-GB":
continue
target_file = lang_dir / "translation.toml"
print(f"Processing {lang_dir.name}...")
result = merger.add_missing_translations(
target_file, backup=args.backup
)
added = result["added_count"]
total_added += added
print(f"Added {added} missing translations")
print(f"\nTotal added across all languages: {total_added}")
merger._save_translation_file(result["data"], target_file, backup=args.backup)
print(f"Added {result['added_count']} missing translations")
elif args.command == "remove-unused":
if args.language:
lang_dir = Path(args.locales_dir) / args.language
target_file = lang_dir / "translation.toml"
print(f"Processing {args.language}...")
result = merger.remove_unused_translations(target_file, backup=args.backup)
print(f"Removed {result['removed_count']} unused translations")
else:
total_removed = 0
for lang_dir in sorted(Path(args.locales_dir).iterdir()):
if not lang_dir.is_dir() or lang_dir.name == "en-GB":
continue
target_file = lang_dir / "translation.toml"
print(f"Processing {lang_dir.name}...")
result = merger.remove_unused_translations(
target_file, backup=args.backup
)
removed = result["removed_count"]
total_removed += removed
print(f"Removed {removed} unused translations")
print(f"\nTotal removed across all languages: {total_removed}")
elif args.command == "extract-untranslated":
if not args.language:
print("Error: language is required for extract-untranslated")
sys.exit(1)
lang_dir = Path(args.locales_dir) / args.language
target_file = lang_dir / "translation.toml"
output_file = (
Path(args.output)
if args.output
@@ -431,10 +496,20 @@ def main():
print(f"Extracted {len(untranslated)} untranslated entries to {output_file}")
elif args.command == "create-template":
output_file = Path(args.output)
merger.create_translation_template(target_file, output_file)
if not args.language:
print("Error: language is required for create-template")
sys.exit(1)
lang_dir = Path(args.locales_dir) / args.language
target_file = lang_dir / "translation.toml"
merger.create_translation_template(target_file, Path(args.output))
elif args.command == "apply-translations":
if not args.language:
print("Error: language is required for apply-translations")
sys.exit(1)
lang_dir = Path(args.locales_dir) / args.language
target_file = lang_dir / "translation.toml"
with open(args.translations_file, "r", encoding="utf-8") as f:
translations_data = json.load(f)
@@ -455,20 +530,11 @@ def main():
if result["success"]:
print(f"Applied {result['applied_count']} translations")
if result["errors"]:
print(f"Errors: {len(result['errors'])}")
print(f"Errors encountered: {len(result['errors'])}")
for error in result["errors"][:5]:
print(f" - {error}")
else:
print(f"Failed: {result.get('error', 'Unknown error')}")
elif args.command == "remove-unused":
print(f"Removing unused translations from {args.language}...")
result = merger.remove_unused_translations(target_file, backup=args.backup)
if result["success"]:
print(f"Removed {result['removed_count']} unused translations")
else:
print(f"Failed: {result.get('error', 'Unknown error')}")
print("No translations applied.")
if __name__ == "__main__":