mirror of
https://github.com/Frooodle/Stirling-PDF.git
synced 2026-04-22 23:08:53 +02:00
fix(translations): improve translation merger CLI and sync missing UI strings across locales (#5309)
# Description of Changes

This pull request updates the Arabic translation file (`frontend/public/locales/ar-AR/translation.toml`) with a large number of new and improved strings, adding support for new features and improving clarity and coverage across the application. It also makes several improvements to the TOML language-check script (`.github/scripts/check_language_toml.py`) and updates the corresponding GitHub Actions workflow to better track and validate translation changes.

**Translation updates and enhancements:**

* Added translations for new features and UI elements, including annotation tools, PDF/A-3b conversion, line-art compression, background removal, split modes, onboarding tours, and more. [[1]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR343-R346) [[2]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR442-R460) [[3]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR514-R523) [[4]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR739-R743) [[5]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR1281-R1295) [[6]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR1412-R1416) [[7]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR2362-R2365) [[8]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR2411-R2415) [[9]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR2990) [[10]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR3408-R3420) [[11]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR3782-R3794) [[12]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR3812-R3815) [[13]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR3828-R3832) [[14]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effL3974-R4157) [[15]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR4208-R4221) [[16]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5247) [[17]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5414-R5423) [[18]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5444-R5447)
* Improved and expanded coverage for settings, security, onboarding, and help menus, including detailed descriptions and tooltips for new and existing features. [[1]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR442-R460) [[2]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5247) [[3]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5414-R5423) [[4]](diffhunk://#diff-460d5f61a7649a5b149373af2e52a8a87d9a1964cf54240a78ad4747e7233effR5444-R5447)

**TOML language check script improvements:**

* Raised the maximum allowed TOML file size from 500 KB to 570 KB to accommodate larger translation files.
* Improved file-validation logic to more accurately skip or process files based on directory structure and file type, and added informative print statements for skipped files.
* Improved reporting in the difference check: instead of raising exceptions for unsafe or oversized files, the script now logs warnings and continues processing, making CI reports more robust and clearer.
* Adjusted the placement of file-check report lines for clarity in the generated report.

**Workflow and CI improvements:**

* Updated the GitHub Actions workflow (`.github/workflows/check_toml.yml`) to trigger on changes to the translation script and workflow files, in addition to translation TOMLs, so that all relevant changes are validated.

Together, these changes improve translation quality and coverage for Arabic users, make the translation validation process more reliable, and keep CI/CD workflows for localization updates running smoothly.

<img width="654" height="133" alt="image" src="https://github.com/user-attachments/assets/9f3e505d-927f-4dc0-9098-cee70bbe85ca" />

---

## Checklist

### General

- [ ] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### Translations (if applicable)

- [ ] I ran [`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.
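The warn-instead-of-raise size guard described in the script improvements above can be sketched roughly as follows. This is a hypothetical simplification, not the actual `check_language_toml.py` code: the 570 KB limit comes from the PR description, but the function name and `report` parameter are invented for illustration.

```python
import os

# Limit raised from 500 KB to 570 KB to accommodate larger translation files (per the PR).
MAX_FILE_SIZE = 570 * 1024


def check_file_size(path: str, report: list[str]) -> bool:
    """Append a warning to the CI report instead of raising, so that one
    oversized file no longer aborts the whole translation check."""
    size = os.path.getsize(path)
    if size > MAX_FILE_SIZE:
        report.append(
            f"WARNING: {path} is {size} bytes (limit {MAX_FILE_SIZE}); skipping."
        )
        return False
    return True
```

The design point is the return-plus-log pattern: the caller can continue validating the remaining locale files and still surface the problem in the generated report.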
@@ -1,16 +1,16 @@
"""
A script to update language progress status in README.md based on
properties file comparison.
frontend locale TOML file comparisons.

This script compares the default (reference) properties file, usually
`messages_en_GB.properties`, with other translation files in the
`app/core/src/main/resources/` directory.
It determines how many lines are fully translated and automatically updates
This script compares the default (reference) TOML file,
`frontend/public/locales/en-GB/translation.toml`, with other translation
files in `frontend/public/locales/*/translation.toml`.
It determines how many keys are fully translated and automatically updates
progress badges in the `README.md`.

Additionally, it maintains a TOML configuration file
(`scripts/ignore_translation.toml`) that defines which keys are ignored
during comparison (e.g., static values like `language.direction`).
during comparison (e.g., values intentionally matching English).

Author: Ludy87

@@ -18,31 +18,31 @@ Usage:
    Run this script directly from the project root.

    # --- Compare all translation files and update README.md ---
    $ python scripts/counter_translation.py
    $ python scripts/counter_translation_v3.py

    This will:
        • Compare all files matching messages_*.properties
        • Compare all files matching frontend/public/locales/*/translation.toml
        • Update progress badges in README.md
        • Update/format ignore_translation.toml automatically

    # --- Check a single language file ---
    $ python scripts/counter_translation.py --lang messages_fr_FR.properties
    $ python scripts/counter_translation_v3.py --lang fr-FR

    This will:
        • Compare the French translation file against the English reference
        • Print the translation percentage in the console

    # --- Print ONLY the percentage (for CI pipelines or automation) ---
    $ python scripts/counter_translation.py --lang messages_fr_FR.properties --show-percentage
    $ python scripts/counter_translation_v3.py --lang fr-FR --show-percentage

    Example output:
        87

Arguments:
    -l, --lang <file>    Specific properties file to check
                         (relative or absolute path).
    --show-percentage    Print only the percentage (no formatting, ideal for CI/CD).
    --show-missing-keys  Show the list of missing keys when checking a single language file.
    -l, --lang <locale or file>  Specific locale to check (e.g. 'de-DE'),
                         a directory, or a full path to translation.toml.
    --show-percentage    Print only the percentage (no formatting, ideal for CI/CD).
    --show-missing-keys  Show the list of missing keys when checking a single language file.
"""

import argparse
@@ -50,10 +50,18 @@ import glob
import os
import re
import sys
from collections.abc import Mapping
from typing import Iterable

import tomlkit
import tomlkit.toml_file
# Ensure tomlkit is installed before importing
try:
    import tomlkit
except ImportError:
    raise ImportError(
        "The 'tomlkit' library is not installed. Please install it using 'pip install tomlkit'."
    )

sys.stdout.reconfigure(encoding="utf-8", errors="replace")


def convert_to_multiline(data: tomlkit.TOMLDocument) -> tomlkit.TOMLDocument:
@@ -102,7 +110,10 @@ def write_readme(progress_list: list[tuple[str, int]]) -> None:
    Returns:
        None
    """
    with open("README.md", encoding="utf-8") as file:
    with open(
        os.path.join(os.getcwd(), "devGuide", "HowToAddNewLanguage.md"),
        encoding="utf-8",
    ) as file:
        content = file.readlines()

    for i, line in enumerate(content[2:], start=2):
@@ -115,56 +126,62 @@ def write_readme(progress_list: list[tuple[str, int]]) -> None:
            f"",
        )

    with open("README.md", "w", encoding="utf-8", newline="\n") as file:
    with open(
        os.path.join(os.getcwd(), "devGuide", "HowToAddNewLanguage.md"),
        "w",
        encoding="utf-8",
        newline="\n",
    ) as file:
        file.writelines(content)


def load_reference_keys(default_file_path: str) -> set[str]:
    """Reads all keys from the reference properties file (excluding comments and empty lines).

    This function skips the first 5 lines (assumed to be headers or metadata) and then
    extracts keys from lines containing '=' separators, ignoring comments (#) and empty lines.
    It also handles potential BOM (Byte Order Mark) characters.
def _flatten_toml(data: Mapping[str, object], prefix: str = "") -> dict[str, object]:
    """Flattens a TOML document into dotted keys for comparison.

    Args:
        default_file_path (str): The path to the default (reference) properties file.
        data (Mapping[str, object]): TOML content loaded into a mapping.
        prefix (str): Prefix for nested keys.

    Returns:
        set[str]: A set of unique keys found in the reference file.
        dict[str, object]: Flattened key/value mapping.
    """
    keys: set[str] = set()
    with open(default_file_path, encoding="utf-8") as f:
        # Skip the first 5 lines (headers)
        for _ in range(5):
            try:
                next(f)
            except StopIteration:
                break
    flattened: dict[str, object] = {}
    for key, value in data.items():
        combined_key = f"{prefix}{key}"
        if isinstance(value, Mapping):
            flattened.update(_flatten_toml(value, f"{combined_key}."))
        else:
            flattened[combined_key] = value
    return flattened

        for line in f:
            s = line.strip()
            if not s or s.startswith("#") or "=" not in s:
                continue
            k, _ = s.split("=", 1)
            keys.add(k.strip().replace("\ufeff", ""))  # BOM protection
    return keys

def load_translation_entries(file_path: str) -> dict[str, object]:
    """Reads and flattens translation entries from a TOML file.

    Args:
        file_path (str): Path to translation.toml.

    Returns:
        dict[str, object]: Flattened key/value entries.
    """
    with open(file_path, encoding="utf-8") as f:
        document = tomlkit.parse(f.read())
    return _flatten_toml(document)


def _lang_from_path(file_path: str) -> str:
    """Extracts the language code from a properties file path.
    """Extracts the language code from a locale TOML file path.

    Assumes the filename format is 'messages_<language>.properties', where <language>
    is the code like 'fr_FR'.
    Assumes the filename format is '<locale>/translation.toml', where <locale>
    is the code like 'fr-FR'.

    Args:
        file_path (str): The full path to the properties file.
        file_path (str): The full path to the TOML translation file.

    Returns:
        str: The extracted language code.
    """
    return (
        os.path.basename(file_path).split("messages_", 1)[1].split(".properties", 1)[0]
    )
    return os.path.basename(os.path.dirname(file_path))


def compare_files(
@@ -174,16 +191,16 @@ def compare_files(
    show_missing_keys: bool = False,
    show_percentage: bool = False,
) -> list[tuple[str, int]]:
    """Compares the default properties file with other properties files in the directory.
    """Compares the default TOML file with other locale TOML files in the directory.

    This function calculates translation progress for each language file by comparing
    keys and values line-by-line, skipping headers. It accounts for ignored keys defined
    in a TOML configuration file and updates that file with cleaned ignore lists.
    English variants (en_GB, en_US) are hardcoded to 100% progress.
    keys and values. It accounts for ignored keys defined in a TOML configuration file
    and updates that file with cleaned ignore lists. English variants (en-GB, en-US)
    are hardcoded to 100% progress.

    Args:
        default_file_path (str): The path to the default properties file (reference).
        file_paths (Iterable[str]): Iterable of paths to properties files to compare.
        default_file_path (str): The path to the default TOML file (reference).
        file_paths (Iterable[str]): Iterable of paths to TOML files to compare.
        ignore_translation_file (str): Path to the TOML file with ignore/missing configurations per language.
        show_missing_keys (bool, optional): If True, prints the list of missing keys for each file. Defaults to False.
        show_percentage (bool, optional): If True, suppresses detailed output and focuses on percentage calculation. Defaults to False.
@@ -192,14 +209,9 @@ def compare_files(
        list[tuple[str, int]]: A sorted list of tuples containing language codes and progress percentages
        (descending order by percentage). Duplicates are removed.
    """
    # Count total translatable lines in reference (excluding empty and comments)
    num_lines = sum(
        1
        for line in open(default_file_path, encoding="utf-8")
        if line.strip() and not line.strip().startswith("#")
    )

    ref_keys: set[str] = load_reference_keys(default_file_path)
    reference_entries = load_translation_entries(default_file_path)
    ref_keys = set(reference_entries.keys())
    num_lines = len(ref_keys)

    result_list: list[tuple[str, int]] = []
    sort_ignore_translation: tomlkit.TOMLDocument
@@ -215,10 +227,12 @@ def compare_files(
        language = _lang_from_path(file_path)

        # Hardcode English variants to 100%
        if "en_GB" in language or "en_US" in language:
        if language in {"en-GB", "en-US"}:
            result_list.append((language, 100))
            continue

        language = language.replace("-", "_")

        # Initialize language table in TOML if missing
        if language not in sort_ignore_translation:
            sort_ignore_translation[language] = tomlkit.table()
@@ -239,58 +253,30 @@ def compare_files(
            if key in ref_keys or key == "language.direction"
        ]

        translation_entries = load_translation_entries(file_path)
        fails = 0
        missing_str_keys: list[str] = []
        with (
            open(default_file_path, encoding="utf-8") as default_file,
            open(file_path, encoding="utf-8") as file,
        ):
            # Skip headers (first 5 lines) in both files
            for _ in range(5):
                next(default_file)
                try:
                    next(file)
                except StopIteration:
                    fails = num_lines
                    break

            for line_num, (line_default, line_file) in enumerate(
                zip(default_file, file), start=6
        for default_key, default_value in reference_entries.items():
            if default_key not in translation_entries:
                fails += 1
                missing_str_keys.append(default_key)
                continue

            file_value = translation_entries[default_key]
            if (
                default_value == file_value
                and default_key not in sort_ignore_translation[language]["ignore"]
            ):
                try:
                    # Ignoring empty lines and lines starting with #
                    if line_default.strip() == "" or line_default.startswith("#"):
                # Missing translation (same as default and not ignored)
                fails += 1
                missing_str_keys.append(default_key)
            if default_value != file_value:
                if default_key in sort_ignore_translation[language]["ignore"]:
                    if default_key == "language.direction":
                        continue

                    default_key, default_value = line_default.split("=", 1)
                    file_key, file_value = line_file.split("=", 1)
                    default_key = default_key.strip()
                    default_value = default_value.strip()
                    file_key = file_key.strip()
                    file_value = file_value.strip()

                    if (
                        default_value == file_value
                        and default_key
                        not in sort_ignore_translation[language]["ignore"]
                    ):
                        # Missing translation (same as default and not ignored)
                        fails += 1
                        missing_str_keys.append(default_key)
                    if default_value != file_value:
                        if default_key in sort_ignore_translation[language]["ignore"]:
                            # Remove from ignore if actually translated
                            sort_ignore_translation[language]["ignore"].remove(
                                default_key
                            )
                except ValueError as e:
                    print(f"Error processing line {line_num} in {file_path}: {e}")
                    print(f"{line_default}|{line_file}")
                    sys.exit(1)
                except IndexError:
                    # Handle mismatched line counts
                    fails += 1
                    continue
                    # Remove from ignore if actually translated
                    sort_ignore_translation[language]["ignore"].remove(default_key)

        if show_missing_keys:
            if len(missing_str_keys) > 0:
@@ -327,19 +313,19 @@ def main() -> None:
    (with optional percentage output) or all files and updates the README.md.

    Command-line options:
        --lang, -l <file>: Specific properties file to check (e.g., 'messages_fr_FR.properties').
        --lang, -l <file>: Specific locale to check, e.g. 'fr-FR'
        --show-percentage: Print only the translation percentage for --lang and exit.
        --show-missing-keys: Show the list of missing keys when checking a single language file.
    """
    parser = argparse.ArgumentParser(
        description="Compare i18n property files and optionally update README badges."
        description="Compare frontend i18n TOML files and optionally update README badges."
    )
    parser.add_argument(
        "--lang",
        "-l",
        help=(
            "Specific properties file to check, e.g. 'messages_fr_FR.properties'. "
            "If a relative filename is given, it is resolved against the resources directory."
            "Specific locale to check, e.g. 'fr-FR'. "
            "If a relative filename is given, it is resolved against the locales directory."
        ),
    )
    parser.add_argument(
@@ -359,8 +345,8 @@ def main() -> None:

    # Project layout assumptions
    cwd = os.getcwd()
    resources_dir = os.path.join(cwd, "app", "core", "src", "main", "resources")
    reference_file = os.path.join(resources_dir, "messages_en_GB.properties")
    locales_dir = os.path.join(cwd, "frontend", "public", "locales")
    reference_file = os.path.join(locales_dir, "en-GB", "translation.toml")
    scripts_directory = os.path.join(cwd, "scripts")
    translation_state_file = os.path.join(scripts_directory, "ignore_translation.toml")

@@ -370,7 +356,19 @@ def main() -> None:
    if os.path.isabs(lang_input) or os.path.exists(lang_input):
        lang_file = lang_input
    else:
        lang_file = os.path.join(resources_dir, lang_input)
        candidate = os.path.join(locales_dir, lang_input)
        candidate_with_file = os.path.join(
            locales_dir, lang_input, "translation.toml"
        )
        if os.path.exists(candidate):
            if os.path.isdir(candidate):
                lang_file = candidate_with_file
            else:
                lang_file = candidate
        elif os.path.exists(candidate_with_file):
            lang_file = candidate_with_file
        else:
            lang_file = lang_input

    if not os.path.exists(lang_file):
        print(f"ERROR: Could not find language file: {lang_file}")
@@ -384,7 +382,7 @@ def main() -> None:
        args.show_percentage,
    )
    # Find the exact tuple for the requested language
    wanted_key = _lang_from_path(lang_file)
    wanted_key = _lang_from_path(lang_file).replace("-", "_")
    for lang, pct in results:
        if lang == wanted_key:
            if args.show_percentage:
@@ -400,13 +398,11 @@ def main() -> None:
    sys.exit(3)

    # Default behavior (no --lang): process all and update README
    messages_file_paths = glob.glob(
        os.path.join(resources_dir, "messages_*.properties")
    )
    messages_file_paths = glob.glob(os.path.join(locales_dir, "*", "translation.toml"))
    progress = compare_files(
        reference_file, messages_file_paths, translation_state_file
    )
    write_readme(progress)
    # write_readme(progress)


if __name__ == "__main__":
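The central change in the counter script above is comparing flattened dotted keys instead of `.properties` lines. A minimal standalone sketch of that flatten-and-compare idea follows; it uses plain dicts rather than tomlkit documents, and the names (`flatten`, `progress`) are illustrative, not the script's own:

```python
from collections.abc import Mapping


def flatten(data: Mapping, prefix: str = "") -> dict:
    """Flatten nested mappings into dotted keys, mirroring how a parsed
    translation.toml is compared key by key."""
    flat: dict = {}
    for key, value in data.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, Mapping):
            flat.update(flatten(value, f"{dotted}."))
        else:
            flat[dotted] = value
    return flat


def progress(reference: Mapping, translation: Mapping, ignore: frozenset = frozenset()) -> int:
    """Percentage of reference keys that are present and differ from English.
    A value identical to English counts as untranslated unless ignored."""
    ref = flatten(reference)
    tr = flatten(translation)
    fails = 0
    for key, value in ref.items():
        if key not in tr or (tr[key] == value and key not in ignore):
            fails += 1
    return int((1 - fails / len(ref)) * 100)
```

This is also why keys like `language.direction` live in an ignore list: they legitimately match English in many locales and would otherwise drag the percentage down.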
File diff suppressed because it is too large
@@ -171,9 +171,15 @@ Merges missing translations from en-GB into target language files and manages tr

**Usage:**
```bash
# Operate on all locales (except en-GB) when language is omitted
python scripts/translations/translation_merger.py add-missing

# Add missing translations from en-GB to French
python scripts/translations/translation_merger.py fr-FR add-missing

# Create backups before modifying files
python scripts/translations/translation_merger.py fr-FR add-missing --backup

# Extract untranslated entries to a file
python scripts/translations/translation_merger.py fr-FR extract-untranslated --output fr_untranslated.json

@@ -183,15 +189,20 @@ python scripts/translations/translation_merger.py fr-FR create-template --output
# Apply translations from a file
python scripts/translations/translation_merger.py fr-FR apply-translations --translations-file fr_translated.json

# Override default paths if needed
python scripts/translations/translation_merger.py fr-FR add-missing --locales-dir ./frontend/public/locales --ignore-file ./scripts/ignore_translation.toml

# Remove unused translations not present in en-GB
python scripts/translations/translation_merger.py fr-FR remove-unused
```

**Features:**
- Adds missing keys from en-GB (copies English text directly)
- Runs across all locales for add-missing/remove-unused when language is omitted
- Extracts untranslated entries for external translation
- Creates structured templates for AI translation
- Applies translated content back to language files
- Applies translated content back to language files (template format or plain JSON)
- Supports `--backup` on mutating commands
- Automatic backup creation
- Removes unused translations not present in en-GB
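The merger applies flat dotted keys (e.g. from a translated JSON file) back into nested TOML tables. A rough standalone sketch of that dot-notation set/get, with behavior inferred from the docs above rather than copied from the merger itself:

```python
from typing import Any


def set_nested(data: dict, key_path: str, value: Any) -> None:
    """Walk a dotted path, creating intermediate tables as needed,
    then set the leaf value."""
    keys = key_path.split(".")
    current = data
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value


def get_nested(data: dict, key_path: str) -> Any:
    """Walk the dotted path; return None if any segment is missing."""
    current: Any = data
    for key in key_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current
```

With helpers like these, `apply-translations` reduces to iterating over `{"dotted.key": "translated text"}` pairs and setting each one into the loaded TOML tree before saving.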
@@ -6,13 +6,14 @@ Useful for AI-assisted translation workflows.
|
||||
TOML format only.
|
||||
"""
|
||||
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Set, Any
|
||||
import os
|
||||
import argparse
|
||||
import json
|
||||
import shutil
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import tomllib
|
||||
import tomli_w
|
||||
@@ -21,8 +22,10 @@ import tomli_w
|
||||
class TranslationMerger:
|
||||
def __init__(
|
||||
self,
|
||||
locales_dir: str = "frontend/public/locales",
|
||||
ignore_file: str = "scripts/ignore_translation.toml",
|
||||
locales_dir: str = os.path.join(os.getcwd(), "frontend", "public", "locales"),
|
||||
ignore_file: str = os.path.join(
|
||||
os.getcwd(), "scripts", "ignore_translation.toml"
|
||||
),
|
||||
):
|
||||
self.locales_dir = Path(locales_dir)
|
||||
self.golden_truth_file = self.locales_dir / "en-GB" / "translation.toml"
|
||||
@@ -30,7 +33,7 @@ class TranslationMerger:
|
||||
self.ignore_file = Path(ignore_file)
|
||||
self.ignore_patterns = self._load_ignore_patterns()
|
||||
|
||||
def _load_translation_file(self, file_path: Path) -> Dict:
|
||||
def _load_translation_file(self, file_path: Path) -> dict[str, Any]:
|
||||
"""Load TOML translation file."""
|
||||
try:
|
||||
with open(file_path, "rb") as f:
|
||||
@@ -43,7 +46,7 @@ class TranslationMerger:
|
||||
sys.exit(1)
|
||||
|
||||
def _save_translation_file(
|
||||
self, data: Dict, file_path: Path, backup: bool = False
|
||||
self, data: dict[str, Any], file_path: Path, backup: bool = False
|
||||
) -> None:
|
||||
"""Save TOML translation file with backup option."""
|
||||
if backup and file_path.exists():
|
||||
@@ -56,7 +59,7 @@ class TranslationMerger:
|
||||
with open(file_path, "wb") as f:
|
||||
tomli_w.dump(data, f)
|
||||
|
||||
def _load_ignore_patterns(self) -> Dict[str, Set[str]]:
|
||||
def _load_ignore_patterns(self) -> dict[str, set[str]]:
|
||||
"""Load ignore patterns from TOML file."""
|
||||
if not self.ignore_file.exists():
|
||||
return {}
|
||||
@@ -73,7 +76,7 @@ class TranslationMerger:
|
||||
print(f"Warning: Could not load ignore file {self.ignore_file}: {e}")
|
||||
return {}
|
||||
|
||||
def _get_nested_value(self, data: Dict, key_path: str) -> Any:
|
||||
def _get_nested_value(self, data: dict[str, Any], key_path: str) -> Any:
|
||||
"""Get value from nested dict using dot notation."""
|
||||
keys = key_path.split(".")
|
||||
current = data
|
||||
@@ -84,7 +87,9 @@ class TranslationMerger:
|
||||
return None
|
||||
return current
|
||||
|
||||
def _set_nested_value(self, data: Dict, key_path: str, value: Any) -> None:
|
||||
def _set_nested_value(
|
||||
self, data: dict[str, Any], key_path: str, value: Any
|
||||
) -> None:
|
||||
"""Set value in nested dict using dot notation."""
|
||||
keys = key_path.split(".")
|
||||
current = data
|
||||
@@ -102,8 +107,8 @@ class TranslationMerger:
|
||||
current[keys[-1]] = value
|
||||
|
||||
def _flatten_dict(
|
||||
self, d: Dict, parent_key: str = "", separator: str = "."
|
||||
) -> Dict[str, Any]:
|
||||
self, d: dict[str, Any], parent_key: str = "", separator: str = "."
|
||||
) -> dict[str, Any]:
|
||||
"""Flatten nested dictionary into dot-notation keys."""
|
||||
items = []
|
||||
for k, v in d.items():
|
||||
@@ -114,10 +119,10 @@ class TranslationMerger:
|
||||
items.append((new_key, v))
|
||||
return dict(items)
|
||||
|
||||
def _delete_nested_key(self, data: Dict, key_path: str) -> bool:
|
||||
def _delete_nested_key(self, data: dict[str, Any], key_path: str) -> bool:
|
||||
"""Delete a nested key using dot notation and clean up empty branches."""
|
||||
|
||||
def _delete(current: Dict, keys: List[str]) -> bool:
|
||||
def _delete(current: dict[str, Any], keys: list[str]) -> bool:
|
||||
key = keys[0]
|
||||
|
||||
if key not in current:
|
||||
@@ -137,7 +142,7 @@ class TranslationMerger:
|
||||
|
||||
return _delete(data, key_path.split("."))
|
||||
|
||||
def get_missing_keys(self, target_file: Path) -> List[str]:
|
||||
def get_missing_keys(self, target_file: Path) -> list[str]:
|
||||
"""Get list of missing keys in target file."""
|
||||
lang_code = target_file.parent.name.replace("-", "_")
|
||||
ignore_set = self.ignore_patterns.get(lang_code, set())
|
||||
@@ -153,7 +158,7 @@ class TranslationMerger:
|
||||
missing = set(golden_flat.keys()) - set(target_flat.keys())
|
||||
return sorted(missing - ignore_set)
|
||||
|
||||
def get_unused_keys(self, target_file: Path) -> List[str]:
|
||||
def get_unused_keys(self, target_file: Path) -> list[str]:
|
||||
"""Get list of keys that are not present in the golden truth file."""
|
||||
if not target_file.exists():
|
||||
return []
|
||||
@@ -165,13 +170,20 @@ class TranslationMerger:
|
||||
return sorted(set(target_flat.keys()) - set(golden_flat.keys()))
|
||||
|
||||
def add_missing_translations(
|
||||
self, target_file: Path, keys_to_add: List[str] = None
|
||||
) -> Dict:
|
||||
"""Add missing translations from en-GB to target file."""
|
||||
if not target_file.exists():
|
||||
self,
|
||||
target_file: Path,
|
||||
keys_to_add: list[str] | None = None,
|
||||
save: bool = True,
|
||||
backup: bool = False,
|
||||
) -> dict[str, Any]:
|
||||
"""Add missing translations from en-GB to target file and optionally save."""
|
||||
if not target_file.parent.exists():
|
||||
target_file.parent.mkdir(parents=True, exist_ok=True)
|
||||
target_data = {}
|
||||
else:
|
||||
elif target_file.exists():
|
||||
target_data = self._load_translation_file(target_file)
|
||||
else:
|
||||
target_data = {}
|
||||
|
||||
golden_flat = self._flatten_dict(self.golden_truth)
|
||||
missing_keys = keys_to_add or self.get_missing_keys(target_file)
|
||||
@@ -184,6 +196,9 @@ class TranslationMerger:
|
||||
self._set_nested_value(target_data, key, value)
|
||||
added_count += 1
|
||||
|
||||
if added_count > 0 and save:
|
||||
self._save_translation_file(target_data, target_file, backup)
|
||||
|
||||
return {
|
||||
"added_count": added_count,
|
||||
"missing_keys": missing_keys,
|
||||
@@ -191,8 +206,8 @@ class TranslationMerger:
|
||||
}
|
||||
|
||||
def extract_untranslated_entries(
|
||||
self, target_file: Path, output_file: Path = None
|
||||
) -> Dict:
|
||||
self, target_file: Path, output_file: Path | None = None
|
||||
) -> dict[str, Any]:
|
||||
"""Extract entries marked as untranslated or identical to en-GB for AI translation."""
|
||||
if not target_file.exists():
|
||||
print(f"Error: Target file does not exist: {target_file}")
|
||||
@@ -233,9 +248,7 @@ class TranslationMerger:
|
||||
|
||||
def _is_expected_identical(self, key: str, value: str) -> bool:
|
||||
"""Check if a key-value pair is expected to be identical across languages."""
|
||||
identical_patterns = [
|
||||
"language.direction",
|
||||
]
|
||||
identical_patterns = ["language.direction"]
|
||||
|
||||
if str(value).strip() in ["ltr", "rtl", "True", "False", "true", "false"]:
|
||||
return True
|
||||
@@ -247,8 +260,11 @@ class TranslationMerger:
|
||||
return False
 
     def apply_translations(
-        self, target_file: Path, translations: Dict[str, str], backup: bool = False
-    ) -> Dict:
+        self,
+        target_file: Path,
+        translations: dict[str, str],
+        backup: bool = False,
+    ) -> dict[str, Any]:
         """Apply provided translations to target file."""
         if not target_file.exists():
             print(f"Error: Target file does not exist: {target_file}")
@@ -261,7 +277,9 @@ class TranslationMerger:
         for key, translation in translations.items():
             try:
                 # Remove [UNTRANSLATED] marker if present
-                if translation.startswith("[UNTRANSLATED]"):
+                if isinstance(translation, str) and translation.startswith(
+                    "[UNTRANSLATED]"
+                ):
                     translation = translation.replace("[UNTRANSLATED]", "").strip()
 
                 self._set_nested_value(target_data, key, translation)
@@ -273,15 +291,19 @@ class TranslationMerger:
             self._save_translation_file(target_data, target_file, backup)
 
         return {
-            "success": True,
+            "success": applied_count > 0,
             "applied_count": applied_count,
             "errors": errors,
             "data": target_data,
         }
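The `isinstance` guard added above protects against non-string values in the translations JSON. The marker handling can be exercised in isolation; a sketch with the marker string taken from the diff and the sample values invented:

```python
# Sketch of the [UNTRANSLATED]-marker normalisation performed by
# apply_translations before values are written back to the TOML file.
def clean_translation(value):
    # Only strings can carry the marker; other types pass through untouched.
    if isinstance(value, str) and value.startswith("[UNTRANSLATED]"):
        return value.replace("[UNTRANSLATED]", "").strip()
    return value


print(clean_translation("[UNTRANSLATED] Accueil"))  # Accueil
print(clean_translation(True))                      # True
```

Without the guard, a boolean or number in the JSON would raise `AttributeError` on `startswith`.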
 
     def remove_unused_translations(
-        self, target_file: Path, keys_to_remove: List[str] = None, backup: bool = False
-    ) -> Dict:
+        self,
+        target_file: Path,
+        keys_to_remove: list[str] | None = None,
+        save: bool = True,
+        backup: bool = False,
+    ) -> dict[str, Any]:
         """Remove translations that are not present in the golden truth file."""
         if not target_file.exists():
             print(f"Error: Target file does not exist: {target_file}")
@@ -296,11 +318,11 @@ class TranslationMerger:
             if self._delete_nested_key(target_data, key):
                 removed_count += 1
 
-        if removed_count > 0:
+        if removed_count > 0 and save:
             self._save_translation_file(target_data, target_file, backup)
 
         return {
-            "success": True,
+            "success": removed_count > 0,
             "removed_count": removed_count,
             "data": target_data,
         }
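`_delete_nested_key` is called above but defined outside this hunk; the loop relies on it returning a boolean so only real deletions are counted. A plausible sketch (name from the diff, implementation assumed):

```python
from typing import Any


def delete_nested_key(data: dict[str, Any], dotted_key: str) -> bool:
    """Delete a dotted key like "home.title"; return True if it existed."""
    parts = dotted_key.split(".")
    node = data
    for part in parts[:-1]:
        node = node.get(part)
        if not isinstance(node, dict):
            return False  # intermediate table missing: nothing to delete
    return node.pop(parts[-1], None) is not None


data = {"home": {"title": "Home", "subtitle": "Sub"}}
print(delete_nested_key(data, "home.title"))    # True
print(delete_nested_key(data, "home.missing"))  # False
```

The real helper may additionally prune tables left empty after the deletion; this sketch does not.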
@@ -349,15 +371,19 @@ def main():
     )
     parser.add_argument(
         "--locales-dir",
-        default="frontend/public/locales",
+        default=os.path.join(os.getcwd(), "frontend", "public", "locales"),
         help="Path to locales directory",
     )
     parser.add_argument(
         "--ignore-file",
-        default="scripts/ignore_translation.toml",
+        default=os.path.join(os.getcwd(), "scripts", "ignore_translation.toml"),
         help="Path to ignore patterns TOML file",
     )
-    parser.add_argument("language", help="Target language code (e.g., fr-FR)")
+    parser.add_argument(
+        "language",
+        nargs="?",
+        help="Target language code (e.g., fr-FR). If omitted, add-missing and remove-unused run for all locales except en-GB.",
+    )
 
     subparsers = parser.add_subparsers(dest="command", help="Available commands")
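The `nargs="?"` change is what makes the `language` positional optional: `parse_args()` yields `None` when it is omitted, which the command dispatch later uses to fan out over every locale. A minimal standalone reproduction (only the `language` argument kept, everything else stripped):

```python
import argparse

# With nargs="?", the positional argument may be omitted entirely;
# args.language is then None instead of argparse raising an error.
parser = argparse.ArgumentParser()
parser.add_argument("language", nargs="?", help="Target language code (e.g., fr-FR)")

print(parser.parse_args([]).language)         # None
print(parser.parse_args(["fr-FR"]).language)  # fr-FR
```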
@@ -410,18 +436,57 @@ def main():
 
     merger = TranslationMerger(args.locales_dir, args.ignore_file)
 
-    # Find translation file
-    lang_dir = Path(args.locales_dir) / args.language
-    target_file = lang_dir / "translation.toml"
-
     if args.command == "add-missing":
-        print(f"Adding missing translations to {args.language}...")
-        result = merger.add_missing_translations(target_file)
-
-        merger._save_translation_file(result["data"], target_file, backup=args.backup)
-        print(f"Added {result['added_count']} missing translations")
+        if args.language:
+            # Find translation file
+            lang_dir = Path(args.locales_dir) / args.language
+            target_file = lang_dir / "translation.toml"
+            print(f"Processing {args.language}...")
+            result = merger.add_missing_translations(target_file, backup=args.backup)
+            print(f"Added {result['added_count']} missing translations")
+        else:
+            total_added = 0
+            for lang_dir in sorted(Path(args.locales_dir).iterdir()):
+                if not lang_dir.is_dir() or lang_dir.name == "en-GB":
+                    continue
+                target_file = lang_dir / "translation.toml"
+                print(f"Processing {lang_dir.name}...")
+                result = merger.add_missing_translations(
+                    target_file, backup=args.backup
+                )
+                added = result["added_count"]
+                total_added += added
+                print(f"Added {added} missing translations")
+            print(f"\nTotal added across all languages: {total_added}")
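The fan-out loop above skips anything that is not a directory and the en-GB golden-truth locale itself. That selection can be factored into a helper; a sketch with a hypothetical name (`locales_to_process` does not exist in the script) and a throwaway directory tree for illustration:

```python
import tempfile
from pathlib import Path


def locales_to_process(locales_dir: Path) -> list[str]:
    """Locale directories the all-locales mode would visit, en-GB excluded."""
    return [
        d.name
        for d in sorted(locales_dir.iterdir())
        if d.is_dir() and d.name != "en-GB"
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["en-GB", "fr-FR", "ar-AR"]:
        (root / name).mkdir()
    print(locales_to_process(root))  # ['ar-AR', 'fr-FR']
```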
+    elif args.command == "remove-unused":
+        if args.language:
+            lang_dir = Path(args.locales_dir) / args.language
+            target_file = lang_dir / "translation.toml"
+            print(f"Processing {args.language}...")
+            result = merger.remove_unused_translations(target_file, backup=args.backup)
+            print(f"Removed {result['removed_count']} unused translations")
+        else:
+            total_removed = 0
+            for lang_dir in sorted(Path(args.locales_dir).iterdir()):
+                if not lang_dir.is_dir() or lang_dir.name == "en-GB":
+                    continue
+                target_file = lang_dir / "translation.toml"
+                print(f"Processing {lang_dir.name}...")
+                result = merger.remove_unused_translations(
+                    target_file, backup=args.backup
+                )
+                removed = result["removed_count"]
+                total_removed += removed
+                print(f"Removed {removed} unused translations")
+            print(f"\nTotal removed across all languages: {total_removed}")
 
     elif args.command == "extract-untranslated":
+        if not args.language:
+            print("Error: language is required for extract-untranslated")
+            sys.exit(1)
+        lang_dir = Path(args.locales_dir) / args.language
+        target_file = lang_dir / "translation.toml"
         output_file = (
             Path(args.output)
             if args.output
@@ -431,10 +496,20 @@ def main():
         print(f"Extracted {len(untranslated)} untranslated entries to {output_file}")
 
     elif args.command == "create-template":
-        output_file = Path(args.output)
-        merger.create_translation_template(target_file, output_file)
+        if not args.language:
+            print("Error: language is required for create-template")
+            sys.exit(1)
+        lang_dir = Path(args.locales_dir) / args.language
+        target_file = lang_dir / "translation.toml"
+        merger.create_translation_template(target_file, Path(args.output))
 
     elif args.command == "apply-translations":
+        if not args.language:
+            print("Error: language is required for apply-translations")
+            sys.exit(1)
+        lang_dir = Path(args.locales_dir) / args.language
+        target_file = lang_dir / "translation.toml"
 
         with open(args.translations_file, "r", encoding="utf-8") as f:
             translations_data = json.load(f)
@@ -455,20 +530,11 @@ def main():
         if result["success"]:
             print(f"Applied {result['applied_count']} translations")
             if result["errors"]:
-                print(f"Errors: {len(result['errors'])}")
+                print(f"Errors encountered: {len(result['errors'])}")
                 for error in result["errors"][:5]:
                     print(f"  - {error}")
         else:
-            print(f"Failed: {result.get('error', 'Unknown error')}")
-
-    elif args.command == "remove-unused":
-        print(f"Removing unused translations from {args.language}...")
-        result = merger.remove_unused_translations(target_file, backup=args.backup)
-
-        if result["success"]:
-            print(f"Removed {result['removed_count']} unused translations")
-        else:
-            print(f"Failed: {result.get('error', 'Unknown error')}")
+            print("No translations applied.")
 
 
 if __name__ == "__main__":