V1 merge (#5193)

# Description of Changes

---

## Checklist

### General

- [ ] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.
---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
Signed-off-by: stirlingbot[bot] <stirlingbot[bot]@users.noreply.github.com>
Co-authored-by: ConnorYoh <40631091+ConnorYoh@users.noreply.github.com>
Co-authored-by: Connor Yoh <connor@stirlingpdf.com>
Co-authored-by: OUNZAR Aymane <aymane.ounzar@imt-atlantique.net>
Co-authored-by: YAOU Reda <yaoureda24@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: stirlingbot[bot] <195170888+stirlingbot[bot]@users.noreply.github.com>
Co-authored-by: Balázs Szücs <127139797+balazs-szucs@users.noreply.github.com>
Co-authored-by: Ludy <Ludy87@users.noreply.github.com>
Co-authored-by: tkymmm <136296842+tkymmm@users.noreply.github.com>
Co-authored-by: Peter Dave Hello <hsu@peterdavehello.org>
Co-authored-by: albanobattistella <34811668+albanobattistella@users.noreply.github.com>
Co-authored-by: PingLin8888 <88387490+PingLin8888@users.noreply.github.com>
Co-authored-by: FdaSilvaYY <FdaSilvaYY@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: OteJlo <106060728+OteJlo@users.noreply.github.com>
Co-authored-by: Angel <41905618+TheShadowAngel@users.noreply.github.com>
Co-authored-by: Ricardo Catarino <ricardomicc@gmail.com>
Co-authored-by: Luis Antonio Argüelles González <luis.arguelles@encora.com>
Co-authored-by: Dawid Urbański <31166488+urbaned121@users.noreply.github.com>
Co-authored-by: Stephan Paternotte <Stephan-P@users.noreply.github.com>
Co-authored-by: Leonardo Santos Paulucio <leonardo.paulucio@hotmail.com>
Co-authored-by: hamza khalem <72972114+hamzakhalem@users.noreply.github.com>
Co-authored-by: IT Creativity + Art Team <admin@it-playground.net>
Co-authored-by: Reece Browne <74901996+reecebrowne@users.noreply.github.com>
Co-authored-by: James Brunton <jbrunton96@gmail.com>
Co-authored-by: Victor Villarreal <133383186+vvillarreal-cfee@users.noreply.github.com>
2026-04-16 23:08:38 +02:00 · 2025-12-21 10:40:32 +00:00
parent a5dcdd5bd9
commit 68ed54e398
343 changed files with 25212 additions and 6592 deletions
--- a/scripts/counter_translation_v3.py
+++ b/scripts/counter_translation_v3.py
@@ -1,21 +1,56 @@
-"""A script to update language progress status in README.md based on
-TOML translation file comparison.
+"""
+A script to update language progress status in README.md based on
+properties file comparison.

-This script compares the default translation TOML file with others in the locales directory to
-determine language progress.
-It then updates README.md based on provided progress list.
+This script compares the default (reference) properties file, usually
+`messages_en_GB.properties`, with other translation files in the
+`app/core/src/main/resources/` directory.
+It determines how many lines are fully translated and automatically updates
+progress badges in the `README.md`.
+
+Additionally, it maintains a TOML configuration file
+(`scripts/ignore_translation.toml`) that defines which keys are ignored
+during comparison (e.g., static values like `language.direction`).

 Author: Ludy87
-Updated for TOML format

-Example:
-    To use this script, simply run it from command line:
-        $ python counter_translation_v3.py
-"""  # noqa: D205
+Usage:
+    Run this script directly from the project root.

+    # --- Compare all translation files and update README.md ---
+    $ python scripts/counter_translation_v3.py
+
+    This will:
+        • Compare all files matching messages_*.properties
+        • Update progress badges in README.md
+        • Update/format ignore_translation.toml automatically
+
+    # --- Check a single language file ---
+    $ python scripts/counter_translation_v3.py --lang messages_fr_FR.properties
+
+    This will:
+        • Compare the French translation file against the English reference
+        • Print the translation percentage in the console
+
+    # --- Print ONLY the percentage (for CI pipelines or automation) ---
+    $ python scripts/counter_translation_v3.py --lang messages_fr_FR.properties --show-percentage
+
+    Example output:
+        87
+
+Arguments:
+    -l, --lang <file>          Specific properties file to check
+                               (relative or absolute path).
+    -sp, --show-percentage     Print only the percentage (no formatting, ideal for CI/CD).
+    -smk, --show-missing-keys  Show the list of missing keys when checking a single language file.
+"""
+
+import argparse
 import glob
 import os
 import re
+import sys
+from typing import Iterable

 import tomlkit
 import tomlkit.toml_file
@@ -23,14 +58,15 @@ import tomlkit.toml_file

 def convert_to_multiline(data: tomlkit.TOMLDocument) -> tomlkit.TOMLDocument:
    """Converts 'ignore' and 'missing' arrays to multiline arrays and sorts the first-level keys of the TOML document.
+
    Enhances readability and consistency in the TOML file by ensuring arrays contain unique and sorted entries.

-    Parameters:
+    Args:
        data (tomlkit.TOMLDocument): The original TOML document containing the data.

    Returns:
        tomlkit.TOMLDocument: A new TOML document with sorted keys and properly formatted arrays.
-    """  # noqa: D205
+    """
    sorted_data = tomlkit.document()
    for key in sorted(data.keys()):
        value = data[key]
@@ -53,16 +89,19 @@ def convert_to_multiline(data: tomlkit.TOMLDocument) -> tomlkit.TOMLDocument:


 def write_readme(progress_list: list[tuple[str, int]]) -> None:
-    """Updates the progress status in the README.md file based
-    on the provided progress list.
+    """Updates the progress status in the README.md file based on the provided progress list.

-    Parameters:
+    This function reads the existing README.md content, identifies lines containing
+    language-specific progress badges, and replaces the percentage values and URLs
+    with the new progress data.
+
+    Args:
        progress_list (list[tuple[str, int]]): A list of tuples containing
-        language and progress percentage.
+            language codes (e.g., 'fr_FR') and progress percentages (integers from 0 to 100).

    Returns:
        None
-    """  # noqa: D205
+    """
    with open("README.md", encoding="utf-8") as file:
        content = file.readlines()

@@ -80,70 +119,111 @@ def write_readme(progress_list: list[tuple[str, int]]) -> None:
        file.writelines(content)


-def parse_toml_file(file_path):
-    """
-    Parses a TOML translation file and returns a flat dictionary of all keys.
-    :param file_path: Path to the TOML file.
-    :return: Dictionary with flattened keys and values.
-    """
-    with open(file_path, "r", encoding="utf-8") as file:
-        data = tomlkit.parse(file.read())
+def load_reference_keys(default_file_path: str) -> set[str]:
+    """Reads all keys from the reference properties file (excluding comments and empty lines).

-    def flatten_dict(d, parent_key="", sep="."):
-        items = {}
-        for k, v in d.items():
-            new_key = f"{parent_key}{sep}{k}" if parent_key else k
-            if isinstance(v, dict):
-                items.update(flatten_dict(v, new_key, sep=sep))
-            else:
-                items[new_key] = v
-        return items
+    This function skips the first 5 lines (assumed to be headers or metadata) and then
+    extracts keys from lines containing '=' separators, ignoring comments (#) and empty lines.
+    It also handles potential BOM (Byte Order Mark) characters.

-    return flatten_dict(data)
+    Args:
+        default_file_path (str): The path to the default (reference) properties file.
+
+    Returns:
+        set[str]: A set of unique keys found in the reference file.
+    """
+    keys: set[str] = set()
+    with open(default_file_path, encoding="utf-8") as f:
+        # Skip the first 5 lines (headers)
+        for _ in range(5):
+            try:
+                next(f)
+            except StopIteration:
+                break
+
+        for line in f:
+            s = line.strip()
+            if not s or s.startswith("#") or "=" not in s:
+                continue
+            k, _ = s.split("=", 1)
+            keys.add(k.strip().replace("\ufeff", ""))  # BOM protection
+    return keys
+
+
+def _lang_from_path(file_path: str) -> str:
+    """Extracts the language code from a properties file path.
+
+    Assumes the filename format is 'messages_<language>.properties', where <language>
+    is the code like 'fr_FR'.
+
+    Args:
+        file_path (str): The full path to the properties file.
+
+    Returns:
+        str: The extracted language code.
+    """
+    return (
+        os.path.basename(file_path).split("messages_", 1)[1].split(".properties", 1)[0]
+    )


 def compare_files(
-    default_file_path, file_paths, ignore_translation_file
+    default_file_path: str,
+    file_paths: Iterable[str],
+    ignore_translation_file: str,
+    show_missing_keys: bool = False,
+    show_percentage: bool = False,
 ) -> list[tuple[str, int]]:
-    """Compares the default TOML translation file with other
-    translation files in the locales directory.
+    """Compares the default properties file with other properties files in the directory.

-    Parameters:
-        default_file_path (str): The path to the default translation TOML file.
-        file_paths (list): List of paths to translation TOML files.
-        ignore_translation_file (str): Path to the TOML file with ignore rules.
+    This function calculates translation progress for each language file by comparing
+    keys and values line-by-line, skipping headers. It accounts for ignored keys defined
+    in a TOML configuration file and updates that file with cleaned ignore lists.
+    English variants (en_GB, en_US) are hardcoded to 100% progress.
+
+    Args:
+        default_file_path (str): The path to the default properties file (reference).
+        file_paths (Iterable[str]): Iterable of paths to properties files to compare.
+        ignore_translation_file (str): Path to the TOML file with ignore/missing configurations per language.
+        show_missing_keys (bool, optional): If True, prints the list of missing keys for each file. Defaults to False.
+        show_percentage (bool, optional): If True, suppresses detailed output and focuses on percentage calculation. Defaults to False.

    Returns:
-        list[tuple[str, int]]: A list of tuples containing
-        language and progress percentage.
-    """  # noqa: D205
-    default_keys = parse_toml_file(default_file_path)
-    num_keys = len(default_keys)
+        list[tuple[str, int]]: A sorted list of tuples containing language codes and progress percentages
+            (descending order by percentage). Duplicates are removed.
+    """
+    # Count total translatable lines in reference (excluding empty and comments)
+    num_lines = sum(
+        1
+        for line in open(default_file_path, encoding="utf-8")
+        if line.strip() and not line.strip().startswith("#")
+    )

-    result_list = []
+    ref_keys: set[str] = load_reference_keys(default_file_path)
+
+    result_list: list[tuple[str, int]] = []
    sort_ignore_translation: tomlkit.TOMLDocument

-    # read toml
-    with open(ignore_translation_file, encoding="utf-8") as f:
-        sort_ignore_translation = tomlkit.parse(f.read())
+    # Read or initialize TOML config
+    if os.path.exists(ignore_translation_file):
+        with open(ignore_translation_file, encoding="utf-8") as f:
+            sort_ignore_translation = tomlkit.parse(f.read())
+    else:
+        sort_ignore_translation = tomlkit.document()

    for file_path in file_paths:
-        # Extract language code from directory name
-        locale_dir = os.path.basename(os.path.dirname(file_path))
+        language = _lang_from_path(file_path)

-        # Convert locale format from hyphen to underscore for TOML compatibility
-        # e.g., en-GB -> en_GB, sr-LATN-RS -> sr_LATN_RS
-        language = locale_dir.replace("-", "_")
-
-        fails = 0
-        if language in ["en_GB", "en_US"]:
-            result_list.append(("en_GB", 100))
-            result_list.append(("en_US", 100))
+        # Hardcode English variants to 100%
+        if "en_GB" in language or "en_US" in language:
+            result_list.append((language, 100))
            continue

+        # Initialize language table in TOML if missing
        if language not in sort_ignore_translation:
            sort_ignore_translation[language] = tomlkit.table()

+        # Ensure default ignore list if empty
        if (
            "ignore" not in sort_ignore_translation[language]
            or len(sort_ignore_translation[language].get("ignore", [])) < 1
@@ -152,53 +232,182 @@ def compare_files(
                ["language.direction"]
            )

-        current_keys = parse_toml_file(file_path)
+        # Clean up ignore list to only include keys present in reference
+        sort_ignore_translation[language]["ignore"] = [
+            key
+            for key in sort_ignore_translation[language]["ignore"]
+            if key in ref_keys or key == "language.direction"
+        ]

-        # Compare keys
-        for default_key, default_value in default_keys.items():
-            if default_key not in current_keys:
-                # Key is missing entirely
-                if default_key not in sort_ignore_translation[language]["ignore"]:
-                    print(f"{language}: Key '{default_key}' is missing.")
-                    fails += 1
-            elif (
-                default_value == current_keys[default_key]
-                and default_key not in sort_ignore_translation[language]["ignore"]
+        fails = 0
+        missing_str_keys: list[str] = []
+        with (
+            open(default_file_path, encoding="utf-8") as default_file,
+            open(file_path, encoding="utf-8") as file,
+        ):
+            # Skip headers (first 5 lines) in both files
+            for _ in range(5):
+                next(default_file)
+                try:
+                    next(file)
+                except StopIteration:
+                    fails = num_lines
+                    break
+
+            for line_num, (line_default, line_file) in enumerate(
+                zip(default_file, file), start=6
            ):
-                # Key exists but value is untranslated (same as reference)
-                print(f"{language}: Key '{default_key}' is missing the translation.")
-                fails += 1
-            elif default_value != current_keys[default_key]:
-                # Key is translated, remove from ignore list if present
-                if default_key in sort_ignore_translation[language]["ignore"]:
-                    sort_ignore_translation[language]["ignore"].remove(default_key)
+                try:
+                    # Ignoring empty lines and lines starting with #
+                    if line_default.strip() == "" or line_default.startswith("#"):
+                        continue
+
+                    default_key, default_value = line_default.split("=", 1)
+                    file_key, file_value = line_file.split("=", 1)
+                    default_key = default_key.strip()
+                    default_value = default_value.strip()
+                    file_key = file_key.strip()
+                    file_value = file_value.strip()
+
+                    if (
+                        default_value == file_value
+                        and default_key
+                        not in sort_ignore_translation[language]["ignore"]
+                    ):
+                        # Missing translation (same as default and not ignored)
+                        fails += 1
+                        missing_str_keys.append(default_key)
+                    if default_value != file_value:
+                        if default_key in sort_ignore_translation[language]["ignore"]:
+                            # Remove from ignore if actually translated
+                            sort_ignore_translation[language]["ignore"].remove(
+                                default_key
+                            )
+                except ValueError as e:
+                    print(f"Error processing line {line_num} in {file_path}: {e}")
+                    print(f"{line_default}|{line_file}")
+                    sys.exit(1)
+                except IndexError:
+                    # Handle mismatched line counts
+                    fails += 1
+                    continue
+
+        if show_missing_keys:
+            if len(missing_str_keys) > 0:
+                print(f" Missing keys: {missing_str_keys}")
+            else:
+                print(" No missing keys!")
+
+        if not show_percentage:
+            print(f"{language}: {fails} out of {num_lines} lines are not translated.")

-        print(f"{language}: {fails} out of {num_keys} keys are not translated.")
        result_list.append(
            (
                language,
-                int((num_keys - fails) * 100 / num_keys),
+                int((num_lines - fails) * 100 / num_lines),
            )
        )

+    # Write cleaned and formatted TOML back
    ignore_translation = convert_to_multiline(sort_ignore_translation)
    with open(ignore_translation_file, "w", encoding="utf-8", newline="\n") as file:
        file.write(tomlkit.dumps(ignore_translation))

+    # Remove duplicates and sort by percentage descending
    unique_data = list(set(result_list))
    unique_data.sort(key=lambda x: x[1], reverse=True)

    return unique_data


-if __name__ == "__main__":
-    directory = os.path.join(os.getcwd(), "frontend", "public", "locales")
-    translation_file_paths = glob.glob(os.path.join(directory, "*", "translation.toml"))
-    reference_file = os.path.join(directory, "en-GB", "translation.toml")
+def main() -> None:
+    """Main entry point for the script.

-    scripts_directory = os.path.join(os.getcwd(), "scripts")
+    Parses command-line arguments and either processes a single language file
+    (with optional percentage output) or all files and updates the README.md.
+
+    Command-line options:
+        --lang, -l <file>: Specific properties file to check (e.g., 'messages_fr_FR.properties').
+        --show-percentage: Print only the translation percentage for --lang and exit.
+        --show-missing-keys: Show the list of missing keys when checking a single language file.
+    """
+    parser = argparse.ArgumentParser(
+        description="Compare i18n property files and optionally update README badges."
+    )
+    parser.add_argument(
+        "--lang",
+        "-l",
+        help=(
+            "Specific properties file to check, e.g. 'messages_fr_FR.properties'. "
+            "If a relative filename is given, it is resolved against the resources directory."
+        ),
+    )
+    parser.add_argument(
+        "--show-percentage",
+        "-sp",
+        action="store_true",
+        help="Print ONLY the translation percentage for --lang and exit.",
+    )
+    parser.add_argument(
+        "--show-missing-keys",
+        "-smk",
+        action="store_true",
+        help="Show the list of missing keys when checking a single language file.",
+    )
+
+    args = parser.parse_args()
+
+    # Project layout assumptions
+    cwd = os.getcwd()
+    resources_dir = os.path.join(cwd, "app", "core", "src", "main", "resources")
+    reference_file = os.path.join(resources_dir, "messages_en_GB.properties")
+    scripts_directory = os.path.join(cwd, "scripts")
    translation_state_file = os.path.join(scripts_directory, "ignore_translation.toml")

-    write_readme(
-        compare_files(reference_file, translation_file_paths, translation_state_file)
+    if args.lang:
+        # Resolve provided path
+        lang_input = args.lang
+        if os.path.isabs(lang_input) or os.path.exists(lang_input):
+            lang_file = lang_input
+        else:
+            lang_file = os.path.join(resources_dir, lang_input)
+
+        if not os.path.exists(lang_file):
+            print(f"ERROR: Could not find language file: {lang_file}")
+            sys.exit(2)
+
+        results = compare_files(
+            reference_file,
+            [lang_file],
+            translation_state_file,
+            args.show_missing_keys,
+            args.show_percentage,
+        )
+        # Find the exact tuple for the requested language
+        wanted_key = _lang_from_path(lang_file)
+        for lang, pct in results:
+            if lang == wanted_key:
+                if args.show_percentage:
+                    # Print ONLY the number
+                    print(pct)
+                    return
+                else:
+                    print(f"{lang}: {pct}% translated")
+                    return
+
+        # Fallback (should not happen)
+        print("ERROR: Language not found in results.")
+        sys.exit(3)
+
+    # Default behavior (no --lang): process all and update README
+    messages_file_paths = glob.glob(
+        os.path.join(resources_dir, "messages_*.properties")
    )
+    progress = compare_files(
+        reference_file, messages_file_paths, translation_state_file
+    )
+    write_readme(progress)
+
+
+if __name__ == "__main__":
+    main()
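
Reviewer note: the progress figure produced by `compare_files` can be approximated with a minimal sketch. This is a key-based variant written for illustration only, not the line-by-line `zip` comparison the script performs. The helper names `parse_properties` and `translation_progress` are hypothetical and do not exist in the repository. Unlike the script, the sketch skips no header lines and divides by the number of reference keys rather than by the count of non-comment lines.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    entries: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.replace("\ufeff", "").strip()  # drop a stray BOM, as the script does
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def translation_progress(
    reference: str, translation: str, ignore: set[str]
) -> tuple[int, list[str]]:
    """Return (percentage, untranslated keys) for one translation file.

    A key counts as untranslated when it is absent from the translation or
    its value equals the reference value, unless the key is in `ignore`.
    """
    ref = parse_properties(reference)
    tr = parse_properties(translation)
    missing = [
        key
        for key, value in ref.items()
        if key not in ignore and tr.get(key, value) == value
    ]
    total = len(ref)
    pct = int((total - len(missing)) * 100 / total) if total else 100
    return pct, missing


# Illustrative data only; these strings are not taken from the real files.
reference = "language.direction=ltr\nhome.title=Home\nhome.desc=Welcome\npro=Pro\n"
translation = "language.direction=ltr\nhome.title=Accueil\nhome.desc=Welcome\npro=Pro\n"
print(translation_progress(reference, translation, {"language.direction", "pro"}))
```

Matching by key would tolerate translation files whose lines drift out of step with the reference, which the positional comparison in the script cannot do; whether that trade-off suits this project is a decision for the maintainers.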
--- a/scripts/generate_requirements.bat
+++ b/scripts/generate_requirements.bat
@@ -29,6 +29,11 @@ if /I not "%confirm%"=="Y" (

 echo Starting generation...

+echo Generating .github\scripts\requirements_dev.txt
+pip-compile --allow-unsafe --generate-hashes --upgrade --strip-extras ^
+  --output-file=".github\scripts\requirements_dev.txt" ^
+  ".github\scripts\requirements_dev.in"
+
 echo Generating .github\scripts\requirements_pre_commit.txt
 pip-compile --generate-hashes --upgrade --strip-extras ^
  --output-file=".github\scripts\requirements_pre_commit.txt" ^
--- a/scripts/ignore_translation.toml
+++ b/scripts/ignore_translation.toml
@@ -3,6 +3,7 @@ ignore = [
    'lang.div',
    'lang.dzo',
    'lang.que',
+    'language.direction',
 ]

 [az_AZ]
@@ -36,10 +37,7 @@ ignore = [

 [bg_BG]
 ignore = [
-    'lang.div',
-    'lang.dzo',
    'lang.iku',
-    'lang.que',
    'language.direction',
 ]

@@ -50,6 +48,7 @@ ignore = [

 [ca_CA]
 ignore = [
+    'adminUserSettings.admin',
    'lang.amh',
    'lang.ceb',
    'lang.chr',
@@ -190,7 +189,8 @@ ignore = [
 ignore = [
    'AddStampRequest.alphabet',
    'AddStampRequest.position',
-    'PDFToBook.selectText.1',
+    'PDFToText.tags',
+    'addPageNumbers.selectText.3',
    'adminUserSettings.team',
    'alphabet',
    'audit.dashboard.modal.id',
@@ -200,6 +200,7 @@ ignore = [
    'audit.dashboard.table.details',
    'audit.dashboard.table.id',
    'certSign.name',
+    'cookieBanner.popUp.acceptAllBtn',
    'endpointStatistics.top10',
    'endpointStatistics.top20',
    'fileChooser.dragAndDrop',
@@ -236,11 +237,11 @@ ignore = [
    'pipelineOptions.pipelineHeader',
    'pro',
    'redact.zoom',
-    'scannerEffect.quality.medium',
    'sponsor',
    'team.status',
    'text',
    'update.version',
+    'validateSignature.cert.bits',
    'validateSignature.cert.version',
    'validateSignature.status',
    'watermark.type.1',
@@ -262,11 +263,17 @@ ignore = [

 [es_ES]
 ignore = [
+    'audit.dashboard.export.csv',
+    'audit.dashboard.export.json',
+    'audit.dashboard.modal.id',
+    'audit.dashboard.table.id',
    'error',
+    'fileChooser.click',
    'lang.ceb',
    'lang.chr',
    'lang.div',
    'lang.dzo',
+    'lang.epo',
    'lang.fil',
    'lang.guj',
    'lang.iku',
@@ -274,6 +281,7 @@ ignore = [
    'lang.lao',
    'lang.mal',
    'lang.ori',
+    'lang.que',
    'lang.snd',
    'lang.tam',
    'lang.tel',
@@ -281,7 +289,12 @@ ignore = [
    'lang.yor',
    'language.direction',
    'no',
+    'pro',
+    'redact.zoom',
+    'scannerEffect.colorspace.color',
    'showJS.tags',
+    'update.priority.normal',
+    'validateSignature.cert.bits',
 ]

 [eu_ES]
@@ -307,50 +320,88 @@ ignore = [
 ]

 [fa_IR]
-ignore = []
+ignore = [
+    'language.direction',
+]

 [fr_FR]
 ignore = [
    'AddStampRequest.alphabet',
    'AddStampRequest.position',
    'AddStampRequest.rotation',
-    'PDFToBook.selectText.1',
+    'addPageNumbers.selectText.3',
    'adminUserSettings.actions',
    'alphabet',
+    'audit.dashboard.modal.id',
+    'audit.dashboard.modal.type',
+    'audit.dashboard.pagination.pageInfo1',
+    'audit.dashboard.table.id',
+    'audit.dashboard.table.type',
    'compare.document.1',
    'compare.document.2',
+    'cookieBanner.preferencesModal.analytics.posthog.label',
+    'cookieBanner.preferencesModal.analytics.scarf.label',
+    'cookieBanner.preferencesModal.serviceCounterLabel',
+    'endpointStatistics.top',
+    'endpointStatistics.top10',
+    'endpointStatistics.top20',
+    'home.pipeline.title',
+    'lang.afr',
+    'lang.ben',
    'lang.bre',
+    'lang.cat',
    'lang.ceb',
    'lang.chr',
    'lang.div',
    'lang.dzo',
    'lang.eus',
    'lang.guj',
+    'lang.hin',
    'lang.iku',
    'lang.kan',
    'lang.kaz',
    'lang.khm',
-    'lang.lao',
-    'lang.ltz',
+    'lang.lat',
    'lang.mal',
    'lang.mar',
+    'lang.mri',
    'lang.oci',
    'lang.ori',
+    'lang.osd',
+    'lang.pan',
+    'lang.pus',
    'lang.que',
    'lang.san',
    'lang.snd',
    'lang.swa',
-    'lang.tel',
+    'lang.tam',
+    'lang.tat',
    'lang.tgl',
-    'lang.tir',
+    'lang.yid',
    'lang.yor',
    'language.direction',
    'licenses.license',
    'licenses.module',
    'licenses.nav',
    'licenses.version',
+    'multiTool.page',
+    'page',
+    'pages',
    'pdfOrganiser.mode',
    'pipeline.title',
+    'pro',
+    'redact.pageRedactionNumbers.title',
+    'redact.zoom',
+    'showJS.tags',
+    'split.desc.3',
+    'split.desc.6',
+    'split.desc.7',
+    'split.desc.8',
+    'update.version',
+    'validateSignature.cert.bits',
+    'validateSignature.cert.version',
+    'validateSignature.date',
+    'validateSignature.signature',
    'watermark.type.2',
 ]

@@ -384,7 +435,6 @@ ignore = [

 [hr_HR]
 ignore = [
-    'PDFToBook.selectText.1',
    'lang.ceb',
    'lang.chr',
    'lang.dzo',
@@ -400,13 +450,14 @@ ignore = [

 [hu_HU]
 ignore = [
-    'audit.dashboard.export.json',
    'audit.dashboard.modal.id',
    'audit.dashboard.table.id',
    'endpointStatistics.top10',
    'endpointStatistics.top20',
    'home.pipeline.title',
    'language.direction',
+    'pipeline.title',
+    'pipelineOptions.pipelineHeader',
    'pro',
    'showJS.tags',
 ]
@@ -515,11 +566,6 @@ ignore = [
    'language.direction',
 ]

-[ml_ML]
-ignore = [
-    'language.direction',
-]
-
 [nl_NL]
 ignore = [
    'compare.document.1',
@@ -556,13 +602,11 @@ ignore = [
    'lang.urd',
    'lang.yor',
    'language.direction',
-    'navbar.allTools',
    'sponsor',
 ]

 [no_NB]
 ignore = [
-    'PDFToBook.selectText.1',
    'adminUserSettings.admin',
    'info',
    'lang.afr',
@@ -609,12 +653,12 @@ ignore = [
    'lang.urd',
    'lang.yor',
    'language.direction',
+    'oops',
    'sponsor',
 ]

 [pl_PL]
 ignore = [
-    'PDFToBook.selectText.1',
    'lang.afr',
    'lang.bre',
    'lang.ceb',
@@ -684,6 +728,12 @@ ignore = [

 [pt_PT]
 ignore = [
+    'audit.dashboard.table.id',
+    'endpointStatistics.endpoint',
+    'endpointStatistics.login',
+    'endpointStatistics.top',
+    'endpointStatistics.top10',
+    'endpointStatistics.top20',
    'lang.bre',
    'lang.ceb',
    'lang.chr',
@@ -710,6 +760,8 @@ ignore = [
    'lang.uzb',
    'lang.yid',
    'language.direction',
+    'pro',
+    'update.priority.normal',
 ]

 [ro_RO]
@@ -763,6 +815,7 @@ ignore = [
 [sk_SK]
 ignore = [
    'adminUserSettings.admin',
+    'home.multiTool.title',
    'info',
    'lang.ceb',
    'lang.chr',
@@ -784,6 +837,7 @@ ignore = [
    'lang.urd',
    'lang.uzb',
    'language.direction',
+    'navbar.sections.security',
    'text',
    'watermark.type.1',
 ]
@@ -833,6 +887,7 @@ ignore = [
    'endpointStatistics.top10',
    'endpointStatistics.top20',
    'font',
+    'info',
    'lang.div',
    'lang.epo',
    'lang.hin',
@@ -890,6 +945,7 @@ ignore = [
    'lang.tir',
    'lang.uzb_cyrl',
    'language.direction',
+    'pipelineOptions.pipelineHeader',
    'showJS.tags',
 ]

@@ -963,15 +1019,11 @@ ignore = [
    'lang.yid',
    'lang.yor',
    'language.direction',
+    'pipeline.title',
    'pipelineOptions.pipelineHeader',
    'showJS.tags',
 ]

-[zh_BO]
-ignore = [
-    'language.direction',
-]
-
 [zh_CN]
 ignore = [
    'language.direction',
@@ -980,5 +1032,6 @@ ignore = [
 [zh_TW]
 ignore = [
    'language.direction',
+    'poweredBy',
    'showJS.tags',
 ]
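
Reviewer note: the churn in `ignore_translation.toml` above appears to follow from the cleanup steps in `compare_files`. A rough sketch of that logic on plain Python collections (no `tomlkit`) is shown here. The function name `clean_ignore_list` and the `translated` parameter are hypothetical, and the exact ordering of steps in the script may differ in detail.

```python
def clean_ignore_list(
    ignore: list[str], ref_keys: set[str], translated: set[str]
) -> list[str]:
    """Approximate how one language's ignore list is normalized.

    1. An empty list falls back to the default ['language.direction'].
    2. Keys that no longer exist in the reference file are dropped
       ('language.direction' is always allowed through this filter).
    3. Keys whose value now differs from the reference are dropped,
       since they no longer need to be ignored.
    4. The result is de-duplicated and sorted.
    """
    if not ignore:
        ignore = ["language.direction"]
    kept = {key for key in ignore if key in ref_keys or key == "language.direction"}
    kept -= translated
    return sorted(kept)


# Illustrative inputs only; not the real contents of any locale.
ref_keys = {"pro", "lang.que", "language.direction"}
print(clean_ignore_list(["pro", "PDFToBook.selectText.1", "lang.que", "pro"], ref_keys, {"lang.que"}))
print(clean_ignore_list([], ref_keys, set()))
```

Under these assumptions, a key such as `PDFToBook.selectText.1` disappears from a language's list once it is gone from the reference file, which is consistent with its removal across several locales in this hunk.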
--- a/scripts/init-without-ocr.sh
+++ b/scripts/init-without-ocr.sh
@@ -1,41 +1,188 @@
 #!/bin/bash
+# This script initializes Stirling PDF without OCR features.
+set -euo pipefail

-export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
-echo "running with JAVA_TOOL_OPTIONS ${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
+log() { printf '%s\n' "$*" >&2; }
+command_exists() { command -v "$1" >/dev/null 2>&1; }

-# Update the user and group IDs as per environment variables
-if [ ! -z "$PUID" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
-    usermod -o -u "$PUID" stirlingpdfuser || true
+SU_EXEC_BIN=""
+if command_exists su-exec; then
+  SU_EXEC_BIN="su-exec"
+elif command_exists gosu; then
+  SU_EXEC_BIN="gosu"
 fi

+CURRENT_USER="$(id -un)"
+CURRENT_UID="$(id -u)"
+SWITCH_USER_WARNING_EMITTED=false

-if [ ! -z "$PGID" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
-    groupmod -o -g "$PGID" stirlingpdfgroup || true
-fi
-umask "$UMASK" || true
+warn_switch_user_once() {
+  if [ "$SWITCH_USER_WARNING_EMITTED" = false ]; then
+    log "WARNING: Unable to switch to user ${RUNTIME_USER:-stirlingpdfuser}; running command as ${CURRENT_USER}."
+    SWITCH_USER_WARNING_EMITTED=true
+  fi
+}

-if [[ "$INSTALL_BOOK_AND_ADVANCED_HTML_OPS" == "true" && "$FAT_DOCKER" != "true" ]]; then
-  echo "issue with calibre in current version, feature currently disabled on Stirling-PDF"
-  #apk add --no-cache calibre@testing
+run_as_runtime_user() {
+  if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
+    "$@"
+  elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
+    "$SU_EXEC_BIN" "$RUNTIME_USER" "$@"
+  else
+    warn_switch_user_once
+    "$@"
+  fi
+}
+
+# ---------- VERSION_TAG ----------
+# Load VERSION_TAG from file if not provided via environment.
+if [ -z "${VERSION_TAG:-}" ] && [ -f /etc/stirling_version ]; then
+  VERSION_TAG="$(tr -d '\r\n' < /etc/stirling_version)"
+  export VERSION_TAG
 fi

-# Security jar is now built into the application jar during Docker build
-# No need to download it separately
+# ---------- JAVA_OPTS ----------
+# Configure Java runtime options.
+export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS:-} ${JAVA_CUSTOM_OPTS:-}"
+export JAVA_TOOL_OPTIONS="-Djava.awt.headless=true ${JAVA_TOOL_OPTIONS}"
+log "running with JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
+log "Running Stirling PDF with DISABLE_ADDITIONAL_FEATURES=${DISABLE_ADDITIONAL_FEATURES:-} and VERSION_TAG=${VERSION_TAG:-<unset>}"

-if [[ -n "$LANGS" ]]; then
-  /scripts/installFonts.sh $LANGS
-fi
+# ---------- UMASK ----------
+# Set default permissions mask.
+UMASK_VAL="${UMASK:-022}"
+umask "$UMASK_VAL" 2>/dev/null || umask 022

-echo "Setting permissions and ownership for necessary directories..."
-# Ensure temp directory exists and has correct permissions
-mkdir -p /tmp/stirling-pdf || true
-# Attempt to change ownership of directories and files
-if chown -R stirlingpdfuser:stirlingpdfgroup $HOME /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar; then
-	chmod -R 755 /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar || true
-    # If chown succeeds, execute the command as stirlingpdfuser
-    exec su-exec stirlingpdfuser "$@"
+# ---------- XDG_RUNTIME_DIR ----------
+# Create the runtime directory, respecting UID/GID settings.
+RUNTIME_USER="stirlingpdfuser"
+if id -u "$RUNTIME_USER" >/dev/null 2>&1; then
+  RUID="$(id -u "$RUNTIME_USER")"
+  RGRP="$(id -gn "$RUNTIME_USER")"
 else
-    # If chown fails, execute the command without changing the user context
-    echo "[WARN] Chown failed, running as host user"
-    exec "$@"
+  RUID="$(id -u)"
+  RGRP="$(id -gn)"
+  RUNTIME_USER="$(id -un)"
+fi
+CURRENT_USER="$(id -un)"
+CURRENT_UID="$(id -u)"
+
+export XDG_RUNTIME_DIR="/tmp/xdg-${RUID}"
+mkdir -p "${XDG_RUNTIME_DIR}" || true
+if [ "$(id -u)" -eq 0 ]; then
+  chown "${RUNTIME_USER}:${RGRP}" "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+fi
+chmod 700 "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+log "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}"
+
+# ---------- Optional ----------
+# Warn that calibre-based book and advanced HTML operations are currently disabled.
+if [[ "${INSTALL_BOOK_AND_ADVANCED_HTML_OPS:-false}" == "true" && "${FAT_DOCKER:-true}" != "true" ]]; then
+  log "issue with calibre in current version, feature currently disabled on Stirling-PDF"
+fi
+
+# Download security JAR in non-fat builds.
+if [[ "${FAT_DOCKER:-true}" != "true" && -x /scripts/download-security-jar.sh ]]; then
+  /scripts/download-security-jar.sh || true
+fi
+
+# ---------- UID/GID remap ----------
+# Remap user/group IDs to match container runtime settings.
+if [ "$(id -u)" -eq 0 ]; then
+  if id -u stirlingpdfuser >/dev/null 2>&1; then
+    if [ -n "${PUID:-}" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
+      usermod -o -u "$PUID" stirlingpdfuser || true
+      chown stirlingpdfuser:stirlingpdfgroup "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+    fi
+  fi
+  if getent group stirlingpdfgroup >/dev/null 2>&1; then
+    if [ -n "${PGID:-}" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
+      groupmod -o -g "$PGID" stirlingpdfgroup || true
+    fi
+  fi
+fi
+
+# ---------- Permissions ----------
+# Ensure required directories exist and set correct permissions.
+log "Setting permissions..."
+mkdir -p /tmp/stirling-pdf /logs /configs /customFiles /pipeline || true
+CHOWN_PATHS=("${HOME:-/home/stirlingpdfuser}" "/logs" "/scripts" "/configs" "/customFiles" "/pipeline" "/tmp/stirling-pdf" "/app.jar")
+[ -d /usr/share/fonts/truetype ] && CHOWN_PATHS+=("/usr/share/fonts/truetype")
+CHOWN_OK=true
+for p in "${CHOWN_PATHS[@]}"; do
+  if [ -e "$p" ]; then
+    chown -R "stirlingpdfuser:stirlingpdfgroup" "$p" 2>/dev/null || CHOWN_OK=false
+    chmod -R 755 "$p" 2>/dev/null || true
+  fi
+done
+
+# ---------- Xvfb ----------
+# Start a virtual framebuffer for GUI-based LibreOffice interactions.
+if command_exists Xvfb; then
+  log "Starting Xvfb on :99"
+  Xvfb :99 -screen 0 1024x768x24 -ac +extension GLX +render -noreset > /dev/null 2>&1 &
+  export DISPLAY=:99
+  sleep 1
+else
+  log "Xvfb not installed; skipping virtual display setup"
+fi
+
+# ---------- unoserver ----------
+# Start LibreOffice UNO server for document conversions.
+UNOSERVER_BIN="$(command -v unoserver || true)"
+UNOCONVERT_BIN="$(command -v unoconvert || true)"
+UNOSERVER_PID=""
+
+if [ -n "$UNOSERVER_BIN" ] && [ -n "$UNOCONVERT_BIN" ]; then
+  LIBREOFFICE_PROFILE="${HOME:-/home/${RUNTIME_USER}}/.libreoffice_uno_${RUID}"
+  run_as_runtime_user mkdir -p "$LIBREOFFICE_PROFILE"
+
+  log "Starting unoserver on 127.0.0.1:2003"
+  run_as_runtime_user "$UNOSERVER_BIN" \
+    --interface 127.0.0.1 \
+    --port 2003 \
+    --uno-port 2004 \
+    &
+  UNOSERVER_PID=$!
+  log "unoserver PID: $UNOSERVER_PID (Profile: $LIBREOFFICE_PROFILE)"
+
+  # Wait up to 20 seconds. Note: `unoconvert --version` only confirms the client runs, not that the server accepts connections.
+  log "Waiting for unoserver..."
+  for _ in {1..20}; do
+    if run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
+      log "unoserver is ready!"
+      break
+    fi
+    sleep 1
+  done
+
+  if ! run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
+    log "ERROR: unoserver failed!"
+    if [ -n "$UNOSERVER_PID" ]; then
+      kill "$UNOSERVER_PID" 2>/dev/null || true
+      wait "$UNOSERVER_PID" 2>/dev/null || true
+    fi
+    exit 1
+  fi
+else
+  log "unoserver/unoconvert not installed; skipping UNO setup"
+fi
+
+# ---------- Java ----------
+# Start Stirling PDF Java application.
+log "Starting Stirling PDF"
+JAVA_CMD=(
+  java
+  -Dfile.encoding=UTF-8
+  -Djava.io.tmpdir=/tmp/stirling-pdf
+  -jar /app.jar
+)
+
+if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
+  exec "${JAVA_CMD[@]}"
+elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
+  exec "$SU_EXEC_BIN" "$RUNTIME_USER" "${JAVA_CMD[@]}"
+else
+  warn_switch_user_once
+  exec "${JAVA_CMD[@]}"
 fi
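Review note: the readiness loop above probes `unoconvert --version`, which exercises the client binary. One alternative is to probe the listening port directly. The following is a minimal sketch, not part of this PR: `wait_for_port` is a hypothetical helper, and it assumes a bash build with `/dev/tcp` redirection support.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): poll a TCP port until it accepts a
# connection or the attempt budget is exhausted.
# Usage: wait_for_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-20}" delay="${4:-1}"
  local i
  for ((i = 0; i < attempts; i++)); do
    # The subshell opens the socket and closes it on exit.
    # /dev/tcp/HOST/PORT is handled by bash itself, not the filesystem.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

With the port used in the script, a call might look like `wait_for_port 127.0.0.1 2003 20 1 || exit 1`. Whether an open port implies that unoserver is fully ready to convert is an assumption that would need checking against the real service.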
--- a/scripts/init.sh
+++ b/scripts/init.sh
@@ -1,36 +1,110 @@
 #!/bin/bash
+# This script initializes environment variables and paths,
+# selects the Tesseract data directory (TESSDATA_PREFIX), and then runs the main init script.

-# Copy the original tesseract-ocr files to the volume directory without overwriting existing files
-echo "Copying original files without overwriting existing files"
-mkdir -p /usr/share/tessdata
-cp -rn /usr/share/tessdata-original/* /usr/share/tessdata
+set -euo pipefail

-if [ -d /usr/share/tesseract-ocr/4.00/tessdata ]; then
-        cp -r /usr/share/tesseract-ocr/4.00/tessdata/* /usr/share/tessdata || true;
+append_env_path() {
+  local target="$1" current="$2" separator=":"
+  if [ -d "$target" ] && [[ ":${current}:" != *":${target}:"* ]]; then
+    if [ -n "$current" ]; then
+      printf '%s' "${target}${separator}${current}"
+    else
+      printf '%s' "${target}"
+    fi
+  else
+    printf '%s' "$current"
+  fi
+}
+
+python_site_dir() {
+  local venv_dir="$1"
+  local python_bin="$venv_dir/bin/python"
+  if [ -x "$python_bin" ]; then
+    local py_tag
+    if py_tag="$("$python_bin" -c 'import sys; print(f"python{sys.version_info.major}.{sys.version_info.minor}")' 2>/dev/null)" \
+       && [ -n "$py_tag" ] \
+       && [ -d "$venv_dir/lib/$py_tag/site-packages" ]; then
+      printf '%s' "$venv_dir/lib/$py_tag/site-packages"
+    fi
+  fi
+}
+
+# === LD_LIBRARY_PATH ===
+# Adjust the library path depending on CPU architecture.
+ARCH=$(uname -m)
+case "$ARCH" in
+  x86_64)
+    [ -d /usr/lib/x86_64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+    ;;
+  aarch64)
+    [ -d /usr/lib/aarch64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/aarch64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+    ;;
+esac
+
+# Add LibreOffice program directory to library path if available.
+if [ -d /usr/lib/libreoffice/program ]; then
+  export LD_LIBRARY_PATH="/usr/lib/libreoffice/program${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+fi
+
+# === Python PATH ===
+# Prepend virtual environments to PATH and PYTHONPATH (append_env_path puts the new entry first).
+for dir in /opt/venv/bin /opt/unoserver-venv/bin; do
+  PATH="$(append_env_path "$dir" "$PATH")"
+done
+export PATH
+
+PYTHON_PATH_ENTRIES=()
+for venv in /opt/venv /opt/unoserver-venv; do
+  if [ -d "$venv" ]; then
+    site_dir="$(python_site_dir "$venv")"
+    [ -n "${site_dir:-}" ] && PYTHON_PATH_ENTRIES+=("$site_dir")
+  fi
+done
+if [ ${#PYTHON_PATH_ENTRIES[@]} -gt 0 ]; then
+  PYTHONPATH="$(IFS=:; printf '%s' "${PYTHON_PATH_ENTRIES[*]}")${PYTHONPATH:+:$PYTHONPATH}"
+  export PYTHONPATH
+fi
+
+# === tessdata ===
+# Select the Tesseract OCR data directory and export it as TESSDATA_PREFIX.
+REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
+SEC_TESSDATA="/usr/share/tessdata"
+
+log_warn() {
+  echo "[init][warn] $*" >&2
+}
+
+if [ -d "$REAL_TESSDATA" ] && [ -w "$REAL_TESSDATA" ]; then
+  log_warn "tessdata directory is writable; no adjustments needed: $REAL_TESSDATA"
+else
+  log_warn "Skipping tessdata adjustments; directory missing or not writable: $REAL_TESSDATA"
 fi

 if [ -d /usr/share/tesseract-ocr/5/tessdata ]; then
-        cp -r /usr/share/tesseract-ocr/5/tessdata/* /usr/share/tessdata || true;
+  REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
+  log_warn "Using /usr/share/tesseract-ocr/5/tessdata as TESSDATA_PREFIX"
+elif [ -d /usr/share/tessdata ]; then
+  REAL_TESSDATA="/usr/share/tessdata"
+  log_warn "Using /usr/share/tessdata as TESSDATA_PREFIX"
+elif [ -d /tessdata ]; then
+  REAL_TESSDATA="/tessdata"
+  log_warn "Using /tessdata as TESSDATA_PREFIX"
+else
+  REAL_TESSDATA=""
+  log_warn "No tessdata directory found"
 fi

-# Check if TESSERACT_LANGS environment variable is set and is not empty
-if [[ -n "$TESSERACT_LANGS" ]]; then
-  # Convert comma-separated values to a space-separated list
-  SPACE_SEPARATED_LANGS=$(echo $TESSERACT_LANGS | tr ',' ' ')
-  pattern='^[a-zA-Z]{2,4}(_[a-zA-Z]{2,4})?$'
-  # Install each language pack
-  for LANG in $SPACE_SEPARATED_LANGS; do
-     if [[ $LANG =~ $pattern ]]; then
-      apk add --no-cache "tesseract-ocr-data-$LANG"
-     else
-      echo "Skipping invalid language code"
-     fi
-  done
+if [ -n "$REAL_TESSDATA" ]; then
+  export TESSDATA_PREFIX="$REAL_TESSDATA"
 fi

-# Ensure temp directory exists with correct permissions before running main init
-mkdir -p /tmp/stirling-pdf || true
+# === Temp dir ===
+# Ensure the temporary directory exists and has proper permissions.
+mkdir -p /tmp/stirling-pdf
 chown -R stirlingpdfuser:stirlingpdfgroup /tmp/stirling-pdf || true
 chmod -R 755 /tmp/stirling-pdf || true

-/scripts/init-without-ocr.sh "$@"
+# === Start application ===
+# Run the main init script that handles the full startup logic.
+exec /scripts/init-without-ocr.sh
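Review note: despite its name, `append_env_path` in init.sh places the new entry at the front of the list, skips entries that are already present, and ignores directories that do not exist. The sketch below copies the function from the diff and exercises those three cases; the variable names `demo_dir`, `added`, `repeat`, and `missing` are illustrative only.

```shell
#!/bin/bash
# Copied from scripts/init.sh in this diff.
append_env_path() {
  local target="$1" current="$2" separator=":"
  if [ -d "$target" ] && [[ ":${current}:" != *":${target}:"* ]]; then
    if [ -n "$current" ]; then
      printf '%s' "${target}${separator}${current}"
    else
      printf '%s' "${target}"
    fi
  else
    printf '%s' "$current"
  fi
}

# Illustrative inputs: a temporary directory that exists, and one that does not.
demo_dir="$(mktemp -d)"

# Existing directory, not yet listed: it is placed first.
added="$(append_env_path "$demo_dir" "/usr/bin:/bin")"

# Same directory again: the list comes back unchanged.
repeat="$(append_env_path "$demo_dir" "$added")"

# Missing directory: the list comes back unchanged.
missing="$(append_env_path "${demo_dir}/missing" "/usr/bin:/bin")"

printf 'added:   %s\nrepeat:  %s\nmissing: %s\n' "$added" "$repeat" "$missing"
rmdir "$demo_dir"
```

Because new entries go first, the last directory processed in the `for dir in /opt/venv/bin /opt/unoserver-venv/bin` loop ends up with the highest precedence in `PATH`.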