# Description of Changes

<!--
Please provide a summary of the changes, including:

- What was changed
- Why the change was made
- Any challenges encountered

Closes #(issue_number)
-->

---

## Checklist

### General

- [ ] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
Signed-off-by: stirlingbot[bot] <stirlingbot[bot]@users.noreply.github.com>
Co-authored-by: ConnorYoh <40631091+ConnorYoh@users.noreply.github.com>
Co-authored-by: Connor Yoh <connor@stirlingpdf.com>
Co-authored-by: OUNZAR Aymane <aymane.ounzar@imt-atlantique.net>
Co-authored-by: YAOU Reda <yaoureda24@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: stirlingbot[bot] <195170888+stirlingbot[bot]@users.noreply.github.com>
Co-authored-by: Balázs Szücs <127139797+balazs-szucs@users.noreply.github.com>
Co-authored-by: Ludy <Ludy87@users.noreply.github.com>
Co-authored-by: tkymmm <136296842+tkymmm@users.noreply.github.com>
Co-authored-by: Peter Dave Hello <hsu@peterdavehello.org>
Co-authored-by: albanobattistella <34811668+albanobattistella@users.noreply.github.com>
Co-authored-by: PingLin8888 <88387490+PingLin8888@users.noreply.github.com>
Co-authored-by: FdaSilvaYY <FdaSilvaYY@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: OteJlo <106060728+OteJlo@users.noreply.github.com>
Co-authored-by: Angel <41905618+TheShadowAngel@users.noreply.github.com>
Co-authored-by: Ricardo Catarino <ricardomicc@gmail.com>
Co-authored-by: Luis Antonio Argüelles González <luis.arguelles@encora.com>
Co-authored-by: Dawid Urbański <31166488+urbaned121@users.noreply.github.com>
Co-authored-by: Stephan Paternotte <Stephan-P@users.noreply.github.com>
Co-authored-by: Leonardo Santos Paulucio <leonardo.paulucio@hotmail.com>
Co-authored-by: hamza khalem <72972114+hamzakhalem@users.noreply.github.com>
Co-authored-by: IT Creativity + Art Team <admin@it-playground.net>
Co-authored-by: Reece Browne <74901996+reecebrowne@users.noreply.github.com>
Co-authored-by: James Brunton <jbrunton96@gmail.com>
Co-authored-by: Victor Villarreal <133383186+vvillarreal-cfee@users.noreply.github.com>

Author: Anthony Stirling
Date: 2025-12-21 10:40:32 +00:00
Committed by: GitHub
Parent: a5dcdd5bd9
Commit: 68ed54e398
343 changed files with 25212 additions and 6592 deletions


@@ -1,21 +1,56 @@
"""A script to update language progress status in README.md based on
TOML translation file comparison.
"""
A script to update language progress status in README.md based on
properties file comparison.
This script compares the default translation TOML file with others in the locales directory to
determine language progress.
It then updates README.md based on provided progress list.
This script compares the default (reference) properties file, usually
`messages_en_GB.properties`, with other translation files in the
`app/core/src/main/resources/` directory.
It determines how many lines are fully translated and automatically updates
progress badges in the `README.md`.
Additionally, it maintains a TOML configuration file
(`scripts/ignore_translation.toml`) that defines which keys are ignored
during comparison (e.g., static values like `language.direction`).
Author: Ludy87
Updated for TOML format
Example:
To use this script, simply run it from command line:
$ python counter_translation_v3.py
""" # noqa: D205
Usage:
Run this script directly from the project root.
# --- Compare all translation files and update README.md ---
$ python scripts/counter_translation.py
This will:
• Compare all files matching messages_*.properties
• Update progress badges in README.md
• Update/format ignore_translation.toml automatically
# --- Check a single language file ---
$ python scripts/counter_translation.py --lang messages_fr_FR.properties
This will:
• Compare the French translation file against the English reference
• Print the translation percentage in the console
# --- Print ONLY the percentage (for CI pipelines or automation) ---
$ python scripts/counter_translation.py --lang messages_fr_FR.properties --show-percentage
Example output:
87
Arguments:
-l, --lang <file> Specific properties file to check
(relative or absolute path).
--show-percentage Print only the percentage (no formatting, ideal for CI/CD).
--show-missing-keys Show the list of missing keys when checking a single language file.
"""
import argparse
import glob
import os
import re
import sys
from typing import Iterable
import tomlkit
import tomlkit.toml_file
@@ -23,14 +58,15 @@ import tomlkit.toml_file
def convert_to_multiline(data: tomlkit.TOMLDocument) -> tomlkit.TOMLDocument:
"""Converts 'ignore' and 'missing' arrays to multiline arrays and sorts the first-level keys of the TOML document.
Enhances readability and consistency in the TOML file by ensuring arrays contain unique and sorted entries.
Parameters:
Args:
data (tomlkit.TOMLDocument): The original TOML document containing the data.
Returns:
tomlkit.TOMLDocument: A new TOML document with sorted keys and properly formatted arrays.
""" # noqa: D205
"""
sorted_data = tomlkit.document()
for key in sorted(data.keys()):
value = data[key]
@@ -53,16 +89,19 @@ def convert_to_multiline(data: tomlkit.TOMLDocument) -> tomlkit.TOMLDocument:
def write_readme(progress_list: list[tuple[str, int]]) -> None:
"""Updates the progress status in the README.md file based
on the provided progress list.
"""Updates the progress status in the README.md file based on the provided progress list.
Parameters:
This function reads the existing README.md content, identifies lines containing
language-specific progress badges, and replaces the percentage values and URLs
with the new progress data.
Args:
progress_list (list[tuple[str, int]]): A list of tuples containing
language and progress percentage.
language codes (e.g., 'fr_FR') and progress percentages (integers from 0 to 100).
Returns:
None
""" # noqa: D205
"""
with open("README.md", encoding="utf-8") as file:
content = file.readlines()
@@ -80,70 +119,111 @@ def write_readme(progress_list: list[tuple[str, int]]) -> None:
file.writelines(content)
def parse_toml_file(file_path):
"""
Parses a TOML translation file and returns a flat dictionary of all keys.
:param file_path: Path to the TOML file.
:return: Dictionary with flattened keys and values.
"""
with open(file_path, "r", encoding="utf-8") as file:
data = tomlkit.parse(file.read())
def load_reference_keys(default_file_path: str) -> set[str]:
"""Reads all keys from the reference properties file (excluding comments and empty lines).
def flatten_dict(d, parent_key="", sep="."):
items = {}
for k, v in d.items():
new_key = f"{parent_key}{sep}{k}" if parent_key else k
if isinstance(v, dict):
items.update(flatten_dict(v, new_key, sep=sep))
else:
items[new_key] = v
return items
This function skips the first 5 lines (assumed to be headers or metadata) and then
extracts keys from lines containing '=' separators, ignoring comments (#) and empty lines.
It also handles potential BOM (Byte Order Mark) characters.
return flatten_dict(data)
Args:
default_file_path (str): The path to the default (reference) properties file.
Returns:
set[str]: A set of unique keys found in the reference file.
"""
keys: set[str] = set()
with open(default_file_path, encoding="utf-8") as f:
# Skip the first 5 lines (headers)
for _ in range(5):
try:
next(f)
except StopIteration:
break
for line in f:
s = line.strip()
if not s or s.startswith("#") or "=" not in s:
continue
k, _ = s.split("=", 1)
keys.add(k.strip().replace("\ufeff", "")) # BOM protection
return keys
def _lang_from_path(file_path: str) -> str:
"""Extracts the language code from a properties file path.
Assumes the filename format is 'messages_<language>.properties', where <language>
is the code like 'fr_FR'.
Args:
file_path (str): The full path to the properties file.
Returns:
str: The extracted language code.
"""
return (
os.path.basename(file_path).split("messages_", 1)[1].split(".properties", 1)[0]
)
def compare_files(
default_file_path, file_paths, ignore_translation_file
default_file_path: str,
file_paths: Iterable[str],
ignore_translation_file: str,
show_missing_keys: bool = False,
show_percentage: bool = False,
) -> list[tuple[str, int]]:
"""Compares the default TOML translation file with other
translation files in the locales directory.
"""Compares the default properties file with other properties files in the directory.
Parameters:
default_file_path (str): The path to the default translation TOML file.
file_paths (list): List of paths to translation TOML files.
ignore_translation_file (str): Path to the TOML file with ignore rules.
This function calculates translation progress for each language file by comparing
keys and values line-by-line, skipping headers. It accounts for ignored keys defined
in a TOML configuration file and updates that file with cleaned ignore lists.
English variants (en_GB, en_US) are hardcoded to 100% progress.
Args:
default_file_path (str): The path to the default properties file (reference).
file_paths (Iterable[str]): Iterable of paths to properties files to compare.
ignore_translation_file (str): Path to the TOML file with ignore/missing configurations per language.
show_missing_keys (bool, optional): If True, prints the list of missing keys for each file. Defaults to False.
show_percentage (bool, optional): If True, suppresses detailed output and focuses on percentage calculation. Defaults to False.
Returns:
list[tuple[str, int]]: A list of tuples containing
language and progress percentage.
""" # noqa: D205
default_keys = parse_toml_file(default_file_path)
num_keys = len(default_keys)
list[tuple[str, int]]: A sorted list of tuples containing language codes and progress percentages
(descending order by percentage). Duplicates are removed.
"""
# Count total translatable lines in reference (excluding empty and comments)
num_lines = sum(
1
for line in open(default_file_path, encoding="utf-8")
if line.strip() and not line.strip().startswith("#")
)
result_list = []
ref_keys: set[str] = load_reference_keys(default_file_path)
result_list: list[tuple[str, int]] = []
sort_ignore_translation: tomlkit.TOMLDocument
# read toml
with open(ignore_translation_file, encoding="utf-8") as f:
sort_ignore_translation = tomlkit.parse(f.read())
# Read or initialize TOML config
if os.path.exists(ignore_translation_file):
with open(ignore_translation_file, encoding="utf-8") as f:
sort_ignore_translation = tomlkit.parse(f.read())
else:
sort_ignore_translation = tomlkit.document()
for file_path in file_paths:
# Extract language code from directory name
locale_dir = os.path.basename(os.path.dirname(file_path))
language = _lang_from_path(file_path)
# Convert locale format from hyphen to underscore for TOML compatibility
# e.g., en-GB -> en_GB, sr-LATN-RS -> sr_LATN_RS
language = locale_dir.replace("-", "_")
fails = 0
if language in ["en_GB", "en_US"]:
result_list.append(("en_GB", 100))
result_list.append(("en_US", 100))
# Hardcode English variants to 100%
if "en_GB" in language or "en_US" in language:
result_list.append((language, 100))
continue
# Initialize language table in TOML if missing
if language not in sort_ignore_translation:
sort_ignore_translation[language] = tomlkit.table()
# Ensure default ignore list if empty
if (
"ignore" not in sort_ignore_translation[language]
or len(sort_ignore_translation[language].get("ignore", [])) < 1
@@ -152,53 +232,182 @@ def compare_files(
["language.direction"]
)
current_keys = parse_toml_file(file_path)
# Clean up ignore list to only include keys present in reference
sort_ignore_translation[language]["ignore"] = [
key
for key in sort_ignore_translation[language]["ignore"]
if key in ref_keys or key == "language.direction"
]
# Compare keys
for default_key, default_value in default_keys.items():
if default_key not in current_keys:
# Key is missing entirely
if default_key not in sort_ignore_translation[language]["ignore"]:
print(f"{language}: Key '{default_key}' is missing.")
fails += 1
elif (
default_value == current_keys[default_key]
and default_key not in sort_ignore_translation[language]["ignore"]
fails = 0
missing_str_keys: list[str] = []
with (
open(default_file_path, encoding="utf-8") as default_file,
open(file_path, encoding="utf-8") as file,
):
# Skip headers (first 5 lines) in both files
for _ in range(5):
next(default_file)
try:
next(file)
except StopIteration:
fails = num_lines
break
for line_num, (line_default, line_file) in enumerate(
zip(default_file, file), start=6
):
# Key exists but value is untranslated (same as reference)
print(f"{language}: Key '{default_key}' is missing the translation.")
fails += 1
elif default_value != current_keys[default_key]:
# Key is translated, remove from ignore list if present
if default_key in sort_ignore_translation[language]["ignore"]:
sort_ignore_translation[language]["ignore"].remove(default_key)
try:
# Ignoring empty lines and lines starting with #
if line_default.strip() == "" or line_default.startswith("#"):
continue
default_key, default_value = line_default.split("=", 1)
file_key, file_value = line_file.split("=", 1)
default_key = default_key.strip()
default_value = default_value.strip()
file_key = file_key.strip()
file_value = file_value.strip()
if (
default_value == file_value
and default_key
not in sort_ignore_translation[language]["ignore"]
):
# Missing translation (same as default and not ignored)
fails += 1
missing_str_keys.append(default_key)
if default_value != file_value:
if default_key in sort_ignore_translation[language]["ignore"]:
# Remove from ignore if actually translated
sort_ignore_translation[language]["ignore"].remove(
default_key
)
except ValueError as e:
print(f"Error processing line {line_num} in {file_path}: {e}")
print(f"{line_default}|{line_file}")
sys.exit(1)
except IndexError:
# Handle mismatched line counts
fails += 1
continue
if show_missing_keys:
if len(missing_str_keys) > 0:
print(f" Missing keys: {missing_str_keys}")
else:
print(" No missing keys!")
if not show_percentage:
print(f"{language}: {fails} out of {num_lines} lines are not translated.")
print(f"{language}: {fails} out of {num_keys} keys are not translated.")
result_list.append(
(
language,
int((num_keys - fails) * 100 / num_keys),
int((num_lines - fails) * 100 / num_lines),
)
)
# Write cleaned and formatted TOML back
ignore_translation = convert_to_multiline(sort_ignore_translation)
with open(ignore_translation_file, "w", encoding="utf-8", newline="\n") as file:
file.write(tomlkit.dumps(ignore_translation))
# Remove duplicates and sort by percentage descending
unique_data = list(set(result_list))
unique_data.sort(key=lambda x: x[1], reverse=True)
return unique_data
if __name__ == "__main__":
directory = os.path.join(os.getcwd(), "frontend", "public", "locales")
translation_file_paths = glob.glob(os.path.join(directory, "*", "translation.toml"))
reference_file = os.path.join(directory, "en-GB", "translation.toml")
def main() -> None:
"""Main entry point for the script.
scripts_directory = os.path.join(os.getcwd(), "scripts")
Parses command-line arguments and either processes a single language file
(with optional percentage output) or all files and updates the README.md.
Command-line options:
--lang, -l <file>: Specific properties file to check (e.g., 'messages_fr_FR.properties').
--show-percentage: Print only the translation percentage for --lang and exit.
--show-missing-keys: Show the list of missing keys when checking a single language file.
"""
parser = argparse.ArgumentParser(
description="Compare i18n property files and optionally update README badges."
)
parser.add_argument(
"--lang",
"-l",
help=(
"Specific properties file to check, e.g. 'messages_fr_FR.properties'. "
"If a relative filename is given, it is resolved against the resources directory."
),
)
parser.add_argument(
"--show-percentage",
"-sp",
action="store_true",
help="Print ONLY the translation percentage for --lang and exit.",
)
parser.add_argument(
"--show-missing-keys",
"-smk",
action="store_true",
help="Show the list of missing keys when checking a single language file.",
)
args = parser.parse_args()
# Project layout assumptions
cwd = os.getcwd()
resources_dir = os.path.join(cwd, "app", "core", "src", "main", "resources")
reference_file = os.path.join(resources_dir, "messages_en_GB.properties")
scripts_directory = os.path.join(cwd, "scripts")
translation_state_file = os.path.join(scripts_directory, "ignore_translation.toml")
write_readme(
compare_files(reference_file, translation_file_paths, translation_state_file)
if args.lang:
# Resolve provided path
lang_input = args.lang
if os.path.isabs(lang_input) or os.path.exists(lang_input):
lang_file = lang_input
else:
lang_file = os.path.join(resources_dir, lang_input)
if not os.path.exists(lang_file):
print(f"ERROR: Could not find language file: {lang_file}")
sys.exit(2)
results = compare_files(
reference_file,
[lang_file],
translation_state_file,
args.show_missing_keys,
args.show_percentage,
)
# Find the exact tuple for the requested language
wanted_key = _lang_from_path(lang_file)
for lang, pct in results:
if lang == wanted_key:
if args.show_percentage:
# Print ONLY the number
print(pct)
return
else:
print(f"{lang}: {pct}% translated")
return
# Fallback (should not happen)
print("ERROR: Language not found in results.")
sys.exit(3)
# Default behavior (no --lang): process all and update README
messages_file_paths = glob.glob(
os.path.join(resources_dir, "messages_*.properties")
)
progress = compare_files(
reference_file, messages_file_paths, translation_state_file
)
write_readme(progress)
if __name__ == "__main__":
main()
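
The progress calculation in the script above can be illustrated with a small, self-contained sketch. It is not the script's implementation: the script pairs lines by position and skips five header lines, whereas this sketch matches entries by key and works on in-memory strings. The names `parse_properties`, `translation_progress`, `REFERENCE`, and `FRENCH` are hypothetical and exist only for illustration.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    entries: dict[str, str] = {}
    for raw in text.splitlines():
        # str.strip() does not remove a BOM, so strip it explicitly.
        line = raw.strip().lstrip("\ufeff")
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip()
    return entries


def translation_progress(reference: str, translated: str, ignored: set[str]) -> int:
    """Return the percentage of reference keys that count as translated.

    Assumption for this sketch: a key counts as untranslated when it is
    missing or its value equals the reference value, unless it is ignored.
    """
    ref = parse_properties(reference)
    cur = parse_properties(translated)
    if not ref:
        return 100
    fails = sum(
        1
        for key, value in ref.items()
        if key not in ignored and cur.get(key, value) == value
    )
    return int((len(ref) - fails) * 100 / len(ref))


# Hypothetical sample data, not taken from the real properties files.
REFERENCE = "language.direction=ltr\nhome.title=Home\nsave=Save\ncancel=Cancel"
FRENCH = "language.direction=ltr\nhome.title=Accueil\nsave=Save\ncancel=Annuler"

print(translation_progress(REFERENCE, FRENCH, {"language.direction"}))  # 75
```

Matching by key rather than by line position would tolerate reordered or missing lines, at the cost of no longer detecting files whose layout has drifted from the reference.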


@@ -29,6 +29,11 @@ if /I not "%confirm%"=="Y" (
echo Starting generation...
echo Generating .github\scripts\requirements_dev.txt
pip-compile --allow-unsafe --generate-hashes --upgrade --strip-extras ^
--output-file=".github\scripts\requirements_dev.txt" ^
".github\scripts\requirements_dev.in"
echo Generating .github\scripts\requirements_pre_commit.txt
pip-compile --generate-hashes --upgrade --strip-extras ^
--output-file=".github\scripts\requirements_pre_commit.txt" ^


@@ -3,6 +3,7 @@ ignore = [
'lang.div',
'lang.dzo',
'lang.que',
'language.direction',
]
[az_AZ]
@@ -36,10 +37,7 @@ ignore = [
[bg_BG]
ignore = [
'lang.div',
'lang.dzo',
'lang.iku',
'lang.que',
'language.direction',
]
@@ -50,6 +48,7 @@ ignore = [
[ca_CA]
ignore = [
'adminUserSettings.admin',
'lang.amh',
'lang.ceb',
'lang.chr',
@@ -190,7 +189,8 @@ ignore = [
ignore = [
'AddStampRequest.alphabet',
'AddStampRequest.position',
'PDFToBook.selectText.1',
'PDFToText.tags',
'addPageNumbers.selectText.3',
'adminUserSettings.team',
'alphabet',
'audit.dashboard.modal.id',
@@ -200,6 +200,7 @@ ignore = [
'audit.dashboard.table.details',
'audit.dashboard.table.id',
'certSign.name',
'cookieBanner.popUp.acceptAllBtn',
'endpointStatistics.top10',
'endpointStatistics.top20',
'fileChooser.dragAndDrop',
@@ -236,11 +237,11 @@ ignore = [
'pipelineOptions.pipelineHeader',
'pro',
'redact.zoom',
'scannerEffect.quality.medium',
'sponsor',
'team.status',
'text',
'update.version',
'validateSignature.cert.bits',
'validateSignature.cert.version',
'validateSignature.status',
'watermark.type.1',
@@ -262,11 +263,17 @@ ignore = [
[es_ES]
ignore = [
'audit.dashboard.export.csv',
'audit.dashboard.export.json',
'audit.dashboard.modal.id',
'audit.dashboard.table.id',
'error',
'fileChooser.click',
'lang.ceb',
'lang.chr',
'lang.div',
'lang.dzo',
'lang.epo',
'lang.fil',
'lang.guj',
'lang.iku',
@@ -274,6 +281,7 @@ ignore = [
'lang.lao',
'lang.mal',
'lang.ori',
'lang.que',
'lang.snd',
'lang.tam',
'lang.tel',
@@ -281,7 +289,12 @@ ignore = [
'lang.yor',
'language.direction',
'no',
'pro',
'redact.zoom',
'scannerEffect.colorspace.color',
'showJS.tags',
'update.priority.normal',
'validateSignature.cert.bits',
]
[eu_ES]
@@ -307,50 +320,88 @@ ignore = [
]
[fa_IR]
ignore = []
ignore = [
'language.direction',
]
[fr_FR]
ignore = [
'AddStampRequest.alphabet',
'AddStampRequest.position',
'AddStampRequest.rotation',
'PDFToBook.selectText.1',
'addPageNumbers.selectText.3',
'adminUserSettings.actions',
'alphabet',
'audit.dashboard.modal.id',
'audit.dashboard.modal.type',
'audit.dashboard.pagination.pageInfo1',
'audit.dashboard.table.id',
'audit.dashboard.table.type',
'compare.document.1',
'compare.document.2',
'cookieBanner.preferencesModal.analytics.posthog.label',
'cookieBanner.preferencesModal.analytics.scarf.label',
'cookieBanner.preferencesModal.serviceCounterLabel',
'endpointStatistics.top',
'endpointStatistics.top10',
'endpointStatistics.top20',
'home.pipeline.title',
'lang.afr',
'lang.ben',
'lang.bre',
'lang.cat',
'lang.ceb',
'lang.chr',
'lang.div',
'lang.dzo',
'lang.eus',
'lang.guj',
'lang.hin',
'lang.iku',
'lang.kan',
'lang.kaz',
'lang.khm',
'lang.lao',
'lang.ltz',
'lang.lat',
'lang.mal',
'lang.mar',
'lang.mri',
'lang.oci',
'lang.ori',
'lang.osd',
'lang.pan',
'lang.pus',
'lang.que',
'lang.san',
'lang.snd',
'lang.swa',
'lang.tel',
'lang.tam',
'lang.tat',
'lang.tgl',
'lang.tir',
'lang.yid',
'lang.yor',
'language.direction',
'licenses.license',
'licenses.module',
'licenses.nav',
'licenses.version',
'multiTool.page',
'page',
'pages',
'pdfOrganiser.mode',
'pipeline.title',
'pro',
'redact.pageRedactionNumbers.title',
'redact.zoom',
'showJS.tags',
'split.desc.3',
'split.desc.6',
'split.desc.7',
'split.desc.8',
'update.version',
'validateSignature.cert.bits',
'validateSignature.cert.version',
'validateSignature.date',
'validateSignature.signature',
'watermark.type.2',
]
@@ -384,7 +435,6 @@ ignore = [
[hr_HR]
ignore = [
'PDFToBook.selectText.1',
'lang.ceb',
'lang.chr',
'lang.dzo',
@@ -400,13 +450,14 @@ ignore = [
[hu_HU]
ignore = [
'audit.dashboard.export.json',
'audit.dashboard.modal.id',
'audit.dashboard.table.id',
'endpointStatistics.top10',
'endpointStatistics.top20',
'home.pipeline.title',
'language.direction',
'pipeline.title',
'pipelineOptions.pipelineHeader',
'pro',
'showJS.tags',
]
@@ -515,11 +566,6 @@ ignore = [
'language.direction',
]
[ml_ML]
ignore = [
'language.direction',
]
[nl_NL]
ignore = [
'compare.document.1',
@@ -556,13 +602,11 @@ ignore = [
'lang.urd',
'lang.yor',
'language.direction',
'navbar.allTools',
'sponsor',
]
[no_NB]
ignore = [
'PDFToBook.selectText.1',
'adminUserSettings.admin',
'info',
'lang.afr',
@@ -609,12 +653,12 @@ ignore = [
'lang.urd',
'lang.yor',
'language.direction',
'oops',
'sponsor',
]
[pl_PL]
ignore = [
'PDFToBook.selectText.1',
'lang.afr',
'lang.bre',
'lang.ceb',
@@ -684,6 +728,12 @@ ignore = [
[pt_PT]
ignore = [
'audit.dashboard.table.id',
'endpointStatistics.endpoint',
'endpointStatistics.login',
'endpointStatistics.top',
'endpointStatistics.top10',
'endpointStatistics.top20',
'lang.bre',
'lang.ceb',
'lang.chr',
@@ -710,6 +760,8 @@ ignore = [
'lang.uzb',
'lang.yid',
'language.direction',
'pro',
'update.priority.normal',
]
[ro_RO]
@@ -763,6 +815,7 @@ ignore = [
[sk_SK]
ignore = [
'adminUserSettings.admin',
'home.multiTool.title',
'info',
'lang.ceb',
'lang.chr',
@@ -784,6 +837,7 @@ ignore = [
'lang.urd',
'lang.uzb',
'language.direction',
'navbar.sections.security',
'text',
'watermark.type.1',
]
@@ -833,6 +887,7 @@ ignore = [
'endpointStatistics.top10',
'endpointStatistics.top20',
'font',
'info',
'lang.div',
'lang.epo',
'lang.hin',
@@ -890,6 +945,7 @@ ignore = [
'lang.tir',
'lang.uzb_cyrl',
'language.direction',
'pipelineOptions.pipelineHeader',
'showJS.tags',
]
@@ -963,15 +1019,11 @@ ignore = [
'lang.yid',
'lang.yor',
'language.direction',
'pipeline.title',
'pipelineOptions.pipelineHeader',
'showJS.tags',
]
[zh_BO]
ignore = [
'language.direction',
]
[zh_CN]
ignore = [
'language.direction',
@@ -980,5 +1032,6 @@ ignore = [
[zh_TW]
ignore = [
'language.direction',
'poweredBy',
'showJS.tags',
]
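
The per-language `ignore` lists above are cleaned by the translation script on every run. A minimal sketch of that cleanup follows, assuming the TOML has already been parsed into a plain mapping (the script itself uses `tomlkit`). The function name `cleaned_ignore_list`, the sample mapping, and the key `removed.key` are hypothetical. The sketch covers only the default entry, the filtering against reference keys, and the sorting and de-duplication. It does not cover removing a key once it has actually been translated.

```python
DEFAULT_IGNORE = "language.direction"


def cleaned_ignore_list(config: dict, language: str, reference_keys: set[str]) -> list[str]:
    """Return the sorted, de-duplicated ignore list for one language."""
    entries = config.get(language, {}).get("ignore", [])
    if not entries:
        # An empty or missing list falls back to the default entry.
        entries = [DEFAULT_IGNORE]
    kept = {
        key
        for key in entries
        if key in reference_keys or key == DEFAULT_IGNORE
    }
    return sorted(kept)


# Hypothetical parsed configuration; 'removed.key' stands for a stale entry.
parsed = {
    "fa_IR": {"ignore": []},
    "zh_TW": {
        "ignore": ["showJS.tags", "language.direction", "poweredBy", "removed.key"]
    },
}
reference_keys = {"language.direction", "poweredBy", "showJS.tags"}

print(cleaned_ignore_list(parsed, "fa_IR", reference_keys))
# ['language.direction']
print(cleaned_ignore_list(parsed, "zh_TW", reference_keys))
# ['language.direction', 'poweredBy', 'showJS.tags']
```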


@@ -1,41 +1,188 @@
#!/bin/bash
# This script initializes Stirling PDF without OCR features.
set -euo pipefail
export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
echo "running with JAVA_TOOL_OPTIONS ${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
log() { printf '%s\n' "$*" >&2; }
command_exists() { command -v "$1" >/dev/null 2>&1; }
# Update the user and group IDs as per environment variables
if [ ! -z "$PUID" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
usermod -o -u "$PUID" stirlingpdfuser || true
SU_EXEC_BIN=""
if command_exists su-exec; then
SU_EXEC_BIN="su-exec"
elif command_exists gosu; then
SU_EXEC_BIN="gosu"
fi
CURRENT_USER="$(id -un)"
CURRENT_UID="$(id -u)"
SWITCH_USER_WARNING_EMITTED=false
if [ ! -z "$PGID" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
groupmod -o -g "$PGID" stirlingpdfgroup || true
fi
umask "$UMASK" || true
warn_switch_user_once() {
if [ "$SWITCH_USER_WARNING_EMITTED" = false ]; then
log "WARNING: Unable to switch to user ${RUNTIME_USER:-stirlingpdfuser}; running command as ${CURRENT_USER}."
SWITCH_USER_WARNING_EMITTED=true
fi
}
if [[ "$INSTALL_BOOK_AND_ADVANCED_HTML_OPS" == "true" && "$FAT_DOCKER" != "true" ]]; then
echo "issue with calibre in current version, feature currently disabled on Stirling-PDF"
#apk add --no-cache calibre@testing
run_as_runtime_user() {
if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
"$@"
elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
"$SU_EXEC_BIN" "$RUNTIME_USER" "$@"
else
warn_switch_user_once
"$@"
fi
}
# ---------- VERSION_TAG ----------
# Load VERSION_TAG from file if not provided via environment.
if [ -z "${VERSION_TAG:-}" ] && [ -f /etc/stirling_version ]; then
VERSION_TAG="$(tr -d '\r\n' < /etc/stirling_version)"
export VERSION_TAG
fi
# Security jar is now built into the application jar during Docker build
# No need to download it separately
# ---------- JAVA_OPTS ----------
# Configure Java runtime options.
export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS:-} ${JAVA_CUSTOM_OPTS:-}"
export JAVA_TOOL_OPTIONS="-Djava.awt.headless=true ${JAVA_TOOL_OPTIONS}"
log "running with JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
log "Running Stirling PDF with DISABLE_ADDITIONAL_FEATURES=${DISABLE_ADDITIONAL_FEATURES:-} and VERSION_TAG=${VERSION_TAG:-<unset>}"
if [[ -n "$LANGS" ]]; then
/scripts/installFonts.sh $LANGS
fi
# ---------- UMASK ----------
# Set default permissions mask.
UMASK_VAL="${UMASK:-022}"
umask "$UMASK_VAL" 2>/dev/null || umask 022
echo "Setting permissions and ownership for necessary directories..."
# Ensure temp directory exists and has correct permissions
mkdir -p /tmp/stirling-pdf || true
# Attempt to change ownership of directories and files
if chown -R stirlingpdfuser:stirlingpdfgroup $HOME /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar; then
chmod -R 755 /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar || true
# If chown succeeds, execute the command as stirlingpdfuser
exec su-exec stirlingpdfuser "$@"
# ---------- XDG_RUNTIME_DIR ----------
# Create the runtime directory, respecting UID/GID settings.
RUNTIME_USER="stirlingpdfuser"
if id -u "$RUNTIME_USER" >/dev/null 2>&1; then
RUID="$(id -u "$RUNTIME_USER")"
RGRP="$(id -gn "$RUNTIME_USER")"
else
# If chown fails, execute the command without changing the user context
echo "[WARN] Chown failed, running as host user"
exec "$@"
RUID="$(id -u)"
RGRP="$(id -gn)"
RUNTIME_USER="$(id -un)"
fi
CURRENT_USER="$(id -un)"
CURRENT_UID="$(id -u)"
export XDG_RUNTIME_DIR="/tmp/xdg-${RUID}"
mkdir -p "${XDG_RUNTIME_DIR}" || true
if [ "$(id -u)" -eq 0 ]; then
chown "${RUNTIME_USER}:${RGRP}" "${XDG_RUNTIME_DIR}" 2>/dev/null || true
fi
chmod 700 "${XDG_RUNTIME_DIR}" 2>/dev/null || true
log "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}"
# ---------- Optional ----------
# Disable advanced HTML operations if required.
if [[ "${INSTALL_BOOK_AND_ADVANCED_HTML_OPS:-false}" == "true" && "${FAT_DOCKER:-true}" != "true" ]]; then
log "issue with calibre in current version, feature currently disabled on Stirling-PDF"
fi
# Download security JAR in non-fat builds.
if [[ "${FAT_DOCKER:-true}" != "true" && -x /scripts/download-security-jar.sh ]]; then
/scripts/download-security-jar.sh || true
fi
# ---------- UID/GID remap ----------
# Remap user/group IDs to match container runtime settings.
if [ "$(id -u)" -eq 0 ]; then
if id -u stirlingpdfuser >/dev/null 2>&1; then
if [ -n "${PUID:-}" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
usermod -o -u "$PUID" stirlingpdfuser || true
chown stirlingpdfuser:stirlingpdfgroup "${XDG_RUNTIME_DIR}" 2>/dev/null || true
fi
fi
if getent group stirlingpdfgroup >/dev/null 2>&1; then
if [ -n "${PGID:-}" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
groupmod -o -g "$PGID" stirlingpdfgroup || true
fi
fi
fi
# ---------- Permissions ----------
# Ensure required directories exist and set correct permissions.
log "Setting permissions..."
mkdir -p /tmp/stirling-pdf /logs /configs /customFiles /pipeline || true
CHOWN_PATHS=("$HOME" "/logs" "/scripts" "/configs" "/customFiles" "/pipeline" "/tmp/stirling-pdf" "/app.jar")
[ -d /usr/share/fonts/truetype ] && CHOWN_PATHS+=("/usr/share/fonts/truetype")
CHOWN_OK=true
for p in "${CHOWN_PATHS[@]}"; do
if [ -e "$p" ]; then
chown -R "stirlingpdfuser:stirlingpdfgroup" "$p" 2>/dev/null || CHOWN_OK=false
chmod -R 755 "$p" 2>/dev/null || true
fi
done
# ---------- Xvfb ----------
# Start a virtual framebuffer for GUI-based LibreOffice interactions.
if command_exists Xvfb; then
log "Starting Xvfb on :99"
Xvfb :99 -screen 0 1024x768x24 -ac +extension GLX +render -noreset > /dev/null 2>&1 &
export DISPLAY=:99
sleep 1
else
log "Xvfb not installed; skipping virtual display setup"
fi
# ---------- unoserver ----------
# Start LibreOffice UNO server for document conversions.
UNOSERVER_BIN="$(command -v unoserver || true)"
UNOCONVERT_BIN="$(command -v unoconvert || true)"
UNOSERVER_PID=""
if [ -n "$UNOSERVER_BIN" ] && [ -n "$UNOCONVERT_BIN" ]; then
  LIBREOFFICE_PROFILE="${HOME:-/home/${RUNTIME_USER}}/.libreoffice_uno_${RUID}"
  run_as_runtime_user mkdir -p "$LIBREOFFICE_PROFILE"
  log "Starting unoserver on 127.0.0.1:2003"
  run_as_runtime_user "$UNOSERVER_BIN" \
    --interface 127.0.0.1 \
    --port 2003 \
    --uno-port 2004 \
    &
  UNOSERVER_PID=$!
  log "unoserver PID: $UNOSERVER_PID (Profile: $LIBREOFFICE_PROFILE)"
  # Wait until UNO server is ready.
  log "Waiting for unoserver..."
  for _ in {1..20}; do
    if run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
      log "unoserver is ready!"
      break
    fi
    sleep 1
  done
  if ! run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
    log "ERROR: unoserver failed!"
    if [ -n "$UNOSERVER_PID" ]; then
      kill "$UNOSERVER_PID" 2>/dev/null || true
      wait "$UNOSERVER_PID" 2>/dev/null || true
    fi
    exit 1
  fi
else
  log "unoserver/unoconvert not installed; skipping UNO setup"
fi
# ---------- Java ----------
# Start Stirling PDF Java application.
log "Starting Stirling PDF"
JAVA_CMD=(
  java
  -Dfile.encoding=UTF-8
  -Djava.io.tmpdir=/tmp/stirling-pdf
  -jar /app.jar
)
if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
  exec "${JAVA_CMD[@]}"
elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
  exec "$SU_EXEC_BIN" "$RUNTIME_USER" "${JAVA_CMD[@]}"
else
  warn_switch_user_once
  exec "${JAVA_CMD[@]}"
fi
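
The unoserver startup above polls a probe command up to 20 times, one second apart, and aborts if the probe never succeeds. A minimal sketch of that bounded-retry pattern as a reusable function follows. The name `wait_until` and its arguments are hypothetical and not part of the Stirling-PDF scripts; the probes used here (`true`, `false`) are stand-ins for a real readiness check.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Stirling-PDF scripts): run a probe
# command until it succeeds or the attempt budget is used up.
#   $1 = maximum number of attempts
#   $2 = delay in seconds between attempts
#   remaining arguments = the probe command and its arguments
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Stand-in probes: one that succeeds immediately, one that never does.
if wait_until 3 0 true; then
  echo "ready"
fi
if ! wait_until 2 0 false; then
  echo "timed out"
fi
```

Returning a status from the helper lets the caller decide what to do on failure (for example, kill the background process and exit), instead of running the probe a second time after the loop.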

@@ -1,36 +1,110 @@
#!/bin/bash
# This script initializes environment variables and paths,
# prepares Tesseract data directories, and then runs the main init script.
# Copy the original tesseract-ocr files to the volume directory without overwriting existing files
echo "Copying original files without overwriting existing files"
mkdir -p /usr/share/tessdata
cp -rn /usr/share/tessdata-original/* /usr/share/tessdata
set -euo pipefail
if [ -d /usr/share/tesseract-ocr/4.00/tessdata ]; then
cp -r /usr/share/tesseract-ocr/4.00/tessdata/* /usr/share/tessdata || true;
append_env_path() {
  local target="$1" current="$2" separator=":"
  if [ -d "$target" ] && [[ ":${current}:" != *":${target}:"* ]]; then
    if [ -n "$current" ]; then
      printf '%s' "${target}${separator}${current}"
    else
      printf '%s' "${target}"
    fi
  else
    printf '%s' "$current"
  fi
}
python_site_dir() {
  local venv_dir="$1"
  local python_bin="$venv_dir/bin/python"
  if [ -x "$python_bin" ]; then
    local py_tag
    if py_tag="$("$python_bin" -c 'import sys; print(f"python{sys.version_info.major}.{sys.version_info.minor}")' 2>/dev/null)" \
      && [ -n "$py_tag" ] \
      && [ -d "$venv_dir/lib/$py_tag/site-packages" ]; then
      printf '%s' "$venv_dir/lib/$py_tag/site-packages"
    fi
  fi
}
# === LD_LIBRARY_PATH ===
# Adjust the library path depending on CPU architecture.
ARCH=$(uname -m)
case "$ARCH" in
  x86_64)
    [ -d /usr/lib/x86_64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    ;;
  aarch64)
    [ -d /usr/lib/aarch64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/aarch64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    ;;
esac
# Add LibreOffice program directory to library path if available.
if [ -d /usr/lib/libreoffice/program ]; then
  export LD_LIBRARY_PATH="/usr/lib/libreoffice/program${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
# === Python PATH ===
# Add virtual environments to PATH and PYTHONPATH.
for dir in /opt/venv/bin /opt/unoserver-venv/bin; do
  PATH="$(append_env_path "$dir" "$PATH")"
done
export PATH
PYTHON_PATH_ENTRIES=()
for venv in /opt/venv /opt/unoserver-venv; do
  if [ -d "$venv" ]; then
    site_dir="$(python_site_dir "$venv")"
    [ -n "${site_dir:-}" ] && PYTHON_PATH_ENTRIES+=("$site_dir")
  fi
done
if [ ${#PYTHON_PATH_ENTRIES[@]} -gt 0 ]; then
  PYTHONPATH="$(IFS=:; printf '%s' "${PYTHON_PATH_ENTRIES[*]}")${PYTHONPATH:+:$PYTHONPATH}"
  export PYTHONPATH
fi
# # === tessdata ===
# # Prepare Tesseract OCR data directory.
REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
SEC_TESSDATA="/usr/share/tessdata"
log_warn() {
  echo "[init][warn] $*" >&2
}
if [ -d "$REAL_TESSDATA" ] && [ -w "$REAL_TESSDATA" ]; then
  log_warn "Skipping tessdata adjustments; directory writable: $REAL_TESSDATA"
else
  log_warn "Skipping tessdata adjustments; directory missing or not writable: $REAL_TESSDATA"
fi
if [ -d /usr/share/tesseract-ocr/5/tessdata ]; then
  cp -r /usr/share/tesseract-ocr/5/tessdata/* /usr/share/tessdata || true;
  REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
  log_warn "Using /usr/share/tesseract-ocr/5/tessdata as TESSDATA_PREFIX"
elif [ -d /usr/share/tessdata ]; then
  REAL_TESSDATA="/usr/share/tessdata"
  log_warn "Using /usr/share/tessdata as TESSDATA_PREFIX"
elif [ -d /tessdata ]; then
  REAL_TESSDATA="/tessdata"
  log_warn "Using /tessdata as TESSDATA_PREFIX"
else
  REAL_TESSDATA=""
  log_warn "No tessdata directory found"
fi
# Check if TESSERACT_LANGS environment variable is set and is not empty
if [[ -n "$TESSERACT_LANGS" ]]; then
# Convert comma-separated values to a space-separated list
SPACE_SEPARATED_LANGS=$(echo $TESSERACT_LANGS | tr ',' ' ')
pattern='^[a-zA-Z]{2,4}(_[a-zA-Z]{2,4})?$'
# Install each language pack
for LANG in $SPACE_SEPARATED_LANGS; do
if [[ $LANG =~ $pattern ]]; then
apk add --no-cache "tesseract-ocr-data-$LANG"
else
echo "Skipping invalid language code"
fi
done
if [ -n "$REAL_TESSDATA" ]; then
export TESSDATA_PREFIX="$REAL_TESSDATA"
fi
# Ensure temp directory exists with correct permissions before running main init
mkdir -p /tmp/stirling-pdf || true
# === Temp dir ===
# Ensure the temporary directory exists and has proper permissions.
mkdir -p /tmp/stirling-pdf
chown -R stirlingpdfuser:stirlingpdfgroup /tmp/stirling-pdf || true
chmod -R 755 /tmp/stirling-pdf || true
/scripts/init-without-ocr.sh "$@"
# === Start application ===
# Run the main init script that handles the full startup logic.
exec /scripts/init-without-ocr.sh
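
The `append_env_path` helper in this script is easiest to understand from its outputs. Despite its name, it puts `target` in front of `current`, and only when `target` is an existing directory that is not already listed. The sketch below copies the function from the diff and exercises it against a temporary directory; `demo_dir` is a hypothetical stand-in for paths such as `/opt/venv/bin`.

```shell
#!/bin/bash
# Copied from the script in this diff, with indentation restored.
append_env_path() {
  local target="$1" current="$2" separator=":"
  if [ -d "$target" ] && [[ ":${current}:" != *":${target}:"* ]]; then
    if [ -n "$current" ]; then
      printf '%s' "${target}${separator}${current}"
    else
      printf '%s' "${target}"
    fi
  else
    printf '%s' "$current"
  fi
}

# Hypothetical directory used only for this demonstration.
demo_dir="$(mktemp -d)"

# Existing directory, not yet listed: placed in front of the current value.
append_env_path "$demo_dir" "/usr/bin:/bin"; echo

# Already listed: the current value is returned unchanged (no duplicate).
append_env_path "$demo_dir" "$demo_dir:/usr/bin"; echo

# Directory does not exist: the current value is returned unchanged.
append_env_path "$demo_dir/missing" "/usr/bin"; echo

# Empty current value: only the target is returned, with no stray separator.
append_env_path "$demo_dir" ""; echo

rmdir "$demo_dir"
```

Wrapping the current value in separators (`":${current}:"`) before matching is what lets the check find the target at the start, middle, or end of the list without matching a longer path that merely contains it as a prefix.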