feat(docker-runtime): unified Debian-based images, dynamic path resolution & enhanced UNO/LibreOffice handling (#4880)

# Description of Changes

### What was changed

This PR significantly refines the Docker runtime, system path
resolution, conversion tooling, and integration logic across the
codebase. Key improvements include:

- Migration of **Dockerfile** and **Dockerfile.fat** to a unified
Debian-based environment.
- Enhancements to **RuntimePathConfig** that dynamically resolve:
  - `weasyprint`, `unoconvert`, `calibre`, `ocrmypdf`, `soffice`
  - Tesseract `tessdata` paths with Docker-aware defaults.
- Support for **UNO server (unoserver/unoconvert)** as the primary document
converter, with automatic fallback to `soffice`.
- Isolation of Python environments for WeasyPrint and UNO tooling.
- Updated controllers and services to correctly inject
`RuntimePathConfig`.
- Improved process execution logic in converters and OCR handling.
- Major updates to `init.sh` and `init-without-ocr.sh`:
  - Unified environment initialization
  - Proper UID/GID remapping
  - Safer permissions handling
  - Automatic Tesseract path detection
  - Reliable startup of headless LibreOffice + Xvfb + UNO server
- Full test suite updates:
  - Adaptation to new conversion paths
  - Mocking of UNO and LibreOffice commands
  - More robust Docker test logic
- Updated example docker-compose files referencing GHCR test images.
- Expanded configuration schema for new operations paths.
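
The UNO-first conversion path with a `soffice` fallback amounts to a two-step attempt. The Bash sketch below only illustrates that idea and is not the converter code from this PR: the function name `convert_to_pdf` and the `UNO_PORT` variable are hypothetical, and it assumes a unoserver instance is already listening on its default port (2003).

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a document to PDF, preferring a running
# unoserver (via unoconvert) and falling back to a one-shot headless soffice.
# UNO_PORT is an illustrative variable; 2003 is unoserver's default port.
UNO_PORT="${UNO_PORT:-2003}"

convert_to_pdf() {
  local input=$1 outdir=$2
  local name output
  name=$(basename "$input")
  # soffice names its output <basename without extension>.pdf; mirror that.
  output="$outdir/${name%.*}.pdf"

  # 1. Primary path: hand the job to the long-running UNO server.
  if command -v unoconvert >/dev/null 2>&1 &&
    unoconvert --port "$UNO_PORT" --convert-to pdf "$input" "$output" >/dev/null; then
    echo "unoconvert"
    return 0
  fi

  # 2. Fallback: launch a headless LibreOffice just for this file.
  if command -v soffice >/dev/null 2>&1 &&
    soffice --headless --convert-to pdf --outdir "$outdir" "$input" >/dev/null; then
    echo "soffice"
    return 0
  fi

  echo "conversion failed: $input" >&2
  return 1
}
```

Keeping a long-running unoserver avoids paying LibreOffice's startup cost on every request, while the one-shot `soffice` call keeps conversions working if the server is unavailable.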

### Why the change was made

These changes address long-standing issues around:

- Inconsistent or missing binary paths between image variants.
- Unreliable document conversions (UNO vs. `soffice`).
- Lack of uniform runtime initialization across Docker images.
- Duplicated environment setup logic spread across multiple scripts.
- Fragile test scenarios tied to Alpine-based images.
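
Uniform runtime initialization includes the UID/GID remapping that this PR adds to `init.sh` and `init-without-ocr.sh`. Those scripts are not part of this excerpt, so the Bash function below is only a sketch of the common entrypoint pattern: the function name `remap_user_ids`, the account names, and the `PUID`/`PGID` variables are assumptions. It must run as root and relies on the standard `getent`, `id`, `groupmod`, and `usermod` tools.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: remap a service account to host-provided IDs so that
# files written to bind mounts are owned by the host user. Requires root.
remap_user_ids() {
  local user=$1 group=$2 want_uid=$3 want_gid=$4
  local cur_uid cur_gid

  # Current numeric IDs (third field of the group entry, and the user's UID).
  cur_gid=$(getent group "$group" | cut -d: -f3)
  cur_uid=$(id -u "$user")

  # Only change the group if a GID was requested and it differs.
  # -o (--non-unique) permits a GID that another group already uses.
  if [ -n "$want_gid" ] && [ "$want_gid" != "$cur_gid" ]; then
    groupmod -o -g "$want_gid" "$group" || return 1
  fi

  # Same for the user's UID.
  if [ -n "$want_uid" ] && [ "$want_uid" != "$cur_uid" ]; then
    usermod -o -u "$want_uid" "$user" || return 1
  fi
}

# Example (account name and PUID/PGID are assumptions):
# remap_user_ids appuser appuser "${PUID:-}" "${PGID:-}"
```

Doing this in the entrypoint rather than at build time lets one image adapt to whichever host user owns the mounted volumes.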

Switching to a unified Debian-based runtime significantly improves:

- Compatibility with LibreOffice, Calibre, WebEngine and the graphics stack.
- UNO stability for document conversions.
- Deterministic Tesseract behavior.
- Debuggability and reliability of CI/CD Docker-based tests.
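
Deterministic OCR depends on Tesseract always loading its language models from the same `tessdata` directory. The PR only states that `init.sh` detects this path automatically, so the following Bash sketch is hypothetical: it honors an explicit `TESSDATA_PREFIX` (the environment variable Tesseract itself reads) and otherwise probes a few common distribution paths, which are assumed rather than taken from the PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first directory that contains Tesseract
# language models (*.traineddata). The candidate list is an assumption.
detect_tessdata_dir() {
  local candidate
  for candidate in \
    "${TESSDATA_PREFIX:-}" \
    /usr/share/tesseract-ocr/5/tessdata \
    /usr/share/tesseract-ocr/4.00/tessdata \
    /usr/share/tessdata; do
    # Skip empty entries (TESSDATA_PREFIX unset) and directories without models.
    [ -n "$candidate" ] || continue
    if ls "$candidate"/*.traineddata >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no tessdata directory found" >&2
  return 1
}

# Example: pin the result so tesseract and ocrmypdf share one directory.
# TESSDATA_PREFIX=$(detect_tessdata_dir) && export TESSDATA_PREFIX
```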

The improvements to `RuntimePathConfig` make the path of every system
binary configurable and ensure it is detected correctly at runtime.
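
The lookup order behind this kind of runtime detection (an explicit override first, then `PATH`, then an image-specific default) can be illustrated in Bash. `RuntimePathConfig` itself is not part of this excerpt, so the function `resolve_binary`, its arguments, and the example path are hypothetical and only model the general idea.

```bash
#!/usr/bin/env bash
# Hypothetical resolver: explicit override > PATH lookup > image default.
# Prints the resolved executable path, or fails if nothing usable is found.
resolve_binary() {
  local name=$1 override=$2 fallback=$3
  local found

  # 1. An explicitly configured path wins if it is executable.
  if [ -n "$override" ] && [ -x "$override" ]; then
    echo "$override"
    return 0
  fi

  # 2. Otherwise look the command up on PATH.
  if found=$(command -v "$name" 2>/dev/null) && [ -x "$found" ]; then
    echo "$found"
    return 0
  fi

  # 3. Finally try the image's conventional install location.
  if [ -n "$fallback" ] && [ -x "$fallback" ]; then
    echo "$fallback"
    return 0
  fi

  echo "could not resolve $name" >&2
  return 1
}

# Example (variable name and default path are illustrative):
# WEASYPRINT_BIN=$(resolve_binary weasyprint "${WEASYPRINT_PATH:-}" /opt/venv/bin/weasyprint)
```

Checking the override first means a differing `PATH` between image variants cannot silently shadow a path the operator configured explicitly.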

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### Translations (if applicable)

- [ ] I ran
[`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.
Ludy
2025-11-25 00:07:54 +01:00
committed by GitHub
parent 43345021bf
commit 886f9b379e
31 changed files with 1292 additions and 440 deletions


@@ -16,27 +16,47 @@ find_root() {
PROJECT_ROOT=$(find_root)
# Function to check the health of the service with a timeout of 80 seconds
# Function to check application readiness via HTTP instead of Docker's health status
check_health() {
local service_name=$1
local container_name=$1 # real container name
local compose_file=$2
local end=$((SECONDS+60))
local timeout=80 # total timeout in seconds
local interval=3 # poll interval in seconds
local end=$((SECONDS + timeout))
local last_code="000"
echo -n "Waiting for $service_name to become healthy..."
until [ "$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}healthy{{end}}' "$service_name")" == "healthy" ] || [ $SECONDS -ge $end ]; do
sleep 3
echo -n "."
if [ $SECONDS -ge $end ]; then
echo -e "\n$service_name health check timed out after 80 seconds."
echo "Printing logs for $service_name:"
docker logs "$service_name"
return 1
echo "Waiting for $container_name to become reachable on http://localhost:8080/ (timeout ${timeout}s)..."
while [ $SECONDS -lt $end ]; do
# Optional: check if container is running at all (nice for debugging)
if ! docker ps --format '{{.Names}}' | grep -Fxq "$container_name"; then
echo " Container $container_name not running yet (still waiting)..."
fi
# Try simple HTTP GET on the root page
last_code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:8080/") || last_code="000"
# Treat any 2xx or 3xx as "ready"
if [ "$last_code" -ge 200 ] && [ "$last_code" -lt 400 ]; then
echo "$container_name is reachable over HTTP (status $last_code)."
echo "Printing logs for $container_name:"
docker logs "$container_name" || true
return 0
fi
echo " Still waiting for HTTP readiness, current status: $last_code"
sleep "$interval"
done
echo -e "\n$service_name is healthy!"
echo "Printing logs for $service_name:"
docker logs "$service_name"
return 0
echo "$container_name did not become HTTP-ready within ${timeout}s (last HTTP status: $last_code)."
# For extra debugging: show Docker health status, but DO NOT depend on it
local docker_health
docker_health=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}(no healthcheck){{end}}' "$container_name" 2>/dev/null || echo "inspect failed")
echo "Docker-reported health status for $container_name: $docker_health"
echo "Printing logs for $container_name:"
docker logs "$container_name" || true
return 1
}
# Function to capture file list from a Docker container
@@ -48,7 +68,7 @@ capture_file_list() {
# Get all files in one command, output directly from Docker to avoid path issues
# Skip proc, sys, dev, and the specified LibreOffice config directory
# Also skip PDFBox and LibreOffice temporary files
docker exec $container_name sh -c "find / -type f \
docker exec "$container_name" sh -c "find / -type f \
-not -path '*/proc/*' \
-not -path '*/sys/*' \
-not -path '*/dev/*' \
@@ -69,7 +89,7 @@ capture_file_list() {
echo "Trying alternative approach..."
# Alternative simpler approach - just get paths as a fallback
docker exec $container_name sh -c "find / -type f \
docker exec "$container_name" sh -c "find / -type f \
-not -path '*/proc/*' \
-not -path '*/sys/*' \
-not -path '*/dev/*' \
@@ -106,14 +126,8 @@ compare_file_lists() {
# Check if files exist and have content
if [ ! -s "$before_file" ] || [ ! -s "$after_file" ]; then
echo "WARNING: One or both file lists are empty."
if [ ! -s "$before_file" ]; then
echo "Before file is empty: $before_file"
fi
if [ ! -s "$after_file" ]; then
echo "After file is empty: $after_file"
fi
if [ ! -s "$before_file" ]; then echo "Before file is empty: $before_file"; fi
if [ ! -s "$after_file" ]; then echo "After file is empty: $after_file"; fi
# Create empty diff file
> "$diff_file"
@@ -132,7 +146,6 @@ compare_file_lists() {
echo "No temporary files found in the after snapshot."
fi
fi
return 0
fi
@@ -169,7 +182,6 @@ compare_file_lists() {
else
echo "No file changes detected during test."
fi
return 0
}
@@ -220,19 +232,33 @@ verify_app_version() {
# Function to test a Docker Compose configuration
test_compose() {
local compose_file=$1
local service_name=$2
local test_name=$2
local status=0
echo "Testing $compose_file configuration..."
echo "Testing ${compose_file} configuration..."
# Start up the Docker Compose service
docker-compose -f "$compose_file" up -d
# Wait for the service to become healthy
if check_health "$service_name" "$compose_file"; then
echo "$service_name test passed."
# Wait a moment for containers to appear
sleep 3
local container_name
container_name=$(docker-compose -f "$compose_file" ps --format '{{.Names}}' --filter "status=running" | head -n1)
if [[ -z "$container_name" ]]; then
echo "ERROR: No running container found for ${compose_file}"
docker-compose -f "$compose_file" ps
return 1
fi
echo "Started container: $container_name"
# Wait for the service to become healthy (HTTP-based)
if check_health "$container_name" "$compose_file"; then
echo "${test_name} test passed."
else
echo "$service_name test failed."
echo "${test_name} test failed."
status=1
fi
@@ -246,7 +272,6 @@ declare -a failed_tests
run_tests() {
local test_name=$1
local compose_file=$2
if test_compose "$compose_file" "$test_name"; then
passed_tests+=("$test_name")
else
@@ -254,18 +279,18 @@ run_tests() {
fi
}
# Main testing routine
main() {
SECONDS=0
cd "$PROJECT_ROOT"
export DOCKER_CLI_EXPERIMENTAL=enabled
export COMPOSE_DOCKER_CLI_BUILD=0
export DISABLE_ADDITIONAL_FEATURES=true
# Run the gradlew build command and check if it fails
# ==================================================================
# 1. Ultra-Lite (no additional features)
# ==================================================================
export DISABLE_ADDITIONAL_FEATURES=true
if ! ./gradlew clean build; then
echo "Gradle build failed with security disabled, exiting script."
exit 1
@@ -276,11 +301,12 @@ main() {
EXPECTED_VERSION=$(get_expected_version)
echo "Expected version: $EXPECTED_VERSION"
# Building Docker images
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .
docker build --build-arg VERSION_TAG=alpha -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .
# Build Ultra-Lite image (GHCR tag, matching docker-compose-latest-ultra-lite.yml)
docker build --build-arg VERSION_TAG=alpha \
-t docker.stirlingpdf.com/stirlingtools/stirling-pdf:ultra-lite \
-f ./Dockerfile.ultra-lite .
# Test each configuration
# Test Ultra-Lite configuration
run_tests "Stirling-PDF-Ultra-Lite" "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml"
echo "Testing webpage accessibility..."
@@ -302,36 +328,27 @@ main() {
echo "Version verification failed for Stirling-PDF-Ultra-Lite"
fi
docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml" down
# run_tests "Stirling-PDF" "./exampleYmlFiles/docker-compose-latest.yml"
# docker-compose -f "./exampleYmlFiles/docker-compose-latest.yml" down
docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml" down -v
# ==================================================================
# 2. Full Fat + Security
# ==================================================================
export DISABLE_ADDITIONAL_FEATURES=false
# Run the gradlew build command and check if it fails
if ! ./gradlew clean build; then
echo "Gradle build failed with security enabled, exiting script."
exit 1
fi
# Get expected version after the security-enabled build
echo "Getting expected version from Gradle (security enabled)..."
EXPECTED_VERSION=$(get_expected_version)
echo "Expected version with security enabled: $EXPECTED_VERSION"
# Building Docker images with security enabled
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .
docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat -f ./Dockerfile.fat .
# Test each configuration with security
# run_tests "Stirling-PDF-Ultra-Lite-Security" "./exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml"
# docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml" down
# run_tests "Stirling-PDF-Security" "./exampleYmlFiles/docker-compose-latest-security.yml"
# docker-compose -f "./exampleYmlFiles/docker-compose-latest-security.yml" down
# Build Fat (Security) image for GHCR tag used in all 'fat' compose files
docker build --no-cache --pull --build-arg VERSION_TAG=alpha \
-t docker.stirlingpdf.com/stirlingtools/stirling-pdf:fat \
-f ./Dockerfile.fat .
# Test fat + security compose
run_tests "Stirling-PDF-Security-Fat" "./exampleYmlFiles/docker-compose-latest-fat-security.yml"
echo "Testing webpage accessibility..."
@@ -353,54 +370,50 @@ main() {
echo "Version verification failed for Stirling-PDF-Security-Fat"
fi
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-security.yml" down
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-security.yml" down -v
# ==================================================================
# 3. Regression test with login (test_cicd.yml)
# ==================================================================
run_tests "Stirling-PDF-Security-Fat-with-login" "./exampleYmlFiles/test_cicd.yml"
if [ $? -eq 0 ]; then
# Create directory for file snapshots if it doesn't exist
# Only run behave tests if the container started successfully
if [[ " ${passed_tests[*]} " =~ "Stirling-PDF-Security-Fat-with-login" ]]; then
CONTAINER_NAME=$(docker-compose -f "./exampleYmlFiles/test_cicd.yml" ps --format '{{.Names}}' --filter "status=running" | head -n1)
SNAPSHOT_DIR="$PROJECT_ROOT/testing/file_snapshots"
mkdir -p "$SNAPSHOT_DIR"
# Capture file list before running behave tests
BEFORE_FILE="$SNAPSHOT_DIR/files_before_behave.txt"
AFTER_FILE="$SNAPSHOT_DIR/files_after_behave.txt"
DIFF_FILE="$SNAPSHOT_DIR/files_diff.txt"
# Define container name variable for consistency
CONTAINER_NAME="Stirling-PDF-Security-Fat-with-login"
capture_file_list "$CONTAINER_NAME" "$BEFORE_FILE"
cd "testing/cucumber"
if python -m behave; then
# Wait 10 seconds before capturing the file list after tests
echo "Waiting 5 seconds for any file operations to complete..."
sleep 5
# Capture file list after running behave tests
cd "$PROJECT_ROOT"
capture_file_list "$CONTAINER_NAME" "$AFTER_FILE"
# Compare file lists
if compare_file_lists "$BEFORE_FILE" "$AFTER_FILE" "$DIFF_FILE" "$CONTAINER_NAME"; then
echo "No unexpected temporary files found."
passed_tests+=("Stirling-PDF-Regression")
passed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
else
echo "WARNING: Unexpected temporary files detected after behave tests!"
failed_tests+=("Stirling-PDF-Regression-Temp-Files")
fi
passed_tests+=("Stirling-PDF-Regression")
passed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
else
failed_tests+=("Stirling-PDF-Regression")
failed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
echo "Printing docker logs of failed regression"
docker logs "$CONTAINER_NAME"
echo "Printed docker logs of failed regression"
# Still capture file list after failure for analysis
# Wait 10 seconds before capturing the file list
echo "Waiting 5 seconds before capturing file list..."
echo "Waiting 10 seconds before capturing file list..."
sleep 10
cd "$PROJECT_ROOT"
@@ -408,9 +421,11 @@ main() {
compare_file_lists "$BEFORE_FILE" "$AFTER_FILE" "$DIFF_FILE" "$CONTAINER_NAME"
fi
fi
docker-compose -f "./exampleYmlFiles/test_cicd.yml" down -v
docker-compose -f "./exampleYmlFiles/test_cicd.yml" down
# ==================================================================
# 4. Disabled Endpoints Test
# ==================================================================
run_tests "Stirling-PDF-Fat-Disable-Endpoints" "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml"
echo "Testing disabled endpoints..."
@@ -430,27 +445,27 @@ main() {
echo "Version verification failed for Stirling-PDF-Fat-Disable-Endpoints"
fi
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml" down
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml" down -v
# Report results
# ==================================================================
# Final Report
# ==================================================================
echo "All tests completed in $SECONDS seconds."
if [ ${#passed_tests[@]} -ne 0 ]; then
echo "Passed tests:"
for test in "${passed_tests[@]}"; do
echo -e "\e[32m$test\e[0m"
done
fi
for test in "${passed_tests[@]}"; do
echo -e "\e[32m$test\e[0m" # Green color for passed tests
done
if [ ${#failed_tests[@]} -ne 0 ]; then
echo "Failed tests:"
for test in "${failed_tests[@]}"; do
echo -e "\e[31m$test\e[0m"
done
fi
for test in "${failed_tests[@]}"; do
echo -e "\e[31m$test\e[0m" # Red color for failed tests
done
# Check if there are any failed tests and exit with an error code if so
if [ ${#failed_tests[@]} -ne 0 ]; then
echo "Some tests failed."
exit 1