feat(docker-runtime): unified Debian-based images, dynamic path resolution & enhanced UNO/LibreOffice handling (#4880)

# Description of Changes

### What was changed

This PR introduces a major refinement to the Docker runtime, system path
resolution, conversion tooling, and integration logic across the
codebase. Key improvements include:

- Migration of **Dockerfile**, **Dockerfile.fat** to a unified
Debian-based environment.
- Introduction of **RuntimePathConfig** enhancements to dynamically
resolve:
  - `weasyprint`, `unoconvert`, `calibre`, `ocrmypdf`, `soffice`
  - Tesseract `tessdata` paths with Docker-aware defaults.
- Support for **UNO server (unoserver/unoconvert)** as primary document
converter with automatic fallback to `soffice`.
- Isolation of Python environments for WeasyPrint and UNO tooling.
- Updated controllers and services to correctly inject
`RuntimePathConfig`.
- Improved process execution logic in converters and OCR handling.
- Major updates to `init.sh` and `init-without-ocr.sh`:
  - Unified environment initialization
  - Proper UID/GID remapping
  - Safer permissions handling
  - Automatic Tesseract path detection
  - Reliable startup of headless LibreOffice + Xvfb + UNO server
- Full test suite updates:
  - Adaptation to new conversion paths
  - Mocking of UNO and LibreOffice commands
  - More robust Docker test logic
- Updated example docker-compose files referencing GHCR test images.
- Expanded configuration schema for new operations paths.

### Why the change was made

These changes address long-standing issues around:

- Inconsistent or missing binary paths between image variants.
- Reduced reliability of document conversions (UNO vs. soffice).
- Lack of uniform runtime initialization across Docker images.
- Repetitive environment setup logic split across multiple scripts.
- Fragile test scenarios tied to Alpine-based images.

Switching to a unified Debian-based runtime significantly improves:

- Compatibility with LibreOffice, Calibre, WebEngine and graphics stack.
- UNO stability for document conversions.
- Tesseract deterministic behavior.
- Debuggability and reliability of CI/CD Docker-based tests.

The improvements to `RuntimePathConfig` ensure all system binaries are
fully configurable and correctly detected at runtime.

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### Translations (if applicable)

- [ ] I ran
[`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.
This commit is contained in:
Ludy
2025-11-25 00:07:54 +01:00
committed by GitHub
parent 43345021bf
commit 886f9b379e
31 changed files with 1292 additions and 440 deletions

View File

@@ -1,122 +1,209 @@
# Build the application
FROM gradle:8.14-jdk21 AS build
# ==============================================================================
# Multi-stage Dockerfile for Stirling-PDF "fat" image with everything included
# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc.
# ==============================================================================
COPY build.gradle .
COPY settings.gradle .
COPY gradlew .
COPY gradle gradle/
# ========================================
# STAGE 1: Build Stirling-PDF with Gradle (Alpine)
# ========================================
FROM eclipse-temurin:21-jdk-alpine@sha256:c4799f335a65b1ecca8a31239b05522f2b0a184d6818f6349e83484ee6956198 AS build
# Install build tools
RUN apk add --no-cache bash unzip curl git
WORKDIR /workspace
# Copy Gradle wrapper and configuration files
COPY build.gradle settings.gradle gradlew ./
COPY gradle ./gradle/
# Make gradlew executable
RUN chmod +x gradlew
# Create module directories and copy module build files (for Gradle layer caching)
RUN mkdir -p core common proprietary
COPY app/core/build.gradle core/.
COPY app/common/build.gradle common/.
COPY app/proprietary/build.gradle proprietary/.
RUN ./gradlew build -x spotlessApply -x spotlessCheck -x test -x sonarqube || return 0
# Set the working directory
# Warm-up Gradle dependency cache (optional but improves subsequent builds)
RUN ./gradlew --no-daemon printVersion --quiet | tail -1 > /tmp/version_tag || true
RUN ./gradlew --no-daemon build -x spotlessApply -x spotlessCheck -x test -x sonarqube || true
# Switch to final source directory and copy full source code
WORKDIR /app
# Copy the entire project to the working directory
COPY . .
# Build the application with DISABLE_ADDITIONAL_FEATURES=false
# Environment variables (can be overridden at build time)
ENV DISABLE_ADDITIONAL_FEATURES=false \
STIRLING_PDF_DESKTOP_UI=false
RUN ./gradlew clean build -x spotlessApply -x spotlessCheck -x test -x sonarqube
# Main stage
FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412
# Final build produce the fat JAR
RUN ./gradlew --no-daemon clean build \
-x spotlessApply -x spotlessCheck -x test -x sonarqube \
&& apk del bash unzip curl git
# Copy necessary files
COPY scripts /scripts
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
# first /app directory is for the build stage, second is for the final image
COPY --from=build /app/app/core/build/libs/*.jar app.jar
# ========================================
# STAGE 2: Runtime image based on Debian stable-slim
# Contains Java runtime + LibreOffice + Calibre + all PDF tools
# ========================================
FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
ENV DEBIAN_FRONTEND=noninteractive
# Install core runtime dependencies + tools required by Stirling-PDF features
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates tzdata tini bash fontconfig \
openjdk-21-jre-headless \
ffmpeg poppler-utils qpdf ghostscript ocrmypdf \
libreoffice-nogui libreoffice-java-common \
python3 python3-venv python3-uno \
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
tesseract-ocr-por tesseract-ocr-chi-sim \
libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \
gosu unpaper \
# AWT headless support (required for some Java graphics operations)
libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \
libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \
# Qt WebEngine dependencies for Calibre
libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \
libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \
libxcb-cursor0 libdbus-1-3 libglib2.0-0 \
# Virtual framebuffer (required for headless LibreOffice)
xvfb x11-utils coreutils \
# Temporary packages only needed for Calibre installer
xz-utils gpgv curl xdg-utils \
\
# Install Calibre from official installer script
&& curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \
\
# Clean up installer-only packages
&& apt-get purge -y xz-utils gpgv xdg-utils \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*
# Make ebook-convert available in PATH
RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \
&& /opt/calibre/ebook-convert --version
# ==============================================================================
# Create non-root user (stirlingpdfuser) with configurable UID/GID
# ==============================================================================
ARG PUID=1000
ARG PGID=1000
RUN set -eux; \
# Create group if it doesn't exist
if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \
if getent group "${PGID}" >/dev/null 2>&1; then \
groupadd -o -g "${PGID}" stirlingpdfgroup; \
else \
groupadd -g "${PGID}" stirlingpdfgroup; \
fi; \
fi; \
# Create user if it doesn't exist, avoid UID conflicts
if ! id -u stirlingpdfuser >/dev/null 2>&1; then \
if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \
echo "UID ${PUID} already in use creating stirlingpdfuser with automatic UID"; \
useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
else \
useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
fi; \
fi
# Compatibility alias for older entrypoint scripts expecting su-exec
RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec
# Copy application files from build stage
COPY scripts/ /scripts/
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
COPY --from=build /app/app/core/build/libs/*.jar /app.jar
# Copy version tag generated during build
COPY --from=build /tmp/version_tag /etc/stirling_version
# Optional version tag (can be passed at build time)
ARG VERSION_TAG
# Set Environment Variables
# Metadata labels
LABEL org.opencontainers.image.title="Stirling-PDF"
LABEL org.opencontainers.image.description="A powerful locally hosted web-based PDF manipulation tool supporting 50+ operations including merging, splitting, conversion, OCR, watermarking, and more."
LABEL org.opencontainers.image.source="https://github.com/Stirling-Tools/Stirling-PDF"
LABEL org.opencontainers.image.licenses="MIT"
LABEL org.opencontainers.image.vendor="Stirling-Tools"
LABEL org.opencontainers.image.url="https://www.stirlingpdf.com"
LABEL org.opencontainers.image.documentation="https://docs.stirlingpdf.com"
LABEL maintainer="Stirling-Tools"
LABEL org.opencontainers.image.authors="Stirling-Tools"
LABEL org.opencontainers.image.version="${VERSION_TAG}"
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"
# ==============================================================================
# Runtime environment variables
# ==============================================================================
ENV DISABLE_ADDITIONAL_FEATURES=true \
VERSION_TAG=$VERSION_TAG \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \
-XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \
-XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \
-Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=1000 \
PGID=1000 \
PUID=${PUID} \
PGID=${PGID} \
UMASK=022 \
FAT_DOCKER=true \
INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
UNO_PATH=/usr/lib/libreoffice/program \
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \
PATH=$PATH:/opt/venv/bin \
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
TMPDIR=/tmp/stirling-pdf \
TEMP=/tmp/stirling-pdf \
TMP=/tmp/stirling-pdf
# JDK for app
RUN apk add --no-cache bash \
&& ln -sf /bin/bash /bin/sh \
&& printf '%s\n' \
'https://dl-cdn.alpinelinux.org/alpine/edge/main' \
'https://dl-cdn.alpinelinux.org/alpine/edge/community' \
'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \
> /etc/apk/repositories && \
apk upgrade --no-cache -a && \
apk add --no-cache \
ca-certificates \
tzdata \
tini \
bash \
curl \
shadow \
su-exec \
openssl \
openssl-dev \
openjdk21-jre \
ffmpeg \
# Doc conversion
gcompat \
libc6-compat \
libreoffice \
# pdftohtml
poppler-utils \
# OCR MY PDF (unpaper for descew and other advanced featues)
tesseract-ocr-data-eng \
tesseract-ocr-data-chi_sim \
tesseract-ocr-data-deu \
tesseract-ocr-data-fra \
tesseract-ocr-data-por \
unpaper \
font-terminus font-dejavu font-noto font-noto-cjk font-awesome font-noto-extra font-liberation font-linux-libertine font-urw-base35 \
# CV / Python
py3-opencv \
python3 \
ocrmypdf \
py3-pip \
py3-pillow \
py3-pdf2image \
# Calibre (musl-native) + QtWebEngine Runtime
calibre && \
# Calibre fixes
apk fix --no-cache calibre && \
python3 -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \
/opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
mv /usr/share/tessdata /usr/share/tessdata-original && \
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \
# Configure URW Base 35 fonts
ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \
fc-cache -f -v && \
chmod +x /scripts/* && \
# User permissions
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \
chown stirlingpdfuser:stirlingpdfgroup /app.jar && \
ln -sf /bin/busybox /bin/sh
# ==============================================================================
# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.)
# ==============================================================================
RUN python3 -m venv /opt/venv --system-site-packages \
&& /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \
&& /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)"
# Separate venv for unoserver (keeps it isolated)
RUN python3 -m venv /opt/unoserver-venv --system-site-packages \
&& /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver
# Make unoserver tools available in main venv PATH
RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \
&& ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver
# Extend PATH to include both virtual environments
ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}"
# ==============================================================================
# Final permissions, directories and font cache
# ==============================================================================
RUN set -eux; \
chmod +x /scripts/*; \
mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \
chown -R stirlingpdfuser:stirlingpdfgroup \
/home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \
/app.jar /usr/share/fonts/truetype /scripts; \
chmod -R 755 /tmp/stirling-pdf
# Rebuild font cache
RUN fc-cache -f -v
# Force Qt/WebEngine to run headlessly (required for Calibre in Docker)
ENV QT_QPA_PLATFORM=offscreen \
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage"
# Expose web UI port
EXPOSE 8080/tcp
# Set user and run command
STOPSIGNAL SIGTERM
# Use tini as init (handles signals and zombies correctly)
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"]
# CMD is empty actual start command is defined in init.sh
CMD []