feat(docker-runtime): unified Debian-based images, dynamic path resolution & enhanced UNO/LibreOffice handling (#4880)

# Description of Changes

### What was changed

This PR introduces a major refinement to the Docker runtime, system path
resolution, conversion tooling, and integration logic across the
codebase. Key improvements include:

- Migration of **Dockerfile**, **Dockerfile.fat** to a unified
Debian-based environment.
- Introduction of **RuntimePathConfig** enhancements to dynamically
resolve:
  - `weasyprint`, `unoconvert`, `calibre`, `ocrmypdf`, `soffice`
  - Tesseract `tessdata` paths with Docker-aware defaults.
- Support for **UNO server (unoserver/unoconvert)** as primary document
converter with automatic fallback to `soffice`.
- Isolation of Python environments for WeasyPrint and UNO tooling.
- Updated controllers and services to correctly inject
`RuntimePathConfig`.
- Improved process execution logic in converters and OCR handling.
- Major updates to `init.sh` and `init-without-ocr.sh`:
  - Unified environment initialization
  - Proper UID/GID remapping
  - Safer permissions handling
  - Automatic Tesseract path detection
  - Reliable startup of headless LibreOffice + Xvfb + UNO server
- Full test suite updates:
  - Adaptation to new conversion paths
  - Mocking of UNO and LibreOffice commands
  - More robust Docker test logic
- Updated example docker-compose files referencing GHCR test images.
- Expanded configuration schema for new operations paths.

### Why the change was made

These changes address long-standing issues around:

- Inconsistent or missing binary paths between image variants.
- Reduced reliability of document conversions (UNO vs. soffice).
- Lack of uniform runtime initialization across Docker images.
- Repetitive environment setup logic split across multiple scripts.
- Fragile test scenarios tied to Alpine-based images.

Switching to a unified Debian-based runtime significantly improves:

- Compatibility with LibreOffice, Calibre, WebEngine and graphics stack.
- UNO stability for document conversions.
- Tesseract deterministic behavior.
- Debuggability and reliability of CI/CD Docker-based tests.

The improvements to `RuntimePathConfig` ensure all system binaries are
fully configurable and correctly detected at runtime.

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### Translations (if applicable)

- [ ] I ran
[`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.
This commit is contained in:
Ludy
2025-11-25 00:07:54 +01:00
committed by GitHub
parent 43345021bf
commit 886f9b379e
31 changed files with 1292 additions and 440 deletions

View File

@@ -1,11 +1,88 @@
# Main stage
FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412
# ==============================================================================
# Multi-stage Dockerfile for Stirling-PDF image with everything included
# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc.
# ==============================================================================
# Copy necessary files
COPY scripts /scripts
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
# ========================================
# STAGE 1: Runtime image based on Debian stable-slim
# Contains Java runtime + LibreOffice + Calibre + all PDF tools
# ========================================
FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
ENV DEBIAN_FRONTEND=noninteractive
ENV TESS_BASE_PATH=/usr/share/tesseract-ocr/5/tessdata
# Install core runtime dependencies + tools required by Stirling-PDF features
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates tzdata tini bash fontconfig \
openjdk-21-jre-headless \
ffmpeg poppler-utils ocrmypdf \
libreoffice-nogui libreoffice-java-common \
python3 python3-venv python3-uno \
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
tesseract-ocr-por tesseract-ocr-chi-sim \
libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \
gosu unpaper \
# AWT headless support (required for some Java graphics operations)
libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \
libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \
# Qt WebEngine dependencies for Calibre
libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \
libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \
libxcb-cursor0 libdbus-1-3 libglib2.0-0 \
# Virtual framebuffer (required for headless LibreOffice)
xvfb x11-utils coreutils \
# Temporary packages only needed for Calibre installer
xz-utils gpgv curl xdg-utils \
\
# Install Calibre from official installer script
&& curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \
\
# Clean up installer-only packages
&& apt-get purge -y xz-utils gpgv xdg-utils \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*
# Make ebook-convert available in PATH
RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \
&& /opt/calibre/ebook-convert --version
# ==============================================================================
# Create non-root user (stirlingpdfuser) with configurable UID/GID
# ==============================================================================
ARG PUID=1000
ARG PGID=1000
RUN set -eux; \
# Create group if it doesn't exist
if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \
if getent group "${PGID}" >/dev/null 2>&1; then \
groupadd -o -g "${PGID}" stirlingpdfgroup; \
else \
groupadd -g "${PGID}" stirlingpdfgroup; \
fi; \
fi; \
# Create user if it doesn't exist, avoid UID conflicts
if ! id -u stirlingpdfuser >/dev/null 2>&1; then \
if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \
echo "UID ${PUID} already in use creating stirlingpdfuser with automatic UID"; \
useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
else \
useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
fi; \
fi
# Compatibility alias for older entrypoint scripts expecting su-exec
RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec
# Copy application files from build stage
COPY scripts/ /scripts/
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
COPY app/core/build/libs/*.jar app.jar
# Optional version tag (can be passed at build time)
ARG VERSION_TAG
LABEL org.opencontainers.image.title="Stirling-PDF"
@@ -20,91 +97,68 @@ LABEL org.opencontainers.image.authors="Stirling-Tools"
LABEL org.opencontainers.image.version="${VERSION_TAG}"
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"
# Set Environment Variables
# ==============================================================================
# Runtime environment variables
# ==============================================================================
ENV DISABLE_ADDITIONAL_FEATURES=true \
VERSION_TAG=$VERSION_TAG \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \
-XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \
-XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \
-Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=1000 \
PGID=1000 \
PUID=${PUID} \
PGID=${PGID} \
UMASK=022 \
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
UNO_PATH=/usr/lib/libreoffice/program \
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \
PATH=$PATH:/opt/venv/bin \
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
TMPDIR=/tmp/stirling-pdf \
TEMP=/tmp/stirling-pdf \
TMP=/tmp/stirling-pdf
# JDK for app
RUN apk add --no-cache bash \
&& ln -sf /bin/bash /bin/sh \
&& printf '%s\n' \
'https://dl-cdn.alpinelinux.org/alpine/edge/main' \
'https://dl-cdn.alpinelinux.org/alpine/edge/community' \
'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \
> /etc/apk/repositories && \
apk upgrade --no-cache -a && \
apk add --no-cache \
ca-certificates \
tzdata \
tini \
bash \
curl \
shadow \
su-exec \
openssl \
openssl-dev \
openjdk21-jre \
ffmpeg \
# Doc conversion
gcompat \
libc6-compat \
libreoffice \
# pdftohtml
poppler-utils \
# OCR MY PDF (unpaper for descew and other advanced features)
tesseract-ocr-data-eng \
tesseract-ocr-data-chi_sim \
tesseract-ocr-data-deu \
tesseract-ocr-data-fra \
tesseract-ocr-data-por \
unpaper \
# CV / Python
py3-opencv \
python3 \
ocrmypdf \
py3-pip \
py3-pillow \
py3-pdf2image \
# Calibre
calibre \
# URW Base 35 fonts for better PDF rendering
font-urw-base35 && \
# Calibre fixes
apk fix --no-cache calibre && \
python3 -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \
/opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
mv /usr/share/tessdata /usr/share/tessdata-original && \
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \
# Configure URW Base 35 fonts
ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \
fc-cache -f -v && \
chmod +x /scripts/* && \
# User permissions
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \
chown stirlingpdfuser:stirlingpdfgroup /app.jar && \
ln -sf /bin/busybox /bin/sh
# ==============================================================================
# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.)
# ==============================================================================
RUN python3 -m venv /opt/venv --system-site-packages \
&& /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \
&& /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)"
# Separate venv for unoserver (keeps it isolated)
RUN python3 -m venv /opt/unoserver-venv --system-site-packages \
&& /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver
# Make unoserver tools available in main venv PATH
RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \
&& ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver
# Extend PATH to include both virtual environments
ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}"
# ==============================================================================
# Final permissions, directories and font cache
# ==============================================================================
RUN set -eux; \
chmod +x /scripts/*; \
mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \
chown -R stirlingpdfuser:stirlingpdfgroup \
/home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \
/app.jar /usr/share/fonts/truetype /scripts; \
chmod -R 755 /tmp/stirling-pdf
# Rebuild font cache
RUN fc-cache -f -v
# Force Qt/WebEngine to run headlessly (required for Calibre in Docker)
ENV QT_QPA_PLATFORM=offscreen \
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage"
# Expose web UI port
EXPOSE 8080/tcp
# Set user and run command
STOPSIGNAL SIGTERM
# Use tini as init (handles signals and zombies correctly)
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"]
# CMD is empty actual start command is defined in init.sh
CMD []