Mirror of https://github.com/Frooodle/Stirling-PDF.git (synced 2026-02-17 13:52:14 +01:00)
feat(docker-runtime): unified Debian-based images, dynamic path resolution & enhanced UNO/LibreOffice handling (#4880)
# Description of Changes

### What was changed

This PR introduces a major refinement to the Docker runtime, system path resolution, conversion tooling, and integration logic across the codebase. Key improvements include:

- Migration of **Dockerfile** and **Dockerfile.fat** to a unified Debian-based environment.
- Enhancements to **RuntimePathConfig** to dynamically resolve:
  - `weasyprint`, `unoconvert`, `calibre`, `ocrmypdf`, `soffice`
  - Tesseract `tessdata` paths, with Docker-aware defaults.
- Support for **UNO server (unoserver/unoconvert)** as the primary document converter, with automatic fallback to `soffice`.
- Isolation of the Python environments for WeasyPrint and the UNO tooling.
- Controllers and services updated to inject `RuntimePathConfig` correctly.
- Improved process execution logic in the converters and in OCR handling.
- Major updates to `init.sh` and `init-without-ocr.sh`:
  - Unified environment initialization
  - Proper UID/GID remapping
  - Safer permissions handling
  - Automatic Tesseract path detection
  - Reliable startup of headless LibreOffice + Xvfb + UNO server
- Full test suite updates:
  - Adaptation to the new conversion paths
  - Mocking of UNO and LibreOffice commands
  - More robust Docker test logic
- Updated example docker-compose files referencing GHCR test images.
- Expanded configuration schema for the new operations paths.

### Why the change was made

These changes address long-standing issues around:

- Inconsistent or missing binary paths between image variants.
- Unreliable document conversions (UNO vs. soffice).
- Lack of uniform runtime initialization across Docker images.
- Environment setup logic duplicated across multiple scripts.
- Fragile test scenarios tied to Alpine-based images.

Switching to a unified Debian-based runtime significantly improves:

- Compatibility with LibreOffice, Calibre, Qt WebEngine and the graphics stack.
- UNO stability for document conversions.
- Determinism of Tesseract behavior.
- Debuggability and reliability of CI/CD Docker-based tests.

The improvements to `RuntimePathConfig` ensure all system binaries are fully configurable and correctly detected at runtime.

---

## Checklist

### General

- [x] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### Translations (if applicable)

- [ ] I ran [`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.
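The path-resolution idea behind the `RuntimePathConfig` changes (explicit configuration first, then a lookup, then a Docker-aware default) can be illustrated in shell. This is a minimal sketch, not the project's Java implementation: the function name `resolve_bin` and the `UNOCONVERT_PATH` override variable are hypothetical, and the default `/opt/venv/bin/unoconvert` is taken from the Dockerfile diff.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_bin is a hypothetical helper, not part of Stirling-PDF.
# Resolution order: explicit override -> PATH lookup -> Docker-aware default.
resolve_bin() {
  local override="$1" name="$2" default="$3"
  # 1. An explicitly configured path wins if it is an executable file.
  if [[ -n "$override" && -f "$override" && -x "$override" ]]; then
    printf '%s\n' "$override"
  # 2. Otherwise look the tool up on PATH.
  elif command -v "$name" >/dev/null 2>&1; then
    command -v "$name"
  # 3. Finally fall back to the location used inside the Docker image.
  elif [[ -f "$default" && -x "$default" ]]; then
    printf '%s\n' "$default"
  else
    return 1
  fi
}

# Example (UNOCONVERT_PATH is a hypothetical override variable):
# unoconvert_bin="$(resolve_bin "${UNOCONVERT_PATH:-}" unoconvert /opt/venv/bin/unoconvert)"
```

Checking the override first keeps operators in control, while the PATH lookup lets the same logic work outside Docker.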
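Automatic Tesseract path detection, listed among the `init.sh` changes, can be sketched as a probe over candidate directories that picks the first one actually containing language data. The function name `find_tessdata` is hypothetical; the candidate `/usr/share/tesseract-ocr/5/tessdata` comes from `TESS_BASE_PATH` in the new Dockerfile, and `/usr/share/tessdata` is the location the old Alpine image used.

```shell
#!/usr/bin/env bash
# Sketch only: find_tessdata is a hypothetical helper.
# Prints the first candidate directory that contains *.traineddata files.
find_tessdata() {
  local dir f
  for dir in "$@"; do
    [[ -n "$dir" && -d "$dir" ]] || continue
    for f in "$dir"/*.traineddata; do
      # Without nullglob an unmatched pattern stays literal, so test existence.
      if [[ -e "$f" ]]; then
        printf '%s\n' "$dir"
        return 0
      fi
    done
  done
  return 1
}

# Example: prefer an explicit TESS_BASE_PATH, then known Debian/Alpine locations.
# tessdata_dir="$(find_tessdata "${TESS_BASE_PATH:-}" /usr/share/tesseract-ocr/5/tessdata /usr/share/tessdata)"
```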
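The UNO-first conversion with automatic `soffice` fallback can be outlined as a shell function. This is a sketch under assumptions: `convert_to_pdf` is a hypothetical name, it assumes an `unoserver` instance is already listening on port 2003 (the port used in the old `CMD` line of the Dockerfile), and it relies only on the basic `--port` and `--convert-to` options of `unoconvert` and the standard `--headless --convert-to --outdir` options of `soffice`.

```shell
#!/usr/bin/env bash
# Sketch only: convert_to_pdf is a hypothetical helper, not Stirling-PDF's converter.
# Try the running unoserver first; fall back to a one-shot headless soffice.
convert_to_pdf() {
  local input="$1" outdir="$2"
  local base="${input##*/}"              # strip the directory part
  local output="$outdir/${base%.*}.pdf"  # replace the extension with .pdf

  # Fast path: reuse the long-running LibreOffice behind unoserver (port assumed).
  if unoconvert --port 2003 --convert-to pdf "$input" "$output" >/dev/null 2>&1; then
    printf '%s\n' "$output"
    return 0
  fi

  # Fallback: start a fresh headless LibreOffice for this single conversion.
  # soffice writes <outdir>/<basename>.pdf, which matches $output above.
  if soffice --headless --convert-to pdf --outdir "$outdir" "$input" >/dev/null 2>&1; then
    printf '%s\n' "$output"
    return 0
  fi
  return 1
}
```

Keeping LibreOffice resident behind unoserver avoids its start-up cost on every request, while the fallback keeps conversions working if the server is unavailable.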
212
Dockerfile
@@ -1,11 +1,88 @@
# Main stage
FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412
# ==============================================================================
# Multi-stage Dockerfile for Stirling-PDF – image with everything included
# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc.
# ==============================================================================

# Copy necessary files
COPY scripts /scripts
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
# ========================================
# STAGE 1: Runtime image based on Debian stable-slim
# Contains Java runtime + LibreOffice + Calibre + all PDF tools
# ========================================
FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d

SHELL ["/bin/bash", "-o", "pipefail", "-c"]
ENV DEBIAN_FRONTEND=noninteractive

ENV TESS_BASE_PATH=/usr/share/tesseract-ocr/5/tessdata

# Install core runtime dependencies + tools required by Stirling-PDF features
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates tzdata tini bash fontconfig \
openjdk-21-jre-headless \
ffmpeg poppler-utils ocrmypdf \
libreoffice-nogui libreoffice-java-common \
python3 python3-venv python3-uno \
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
tesseract-ocr-por tesseract-ocr-chi-sim \
libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \
gosu unpaper \
# AWT headless support (required for some Java graphics operations)
libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \
libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \
# Qt WebEngine dependencies for Calibre
libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \
libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \
libxcb-cursor0 libdbus-1-3 libglib2.0-0 \
# Virtual framebuffer (required for headless LibreOffice)
xvfb x11-utils coreutils \
# Temporary packages only needed for Calibre installer
xz-utils gpgv curl xdg-utils \
\
# Install Calibre from official installer script
&& curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \
\
# Clean up installer-only packages
&& apt-get purge -y xz-utils gpgv xdg-utils \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*

# Make ebook-convert available in PATH
RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \
&& /opt/calibre/ebook-convert --version

# ==============================================================================
# Create non-root user (stirlingpdfuser) with configurable UID/GID
# ==============================================================================
ARG PUID=1000
ARG PGID=1000

RUN set -eux; \
# Create group if it doesn't exist
if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \
if getent group "${PGID}" >/dev/null 2>&1; then \
groupadd -o -g "${PGID}" stirlingpdfgroup; \
else \
groupadd -g "${PGID}" stirlingpdfgroup; \
fi; \
fi; \
# Create user if it doesn't exist, avoid UID conflicts
if ! id -u stirlingpdfuser >/dev/null 2>&1; then \
if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \
echo "UID ${PUID} already in use – creating stirlingpdfuser with automatic UID"; \
useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
else \
useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
fi; \
fi

# Compatibility alias for older entrypoint scripts expecting su-exec
RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec

# Copy application files from build stage
COPY scripts/ /scripts/
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
COPY app/core/build/libs/*.jar app.jar

# Optional version tag (can be passed at build time)
ARG VERSION_TAG

LABEL org.opencontainers.image.title="Stirling-PDF"
@@ -20,91 +97,68 @@ LABEL org.opencontainers.image.authors="Stirling-Tools"
LABEL org.opencontainers.image.version="${VERSION_TAG}"
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"

# Set Environment Variables
# ==============================================================================
# Runtime environment variables
# ==============================================================================
ENV DISABLE_ADDITIONAL_FEATURES=true \
VERSION_TAG=$VERSION_TAG \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \
-XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \
-XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \
-Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=1000 \
PGID=1000 \
PUID=${PUID} \
PGID=${PGID} \
UMASK=022 \
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
UNO_PATH=/usr/lib/libreoffice/program \
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \
PATH=$PATH:/opt/venv/bin \
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
TMPDIR=/tmp/stirling-pdf \
TEMP=/tmp/stirling-pdf \
TMP=/tmp/stirling-pdf

# JDK for app
RUN apk add --no-cache bash \
&& ln -sf /bin/bash /bin/sh \
&& printf '%s\n' \
'https://dl-cdn.alpinelinux.org/alpine/edge/main' \
'https://dl-cdn.alpinelinux.org/alpine/edge/community' \
'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \
> /etc/apk/repositories && \
apk upgrade --no-cache -a && \
apk add --no-cache \
ca-certificates \
tzdata \
tini \
bash \
curl \
shadow \
su-exec \
openssl \
openssl-dev \
openjdk21-jre \
ffmpeg \
# Doc conversion
gcompat \
libc6-compat \
libreoffice \
# pdftohtml
poppler-utils \
# OCR MY PDF (unpaper for descew and other advanced features)
tesseract-ocr-data-eng \
tesseract-ocr-data-chi_sim \
tesseract-ocr-data-deu \
tesseract-ocr-data-fra \
tesseract-ocr-data-por \
unpaper \
# CV / Python
py3-opencv \
python3 \
ocrmypdf \
py3-pip \
py3-pillow \
py3-pdf2image \
# Calibre
calibre \
# URW Base 35 fonts for better PDF rendering
font-urw-base35 && \
# Calibre fixes
apk fix --no-cache calibre && \
python3 -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \
/opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
mv /usr/share/tessdata /usr/share/tessdata-original && \
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \
# Configure URW Base 35 fonts
ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \
fc-cache -f -v && \
chmod +x /scripts/* && \
# User permissions
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \
chown stirlingpdfuser:stirlingpdfgroup /app.jar && \
ln -sf /bin/busybox /bin/sh
# ==============================================================================
# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.)
# ==============================================================================
RUN python3 -m venv /opt/venv --system-site-packages \
&& /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \
&& /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)"

# Separate venv for unoserver (keeps it isolated)
RUN python3 -m venv /opt/unoserver-venv --system-site-packages \
&& /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver

# Make unoserver tools available in main venv PATH
RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \
&& ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver

# Extend PATH to include both virtual environments
ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}"

# ==============================================================================
# Final permissions, directories and font cache
# ==============================================================================
RUN set -eux; \
chmod +x /scripts/*; \
mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \
chown -R stirlingpdfuser:stirlingpdfgroup \
/home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \
/app.jar /usr/share/fonts/truetype /scripts; \
chmod -R 755 /tmp/stirling-pdf

# Rebuild font cache
RUN fc-cache -f -v

# Force Qt/WebEngine to run headlessly (required for Calibre in Docker)
ENV QT_QPA_PLATFORM=offscreen \
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage"

# Expose web UI port
EXPOSE 8080/tcp

# Set user and run command
STOPSIGNAL SIGTERM

# Use tini as init (handles signals and zombies correctly)
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"]

# CMD is empty – actual start command is defined in init.sh
CMD []
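The UID-conflict check in the user-creation step of the new Dockerfile (`getent passwd | awk ...`) can be factored out and tested on its own. In this sketch `uid_in_use` is a hypothetical helper, and it reads a passwd-format file (default `/etc/passwd`) instead of calling `getent passwd`, so unlike the original it ignores accounts that come only from other NSS sources.

```shell
#!/usr/bin/env bash
# Sketch only: uid_in_use is a hypothetical helper mirroring the awk check
# used when creating stirlingpdfuser. Exit status 0 means the UID is taken.
uid_in_use() {
  local uid="$1" passwd_file="${2:-/etc/passwd}"
  # Field 3 of a passwd entry is the numeric UID.
  awk -F: -v id="$uid" '$3==id{found=1} END{exit !found}' "$passwd_file"
}

# Example, following the same branching as the Dockerfile:
# if uid_in_use "$PUID"; then
#   useradd -m -g stirlingpdfgroup -s /bin/bash stirlingpdfuser            # automatic UID
# else
#   useradd -m -u "$PUID" -g stirlingpdfgroup -s /bin/bash stirlingpdfuser
# fi
```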
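Starting `unoserver` in the background (as the old `CMD` line did) means something has to wait until it accepts connections before conversions are sent to it. A minimal readiness check, assuming bash's `/dev/tcp` redirection is available; `wait_for_port` is a hypothetical name, not a function taken from `init.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_port is a hypothetical helper, not taken from init.sh.
# Polls HOST:PORT until a TCP connection succeeds or TIMEOUT seconds elapse.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local deadline=$((SECONDS + timeout))
  while true; do
    # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connect (bash only).
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep 0.5
  done
}

# Example, reusing the unoserver flags from the old CMD line:
# unoserver --port 2003 --interface 127.0.0.1 &
# wait_for_port 127.0.0.1 2003 60 || echo "unoserver did not become ready" >&2
```

The connection attempt runs before the deadline check, so even a very short timeout probes the port at least once.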
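The image splits JVM flags into `JAVA_BASE_OPTS` and a user-overridable `JAVA_CUSTOM_OPTS`, and the final comment states that the real start command lives in `init.sh` (not shown here). One way a start-up script could combine them with the flags from the old `CMD` line is sketched below; `build_java_cmd` and the `JAVA_CMD` array are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: build_java_cmd / JAVA_CMD are hypothetical names.
# Splits the option strings on whitespace (options containing spaces are not supported).
build_java_cmd() {
  local -a base=() custom=()
  read -r -a base <<< "${JAVA_BASE_OPTS:-}"
  read -r -a custom <<< "${JAVA_CUSTOM_OPTS:-}"
  JAVA_CMD=(java "${base[@]}" "${custom[@]}"
    -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar)
}

# Example: build_java_cmd && exec "${JAVA_CMD[@]}"
```

Building an array with `read -a` instead of expanding the variables unquoted avoids accidental glob expansion of the option strings.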