mirror of
https://github.com/Frooodle/Stirling-PDF.git
synced 2025-12-18 20:04:17 +01:00
# Description of Changes ### What was changed This PR introduces a major refinement to the Docker runtime, system path resolution, conversion tooling, and integration logic across the codebase. Key improvements include: - Migration of **Dockerfile**, **Dockerfile.fat** to a unified Debian-based environment. - Introduction of **RuntimePathConfig** enhancements to dynamically resolve: - `weasyprint`, `unoconvert`, `calibre`, `ocrmypdf`, `soffice` - Tesseract `tessdata` paths with Docker-aware defaults. - Support for **UNO server (unoserver/unoconvert)** as primary document converter with automatic fallback to `soffice`. - Isolation of Python environments for WeasyPrint and UNO tooling. - Updated controllers and services to correctly inject `RuntimePathConfig`. - Improved process execution logic in converters and OCR handling. - Major updates to `init.sh` and `init-without-ocr.sh`: - Unified environment initialization - Proper UID/GID remapping - Safer permissions handling - Automatic Tesseract path detection - Reliable startup of headless LibreOffice + Xvfb + UNO server - Full test suite updates: - Adaptation to new conversion paths - Mocking of UNO and LibreOffice commands - More robust Docker test logic - Updated example docker-compose files referencing GHCR test images. - Expanded configuration schema for new operations paths. ### Why the change was made These changes address long-standing issues around: - Inconsistent or missing binary paths between image variants. - Reduced reliability of document conversions (UNO vs. soffice). - Lack of uniform runtime initialization across Docker images. - Repetitive environment setup logic split across multiple scripts. - Fragile test scenarios tied to Alpine-based images. Switching to a unified Debian-based runtime significantly improves: - Compatibility with LibreOffice, Calibre, WebEngine and graphics stack. - UNO stability for document conversions. - Tesseract deterministic behavior. - Debuggability and reliability of CI/CD Docker-based tests. The improvements to `RuntimePathConfig` ensure all system binaries are fully configurable and correctly detected at runtime. --- ## Checklist ### General - [x] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md) - [x] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable) - [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable) - [x] I have performed a self-review of my own code - [x] My changes generate no new warnings ### Documentation - [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed) - [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only) ### Translations (if applicable) - [ ] I ran [`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md) ### UI Changes (if applicable) - [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR) ### Testing (if applicable) - [x] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.
165 lines
7.4 KiB
Docker
165 lines
7.4 KiB
Docker
# ==============================================================================
|
||
# Multi-stage Dockerfile for Stirling-PDF – image with everything included
|
||
# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc.
|
||
# ==============================================================================
|
||
|
||
# ========================================
|
||
# STAGE 1: Runtime image based on Debian stable-slim
|
||
# Contains Java runtime + LibreOffice + Calibre + all PDF tools
|
||
# ========================================
|
||
FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d
|
||
|
||
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
|
||
ENV DEBIAN_FRONTEND=noninteractive
|
||
|
||
ENV TESS_BASE_PATH=/usr/share/tesseract-ocr/5/tessdata
|
||
|
||
# Install core runtime dependencies + tools required by Stirling-PDF features
|
||
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||
ca-certificates tzdata tini bash fontconfig \
|
||
openjdk-21-jre-headless \
|
||
ffmpeg poppler-utils ocrmypdf \
|
||
libreoffice-nogui libreoffice-java-common \
|
||
python3 python3-venv python3-uno \
|
||
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
|
||
tesseract-ocr-por tesseract-ocr-chi-sim \
|
||
libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \
|
||
gosu unpaper \
|
||
# AWT headless support (required for some Java graphics operations)
|
||
libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \
|
||
libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \
|
||
# Qt WebEngine dependencies for Calibre
|
||
libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \
|
||
libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \
|
||
libxcb-cursor0 libdbus-1-3 libglib2.0-0 \
|
||
# Virtual framebuffer (required for headless LibreOffice)
|
||
xvfb x11-utils coreutils \
|
||
# Temporary packages only needed for Calibre installer
|
||
xz-utils gpgv curl xdg-utils \
|
||
\
|
||
# Install Calibre from official installer script
|
||
&& curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \
|
||
\
|
||
# Clean up installer-only packages
|
||
&& apt-get purge -y xz-utils gpgv xdg-utils \
|
||
&& apt-get autoremove -y \
|
||
&& rm -rf /var/lib/apt/lists/*
|
||
|
||
# Make ebook-convert available in PATH
|
||
RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \
|
||
&& /opt/calibre/ebook-convert --version
|
||
|
||
# ==============================================================================
|
||
# Create non-root user (stirlingpdfuser) with configurable UID/GID
|
||
# ==============================================================================
|
||
ARG PUID=1000
|
||
ARG PGID=1000
|
||
|
||
RUN set -eux; \
|
||
# Create group if it doesn't exist
|
||
if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \
|
||
if getent group "${PGID}" >/dev/null 2>&1; then \
|
||
groupadd -o -g "${PGID}" stirlingpdfgroup; \
|
||
else \
|
||
groupadd -g "${PGID}" stirlingpdfgroup; \
|
||
fi; \
|
||
fi; \
|
||
# Create user if it doesn't exist, avoid UID conflicts
|
||
if ! id -u stirlingpdfuser >/dev/null 2>&1; then \
|
||
if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \
|
||
echo "UID ${PUID} already in use – creating stirlingpdfuser with automatic UID"; \
|
||
useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
|
||
else \
|
||
useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
|
||
fi; \
|
||
fi
|
||
|
||
# Compatibility alias for older entrypoint scripts expecting su-exec
|
||
RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec
|
||
|
||
# Copy application files from build stage
|
||
COPY scripts/ /scripts/
|
||
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
|
||
COPY app/core/build/libs/*.jar app.jar
|
||
|
||
# Optional version tag (can be passed at build time)
|
||
ARG VERSION_TAG
|
||
|
||
LABEL org.opencontainers.image.title="Stirling-PDF"
|
||
LABEL org.opencontainers.image.description="A powerful locally hosted web-based PDF manipulation tool supporting 50+ operations including merging, splitting, conversion, OCR, watermarking, and more."
|
||
LABEL org.opencontainers.image.source="https://github.com/Stirling-Tools/Stirling-PDF"
|
||
LABEL org.opencontainers.image.licenses="MIT"
|
||
LABEL org.opencontainers.image.vendor="Stirling-Tools"
|
||
LABEL org.opencontainers.image.url="https://www.stirlingpdf.com"
|
||
LABEL org.opencontainers.image.documentation="https://docs.stirlingpdf.com"
|
||
LABEL maintainer="Stirling-Tools"
|
||
LABEL org.opencontainers.image.authors="Stirling-Tools"
|
||
LABEL org.opencontainers.image.version="${VERSION_TAG}"
|
||
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"
|
||
|
||
# ==============================================================================
|
||
# Runtime environment variables
|
||
# ==============================================================================
|
||
ENV DISABLE_ADDITIONAL_FEATURES=true \
|
||
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \
|
||
-XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \
|
||
-XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \
|
||
-Djava.awt.headless=true" \
|
||
JAVA_CUSTOM_OPTS="" \
|
||
HOME=/home/stirlingpdfuser \
|
||
PUID=${PUID} \
|
||
PGID=${PGID} \
|
||
UMASK=022 \
|
||
UNO_PATH=/usr/lib/libreoffice/program \
|
||
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
|
||
TMPDIR=/tmp/stirling-pdf \
|
||
TEMP=/tmp/stirling-pdf \
|
||
TMP=/tmp/stirling-pdf
|
||
|
||
# ==============================================================================
|
||
# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.)
|
||
# ==============================================================================
|
||
RUN python3 -m venv /opt/venv --system-site-packages \
|
||
&& /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \
|
||
&& /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)"
|
||
|
||
# Separate venv for unoserver (keeps it isolated)
|
||
RUN python3 -m venv /opt/unoserver-venv --system-site-packages \
|
||
&& /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver
|
||
|
||
# Make unoserver tools available in main venv PATH
|
||
RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \
|
||
&& ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver
|
||
|
||
# Extend PATH to include both virtual environments
|
||
ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}"
|
||
|
||
# ==============================================================================
|
||
# Final permissions, directories and font cache
|
||
# ==============================================================================
|
||
RUN set -eux; \
|
||
chmod +x /scripts/*; \
|
||
mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \
|
||
chown -R stirlingpdfuser:stirlingpdfgroup \
|
||
/home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \
|
||
/app.jar /usr/share/fonts/truetype /scripts; \
|
||
chmod -R 755 /tmp/stirling-pdf
|
||
|
||
# Rebuild font cache
|
||
RUN fc-cache -f -v
|
||
|
||
# Force Qt/WebEngine to run headlessly (required for Calibre in Docker)
|
||
ENV QT_QPA_PLATFORM=offscreen \
|
||
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage"
|
||
|
||
# Expose web UI port
|
||
EXPOSE 8080/tcp
|
||
|
||
STOPSIGNAL SIGTERM
|
||
|
||
# Use tini as init (handles signals and zombies correctly)
|
||
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
|
||
|
||
# CMD is empty – actual start command is defined in init.sh
|
||
CMD []
|