feat(docker-runtime): unified Debian-based images, dynamic path resolution & enhanced UNO/LibreOffice handling (#4880)

# Description of Changes

### What was changed

This PR introduces a major refinement to the Docker runtime, system path
resolution, conversion tooling, and integration logic across the
codebase. Key improvements include:

- Migration of **Dockerfile** and **Dockerfile.fat** to a unified
Debian-based environment.
- Enhancements to **RuntimePathConfig** to dynamically resolve:
  - `weasyprint`, `unoconvert`, `calibre`, `ocrmypdf`, `soffice`
  - Tesseract `tessdata` paths with Docker-aware defaults.
- Support for **UNO server (unoserver/unoconvert)** as the primary document
converter, with automatic fallback to `soffice`.
- Isolation of Python environments for WeasyPrint and UNO tooling.
- Updated controllers and services to correctly inject
`RuntimePathConfig`.
- Improved process execution logic in converters and OCR handling.
- Major updates to `init.sh` and `init-without-ocr.sh`:
  - Unified environment initialization
  - Proper UID/GID remapping
  - Safer permissions handling
  - Automatic Tesseract path detection
  - Reliable startup of headless LibreOffice + Xvfb + UNO server
- Full test suite updates:
  - Adaptation to new conversion paths
  - Mocking of UNO and LibreOffice commands
  - More robust Docker test logic
- Updated example docker-compose files referencing GHCR test images.
- Expanded configuration schema for new operations paths.
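
The "UNO first, `soffice` as fallback" behaviour listed above can be illustrated with a minimal shell sketch. This is not the code from this PR (the real logic lives in the Java converters); the function name `convert_to_pdf` and the `UNOCONVERT_BIN` / `SOFFICE_BIN` override variables are hypothetical, and port 2003 is assumed from the `unoserver --port 2003` command in the previous Dockerfile `CMD`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: convert a document to PDF, preferring unoconvert
# and falling back to soffice. Not the implementation used by Stirling-PDF.

convert_to_pdf() {
  local input="$1" outdir="$2"
  # Binary names can be overridden for testing (assumed variable names).
  local uno="${UNOCONVERT_BIN:-unoconvert}"
  local office="${SOFFICE_BIN:-soffice}"
  local base output
  base="$(basename "${input%.*}")"
  output="${outdir}/${base}.pdf"

  # 1) Primary path: talk to a running unoserver (port 2003 is an assumption).
  if command -v "$uno" >/dev/null 2>&1 \
     && "$uno" --port 2003 --convert-to pdf "$input" "$output" >&2; then
    echo "unoconvert"
    return 0
  fi

  # 2) Fallback: one-shot headless LibreOffice conversion.
  if command -v "$office" >/dev/null 2>&1 \
     && "$office" --headless --convert-to pdf --outdir "$outdir" "$input" >&2; then
    echo "soffice"
    return 0
  fi

  # 3) Neither converter worked.
  echo "none"
  return 1
}

# Example usage (prints which converter produced the PDF):
#   convert_to_pdf /tmp/report.docx /tmp/out
```

The function prints only the name of the converter that succeeded and sends tool output to stderr, so a caller can log which path was taken.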

### Why the change was made

These changes address long-standing issues around:

- Inconsistent or missing binary paths between image variants.
- Reduced reliability of document conversions (UNO vs. soffice).
- Lack of uniform runtime initialization across Docker images.
- Repetitive environment setup logic split across multiple scripts.
- Fragile test scenarios tied to Alpine-based images.

Switching to a unified Debian-based runtime significantly improves:

- Compatibility with LibreOffice, Calibre, WebEngine, and the graphics stack.
- UNO stability for document conversions.
- Deterministic Tesseract behavior.
- Debuggability and reliability of CI/CD Docker-based tests.

The improvements to `RuntimePathConfig` ensure all system binaries are
fully configurable and correctly detected at runtime.
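
As a rough illustration of what "configurable and detected at runtime" can mean, the sketch below resolves a binary from an explicit override first and the `PATH` second, and picks the first existing `tessdata` directory from a list of candidates. It is a shell approximation only: `RuntimePathConfig` itself is Java, and the helper names `resolve_binary` and `detect_tessdata` are hypothetical. The two candidate directories in the usage comment are the ones that appear in the Dockerfiles of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of override-then-autodetect path resolution.
# Not the actual RuntimePathConfig logic.

# resolve_binary NAME [OVERRIDE]
# Prints OVERRIDE if it is an executable file, otherwise the PATH lookup
# result for NAME. Returns 1 if nothing usable is found.
resolve_binary() {
  local name="$1" override="${2:-}"
  if [ -n "$override" ] && [ -f "$override" ] && [ -x "$override" ]; then
    printf '%s\n' "$override"
  elif command -v "$name" >/dev/null 2>&1; then
    command -v "$name"
  else
    return 1
  fi
}

# detect_tessdata DIR...
# Prints the first candidate directory that exists. Returns 1 if none do.
detect_tessdata() {
  local dir
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Example usage:
#   resolve_binary soffice /opt/custom/soffice
#   detect_tessdata /usr/share/tesseract-ocr/5/tessdata /usr/share/tessdata
```

Checking the override before the `PATH` keeps user configuration authoritative while still giving a working default inside the image.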

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### Translations (if applicable)

- [ ] I ran
[`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.
Ludy 2025-11-25 00:07:54 +01:00 committed by GitHub
parent 43345021bf
commit 886f9b379e
31 changed files with 1292 additions and 440 deletions


@ -6,22 +6,27 @@ app: &app
- app/(common|core|proprietary)/src/main/java/**
openapi: &openapi
- build.gradle
- app/(common|core|proprietary)/build.gradle
- app/(common|core|proprietary)/src/main/java/**
- *build
- *app
project: &project
- app/(common|core|proprietary)/src/(main|test)/java/**
- app/(common|core|proprietary)/build.gradle
- 'app/(common|core|proprietary)/src/(main|test)/resources/**/!(messages_*.properties|*.md)*'
- exampleYmlFiles/**
- gradle/**
- libs/**
- 'testing/**/!(requirements*.txt|requirements*.in)*'
- build.gradle
docker: &docker
- Dockerfile
- Dockerfile.fat
- Dockerfile.ultra-lite
- ".github/workflows/build.yml"
- scripts/init.sh
- scripts/init-without-ocr.sh
- exampleYmlFiles/**
project: &project
- app/(common|core|proprietary)/src/(main|test)/java/**
- *build
- "app/(common|core|proprietary)/src/(main|test)/resources/**/!(messages_*.properties|*.md)*"
- exampleYmlFiles/**
- gradle/**
- libs/**
- "testing/**/!(requirements*.txt|requirements*.in)*"
- *docker
- gradle.properties
- gradlew
- gradlew.bat


@ -33,6 +33,7 @@ jobs:
app: ${{ steps.changes.outputs.app }}
project: ${{ steps.changes.outputs.project }}
openapi: ${{ steps.changes.outputs.openapi }}
docker: ${{ steps.changes.outputs.docker }}
steps:
- uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1
@ -68,14 +69,10 @@ jobs:
with:
java-version: ${{ matrix.jdk-version }}
distribution: "temurin"
- name: Setup Gradle
uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0
with:
gradle-version: 8.14
cache: gradle
- name: Build with Gradle and spring security ${{ matrix.spring-security }}
run: ./gradlew clean build
run: ./gradlew clean build -x spotlessApply -x spotlessCheck -x sonarqube
env:
DISABLE_ADDITIONAL_FEATURES: ${{ matrix.spring-security }}
@ -100,12 +97,14 @@ jobs:
if [ ${#missing_reports[@]} -gt 0 ]; then
echo "ERROR: The following required test report directories are missing:"
printf '%s\n' "${missing_reports[@]}"
exit 1
echo "reports-present=false" >> "$GITHUB_OUTPUT"
else
echo "All required test report directories are present"
echo "reports-present=true" >> "$GITHUB_OUTPUT"
fi
echo "All required test report directories are present"
- name: Upload Test Reports
if: always()
if: always() && steps.check-reports.outputs.reports-present == 'true'
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: test-reports-jdk-${{ matrix.jdk-version }}-spring-security-${{ matrix.spring-security }}
@ -127,6 +126,7 @@ jobs:
if-no-files-found: warn
- name: Add coverage to PR with spring security ${{ matrix.spring-security }} and JDK ${{ matrix.jdk-version }}
if: steps.check-reports.outputs.reports-present == 'true'
id: jacoco
uses: madrapps/jacoco-report@50d3aff4548aa991e6753342d9ba291084e63848 # v1.7.2
with:
@ -155,15 +155,13 @@ jobs:
with:
java-version: "17"
distribution: "temurin"
- name: Setup Gradle
uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0
cache: gradle
- name: Generate OpenAPI documentation
run: ./gradlew :stirling-pdf:generateOpenApiDocs
env:
DISABLE_ADDITIONAL_FEATURES: true
- name: Upload OpenAPI Documentation
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
@ -188,6 +186,7 @@ jobs:
with:
java-version: "17"
distribution: "temurin"
cache: gradle
- name: Check licenses for compatibility
run: ./gradlew clean checkLicense
@ -205,8 +204,14 @@ jobs:
retention-days: 3
docker-compose-tests:
if: needs.files-changed.outputs.project == 'true'
needs: files-changed
if: |
needs.files-changed.outputs.project == 'true' &&
(
needs.files-changed.outputs.docker != 'true' ||
needs.test-build-docker-images.result == 'success' ||
needs.test-build-docker-images.result == 'skipped'
)
needs: [files-changed, test-build-docker-images]
# if: github.event_name == 'push' && github.ref == 'refs/heads/main' ||
# (github.event_name == 'pull_request' &&
# contains(github.event.pull_request.labels.*.name, 'licenses') == false &&
@ -237,20 +242,21 @@ jobs:
with:
java-version: "17"
distribution: "temurin"
cache: gradle
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1
- name: Install Docker Compose
run: |
sudo curl -SL "https://github.com/docker/compose/releases/download/v2.37.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo curl -SL "https://github.com/docker/compose/releases/download/v2.40.3/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
- name: Set up Python
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: "3.12"
cache: 'pip' # caching pip dependencies
cache: "pip" # caching pip dependencies
cache-dependency-path: ./testing/cucumber/requirements.txt
- name: Pip requirements
@ -265,13 +271,22 @@ jobs:
./testing/test.sh
test-build-docker-images:
if: github.event_name == 'pull_request' && needs.files-changed.outputs.project == 'true'
if: github.event_name == 'pull_request' && needs.files-changed.outputs.docker == 'true'
needs: [files-changed, build]
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
strategy:
fail-fast: false
matrix:
docker-rev: ["Dockerfile", "Dockerfile.ultra-lite", "Dockerfile.fat"]
docker:
- name: "Dockerfile.ultra-lite"
tag: "ultra-lite"
- name: "Dockerfile.fat"
tag: "fat"
- name: "Dockerfile"
tag: "latest"
steps:
- name: Harden Runner
uses: step-security/harden-runner@95d9a5deda9de15063e7595e9719c11c38c90ae2 # v2.13.2
@ -286,46 +301,220 @@ jobs:
with:
java-version: "17"
distribution: "temurin"
- name: Set up Gradle
uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0
with:
gradle-version: 8.14
cache: gradle
- name: Build application
run: ./gradlew clean build
run: ./gradlew clean build -x spotlessApply -x spotlessCheck -x test -x sonarqube
env:
DISABLE_ADDITIONAL_FEATURES: true
STIRLING_PDF_DESKTOP_UI: false
# - name: Free disk space on runner
# run: |
# echo "Disk space before cleanup:" && df -h
# sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/boost
# docker system prune -af || true
# echo "Disk space after cleanup:" && df -h
- name: Set up QEMU
uses: docker/setup-qemu-action@c7c53464625b32c7a7e944ae62b3e17d2b600130 # v3.7.0
with:
platforms: linux/amd64,linux/arm64/v8
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1
with:
platforms: linux/amd64,linux/arm64/v8
- name: Build ${{ matrix.docker-rev }}
- name: Prepare branch tag
id: branch_tag
shell: bash
run: |
BRANCH_SOURCE="${GITHUB_HEAD_REF:-${GITHUB_REF_NAME}}"
BRANCH_LOWER=$(echo "$BRANCH_SOURCE" | tr '[:upper:]' '[:lower:]')
SAFE_BRANCH=$(echo "$BRANCH_LOWER" | sed 's/[^a-z0-9_.-]/-/g' | sed 's/^-\+//' | sed 's/-\+$//' | sed 's/--\+/-/g')
if [ -z "$SAFE_BRANCH" ]; then
SAFE_BRANCH="branch"
fi
SHORT_SHA=$(echo "${GITHUB_SHA:-${{ github.sha }}}" | cut -c1-8)
echo "safe_branch=$SAFE_BRANCH" >> "$GITHUB_OUTPUT"
echo "short_sha=$SHORT_SHA" >> "$GITHUB_OUTPUT"
- name: Convert repository owner to lowercase
id: repoowner
run: echo "lowercase=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
- name: Docker meta
id: meta
uses: docker/metadata-action@c1e51972afc2121e065aed6d45c65596fe445f3f # v5.8.0
with:
images: |
# ${{ secrets.DOCKER_HUB_USERNAME }}/stirling-pdf-test
ghcr.io/${{ steps.repoowner.outputs.lowercase }}/stirling-pdf-test
flavor: |
latest=false
tags: |
type=raw,value=${{ matrix.docker.tag }},enable=true
# type=raw,value=${{ matrix.docker.tag }}-${{ steps.branch_tag.outputs.safe_branch }},enable=true
# type=raw,value=${{ matrix.docker.tag }}-${{ steps.branch_tag.outputs.safe_branch }}-${{ steps.branch_tag.outputs.short_sha }},enable=true
labels: |
org.opencontainers.image.title=Stirling-PDF Test
org.opencontainers.image.description=CI test image for Stirling-PDF
org.opencontainers.image.url=https://www.stirlingpdf.com
org.opencontainers.image.documentation=https://docs.stirlingpdf.com
org.opencontainers.image.authors=Stirling-Tools
org.opencontainers.image.licenses=MIT
org.opencontainers.image.version=${{ matrix.docker.tag }}
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.source=${{ github.repository }}
maintainer=Stirling-Tools
- name: Choose primary tag for tests
id: testtag
shell: bash
run: |
IMAGE="ghcr.io/${{ steps.repoowner.outputs.lowercase }}/stirling-pdf-test"
VARIANT="${{ matrix.docker.tag }}"
BRANCH="${{ steps.branch_tag.outputs.safe_branch }}"
SHA_SHORT="${{ steps.branch_tag.outputs.short_sha }}"
CANDIDATE="$IMAGE:$VARIANT-$BRANCH-$SHA_SHORT"
SECONDARY="$IMAGE:$VARIANT-$BRANCH"
ALL_TAGS="$(echo '${{ steps.meta.outputs.tags }}' | tr ' ' '\n')"
if echo "$ALL_TAGS" | grep -qx "$CANDIDATE"; then
SELECTED="$CANDIDATE"
elif echo "$ALL_TAGS" | grep -qx "$SECONDARY"; then
SELECTED="$SECONDARY"
else
SELECTED="$(echo "$ALL_TAGS" | head -n1)"
fi
echo "tag=$SELECTED" >> $GITHUB_OUTPUT
echo "Using test tag: $SELECTED"
# - name: Log in to Docker Hub
# uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0
# with:
# username: ${{ secrets.DOCKER_HUB_USERNAME }}
# password: ${{ secrets.DOCKER_HUB_API }}
# - name: Log in to GitHub Container Registry
# uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0
# with:
# registry: ghcr.io
# username: ${{ github.actor }}
# password: ${{ github.token }}
- name: Build and push amd64 image
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
with:
builder: ${{ steps.buildx.outputs.name }}
context: .
file: ./${{ matrix.docker-rev }}
file: ./${{ matrix.docker.name }}
push: false
load: true
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64/v8
provenance: true
sbom: true
tags: ${{ steps.meta.outputs.tags }} # publish ALL tags
labels: ${{ steps.meta.outputs.labels }}
platforms: linux/amd64
provenance: false
sbom: false
- name: Upload Reports
- name: Show amd64 image size
run: |
IMAGE_TAG="${{ steps.testtag.outputs.tag }}"
echo "Inspecting image: ${IMAGE_TAG}"
SIZE=$(docker image inspect "${IMAGE_TAG}" --format='{{.Size}}')
FORMATTED=$(numfmt --to=iec --suffix=B "${SIZE}")
echo "Image size (amd64): ${FORMATTED}"
- name: Start amd64 image for 2 minutes
run: |
IMAGE_TAG="${{ steps.testtag.outputs.tag }}"
CONTAINER_NAME="stirling-pdf-test-${{ matrix.docker.tag }}-amd64"
echo "Starting container ${CONTAINER_NAME} from ${IMAGE_TAG}"
docker run -d --name "${CONTAINER_NAME}" "${IMAGE_TAG}"
echo "Waiting up to 2 minutes..."
sleep 120 || true
echo "===== Logs for ${CONTAINER_NAME} ====="
docker logs "${CONTAINER_NAME}" || true
echo "Stopping container ${CONTAINER_NAME} after 2 minutes"
docker stop "${CONTAINER_NAME}" || true
docker rm "${CONTAINER_NAME}" || true
- name: Prune amd64 image and cache
if: always()
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
run: |
docker image rm -f ${{ steps.testtag.outputs.tag }} || true
docker builder prune --force || true
- name: Build and push arm64 image
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
with:
name: reports-docker-${{ matrix.docker-rev }}
path: |
build/reports/tests/
build/test-results/
build/reports/problems/
retention-days: 3
if-no-files-found: warn
builder: ${{ steps.buildx.outputs.name }}
context: .
file: ./${{ matrix.docker.name }}
push: false
load: true
cache-from: type=gha
cache-to: type=gha,mode=max
tags: ${{ steps.meta.outputs.tags }} # publish ALL tags
labels: ${{ steps.meta.outputs.labels }}
platforms: linux/arm64/v8
provenance: false
sbom: false
- name: Show arm64 image size
run: |
IMAGE_TAG="${{ steps.testtag.outputs.tag }}"
echo "Inspecting image: ${IMAGE_TAG}"
SIZE=$(docker image inspect "${IMAGE_TAG}" --format='{{.Size}}')
FORMATTED=$(numfmt --to=iec --suffix=B "${SIZE}")
echo "Image size (arm64): ${FORMATTED}"
- name: Start arm64 image for 2 minutes
run: |
IMAGE_TAG="${{ steps.testtag.outputs.tag }}"
CONTAINER_NAME="stirling-pdf-test-${{ matrix.docker.tag }}-arm64"
echo "Starting container ${CONTAINER_NAME} from ${IMAGE_TAG}"
docker run -d --name "${CONTAINER_NAME}" "${IMAGE_TAG}"
echo "Waiting up to 2 minutes..."
sleep 120 || true
echo "===== Logs for ${CONTAINER_NAME} ====="
docker logs "${CONTAINER_NAME}" || true
echo "Stopping container ${CONTAINER_NAME} after 2 minutes"
docker stop "${CONTAINER_NAME}" || true
docker rm "${CONTAINER_NAME}" || true
- name: Cleanup arm64 image and cache
if: always()
run: |
docker image rm -f ${{ steps.testtag.outputs.tag }} || true
docker builder prune --force || true
# - name: Build and push multi-arch image
# uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
# with:
# builder: ${{ steps.buildx.outputs.name }}
# context: .
# file: ./${{ matrix.docker.name }}
# push: true
# cache-from: type=gha
# cache-to: type=gha,mode=max
# tags: ${{ steps.meta.outputs.tags }}
# labels: ${{ steps.meta.outputs.labels }}
# platforms: linux/amd64,linux/arm64/v8
# provenance: false
# sbom: false
# - name: Upload Docker build reports
# if: always()
# uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
# with:
# name: reports-docker-${{ matrix.docker.name }}
# path: |
# build/reports/
# build/test-results/
# build/reports/problems/
# retention-days: 3
# if-no-files-found: warn


@ -1,11 +1,88 @@
# Main stage
FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412
# ==============================================================================
# Multi-stage Dockerfile for Stirling-PDF image with everything included
# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc.
# ==============================================================================
# Copy necessary files
COPY scripts /scripts
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
# ========================================
# STAGE 1: Runtime image based on Debian stable-slim
# Contains Java runtime + LibreOffice + Calibre + all PDF tools
# ========================================
FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
ENV DEBIAN_FRONTEND=noninteractive
ENV TESS_BASE_PATH=/usr/share/tesseract-ocr/5/tessdata
# Install core runtime dependencies + tools required by Stirling-PDF features
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates tzdata tini bash fontconfig \
openjdk-21-jre-headless \
ffmpeg poppler-utils ocrmypdf \
libreoffice-nogui libreoffice-java-common \
python3 python3-venv python3-uno \
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
tesseract-ocr-por tesseract-ocr-chi-sim \
libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \
gosu unpaper \
# AWT headless support (required for some Java graphics operations)
libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \
libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \
# Qt WebEngine dependencies for Calibre
libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \
libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \
libxcb-cursor0 libdbus-1-3 libglib2.0-0 \
# Virtual framebuffer (required for headless LibreOffice)
xvfb x11-utils coreutils \
# Temporary packages only needed for Calibre installer
xz-utils gpgv curl xdg-utils \
\
# Install Calibre from official installer script
&& curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \
\
# Clean up installer-only packages
&& apt-get purge -y xz-utils gpgv xdg-utils \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*
# Make ebook-convert available in PATH
RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \
&& /opt/calibre/ebook-convert --version
# ==============================================================================
# Create non-root user (stirlingpdfuser) with configurable UID/GID
# ==============================================================================
ARG PUID=1000
ARG PGID=1000
RUN set -eux; \
# Create group if it doesn't exist
if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \
if getent group "${PGID}" >/dev/null 2>&1; then \
groupadd -o -g "${PGID}" stirlingpdfgroup; \
else \
groupadd -g "${PGID}" stirlingpdfgroup; \
fi; \
fi; \
# Create user if it doesn't exist, avoid UID conflicts
if ! id -u stirlingpdfuser >/dev/null 2>&1; then \
if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \
echo "UID ${PUID} already in use creating stirlingpdfuser with automatic UID"; \
useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
else \
useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
fi; \
fi
# Compatibility alias for older entrypoint scripts expecting su-exec
RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec
# Copy application files from build stage
COPY scripts/ /scripts/
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
COPY app/core/build/libs/*.jar app.jar
# Optional version tag (can be passed at build time)
ARG VERSION_TAG
LABEL org.opencontainers.image.title="Stirling-PDF"
@ -20,91 +97,68 @@ LABEL org.opencontainers.image.authors="Stirling-Tools"
LABEL org.opencontainers.image.version="${VERSION_TAG}"
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"
# Set Environment Variables
# ==============================================================================
# Runtime environment variables
# ==============================================================================
ENV DISABLE_ADDITIONAL_FEATURES=true \
VERSION_TAG=$VERSION_TAG \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \
-XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \
-XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \
-Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=1000 \
PGID=1000 \
PUID=${PUID} \
PGID=${PGID} \
UMASK=022 \
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
UNO_PATH=/usr/lib/libreoffice/program \
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \
PATH=$PATH:/opt/venv/bin \
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
TMPDIR=/tmp/stirling-pdf \
TEMP=/tmp/stirling-pdf \
TMP=/tmp/stirling-pdf
# JDK for app
RUN apk add --no-cache bash \
&& ln -sf /bin/bash /bin/sh \
&& printf '%s\n' \
'https://dl-cdn.alpinelinux.org/alpine/edge/main' \
'https://dl-cdn.alpinelinux.org/alpine/edge/community' \
'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \
> /etc/apk/repositories && \
apk upgrade --no-cache -a && \
apk add --no-cache \
ca-certificates \
tzdata \
tini \
bash \
curl \
shadow \
su-exec \
openssl \
openssl-dev \
openjdk21-jre \
ffmpeg \
# Doc conversion
gcompat \
libc6-compat \
libreoffice \
# pdftohtml
poppler-utils \
# OCR MY PDF (unpaper for descew and other advanced features)
tesseract-ocr-data-eng \
tesseract-ocr-data-chi_sim \
tesseract-ocr-data-deu \
tesseract-ocr-data-fra \
tesseract-ocr-data-por \
unpaper \
# CV / Python
py3-opencv \
python3 \
ocrmypdf \
py3-pip \
py3-pillow \
py3-pdf2image \
# Calibre
calibre \
# URW Base 35 fonts for better PDF rendering
font-urw-base35 && \
# Calibre fixes
apk fix --no-cache calibre && \
python3 -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \
/opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
mv /usr/share/tessdata /usr/share/tessdata-original && \
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \
# Configure URW Base 35 fonts
ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \
fc-cache -f -v && \
chmod +x /scripts/* && \
# User permissions
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \
chown stirlingpdfuser:stirlingpdfgroup /app.jar && \
ln -sf /bin/busybox /bin/sh
# ==============================================================================
# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.)
# ==============================================================================
RUN python3 -m venv /opt/venv --system-site-packages \
&& /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \
&& /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)"
# Separate venv for unoserver (keeps it isolated)
RUN python3 -m venv /opt/unoserver-venv --system-site-packages \
&& /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver
# Make unoserver tools available in main venv PATH
RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \
&& ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver
# Extend PATH to include both virtual environments
ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}"
# ==============================================================================
# Final permissions, directories and font cache
# ==============================================================================
RUN set -eux; \
chmod +x /scripts/*; \
mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \
chown -R stirlingpdfuser:stirlingpdfgroup \
/home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \
/app.jar /usr/share/fonts/truetype /scripts; \
chmod -R 755 /tmp/stirling-pdf
# Rebuild font cache
RUN fc-cache -f -v
# Force Qt/WebEngine to run headlessly (required for Calibre in Docker)
ENV QT_QPA_PLATFORM=offscreen \
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage"
# Expose web UI port
EXPOSE 8080/tcp
# Set user and run command
STOPSIGNAL SIGTERM
# Use tini as init (handles signals and zombies correctly)
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"]
# CMD is empty actual start command is defined in init.sh
CMD []


@ -1,122 +1,209 @@
# Build the application
FROM gradle:8.14-jdk21 AS build
# ==============================================================================
# Multi-stage Dockerfile for Stirling-PDF "fat" image with everything included
# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc.
# ==============================================================================
COPY build.gradle .
COPY settings.gradle .
COPY gradlew .
COPY gradle gradle/
# ========================================
# STAGE 1: Build Stirling-PDF with Gradle (Alpine)
# ========================================
FROM eclipse-temurin:21-jdk-alpine@sha256:c4799f335a65b1ecca8a31239b05522f2b0a184d6818f6349e83484ee6956198 AS build
# Install build tools
RUN apk add --no-cache bash unzip curl git
WORKDIR /workspace
# Copy Gradle wrapper and configuration files
COPY build.gradle settings.gradle gradlew ./
COPY gradle ./gradle/
# Make gradlew executable
RUN chmod +x gradlew
# Create module directories and copy module build files (for Gradle layer caching)
RUN mkdir -p core common proprietary
COPY app/core/build.gradle core/.
COPY app/common/build.gradle common/.
COPY app/proprietary/build.gradle proprietary/.
RUN ./gradlew build -x spotlessApply -x spotlessCheck -x test -x sonarqube || return 0
# Set the working directory
# Warm-up Gradle dependency cache (optional but improves subsequent builds)
RUN ./gradlew --no-daemon printVersion --quiet | tail -1 > /tmp/version_tag || true
RUN ./gradlew --no-daemon build -x spotlessApply -x spotlessCheck -x test -x sonarqube || true
# Switch to final source directory and copy full source code
WORKDIR /app
# Copy the entire project to the working directory
COPY . .
# Build the application with DISABLE_ADDITIONAL_FEATURES=false
# Environment variables (can be overridden at build time)
ENV DISABLE_ADDITIONAL_FEATURES=false \
STIRLING_PDF_DESKTOP_UI=false
RUN ./gradlew clean build -x spotlessApply -x spotlessCheck -x test -x sonarqube
# Main stage
FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412
# Final build produce the fat JAR
RUN ./gradlew --no-daemon clean build \
-x spotlessApply -x spotlessCheck -x test -x sonarqube \
&& apk del bash unzip curl git
# Copy necessary files
COPY scripts /scripts
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
# first /app directory is for the build stage, second is for the final image
COPY --from=build /app/app/core/build/libs/*.jar app.jar
# ========================================
# STAGE 2: Runtime image based on Debian stable-slim
# Contains Java runtime + LibreOffice + Calibre + all PDF tools
# ========================================
FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
ENV DEBIAN_FRONTEND=noninteractive
# Install core runtime dependencies + tools required by Stirling-PDF features
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates tzdata tini bash fontconfig \
openjdk-21-jre-headless \
ffmpeg poppler-utils qpdf ghostscript ocrmypdf \
libreoffice-nogui libreoffice-java-common \
python3 python3-venv python3-uno \
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
tesseract-ocr-por tesseract-ocr-chi-sim \
libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \
gosu unpaper \
# AWT headless support (required for some Java graphics operations)
libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \
libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \
# Qt WebEngine dependencies for Calibre
libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \
libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \
libxcb-cursor0 libdbus-1-3 libglib2.0-0 \
# Virtual framebuffer (required for headless LibreOffice)
xvfb x11-utils coreutils \
# Temporary packages only needed for Calibre installer
xz-utils gpgv curl xdg-utils \
\
# Install Calibre from official installer script
&& curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \
\
# Clean up installer-only packages
&& apt-get purge -y xz-utils gpgv xdg-utils \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*
# Make ebook-convert available in PATH
RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \
&& /opt/calibre/ebook-convert --version
# ==============================================================================
# Create non-root user (stirlingpdfuser) with configurable UID/GID
# ==============================================================================
ARG PUID=1000
ARG PGID=1000
RUN set -eux; \
# Create group if it doesn't exist
if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \
if getent group "${PGID}" >/dev/null 2>&1; then \
groupadd -o -g "${PGID}" stirlingpdfgroup; \
else \
groupadd -g "${PGID}" stirlingpdfgroup; \
fi; \
fi; \
# Create user if it doesn't exist, avoid UID conflicts
if ! id -u stirlingpdfuser >/dev/null 2>&1; then \
if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \
echo "UID ${PUID} already in use; creating stirlingpdfuser with an automatic UID"; \
useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
else \
useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
fi; \
fi
# Compatibility alias for older entrypoint scripts expecting su-exec
RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec
# Copy application files from build stage
COPY scripts/ /scripts/
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
COPY --from=build /app/app/core/build/libs/*.jar /app.jar
# Copy version tag generated during build
COPY --from=build /tmp/version_tag /etc/stirling_version
# Optional version tag (can be passed at build time)
ARG VERSION_TAG
# Set Environment Variables
# Metadata labels
LABEL org.opencontainers.image.title="Stirling-PDF"
LABEL org.opencontainers.image.description="A powerful locally hosted web-based PDF manipulation tool supporting 50+ operations including merging, splitting, conversion, OCR, watermarking, and more."
LABEL org.opencontainers.image.source="https://github.com/Stirling-Tools/Stirling-PDF"
LABEL org.opencontainers.image.licenses="MIT"
LABEL org.opencontainers.image.vendor="Stirling-Tools"
LABEL org.opencontainers.image.url="https://www.stirlingpdf.com"
LABEL org.opencontainers.image.documentation="https://docs.stirlingpdf.com"
LABEL maintainer="Stirling-Tools"
LABEL org.opencontainers.image.authors="Stirling-Tools"
LABEL org.opencontainers.image.version="${VERSION_TAG}"
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"
# ==============================================================================
# Runtime environment variables
# ==============================================================================
ENV DISABLE_ADDITIONAL_FEATURES=true \
VERSION_TAG=$VERSION_TAG \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \
-XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \
-XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \
-Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=1000 \
PGID=1000 \
PUID=${PUID} \
PGID=${PGID} \
UMASK=022 \
FAT_DOCKER=true \
INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
UNO_PATH=/usr/lib/libreoffice/program \
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \
PATH=$PATH:/opt/venv/bin \
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
TMPDIR=/tmp/stirling-pdf \
TEMP=/tmp/stirling-pdf \
TMP=/tmp/stirling-pdf
# JDK for app
RUN apk add --no-cache bash \
&& ln -sf /bin/bash /bin/sh \
&& printf '%s\n' \
'https://dl-cdn.alpinelinux.org/alpine/edge/main' \
'https://dl-cdn.alpinelinux.org/alpine/edge/community' \
'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \
> /etc/apk/repositories && \
apk upgrade --no-cache -a && \
apk add --no-cache \
ca-certificates \
tzdata \
tini \
bash \
curl \
shadow \
su-exec \
openssl \
openssl-dev \
openjdk21-jre \
ffmpeg \
# Doc conversion
gcompat \
libc6-compat \
libreoffice \
# pdftohtml
poppler-utils \
# OCRmyPDF (unpaper for deskew and other advanced features)
tesseract-ocr-data-eng \
tesseract-ocr-data-chi_sim \
tesseract-ocr-data-deu \
tesseract-ocr-data-fra \
tesseract-ocr-data-por \
unpaper \
font-terminus font-dejavu font-noto font-noto-cjk font-awesome font-noto-extra font-liberation font-linux-libertine font-urw-base35 \
# CV / Python
py3-opencv \
python3 \
ocrmypdf \
py3-pip \
py3-pillow \
py3-pdf2image \
# Calibre (musl-native) + QtWebEngine Runtime
calibre && \
# Calibre fixes
apk fix --no-cache calibre && \
python3 -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \
/opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
mv /usr/share/tessdata /usr/share/tessdata-original && \
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \
# Configure URW Base 35 fonts
ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \
fc-cache -f -v && \
chmod +x /scripts/* && \
# User permissions
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \
chown stirlingpdfuser:stirlingpdfgroup /app.jar && \
ln -sf /bin/busybox /bin/sh
# ==============================================================================
# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.)
# ==============================================================================
RUN python3 -m venv /opt/venv --system-site-packages \
&& /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \
&& /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)"
# Separate venv for unoserver (keeps it isolated)
RUN python3 -m venv /opt/unoserver-venv --system-site-packages \
&& /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver
# Make unoserver tools available in main venv PATH
RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \
&& ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver
# Extend PATH to include both virtual environments
ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}"
# ==============================================================================
# Final permissions, directories and font cache
# ==============================================================================
RUN set -eux; \
chmod +x /scripts/*; \
mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \
chown -R stirlingpdfuser:stirlingpdfgroup \
/home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \
/app.jar /usr/share/fonts/truetype /scripts; \
chmod -R 755 /tmp/stirling-pdf
# Rebuild font cache
RUN fc-cache -f -v
# Force Qt/WebEngine to run headlessly (required for Calibre in Docker)
ENV QT_QPA_PLATFORM=offscreen \
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage"
# Expose web UI port
EXPOSE 8080/tcp
# Set user and run command
STOPSIGNAL SIGTERM
# Use tini as init (handles signals and zombies correctly)
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"]
# CMD is empty; the actual start command is defined in init.sh
CMD []

@ -56,4 +56,4 @@ EXPOSE 8080/tcp
# Run the application
ENTRYPOINT ["tini", "--", "/scripts/init-without-ocr.sh"]
CMD ["java", "-Dfile.encoding=UTF-8", "-Djava.io.tmpdir=/tmp/stirling-pdf", "-jar", "/app.jar"]
CMD []

@ -10,8 +10,10 @@ import lombok.Getter;
import lombok.extern.slf4j.Slf4j;
import stirling.software.common.model.ApplicationProperties;
import stirling.software.common.model.ApplicationProperties.CustomPaths;
import stirling.software.common.model.ApplicationProperties.CustomPaths.Operations;
import stirling.software.common.model.ApplicationProperties.CustomPaths.Pipeline;
import stirling.software.common.model.ApplicationProperties.System;
@Slf4j
@Configuration
@ -19,9 +21,16 @@ import stirling.software.common.model.ApplicationProperties.CustomPaths.Pipeline
public class RuntimePathConfig {
private final ApplicationProperties properties;
private final String basePath;
// Operation paths
private final String weasyPrintPath;
private final String unoConvertPath;
private final String calibrePath;
private final String ocrMyPdfPath;
private final String sOfficePath;
// Tesseract data path
private final String tessDataPath;
// Pipeline paths
private final String pipelineWatchedFoldersPath;
@ -38,7 +47,10 @@ public class RuntimePathConfig {
String defaultFinishedFolders = Path.of(this.pipelinePath, "finishedFolders").toString();
String defaultWebUIConfigs = Path.of(this.pipelinePath, "defaultWebUIConfigs").toString();
Pipeline pipeline = properties.getSystem().getCustomPaths().getPipeline();
System system = properties.getSystem();
CustomPaths customPaths = system.getCustomPaths();
Pipeline pipeline = customPaths.getPipeline();
this.pipelineWatchedFoldersPath =
resolvePath(
@ -58,9 +70,11 @@ public class RuntimePathConfig {
// Initialize Operation paths
String defaultWeasyPrintPath = isDocker ? "/opt/venv/bin/weasyprint" : "weasyprint";
String defaultUnoConvertPath = isDocker ? "/opt/venv/bin/unoconvert" : "unoconvert";
String defaultCalibrePath = isDocker ? "/usr/bin/ebook-convert" : "ebook-convert";
String defaultCalibrePath = isDocker ? "/opt/calibre/ebook-convert" : "ebook-convert";
String defaultOcrMyPdfPath = isDocker ? "/usr/bin/ocrmypdf" : "ocrmypdf";
String defaultSOfficePath = isDocker ? "/usr/bin/soffice" : "soffice";
Operations operations = properties.getSystem().getCustomPaths().getOperations();
Operations operations = customPaths.getOperations();
this.weasyPrintPath =
resolvePath(
defaultWeasyPrintPath,
@ -72,6 +86,25 @@ public class RuntimePathConfig {
this.calibrePath =
resolvePath(
defaultCalibrePath, operations != null ? operations.getCalibre() : null);
this.ocrMyPdfPath =
resolvePath(
defaultOcrMyPdfPath, operations != null ? operations.getOcrmypdf() : null);
this.sOfficePath =
resolvePath(
defaultSOfficePath, operations != null ? operations.getSoffice() : null);
// Initialize Tesseract data path
String defaultTessDataPath =
isDocker ? "/usr/share/tesseract-ocr/5/tessdata" : "/usr/share/tessdata";
String tessPath = system.getTessdataDir();
String tessdataDir = java.lang.System.getenv("TESSDATA_PREFIX");
this.tessDataPath =
resolvePath(
defaultTessDataPath,
(tessPath != null && !tessPath.isEmpty()) ? tessPath : tessdataDir);
log.info("Using Tesseract data path: {}", this.tessDataPath);
}
private String resolvePath(String defaultPath, String customPath) {
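
The resolution order introduced in this hunk can be sketched in isolation. This is a minimal sketch, not the project's code: the class name `PathResolutionSketch` and the helper `resolveTessData` are hypothetical, and the body of `resolvePath` is not visible in the diff, so the sketch assumes it returns the custom value when that value is non-blank and the default otherwise. The default paths and the precedence (configured `tessdataDir`, then `TESSDATA_PREFIX`, then the Docker-aware default) are taken from the lines above.

```java
/** Hypothetical stand-in for the resolution rules; not the project's RuntimePathConfig. */
final class PathResolutionSketch {

    /** Assumed behavior: a non-blank custom path wins, otherwise the default is used. */
    static String resolvePath(String defaultPath, String customPath) {
        return (customPath == null || customPath.isBlank()) ? defaultPath : customPath;
    }

    /** Precedence as in the diff: configured directory, then env value, then default. */
    static String resolveTessData(boolean isDocker, String configuredDir, String envPrefix) {
        String defaultPath =
                isDocker ? "/usr/share/tesseract-ocr/5/tessdata" : "/usr/share/tessdata";
        String candidate =
                (configuredDir != null && !configuredDir.isEmpty()) ? configuredDir : envPrefix;
        return resolvePath(defaultPath, candidate);
    }

    public static void main(String[] args) {
        // The environment variable may be unset, in which case the default applies.
        String env = System.getenv("TESSDATA_PREFIX");
        System.out.println(resolveTessData(true, null, env));
        System.out.println(resolveTessData(false, "/data/tess", env));
    }
}
```

Under these assumptions an explicit `system.tessdataDir` setting always overrides the environment, and the environment only overrides the built-in default.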

@ -372,6 +372,8 @@ public class ApplicationProperties {
private String weasyprint;
private String unoconvert;
private String calibre;
private String ocrmypdf;
private String soffice;
}
}
@ -454,10 +456,10 @@ public class ApplicationProperties {
@Override
public String toString() {
return """
Driver {
driverName='%s'
}
"""
Driver {
driverName='%s'
}
"""
.formatted(driverName);
}
}

@ -25,6 +25,7 @@ import org.springframework.stereotype.Service;
import com.posthog.java.PostHog;
import stirling.software.common.configuration.RuntimePathConfig;
import stirling.software.common.model.ApplicationProperties;
@Service
@ -33,6 +34,7 @@ public class PostHogService {
private final String uniqueId;
private final String appVersion;
private final ApplicationProperties applicationProperties;
private final RuntimePathConfig runtimePathConfig;
private final UserServiceInterface userService;
private final Environment env;
private boolean configDirMounted;
@ -43,12 +45,14 @@ public class PostHogService {
@Qualifier("configDirMounted") boolean configDirMounted,
@Qualifier("appVersion") String appVersion,
ApplicationProperties applicationProperties,
RuntimePathConfig runtimePathConfig,
@Autowired(required = false) UserServiceInterface userService,
Environment env) {
this.postHog = postHog;
this.uniqueId = uuid;
this.appVersion = appVersion;
this.applicationProperties = applicationProperties;
this.runtimePathConfig = runtimePathConfig;
this.userService = userService;
this.env = env;
this.configDirMounted = configDirMounted;
@ -313,10 +317,7 @@ public class PostHogService {
properties,
"system_customHTMLFiles",
applicationProperties.getSystem().isCustomHTMLFiles());
addIfNotEmpty(
properties,
"system_tessdataDir",
applicationProperties.getSystem().getTessdataDir());
addIfNotEmpty(properties, "system_tessdataDir", runtimePathConfig.getTessDataPath());
addIfNotEmpty(
properties,
"system_enableAlphaFunctionality",

@ -27,15 +27,22 @@ import io.github.pixee.security.Filenames;
import lombok.extern.slf4j.Slf4j;
import stirling.software.common.configuration.RuntimePathConfig;
import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
@Slf4j
public class PDFToFile {
private final TempFileManager tempFileManager;
private final RuntimePathConfig runtimePathConfig;
public PDFToFile(TempFileManager tempFileManager) {
this(tempFileManager, null);
}
public PDFToFile(TempFileManager tempFileManager, RuntimePathConfig runtimePathConfig) {
this.tempFileManager = tempFileManager;
this.runtimePathConfig = runtimePathConfig;
}
public ResponseEntity<byte[]> processPdfToMarkdown(MultipartFile inputFile)
@ -241,31 +248,65 @@ public class PDFToFile {
byte[] fileBytes;
String fileName;
Path libreOfficeProfile = null;
try (TempFile inputFileTemp = new TempFile(tempFileManager, ".pdf");
TempDirectory outputDirTemp = new TempDirectory(tempFileManager)) {
Path tempInputFile = inputFileTemp.getPath();
Path tempOutputDir = outputDirTemp.getPath();
Path unoOutputFile =
tempOutputDir.resolve(
pdfBaseName + "." + resolvePrimaryExtension(outputFormat));
// Save the uploaded file to a temporary location
inputFile.transferTo(tempInputFile);
// Run the LibreOffice command
List<String> command =
new ArrayList<>(
Arrays.asList(
"soffice",
"--headless",
"--nologo",
"--infilter=" + libreOfficeFilter,
"--convert-to",
outputFormat,
"--outdir",
tempOutputDir.toString(),
tempInputFile.toString()));
ProcessExecutorResult returnCode =
ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
.runCommandWithOutputHandling(command);
ProcessExecutorResult returnCode = null;
IOException unoconvertException = null;
if (isUnoConvertEnabled()) {
try {
List<String> unoCommand =
buildUnoConvertCommand(
tempInputFile, unoOutputFile, outputFormat, libreOfficeFilter);
returnCode =
ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
.runCommandWithOutputHandling(unoCommand);
} catch (IOException e) {
unoconvertException = e;
log.warn(
"Unoconvert command failed ({}). Falling back to soffice command.",
e.getMessage());
}
}
if (returnCode == null) {
// Run the LibreOffice command as a fallback
libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_");
List<String> command = new ArrayList<>();
command.add(runtimePathConfig.getSOfficePath());
command.add("-env:UserInstallation=" + libreOfficeProfile.toUri().toString());
command.add("--headless");
command.add("--nologo");
command.add("--infilter=" + libreOfficeFilter);
command.add("--convert-to");
command.add(outputFormat);
command.add("--outdir");
command.add(tempOutputDir.toString());
command.add(tempInputFile.toString());
try {
returnCode =
ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
.runCommandWithOutputHandling(command);
} catch (IOException e) {
if (unoconvertException != null) {
e.addSuppressed(unoconvertException);
}
throw e;
}
}
// Get output files
List<File> outputFiles = Arrays.asList(tempOutputDir.toFile().listFiles());
@ -300,8 +341,42 @@ public class PDFToFile {
fileBytes = byteArrayOutputStream.toByteArray();
}
} finally {
if (libreOfficeProfile != null) {
FileUtils.deleteQuietly(libreOfficeProfile.toFile());
}
}
return WebResponseUtils.bytesToWebResponse(
fileBytes, fileName, MediaType.APPLICATION_OCTET_STREAM);
}
private boolean isUnoConvertEnabled() {
return runtimePathConfig != null
&& runtimePathConfig.getUnoConvertPath() != null
&& !runtimePathConfig.getUnoConvertPath().isBlank();
}
private List<String> buildUnoConvertCommand(
Path inputFile, Path outputFile, String outputFormat, String libreOfficeFilter) {
List<String> command = new ArrayList<>();
command.add(runtimePathConfig.getUnoConvertPath());
command.add("--port");
command.add("2003");
command.add("--convert-to");
command.add(outputFormat);
if (libreOfficeFilter != null && !libreOfficeFilter.isBlank()) {
command.add("--input-filter=" + libreOfficeFilter);
}
command.add(inputFile.toString());
command.add(outputFile.toString());
return command;
}
private String resolvePrimaryExtension(String outputFormat) {
if (outputFormat == null) {
return "";
}
int colonIndex = outputFormat.indexOf(':');
return colonIndex > 0 ? outputFormat.substring(0, colonIndex) : outputFormat;
}
}
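
The unoconvert-first, soffice-second flow in `processPdfToOfficeFormat` follows a general pattern: try the primary command, remember its failure, and attach that failure to the fallback's exception if the fallback fails too. The sketch below shows the pattern on its own. `ConverterFallbackSketch` and `CommandRunner` are hypothetical names standing in for the real `ProcessExecutor`, and the command lists are illustrative only.

```java
import java.io.IOException;
import java.util.List;

final class ConverterFallbackSketch {

    /** Hypothetical abstraction over a process runner; returns the exit code. */
    interface CommandRunner {
        int run(List<String> command) throws IOException;
    }

    /** Runs the primary command if present; on IOException, runs the fallback. */
    static int convert(CommandRunner runner, List<String> primary, List<String> fallback)
            throws IOException {
        IOException primaryFailure = null;
        if (primary != null) {
            try {
                return runner.run(primary);
            } catch (IOException e) {
                primaryFailure = e; // remember why the primary converter failed
            }
        }
        try {
            return runner.run(fallback);
        } catch (IOException e) {
            if (primaryFailure != null) {
                e.addSuppressed(primaryFailure); // keep both failures in one stack trace
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated runner: the primary converter fails, the fallback succeeds.
        CommandRunner runner = cmd -> {
            if (cmd.get(0).endsWith("unoconvert")) {
                throw new IOException("unoserver not reachable");
            }
            return 0;
        };
        int rc = convert(
                runner,
                List.of("/opt/venv/bin/unoconvert", "--port", "2003", "in.pdf", "out.docx"),
                List.of("/usr/bin/soffice", "--headless", "--convert-to", "docx", "in.pdf"));
        System.out.println("exit code: " + rc);
    }
}
```

As in the diff, only an `IOException` triggers the fallback in this sketch; an exit code returned by the primary command is passed through unchanged.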

@ -32,6 +32,7 @@ import org.springframework.web.multipart.MultipartFile;
import io.github.pixee.security.ZipSecurity;
import stirling.software.common.configuration.RuntimePathConfig;
import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
/**
@ -48,6 +49,7 @@ class PDFToFileTest {
@Mock private ProcessExecutor mockProcessExecutor;
@Mock private ProcessExecutorResult mockExecutorResult;
@Mock private TempFileManager mockTempFileManager;
@Mock private RuntimePathConfig mockRuntimePathConfig;
@BeforeEach
void setUp() throws IOException {
@ -61,7 +63,9 @@ class PDFToFileTest {
.when(mockTempFileManager.createTempDirectory())
.thenAnswer(invocation -> Files.createTempDirectory("test"));
pdfToFile = new PDFToFile(mockTempFileManager);
lenient().when(mockRuntimePathConfig.getSOfficePath()).thenReturn("/usr/bin/soffice");
pdfToFile = new PDFToFile(mockTempFileManager, mockRuntimePathConfig);
}
@Test
@ -363,7 +367,8 @@ class PDFToFileTest {
when(mockProcessExecutor.runCommandWithOutputHandling(
argThat(
args ->
args.contains("--convert-to")
args != null
&& args.contains("--convert-to")
&& args.contains("docx"))))
.thenAnswer(
invocation -> {
@ -424,7 +429,11 @@ class PDFToFileTest {
.thenReturn(mockProcessExecutor);
when(mockProcessExecutor.runCommandWithOutputHandling(
argThat(args -> args.contains("--convert-to") && args.contains("odp"))))
argThat(
args ->
args != null
&& args.contains("--convert-to")
&& args.contains("odp"))))
.thenAnswer(
invocation -> {
// When command is executed, find the output directory argument
@ -513,7 +522,8 @@ class PDFToFileTest {
when(mockProcessExecutor.runCommandWithOutputHandling(
argThat(
args ->
args.contains("--convert-to")
args != null
&& args.contains("--convert-to")
&& args.contains("txt:Text"))))
.thenAnswer(
invocation -> {
@ -611,4 +621,110 @@ class PDFToFileTest {
.contains("output.docx"));
}
}
@Test
void testProcessPdfToOfficeFormat_UsesUnoconvertWhenConfigured()
throws IOException, InterruptedException {
when(mockRuntimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert");
PDFToFile pdfToFileWithUno = new PDFToFile(mockTempFileManager, mockRuntimePathConfig);
try (MockedStatic<ProcessExecutor> mockedStaticProcessExecutor =
mockStatic(ProcessExecutor.class)) {
MultipartFile pdfFile =
new MockMultipartFile(
"file",
"document.pdf",
MediaType.APPLICATION_PDF_VALUE,
"Fake PDF content".getBytes());
mockedStaticProcessExecutor
.when(() -> ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE))
.thenReturn(mockProcessExecutor);
when(mockProcessExecutor.runCommandWithOutputHandling(
argThat(args -> args != null && args.contains("/custom/unoconvert"))))
.thenAnswer(
invocation -> {
List<String> args = invocation.getArgument(0);
String outputPath = args.get(args.size() - 1);
Files.write(Path.of(outputPath), "Fake DOCX content".getBytes());
return mockExecutorResult;
});
ResponseEntity<byte[]> response =
pdfToFileWithUno.processPdfToOfficeFormat(pdfFile, "docx", "writer_pdf_import");
assertEquals(HttpStatus.OK, response.getStatusCode());
assertNotNull(response.getBody());
assertTrue(response.getBody().length > 0);
assertTrue(
response.getHeaders()
.getContentDisposition()
.toString()
.contains("document.docx"));
}
}
@Test
void testProcessPdfToOfficeFormat_FallsBackWhenUnoconvertFails()
throws IOException, InterruptedException {
when(mockRuntimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert");
PDFToFile pdfToFileWithUno = new PDFToFile(mockTempFileManager, mockRuntimePathConfig);
try (MockedStatic<ProcessExecutor> mockedStaticProcessExecutor =
mockStatic(ProcessExecutor.class)) {
MultipartFile pdfFile =
new MockMultipartFile(
"file",
"document.pdf",
MediaType.APPLICATION_PDF_VALUE,
"Fake PDF content".getBytes());
mockedStaticProcessExecutor
.when(() -> ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE))
.thenReturn(mockProcessExecutor);
when(mockProcessExecutor.runCommandWithOutputHandling(
argThat(args -> args != null && args.contains("/custom/unoconvert"))))
.thenThrow(new IOException("Conversion failed"));
when(mockProcessExecutor.runCommandWithOutputHandling(
argThat(
args ->
args != null
&& args.stream()
.anyMatch(
arg ->
arg.contains(
"soffice")))))
.thenAnswer(
invocation -> {
List<String> args = invocation.getArgument(0);
String outDir = null;
for (int i = 0; i < args.size(); i++) {
if ("--outdir".equals(args.get(i)) && i + 1 < args.size()) {
outDir = args.get(i + 1);
break;
}
}
assertNotNull(outDir);
Files.write(
Path.of(outDir, "document.docx"),
"Fallback DOCX content".getBytes());
return mockExecutorResult;
});
ResponseEntity<byte[]> response =
pdfToFileWithUno.processPdfToOfficeFormat(pdfFile, "docx", "writer_pdf_import");
assertEquals(HttpStatus.OK, response.getStatusCode());
assertNotNull(response.getBody());
assertTrue(response.getBody().length > 0);
assertTrue(
response.getHeaders()
.getContentDisposition()
.toString()
.contains("document.docx"));
}
}
}

@ -41,6 +41,8 @@ public class ExternalAppDepConfig {
private final String weasyprintPath;
private final String unoconvPath;
private final String calibrePath;
private final String ocrMyPdfPath;
private final String sOfficePath;
/**
* Map of command(binary) -> affected groups (e.g. "gs" -> ["Ghostscript"]). Immutable to avoid
@ -58,11 +60,13 @@ public class ExternalAppDepConfig {
this.weasyprintPath = runtimePathConfig.getWeasyPrintPath();
this.unoconvPath = runtimePathConfig.getUnoConvertPath();
this.calibrePath = runtimePathConfig.getCalibrePath();
this.ocrMyPdfPath = runtimePathConfig.getOcrMyPdfPath();
this.sOfficePath = runtimePathConfig.getSOfficePath();
Map<String, List<String>> tmp = new HashMap<>();
tmp.put("gs", List.of("Ghostscript"));
tmp.put("ocrmypdf", List.of("OCRmyPDF"));
tmp.put("soffice", List.of("LibreOffice"));
tmp.put(ocrMyPdfPath, List.of("OCRmyPDF"));
tmp.put(sOfficePath, List.of("LibreOffice"));
tmp.put(weasyprintPath, List.of("Weasyprint"));
tmp.put("pdftohtml", List.of("Pdftohtml"));
tmp.put(unoconvPath, List.of("Unoconvert"));

@ -93,6 +93,7 @@ public class ConvertOfficeController {
Files.copy(inputFile.getInputStream(), inputPath, StandardCopyOption.REPLACE_EXISTING);
}
Path libreOfficeProfile = null;
try {
ProcessExecutorResult result;
// Run Unoconvert command
@ -112,8 +113,10 @@ public class ConvertOfficeController {
.runCommandWithOutputHandling(command);
} // Run soffice command
else {
libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_");
List<String> command = new ArrayList<>();
command.add("soffice");
command.add(runtimePathConfig.getSOfficePath());
command.add("-env:UserInstallation=" + libreOfficeProfile.toUri().toString());
command.add("--headless");
command.add("--nologo");
command.add("--convert-to");
@ -169,6 +172,9 @@ public class ConvertOfficeController {
} catch (IOException e) {
log.warn("Failed to delete temp input file: {}", inputPath, e);
}
if (libreOfficeProfile != null) {
FileUtils.deleteQuietly(libreOfficeProfile.toFile());
}
}
}
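
The soffice branch now gives every invocation its own profile directory via `-env:UserInstallation=` and removes it in `finally`. A minimal standalone sketch of that lifecycle follows. `LibreOfficeProfileSketch`, `buildCommand` and `deleteRecursively` are hypothetical names; the project itself uses `FileUtils.deleteQuietly`, which is replaced here by a plain `java.nio` walk so the example has no dependencies. The sketch only builds and prints the command; it does not start LibreOffice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

final class LibreOfficeProfileSketch {

    /** Builds a soffice command line that uses an isolated user profile directory. */
    static List<String> buildCommand(
            String sofficePath, Path profileDir, String format, Path outDir, Path input) {
        List<String> command = new ArrayList<>();
        command.add(sofficePath);
        command.add("-env:UserInstallation=" + profileDir.toUri());
        command.add("--headless");
        command.add("--nologo");
        command.add("--convert-to");
        command.add(format);
        command.add("--outdir");
        command.add(outDir.toString());
        command.add(input.toString());
        return command;
    }

    /** Best-effort recursive delete; children are removed before their parents. */
    static void deleteRecursively(Path root) {
        if (root == null || !Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException ignored) {
            // Cleanup failures are deliberately not propagated.
        }
    }

    public static void main(String[] args) throws IOException {
        Path profile = Files.createTempDirectory("libreoffice_profile_");
        try {
            List<String> cmd =
                    buildCommand(
                            "/usr/bin/soffice",
                            profile,
                            "pdf",
                            Path.of("/tmp/out"),
                            Path.of("/tmp/in.docx"));
            System.out.println(String.join(" ", cmd));
        } finally {
            deleteRecursively(profile); // runs even if the conversion throws
        }
    }
}
```

A separate profile per call keeps concurrent conversions from contending for one shared profile directory, at the cost of creating and deleting a directory each time.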

@ -13,6 +13,7 @@ import io.swagger.v3.oas.annotations.tags.Tag;
import lombok.RequiredArgsConstructor;
import stirling.software.common.configuration.RuntimePathConfig;
import stirling.software.common.model.api.PDFFile;
import stirling.software.common.util.PDFToFile;
import stirling.software.common.util.TempFileManager;
@ -24,6 +25,7 @@ import stirling.software.common.util.TempFileManager;
public class ConvertPDFToHtml {
private final TempFileManager tempFileManager;
private final RuntimePathConfig runtimePathConfig;
@PostMapping(consumes = MediaType.MULTIPART_FORM_DATA_VALUE, value = "/pdf/html")
@Operation(
@ -32,7 +34,7 @@ public class ConvertPDFToHtml {
"This endpoint converts a PDF file to HTML format. Input:PDF Output:HTML Type:SISO")
public ResponseEntity<byte[]> processPdfToHTML(@ModelAttribute PDFFile file) throws Exception {
MultipartFile inputFile = file.getFileInput();
PDFToFile pdfToFile = new PDFToFile(tempFileManager);
PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
return pdfToFile.processPdfToHtml(inputFile);
}
}

@ -20,6 +20,7 @@ import lombok.RequiredArgsConstructor;
import stirling.software.SPDF.model.api.converters.PdfToPresentationRequest;
import stirling.software.SPDF.model.api.converters.PdfToTextOrRTFRequest;
import stirling.software.SPDF.model.api.converters.PdfToWordRequest;
import stirling.software.common.configuration.RuntimePathConfig;
import stirling.software.common.model.api.PDFFile;
import stirling.software.common.service.CustomPDFDocumentFactory;
import stirling.software.common.util.GeneralUtils;
@ -35,6 +36,7 @@ public class ConvertPDFToOffice {
private final CustomPDFDocumentFactory pdfDocumentFactory;
private final TempFileManager tempFileManager;
private final RuntimePathConfig runtimePathConfig;
@PostMapping(consumes = MediaType.MULTIPART_FORM_DATA_VALUE, value = "/pdf/presentation")
@Operation(
@ -47,7 +49,7 @@ public class ConvertPDFToOffice {
throws IOException, InterruptedException {
MultipartFile inputFile = request.getFileInput();
String outputFormat = request.getOutputFormat();
PDFToFile pdfToFile = new PDFToFile(tempFileManager);
PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
return pdfToFile.processPdfToOfficeFormat(inputFile, outputFormat, "impress_pdf_import");
}
@ -72,7 +74,7 @@ public class ConvertPDFToOffice {
MediaType.TEXT_PLAIN);
}
} else {
PDFToFile pdfToFile = new PDFToFile(tempFileManager);
PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
return pdfToFile.processPdfToOfficeFormat(inputFile, outputFormat, "writer_pdf_import");
}
}
@ -87,7 +89,7 @@ public class ConvertPDFToOffice {
throws IOException, InterruptedException {
MultipartFile inputFile = request.getFileInput();
String outputFormat = request.getOutputFormat();
PDFToFile pdfToFile = new PDFToFile(tempFileManager);
PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
return pdfToFile.processPdfToOfficeFormat(inputFile, outputFormat, "writer_pdf_import");
}
@ -100,7 +102,7 @@ public class ConvertPDFToOffice {
public ResponseEntity<byte[]> processPdfToXML(@ModelAttribute PDFFile file) throws Exception {
MultipartFile inputFile = file.getFileInput();
PDFToFile pdfToFile = new PDFToFile(tempFileManager);
PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
return pdfToFile.processPdfToOfficeFormat(inputFile, "xml", "writer_pdf_import");
}
}

@ -71,9 +71,11 @@ import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import lombok.Getter;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import stirling.software.SPDF.model.api.converters.PdfToPdfARequest;
import stirling.software.common.configuration.RuntimePathConfig;
import stirling.software.common.util.ExceptionUtils;
import stirling.software.common.util.ProcessExecutor;
import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
@ -83,8 +85,11 @@ import stirling.software.common.util.WebResponseUtils;
@RequestMapping("/api/v1/convert")
@Slf4j
@Tag(name = "Convert", description = "Convert APIs")
@RequiredArgsConstructor
public class ConvertPDFToPDFA {
private final RuntimePathConfig runtimePathConfig;
private static final String ICC_RESOURCE_PATH = "/icc/sRGB2014.icc";
private static final int PDFA_COMPATIBILITY_POLICY = 1;
@ -1043,26 +1048,33 @@ public class ConvertPDFToPDFA {
? "pdf:writer_pdf_Export:{\"SelectPdfVersion\":{\"type\":\"long\",\"value\":\"2\"}}"
: "pdf:writer_pdf_Export:{\"SelectPdfVersion\":{\"type\":\"long\",\"value\":\"1\"}}";
// Prepare LibreOffice command
List<String> command =
new ArrayList<>(
Arrays.asList(
"soffice",
"--headless",
"--nologo",
"--convert-to",
pdfFilter,
"--outdir",
tempOutputDir.toString(),
tempInputFile.toString()));
Path libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_");
try {
// Prepare LibreOffice command
List<String> command =
new ArrayList<>(
Arrays.asList(
runtimePathConfig.getSOfficePath(),
"-env:UserInstallation="
+ libreOfficeProfile.toUri().toString(),
"--headless",
"--nologo",
"--convert-to",
pdfFilter,
"--outdir",
tempOutputDir.toString(),
tempInputFile.toString()));
ProcessExecutorResult returnCode =
ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
.runCommandWithOutputHandling(command);
ProcessExecutorResult returnCode =
ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
.runCommandWithOutputHandling(command);
if (returnCode.getRc() != 0) {
log.error("PDF/A conversion failed with return code: {}", returnCode.getRc());
throw ExceptionUtils.createPdfaConversionFailedException();
if (returnCode.getRc() != 0) {
log.error("PDF/A conversion failed with return code: {}", returnCode.getRc());
throw ExceptionUtils.createPdfaConversionFailedException();
}
} finally {
FileUtils.deleteQuietly(libreOfficeProfile.toFile());
}
// Get the output file

@ -37,10 +37,17 @@ import lombok.extern.slf4j.Slf4j;
import stirling.software.SPDF.config.EndpointConfiguration;
import stirling.software.SPDF.model.api.misc.ProcessPdfWithOcrRequest;
import stirling.software.common.configuration.RuntimePathConfig;
import stirling.software.common.model.ApplicationProperties;
import stirling.software.common.service.CustomPDFDocumentFactory;
import stirling.software.common.util.*;
import stirling.software.common.util.ExceptionUtils;
import stirling.software.common.util.GeneralUtils;
import stirling.software.common.util.ProcessExecutor;
import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
import stirling.software.common.util.TempDirectory;
import stirling.software.common.util.TempFile;
import stirling.software.common.util.TempFileManager;
import stirling.software.common.util.WebResponseUtils;
@RestController
@RequestMapping("/api/v1/misc")
@ -53,6 +60,7 @@ public class OCRController {
private final CustomPDFDocumentFactory pdfDocumentFactory;
private final TempFileManager tempFileManager;
private final EndpointConfiguration endpointConfiguration;
private final RuntimePathConfig runtimePathConfig;
private boolean isOcrMyPdfEnabled() {
return endpointConfiguration.isGroupEnabled("OCRmyPDF");
@ -64,7 +72,7 @@ public class OCRController {
/** Gets the list of available Tesseract languages from the tessdata directory */
public List<String> getAvailableTesseractLanguages() {
String tessdataDir = applicationProperties.getSystem().getTessdataDir();
String tessdataDir = runtimePathConfig.getTessDataPath();
File[] files = new File(tessdataDir).listFiles();
if (files == null) {
return Collections.emptyList();
@ -80,9 +88,10 @@ public class OCRController {
@Operation(
summary = "Process a PDF file with OCR",
description =
"This endpoint processes a PDF file using OCR (Optical Character Recognition). "
+ "Users can specify languages, sidecar, deskew, clean, cleanFinal, ocrType, ocrRenderType, and removeImagesAfter options. "
+ "Uses OCRmyPDF if available, falls back to Tesseract. Input:PDF Output:PDF Type:SI-Conditional")
"This endpoint processes a PDF file using OCR (Optical Character Recognition). Users can"
+ " specify languages, sidecar, deskew, clean, cleanFinal, ocrType, ocrRenderType,"
+ " and removeImagesAfter options. Uses OCRmyPDF if available, falls back to"
+ " Tesseract. Input:PDF Output:PDF Type:SI-Conditional")
public ResponseEntity<byte[]> processPdfWithOCR(
@ModelAttribute ProcessPdfWithOcrRequest request)
throws IOException, InterruptedException {
@ -217,7 +226,7 @@ public class OCRController {
List<String> command =
new ArrayList<>(
Arrays.asList(
"ocrmypdf",
runtimePathConfig.getOcrMyPdfPath(),
"--verbose",
"2",
"--output-type",


@ -14,16 +14,20 @@ import io.swagger.v3.oas.annotations.Hidden;
import io.swagger.v3.oas.annotations.tags.Tag;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import stirling.software.common.configuration.RuntimePathConfig;
import stirling.software.common.model.ApplicationProperties;
import stirling.software.common.util.CheckProgramInstall;
@Controller
@Tag(name = "Misc", description = "Miscellaneous APIs")
@RequiredArgsConstructor
@Slf4j
public class OtherWebController {
private final ApplicationProperties applicationProperties;
private final RuntimePathConfig runtimePathConfig;
@GetMapping("/compress-pdf")
@Hidden
@ -120,7 +124,7 @@ public class OtherWebController {
}
public List<String> getAvailableTesseractLanguages() {
String tessdataDir = applicationProperties.getSystem().getTessdataDir();
String tessdataDir = runtimePathConfig.getTessDataPath();
File[] files = new File(tessdataDir).listFiles();
if (files == null) {
return Collections.emptyList();
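
Both controllers now read the tessdata directory from `runtimePathConfig.getTessDataPath()` and return an empty list when it cannot be listed. The rest of the Java method is not visible in this hunk, so the following is only a minimal shell sketch of the same idea. It assumes the usual Tesseract convention that each language ships as a `<code>.traineddata` file; `list_tesseract_languages` is a hypothetical helper, not part of the project.

```bash
# Hypothetical helper: print the language codes found in a tessdata directory,
# one per line, derived from *.traineddata file names.
list_tesseract_languages() {
    tessdata_dir="$1"
    # Missing or unreadable directory: behave like the empty-list branch.
    [ -d "$tessdata_dir" ] || return 0
    for f in "$tessdata_dir"/*.traineddata; do
        # When the glob matches nothing it stays literal; skip that case.
        [ -e "$f" ] || continue
        basename "$f" .traineddata
    done
}

# Example call; the fallback directory here is an assumption for illustration.
list_tesseract_languages "${TESSDATA_PREFIX:-/usr/share/tessdata}"
```

Running this inside the container is a quick way to check which languages the resolved directory actually exposes.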


@ -115,7 +115,7 @@ system:
showUpdate: false # see when a new update is available
showUpdateOnlyAdmin: false # only admins can see when a new update is available, depending on showUpdate it must be set to 'true'
customHTMLFiles: false # enable to have files placed in /customFiles/templates override the existing template HTML files
tessdataDir: /usr/share/tessdata # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.
tessdataDir: "" # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.
enableAnalytics: null # Master toggle for analytics: set to 'true' to enable all analytics, 'false' to disable all analytics, or leave as 'null' to prompt admin on first launch
enablePosthog: null # Enable PostHog analytics (open-source product analytics): set to 'true' to enable, 'false' to disable, or 'null' to enable by default when analytics is enabled
enableScarf: null # Enable Scarf pixel: set to 'true' to enable, 'false' to disable, or 'null' to enable by default when analytics is enabled
@ -150,6 +150,8 @@ system:
weasyprint: '' # Defaults to /opt/venv/bin/weasyprint
unoconvert: '' # Defaults to /opt/venv/bin/unoconvert
calibre: '' # Defaults to /usr/bin/ebook-convert
ocrmypdf: '' # Defaults to /usr/bin/ocrmypdf
soffice: '' # Defaults to /usr/bin/soffice
fileUploadLimit: '' # Defaults to "". No limit when string is empty. Set a number, between 0 and 999, followed by one of the following strings to set a limit. "KB", "MB", "GB".
tempFileManagement:
baseTmpDir: '' # Defaults to java.io.tmpdir/stirling-pdf
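
The `operations` entries above are empty by default, and each comment names the path used in that case. A minimal shell sketch of that empty-means-default rule follows. It assumes, without confirmation from this diff, that a blank value is treated as unset; `resolve_tool_path` is a hypothetical helper and not the project's `RuntimePathConfig` logic.

```bash
# Hypothetical helper: return the configured path, or the documented default
# when the configured value is empty.
resolve_tool_path() {
    configured="$1"
    default_path="$2"
    if [ -n "$configured" ]; then
        printf '%s\n' "$configured"
    else
        printf '%s\n' "$default_path"
    fi
}

# Empty setting: the documented default is used.
resolve_tool_path "" "/usr/bin/ocrmypdf"
# Explicit setting: the configured value wins.
resolve_tool_path "/custom/ocrmypdf" "/usr/bin/ocrmypdf"
```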


@ -32,6 +32,8 @@ class ExternalAppDepConfigTest {
void setUp() {
when(runtimePathConfig.getWeasyPrintPath()).thenReturn("/custom/weasyprint");
when(runtimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert");
when(runtimePathConfig.getCalibrePath()).thenReturn("/custom/calibre");
when(runtimePathConfig.getOcrMyPdfPath()).thenReturn("/custom/ocrmypdf");
lenient()
.when(endpointConfiguration.getEndpointsForGroup(anyString()))
.thenReturn(Set.of());
@ -45,6 +47,8 @@ class ExternalAppDepConfigTest {
assertEquals(List.of("Weasyprint"), mapping.get("/custom/weasyprint"));
assertEquals(List.of("Unoconvert"), mapping.get("/custom/unoconvert"));
assertEquals(List.of("Calibre"), mapping.get("/custom/calibre"));
assertEquals(List.of("OCRmyPDF"), mapping.get("/custom/ocrmypdf"));
assertEquals(List.of("Ghostscript"), mapping.get("gs"));
}


@ -1,8 +1,8 @@
services:
stirling-pdf:
container_name: Stirling-PDF-Fat-Disable-Endpoints
image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
# image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
image: ghcr.io/stirling-tools/stirling-pdf-test:fat
deploy:
resources:
limits:


@ -1,7 +1,8 @@
services:
stirling-pdf:
container_name: Stirling-PDF-Security-Fat
image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
# image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
image: ghcr.io/stirling-tools/stirling-pdf-test:fat
deploy:
resources:
limits:


@ -1,7 +1,8 @@
services:
stirling-pdf:
container_name: Stirling-PDF-Security
image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
# image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
image: ghcr.io/stirling-tools/stirling-pdf-test:latest
deploy:
resources:
limits:
@ -22,9 +23,9 @@ services:
SECURITY_ENABLELOGIN: "true"
SECURITY_OAUTH2_ENABLED: "true"
SECURITY_OAUTH2_AUTOCREATEUSER: "true" # This is set to true to allow auto-creation of non-existing users in Stirling-PDF
SECURITY_OAUTH2_ISSUER: "https://accounts.google.com" # Change with any other provider that supports OpenID Connect Discovery (/.well-known/openid-configuration) end-point
SECURITY_OAUTH2_ISSUER: "https://accounts.google.com" # Change with any other provider that supports OpenID Connect Discovery (/.well-known/openid-configuration) end-point
SECURITY_OAUTH2_CLIENTID: "<YOUR CLIENT ID>.apps.googleusercontent.com" # Client ID from your provider
SECURITY_OAUTH2_CLIENTSECRET: "<YOUR CLIENT SECRET>" # Client Secret from your provider
SECURITY_OAUTH2_CLIENTSECRET: "<YOUR CLIENT SECRET>" # Client Secret from your provider
SECURITY_OAUTH2_SCOPES: "openid,profile,email" # Expected OAuth2 Scope
SECURITY_OAUTH2_USEASUSERNAME: "email" # Default is 'email'; custom fields can be used as the username
SECURITY_OAUTH2_PROVIDER: "google" # Set this to your OAuth provider's name, e.g., 'google' or 'keycloak'


@ -1,7 +1,8 @@
services:
stirling-pdf:
container_name: Stirling-PDF-Security
image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
# image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
image: ghcr.io/stirling-tools/stirling-pdf-test:latest
deploy:
resources:
limits:


@ -1,7 +1,8 @@
services:
stirling-pdf:
container_name: Stirling-PDF-Ultra-Lite-Security
image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
# image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
image: ghcr.io/stirling-tools/stirling-pdf-test:ultra-lite
deploy:
resources:
limits:


@ -1,7 +1,8 @@
services:
stirling-pdf:
container_name: Stirling-PDF-Ultra-Lite
image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
# image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
image: ghcr.io/stirling-tools/stirling-pdf-test:ultra-lite
deploy:
resources:
limits:


@ -1,7 +1,8 @@
services:
stirling-pdf:
container_name: Stirling-PDF
image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
# image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
image: ghcr.io/stirling-tools/stirling-pdf-test:latest
deploy:
resources:
limits:


@ -1,7 +1,8 @@
services:
stirling-pdf:
container_name: Stirling-PDF-Security-Fat-with-login
image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
# image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
image: ghcr.io/stirling-tools/stirling-pdf-test:fat
deploy:
resources:
limits:


@ -1,42 +1,188 @@
#!/bin/bash
# This script initializes Stirling PDF without OCR features.
set -euo pipefail
export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
echo "running with JAVA_TOOL_OPTIONS ${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
log() { printf '%s\n' "$*" >&2; }
command_exists() { command -v "$1" >/dev/null 2>&1; }
# Update the user and group IDs as per environment variables
if [ ! -z "$PUID" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
usermod -o -u "$PUID" stirlingpdfuser || true
SU_EXEC_BIN=""
if command_exists su-exec; then
SU_EXEC_BIN="su-exec"
elif command_exists gosu; then
SU_EXEC_BIN="gosu"
fi
CURRENT_USER="$(id -un)"
CURRENT_UID="$(id -u)"
SWITCH_USER_WARNING_EMITTED=false
if [ ! -z "$PGID" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
groupmod -o -g "$PGID" stirlingpdfgroup || true
fi
umask "$UMASK" || true
warn_switch_user_once() {
if [ "$SWITCH_USER_WARNING_EMITTED" = false ]; then
log "WARNING: Unable to switch to user ${RUNTIME_USER:-stirlingpdfuser}; running command as ${CURRENT_USER}."
SWITCH_USER_WARNING_EMITTED=true
fi
}
if [[ "$INSTALL_BOOK_AND_ADVANCED_HTML_OPS" == "true" && "$FAT_DOCKER" != "true" ]]; then
echo "issue with calibre in current version, feature currently disabled on Stirling-PDF"
#apk add --no-cache calibre@testing
run_as_runtime_user() {
if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
"$@"
elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
"$SU_EXEC_BIN" "$RUNTIME_USER" "$@"
else
warn_switch_user_once
"$@"
fi
}
# ---------- VERSION_TAG ----------
# Load VERSION_TAG from file if not provided via environment.
if [ -z "${VERSION_TAG:-}" ] && [ -f /etc/stirling_version ]; then
VERSION_TAG="$(tr -d '\r\n' < /etc/stirling_version)"
export VERSION_TAG
fi
if [[ "$FAT_DOCKER" != "true" ]]; then
/scripts/download-security-jar.sh
fi
# ---------- JAVA_OPTS ----------
# Configure Java runtime options.
export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS:-} ${JAVA_CUSTOM_OPTS:-}"
export JAVA_TOOL_OPTIONS="-Djava.awt.headless=true ${JAVA_TOOL_OPTIONS}"
log "running with JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
log "Running Stirling PDF with DISABLE_ADDITIONAL_FEATURES=${DISABLE_ADDITIONAL_FEATURES:-} and VERSION_TAG=${VERSION_TAG:-<unset>}"
if [[ -n "$LANGS" ]]; then
/scripts/installFonts.sh $LANGS
fi
# ---------- UMASK ----------
# Set default permissions mask.
UMASK_VAL="${UMASK:-022}"
umask "$UMASK_VAL" 2>/dev/null || umask 022
echo "Setting permissions and ownership for necessary directories..."
# Ensure temp directory exists and has correct permissions
mkdir -p /tmp/stirling-pdf || true
# Attempt to change ownership of directories and files
if chown -R stirlingpdfuser:stirlingpdfgroup $HOME /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar; then
chmod -R 755 /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar || true
# If chown succeeds, execute the command as stirlingpdfuser
exec su-exec stirlingpdfuser "$@"
# ---------- XDG_RUNTIME_DIR ----------
# Create the runtime directory, respecting UID/GID settings.
RUNTIME_USER="stirlingpdfuser"
if id -u "$RUNTIME_USER" >/dev/null 2>&1; then
RUID="$(id -u "$RUNTIME_USER")"
RGRP="$(id -gn "$RUNTIME_USER")"
else
# If chown fails, execute the command without changing the user context
echo "[WARN] Chown failed, running as host user"
exec "$@"
RUID="$(id -u)"
RGRP="$(id -gn)"
RUNTIME_USER="$(id -un)"
fi
CURRENT_USER="$(id -un)"
CURRENT_UID="$(id -u)"
export XDG_RUNTIME_DIR="/tmp/xdg-${RUID}"
mkdir -p "${XDG_RUNTIME_DIR}" || true
if [ "$(id -u)" -eq 0 ]; then
chown "${RUNTIME_USER}:${RGRP}" "${XDG_RUNTIME_DIR}" 2>/dev/null || true
fi
chmod 700 "${XDG_RUNTIME_DIR}" 2>/dev/null || true
log "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}"
# ---------- Optional ----------
# Disable advanced HTML operations if required.
if [[ "${INSTALL_BOOK_AND_ADVANCED_HTML_OPS:-false}" == "true" && "${FAT_DOCKER:-true}" != "true" ]]; then
log "issue with calibre in current version, feature currently disabled on Stirling-PDF"
fi
# Download security JAR in non-fat builds.
if [[ "${FAT_DOCKER:-true}" != "true" && -x /scripts/download-security-jar.sh ]]; then
/scripts/download-security-jar.sh || true
fi
# ---------- UID/GID remap ----------
# Remap user/group IDs to match container runtime settings.
if [ "$(id -u)" -eq 0 ]; then
if id -u stirlingpdfuser >/dev/null 2>&1; then
if [ -n "${PUID:-}" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
usermod -o -u "$PUID" stirlingpdfuser || true
chown stirlingpdfuser:stirlingpdfgroup "${XDG_RUNTIME_DIR}" 2>/dev/null || true
fi
fi
if getent group stirlingpdfgroup >/dev/null 2>&1; then
if [ -n "${PGID:-}" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
groupmod -o -g "$PGID" stirlingpdfgroup || true
fi
fi
fi
# ---------- Permissions ----------
# Ensure required directories exist and set correct permissions.
log "Setting permissions..."
mkdir -p /tmp/stirling-pdf /logs /configs /customFiles /pipeline || true
CHOWN_PATHS=("$HOME" "/logs" "/scripts" "/configs" "/customFiles" "/pipeline" "/tmp/stirling-pdf" "/app.jar")
[ -d /usr/share/fonts/truetype ] && CHOWN_PATHS+=("/usr/share/fonts/truetype")
CHOWN_OK=true
for p in "${CHOWN_PATHS[@]}"; do
if [ -e "$p" ]; then
chown -R "stirlingpdfuser:stirlingpdfgroup" "$p" 2>/dev/null || CHOWN_OK=false
chmod -R 755 "$p" 2>/dev/null || true
fi
done
# ---------- Xvfb ----------
# Start a virtual framebuffer for GUI-based LibreOffice interactions.
if command_exists Xvfb; then
log "Starting Xvfb on :99"
Xvfb :99 -screen 0 1024x768x24 -ac +extension GLX +render -noreset > /dev/null 2>&1 &
export DISPLAY=:99
sleep 1
else
log "Xvfb not installed; skipping virtual display setup"
fi
# ---------- unoserver ----------
# Start LibreOffice UNO server for document conversions.
UNOSERVER_BIN="$(command -v unoserver || true)"
UNOCONVERT_BIN="$(command -v unoconvert || true)"
UNOSERVER_PID=""
if [ -n "$UNOSERVER_BIN" ] && [ -n "$UNOCONVERT_BIN" ]; then
LIBREOFFICE_PROFILE="${HOME:-/home/${RUNTIME_USER}}/.libreoffice_uno_${RUID}"
run_as_runtime_user mkdir -p "$LIBREOFFICE_PROFILE"
log "Starting unoserver on 127.0.0.1:2003"
run_as_runtime_user "$UNOSERVER_BIN" \
--interface 127.0.0.1 \
--port 2003 \
--uno-port 2004 \
&
UNOSERVER_PID=$!
log "unoserver PID: $UNOSERVER_PID (Profile: $LIBREOFFICE_PROFILE)"
# Wait until UNO server is ready.
log "Waiting for unoserver..."
for _ in {1..20}; do
if run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
log "unoserver is ready!"
break
fi
sleep 1
done
if ! run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
log "ERROR: unoserver failed!"
if [ -n "$UNOSERVER_PID" ]; then
kill "$UNOSERVER_PID" 2>/dev/null || true
wait "$UNOSERVER_PID" 2>/dev/null || true
fi
exit 1
fi
else
log "unoserver/unoconvert not installed; skipping UNO setup"
fi
# ---------- Java ----------
# Start Stirling PDF Java application.
log "Starting Stirling PDF"
JAVA_CMD=(
java
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/tmp/stirling-pdf
-jar /app.jar
)
if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
exec "${JAVA_CMD[@]}"
elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
exec "$SU_EXEC_BIN" "$RUNTIME_USER" "${JAVA_CMD[@]}"
else
warn_switch_user_once
exec "${JAVA_CMD[@]}"
fi
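
`run_as_runtime_user` and the final `exec` share the same three-way decision: run directly when already the runtime user, switch through `su-exec`/`gosu` when root and a switch tool exists, otherwise warn once and run as the current user. The sketch below isolates that decision so it can be tried without root. `choose_exec_mode` is a hypothetical helper that only prints the chosen branch; it is not part of the script.

```bash
# Hypothetical helper mirroring the three branches of run_as_runtime_user.
# Arguments: current user, current uid, runtime user, switch binary (may be empty).
# It prints the selected mode instead of executing anything.
choose_exec_mode() {
    current_user="$1"
    current_uid="$2"
    runtime_user="$3"
    switch_bin="$4"
    if [ "$current_user" = "$runtime_user" ]; then
        echo "direct"                   # already the runtime user
    elif [ "$current_uid" -eq 0 ] && [ -n "$switch_bin" ]; then
        echo "switch via $switch_bin"   # root, and su-exec or gosu is available
    else
        echo "fallback"                 # cannot switch; run as the current user
    fi
}

choose_exec_mode stirlingpdfuser 1000 stirlingpdfuser ""
choose_exec_mode root 0 stirlingpdfuser gosu
choose_exec_mode someone 1001 stirlingpdfuser su-exec
```

The last call shows why the fallback branch exists: a non-root user cannot use `su-exec` or `gosu` to change identity, so the script continues as that user rather than failing.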


@ -1,36 +1,110 @@
#!/bin/bash
# This script initializes environment variables and paths,
# prepares Tesseract data directories, and then runs the main init script.
# Copy the original tesseract-ocr files to the volume directory without overwriting existing files
echo "Copying original files without overwriting existing files"
mkdir -p /usr/share/tessdata
cp -rn /usr/share/tessdata-original/* /usr/share/tessdata
set -euo pipefail
if [ -d /usr/share/tesseract-ocr/4.00/tessdata ]; then
cp -r /usr/share/tesseract-ocr/4.00/tessdata/* /usr/share/tessdata || true;
append_env_path() {
local target="$1" current="$2" separator=":"
if [ -d "$target" ] && [[ ":${current}:" != *":${target}:"* ]]; then
if [ -n "$current" ]; then
printf '%s' "${target}${separator}${current}"
else
printf '%s' "${target}"
fi
else
printf '%s' "$current"
fi
}
python_site_dir() {
local venv_dir="$1"
local python_bin="$venv_dir/bin/python"
if [ -x "$python_bin" ]; then
local py_tag
if py_tag="$("$python_bin" -c 'import sys; print(f"python{sys.version_info.major}.{sys.version_info.minor}")' 2>/dev/null)" \
&& [ -n "$py_tag" ] \
&& [ -d "$venv_dir/lib/$py_tag/site-packages" ]; then
printf '%s' "$venv_dir/lib/$py_tag/site-packages"
fi
fi
}
# === LD_LIBRARY_PATH ===
# Adjust the library path depending on CPU architecture.
ARCH=$(uname -m)
case "$ARCH" in
x86_64)
[ -d /usr/lib/x86_64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
;;
aarch64)
[ -d /usr/lib/aarch64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/aarch64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
;;
esac
# Add LibreOffice program directory to library path if available.
if [ -d /usr/lib/libreoffice/program ]; then
export LD_LIBRARY_PATH="/usr/lib/libreoffice/program${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
# === Python PATH ===
# Add virtual environments to PATH and PYTHONPATH.
for dir in /opt/venv/bin /opt/unoserver-venv/bin; do
PATH="$(append_env_path "$dir" "$PATH")"
done
export PATH
PYTHON_PATH_ENTRIES=()
for venv in /opt/venv /opt/unoserver-venv; do
if [ -d "$venv" ]; then
site_dir="$(python_site_dir "$venv")"
[ -n "${site_dir:-}" ] && PYTHON_PATH_ENTRIES+=("$site_dir")
fi
done
if [ ${#PYTHON_PATH_ENTRIES[@]} -gt 0 ]; then
PYTHONPATH="$(IFS=:; printf '%s' "${PYTHON_PATH_ENTRIES[*]}")${PYTHONPATH:+:$PYTHONPATH}"
export PYTHONPATH
fi
# === tessdata ===
# Prepare Tesseract OCR data directory.
REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
SEC_TESSDATA="/usr/share/tessdata"
log_warn() {
echo "[init][warn] $*" >&2
}
if [ -d "$REAL_TESSDATA" ] && [ -w "$REAL_TESSDATA" ]; then
log_warn "Skipping tessdata adjustments; directory writable: $REAL_TESSDATA"
else
log_warn "Skipping tessdata adjustments; directory missing or not writable: $REAL_TESSDATA"
fi
if [ -d /usr/share/tesseract-ocr/5/tessdata ]; then
cp -r /usr/share/tesseract-ocr/5/tessdata/* /usr/share/tessdata || true;
REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
log_warn "Using /usr/share/tesseract-ocr/5/tessdata as TESSDATA_PREFIX"
elif [ -d /usr/share/tessdata ]; then
REAL_TESSDATA="/usr/share/tessdata"
log_warn "Using /usr/share/tessdata as TESSDATA_PREFIX"
elif [ -d /tessdata ]; then
REAL_TESSDATA="/tessdata"
log_warn "Using /tessdata as TESSDATA_PREFIX"
else
REAL_TESSDATA=""
log_warn "No tessdata directory found"
fi
# Check if TESSERACT_LANGS environment variable is set and is not empty
if [[ -n "$TESSERACT_LANGS" ]]; then
# Convert comma-separated values to a space-separated list
SPACE_SEPARATED_LANGS=$(echo $TESSERACT_LANGS | tr ',' ' ')
pattern='^[a-zA-Z]{2,4}(_[a-zA-Z]{2,4})?$'
# Install each language pack
for LANG in $SPACE_SEPARATED_LANGS; do
if [[ $LANG =~ $pattern ]]; then
apk add --no-cache "tesseract-ocr-data-$LANG"
else
echo "Skipping invalid language code"
fi
done
if [ -n "$REAL_TESSDATA" ]; then
export TESSDATA_PREFIX="$REAL_TESSDATA"
fi
# Ensure temp directory exists with correct permissions before running main init
mkdir -p /tmp/stirling-pdf || true
# === Temp dir ===
# Ensure the temporary directory exists and has proper permissions.
mkdir -p /tmp/stirling-pdf
chown -R stirlingpdfuser:stirlingpdfgroup /tmp/stirling-pdf || true
chmod -R 755 /tmp/stirling-pdf || true
/scripts/init-without-ocr.sh "$@"
# === Start application ===
# Run the main init script that handles the full startup logic.
exec /scripts/init-without-ocr.sh
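
The tessdata block in this script walks a fixed list of candidate directories and exports the first one that exists as `TESSDATA_PREFIX`. The same selection can be written as a small loop, sketched below. `first_existing_dir` is a hypothetical helper introduced for illustration; the candidate paths are the ones checked in the script.

```bash
# Hypothetical helper: print the first argument that is an existing directory.
# Prints nothing when none of the candidates exist. It always returns 0 so the
# command substitution below is safe under `set -e`.
first_existing_dir() {
    for candidate in "$@"; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 0
}

TESSDATA_DIR="$(first_existing_dir \
    /usr/share/tesseract-ocr/5/tessdata \
    /usr/share/tessdata \
    /tessdata)"

if [ -n "$TESSDATA_DIR" ]; then
    export TESSDATA_PREFIX="$TESSDATA_DIR"
    echo "Using $TESSDATA_DIR as TESSDATA_PREFIX" >&2
else
    echo "No tessdata directory found" >&2
fi
```

Keeping the candidates in one argument list makes the precedence explicit and lets a new location be added without another `elif` branch.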


@ -140,6 +140,9 @@ system:
operations:
weasyprint: '' # Defaults to /opt/venv/bin/weasyprint
unoconvert: '' # Defaults to /opt/venv/bin/unoconvert
calibre: '' # Defaults to /usr/bin/ebook-convert
ocrmypdf: '' # Defaults to /usr/bin/ocrmypdf
soffice: '' # Defaults to /usr/bin/soffice
fileUploadLimit: '' # Defaults to "". No limit when string is empty. Set a number, between 0 and 999, followed by one of the following strings to set a limit. "KB", "MB", "GB".
tempFileManagement:
baseTmpDir: '' # Defaults to java.io.tmpdir/stirling-pdf


@ -16,27 +16,47 @@ find_root() {
PROJECT_ROOT=$(find_root)
# Function to check the health of the service with a timeout of 80 seconds
# Function to check application readiness via HTTP instead of Docker's health status
check_health() {
local service_name=$1
local container_name=$1 # real container name
local compose_file=$2
local end=$((SECONDS+60))
local timeout=80 # total timeout in seconds
local interval=3 # poll interval in seconds
local end=$((SECONDS + timeout))
local last_code="000"
echo -n "Waiting for $service_name to become healthy..."
until [ "$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}healthy{{end}}' "$service_name")" == "healthy" ] || [ $SECONDS -ge $end ]; do
sleep 3
echo -n "."
if [ $SECONDS -ge $end ]; then
echo -e "\n$service_name health check timed out after 80 seconds."
echo "Printing logs for $service_name:"
docker logs "$service_name"
return 1
echo "Waiting for $container_name to become reachable on http://localhost:8080/ (timeout ${timeout}s)..."
while [ $SECONDS -lt $end ]; do
# Optional: check if container is running at all (nice for debugging)
if ! docker ps --format '{{.Names}}' | grep -Fxq "$container_name"; then
echo " Container $container_name not running yet (still waiting)..."
fi
# Try simple HTTP GET on the root page
last_code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:8080/") || last_code="000"
# Treat any 2xx or 3xx as "ready"
if [ "$last_code" -ge 200 ] && [ "$last_code" -lt 400 ]; then
echo "$container_name is reachable over HTTP (status $last_code)."
echo "Printing logs for $container_name:"
docker logs "$container_name" || true
return 0
fi
echo " Still waiting for HTTP readiness, current status: $last_code"
sleep "$interval"
done
echo -e "\n$service_name is healthy!"
echo "Printing logs for $service_name:"
docker logs "$service_name"
return 0
echo "$container_name did not become HTTP-ready within ${timeout}s (last HTTP status: $last_code)."
# For extra debugging: show Docker health status, but DO NOT depend on it
local docker_health
docker_health=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}(no healthcheck){{end}}' "$container_name" 2>/dev/null || echo "inspect failed")
echo "Docker-reported health status for $container_name: $docker_health"
echo "Printing logs for $container_name:"
docker logs "$container_name" || true
return 1
}
# Function to capture file list from a Docker container
@ -48,7 +68,7 @@ capture_file_list() {
# Get all files in one command, output directly from Docker to avoid path issues
# Skip proc, sys, dev, and the specified LibreOffice config directory
# Also skip PDFBox and LibreOffice temporary files
docker exec $container_name sh -c "find / -type f \
docker exec "$container_name" sh -c "find / -type f \
-not -path '*/proc/*' \
-not -path '*/sys/*' \
-not -path '*/dev/*' \
@ -69,7 +89,7 @@ capture_file_list() {
echo "Trying alternative approach..."
# Alternative simpler approach - just get paths as a fallback
docker exec $container_name sh -c "find / -type f \
docker exec "$container_name" sh -c "find / -type f \
-not -path '*/proc/*' \
-not -path '*/sys/*' \
-not -path '*/dev/*' \
@ -106,14 +126,8 @@ compare_file_lists() {
# Check if files exist and have content
if [ ! -s "$before_file" ] || [ ! -s "$after_file" ]; then
echo "WARNING: One or both file lists are empty."
if [ ! -s "$before_file" ]; then
echo "Before file is empty: $before_file"
fi
if [ ! -s "$after_file" ]; then
echo "After file is empty: $after_file"
fi
if [ ! -s "$before_file" ]; then echo "Before file is empty: $before_file"; fi
if [ ! -s "$after_file" ]; then echo "After file is empty: $after_file"; fi
# Create empty diff file
> "$diff_file"
@ -132,7 +146,6 @@ compare_file_lists() {
echo "No temporary files found in the after snapshot."
fi
fi
return 0
fi
@ -169,7 +182,6 @@ compare_file_lists() {
else
echo "No file changes detected during test."
fi
return 0
}
@ -220,19 +232,33 @@ verify_app_version() {
# Function to test a Docker Compose configuration
test_compose() {
local compose_file=$1
local service_name=$2
local test_name=$2
local status=0
echo "Testing $compose_file configuration..."
echo "Testing ${compose_file} configuration..."
# Start up the Docker Compose service
docker-compose -f "$compose_file" up -d
# Wait for the service to become healthy
if check_health "$service_name" "$compose_file"; then
echo "$service_name test passed."
# Wait a moment for containers to appear
sleep 3
local container_name
container_name=$(docker-compose -f "$compose_file" ps --format '{{.Names}}' --filter "status=running" | head -n1)
if [[ -z "$container_name" ]]; then
echo "ERROR: No running container found for ${compose_file}"
docker-compose -f "$compose_file" ps
return 1
fi
echo "Started container: $container_name"
# Wait for the service to become healthy (HTTP-based)
if check_health "$container_name" "$compose_file"; then
echo "${test_name} test passed."
else
echo "$service_name test failed."
echo "${test_name} test failed."
status=1
fi
@ -246,7 +272,6 @@ declare -a failed_tests
run_tests() {
local test_name=$1
local compose_file=$2
if test_compose "$compose_file" "$test_name"; then
passed_tests+=("$test_name")
else
@ -254,18 +279,18 @@ run_tests() {
fi
}
# Main testing routine
main() {
SECONDS=0
cd "$PROJECT_ROOT"
export DOCKER_CLI_EXPERIMENTAL=enabled
export COMPOSE_DOCKER_CLI_BUILD=0
export DISABLE_ADDITIONAL_FEATURES=true
# Run the gradlew build command and check if it fails
# ==================================================================
# 1. Ultra-Lite (no additional features)
# ==================================================================
export DISABLE_ADDITIONAL_FEATURES=true
if ! ./gradlew clean build; then
echo "Gradle build failed with security disabled, exiting script."
exit 1
@ -276,11 +301,12 @@ main() {
EXPECTED_VERSION=$(get_expected_version)
echo "Expected version: $EXPECTED_VERSION"
# Building Docker images
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .
docker build --build-arg VERSION_TAG=alpha -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .
# Build Ultra-Lite image (GHCR tag, matching docker-compose-latest-ultra-lite.yml)
docker build --build-arg VERSION_TAG=alpha \
-t ghcr.io/stirling-tools/stirling-pdf-test:ultra-lite \
-f ./Dockerfile.ultra-lite .
# Test each configuration
# Test Ultra-Lite configuration
run_tests "Stirling-PDF-Ultra-Lite" "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml"
echo "Testing webpage accessibility..."
@ -302,36 +328,27 @@ main() {
echo "Version verification failed for Stirling-PDF-Ultra-Lite"
fi
docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml" down
# run_tests "Stirling-PDF" "./exampleYmlFiles/docker-compose-latest.yml"
# docker-compose -f "./exampleYmlFiles/docker-compose-latest.yml" down
docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml" down -v
# ==================================================================
# 2. Full Fat + Security
# ==================================================================
export DISABLE_ADDITIONAL_FEATURES=false
# Run the gradlew build command and check if it fails
if ! ./gradlew clean build; then
echo "Gradle build failed with security enabled, exiting script."
exit 1
fi
# Get expected version after the security-enabled build
echo "Getting expected version from Gradle (security enabled)..."
EXPECTED_VERSION=$(get_expected_version)
echo "Expected version with security enabled: $EXPECTED_VERSION"
# Building Docker images with security enabled
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .
docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat -f ./Dockerfile.fat .
# Test each configuration with security
# run_tests "Stirling-PDF-Ultra-Lite-Security" "./exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml"
# docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml" down
# run_tests "Stirling-PDF-Security" "./exampleYmlFiles/docker-compose-latest-security.yml"
# docker-compose -f "./exampleYmlFiles/docker-compose-latest-security.yml" down
# Build Fat (Security) image for GHCR tag used in all 'fat' compose files
docker build --no-cache --pull --build-arg VERSION_TAG=alpha \
-t ghcr.io/stirling-tools/stirling-pdf-test:fat \
-f ./Dockerfile.fat .
# Test fat + security compose
run_tests "Stirling-PDF-Security-Fat" "./exampleYmlFiles/docker-compose-latest-fat-security.yml"
echo "Testing webpage accessibility..."
@ -353,54 +370,50 @@ main() {
echo "Version verification failed for Stirling-PDF-Security-Fat"
fi
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-security.yml" down
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-security.yml" down -v
# ==================================================================
# 3. Regression test with login (test_cicd.yml)
# ==================================================================
run_tests "Stirling-PDF-Security-Fat-with-login" "./exampleYmlFiles/test_cicd.yml"
if [ $? -eq 0 ]; then
# Create directory for file snapshots if it doesn't exist
# Only run behave tests if the container started successfully
if [[ " ${passed_tests[*]} " =~ "Stirling-PDF-Security-Fat-with-login" ]]; then
CONTAINER_NAME=$(docker-compose -f "./exampleYmlFiles/test_cicd.yml" ps --format '{{.Names}}' --filter "status=running" | head -n1)
SNAPSHOT_DIR="$PROJECT_ROOT/testing/file_snapshots"
mkdir -p "$SNAPSHOT_DIR"
# Capture file list before running behave tests
BEFORE_FILE="$SNAPSHOT_DIR/files_before_behave.txt"
AFTER_FILE="$SNAPSHOT_DIR/files_after_behave.txt"
DIFF_FILE="$SNAPSHOT_DIR/files_diff.txt"
# Define container name variable for consistency
CONTAINER_NAME="Stirling-PDF-Security-Fat-with-login"
capture_file_list "$CONTAINER_NAME" "$BEFORE_FILE"
cd "testing/cucumber"
if python -m behave; then
# Wait 10 seconds before capturing the file list after tests
echo "Waiting 5 seconds for any file operations to complete..."
sleep 5
# Capture file list after running behave tests
cd "$PROJECT_ROOT"
capture_file_list "$CONTAINER_NAME" "$AFTER_FILE"
# Compare file lists
if compare_file_lists "$BEFORE_FILE" "$AFTER_FILE" "$DIFF_FILE" "$CONTAINER_NAME"; then
echo "No unexpected temporary files found."
passed_tests+=("Stirling-PDF-Regression")
passed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
else
echo "WARNING: Unexpected temporary files detected after behave tests!"
failed_tests+=("Stirling-PDF-Regression-Temp-Files")
fi
passed_tests+=("Stirling-PDF-Regression")
passed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
else
failed_tests+=("Stirling-PDF-Regression")
failed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
echo "Printing docker logs of failed regression"
docker logs "$CONTAINER_NAME"
echo "Printed docker logs of failed regression"
# Still capture file list after failure for analysis
# Wait 10 seconds before capturing the file list
echo "Waiting 5 seconds before capturing file list..."
echo "Waiting 10 seconds before capturing file list..."
sleep 10
cd "$PROJECT_ROOT"
@ -408,9 +421,11 @@ main() {
compare_file_lists "$BEFORE_FILE" "$AFTER_FILE" "$DIFF_FILE" "$CONTAINER_NAME"
fi
fi
docker-compose -f "./exampleYmlFiles/test_cicd.yml" down -v
docker-compose -f "./exampleYmlFiles/test_cicd.yml" down
# ==================================================================
# 4. Disabled Endpoints Test
# ==================================================================
run_tests "Stirling-PDF-Fat-Disable-Endpoints" "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml"
echo "Testing disabled endpoints..."
@ -430,27 +445,27 @@ main() {
echo "Version verification failed for Stirling-PDF-Fat-Disable-Endpoints"
fi
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml" down
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml" down -v
# Report results
# ==================================================================
# Final Report
# ==================================================================
echo "All tests completed in $SECONDS seconds."
if [ ${#passed_tests[@]} -ne 0 ]; then
echo "Passed tests:"
for test in "${passed_tests[@]}"; do
echo -e "\e[32m$test\e[0m"
done
fi
for test in "${passed_tests[@]}"; do
echo -e "\e[32m$test\e[0m" # Green color for passed tests
done
if [ ${#failed_tests[@]} -ne 0 ]; then
echo "Failed tests:"
for test in "${failed_tests[@]}"; do
echo -e "\e[31m$test\e[0m"
done
fi
for test in "${failed_tests[@]}"; do
echo -e "\e[31m$test\e[0m" # Red color for failed tests
done
# Check if there are any failed tests and exit with an error code if so
if [ ${#failed_tests[@]} -ne 0 ]; then
echo "Some tests failed."
exit 1