mirror of
https://github.com/Frooodle/Stirling-PDF.git
synced 2025-12-18 20:04:17 +01:00
feat(docker-runtime): unified Debian-based images, dynamic path resolution & enhanced UNO/LibreOffice handling (#4880)
# Description of Changes

### What was changed

This PR introduces a major refinement to the Docker runtime, system path resolution, conversion tooling, and integration logic across the codebase.

Key improvements include:

- Migration of **Dockerfile**, **Dockerfile.fat** to a unified Debian-based environment.
- Introduction of **RuntimePathConfig** enhancements to dynamically resolve:
  - `weasyprint`, `unoconvert`, `calibre`, `ocrmypdf`, `soffice`
  - Tesseract `tessdata` paths with Docker-aware defaults.
- Support for **UNO server (unoserver/unoconvert)** as primary document converter with automatic fallback to `soffice`.
- Isolation of Python environments for WeasyPrint and UNO tooling.
- Updated controllers and services to correctly inject `RuntimePathConfig`.
- Improved process execution logic in converters and OCR handling.
- Major updates to `init.sh` and `init-without-ocr.sh`:
  - Unified environment initialization
  - Proper UID/GID remapping
  - Safer permissions handling
  - Automatic Tesseract path detection
  - Reliable startup of headless LibreOffice + Xvfb + UNO server
- Full test suite updates:
  - Adaptation to new conversion paths
  - Mocking of UNO and LibreOffice commands
  - More robust Docker test logic
- Updated example docker-compose files referencing GHCR test images.
- Expanded configuration schema for new operations paths.

### Why the change was made

These changes address long-standing issues around:

- Inconsistent or missing binary paths between image variants.
- Reduced reliability of document conversions (UNO vs. soffice).
- Lack of uniform runtime initialization across Docker images.
- Repetitive environment setup logic split across multiple scripts.
- Fragile test scenarios tied to Alpine-based images.

Switching to a unified Debian-based runtime significantly improves:

- Compatibility with LibreOffice, Calibre, WebEngine and graphics stack.
- UNO stability for document conversions.
- Tesseract deterministic behavior.
- Debuggability and reliability of CI/CD Docker-based tests.

The improvements to `RuntimePathConfig` ensure all system binaries are fully configurable and correctly detected at runtime.

---

## Checklist

### General

- [x] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### Translations (if applicable)

- [ ] I ran [`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.
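The description names `unoconvert` as the primary converter with automatic fallback to `soffice`. As a rough illustration of that preference order only, the sketch below picks the first candidate binary found on `PATH`. The helper name `pick_converter` is hypothetical and is not taken from the PR; the real resolution is done by `RuntimePathConfig` in the application code, which this shell sketch does not reproduce.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print the first candidate
# command that exists on PATH, or return non-zero if none is found.
pick_converter() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Prefer unoconvert, fall back to soffice; leave empty if neither exists.
converter="$(pick_converter unoconvert soffice)" || converter=""
echo "selected converter: ${converter:-none}"
```

Listing candidates in priority order keeps the fallback rule in one place, so adding or reordering converters would only change the argument list.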
This commit is contained in:
parent 43345021bf
commit 886f9b379e
29
.github/config/.files.yaml
vendored
@@ -6,22 +6,27 @@ app: &app
- app/(common|core|proprietary)/src/main/java/**

openapi: &openapi
- build.gradle
- app/(common|core|proprietary)/build.gradle
- app/(common|core|proprietary)/src/main/java/**
- *build
- *app

project: &project
- app/(common|core|proprietary)/src/(main|test)/java/**
- app/(common|core|proprietary)/build.gradle
- 'app/(common|core|proprietary)/src/(main|test)/resources/**/!(messages_*.properties|*.md)*'
- exampleYmlFiles/**
- gradle/**
- libs/**
- 'testing/**/!(requirements*.txt|requirements*.in)*'
- build.gradle
docker: &docker
- Dockerfile
- Dockerfile.fat
- Dockerfile.ultra-lite
- ".github/workflows/build.yml"
- scripts/init.sh
- scripts/init-without-ocr.sh
- exampleYmlFiles/**

project: &project
- app/(common|core|proprietary)/src/(main|test)/java/**
- *build
- "app/(common|core|proprietary)/src/(main|test)/resources/**/!(messages_*.properties|*.md)*"
- exampleYmlFiles/**
- gradle/**
- libs/**
- "testing/**/!(requirements*.txt|requirements*.in)*"
- *docker
- gradle.properties
- gradlew
- gradlew.bat

267
.github/workflows/build.yml
vendored
@@ -33,6 +33,7 @@ jobs:
app: ${{ steps.changes.outputs.app }}
project: ${{ steps.changes.outputs.project }}
openapi: ${{ steps.changes.outputs.openapi }}
docker: ${{ steps.changes.outputs.docker }}
steps:
- uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1

@@ -68,14 +69,10 @@ jobs:
with:
java-version: ${{ matrix.jdk-version }}
distribution: "temurin"

- name: Setup Gradle
uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0
with:
gradle-version: 8.14
cache: gradle

- name: Build with Gradle and spring security ${{ matrix.spring-security }}
run: ./gradlew clean build
run: ./gradlew clean build -x spotlessApply -x spotlessCheck -x sonarqube
env:
DISABLE_ADDITIONAL_FEATURES: ${{ matrix.spring-security }}

@@ -100,12 +97,14 @@ jobs:
if [ ${#missing_reports[@]} -gt 0 ]; then
echo "ERROR: The following required test report directories are missing:"
printf '%s\n' "${missing_reports[@]}"
exit 1
echo "reports-present=false" >> "$GITHUB_OUTPUT"
else
echo "All required test report directories are present"
echo "reports-present=true" >> "$GITHUB_OUTPUT"
fi
echo "All required test report directories are present"

- name: Upload Test Reports
if: always()
if: always() && steps.check-reports.outputs.reports-present == 'true'
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
name: test-reports-jdk-${{ matrix.jdk-version }}-spring-security-${{ matrix.spring-security }}
@@ -127,6 +126,7 @@ jobs:
if-no-files-found: warn

- name: Add coverage to PR with spring security ${{ matrix.spring-security }} and JDK ${{ matrix.jdk-version }}
if: steps.check-reports.outputs.reports-present == 'true'
id: jacoco
uses: madrapps/jacoco-report@50d3aff4548aa991e6753342d9ba291084e63848 # v1.7.2
with:
@@ -155,15 +155,13 @@ jobs:
with:
java-version: "17"
distribution: "temurin"

- name: Setup Gradle
uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0
cache: gradle

- name: Generate OpenAPI documentation
run: ./gradlew :stirling-pdf:generateOpenApiDocs
env:
DISABLE_ADDITIONAL_FEATURES: true


- name: Upload OpenAPI Documentation
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
with:
@@ -188,6 +186,7 @@ jobs:
with:
java-version: "17"
distribution: "temurin"
cache: gradle

- name: Check licenses for compatibility
run: ./gradlew clean checkLicense
@@ -205,8 +204,14 @@ jobs:
retention-days: 3

docker-compose-tests:
if: needs.files-changed.outputs.project == 'true'
needs: files-changed
if: |
needs.files-changed.outputs.project == 'true' &&
(
needs.files-changed.outputs.docker != 'true' ||
needs.test-build-docker-images.result == 'success' ||
needs.test-build-docker-images.result == 'skipped'
)
needs: [files-changed, test-build-docker-images]
# if: github.event_name == 'push' && github.ref == 'refs/heads/main' ||
# (github.event_name == 'pull_request' &&
# contains(github.event.pull_request.labels.*.name, 'licenses') == false &&
@@ -237,20 +242,21 @@ jobs:
with:
java-version: "17"
distribution: "temurin"
cache: gradle

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1

- name: Install Docker Compose
run: |
sudo curl -SL "https://github.com/docker/compose/releases/download/v2.37.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo curl -SL "https://github.com/docker/compose/releases/download/v2.40.3/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

- name: Set up Python
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: "3.12"
cache: 'pip' # caching pip dependencies
cache: "pip" # caching pip dependencies
cache-dependency-path: ./testing/cucumber/requirements.txt

- name: Pip requirements
@@ -265,13 +271,22 @@ jobs:
./testing/test.sh

test-build-docker-images:
if: github.event_name == 'pull_request' && needs.files-changed.outputs.project == 'true'
if: github.event_name == 'pull_request' && needs.files-changed.outputs.docker == 'true'
needs: [files-changed, build]
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
strategy:
fail-fast: false
matrix:
docker-rev: ["Dockerfile", "Dockerfile.ultra-lite", "Dockerfile.fat"]
docker:
- name: "Dockerfile.ultra-lite"
tag: "ultra-lite"
- name: "Dockerfile.fat"
tag: "fat"
- name: "Dockerfile"
tag: "latest"
steps:
- name: Harden Runner
uses: step-security/harden-runner@95d9a5deda9de15063e7595e9719c11c38c90ae2 # v2.13.2
@@ -286,46 +301,220 @@ jobs:
with:
java-version: "17"
distribution: "temurin"

- name: Set up Gradle
uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0
with:
gradle-version: 8.14
cache: gradle

- name: Build application
run: ./gradlew clean build
run: ./gradlew clean build -x spotlessApply -x spotlessCheck -x test -x sonarqube
env:
DISABLE_ADDITIONAL_FEATURES: true
STIRLING_PDF_DESKTOP_UI: false

# - name: Free disk space on runner
# run: |
# echo "Disk space before cleanup:" && df -h
# sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/boost
# docker system prune -af || true
# echo "Disk space after cleanup:" && df -h

- name: Set up QEMU
uses: docker/setup-qemu-action@c7c53464625b32c7a7e944ae62b3e17d2b600130 # v3.7.0
with:
platforms: linux/amd64,linux/arm64/v8

- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1
with:
platforms: linux/amd64,linux/arm64/v8

- name: Build ${{ matrix.docker-rev }}
- name: Prepare branch tag
id: branch_tag
shell: bash
run: |
BRANCH_SOURCE="${GITHUB_HEAD_REF:-${GITHUB_REF_NAME}}"
BRANCH_LOWER=$(echo "$BRANCH_SOURCE" | tr '[:upper:]' '[:lower:]')
SAFE_BRANCH=$(echo "$BRANCH_LOWER" | sed 's/[^a-z0-9_.-]/-/g' | sed 's/^-\+//' | sed 's/-\+$//' | sed 's/--\+/-/g')
if [ -z "$SAFE_BRANCH" ]; then
SAFE_BRANCH="branch"
fi
SHORT_SHA=$(echo "${GITHUB_SHA:-${{ github.sha }}}" | cut -c1-8)
echo "safe_branch=$SAFE_BRANCH" >> "$GITHUB_OUTPUT"
echo "short_sha=$SHORT_SHA" >> "$GITHUB_OUTPUT"

- name: Convert repository owner to lowercase
id: repoowner
run: echo "lowercase=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT

- name: Docker meta
id: meta
uses: docker/metadata-action@c1e51972afc2121e065aed6d45c65596fe445f3f # v5.8.0
with:
images: |
# ${{ secrets.DOCKER_HUB_USERNAME }}/stirling-pdf-test
ghcr.io/${{ steps.repoowner.outputs.lowercase }}/stirling-pdf-test
flavor: |
latest=false
tags: |
type=raw,value=${{ matrix.docker.tag }},enable=true
# type=raw,value=${{ matrix.docker.tag }}-${{ steps.branch_tag.outputs.safe_branch }},enable=true
# type=raw,value=${{ matrix.docker.tag }}-${{ steps.branch_tag.outputs.safe_branch }}-${{ steps.branch_tag.outputs.short_sha }},enable=true
labels: |
org.opencontainers.image.title=Stirling-PDF Test
org.opencontainers.image.description=CI test image for Stirling-PDF
org.opencontainers.image.url=https://www.stirlingpdf.com
org.opencontainers.image.documentation=https://docs.stirlingpdf.com
org.opencontainers.image.authors=Stirling-Tools
org.opencontainers.image.licenses=MIT
org.opencontainers.image.version=${{ matrix.docker.tag }}
org.opencontainers.image.revision=${{ github.sha }}
org.opencontainers.image.source=${{ github.repository }}
maintainer=Stirling-Tools

- name: Choose primary tag for tests
id: testtag
shell: bash
run: |
IMAGE="ghcr.io/${{ steps.repoowner.outputs.lowercase }}/stirling-pdf-test"
VARIANT="${{ matrix.docker.tag }}"
BRANCH="${{ steps.branch_tag.outputs.safe_branch }}"
SHA_SHORT="${{ steps.branch_tag.outputs.short_sha }}"
CANDIDATE="$IMAGE:$VARIANT-$BRANCH-$SHA_SHORT"
SECONDARY="$IMAGE:$VARIANT-$BRANCH"
ALL_TAGS="$(echo '${{ steps.meta.outputs.tags }}' | tr ' ' '\n')"
if echo "$ALL_TAGS" | grep -qx "$CANDIDATE"; then
SELECTED="$CANDIDATE"
elif echo "$ALL_TAGS" | grep -qx "$SECONDARY"; then
SELECTED="$SECONDARY"
else
SELECTED="$(echo "$ALL_TAGS" | head -n1)"
fi
echo "tag=$SELECTED" >> $GITHUB_OUTPUT
echo "Using test tag: $SELECTED"

# - name: Log in to Docker Hub
# uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0
# with:
# username: ${{ secrets.DOCKER_HUB_USERNAME }}
# password: ${{ secrets.DOCKER_HUB_API }}

# - name: Log in to GitHub Container Registry
# uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0
# with:
# registry: ghcr.io
# username: ${{ github.actor }}
# password: ${{ github.token }}

- name: Build and push amd64 image
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
with:
builder: ${{ steps.buildx.outputs.name }}
context: .
file: ./${{ matrix.docker-rev }}
file: ./${{ matrix.docker.name }}
push: false
load: true
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64/v8
provenance: true
sbom: true
tags: ${{ steps.meta.outputs.tags }} # publish ALL tags
labels: ${{ steps.meta.outputs.labels }}
platforms: linux/amd64
provenance: false
sbom: false

- name: Upload Reports
- name: Show amd64 image size
run: |
IMAGE_TAG="${{ steps.testtag.outputs.tag }}"
echo "Inspecting image: ${IMAGE_TAG}"
SIZE=$(docker image inspect "${IMAGE_TAG}" --format='{{.Size}}')
FORMATTED=$(numfmt --to=iec --suffix=B "${SIZE}")
echo "Image size (amd64): ${FORMATTED}"

- name: Start amd64 image for 2 minutes
run: |
IMAGE_TAG="${{ steps.testtag.outputs.tag }}"
CONTAINER_NAME="stirling-pdf-test-${{ matrix.docker.tag }}-amd64"
echo "Starting container ${CONTAINER_NAME} from ${IMAGE_TAG}"
docker run -d --name "${CONTAINER_NAME}" "${IMAGE_TAG}"
echo "Waiting up to 2 minutes..."
sleep 120 || true
echo "===== Logs for ${CONTAINER_NAME} ====="
docker logs "${CONTAINER_NAME}" || true
echo "Stopping container ${CONTAINER_NAME} after 2 minutes"
docker stop "${CONTAINER_NAME}" || true
docker rm "${CONTAINER_NAME}" || true

- name: Prune amd64 image and cache
if: always()
uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
run: |
docker image rm -f ${{ steps.testtag.outputs.tag }} || true
docker builder prune --force || true

- name: Build and push arm64 image
uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
with:
name: reports-docker-${{ matrix.docker-rev }}
path: |
build/reports/tests/
build/test-results/
build/reports/problems/
retention-days: 3
if-no-files-found: warn
builder: ${{ steps.buildx.outputs.name }}
context: .
file: ./${{ matrix.docker.name }}
push: false
load: true
cache-from: type=gha
cache-to: type=gha,mode=max
tags: ${{ steps.meta.outputs.tags }} # publish ALL tags
labels: ${{ steps.meta.outputs.labels }}
platforms: linux/arm64/v8
provenance: false
sbom: false

- name: Show arm64 image size
run: |
IMAGE_TAG="${{ steps.testtag.outputs.tag }}"
echo "Inspecting image: ${IMAGE_TAG}"
SIZE=$(docker image inspect "${IMAGE_TAG}" --format='{{.Size}}')
FORMATTED=$(numfmt --to=iec --suffix=B "${SIZE}")
echo "Image size (arm64): ${FORMATTED}"

- name: Start arm64 image for 2 minutes
run: |
IMAGE_TAG="${{ steps.testtag.outputs.tag }}"
CONTAINER_NAME="stirling-pdf-test-${{ matrix.docker.tag }}-arm64"
echo "Starting container ${CONTAINER_NAME} from ${IMAGE_TAG}"
docker run -d --name "${CONTAINER_NAME}" "${IMAGE_TAG}"
echo "Waiting up to 2 minutes..."
sleep 120 || true
echo "===== Logs for ${CONTAINER_NAME} ====="
docker logs "${CONTAINER_NAME}" || true
echo "Stopping container ${CONTAINER_NAME} after 2 minutes"
docker stop "${CONTAINER_NAME}" || true
docker rm "${CONTAINER_NAME}" || true

- name: Cleanup arm64 image and cache
if: always()
run: |
docker image rm -f ${{ steps.testtag.outputs.tag }} || true
docker builder prune --force || true

# - name: Build and push multi-arch image
# uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
# with:
# builder: ${{ steps.buildx.outputs.name }}
# context: .
# file: ./${{ matrix.docker.name }}
# push: true
# cache-from: type=gha
# cache-to: type=gha,mode=max
# tags: ${{ steps.meta.outputs.tags }}
# labels: ${{ steps.meta.outputs.labels }}
# platforms: linux/amd64,linux/arm64/v8
# provenance: false
# sbom: false

# - name: Upload Docker build reports
# if: always()
# uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
# with:
# name: reports-docker-${{ matrix.docker.name }}
# path: |
# build/reports/
# build/test-results/
# build/reports/problems/
# retention-days: 3
# if-no-files-found: warn

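The "Prepare branch tag" step in the workflow above lowercases the branch name, replaces characters outside `a-z0-9_.-` with dashes, trims leading and trailing dashes, collapses runs of dashes, and falls back to `branch` when nothing is left. The sketch below restates those rules as a standalone function so they can be tried outside CI. The function name `safe_branch` is hypothetical, and `sed -E` is used here in place of the workflow's GNU-style `\+`, which is an assumption made for portability rather than what the workflow itself runs.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the workflow's branch-tag sanitizing rules.
safe_branch() {
  local lowered cleaned
  # Lowercase the raw branch name.
  lowered="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  # Replace disallowed characters, trim edge dashes, collapse dash runs.
  cleaned="$(printf '%s' "$lowered" \
    | sed -E 's/[^a-z0-9_.-]/-/g; s/^-+//; s/-+$//; s/-{2,}/-/g')"
  # Fall back to a fixed name when nothing usable remains.
  printf '%s\n' "${cleaned:-branch}"
}

safe_branch "Feature/Foo_Bar!!"
safe_branch "///"
```

With these inputs the function prints `feature-foo_bar` and then `branch`, matching what the rules in the workflow step would yield for the same names.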
212
Dockerfile
@@ -1,11 +1,88 @@
# Main stage
FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412
# ==============================================================================
# Multi-stage Dockerfile for Stirling-PDF – image with everything included
# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc.
# ==============================================================================

# Copy necessary files
COPY scripts /scripts
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
# ========================================
# STAGE 1: Runtime image based on Debian stable-slim
# Contains Java runtime + LibreOffice + Calibre + all PDF tools
# ========================================
FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d

SHELL ["/bin/bash", "-o", "pipefail", "-c"]
ENV DEBIAN_FRONTEND=noninteractive

ENV TESS_BASE_PATH=/usr/share/tesseract-ocr/5/tessdata

# Install core runtime dependencies + tools required by Stirling-PDF features
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates tzdata tini bash fontconfig \
openjdk-21-jre-headless \
ffmpeg poppler-utils ocrmypdf \
libreoffice-nogui libreoffice-java-common \
python3 python3-venv python3-uno \
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
tesseract-ocr-por tesseract-ocr-chi-sim \
libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \
gosu unpaper \
# AWT headless support (required for some Java graphics operations)
libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \
libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \
# Qt WebEngine dependencies for Calibre
libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \
libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \
libxcb-cursor0 libdbus-1-3 libglib2.0-0 \
# Virtual framebuffer (required for headless LibreOffice)
xvfb x11-utils coreutils \
# Temporary packages only needed for Calibre installer
xz-utils gpgv curl xdg-utils \
\
# Install Calibre from official installer script
&& curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \
\
# Clean up installer-only packages
&& apt-get purge -y xz-utils gpgv xdg-utils \
&& apt-get autoremove -y \
&& rm -rf /var/lib/apt/lists/*

# Make ebook-convert available in PATH
RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \
&& /opt/calibre/ebook-convert --version

# ==============================================================================
# Create non-root user (stirlingpdfuser) with configurable UID/GID
# ==============================================================================
ARG PUID=1000
ARG PGID=1000

RUN set -eux; \
# Create group if it doesn't exist
if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \
if getent group "${PGID}" >/dev/null 2>&1; then \
groupadd -o -g "${PGID}" stirlingpdfgroup; \
else \
groupadd -g "${PGID}" stirlingpdfgroup; \
fi; \
fi; \
# Create user if it doesn't exist, avoid UID conflicts
if ! id -u stirlingpdfuser >/dev/null 2>&1; then \
if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \
echo "UID ${PUID} already in use – creating stirlingpdfuser with automatic UID"; \
useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
else \
useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
fi; \
fi

# Compatibility alias for older entrypoint scripts expecting su-exec
RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec

# Copy application files from build stage
COPY scripts/ /scripts/
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
COPY app/core/build/libs/*.jar app.jar

# Optional version tag (can be passed at build time)
ARG VERSION_TAG

LABEL org.opencontainers.image.title="Stirling-PDF"
@@ -20,91 +97,68 @@ LABEL org.opencontainers.image.authors="Stirling-Tools"
LABEL org.opencontainers.image.version="${VERSION_TAG}"
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"

# Set Environment Variables
# ==============================================================================
# Runtime environment variables
# ==============================================================================
ENV DISABLE_ADDITIONAL_FEATURES=true \
VERSION_TAG=$VERSION_TAG \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \
-XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \
-XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \
-Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=1000 \
PGID=1000 \
PUID=${PUID} \
PGID=${PGID} \
UMASK=022 \
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
UNO_PATH=/usr/lib/libreoffice/program \
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \
PATH=$PATH:/opt/venv/bin \
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
TMPDIR=/tmp/stirling-pdf \
TEMP=/tmp/stirling-pdf \
TMP=/tmp/stirling-pdf

# JDK for app
RUN apk add --no-cache bash \
&& ln -sf /bin/bash /bin/sh \
&& printf '%s\n' \
'https://dl-cdn.alpinelinux.org/alpine/edge/main' \
'https://dl-cdn.alpinelinux.org/alpine/edge/community' \
'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \
> /etc/apk/repositories && \
apk upgrade --no-cache -a && \
apk add --no-cache \
ca-certificates \
tzdata \
tini \
bash \
curl \
shadow \
su-exec \
openssl \
openssl-dev \
openjdk21-jre \
ffmpeg \
# Doc conversion
gcompat \
libc6-compat \
libreoffice \
# pdftohtml
poppler-utils \
# OCR MY PDF (unpaper for descew and other advanced features)
tesseract-ocr-data-eng \
tesseract-ocr-data-chi_sim \
tesseract-ocr-data-deu \
tesseract-ocr-data-fra \
tesseract-ocr-data-por \
unpaper \
# CV / Python
py3-opencv \
python3 \
ocrmypdf \
py3-pip \
py3-pillow \
py3-pdf2image \
# Calibre
calibre \
# URW Base 35 fonts for better PDF rendering
font-urw-base35 && \
# Calibre fixes
apk fix --no-cache calibre && \
python3 -m venv /opt/venv && \
/opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \
/opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
mv /usr/share/tessdata /usr/share/tessdata-original && \
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \
# Configure URW Base 35 fonts
ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \
fc-cache -f -v && \
chmod +x /scripts/* && \
# User permissions
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \
chown stirlingpdfuser:stirlingpdfgroup /app.jar && \
ln -sf /bin/busybox /bin/sh
# ==============================================================================
# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.)
# ==============================================================================
RUN python3 -m venv /opt/venv --system-site-packages \
&& /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \
&& /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)"

# Separate venv for unoserver (keeps it isolated)
RUN python3 -m venv /opt/unoserver-venv --system-site-packages \
&& /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver

# Make unoserver tools available in main venv PATH
RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \
&& ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver

# Extend PATH to include both virtual environments
ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}"

# ==============================================================================
# Final permissions, directories and font cache
# ==============================================================================
RUN set -eux; \
chmod +x /scripts/*; \
mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \
chown -R stirlingpdfuser:stirlingpdfgroup \
/home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \
/app.jar /usr/share/fonts/truetype /scripts; \
chmod -R 755 /tmp/stirling-pdf

# Rebuild font cache
RUN fc-cache -f -v

# Force Qt/WebEngine to run headlessly (required for Calibre in Docker)
ENV QT_QPA_PLATFORM=offscreen \
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage"

# Expose web UI port
EXPOSE 8080/tcp

# Set user and run command
STOPSIGNAL SIGTERM

# Use tini as init (handles signals and zombies correctly)
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"]

# CMD is empty – actual start command is defined in init.sh
CMD []

271
Dockerfile.fat
@@ -1,122 +1,209 @@
# Build the application
FROM gradle:8.14-jdk21 AS build
# ==============================================================================
# Multi-stage Dockerfile for Stirling-PDF – "fat" image with everything included
# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc.
# ==============================================================================

COPY build.gradle .
COPY settings.gradle .
COPY gradlew .
COPY gradle gradle/
# ========================================
# STAGE 1: Build Stirling-PDF with Gradle (Alpine)
# ========================================
FROM eclipse-temurin:21-jdk-alpine@sha256:c4799f335a65b1ecca8a31239b05522f2b0a184d6818f6349e83484ee6956198 AS build

# Install build tools
RUN apk add --no-cache bash unzip curl git

WORKDIR /workspace

# Copy Gradle wrapper and configuration files
COPY build.gradle settings.gradle gradlew ./
COPY gradle ./gradle/

# Make gradlew executable
RUN chmod +x gradlew

# Create module directories and copy module build files (for Gradle layer caching)
RUN mkdir -p core common proprietary
COPY app/core/build.gradle core/.
COPY app/common/build.gradle common/.
COPY app/proprietary/build.gradle proprietary/.
RUN ./gradlew build -x spotlessApply -x spotlessCheck -x test -x sonarqube || return 0

# Set the working directory
# Warm-up Gradle dependency cache (optional but improves subsequent builds)
RUN ./gradlew --no-daemon printVersion --quiet | tail -1 > /tmp/version_tag || true
RUN ./gradlew --no-daemon build -x spotlessApply -x spotlessCheck -x test -x sonarqube || true

# Switch to final source directory and copy full source code
WORKDIR /app

# Copy the entire project to the working directory
COPY . .

# Build the application with DISABLE_ADDITIONAL_FEATURES=false
|
||||
# Environment variables (can be overridden at build time)
|
||||
ENV DISABLE_ADDITIONAL_FEATURES=false \
|
||||
STIRLING_PDF_DESKTOP_UI=false
|
||||
RUN ./gradlew clean build -x spotlessApply -x spotlessCheck -x test -x sonarqube
|
||||
|
||||
# Main stage
|
||||
FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412
|
||||
# Final build – produce the fat JAR
|
||||
RUN ./gradlew --no-daemon clean build \
|
||||
-x spotlessApply -x spotlessCheck -x test -x sonarqube \
|
||||
&& apk del bash unzip curl git
|
||||
|
||||
# Copy necessary files
|
||||
COPY scripts /scripts
|
||||
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/
|
||||
# first /app directory is for the build stage, second is for the final image
|
||||
COPY --from=build /app/app/core/build/libs/*.jar app.jar
|
||||
|
||||
# ========================================
|
||||
# STAGE 2: Runtime image based on Debian stable-slim
|
||||
# Contains Java runtime + LibreOffice + Calibre + all PDF tools
|
||||
# ========================================
|
||||
FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d
|
||||
|
||||
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
|
||||
ENV DEBIAN_FRONTEND=noninteractive
|
||||
|
||||
# Install core runtime dependencies + tools required by Stirling-PDF features
|
||||
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||||
ca-certificates tzdata tini bash fontconfig \
|
||||
openjdk-21-jre-headless \
|
||||
ffmpeg poppler-utils qpdf ghostscript ocrmypdf \
|
||||
libreoffice-nogui libreoffice-java-common \
|
||||
python3 python3-venv python3-uno \
|
||||
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
|
||||
tesseract-ocr-por tesseract-ocr-chi-sim \
|
||||
libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \
|
||||
gosu unpaper \
|
||||
# AWT headless support (required for some Java graphics operations)
|
||||
libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \
|
||||
libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \
|
||||
# Qt WebEngine dependencies for Calibre
|
||||
libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \
|
||||
libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \
|
||||
libxcb-cursor0 libdbus-1-3 libglib2.0-0 \
|
||||
# Virtual framebuffer (required for headless LibreOffice)
|
||||
xvfb x11-utils coreutils \
|
||||
# Temporary packages only needed for Calibre installer
|
||||
xz-utils gpgv curl xdg-utils \
|
||||
\
|
||||
# Install Calibre from official installer script
|
||||
&& curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \
|
||||
\
|
||||
# Clean up installer-only packages
|
||||
&& apt-get purge -y xz-utils gpgv xdg-utils \
|
||||
&& apt-get autoremove -y \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Make ebook-convert available in PATH
|
||||
RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \
|
||||
&& /opt/calibre/ebook-convert --version
|
||||
|
||||
# ==============================================================================
|
||||
# Create non-root user (stirlingpdfuser) with configurable UID/GID
|
||||
# ==============================================================================
|
||||
ARG PUID=1000
|
||||
ARG PGID=1000
|
||||
|
||||
RUN set -eux; \
|
||||
# Create group if it doesn't exist
|
||||
if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \
|
||||
if getent group "${PGID}" >/dev/null 2>&1; then \
|
||||
groupadd -o -g "${PGID}" stirlingpdfgroup; \
|
||||
else \
|
||||
groupadd -g "${PGID}" stirlingpdfgroup; \
|
||||
fi; \
|
||||
fi; \
|
||||
# Create user if it doesn't exist, avoid UID conflicts
|
||||
if ! id -u stirlingpdfuser >/dev/null 2>&1; then \
|
||||
if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \
|
||||
echo "UID ${PUID} already in use – creating stirlingpdfuser with automatic UID"; \
|
||||
useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
|
||||
else \
|
||||
useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \
|
||||
fi; \
|
||||
fi
|
||||
|
||||
# Compatibility alias for older entrypoint scripts expecting su-exec
|
||||
RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec
|
||||
|
||||
# Copy application files from build stage
|
||||
COPY scripts/ /scripts/
|
||||
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
|
||||
COPY --from=build /app/app/core/build/libs/*.jar /app.jar
|
||||
|
||||
# Copy version tag generated during build
|
||||
COPY --from=build /tmp/version_tag /etc/stirling_version
|
||||
|
||||
# Optional version tag (can be passed at build time)
|
||||
ARG VERSION_TAG
|
||||
|
||||
# Set Environment Variables
|
||||
# Metadata labels
|
||||
LABEL org.opencontainers.image.title="Stirling-PDF"
|
||||
LABEL org.opencontainers.image.description="A powerful locally hosted web-based PDF manipulation tool supporting 50+ operations including merging, splitting, conversion, OCR, watermarking, and more."
|
||||
LABEL org.opencontainers.image.source="https://github.com/Stirling-Tools/Stirling-PDF"
|
||||
LABEL org.opencontainers.image.licenses="MIT"
|
||||
LABEL org.opencontainers.image.vendor="Stirling-Tools"
|
||||
LABEL org.opencontainers.image.url="https://www.stirlingpdf.com"
|
||||
LABEL org.opencontainers.image.documentation="https://docs.stirlingpdf.com"
|
||||
LABEL maintainer="Stirling-Tools"
|
||||
LABEL org.opencontainers.image.authors="Stirling-Tools"
|
||||
LABEL org.opencontainers.image.version="${VERSION_TAG}"
|
||||
LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark"
|
||||
|
||||
# ==============================================================================
|
||||
# Runtime environment variables
|
||||
# ==============================================================================
|
||||
ENV DISABLE_ADDITIONAL_FEATURES=true \
|
||||
VERSION_TAG=$VERSION_TAG \
|
||||
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \
|
||||
JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \
|
||||
-XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \
|
||||
-XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \
|
||||
-Djava.awt.headless=true" \
|
||||
JAVA_CUSTOM_OPTS="" \
|
||||
HOME=/home/stirlingpdfuser \
|
||||
PUID=1000 \
|
||||
PGID=1000 \
|
||||
PUID=${PUID} \
|
||||
PGID=${PGID} \
|
||||
UMASK=022 \
|
||||
FAT_DOCKER=true \
|
||||
INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \
|
||||
PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \
|
||||
UNO_PATH=/usr/lib/libreoffice/program \
|
||||
URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \
|
||||
PATH=$PATH:/opt/venv/bin \
|
||||
STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \
|
||||
TMPDIR=/tmp/stirling-pdf \
|
||||
TEMP=/tmp/stirling-pdf \
|
||||
TMP=/tmp/stirling-pdf
|
||||
|
||||
# JDK for app
|
||||
RUN apk add --no-cache bash \
|
||||
&& ln -sf /bin/bash /bin/sh \
|
||||
&& printf '%s\n' \
|
||||
'https://dl-cdn.alpinelinux.org/alpine/edge/main' \
|
||||
'https://dl-cdn.alpinelinux.org/alpine/edge/community' \
|
||||
'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \
|
||||
> /etc/apk/repositories && \
|
||||
apk upgrade --no-cache -a && \
|
||||
apk add --no-cache \
|
||||
ca-certificates \
|
||||
tzdata \
|
||||
tini \
|
||||
bash \
|
||||
curl \
|
||||
shadow \
|
||||
su-exec \
|
||||
openssl \
|
||||
openssl-dev \
|
||||
openjdk21-jre \
|
||||
ffmpeg \
|
||||
# Doc conversion
|
||||
gcompat \
|
||||
libc6-compat \
|
||||
libreoffice \
|
||||
# pdftohtml
|
||||
poppler-utils \
|
||||
# OCR MY PDF (unpaper for deskew and other advanced features)
|
||||
tesseract-ocr-data-eng \
|
||||
tesseract-ocr-data-chi_sim \
|
||||
tesseract-ocr-data-deu \
|
||||
tesseract-ocr-data-fra \
|
||||
tesseract-ocr-data-por \
|
||||
unpaper \
|
||||
font-terminus font-dejavu font-noto font-noto-cjk font-awesome font-noto-extra font-liberation font-linux-libertine font-urw-base35 \
|
||||
# CV / Python
|
||||
py3-opencv \
|
||||
python3 \
|
||||
ocrmypdf \
|
||||
py3-pip \
|
||||
py3-pillow \
|
||||
py3-pdf2image \
|
||||
# Calibre (musl-native) + QtWebEngine Runtime
|
||||
calibre && \
|
||||
# Calibre fixes
|
||||
apk fix --no-cache calibre && \
|
||||
python3 -m venv /opt/venv && \
|
||||
/opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \
|
||||
/opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \
|
||||
ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \
|
||||
ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \
|
||||
ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \
|
||||
mv /usr/share/tessdata /usr/share/tessdata-original && \
|
||||
mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \
|
||||
# Configure URW Base 35 fonts
|
||||
ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \
|
||||
fc-cache -f -v && \
|
||||
chmod +x /scripts/* && \
|
||||
# User permissions
|
||||
addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \
|
||||
chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \
|
||||
chown stirlingpdfuser:stirlingpdfgroup /app.jar && \
|
||||
ln -sf /bin/busybox /bin/sh
|
||||
# ==============================================================================
|
||||
# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.)
|
||||
# ==============================================================================
|
||||
RUN python3 -m venv /opt/venv --system-site-packages \
|
||||
&& /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \
|
||||
&& /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)"
|
||||
|
||||
# Separate venv for unoserver (keeps it isolated)
|
||||
RUN python3 -m venv /opt/unoserver-venv --system-site-packages \
|
||||
&& /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver
|
||||
|
||||
# Make unoserver tools available in main venv PATH
|
||||
RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \
|
||||
&& ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver
|
||||
|
||||
# Extend PATH to include both virtual environments
|
||||
ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}"
|
||||
|
||||
# ==============================================================================
|
||||
# Final permissions, directories and font cache
|
||||
# ==============================================================================
|
||||
RUN set -eux; \
|
||||
chmod +x /scripts/*; \
|
||||
mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \
|
||||
chown -R stirlingpdfuser:stirlingpdfgroup \
|
||||
/home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \
|
||||
/app.jar /usr/share/fonts/truetype /scripts; \
|
||||
chmod -R 755 /tmp/stirling-pdf
|
||||
|
||||
# Rebuild font cache
|
||||
RUN fc-cache -f -v
|
||||
|
||||
# Force Qt/WebEngine to run headlessly (required for Calibre in Docker)
|
||||
ENV QT_QPA_PLATFORM=offscreen \
|
||||
QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage"
|
||||
|
||||
# Expose web UI port
|
||||
EXPOSE 8080/tcp
|
||||
# Set user and run command
|
||||
|
||||
STOPSIGNAL SIGTERM
|
||||
|
||||
# Use tini as init (handles signals and zombies correctly)
|
||||
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
|
||||
CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"]
|
||||
|
||||
# CMD is empty – actual start command is defined in init.sh
|
||||
CMD []
|
||||
|
||||
@@ -56,4 +56,4 @@ EXPOSE 8080/tcp
|
||||
|
||||
# Run the application
|
||||
ENTRYPOINT ["tini", "--", "/scripts/init-without-ocr.sh"]
|
||||
CMD ["java", "-Dfile.encoding=UTF-8", "-Djava.io.tmpdir=/tmp/stirling-pdf", "-jar", "/app.jar"]
|
||||
CMD []
|
||||
|
||||
@@ -10,8 +10,10 @@ import lombok.Getter;
|
||||
import lombok.extern.slf4j.Slf4j;
|
||||
|
||||
import stirling.software.common.model.ApplicationProperties;
|
||||
import stirling.software.common.model.ApplicationProperties.CustomPaths;
|
||||
import stirling.software.common.model.ApplicationProperties.CustomPaths.Operations;
|
||||
import stirling.software.common.model.ApplicationProperties.CustomPaths.Pipeline;
|
||||
import stirling.software.common.model.ApplicationProperties.System;
|
||||
|
||||
@Slf4j
|
||||
@Configuration
|
||||
@@ -19,9 +21,16 @@ import stirling.software.common.model.ApplicationProperties.CustomPaths.Pipeline
|
||||
public class RuntimePathConfig {
|
||||
private final ApplicationProperties properties;
|
||||
private final String basePath;
|
||||
|
||||
// Operation paths
|
||||
private final String weasyPrintPath;
|
||||
private final String unoConvertPath;
|
||||
private final String calibrePath;
|
||||
private final String ocrMyPdfPath;
|
||||
private final String sOfficePath;
|
||||
|
||||
// Tesseract data path
|
||||
private final String tessDataPath;
|
||||
|
||||
// Pipeline paths
|
||||
private final String pipelineWatchedFoldersPath;
|
||||
@@ -38,7 +47,10 @@ public class RuntimePathConfig {
|
||||
String defaultFinishedFolders = Path.of(this.pipelinePath, "finishedFolders").toString();
|
||||
String defaultWebUIConfigs = Path.of(this.pipelinePath, "defaultWebUIConfigs").toString();
|
||||
|
||||
Pipeline pipeline = properties.getSystem().getCustomPaths().getPipeline();
|
||||
System system = properties.getSystem();
|
||||
CustomPaths customPaths = system.getCustomPaths();
|
||||
|
||||
Pipeline pipeline = customPaths.getPipeline();
|
||||
|
||||
this.pipelineWatchedFoldersPath =
|
||||
resolvePath(
|
||||
@@ -58,9 +70,11 @@ public class RuntimePathConfig {
|
||||
// Initialize Operation paths
|
||||
String defaultWeasyPrintPath = isDocker ? "/opt/venv/bin/weasyprint" : "weasyprint";
|
||||
String defaultUnoConvertPath = isDocker ? "/opt/venv/bin/unoconvert" : "unoconvert";
|
||||
String defaultCalibrePath = isDocker ? "/usr/bin/ebook-convert" : "ebook-convert";
|
||||
String defaultCalibrePath = isDocker ? "/opt/calibre/ebook-convert" : "ebook-convert";
|
||||
String defaultOcrMyPdfPath = isDocker ? "/usr/bin/ocrmypdf" : "ocrmypdf";
|
||||
String defaultSOfficePath = isDocker ? "/usr/bin/soffice" : "soffice";
|
||||
|
||||
Operations operations = properties.getSystem().getCustomPaths().getOperations();
|
||||
Operations operations = customPaths.getOperations();
|
||||
this.weasyPrintPath =
|
||||
resolvePath(
|
||||
defaultWeasyPrintPath,
|
||||
@@ -72,6 +86,25 @@ public class RuntimePathConfig {
|
||||
this.calibrePath =
|
||||
resolvePath(
|
||||
defaultCalibrePath, operations != null ? operations.getCalibre() : null);
|
||||
this.ocrMyPdfPath =
|
||||
resolvePath(
|
||||
defaultOcrMyPdfPath, operations != null ? operations.getOcrmypdf() : null);
|
||||
this.sOfficePath =
|
||||
resolvePath(
|
||||
defaultSOfficePath, operations != null ? operations.getSoffice() : null);
|
||||
|
||||
// Initialize Tesseract data path
|
||||
String defaultTessDataPath =
|
||||
isDocker ? "/usr/share/tesseract-ocr/5/tessdata" : "/usr/share/tessdata";
|
||||
|
||||
String tessPath = system.getTessdataDir();
|
||||
String tessdataDir = java.lang.System.getenv("TESSDATA_PREFIX");
|
||||
|
||||
this.tessDataPath =
|
||||
resolvePath(
|
||||
defaultTessDataPath,
|
||||
(tessPath != null && !tessPath.isEmpty()) ? tessPath : tessdataDir);
|
||||
log.info("Using Tesseract data path: {}", this.tessDataPath);
|
||||
}
|
||||
|
||||
private String resolvePath(String defaultPath, String customPath) {
|
||||
|
||||
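The precedence applied to the Tesseract data path above (configured value first, then `TESSDATA_PREFIX`, then a Docker-aware default) can be illustrated with a minimal sketch. The body of `resolvePath` is not visible in this hunk, so the version below is an assumption that simply prefers a non-blank custom path. The class name `PathResolutionSketch` and the `env` map parameter are hypothetical and only exist to keep the example self-contained.

```java
import java.util.Map;

/** Hypothetical stand-in for the resolution logic in RuntimePathConfig. */
final class PathResolutionSketch {

    /** Assumed behaviour: use the custom path when it is set and non-blank, else the default. */
    static String resolvePath(String defaultPath, String customPath) {
        return (customPath == null || customPath.isBlank()) ? defaultPath : customPath;
    }

    /** Configured directory wins, then TESSDATA_PREFIX, then the Docker-aware default. */
    static String resolveTessData(boolean isDocker, String configured, Map<String, String> env) {
        String defaultPath =
                isDocker ? "/usr/share/tesseract-ocr/5/tessdata" : "/usr/share/tessdata";
        String fromEnv = env.get("TESSDATA_PREFIX");
        String custom = (configured != null && !configured.isEmpty()) ? configured : fromEnv;
        return resolvePath(defaultPath, custom);
    }
}
```

Passing the environment as a map instead of calling `System.getenv` directly is a choice made for this sketch only; it keeps the precedence rule testable without touching the process environment.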
@@ -372,6 +372,8 @@ public class ApplicationProperties {
|
||||
private String weasyprint;
|
||||
private String unoconvert;
|
||||
private String calibre;
|
||||
private String ocrmypdf;
|
||||
private String soffice;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -454,10 +456,10 @@ public class ApplicationProperties {
|
||||
@Override
|
||||
public String toString() {
|
||||
return """
|
||||
Driver {
|
||||
driverName='%s'
|
||||
}
|
||||
"""
|
||||
Driver {
|
||||
driverName='%s'
|
||||
}
|
||||
"""
|
||||
.formatted(driverName);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -25,6 +25,7 @@ import org.springframework.stereotype.Service;
|
||||
|
||||
import com.posthog.java.PostHog;
|
||||
|
||||
import stirling.software.common.configuration.RuntimePathConfig;
|
||||
import stirling.software.common.model.ApplicationProperties;
|
||||
|
||||
@Service
|
||||
@@ -33,6 +34,7 @@ public class PostHogService {
|
||||
private final String uniqueId;
|
||||
private final String appVersion;
|
||||
private final ApplicationProperties applicationProperties;
|
||||
private final RuntimePathConfig runtimePathConfig;
|
||||
private final UserServiceInterface userService;
|
||||
private final Environment env;
|
||||
private boolean configDirMounted;
|
||||
@@ -43,12 +45,14 @@ public class PostHogService {
|
||||
@Qualifier("configDirMounted") boolean configDirMounted,
|
||||
@Qualifier("appVersion") String appVersion,
|
||||
ApplicationProperties applicationProperties,
|
||||
RuntimePathConfig runtimePathConfig,
|
||||
@Autowired(required = false) UserServiceInterface userService,
|
||||
Environment env) {
|
||||
this.postHog = postHog;
|
||||
this.uniqueId = uuid;
|
||||
this.appVersion = appVersion;
|
||||
this.applicationProperties = applicationProperties;
|
||||
this.runtimePathConfig = runtimePathConfig;
|
||||
this.userService = userService;
|
||||
this.env = env;
|
||||
this.configDirMounted = configDirMounted;
|
||||
@@ -313,10 +317,7 @@ public class PostHogService {
|
||||
properties,
|
||||
"system_customHTMLFiles",
|
||||
applicationProperties.getSystem().isCustomHTMLFiles());
|
||||
addIfNotEmpty(
|
||||
properties,
|
||||
"system_tessdataDir",
|
||||
applicationProperties.getSystem().getTessdataDir());
|
||||
addIfNotEmpty(properties, "system_tessdataDir", runtimePathConfig.getTessDataPath());
|
||||
addIfNotEmpty(
|
||||
properties,
|
||||
"system_enableAlphaFunctionality",
|
||||
|
||||
@@ -27,15 +27,22 @@ import io.github.pixee.security.Filenames;
|
||||
|
||||
import lombok.extern.slf4j.Slf4j;
|
||||
|
||||
import stirling.software.common.configuration.RuntimePathConfig;
|
||||
import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
|
||||
|
||||
@Slf4j
|
||||
public class PDFToFile {
|
||||
|
||||
private final TempFileManager tempFileManager;
|
||||
private final RuntimePathConfig runtimePathConfig;
|
||||
|
||||
public PDFToFile(TempFileManager tempFileManager) {
|
||||
this(tempFileManager, null);
|
||||
}
|
||||
|
||||
public PDFToFile(TempFileManager tempFileManager, RuntimePathConfig runtimePathConfig) {
|
||||
this.tempFileManager = tempFileManager;
|
||||
this.runtimePathConfig = runtimePathConfig;
|
||||
}
|
||||
|
||||
public ResponseEntity<byte[]> processPdfToMarkdown(MultipartFile inputFile)
|
||||
@@ -241,31 +248,65 @@ public class PDFToFile {
|
||||
byte[] fileBytes;
|
||||
String fileName;
|
||||
|
||||
Path libreOfficeProfile = null;
|
||||
try (TempFile inputFileTemp = new TempFile(tempFileManager, ".pdf");
|
||||
TempDirectory outputDirTemp = new TempDirectory(tempFileManager)) {
|
||||
|
||||
Path tempInputFile = inputFileTemp.getPath();
|
||||
Path tempOutputDir = outputDirTemp.getPath();
|
||||
Path unoOutputFile =
|
||||
tempOutputDir.resolve(
|
||||
pdfBaseName + "." + resolvePrimaryExtension(outputFormat));
|
||||
|
||||
// Save the uploaded file to a temporary location
|
||||
inputFile.transferTo(tempInputFile);
|
||||
|
||||
// Run the LibreOffice command
|
||||
List<String> command =
|
||||
new ArrayList<>(
|
||||
Arrays.asList(
|
||||
"soffice",
|
||||
"--headless",
|
||||
"--nologo",
|
||||
"--infilter=" + libreOfficeFilter,
|
||||
"--convert-to",
|
||||
outputFormat,
|
||||
"--outdir",
|
||||
tempOutputDir.toString(),
|
||||
tempInputFile.toString()));
|
||||
ProcessExecutorResult returnCode =
|
||||
ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
|
||||
.runCommandWithOutputHandling(command);
|
||||
ProcessExecutorResult returnCode = null;
|
||||
IOException unoconvertException = null;
|
||||
|
||||
if (isUnoConvertEnabled()) {
|
||||
try {
|
||||
List<String> unoCommand =
|
||||
buildUnoConvertCommand(
|
||||
tempInputFile, unoOutputFile, outputFormat, libreOfficeFilter);
|
||||
returnCode =
|
||||
ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
|
||||
.runCommandWithOutputHandling(unoCommand);
|
||||
} catch (IOException e) {
|
||||
unoconvertException = e;
|
||||
log.warn(
|
||||
"Unoconvert command failed ({}). Falling back to soffice command.",
|
||||
e.getMessage());
|
||||
}
|
||||
}
|
||||
|
||||
if (returnCode == null) {
|
||||
// Run the LibreOffice command as a fallback
|
||||
libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_");
|
||||
List<String> command = new ArrayList<>();
|
||||
command.add(runtimePathConfig.getSOfficePath());
|
||||
command.add("-env:UserInstallation=" + libreOfficeProfile.toUri().toString());
|
||||
command.add("--headless");
|
||||
command.add("--nologo");
|
||||
command.add("--infilter=" + libreOfficeFilter);
|
||||
command.add("--convert-to");
|
||||
command.add(outputFormat);
|
||||
command.add("--outdir");
|
||||
command.add(tempOutputDir.toString());
|
||||
command.add(tempInputFile.toString());
|
||||
|
||||
try {
|
||||
returnCode =
|
||||
ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
|
||||
.runCommandWithOutputHandling(command);
|
||||
} catch (IOException e) {
|
||||
if (unoconvertException != null) {
|
||||
e.addSuppressed(unoconvertException);
|
||||
}
|
||||
throw e;
|
||||
}
|
||||
}
|
||||
|
||||
// Get output files
|
||||
List<File> outputFiles = Arrays.asList(tempOutputDir.toFile().listFiles());
|
||||
@@ -300,8 +341,42 @@ public class PDFToFile {
|
||||
|
||||
fileBytes = byteArrayOutputStream.toByteArray();
|
||||
}
|
||||
} finally {
|
||||
if (libreOfficeProfile != null) {
|
||||
FileUtils.deleteQuietly(libreOfficeProfile.toFile());
|
||||
}
|
||||
}
|
||||
return WebResponseUtils.bytesToWebResponse(
|
||||
fileBytes, fileName, MediaType.APPLICATION_OCTET_STREAM);
|
||||
}
|
||||
|
||||
private boolean isUnoConvertEnabled() {
|
||||
return runtimePathConfig != null
|
||||
&& runtimePathConfig.getUnoConvertPath() != null
|
||||
&& !runtimePathConfig.getUnoConvertPath().isBlank();
|
||||
}
|
||||
|
||||
private List<String> buildUnoConvertCommand(
|
||||
Path inputFile, Path outputFile, String outputFormat, String libreOfficeFilter) {
|
||||
List<String> command = new ArrayList<>();
|
||||
command.add(runtimePathConfig.getUnoConvertPath());
|
||||
command.add("--port");
|
||||
command.add("2003");
|
||||
command.add("--convert-to");
|
||||
command.add(outputFormat);
|
||||
if (libreOfficeFilter != null && !libreOfficeFilter.isBlank()) {
|
||||
command.add("--input-filter=" + libreOfficeFilter);
|
||||
}
|
||||
command.add(inputFile.toString());
|
||||
command.add(outputFile.toString());
|
||||
return command;
|
||||
}
|
||||
|
||||
private String resolvePrimaryExtension(String outputFormat) {
|
||||
if (outputFormat == null) {
|
||||
return "";
|
||||
}
|
||||
int colonIndex = outputFormat.indexOf(':');
|
||||
return colonIndex > 0 ? outputFormat.substring(0, colonIndex) : outputFormat;
|
||||
}
|
||||
}
|
||||
|
||||
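The control flow introduced in `PDFToFile` (try `unoconvert` first, fall back to `soffice`, and attach the first failure to the second one via `addSuppressed`) can be reduced to a small pattern. The sketch below is a minimal illustration, not the project's code: the `Conversion` interface and the class name `ConverterFallbackSketch` are hypothetical, and the external process calls are replaced by plain lambdas.

```java
import java.io.IOException;

/** Hypothetical reduction of the primary/fallback converter pattern. */
final class ConverterFallbackSketch {

    /** Hypothetical abstraction over "run one external conversion command". */
    @FunctionalInterface
    interface Conversion {
        String run() throws IOException;
    }

    /**
     * Runs the primary conversion when one is configured; on IOException (or when
     * primary is null) runs the fallback. If both fail, the fallback's exception is
     * thrown with the primary failure attached as a suppressed exception.
     */
    static String convert(Conversion primary, Conversion fallback) throws IOException {
        IOException primaryFailure = null;
        if (primary != null) {
            try {
                return primary.run();
            } catch (IOException e) {
                primaryFailure = e; // remember it, then try the fallback
            }
        }
        try {
            return fallback.run();
        } catch (IOException e) {
            if (primaryFailure != null) {
                e.addSuppressed(primaryFailure);
            }
            throw e;
        }
    }
}
```

Keeping the primary failure as a suppressed exception means a log of the final error still shows why the UNO path was abandoned, which is the apparent intent of the `unoconvertException` variable in the hunk above.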
@@ -32,6 +32,7 @@ import org.springframework.web.multipart.MultipartFile;
|
||||
|
||||
import io.github.pixee.security.ZipSecurity;
|
||||
|
||||
import stirling.software.common.configuration.RuntimePathConfig;
|
||||
import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
|
||||
|
||||
/**
|
||||
@@ -48,6 +49,7 @@ class PDFToFileTest {
|
||||
@Mock private ProcessExecutor mockProcessExecutor;
|
||||
@Mock private ProcessExecutorResult mockExecutorResult;
|
||||
@Mock private TempFileManager mockTempFileManager;
|
||||
@Mock private RuntimePathConfig mockRuntimePathConfig;
|
||||
|
||||
@BeforeEach
|
||||
void setUp() throws IOException {
|
||||
@@ -61,7 +63,9 @@ class PDFToFileTest {
|
||||
.when(mockTempFileManager.createTempDirectory())
|
||||
.thenAnswer(invocation -> Files.createTempDirectory("test"));
|
||||
|
||||
pdfToFile = new PDFToFile(mockTempFileManager);
|
||||
lenient().when(mockRuntimePathConfig.getSOfficePath()).thenReturn("/usr/bin/soffice");
|
||||
|
||||
pdfToFile = new PDFToFile(mockTempFileManager, mockRuntimePathConfig);
|
||||
}
|
||||
|
||||
@Test
|
||||
@@ -363,7 +367,8 @@ class PDFToFileTest {
|
||||
when(mockProcessExecutor.runCommandWithOutputHandling(
|
||||
argThat(
|
||||
args ->
|
||||
args.contains("--convert-to")
|
||||
args != null
|
||||
&& args.contains("--convert-to")
|
||||
&& args.contains("docx"))))
|
||||
.thenAnswer(
|
||||
invocation -> {
|
||||
@@ -424,7 +429,11 @@ class PDFToFileTest {
|
||||
.thenReturn(mockProcessExecutor);
|
||||
|
||||
when(mockProcessExecutor.runCommandWithOutputHandling(
|
||||
argThat(args -> args.contains("--convert-to") && args.contains("odp"))))
|
||||
argThat(
|
||||
args ->
|
||||
args != null
|
||||
&& args.contains("--convert-to")
|
||||
&& args.contains("odp"))))
|
||||
.thenAnswer(
|
||||
invocation -> {
|
||||
// When command is executed, find the output directory argument
|
||||
@@ -513,7 +522,8 @@ class PDFToFileTest {
|
||||
when(mockProcessExecutor.runCommandWithOutputHandling(
|
||||
argThat(
|
||||
args ->
|
||||
args.contains("--convert-to")
|
||||
args != null
|
||||
&& args.contains("--convert-to")
|
||||
&& args.contains("txt:Text"))))
|
||||
.thenAnswer(
|
||||
invocation -> {
|
||||
@@ -611,4 +621,110 @@ class PDFToFileTest {
|
||||
.contains("output.docx"));
|
||||
}
|
||||
}
|
||||
|
||||
@Test
|
||||
void testProcessPdfToOfficeFormat_UsesUnoconvertWhenConfigured()
|
||||
throws IOException, InterruptedException {
|
||||
when(mockRuntimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert");
|
||||
PDFToFile pdfToFileWithUno = new PDFToFile(mockTempFileManager, mockRuntimePathConfig);
|
||||
|
||||
try (MockedStatic<ProcessExecutor> mockedStaticProcessExecutor =
|
||||
mockStatic(ProcessExecutor.class)) {
|
||||
MultipartFile pdfFile =
|
||||
new MockMultipartFile(
|
||||
"file",
|
||||
"document.pdf",
|
||||
MediaType.APPLICATION_PDF_VALUE,
|
||||
"Fake PDF content".getBytes());
|
||||
|
||||
mockedStaticProcessExecutor
|
||||
.when(() -> ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE))
|
||||
.thenReturn(mockProcessExecutor);
|
||||
|
||||
when(mockProcessExecutor.runCommandWithOutputHandling(
|
||||
argThat(args -> args != null && args.contains("/custom/unoconvert"))))
|
||||
.thenAnswer(
|
||||
invocation -> {
|
||||
List<String> args = invocation.getArgument(0);
|
||||
String outputPath = args.get(args.size() - 1);
|
||||
Files.write(Path.of(outputPath), "Fake DOCX content".getBytes());
|
||||
return mockExecutorResult;
|
||||
});
|
||||
|
||||
ResponseEntity<byte[]> response =
|
||||
pdfToFileWithUno.processPdfToOfficeFormat(pdfFile, "docx", "writer_pdf_import");
|
||||
|
||||
assertEquals(HttpStatus.OK, response.getStatusCode());
|
||||
assertNotNull(response.getBody());
|
||||
assertTrue(response.getBody().length > 0);
|
||||
assertTrue(
|
||||
response.getHeaders()
|
||||
.getContentDisposition()
|
||||
.toString()
|
||||
.contains("document.docx"));
|
||||
}
|
||||
}
|
||||
|
||||
@Test
|
||||
void testProcessPdfToOfficeFormat_FallsBackWhenUnoconvertFails()
|
||||
throws IOException, InterruptedException {
|
||||
when(mockRuntimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert");
|
||||
PDFToFile pdfToFileWithUno = new PDFToFile(mockTempFileManager, mockRuntimePathConfig);
|
||||
|
||||
try (MockedStatic<ProcessExecutor> mockedStaticProcessExecutor =
|
||||
mockStatic(ProcessExecutor.class)) {
|
||||
MultipartFile pdfFile =
|
||||
new MockMultipartFile(
|
||||
"file",
|
||||
"document.pdf",
|
||||
MediaType.APPLICATION_PDF_VALUE,
|
||||
"Fake PDF content".getBytes());
|
||||
|
||||
mockedStaticProcessExecutor
|
||||
.when(() -> ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE))
|
||||
.thenReturn(mockProcessExecutor);
|
||||
|
||||
when(mockProcessExecutor.runCommandWithOutputHandling(
|
||||
argThat(args -> args != null && args.contains("/custom/unoconvert"))))
|
||||
.thenThrow(new IOException("Conversion failed"));
|
||||
|
||||
when(mockProcessExecutor.runCommandWithOutputHandling(
|
||||
argThat(
|
||||
args ->
|
||||
args != null
|
||||
&& args.stream()
|
||||
.anyMatch(
|
||||
arg ->
|
||||
arg.contains(
|
||||
"soffice")))))
|
||||
.thenAnswer(
|
||||
invocation -> {
|
||||
List<String> args = invocation.getArgument(0);
|
||||
String outDir = null;
|
||||
for (int i = 0; i < args.size(); i++) {
|
||||
if ("--outdir".equals(args.get(i)) && i + 1 < args.size()) {
|
||||
outDir = args.get(i + 1);
|
||||
break;
|
||||
}
|
||||
}
|
||||
assertNotNull(outDir);
|
||||
Files.write(
|
||||
Path.of(outDir, "document.docx"),
|
||||
"Fallback DOCX content".getBytes());
|
||||
return mockExecutorResult;
|
||||
});
|
||||
|
||||
ResponseEntity<byte[]> response =
|
||||
pdfToFileWithUno.processPdfToOfficeFormat(pdfFile, "docx", "writer_pdf_import");
|
||||
|
||||
assertEquals(HttpStatus.OK, response.getStatusCode());
|
||||
assertNotNull(response.getBody());
|
||||
assertTrue(response.getBody().length > 0);
|
||||
assertTrue(
|
||||
response.getHeaders()
|
||||
.getContentDisposition()
|
||||
.toString()
|
||||
.contains("document.docx"));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -41,6 +41,8 @@ public class ExternalAppDepConfig {
     private final String weasyprintPath;
     private final String unoconvPath;
     private final String calibrePath;
+    private final String ocrMyPdfPath;
+    private final String sOfficePath;

     /**
      * Map of command(binary) -> affected groups (e.g. "gs" -> ["Ghostscript"]). Immutable to avoid
@@ -58,11 +60,13 @@ public class ExternalAppDepConfig {
         this.weasyprintPath = runtimePathConfig.getWeasyPrintPath();
         this.unoconvPath = runtimePathConfig.getUnoConvertPath();
         this.calibrePath = runtimePathConfig.getCalibrePath();
+        this.ocrMyPdfPath = runtimePathConfig.getOcrMyPdfPath();
+        this.sOfficePath = runtimePathConfig.getSOfficePath();

         Map<String, List<String>> tmp = new HashMap<>();
         tmp.put("gs", List.of("Ghostscript"));
-        tmp.put("ocrmypdf", List.of("OCRmyPDF"));
-        tmp.put("soffice", List.of("LibreOffice"));
+        tmp.put(ocrMyPdfPath, List.of("OCRmyPDF"));
+        tmp.put(sOfficePath, List.of("LibreOffice"));
         tmp.put(weasyprintPath, List.of("Weasyprint"));
         tmp.put("pdftohtml", List.of("Pdftohtml"));
         tmp.put(unoconvPath, List.of("Unoconvert"));
|
||||
@@ -93,6 +93,7 @@ public class ConvertOfficeController {
             Files.copy(inputFile.getInputStream(), inputPath, StandardCopyOption.REPLACE_EXISTING);
         }

+        Path libreOfficeProfile = null;
         try {
             ProcessExecutorResult result;
             // Run Unoconvert command
@@ -112,8 +113,10 @@ public class ConvertOfficeController {
                                 .runCommandWithOutputHandling(command);
             } // Run soffice command
             else {
+                libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_");
                 List<String> command = new ArrayList<>();
-                command.add("soffice");
+                command.add(runtimePathConfig.getSOfficePath());
+                command.add("-env:UserInstallation=" + libreOfficeProfile.toUri().toString());
                 command.add("--headless");
                 command.add("--nologo");
                 command.add("--convert-to");
@@ -169,6 +172,9 @@ public class ConvertOfficeController {
             } catch (IOException e) {
                 log.warn("Failed to delete temp input file: {}", inputPath, e);
             }
+            if (libreOfficeProfile != null) {
+                FileUtils.deleteQuietly(libreOfficeProfile.toFile());
+            }
         }
     }
|
||||
|
||||
@@ -13,6 +13,7 @@ import io.swagger.v3.oas.annotations.tags.Tag;

 import lombok.RequiredArgsConstructor;

+import stirling.software.common.configuration.RuntimePathConfig;
 import stirling.software.common.model.api.PDFFile;
 import stirling.software.common.util.PDFToFile;
 import stirling.software.common.util.TempFileManager;
@@ -24,6 +25,7 @@ import stirling.software.common.util.TempFileManager;
 public class ConvertPDFToHtml {

     private final TempFileManager tempFileManager;
+    private final RuntimePathConfig runtimePathConfig;

     @PostMapping(consumes = MediaType.MULTIPART_FORM_DATA_VALUE, value = "/pdf/html")
     @Operation(
@@ -32,7 +34,7 @@ public class ConvertPDFToHtml {
                     "This endpoint converts a PDF file to HTML format. Input:PDF Output:HTML Type:SISO")
     public ResponseEntity<byte[]> processPdfToHTML(@ModelAttribute PDFFile file) throws Exception {
         MultipartFile inputFile = file.getFileInput();
-        PDFToFile pdfToFile = new PDFToFile(tempFileManager);
+        PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
         return pdfToFile.processPdfToHtml(inputFile);
     }
 }
|
||||
@@ -20,6 +20,7 @@ import lombok.RequiredArgsConstructor;
 import stirling.software.SPDF.model.api.converters.PdfToPresentationRequest;
 import stirling.software.SPDF.model.api.converters.PdfToTextOrRTFRequest;
 import stirling.software.SPDF.model.api.converters.PdfToWordRequest;
+import stirling.software.common.configuration.RuntimePathConfig;
 import stirling.software.common.model.api.PDFFile;
 import stirling.software.common.service.CustomPDFDocumentFactory;
 import stirling.software.common.util.GeneralUtils;
@@ -35,6 +36,7 @@ public class ConvertPDFToOffice {

     private final CustomPDFDocumentFactory pdfDocumentFactory;
     private final TempFileManager tempFileManager;
+    private final RuntimePathConfig runtimePathConfig;

     @PostMapping(consumes = MediaType.MULTIPART_FORM_DATA_VALUE, value = "/pdf/presentation")
     @Operation(
@@ -47,7 +49,7 @@ public class ConvertPDFToOffice {
             throws IOException, InterruptedException {
         MultipartFile inputFile = request.getFileInput();
         String outputFormat = request.getOutputFormat();
-        PDFToFile pdfToFile = new PDFToFile(tempFileManager);
+        PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
         return pdfToFile.processPdfToOfficeFormat(inputFile, outputFormat, "impress_pdf_import");
     }

@@ -72,7 +74,7 @@ public class ConvertPDFToOffice {
                         MediaType.TEXT_PLAIN);
             }
         } else {
-            PDFToFile pdfToFile = new PDFToFile(tempFileManager);
+            PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
             return pdfToFile.processPdfToOfficeFormat(inputFile, outputFormat, "writer_pdf_import");
         }
     }
@@ -87,7 +89,7 @@ public class ConvertPDFToOffice {
             throws IOException, InterruptedException {
         MultipartFile inputFile = request.getFileInput();
         String outputFormat = request.getOutputFormat();
-        PDFToFile pdfToFile = new PDFToFile(tempFileManager);
+        PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
         return pdfToFile.processPdfToOfficeFormat(inputFile, outputFormat, "writer_pdf_import");
     }

@@ -100,7 +102,7 @@ public class ConvertPDFToOffice {
     public ResponseEntity<byte[]> processPdfToXML(@ModelAttribute PDFFile file) throws Exception {
         MultipartFile inputFile = file.getFileInput();

-        PDFToFile pdfToFile = new PDFToFile(tempFileManager);
+        PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
         return pdfToFile.processPdfToOfficeFormat(inputFile, "xml", "writer_pdf_import");
     }
 }
|
||||
@@ -71,9 +71,11 @@ import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.tags.Tag;

 import lombok.Getter;
+import lombok.RequiredArgsConstructor;
 import lombok.extern.slf4j.Slf4j;

 import stirling.software.SPDF.model.api.converters.PdfToPdfARequest;
+import stirling.software.common.configuration.RuntimePathConfig;
 import stirling.software.common.util.ExceptionUtils;
 import stirling.software.common.util.ProcessExecutor;
 import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
@@ -83,8 +85,11 @@ import stirling.software.common.util.WebResponseUtils;
 @RequestMapping("/api/v1/convert")
 @Slf4j
 @Tag(name = "Convert", description = "Convert APIs")
+@RequiredArgsConstructor
 public class ConvertPDFToPDFA {

+    private final RuntimePathConfig runtimePathConfig;
+
     private static final String ICC_RESOURCE_PATH = "/icc/sRGB2014.icc";
     private static final int PDFA_COMPATIBILITY_POLICY = 1;

@@ -1043,26 +1048,33 @@ public class ConvertPDFToPDFA {
                         ? "pdf:writer_pdf_Export:{\"SelectPdfVersion\":{\"type\":\"long\",\"value\":\"2\"}}"
                         : "pdf:writer_pdf_Export:{\"SelectPdfVersion\":{\"type\":\"long\",\"value\":\"1\"}}";

-        // Prepare LibreOffice command
-        List<String> command =
-                new ArrayList<>(
-                        Arrays.asList(
-                                "soffice",
-                                "--headless",
-                                "--nologo",
-                                "--convert-to",
-                                pdfFilter,
-                                "--outdir",
-                                tempOutputDir.toString(),
-                                tempInputFile.toString()));
+        Path libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_");
+        try {
+            // Prepare LibreOffice command
+            List<String> command =
+                    new ArrayList<>(
+                            Arrays.asList(
+                                    runtimePathConfig.getSOfficePath(),
+                                    "-env:UserInstallation="
+                                            + libreOfficeProfile.toUri().toString(),
+                                    "--headless",
+                                    "--nologo",
+                                    "--convert-to",
+                                    pdfFilter,
+                                    "--outdir",
+                                    tempOutputDir.toString(),
+                                    tempInputFile.toString()));

-        ProcessExecutorResult returnCode =
-                ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
-                        .runCommandWithOutputHandling(command);
+            ProcessExecutorResult returnCode =
+                    ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
+                            .runCommandWithOutputHandling(command);

-        if (returnCode.getRc() != 0) {
-            log.error("PDF/A conversion failed with return code: {}", returnCode.getRc());
-            throw ExceptionUtils.createPdfaConversionFailedException();
+            if (returnCode.getRc() != 0) {
+                log.error("PDF/A conversion failed with return code: {}", returnCode.getRc());
+                throw ExceptionUtils.createPdfaConversionFailedException();
+            }
+        } finally {
+            FileUtils.deleteQuietly(libreOfficeProfile.toFile());
         }

         // Get the output file
|
||||
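The hunk above gives each `soffice` run its own throwaway profile through `-env:UserInstallation=` and removes it in a `finally` block. The same idea can be sketched in shell under some assumptions: `make_profile_arg` and `convert_with_isolated_profile` are hypothetical helper names, `mktemp -d` stands in for `Files.createTempDirectory`, and the URI is built by prefixing `file://` to an absolute path, which (unlike `Path.toUri()`) does not percent-encode unusual characters.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: build the -env:UserInstallation argument for a profile directory.
make_profile_arg() {
  local profile_dir="$1"
  printf -- '-env:UserInstallation=file://%s' "$profile_dir"
}

# Hypothetical helper: run one conversion with a private profile and always clean up.
# Usage: convert_with_isolated_profile <soffice-binary> <filter> <outdir> <input>
convert_with_isolated_profile() {
  local soffice_bin="$1" filter="$2" out_dir="$3" input="$4"
  local profile_dir rc=0
  profile_dir="$(mktemp -d "${TMPDIR:-/tmp}/libreoffice_profile_XXXXXX")"
  "$soffice_bin" "$(make_profile_arg "$profile_dir")" \
    --headless --nologo --convert-to "$filter" --outdir "$out_dir" "$input" || rc=$?
  # Mirrors the finally block: the profile is removed on success and on failure.
  rm -rf -- "$profile_dir"
  return "$rc"
}
```

Keeping the profile per invocation avoids two concurrent conversions sharing one LibreOffice user directory, at the cost of a little startup work on every run.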
@ -37,10 +37,17 @@ import lombok.extern.slf4j.Slf4j;
|
||||
|
||||
import stirling.software.SPDF.config.EndpointConfiguration;
|
||||
import stirling.software.SPDF.model.api.misc.ProcessPdfWithOcrRequest;
|
||||
import stirling.software.common.configuration.RuntimePathConfig;
|
||||
import stirling.software.common.model.ApplicationProperties;
|
||||
import stirling.software.common.service.CustomPDFDocumentFactory;
|
||||
import stirling.software.common.util.*;
|
||||
import stirling.software.common.util.ExceptionUtils;
|
||||
import stirling.software.common.util.GeneralUtils;
|
||||
import stirling.software.common.util.ProcessExecutor;
|
||||
import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
|
||||
import stirling.software.common.util.TempDirectory;
|
||||
import stirling.software.common.util.TempFile;
|
||||
import stirling.software.common.util.TempFileManager;
|
||||
import stirling.software.common.util.WebResponseUtils;
|
||||
|
||||
@RestController
|
||||
@RequestMapping("/api/v1/misc")
|
||||
@@ -53,6 +60,7 @@ public class OCRController {
     private final CustomPDFDocumentFactory pdfDocumentFactory;
     private final TempFileManager tempFileManager;
     private final EndpointConfiguration endpointConfiguration;
+    private final RuntimePathConfig runtimePathConfig;

     private boolean isOcrMyPdfEnabled() {
         return endpointConfiguration.isGroupEnabled("OCRmyPDF");
@@ -64,7 +72,7 @@ public class OCRController {

     /** Gets the list of available Tesseract languages from the tessdata directory */
     public List<String> getAvailableTesseractLanguages() {
-        String tessdataDir = applicationProperties.getSystem().getTessdataDir();
+        String tessdataDir = runtimePathConfig.getTessDataPath();
         File[] files = new File(tessdataDir).listFiles();
         if (files == null) {
             return Collections.emptyList();
@@ -80,9 +88,10 @@ public class OCRController {
     @Operation(
             summary = "Process a PDF file with OCR",
             description =
-                    "This endpoint processes a PDF file using OCR (Optical Character Recognition). "
-                            + "Users can specify languages, sidecar, deskew, clean, cleanFinal, ocrType, ocrRenderType, and removeImagesAfter options. "
-                            + "Uses OCRmyPDF if available, falls back to Tesseract. Input:PDF Output:PDF Type:SI-Conditional")
+                    "This endpoint processes a PDF file using OCR (Optical Character Recognition). Users can"
+                            + " specify languages, sidecar, deskew, clean, cleanFinal, ocrType, ocrRenderType,"
+                            + " and removeImagesAfter options. Uses OCRmyPDF if available, falls back to"
+                            + " Tesseract. Input:PDF Output:PDF Type:SI-Conditional")
     public ResponseEntity<byte[]> processPdfWithOCR(
             @ModelAttribute ProcessPdfWithOcrRequest request)
             throws IOException, InterruptedException {
@@ -217,7 +226,7 @@ public class OCRController {
         List<String> command =
                 new ArrayList<>(
                         Arrays.asList(
-                                "ocrmypdf",
+                                runtimePathConfig.getOcrMyPdfPath(),
                                 "--verbose",
                                 "2",
                                 "--output-type",
|
||||
@@ -14,16 +14,20 @@ import io.swagger.v3.oas.annotations.Hidden;
 import io.swagger.v3.oas.annotations.tags.Tag;

 import lombok.RequiredArgsConstructor;
+import lombok.extern.slf4j.Slf4j;

+import stirling.software.common.configuration.RuntimePathConfig;
 import stirling.software.common.model.ApplicationProperties;
 import stirling.software.common.util.CheckProgramInstall;

 @Controller
 @Tag(name = "Misc", description = "Miscellaneous APIs")
 @RequiredArgsConstructor
+@Slf4j
 public class OtherWebController {

     private final ApplicationProperties applicationProperties;
+    private final RuntimePathConfig runtimePathConfig;

     @GetMapping("/compress-pdf")
     @Hidden
@@ -120,7 +124,7 @@ public class OtherWebController {
     }

     public List<String> getAvailableTesseractLanguages() {
-        String tessdataDir = applicationProperties.getSystem().getTessdataDir();
+        String tessdataDir = runtimePathConfig.getTessDataPath();
         File[] files = new File(tessdataDir).listFiles();
         if (files == null) {
             return Collections.emptyList();
|
||||
@@ -115,7 +115,7 @@ system:
   showUpdate: false # see when a new update is available
   showUpdateOnlyAdmin: false # only admins can see when a new update is available, depending on showUpdate it must be set to 'true'
   customHTMLFiles: false # enable to have files placed in /customFiles/templates override the existing template HTML files
-  tessdataDir: /usr/share/tessdata # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.
+  tessdataDir: "" # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.
   enableAnalytics: null # Master toggle for analytics: set to 'true' to enable all analytics, 'false' to disable all analytics, or leave as 'null' to prompt admin on first launch
   enablePosthog: null # Enable PostHog analytics (open-source product analytics): set to 'true' to enable, 'false' to disable, or 'null' to enable by default when analytics is enabled
   enableScarf: null # Enable Scarf pixel: set to 'true' to enable, 'false' to disable, or 'null' to enable by default when analytics is enabled
@@ -150,6 +150,8 @@ system:
       weasyprint: '' # Defaults to /opt/venv/bin/weasyprint
       unoconvert: '' # Defaults to /opt/venv/bin/unoconvert
       calibre: '' # Defaults to /usr/bin/ebook-convert
+      ocrmypdf: '' # Defaults to /usr/bin/ocrmypdf
+      soffice: '' # Defaults to /usr/bin/soffice
   fileUploadLimit: '' # Defaults to "". No limit when string is empty. Set a number, between 0 and 999, followed by one of the following strings to set a limit. "KB", "MB", "GB".
   tempFileManagement:
     baseTmpDir: '' # Defaults to java.io.tmpdir/stirling-pdf
|
||||
@ -32,6 +32,8 @@ class ExternalAppDepConfigTest {
|
||||
void setUp() {
|
||||
when(runtimePathConfig.getWeasyPrintPath()).thenReturn("/custom/weasyprint");
|
||||
when(runtimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert");
|
||||
when(runtimePathConfig.getCalibrePath()).thenReturn("/custom/calibre");
|
||||
when(runtimePathConfig.getOcrMyPdfPath()).thenReturn("/custom/ocrmypdf");
|
||||
lenient()
|
||||
.when(endpointConfiguration.getEndpointsForGroup(anyString()))
|
||||
.thenReturn(Set.of());
|
||||
@ -45,6 +47,8 @@ class ExternalAppDepConfigTest {
|
||||
|
||||
assertEquals(List.of("Weasyprint"), mapping.get("/custom/weasyprint"));
|
||||
assertEquals(List.of("Unoconvert"), mapping.get("/custom/unoconvert"));
|
||||
assertEquals(List.of("Calibre"), mapping.get("/custom/calibre"));
|
||||
assertEquals(List.of("OCRmyPDF"), mapping.get("/custom/ocrmypdf"));
|
||||
assertEquals(List.of("Ghostscript"), mapping.get("gs"));
|
||||
}
|
||||
|
||||
|
||||
@@ -1,8 +1,8 @@
-
 services:
   stirling-pdf:
     container_name: Stirling-PDF-Fat-Disable-Endpoints
-    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+    # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+    image: ghcr.io/stirling-tools/stirling-pdf-test:fat
     deploy:
       resources:
         limits:
|
||||
@@ -1,7 +1,8 @@
 services:
   stirling-pdf:
     container_name: Stirling-PDF-Security-Fat
-    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+    # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+    image: ghcr.io/stirling-tools/stirling-pdf-test:fat
     deploy:
       resources:
         limits:
|
||||
@@ -1,7 +1,8 @@
 services:
   stirling-pdf:
     container_name: Stirling-PDF-Security
-    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+    # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+    image: ghcr.io/stirling-tools/stirling-pdf-test:latest
     deploy:
       resources:
         limits:
@@ -22,9 +23,9 @@ services:
       SECURITY_ENABLELOGIN: "true"
       SECURITY_OAUTH2_ENABLED: "true"
       SECURITY_OAUTH2_AUTOCREATEUSER: "true" # This is set to true to allow auto-creation of non-existing users in Stirling-PDF
-      SECURITY_OAUTH2_ISSUER: "https://accounts.google.com" # Change with any other provider that supports OpenID Connect Discovery (/.well-known/openid-configuration) end-point
+      SECURITY_OAUTH2_ISSUER: "https://accounts.google.com" # Change with any other provider that supports OpenID Connect Discovery (/.well-known/openid-configuration) end-point
       SECURITY_OAUTH2_CLIENTID: "<YOUR CLIENT ID>.apps.googleusercontent.com" # Client ID from your provider
-      SECURITY_OAUTH2_CLIENTSECRET: "<YOUR CLIENT SECRET>" # Client Secret from your provider
+      SECURITY_OAUTH2_CLIENTSECRET: "<YOUR CLIENT SECRET>" # Client Secret from your provider
       SECURITY_OAUTH2_SCOPES: "openid,profile,email" # Expected OAuth2 Scope
       SECURITY_OAUTH2_USEASUSERNAME: "email" # Default is 'email'; custom fields can be used as the username
       SECURITY_OAUTH2_PROVIDER: "google" # Set this to your OAuth provider's name, e.g., 'google' or 'keycloak'
|
||||
@@ -1,7 +1,8 @@
 services:
   stirling-pdf:
     container_name: Stirling-PDF-Security
-    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+    # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+    image: ghcr.io/stirling-tools/stirling-pdf-test:latest
     deploy:
       resources:
         limits:
|
||||
@@ -1,7 +1,8 @@
 services:
   stirling-pdf:
     container_name: Stirling-PDF-Ultra-Lite-Security
-    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
+    # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
+    image: ghcr.io/stirling-tools/stirling-pdf-test:ultra-lite
     deploy:
       resources:
         limits:
|
||||
@@ -1,7 +1,8 @@
 services:
   stirling-pdf:
     container_name: Stirling-PDF-Ultra-Lite
-    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
+    # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
+    image: ghcr.io/stirling-tools/stirling-pdf-test:ultra-lite
     deploy:
       resources:
         limits:
|
||||
@@ -1,7 +1,8 @@
 services:
   stirling-pdf:
     container_name: Stirling-PDF
-    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+    # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+    image: ghcr.io/stirling-tools/stirling-pdf-test:latest
     deploy:
       resources:
         limits:
|
||||
@@ -1,7 +1,8 @@
 services:
   stirling-pdf:
     container_name: Stirling-PDF-Security-Fat-with-login
-    image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+    # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+    image: ghcr.io/stirling-tools/stirling-pdf-test:fat
     deploy:
       resources:
         limits:
|
||||
@@ -1,42 +1,188 @@
 #!/bin/bash
+# This script initializes Stirling PDF without OCR features.
+set -euo pipefail

-export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
-echo "running with JAVA_TOOL_OPTIONS ${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
+log() { printf '%s\n' "$*" >&2; }
+command_exists() { command -v "$1" >/dev/null 2>&1; }

-# Update the user and group IDs as per environment variables
-if [ ! -z "$PUID" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
-  usermod -o -u "$PUID" stirlingpdfuser || true
+SU_EXEC_BIN=""
+if command_exists su-exec; then
+  SU_EXEC_BIN="su-exec"
+elif command_exists gosu; then
+  SU_EXEC_BIN="gosu"
 fi

+CURRENT_USER="$(id -un)"
+CURRENT_UID="$(id -u)"
+SWITCH_USER_WARNING_EMITTED=false

-if [ ! -z "$PGID" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
-  groupmod -o -g "$PGID" stirlingpdfgroup || true
-fi
-umask "$UMASK" || true
+warn_switch_user_once() {
+  if [ "$SWITCH_USER_WARNING_EMITTED" = false ]; then
+    log "WARNING: Unable to switch to user ${RUNTIME_USER:-stirlingpdfuser}; running command as ${CURRENT_USER}."
+    SWITCH_USER_WARNING_EMITTED=true
+  fi
+}

-if [[ "$INSTALL_BOOK_AND_ADVANCED_HTML_OPS" == "true" && "$FAT_DOCKER" != "true" ]]; then
-  echo "issue with calibre in current version, feature currently disabled on Stirling-PDF"
-  #apk add --no-cache calibre@testing
+run_as_runtime_user() {
+  if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
+    "$@"
+  elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
+    "$SU_EXEC_BIN" "$RUNTIME_USER" "$@"
+  else
+    warn_switch_user_once
+    "$@"
+  fi
+}
+
+# ---------- VERSION_TAG ----------
+# Load VERSION_TAG from file if not provided via environment.
+if [ -z "${VERSION_TAG:-}" ] && [ -f /etc/stirling_version ]; then
+  VERSION_TAG="$(tr -d '\r\n' < /etc/stirling_version)"
+  export VERSION_TAG
 fi

-if [[ "$FAT_DOCKER" != "true" ]]; then
-  /scripts/download-security-jar.sh
-fi
+# ---------- JAVA_OPTS ----------
+# Configure Java runtime options.
+export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS:-} ${JAVA_CUSTOM_OPTS:-}"
+export JAVA_TOOL_OPTIONS="-Djava.awt.headless=true ${JAVA_TOOL_OPTIONS}"
+log "running with JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
+log "Running Stirling PDF with DISABLE_ADDITIONAL_FEATURES=${DISABLE_ADDITIONAL_FEATURES:-} and VERSION_TAG=${VERSION_TAG:-<unset>}"

-if [[ -n "$LANGS" ]]; then
-  /scripts/installFonts.sh $LANGS
-fi
+# ---------- UMASK ----------
+# Set default permissions mask.
+UMASK_VAL="${UMASK:-022}"
+umask "$UMASK_VAL" 2>/dev/null || umask 022

-echo "Setting permissions and ownership for necessary directories..."
-# Ensure temp directory exists and has correct permissions
-mkdir -p /tmp/stirling-pdf || true
-# Attempt to change ownership of directories and files
-if chown -R stirlingpdfuser:stirlingpdfgroup $HOME /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar; then
-  chmod -R 755 /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar || true
-  # If chown succeeds, execute the command as stirlingpdfuser
-  exec su-exec stirlingpdfuser "$@"
+# ---------- XDG_RUNTIME_DIR ----------
+# Create the runtime directory, respecting UID/GID settings.
+RUNTIME_USER="stirlingpdfuser"
+if id -u "$RUNTIME_USER" >/dev/null 2>&1; then
+  RUID="$(id -u "$RUNTIME_USER")"
+  RGRP="$(id -gn "$RUNTIME_USER")"
 else
-  # If chown fails, execute the command without changing the user context
-  echo "[WARN] Chown failed, running as host user"
-  exec "$@"
+  RUID="$(id -u)"
+  RGRP="$(id -gn)"
+  RUNTIME_USER="$(id -un)"
 fi
+CURRENT_USER="$(id -un)"
+CURRENT_UID="$(id -u)"
+
+export XDG_RUNTIME_DIR="/tmp/xdg-${RUID}"
+mkdir -p "${XDG_RUNTIME_DIR}" || true
+if [ "$(id -u)" -eq 0 ]; then
+  chown "${RUNTIME_USER}:${RGRP}" "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+fi
+chmod 700 "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+log "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}"
+
+# ---------- Optional ----------
+# Disable advanced HTML operations if required.
+if [[ "${INSTALL_BOOK_AND_ADVANCED_HTML_OPS:-false}" == "true" && "${FAT_DOCKER:-true}" != "true" ]]; then
+  log "issue with calibre in current version, feature currently disabled on Stirling-PDF"
+fi
+
+# Download security JAR in non-fat builds.
+if [[ "${FAT_DOCKER:-true}" != "true" && -x /scripts/download-security-jar.sh ]]; then
+  /scripts/download-security-jar.sh || true
+fi
+
+# ---------- UID/GID remap ----------
+# Remap user/group IDs to match container runtime settings.
+if [ "$(id -u)" -eq 0 ]; then
+  if id -u stirlingpdfuser >/dev/null 2>&1; then
+    if [ -n "${PUID:-}" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
+      usermod -o -u "$PUID" stirlingpdfuser || true
+      chown stirlingpdfuser:stirlingpdfgroup "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+    fi
+  fi
+  if getent group stirlingpdfgroup >/dev/null 2>&1; then
+    if [ -n "${PGID:-}" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
+      groupmod -o -g "$PGID" stirlingpdfgroup || true
+    fi
+  fi
+fi
+
+# ---------- Permissions ----------
+# Ensure required directories exist and set correct permissions.
+log "Setting permissions..."
+mkdir -p /tmp/stirling-pdf /logs /configs /customFiles /pipeline || true
+CHOWN_PATHS=("$HOME" "/logs" "/scripts" "/configs" "/customFiles" "/pipeline" "/tmp/stirling-pdf" "/app.jar")
+[ -d /usr/share/fonts/truetype ] && CHOWN_PATHS+=("/usr/share/fonts/truetype")
+CHOWN_OK=true
+for p in "${CHOWN_PATHS[@]}"; do
+  if [ -e "$p" ]; then
+    chown -R "stirlingpdfuser:stirlingpdfgroup" "$p" 2>/dev/null || CHOWN_OK=false
+    chmod -R 755 "$p" 2>/dev/null || true
+  fi
+done
+
+# ---------- Xvfb ----------
+# Start a virtual framebuffer for GUI-based LibreOffice interactions.
+if command_exists Xvfb; then
+  log "Starting Xvfb on :99"
+  Xvfb :99 -screen 0 1024x768x24 -ac +extension GLX +render -noreset > /dev/null 2>&1 &
+  export DISPLAY=:99
+  sleep 1
+else
+  log "Xvfb not installed; skipping virtual display setup"
+fi
+
+# ---------- unoserver ----------
+# Start LibreOffice UNO server for document conversions.
+UNOSERVER_BIN="$(command -v unoserver || true)"
+UNOCONVERT_BIN="$(command -v unoconvert || true)"
+UNOSERVER_PID=""
+
+if [ -n "$UNOSERVER_BIN" ] && [ -n "$UNOCONVERT_BIN" ]; then
+  LIBREOFFICE_PROFILE="${HOME:-/home/${RUNTIME_USER}}/.libreoffice_uno_${RUID}"
+  run_as_runtime_user mkdir -p "$LIBREOFFICE_PROFILE"
+
+  log "Starting unoserver on 127.0.0.1:2003"
+  run_as_runtime_user "$UNOSERVER_BIN" \
+    --interface 127.0.0.1 \
+    --port 2003 \
+    --uno-port 2004 \
+    &
+  UNOSERVER_PID=$!
+  log "unoserver PID: $UNOSERVER_PID (Profile: $LIBREOFFICE_PROFILE)"
+
+  # Wait until UNO server is ready.
+  log "Waiting for unoserver..."
+  for _ in {1..20}; do
+    if run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
+      log "unoserver is ready!"
+      break
+    fi
+    sleep 1
+  done
+
+  if ! run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
+    log "ERROR: unoserver failed!"
+    if [ -n "$UNOSERVER_PID" ]; then
+      kill "$UNOSERVER_PID" 2>/dev/null || true
+      wait "$UNOSERVER_PID" 2>/dev/null || true
+    fi
+    exit 1
+  fi
+else
+  log "unoserver/unoconvert not installed; skipping UNO setup"
+fi
+
+# ---------- Java ----------
+# Start Stirling PDF Java application.
+log "Starting Stirling PDF"
+JAVA_CMD=(
+  java
+  -Dfile.encoding=UTF-8
+  -Djava.io.tmpdir=/tmp/stirling-pdf
+  -jar /app.jar
+)
+
+if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
+  exec "${JAVA_CMD[@]}"
+elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
+  exec "$SU_EXEC_BIN" "$RUNTIME_USER" "${JAVA_CMD[@]}"
+else
+  warn_switch_user_once
+  exec "${JAVA_CMD[@]}"
+fi
|
||||
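The script above polls `unoconvert --version` up to 20 times, one second apart, and aborts if the probe never succeeds. That bounded-retry pattern can be factored into a small helper, sketched here under assumptions: `wait_until` is a hypothetical name that does not appear in the PR, and the probe is whatever command the caller passes. The sketch makes no claim about which probe best reflects unoserver readiness.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: retry a probe command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # The probe's output is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

With such a helper, the readiness check could read `wait_until 20 1 unoconvert --version || exit 1`, which keeps the retry budget and the failure path in one line.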
120
scripts/init.sh
@ -1,36 +1,110 @@
|
||||
#!/bin/bash
|
||||
# This script initializes environment variables and paths,
|
||||
# prepares Tesseract data directories, and then runs the main init script.
|
||||
|
||||
# Copy the original tesseract-ocr files to the volume directory without overwriting existing files
|
||||
echo "Copying original files without overwriting existing files"
|
||||
mkdir -p /usr/share/tessdata
|
||||
cp -rn /usr/share/tessdata-original/* /usr/share/tessdata
|
||||
set -euo pipefail
|
||||
|
||||
if [ -d /usr/share/tesseract-ocr/4.00/tessdata ]; then
|
||||
cp -r /usr/share/tesseract-ocr/4.00/tessdata/* /usr/share/tessdata || true;
|
||||
append_env_path() {
|
||||
local target="$1" current="$2" separator=":"
|
||||
if [ -d "$target" ] && [[ ":${current}:" != *":${target}:"* ]]; then
|
||||
if [ -n "$current" ]; then
|
||||
printf '%s' "${target}${separator}${current}"
|
||||
else
|
||||
printf '%s' "${target}"
|
||||
fi
|
||||
else
|
||||
printf '%s' "$current"
|
||||
fi
|
||||
}
|
||||
|
||||
python_site_dir() {
|
||||
local venv_dir="$1"
|
||||
local python_bin="$venv_dir/bin/python"
|
||||
if [ -x "$python_bin" ]; then
|
||||
local py_tag
|
||||
if py_tag="$("$python_bin" -c 'import sys; print(f"python{sys.version_info.major}.{sys.version_info.minor}")' 2>/dev/null)" \
|
||||
&& [ -n "$py_tag" ] \
|
||||
&& [ -d "$venv_dir/lib/$py_tag/site-packages" ]; then
|
||||
printf '%s' "$venv_dir/lib/$py_tag/site-packages"
|
||||
fi
|
||||
fi
|
||||
}
|
||||
|
||||
# === LD_LIBRARY_PATH ===
|
||||
# Adjust the library path depending on CPU architecture.
|
||||
ARCH=$(uname -m)
|
||||
case "$ARCH" in
|
||||
x86_64)
|
||||
[ -d /usr/lib/x86_64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
|
||||
;;
|
||||
aarch64)
|
||||
[ -d /usr/lib/aarch64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/aarch64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
|
||||
;;
|
||||
esac
|
||||
|
||||
# Add LibreOffice program directory to library path if available.
|
||||
if [ -d /usr/lib/libreoffice/program ]; then
|
||||
export LD_LIBRARY_PATH="/usr/lib/libreoffice/program${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
|
||||
fi
|
||||
|
||||
# === Python PATH ===
|
||||
# Add virtual environments to PATH and PYTHONPATH.
|
||||
for dir in /opt/venv/bin /opt/unoserver-venv/bin; do
|
||||
PATH="$(append_env_path "$dir" "$PATH")"
|
||||
done
|
||||
export PATH
|
||||
|
||||
PYTHON_PATH_ENTRIES=()
|
||||
for venv in /opt/venv /opt/unoserver-venv; do
|
||||
if [ -d "$venv" ]; then
|
||||
site_dir="$(python_site_dir "$venv")"
|
||||
[ -n "${site_dir:-}" ] && PYTHON_PATH_ENTRIES+=("$site_dir")
|
||||
fi
|
||||
done
|
||||
if [ ${#PYTHON_PATH_ENTRIES[@]} -gt 0 ]; then
|
||||
PYTHONPATH="$(IFS=:; printf '%s' "${PYTHON_PATH_ENTRIES[*]}")${PYTHONPATH:+:$PYTHONPATH}"
|
||||
export PYTHONPATH
|
||||
fi
|
||||
|
||||
# === tessdata ===
|
||||
# Prepare Tesseract OCR data directory.
|
||||
REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
|
||||
SEC_TESSDATA="/usr/share/tessdata"
|
||||
|
||||
log_warn() {
|
||||
echo "[init][warn] $*" >&2
|
||||
}
|
||||
|
||||
if [ -d "$REAL_TESSDATA" ] && [ -w "$REAL_TESSDATA" ]; then
|
||||
log_warn "Skipping tessdata adjustments; directory writable: $REAL_TESSDATA"
|
||||
else
|
||||
log_warn "Skipping tessdata adjustments; directory missing or not writable: $REAL_TESSDATA"
|
||||
fi
|
||||
|
||||
if [ -d /usr/share/tesseract-ocr/5/tessdata ]; then
|
||||
cp -r /usr/share/tesseract-ocr/5/tessdata/* /usr/share/tessdata || true;
|
||||
REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
|
||||
log_warn "Using /usr/share/tesseract-ocr/5/tessdata as TESSDATA_PREFIX"
|
||||
elif [ -d /usr/share/tessdata ]; then
|
||||
REAL_TESSDATA="/usr/share/tessdata"
|
||||
log_warn "Using /usr/share/tessdata as TESSDATA_PREFIX"
|
||||
elif [ -d /tessdata ]; then
|
||||
REAL_TESSDATA="/tessdata"
|
||||
log_warn "Using /tessdata as TESSDATA_PREFIX"
|
||||
else
|
||||
REAL_TESSDATA=""
|
||||
log_warn "No tessdata directory found"
|
||||
fi
|
||||
|
||||
# Check if TESSERACT_LANGS environment variable is set and is not empty
|
||||
if [[ -n "$TESSERACT_LANGS" ]]; then
|
||||
# Convert comma-separated values to a space-separated list
|
||||
SPACE_SEPARATED_LANGS=$(echo $TESSERACT_LANGS | tr ',' ' ')
|
||||
pattern='^[a-zA-Z]{2,4}(_[a-zA-Z]{2,4})?$'
|
||||
# Install each language pack
|
||||
for LANG in $SPACE_SEPARATED_LANGS; do
|
||||
if [[ $LANG =~ $pattern ]]; then
|
||||
apk add --no-cache "tesseract-ocr-data-$LANG"
|
||||
else
|
||||
echo "Skipping invalid language code"
|
||||
fi
|
||||
done
|
||||
if [ -n "$REAL_TESSDATA" ]; then
|
||||
export TESSDATA_PREFIX="$REAL_TESSDATA"
|
||||
fi
|
||||
|
||||
# Ensure temp directory exists with correct permissions before running main init
|
||||
mkdir -p /tmp/stirling-pdf || true
|
||||
# === Temp dir ===
|
||||
# Ensure the temporary directory exists and has proper permissions.
|
||||
mkdir -p /tmp/stirling-pdf
|
||||
chown -R stirlingpdfuser:stirlingpdfgroup /tmp/stirling-pdf || true
|
||||
chmod -R 755 /tmp/stirling-pdf || true
|
||||
|
||||
/scripts/init-without-ocr.sh "$@"
|
||||
# === Start application ===
|
||||
# Run the main init script that handles the full startup logic.
|
||||
exec /scripts/init-without-ocr.sh
|
||||
|
||||
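Despite its name, the `append_env_path` helper in the `scripts/init.sh` diff above puts the directory in front of the current value, and it leaves the value untouched when the directory is missing or already listed. A minimal standalone sketch of that behaviour follows; `demo_dir` and `demo_path` are hypothetical names used only for illustration, and the helper body is copied from the diff.

```bash
#!/bin/bash
set -euo pipefail

# Same logic as the helper in the diff: prepend $1 to the colon-separated
# list $2 if $1 is an existing directory that is not already listed.
append_env_path() {
  local target="$1" current="$2" separator=":"
  if [ -d "$target" ] && [[ ":${current}:" != *":${target}:"* ]]; then
    if [ -n "$current" ]; then
      printf '%s' "${target}${separator}${current}"
    else
      printf '%s' "${target}"
    fi
  else
    printf '%s' "$current"
  fi
}

demo_dir="$(mktemp -d)"      # hypothetical directory created for the demo
demo_path="/usr/bin:/bin"    # hypothetical starting value

demo_path="$(append_env_path "$demo_dir" "$demo_path")"      # prepended
demo_path="$(append_env_path "$demo_dir" "$demo_path")"      # no-op: already listed
demo_path="$(append_env_path "/no/such/dir" "$demo_path")"   # no-op: directory missing

echo "$demo_path"
rmdir "$demo_dir"
```

Calling the helper repeatedly is therefore safe, which matters if the init script is re-run inside an already configured container.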
@ -140,6 +140,9 @@ system:
|
||||
operations:
|
||||
weasyprint: '' # Defaults to /opt/venv/bin/weasyprint
|
||||
unoconvert: '' # Defaults to /opt/venv/bin/unoconvert
|
||||
calibre: '' # Defaults to /usr/bin/ebook-convert
|
||||
ocrmypdf: '' # Defaults to /usr/bin/ocrmypdf
|
||||
soffice: '' # Defaults to /usr/bin/soffice
|
||||
fileUploadLimit: '' # Defaults to "". No limit when string is empty. Set a number, between 0 and 999, followed by one of the following strings to set a limit. "KB", "MB", "GB".
|
||||
tempFileManagement:
|
||||
baseTmpDir: '' # Defaults to java.io.tmpdir/stirling-pdf
|
||||
|
||||
195
testing/test.sh
@ -16,27 +16,47 @@ find_root() {
|
||||
|
||||
PROJECT_ROOT=$(find_root)
|
||||
|
||||
# Function to check the health of the service with a timeout of 80 seconds
|
||||
# Function to check application readiness via HTTP instead of Docker's health status
|
||||
check_health() {
|
||||
local service_name=$1
|
||||
local container_name=$1 # real container name
|
||||
local compose_file=$2
|
||||
local end=$((SECONDS+60))
|
||||
local timeout=80 # total timeout in seconds
|
||||
local interval=3 # poll interval in seconds
|
||||
local end=$((SECONDS + timeout))
|
||||
local last_code="000"
|
||||
|
||||
echo -n "Waiting for $service_name to become healthy..."
|
||||
until [ "$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}healthy{{end}}' "$service_name")" == "healthy" ] || [ $SECONDS -ge $end ]; do
|
||||
sleep 3
|
||||
echo -n "."
|
||||
if [ $SECONDS -ge $end ]; then
|
||||
echo -e "\n$service_name health check timed out after 80 seconds."
|
||||
echo "Printing logs for $service_name:"
|
||||
docker logs "$service_name"
|
||||
return 1
|
||||
echo "Waiting for $container_name to become reachable on http://localhost:8080/ (timeout ${timeout}s)..."
|
||||
while [ $SECONDS -lt $end ]; do
|
||||
# Optional: check if container is running at all (nice for debugging)
|
||||
if ! docker ps --format '{{.Names}}' | grep -Fxq "$container_name"; then
|
||||
echo " Container $container_name not running yet (still waiting)..."
|
||||
fi
|
||||
|
||||
# Try simple HTTP GET on the root page
|
||||
last_code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:8080/") || last_code="000"
|
||||
|
||||
# Treat any 2xx or 3xx as "ready"
|
||||
if [ "$last_code" -ge 200 ] && [ "$last_code" -lt 400 ]; then
|
||||
echo "$container_name is reachable over HTTP (status $last_code)."
|
||||
echo "Printing logs for $container_name:"
|
||||
docker logs "$container_name" || true
|
||||
return 0
|
||||
fi
|
||||
|
||||
echo " Still waiting for HTTP readiness, current status: $last_code"
|
||||
sleep "$interval"
|
||||
done
|
||||
echo -e "\n$service_name is healthy!"
|
||||
echo "Printing logs for $service_name:"
|
||||
docker logs "$service_name"
|
||||
return 0
|
||||
|
||||
echo "$container_name did not become HTTP-ready within ${timeout}s (last HTTP status: $last_code)."
|
||||
|
||||
# For extra debugging: show Docker health status, but DO NOT depend on it
|
||||
local docker_health
|
||||
docker_health=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}(no healthcheck){{end}}' "$container_name" 2>/dev/null || echo "inspect failed")
|
||||
echo "Docker-reported health status for $container_name: $docker_health"
|
||||
|
||||
echo "Printing logs for $container_name:"
|
||||
docker logs "$container_name" || true
|
||||
return 1
|
||||
}
|
||||
|
||||
# Function to capture file list from a Docker container
|
||||
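The reworked `check_health` above polls the application over HTTP and treats any 2xx or 3xx status as ready, instead of relying on Docker's health status. The sketch below isolates that idea under a few assumptions: `is_ready_code` and `wait_for_http` are hypothetical helper names, and the URL, timeout, and interval are passed as parameters rather than hard-coded as in `testing/test.sh`.

```bash
#!/bin/bash
set -euo pipefail

# Treat any 2xx or 3xx status as "ready", mirroring the check in the diff.
is_ready_code() {
  local code="$1"
  [[ "$code" =~ ^[0-9]{3}$ ]] && [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# Poll a URL until it answers with a ready status or the timeout expires.
# url, timeout, and interval are illustrative parameters (assumptions).
wait_for_http() {
  local url="$1" timeout="${2:-80}" interval="${3:-3}"
  local end=$((SECONDS + timeout)) code="000"
  while [ "$SECONDS" -lt "$end" ]; do
    # curl prints 000 when no HTTP response was received at all.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url")" || code="000"
    if is_ready_code "$code"; then
      echo "ready (status $code)"
      return 0
    fi
    sleep "$interval"
  done
  echo "not ready after ${timeout}s (last status $code)" >&2
  return 1
}
```

A call such as `wait_for_http "http://localhost:8080/" 80 3` would reproduce the timing used in the diff. Keeping the status classification in its own function makes it possible to check that part without a running container.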
@ -48,7 +68,7 @@ capture_file_list() {
|
||||
# Get all files in one command, output directly from Docker to avoid path issues
|
||||
# Skip proc, sys, dev, and the specified LibreOffice config directory
|
||||
# Also skip PDFBox and LibreOffice temporary files
|
||||
docker exec $container_name sh -c "find / -type f \
|
||||
docker exec "$container_name" sh -c "find / -type f \
|
||||
-not -path '*/proc/*' \
|
||||
-not -path '*/sys/*' \
|
||||
-not -path '*/dev/*' \
|
||||
@ -69,7 +89,7 @@ capture_file_list() {
|
||||
echo "Trying alternative approach..."
|
||||
|
||||
# Alternative simpler approach - just get paths as a fallback
|
||||
docker exec $container_name sh -c "find / -type f \
|
||||
docker exec "$container_name" sh -c "find / -type f \
|
||||
-not -path '*/proc/*' \
|
||||
-not -path '*/sys/*' \
|
||||
-not -path '*/dev/*' \
|
||||
@ -106,14 +126,8 @@ compare_file_lists() {
|
||||
# Check if files exist and have content
|
||||
if [ ! -s "$before_file" ] || [ ! -s "$after_file" ]; then
|
||||
echo "WARNING: One or both file lists are empty."
|
||||
|
||||
if [ ! -s "$before_file" ]; then
|
||||
echo "Before file is empty: $before_file"
|
||||
fi
|
||||
|
||||
if [ ! -s "$after_file" ]; then
|
||||
echo "After file is empty: $after_file"
|
||||
fi
|
||||
if [ ! -s "$before_file" ]; then echo "Before file is empty: $before_file"; fi
|
||||
if [ ! -s "$after_file" ]; then echo "After file is empty: $after_file"; fi
|
||||
|
||||
# Create empty diff file
|
||||
> "$diff_file"
|
||||
@ -132,7 +146,6 @@ compare_file_lists() {
|
||||
echo "No temporary files found in the after snapshot."
|
||||
fi
|
||||
fi
|
||||
|
||||
return 0
|
||||
fi
|
||||
|
||||
@ -169,7 +182,6 @@ compare_file_lists() {
|
||||
else
|
||||
echo "No file changes detected during test."
|
||||
fi
|
||||
|
||||
return 0
|
||||
}
|
||||
|
||||
@ -220,19 +232,33 @@ verify_app_version() {
|
||||
# Function to test a Docker Compose configuration
|
||||
test_compose() {
|
||||
local compose_file=$1
|
||||
local service_name=$2
|
||||
local test_name=$2
|
||||
local status=0
|
||||
|
||||
echo "Testing $compose_file configuration..."
|
||||
echo "Testing ${compose_file} configuration..."
|
||||
|
||||
# Start up the Docker Compose service
|
||||
docker-compose -f "$compose_file" up -d
|
||||
|
||||
# Wait for the service to become healthy
|
||||
if check_health "$service_name" "$compose_file"; then
|
||||
echo "$service_name test passed."
|
||||
# Wait a moment for containers to appear
|
||||
sleep 3
|
||||
|
||||
local container_name
|
||||
container_name=$(docker-compose -f "$compose_file" ps --format '{{.Names}}' --filter "status=running" | head -n1)
|
||||
|
||||
if [[ -z "$container_name" ]]; then
|
||||
echo "ERROR: No running container found for ${compose_file}"
|
||||
docker-compose -f "$compose_file" ps
|
||||
return 1
|
||||
fi
|
||||
|
||||
echo "Started container: $container_name"
|
||||
|
||||
# Wait for the service to become healthy (HTTP-based)
|
||||
if check_health "$container_name" "$compose_file"; then
|
||||
echo "${test_name} test passed."
|
||||
else
|
||||
echo "$service_name test failed."
|
||||
echo "${test_name} test failed."
|
||||
status=1
|
||||
fi
|
||||
|
||||
@ -246,7 +272,6 @@ declare -a failed_tests
|
||||
run_tests() {
|
||||
local test_name=$1
|
||||
local compose_file=$2
|
||||
|
||||
if test_compose "$compose_file" "$test_name"; then
|
||||
passed_tests+=("$test_name")
|
||||
else
|
||||
@ -254,18 +279,18 @@ run_tests() {
|
||||
fi
|
||||
}
|
||||
|
||||
|
||||
# Main testing routine
|
||||
main() {
|
||||
SECONDS=0
|
||||
|
||||
cd "$PROJECT_ROOT"
|
||||
|
||||
export DOCKER_CLI_EXPERIMENTAL=enabled
|
||||
export COMPOSE_DOCKER_CLI_BUILD=0
|
||||
export DISABLE_ADDITIONAL_FEATURES=true
|
||||
|
||||
# Run the gradlew build command and check if it fails
|
||||
# ==================================================================
|
||||
# 1. Ultra-Lite (no additional features)
|
||||
# ==================================================================
|
||||
export DISABLE_ADDITIONAL_FEATURES=true
|
||||
if ! ./gradlew clean build; then
|
||||
echo "Gradle build failed with security disabled, exiting script."
|
||||
exit 1
|
||||
@ -276,11 +301,12 @@ main() {
|
||||
EXPECTED_VERSION=$(get_expected_version)
|
||||
echo "Expected version: $EXPECTED_VERSION"
|
||||
|
||||
# Building Docker images
|
||||
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .
|
||||
docker build --build-arg VERSION_TAG=alpha -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .
|
||||
# Build Ultra-Lite image (GHCR tag, matching docker-compose-latest-ultra-lite.yml)
|
||||
docker build --build-arg VERSION_TAG=alpha \
|
||||
-t docker.stirlingpdf.com/stirlingtools/stirling-pdf:ultra-lite \
|
||||
-f ./Dockerfile.ultra-lite .
|
||||
|
||||
# Test each configuration
|
||||
# Test Ultra-Lite configuration
|
||||
run_tests "Stirling-PDF-Ultra-Lite" "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml"
|
||||
|
||||
echo "Testing webpage accessibility..."
|
||||
@ -302,36 +328,27 @@ main() {
|
||||
echo "Version verification failed for Stirling-PDF-Ultra-Lite"
|
||||
fi
|
||||
|
||||
docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml" down
|
||||
|
||||
# run_tests "Stirling-PDF" "./exampleYmlFiles/docker-compose-latest.yml"
|
||||
# docker-compose -f "./exampleYmlFiles/docker-compose-latest.yml" down
|
||||
docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml" down -v
|
||||
|
||||
# ==================================================================
|
||||
# 2. Full Fat + Security
|
||||
# ==================================================================
|
||||
export DISABLE_ADDITIONAL_FEATURES=false
|
||||
# Run the gradlew build command and check if it fails
|
||||
if ! ./gradlew clean build; then
|
||||
echo "Gradle build failed with security enabled, exiting script."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Get expected version after the security-enabled build
|
||||
echo "Getting expected version from Gradle (security enabled)..."
|
||||
EXPECTED_VERSION=$(get_expected_version)
|
||||
echo "Expected version with security enabled: $EXPECTED_VERSION"
|
||||
|
||||
# Building Docker images with security enabled
|
||||
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .
|
||||
# docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .
|
||||
docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat -f ./Dockerfile.fat .
|
||||
|
||||
|
||||
# Test each configuration with security
|
||||
# run_tests "Stirling-PDF-Ultra-Lite-Security" "./exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml"
|
||||
# docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml" down
|
||||
# run_tests "Stirling-PDF-Security" "./exampleYmlFiles/docker-compose-latest-security.yml"
|
||||
# docker-compose -f "./exampleYmlFiles/docker-compose-latest-security.yml" down
|
||||
|
||||
# Build Fat (Security) image for GHCR tag used in all 'fat' compose files
|
||||
docker build --no-cache --pull --build-arg VERSION_TAG=alpha \
|
||||
-t docker.stirlingpdf.com/stirlingtools/stirling-pdf:fat \
|
||||
-f ./Dockerfile.fat .
|
||||
|
||||
# Test fat + security compose
|
||||
run_tests "Stirling-PDF-Security-Fat" "./exampleYmlFiles/docker-compose-latest-fat-security.yml"
|
||||
|
||||
echo "Testing webpage accessibility..."
|
||||
@ -353,54 +370,50 @@ main() {
|
||||
echo "Version verification failed for Stirling-PDF-Security-Fat"
|
||||
fi
|
||||
|
||||
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-security.yml" down
|
||||
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-security.yml" down -v
|
||||
|
||||
# ==================================================================
|
||||
# 3. Regression test with login (test_cicd.yml)
|
||||
# ==================================================================
|
||||
run_tests "Stirling-PDF-Security-Fat-with-login" "./exampleYmlFiles/test_cicd.yml"
|
||||
|
||||
if [ $? -eq 0 ]; then
|
||||
# Create directory for file snapshots if it doesn't exist
|
||||
# Only run behave tests if the container started successfully
|
||||
if [[ " ${passed_tests[*]} " =~ "Stirling-PDF-Security-Fat-with-login" ]]; then
|
||||
|
||||
CONTAINER_NAME=$(docker-compose -f "./exampleYmlFiles/test_cicd.yml" ps --format '{{.Names}}' --filter "status=running" | head -n1)
|
||||
|
||||
SNAPSHOT_DIR="$PROJECT_ROOT/testing/file_snapshots"
|
||||
mkdir -p "$SNAPSHOT_DIR"
|
||||
|
||||
# Capture file list before running behave tests
|
||||
BEFORE_FILE="$SNAPSHOT_DIR/files_before_behave.txt"
|
||||
AFTER_FILE="$SNAPSHOT_DIR/files_after_behave.txt"
|
||||
DIFF_FILE="$SNAPSHOT_DIR/files_diff.txt"
|
||||
|
||||
# Define container name variable for consistency
|
||||
CONTAINER_NAME="Stirling-PDF-Security-Fat-with-login"
|
||||
|
||||
capture_file_list "$CONTAINER_NAME" "$BEFORE_FILE"
|
||||
|
||||
cd "testing/cucumber"
|
||||
if python -m behave; then
|
||||
# Wait 5 seconds before capturing the file list after tests
|
||||
echo "Waiting 5 seconds for any file operations to complete..."
|
||||
sleep 5
|
||||
|
||||
# Capture file list after running behave tests
|
||||
cd "$PROJECT_ROOT"
|
||||
capture_file_list "$CONTAINER_NAME" "$AFTER_FILE"
|
||||
|
||||
# Compare file lists
|
||||
if compare_file_lists "$BEFORE_FILE" "$AFTER_FILE" "$DIFF_FILE" "$CONTAINER_NAME"; then
|
||||
echo "No unexpected temporary files found."
|
||||
passed_tests+=("Stirling-PDF-Regression")
|
||||
passed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
|
||||
else
|
||||
echo "WARNING: Unexpected temporary files detected after behave tests!"
|
||||
failed_tests+=("Stirling-PDF-Regression-Temp-Files")
|
||||
fi
|
||||
|
||||
passed_tests+=("Stirling-PDF-Regression")
|
||||
passed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
|
||||
else
|
||||
failed_tests+=("Stirling-PDF-Regression")
|
||||
failed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
|
||||
echo "Printing docker logs of failed regression"
|
||||
docker logs "$CONTAINER_NAME"
|
||||
echo "Printed docker logs of failed regression"
|
||||
|
||||
# Still capture file list after failure for analysis
|
||||
# Wait 10 seconds before capturing the file list
|
||||
echo "Waiting 5 seconds before capturing file list..."
|
||||
echo "Waiting 10 seconds before capturing file list..."
|
||||
sleep 10
|
||||
|
||||
cd "$PROJECT_ROOT"
|
||||
@ -408,9 +421,11 @@ main() {
|
||||
compare_file_lists "$BEFORE_FILE" "$AFTER_FILE" "$DIFF_FILE" "$CONTAINER_NAME"
|
||||
fi
|
||||
fi
|
||||
docker-compose -f "./exampleYmlFiles/test_cicd.yml" down -v
|
||||
|
||||
docker-compose -f "./exampleYmlFiles/test_cicd.yml" down
|
||||
|
||||
# ==================================================================
|
||||
# 4. Disabled Endpoints Test
|
||||
# ==================================================================
|
||||
run_tests "Stirling-PDF-Fat-Disable-Endpoints" "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml"
|
||||
|
||||
echo "Testing disabled endpoints..."
|
||||
@ -430,27 +445,27 @@ main() {
|
||||
echo "Version verification failed for Stirling-PDF-Fat-Disable-Endpoints"
|
||||
fi
|
||||
|
||||
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml" down
|
||||
docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml" down -v
|
||||
|
||||
# Report results
|
||||
# ==================================================================
|
||||
# Final Report
|
||||
# ==================================================================
|
||||
echo "All tests completed in $SECONDS seconds."
|
||||
|
||||
|
||||
if [ ${#passed_tests[@]} -ne 0 ]; then
|
||||
echo "Passed tests:"
|
||||
for test in "${passed_tests[@]}"; do
|
||||
echo -e "\e[32m$test\e[0m"
|
||||
done
|
||||
fi
|
||||
for test in "${passed_tests[@]}"; do
|
||||
echo -e "\e[32m$test\e[0m" # Green color for passed tests
|
||||
done
|
||||
|
||||
if [ ${#failed_tests[@]} -ne 0 ]; then
|
||||
echo "Failed tests:"
|
||||
for test in "${failed_tests[@]}"; do
|
||||
echo -e "\e[31m$test\e[0m"
|
||||
done
|
||||
fi
|
||||
for test in "${failed_tests[@]}"; do
|
||||
echo -e "\e[31m$test\e[0m" # Red color for failed tests
|
||||
done
|
||||
|
||||
# Check if there are any failed tests and exit with an error code if so
|
||||
if [ ${#failed_tests[@]} -ne 0 ]; then
|
||||
echo "Some tests failed."
|
||||
exit 1
|
||||
|
||||
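The regression step above decides whether to run the behave tests with `[[ " ${passed_tests[*]} " =~ "Stirling-PDF-Security-Fat-with-login" ]]`, which is a substring match on the joined array. That works for the names used in this script. Should an exact-element check ever be needed, one possible sketch is shown below; `array_contains` is a hypothetical helper that `testing/test.sh` does not define, and the array contents are sample values.

```bash
#!/bin/bash
set -euo pipefail

# Return 0 if the first argument equals one of the remaining arguments.
array_contains() {
  local needle="$1"; shift
  local item
  for item in "$@"; do
    [ "$item" = "$needle" ] && return 0
  done
  return 1
}

# Sample data for illustration only.
passed_tests=("Stirling-PDF-Ultra-Lite" "Stirling-PDF-Security-Fat-with-login")

if array_contains "Stirling-PDF-Security-Fat-with-login" "${passed_tests[@]}"; then
  echo "login variant passed"
fi

# A shorter name that is only a prefix of an entry does not match.
if ! array_contains "Stirling-PDF-Security-Fat" "${passed_tests[@]}"; then
  echo "no exact entry for the plain fat variant"
fi
```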