From 886f9b379e8b7abb795d3dce2cea4f055da75fda Mon Sep 17 00:00:00 2001
From: Ludy
Date: Tue, 25 Nov 2025 00:07:54 +0100
Subject: [PATCH] feat(docker-runtime): unified Debian-based images, dynamic path resolution & enhanced UNO/LibreOffice handling (#4880)

# Description of Changes

### What was changed

This PR introduces a major refinement to the Docker runtime, system path resolution, conversion tooling, and integration logic across the codebase.

Key improvements include:

- Migration of **Dockerfile**, **Dockerfile.fat** to a unified Debian-based environment.
- Introduction of **RuntimePathConfig** enhancements to dynamically resolve:
  - `weasyprint`, `unoconvert`, `calibre`, `ocrmypdf`, `soffice`
  - Tesseract `tessdata` paths with Docker-aware defaults.
- Support for **UNO server (unoserver/unoconvert)** as primary document converter with automatic fallback to `soffice`.
- Isolation of Python environments for WeasyPrint and UNO tooling.
- Updated controllers and services to correctly inject `RuntimePathConfig`.
- Improved process execution logic in converters and OCR handling.
- Major updates to `init.sh` and `init-without-ocr.sh`:
  - Unified environment initialization
  - Proper UID/GID remapping
  - Safer permissions handling
  - Automatic Tesseract path detection
  - Reliable startup of headless LibreOffice + Xvfb + UNO server
- Full test suite updates:
  - Adaptation to new conversion paths
  - Mocking of UNO and LibreOffice commands
  - More robust Docker test logic
- Updated example docker-compose files referencing GHCR test images.
- Expanded configuration schema for new operations paths.

### Why the change was made

These changes address long-standing issues around:

- Inconsistent or missing binary paths between image variants.
- Reduced reliability of document conversions (UNO vs. soffice).
- Lack of uniform runtime initialization across Docker images.
- Repetitive environment setup logic split across multiple scripts.
- Fragile test scenarios tied to Alpine-based images.

Switching to a unified Debian-based runtime significantly improves:

- Compatibility with LibreOffice, Calibre, WebEngine and graphics stack.
- UNO stability for document conversions.
- Tesseract deterministic behavior.
- Debuggability and reliability of CI/CD Docker-based tests.

The improvements to `RuntimePathConfig` ensure all system binaries are fully configurable and correctly detected at runtime.

---

## Checklist

### General

- [x] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### Translations (if applicable)

- [ ] I ran [`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details.

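
### Illustration: UNO-first conversion with `soffice` fallback

As a rough illustration of the "UNO server as primary converter with automatic fallback to `soffice`" behavior described above, the idea can be sketched in shell. This is not the code from this PR: the actual logic lives in the Java changes and may differ. The function name `convert_to_pdf` and the variables `UNOCONVERT_BIN` and `SOFFICE_BIN` are hypothetical stand-ins for the paths that `RuntimePathConfig` resolves. The sketch also assumes a UNO server is already listening on its default port.

```shell
#!/usr/bin/env bash
# Sketch only: try the UNO client first, fall back to a one-off soffice run.
# UNOCONVERT_BIN / SOFFICE_BIN are hypothetical names, not taken from the PR.

convert_to_pdf() {
  local input="$1" outdir="$2"
  local uno="${UNOCONVERT_BIN:-unoconvert}"
  local soffice="${SOFFICE_BIN:-soffice}"
  local base
  base="$(basename "${input%.*}")"

  # Primary path: ask the already-running UNO server to do the conversion.
  # Tool output goes to stderr so stdout only carries the chosen converter.
  if "$uno" --convert-to pdf "$input" "$outdir/$base.pdf" >&2; then
    echo "unoconvert"
    return 0
  fi

  # Fallback: start a headless LibreOffice process for this single file.
  if "$soffice" --headless --convert-to pdf --outdir "$outdir" "$input" >&2; then
    echo "soffice"
    return 0
  fi

  echo "conversion failed for: $input" >&2
  return 1
}
```

A call such as `convert_to_pdf report.docx /tmp/out` would print which converter produced the result, which is handy when checking whether the fallback was taken.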
--- .github/config/.files.yaml | 29 +- .github/workflows/build.yml | 267 ++++++++++++++--- Dockerfile | 212 +++++++++----- Dockerfile.fat | 271 ++++++++++++------ Dockerfile.ultra-lite | 2 +- .../configuration/RuntimePathConfig.java | 39 ++- .../common/model/ApplicationProperties.java | 10 +- .../common/service/PostHogService.java | 9 +- .../software/common/util/PDFToFile.java | 105 ++++++- .../software/common/util/PDFToFileTest.java | 124 +++++++- .../SPDF/config/ExternalAppDepConfig.java | 8 +- .../converters/ConvertOfficeController.java | 8 +- .../api/converters/ConvertPDFToHtml.java | 4 +- .../api/converters/ConvertPDFToOffice.java | 10 +- .../api/converters/ConvertPDFToPDFA.java | 48 ++-- .../controller/api/misc/OCRController.java | 21 +- .../controller/web/OtherWebController.java | 6 +- .../src/main/resources/settings.yml.template | 4 +- .../SPDF/config/ExternalAppDepConfigTest.java | 4 + ...-compose-latest-fat-endpoints-disabled.yml | 4 +- .../docker-compose-latest-fat-security.yml | 3 +- ...ocker-compose-latest-security-with-sso.yml | 7 +- .../docker-compose-latest-security.yml | 3 +- ...ker-compose-latest-ultra-lite-security.yml | 3 +- .../docker-compose-latest-ultra-lite.yml | 3 +- exampleYmlFiles/docker-compose-latest.yml | 3 +- exampleYmlFiles/test_cicd.yml | 3 +- scripts/init-without-ocr.sh | 204 +++++++++++-- scripts/init.sh | 120 ++++++-- testing/allEndpointsRemovedSettings.yml | 3 + testing/test.sh | 195 +++++++------ 31 files changed, 1292 insertions(+), 440 deletions(-) diff --git a/.github/config/.files.yaml b/.github/config/.files.yaml index d120123d7..f866770b5 100644 --- a/.github/config/.files.yaml +++ b/.github/config/.files.yaml @@ -6,22 +6,27 @@ app: &app - app/(common|core|proprietary)/src/main/java/** openapi: &openapi - - build.gradle - - app/(common|core|proprietary)/build.gradle - - app/(common|core|proprietary)/src/main/java/** + - *build + - *app -project: &project - - app/(common|core|proprietary)/src/(main|test)/java/** - - 
app/(common|core|proprietary)/build.gradle - - 'app/(common|core|proprietary)/src/(main|test)/resources/**/!(messages_*.properties|*.md)*' - - exampleYmlFiles/** - - gradle/** - - libs/** - - 'testing/**/!(requirements*.txt|requirements*.in)*' - - build.gradle +docker: &docker - Dockerfile - Dockerfile.fat - Dockerfile.ultra-lite + - ".github/workflows/build.yml" + - scripts/init.sh + - scripts/init-without-ocr.sh + - exampleYmlFiles/** + +project: &project + - app/(common|core|proprietary)/src/(main|test)/java/** + - *build + - "app/(common|core|proprietary)/src/(main|test)/resources/**/!(messages_*.properties|*.md)*" + - exampleYmlFiles/** + - gradle/** + - libs/** + - "testing/**/!(requirements*.txt|requirements*.in)*" + - *docker - gradle.properties - gradlew - gradlew.bat diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index da5d91131..7c524f1bd 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -33,6 +33,7 @@ jobs: app: ${{ steps.changes.outputs.app }} project: ${{ steps.changes.outputs.project }} openapi: ${{ steps.changes.outputs.openapi }} + docker: ${{ steps.changes.outputs.docker }} steps: - uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1 @@ -68,14 +69,10 @@ jobs: with: java-version: ${{ matrix.jdk-version }} distribution: "temurin" - - - name: Setup Gradle - uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0 - with: - gradle-version: 8.14 + cache: gradle - name: Build with Gradle and spring security ${{ matrix.spring-security }} - run: ./gradlew clean build + run: ./gradlew clean build -x spotlessApply -x spotlessCheck -x sonarqube env: DISABLE_ADDITIONAL_FEATURES: ${{ matrix.spring-security }} @@ -100,12 +97,14 @@ jobs: if [ ${#missing_reports[@]} -gt 0 ]; then echo "ERROR: The following required test report directories are missing:" printf '%s\n' "${missing_reports[@]}" - exit 1 + echo "reports-present=false" >> "$GITHUB_OUTPUT" + else + echo 
"All required test report directories are present" + echo "reports-present=true" >> "$GITHUB_OUTPUT" fi - echo "All required test report directories are present" - name: Upload Test Reports - if: always() + if: always() && steps.check-reports.outputs.reports-present == 'true' uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 with: name: test-reports-jdk-${{ matrix.jdk-version }}-spring-security-${{ matrix.spring-security }} @@ -127,6 +126,7 @@ jobs: if-no-files-found: warn - name: Add coverage to PR with spring security ${{ matrix.spring-security }} and JDK ${{ matrix.jdk-version }} + if: steps.check-reports.outputs.reports-present == 'true' id: jacoco uses: madrapps/jacoco-report@50d3aff4548aa991e6753342d9ba291084e63848 # v1.7.2 with: @@ -155,15 +155,13 @@ jobs: with: java-version: "17" distribution: "temurin" - - - name: Setup Gradle - uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0 + cache: gradle - name: Generate OpenAPI documentation run: ./gradlew :stirling-pdf:generateOpenApiDocs env: DISABLE_ADDITIONAL_FEATURES: true - + - name: Upload OpenAPI Documentation uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 with: @@ -188,6 +186,7 @@ jobs: with: java-version: "17" distribution: "temurin" + cache: gradle - name: Check licenses for compatibility run: ./gradlew clean checkLicense @@ -205,8 +204,14 @@ jobs: retention-days: 3 docker-compose-tests: - if: needs.files-changed.outputs.project == 'true' - needs: files-changed + if: | + needs.files-changed.outputs.project == 'true' && + ( + needs.files-changed.outputs.docker != 'true' || + needs.test-build-docker-images.result == 'success' || + needs.test-build-docker-images.result == 'skipped' + ) + needs: [files-changed, test-build-docker-images] # if: github.event_name == 'push' && github.ref == 'refs/heads/main' || # (github.event_name == 'pull_request' && # contains(github.event.pull_request.labels.*.name, 'licenses') == 
false && @@ -237,20 +242,21 @@ jobs: with: java-version: "17" distribution: "temurin" + cache: gradle - name: Set up Docker Buildx uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1 - name: Install Docker Compose run: | - sudo curl -SL "https://github.com/docker/compose/releases/download/v2.37.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose + sudo curl -SL "https://github.com/docker/compose/releases/download/v2.40.3/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose sudo chmod +x /usr/local/bin/docker-compose - name: Set up Python uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0 with: python-version: "3.12" - cache: 'pip' # caching pip dependencies + cache: "pip" # caching pip dependencies cache-dependency-path: ./testing/cucumber/requirements.txt - name: Pip requirements @@ -265,13 +271,22 @@ jobs: ./testing/test.sh test-build-docker-images: - if: github.event_name == 'pull_request' && needs.files-changed.outputs.project == 'true' + if: github.event_name == 'pull_request' && needs.files-changed.outputs.docker == 'true' needs: [files-changed, build] runs-on: ubuntu-latest + permissions: + contents: read + packages: write strategy: fail-fast: false matrix: - docker-rev: ["Dockerfile", "Dockerfile.ultra-lite", "Dockerfile.fat"] + docker: + - name: "Dockerfile.ultra-lite" + tag: "ultra-lite" + - name: "Dockerfile.fat" + tag: "fat" + - name: "Dockerfile" + tag: "latest" steps: - name: Harden Runner uses: step-security/harden-runner@95d9a5deda9de15063e7595e9719c11c38c90ae2 # v2.13.2 @@ -286,46 +301,220 @@ jobs: with: java-version: "17" distribution: "temurin" - - - name: Set up Gradle - uses: gradle/actions/setup-gradle@4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2 # v5.0.0 - with: - gradle-version: 8.14 + cache: gradle - name: Build application - run: ./gradlew clean build + run: ./gradlew clean build -x spotlessApply -x spotlessCheck -x test -x sonarqube env: 
DISABLE_ADDITIONAL_FEATURES: true STIRLING_PDF_DESKTOP_UI: false + # - name: Free disk space on runner + # run: | + # echo "Disk space before cleanup:" && df -h + # sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /usr/local/share/boost + # docker system prune -af || true + # echo "Disk space after cleanup:" && df -h + - name: Set up QEMU uses: docker/setup-qemu-action@c7c53464625b32c7a7e944ae62b3e17d2b600130 # v3.7.0 + with: + platforms: linux/amd64,linux/arm64/v8 - name: Set up Docker Buildx id: buildx uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 # v3.11.1 + with: + platforms: linux/amd64,linux/arm64/v8 - - name: Build ${{ matrix.docker-rev }} + - name: Prepare branch tag + id: branch_tag + shell: bash + run: | + BRANCH_SOURCE="${GITHUB_HEAD_REF:-${GITHUB_REF_NAME}}" + BRANCH_LOWER=$(echo "$BRANCH_SOURCE" | tr '[:upper:]' '[:lower:]') + SAFE_BRANCH=$(echo "$BRANCH_LOWER" | sed 's/[^a-z0-9_.-]/-/g' | sed 's/^-\+//' | sed 's/-\+$//' | sed 's/--\+/-/g') + if [ -z "$SAFE_BRANCH" ]; then + SAFE_BRANCH="branch" + fi + SHORT_SHA=$(echo "${GITHUB_SHA:-${{ github.sha }}}" | cut -c1-8) + echo "safe_branch=$SAFE_BRANCH" >> "$GITHUB_OUTPUT" + echo "short_sha=$SHORT_SHA" >> "$GITHUB_OUTPUT" + + - name: Convert repository owner to lowercase + id: repoowner + run: echo "lowercase=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT + + - name: Docker meta + id: meta + uses: docker/metadata-action@c1e51972afc2121e065aed6d45c65596fe445f3f # v5.8.0 + with: + images: | + # ${{ secrets.DOCKER_HUB_USERNAME }}/stirling-pdf-test + ghcr.io/${{ steps.repoowner.outputs.lowercase }}/stirling-pdf-test + flavor: | + latest=false + tags: | + type=raw,value=${{ matrix.docker.tag }},enable=true + # type=raw,value=${{ matrix.docker.tag }}-${{ steps.branch_tag.outputs.safe_branch }},enable=true + # type=raw,value=${{ matrix.docker.tag }}-${{ steps.branch_tag.outputs.safe_branch }}-${{ steps.branch_tag.outputs.short_sha 
}},enable=true + labels: | + org.opencontainers.image.title=Stirling-PDF Test + org.opencontainers.image.description=CI test image for Stirling-PDF + org.opencontainers.image.url=https://www.stirlingpdf.com + org.opencontainers.image.documentation=https://docs.stirlingpdf.com + org.opencontainers.image.authors=Stirling-Tools + org.opencontainers.image.licenses=MIT + org.opencontainers.image.version=${{ matrix.docker.tag }} + org.opencontainers.image.revision=${{ github.sha }} + org.opencontainers.image.source=${{ github.repository }} + maintainer=Stirling-Tools + + - name: Choose primary tag for tests + id: testtag + shell: bash + run: | + IMAGE="ghcr.io/${{ steps.repoowner.outputs.lowercase }}/stirling-pdf-test" + VARIANT="${{ matrix.docker.tag }}" + BRANCH="${{ steps.branch_tag.outputs.safe_branch }}" + SHA_SHORT="${{ steps.branch_tag.outputs.short_sha }}" + CANDIDATE="$IMAGE:$VARIANT-$BRANCH-$SHA_SHORT" + SECONDARY="$IMAGE:$VARIANT-$BRANCH" + ALL_TAGS="$(echo '${{ steps.meta.outputs.tags }}' | tr ' ' '\n')" + if echo "$ALL_TAGS" | grep -qx "$CANDIDATE"; then + SELECTED="$CANDIDATE" + elif echo "$ALL_TAGS" | grep -qx "$SECONDARY"; then + SELECTED="$SECONDARY" + else + SELECTED="$(echo "$ALL_TAGS" | head -n1)" + fi + echo "tag=$SELECTED" >> $GITHUB_OUTPUT + echo "Using test tag: $SELECTED" + + # - name: Log in to Docker Hub + # uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0 + # with: + # username: ${{ secrets.DOCKER_HUB_USERNAME }} + # password: ${{ secrets.DOCKER_HUB_API }} + + # - name: Log in to GitHub Container Registry + # uses: docker/login-action@184bdaa0721073962dff0199f1fb9940f07167d1 # v3.5.0 + # with: + # registry: ghcr.io + # username: ${{ github.actor }} + # password: ${{ github.token }} + + - name: Build and push amd64 image uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 with: builder: ${{ steps.buildx.outputs.name }} context: . 
- file: ./${{ matrix.docker-rev }} + file: ./${{ matrix.docker.name }} push: false + load: true cache-from: type=gha cache-to: type=gha,mode=max - platforms: linux/amd64,linux/arm64/v8 - provenance: true - sbom: true + tags: ${{ steps.meta.outputs.tags }} # ALLE Tags publishen + labels: ${{ steps.meta.outputs.labels }} + platforms: linux/amd64 + provenance: false + sbom: false - - name: Upload Reports + - name: Show amd64 image size + run: | + IMAGE_TAG="${{ steps.testtag.outputs.tag }}" + echo "Inspecting image: ${IMAGE_TAG}" + SIZE=$(docker image inspect "${IMAGE_TAG}" --format='{{.Size}}') + FORMATTED=$(numfmt --to=iec --suffix=B "${SIZE}") + echo "Image size (amd64): ${FORMATTED}" + + - name: Start amd64 image for 2 minutes + run: | + IMAGE_TAG="${{ steps.testtag.outputs.tag }}" + CONTAINER_NAME="stirling-pdf-test-${{ matrix.docker.tag }}-amd64" + echo "Starting container ${CONTAINER_NAME} from ${IMAGE_TAG}" + docker run -d --name "${CONTAINER_NAME}" "${IMAGE_TAG}" + echo "Waiting up to 2 minutes..." + sleep 120 || true + echo "===== Logs for ${CONTAINER_NAME} =====" + docker logs "${CONTAINER_NAME}" || true + echo "Stopping container ${CONTAINER_NAME} after 2 minutes" + docker stop "${CONTAINER_NAME}" || true + docker rm "${CONTAINER_NAME}" || true + + - name: Prune amd64 image and cache if: always() - uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 + run: | + docker image rm -f ${{ steps.testtag.outputs.tag }} || true + docker builder prune --force || true + + - name: Build and push arm64 image + uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 with: - name: reports-docker-${{ matrix.docker-rev }} - path: | - build/reports/tests/ - build/test-results/ - build/reports/problems/ - retention-days: 3 - if-no-files-found: warn + builder: ${{ steps.buildx.outputs.name }} + context: . 
+ file: ./${{ matrix.docker.name }} + push: false + load: true + cache-from: type=gha + cache-to: type=gha,mode=max + tags: ${{ steps.meta.outputs.tags }} # ALLE Tags publishen + labels: ${{ steps.meta.outputs.labels }} + platforms: linux/arm64/v8 + provenance: false + sbom: false + + - name: Show arm64 image size + run: | + IMAGE_TAG="${{ steps.testtag.outputs.tag }}" + echo "Inspecting image: ${IMAGE_TAG}" + SIZE=$(docker image inspect "${IMAGE_TAG}" --format='{{.Size}}') + FORMATTED=$(numfmt --to=iec --suffix=B "${SIZE}") + echo "Image size (arm64): ${FORMATTED}" + + - name: Start arm64 image for 2 minutes + run: | + IMAGE_TAG="${{ steps.testtag.outputs.tag }}" + CONTAINER_NAME="stirling-pdf-test-${{ matrix.docker.tag }}-arm64" + echo "Starting container ${CONTAINER_NAME} from ${IMAGE_TAG}" + docker run -d --name "${CONTAINER_NAME}" "${IMAGE_TAG}" + echo "Waiting up to 2 minutes..." + sleep 120 || true + echo "===== Logs for ${CONTAINER_NAME} =====" + docker logs "${CONTAINER_NAME}" || true + echo "Stopping container ${CONTAINER_NAME} after 2 minutes" + docker stop "${CONTAINER_NAME}" || true + docker rm "${CONTAINER_NAME}" || true + + - name: Cleanup arm64 image and cache + if: always() + run: | + docker image rm -f ${{ steps.testtag.outputs.tag }} || true + docker builder prune --force || true + + # - name: Build and push multi-arch image + # uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0 + # with: + # builder: ${{ steps.buildx.outputs.name }} + # context: . 
+ # file: ./${{ matrix.docker.name }} + # push: true + # cache-from: type=gha + # cache-to: type=gha,mode=max + # tags: ${{ steps.meta.outputs.tags }} + # labels: ${{ steps.meta.outputs.labels }} + # platforms: linux/amd64,linux/arm64/v8 + # provenance: false + # sbom: false + + # - name: Upload Docker build reports + # if: always() + # uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0 + # with: + # name: reports-docker-${{ matrix.docker.name }} + # path: | + # build/reports/ + # build/test-results/ + # build/reports/problems/ + # retention-days: 3 + # if-no-files-found: warn diff --git a/Dockerfile b/Dockerfile index bcb62ed58..14ac55099 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,11 +1,88 @@ -# Main stage -FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412 +# ============================================================================== +# Multi-stage Dockerfile for Stirling-PDF – image with everything included +# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc. 
+# ============================================================================== -# Copy necessary files -COPY scripts /scripts -COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/ +# ======================================== +# STAGE 1: Runtime image based on Debian stable-slim +# Contains Java runtime + LibreOffice + Calibre + all PDF tools +# ======================================== +FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d + +SHELL ["/bin/bash", "-o", "pipefail", "-c"] +ENV DEBIAN_FRONTEND=noninteractive + +ENV TESS_BASE_PATH=/usr/share/tesseract-ocr/5/tessdata + +# Install core runtime dependencies + tools required by Stirling-PDF features +RUN apt-get update && apt-get install -y --no-install-recommends \ + ca-certificates tzdata tini bash fontconfig \ + openjdk-21-jre-headless \ + ffmpeg poppler-utils ocrmypdf \ + libreoffice-nogui libreoffice-java-common \ + python3 python3-venv python3-uno \ + tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \ + tesseract-ocr-por tesseract-ocr-chi-sim \ + libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \ + gosu unpaper \ + # AWT headless support (required for some Java graphics operations) + libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \ + libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \ + # Qt WebEngine dependencies for Calibre + libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \ + libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \ + libxcb-cursor0 libdbus-1-3 libglib2.0-0 \ + # Virtual framebuffer (required for headless LibreOffice) + xvfb x11-utils coreutils \ + # Temporary packages only needed for Calibre installer + xz-utils gpgv curl xdg-utils \ + \ + # Install Calibre from official installer script + && curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \ + \ + # Clean up 
installer-only packages + && apt-get purge -y xz-utils gpgv xdg-utils \ + && apt-get autoremove -y \ + && rm -rf /var/lib/apt/lists/* + +# Make ebook-convert available in PATH +RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \ + && /opt/calibre/ebook-convert --version + +# ============================================================================== +# Create non-root user (stirlingpdfuser) with configurable UID/GID +# ============================================================================== +ARG PUID=1000 +ARG PGID=1000 + +RUN set -eux; \ + # Create group if it doesn't exist + if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \ + if getent group "${PGID}" >/dev/null 2>&1; then \ + groupadd -o -g "${PGID}" stirlingpdfgroup; \ + else \ + groupadd -g "${PGID}" stirlingpdfgroup; \ + fi; \ + fi; \ + # Create user if it doesn't exist, avoid UID conflicts + if ! id -u stirlingpdfuser >/dev/null 2>&1; then \ + if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \ + echo "UID ${PUID} already in use – creating stirlingpdfuser with automatic UID"; \ + useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \ + else \ + useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \ + fi; \ + fi + +# Compatibility alias for older entrypoint scripts expecting su-exec +RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec + +# Copy application files from build stage +COPY scripts/ /scripts/ +COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/ COPY app/core/build/libs/*.jar app.jar +# Optional version tag (can be passed at build time) ARG VERSION_TAG LABEL org.opencontainers.image.title="Stirling-PDF" @@ -20,91 +97,68 @@ LABEL org.opencontainers.image.authors="Stirling-Tools" LABEL org.opencontainers.image.version="${VERSION_TAG}" LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark" -# Set 
Environment Variables +# ============================================================================== +# Runtime environment variables +# ============================================================================== ENV DISABLE_ADDITIONAL_FEATURES=true \ - VERSION_TAG=$VERSION_TAG \ - JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \ + JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \ + -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \ + -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \ + -Djava.awt.headless=true" \ JAVA_CUSTOM_OPTS="" \ HOME=/home/stirlingpdfuser \ - PUID=1000 \ - PGID=1000 \ + PUID=${PUID} \ + PGID=${PGID} \ UMASK=022 \ - PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \ UNO_PATH=/usr/lib/libreoffice/program \ - URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \ - PATH=$PATH:/opt/venv/bin \ STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \ TMPDIR=/tmp/stirling-pdf \ TEMP=/tmp/stirling-pdf \ TMP=/tmp/stirling-pdf -# JDK for app -RUN apk add --no-cache bash \ - && ln -sf /bin/bash /bin/sh \ - && printf '%s\n' \ - 'https://dl-cdn.alpinelinux.org/alpine/edge/main' \ - 'https://dl-cdn.alpinelinux.org/alpine/edge/community' \ - 'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \ - > /etc/apk/repositories && \ - apk upgrade --no-cache -a && \ - apk add --no-cache \ - ca-certificates \ - tzdata \ - tini \ - bash \ - curl \ - shadow \ - su-exec \ - openssl \ - openssl-dev \ - openjdk21-jre \ - ffmpeg \ - # Doc conversion - gcompat \ - libc6-compat \ - libreoffice \ - # pdftohtml - poppler-utils \ - # OCR MY PDF (unpaper for descew and other advanced features) - tesseract-ocr-data-eng \ - tesseract-ocr-data-chi_sim 
\ - tesseract-ocr-data-deu \ - tesseract-ocr-data-fra \ - tesseract-ocr-data-por \ - unpaper \ - # CV / Python - py3-opencv \ - python3 \ - ocrmypdf \ - py3-pip \ - py3-pillow \ - py3-pdf2image \ - # Calibre - calibre \ - # URW Base 35 fonts for better PDF rendering - font-urw-base35 && \ - # Calibre fixes - apk fix --no-cache calibre && \ - python3 -m venv /opt/venv && \ - /opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \ - /opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \ - ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \ - ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \ - ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \ - mv /usr/share/tessdata /usr/share/tessdata-original && \ - mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \ - # Configure URW Base 35 fonts - ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \ - fc-cache -f -v && \ - chmod +x /scripts/* && \ - # User permissions - addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \ - chown -R stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \ - chown stirlingpdfuser:stirlingpdfgroup /app.jar && \ - ln -sf /bin/busybox /bin/sh +# ============================================================================== +# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.) 
+# ============================================================================== +RUN python3 -m venv /opt/venv --system-site-packages \ + && /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \ + && /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)" +# Separate venv for unoserver (keeps it isolated) +RUN python3 -m venv /opt/unoserver-venv --system-site-packages \ + && /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver + +# Make unoserver tools available in main venv PATH +RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \ + && ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver + +# Extend PATH to include both virtual environments +ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}" + +# ============================================================================== +# Final permissions, directories and font cache +# ============================================================================== +RUN set -eux; \ + chmod +x /scripts/*; \ + mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \ + chown -R stirlingpdfuser:stirlingpdfgroup \ + /home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \ + /app.jar /usr/share/fonts/truetype /scripts; \ + chmod -R 755 /tmp/stirling-pdf + +# Rebuild font cache +RUN fc-cache -f -v + +# Force Qt/WebEngine to run headlessly (required for Calibre in Docker) +ENV QT_QPA_PLATFORM=offscreen \ + QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage" + +# Expose web UI port EXPOSE 8080/tcp -# Set user and run command +STOPSIGNAL SIGTERM + +# Use tini as init (handles signals and zombies correctly) ENTRYPOINT ["tini", "--", "/scripts/init.sh"] -CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"] + +# CMD is empty – actual start 
command is defined in init.sh +CMD [] diff --git a/Dockerfile.fat b/Dockerfile.fat index 5609ffd20..b260e9421 100644 --- a/Dockerfile.fat +++ b/Dockerfile.fat @@ -1,122 +1,209 @@ -# Build the application -FROM gradle:8.14-jdk21 AS build +# ============================================================================== +# Multi-stage Dockerfile for Stirling-PDF – "fat" image with everything included +# Includes: LibreOffice, Calibre, Tesseract, OCRmyPDF, unoserver, WeasyPrint, etc. +# ============================================================================== -COPY build.gradle . -COPY settings.gradle . -COPY gradlew . -COPY gradle gradle/ +# ======================================== +# STAGE 1: Build Stirling-PDF with Gradle (Alpine) +# ======================================== +FROM eclipse-temurin:21-jdk-alpine@sha256:c4799f335a65b1ecca8a31239b05522f2b0a184d6818f6349e83484ee6956198 AS build + +# Install build tools +RUN apk add --no-cache bash unzip curl git + +WORKDIR /workspace + +# Copy Gradle wrapper and configuration files +COPY build.gradle settings.gradle gradlew ./ +COPY gradle ./gradle/ + +# Make gradlew executable +RUN chmod +x gradlew + +# Create module directories and copy module build files (for Gradle layer caching) +RUN mkdir -p core common proprietary COPY app/core/build.gradle core/. COPY app/common/build.gradle common/. COPY app/proprietary/build.gradle proprietary/. -RUN ./gradlew build -x spotlessApply -x spotlessCheck -x test -x sonarqube || return 0 -# Set the working directory +# Warm-up Gradle dependency cache (optional but improves subsequent builds) +RUN ./gradlew --no-daemon printVersion --quiet | tail -1 > /tmp/version_tag || true +RUN ./gradlew --no-daemon build -x spotlessApply -x spotlessCheck -x test -x sonarqube || true + +# Switch to final source directory and copy full source code WORKDIR /app - -# Copy the entire project to the working directory COPY . . 
-# Build the application with DISABLE_ADDITIONAL_FEATURES=false +# Environment variables (can be overridden at build time) ENV DISABLE_ADDITIONAL_FEATURES=false \ STIRLING_PDF_DESKTOP_UI=false -RUN ./gradlew clean build -x spotlessApply -x spotlessCheck -x test -x sonarqube -# Main stage -FROM alpine:3.22.2@sha256:4b7ce07002c69e8f3d704a9c5d6fd3053be500b7f1c69fc0d80990c2ad8dd412 +# Final build – produce the fat JAR +RUN ./gradlew --no-daemon clean build \ + -x spotlessApply -x spotlessCheck -x test -x sonarqube \ + && apk del bash unzip curl git -# Copy necessary files -COPY scripts /scripts -COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/opentype/noto/ -# first /app directory is for the build stage, second is for the final image -COPY --from=build /app/app/core/build/libs/*.jar app.jar +# ======================================== +# STAGE 2: Runtime image based on Debian stable-slim +# Contains Java runtime + LibreOffice + Calibre + all PDF tools +# ======================================== +FROM debian:stable-slim@sha256:7cb087f19bcc175b96fbe4c2aef42ed00733a659581a80f6ebccfd8fe3185a3d + +SHELL ["/bin/bash", "-o", "pipefail", "-c"] +ENV DEBIAN_FRONTEND=noninteractive + +# Install core runtime dependencies + tools required by Stirling-PDF features +RUN apt-get update && apt-get install -y --no-install-recommends \ + ca-certificates tzdata tini bash fontconfig \ + openjdk-21-jre-headless \ + ffmpeg poppler-utils qpdf ghostscript ocrmypdf \ + libreoffice-nogui libreoffice-java-common \ + python3 python3-venv python3-uno \ + tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \ + tesseract-ocr-por tesseract-ocr-chi-sim \ + libcairo2 libpango-1.0-0 libpangoft2-1.0-0 libgdk-pixbuf-2.0-0 \ + gosu unpaper \ + # AWT headless support (required for some Java graphics operations) + libfreetype6 libfontconfig1 libx11-6 libxt6 libxext6 libxrender1 libxtst6 libxi6 \ + libxinerama1 libxkbcommon0 libxkbfile1 libsm6 libice6 \ + # Qt WebEngine 
dependencies for Calibre + libegl1 libopengl0 libgl1 libxdamage1 libxfixes3 libxshmfence1 libdrm2 libgbm1 \ + libxkbcommon-x11-0 libxrandr2 libxcomposite1 libnss3 libx11-xcb1 \ + libxcb-cursor0 libdbus-1-3 libglib2.0-0 \ + # Virtual framebuffer (required for headless LibreOffice) + xvfb x11-utils coreutils \ + # Temporary packages only needed for Calibre installer + xz-utils gpgv curl xdg-utils \ + \ + # Install Calibre from official installer script + && curl -fsSL https://download.calibre-ebook.com/linux-installer.sh | sh /dev/stdin \ + \ + # Clean up installer-only packages + && apt-get purge -y xz-utils gpgv xdg-utils \ + && apt-get autoremove -y \ + && rm -rf /var/lib/apt/lists/* + +# Make ebook-convert available in PATH +RUN ln -sf /opt/calibre/ebook-convert /usr/bin/ebook-convert \ + && /opt/calibre/ebook-convert --version + +# ============================================================================== +# Create non-root user (stirlingpdfuser) with configurable UID/GID +# ============================================================================== +ARG PUID=1000 +ARG PGID=1000 + +RUN set -eux; \ + # Create group if it doesn't exist + if ! getent group stirlingpdfgroup >/dev/null 2>&1; then \ + if getent group "${PGID}" >/dev/null 2>&1; then \ + groupadd -o -g "${PGID}" stirlingpdfgroup; \ + else \ + groupadd -g "${PGID}" stirlingpdfgroup; \ + fi; \ + fi; \ + # Create user if it doesn't exist, avoid UID conflicts + if ! 
id -u stirlingpdfuser >/dev/null 2>&1; then \ + if getent passwd | awk -F: -v id="${PUID}" '$3==id{found=1} END{exit !found}'; then \ + echo "UID ${PUID} already in use – creating stirlingpdfuser with automatic UID"; \ + useradd -m -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \ + else \ + useradd -m -u "${PUID}" -g stirlingpdfgroup -d /home/stirlingpdfuser -s /bin/bash stirlingpdfuser; \ + fi; \ + fi + +# Compatibility alias for older entrypoint scripts expecting su-exec +RUN ln -sf /usr/sbin/gosu /usr/local/bin/su-exec + +# Copy application files from build stage +COPY scripts/ /scripts/ +COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/ +COPY --from=build /app/app/core/build/libs/*.jar /app.jar + +# Copy version tag generated during build +COPY --from=build /tmp/version_tag /etc/stirling_version + +# Optional version tag (can be passed at build time) ARG VERSION_TAG -# Set Environment Variables +# Metadata labels +LABEL org.opencontainers.image.title="Stirling-PDF" +LABEL org.opencontainers.image.description="A powerful locally hosted web-based PDF manipulation tool supporting 50+ operations including merging, splitting, conversion, OCR, watermarking, and more." 
+LABEL org.opencontainers.image.source="https://github.com/Stirling-Tools/Stirling-PDF" +LABEL org.opencontainers.image.licenses="MIT" +LABEL org.opencontainers.image.vendor="Stirling-Tools" +LABEL org.opencontainers.image.url="https://www.stirlingpdf.com" +LABEL org.opencontainers.image.documentation="https://docs.stirlingpdf.com" +LABEL maintainer="Stirling-Tools" +LABEL org.opencontainers.image.authors="Stirling-Tools" +LABEL org.opencontainers.image.version="${VERSION_TAG}" +LABEL org.opencontainers.image.keywords="PDF, manipulation, merge, split, convert, OCR, watermark" + +# ============================================================================== +# Runtime environment variables +# ============================================================================== ENV DISABLE_ADDITIONAL_FEATURES=true \ - VERSION_TAG=$VERSION_TAG \ - JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70" \ + JAVA_BASE_OPTS="-XX:+UnlockExperimentalVMOptions -XX:MaxRAMPercentage=75 -XX:InitiatingHeapOccupancyPercent=20 \ + -XX:+G1PeriodicGCInvokesConcurrent -XX:G1PeriodicGCInterval=10000 \ + -XX:+UseStringDeduplication -XX:G1PeriodicGCSystemLoadThreshold=70 \ + -Djava.awt.headless=true" \ JAVA_CUSTOM_OPTS="" \ HOME=/home/stirlingpdfuser \ - PUID=1000 \ - PGID=1000 \ + PUID=${PUID} \ + PGID=${PGID} \ UMASK=022 \ FAT_DOCKER=true \ INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false \ - PYTHONPATH=/usr/lib/libreoffice/program:/opt/venv/lib/python3.12/site-packages \ UNO_PATH=/usr/lib/libreoffice/program \ - URE_BOOTSTRAP=file:///usr/lib/libreoffice/program/fundamentalrc \ - PATH=$PATH:/opt/venv/bin \ STIRLING_TEMPFILES_DIRECTORY=/tmp/stirling-pdf \ TMPDIR=/tmp/stirling-pdf \ TEMP=/tmp/stirling-pdf \ TMP=/tmp/stirling-pdf -# JDK for app -RUN apk add --no-cache bash \ - && ln -sf /bin/bash /bin/sh \ - && 
printf '%s\n' \ - 'https://dl-cdn.alpinelinux.org/alpine/edge/main' \ - 'https://dl-cdn.alpinelinux.org/alpine/edge/community' \ - 'https://dl-cdn.alpinelinux.org/alpine/edge/testing' \ - > /etc/apk/repositories && \ - apk upgrade --no-cache -a && \ - apk add --no-cache \ - ca-certificates \ - tzdata \ - tini \ - bash \ - curl \ - shadow \ - su-exec \ - openssl \ - openssl-dev \ - openjdk21-jre \ - ffmpeg \ - # Doc conversion - gcompat \ - libc6-compat \ - libreoffice \ - # pdftohtml - poppler-utils \ - # OCR MY PDF (unpaper for descew and other advanced featues) - tesseract-ocr-data-eng \ - tesseract-ocr-data-chi_sim \ - tesseract-ocr-data-deu \ - tesseract-ocr-data-fra \ - tesseract-ocr-data-por \ - unpaper \ - font-terminus font-dejavu font-noto font-noto-cjk font-awesome font-noto-extra font-liberation font-linux-libertine font-urw-base35 \ - # CV / Python - py3-opencv \ - python3 \ - ocrmypdf \ - py3-pip \ - py3-pillow \ - py3-pdf2image \ - # Calibre (musl-native) + QtWebEngine Runtime - calibre && \ - # Calibre fixes - apk fix --no-cache calibre && \ - python3 -m venv /opt/venv && \ - /opt/venv/bin/pip install --no-cache-dir --upgrade pip setuptools && \ - /opt/venv/bin/pip install --no-cache-dir --upgrade unoserver weasyprint && \ - ln -s /usr/lib/libreoffice/program/uno.py /opt/venv/lib/python3.12/site-packages/ && \ - ln -s /usr/lib/libreoffice/program/unohelper.py /opt/venv/lib/python3.12/site-packages/ && \ - ln -s /usr/lib/libreoffice/program /opt/venv/lib/python3.12/site-packages/LibreOffice && \ - mv /usr/share/tessdata /usr/share/tessdata-original && \ - mkdir -p $HOME /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf && \ - # Configure URW Base 35 fonts - ln -s /usr/share/fontconfig/conf.avail/69-urw-*.conf /etc/fonts/conf.d/ && \ - fc-cache -f -v && \ - chmod +x /scripts/* && \ - # User permissions - addgroup -S stirlingpdfgroup && adduser -S stirlingpdfuser -G stirlingpdfgroup && \ - chown -R 
stirlingpdfuser:stirlingpdfgroup $HOME /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf && \ - chown stirlingpdfuser:stirlingpdfgroup /app.jar && \ - ln -sf /bin/busybox /bin/sh +# ============================================================================== +# Python virtual environment for additional Python tools (WeasyPrint, OpenCV, etc.) +# ============================================================================== +RUN python3 -m venv /opt/venv --system-site-packages \ + && /opt/venv/bin/pip install --no-cache-dir weasyprint pdf2image opencv-python-headless \ + && /opt/venv/bin/python -c "import cv2; print('OpenCV version:', cv2.__version__)" +# Separate venv for unoserver (keeps it isolated) +RUN python3 -m venv /opt/unoserver-venv --system-site-packages \ + && /opt/unoserver-venv/bin/pip install --no-cache-dir unoserver + +# Make unoserver tools available in main venv PATH +RUN ln -sf /opt/unoserver-venv/bin/unoconvert /opt/venv/bin/unoconvert \ + && ln -sf /opt/unoserver-venv/bin/unoserver /opt/venv/bin/unoserver + +# Extend PATH to include both virtual environments +ENV PATH="/opt/venv/bin:/opt/unoserver-venv/bin:${PATH}" + +# ============================================================================== +# Final permissions, directories and font cache +# ============================================================================== +RUN set -eux; \ + chmod +x /scripts/*; \ + mkdir -p /configs /logs /customFiles /pipeline/watchedFolders /pipeline/finishedFolders /tmp/stirling-pdf; \ + chown -R stirlingpdfuser:stirlingpdfgroup \ + /home/stirlingpdfuser /configs /logs /customFiles /pipeline /tmp/stirling-pdf \ + /app.jar /usr/share/fonts/truetype /scripts; \ + chmod -R 755 /tmp/stirling-pdf + +# Rebuild font cache +RUN fc-cache -f -v + +# Force Qt/WebEngine to run headlessly (required for Calibre in Docker) +ENV QT_QPA_PLATFORM=offscreen \ + QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu --disable-dev-shm-usage" + +# 
Expose web UI port EXPOSE 8080/tcp -# Set user and run command + +STOPSIGNAL SIGTERM + +# Use tini as init (handles signals and zombies correctly) ENTRYPOINT ["tini", "--", "/scripts/init.sh"] -CMD ["sh", "-c", "java -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/tmp/stirling-pdf -jar /app.jar & /opt/venv/bin/unoserver --port 2003 --interface 127.0.0.1"] + +# CMD is empty – actual start command is defined in init.sh +CMD [] diff --git a/Dockerfile.ultra-lite b/Dockerfile.ultra-lite index a49362d60..e364ba0a7 100644 --- a/Dockerfile.ultra-lite +++ b/Dockerfile.ultra-lite @@ -56,4 +56,4 @@ EXPOSE 8080/tcp # Run the application ENTRYPOINT ["tini", "--", "/scripts/init-without-ocr.sh"] -CMD ["java", "-Dfile.encoding=UTF-8", "-Djava.io.tmpdir=/tmp/stirling-pdf", "-jar", "/app.jar"] +CMD [] diff --git a/app/common/src/main/java/stirling/software/common/configuration/RuntimePathConfig.java b/app/common/src/main/java/stirling/software/common/configuration/RuntimePathConfig.java index f8bc38a6b..7e6da0f0f 100644 --- a/app/common/src/main/java/stirling/software/common/configuration/RuntimePathConfig.java +++ b/app/common/src/main/java/stirling/software/common/configuration/RuntimePathConfig.java @@ -10,8 +10,10 @@ import lombok.Getter; import lombok.extern.slf4j.Slf4j; import stirling.software.common.model.ApplicationProperties; +import stirling.software.common.model.ApplicationProperties.CustomPaths; import stirling.software.common.model.ApplicationProperties.CustomPaths.Operations; import stirling.software.common.model.ApplicationProperties.CustomPaths.Pipeline; +import stirling.software.common.model.ApplicationProperties.System; @Slf4j @Configuration @@ -19,9 +21,16 @@ import stirling.software.common.model.ApplicationProperties.CustomPaths.Pipeline public class RuntimePathConfig { private final ApplicationProperties properties; private final String basePath; + + // Operation paths private final String weasyPrintPath; private final String unoConvertPath; private final String 
calibrePath; + private final String ocrMyPdfPath; + private final String sOfficePath; + + // Tesseract data path + private final String tessDataPath; // Pipeline paths private final String pipelineWatchedFoldersPath; @@ -38,7 +47,10 @@ public class RuntimePathConfig { String defaultFinishedFolders = Path.of(this.pipelinePath, "finishedFolders").toString(); String defaultWebUIConfigs = Path.of(this.pipelinePath, "defaultWebUIConfigs").toString(); - Pipeline pipeline = properties.getSystem().getCustomPaths().getPipeline(); + System system = properties.getSystem(); + CustomPaths customPaths = system.getCustomPaths(); + + Pipeline pipeline = customPaths.getPipeline(); this.pipelineWatchedFoldersPath = resolvePath( @@ -58,9 +70,11 @@ public class RuntimePathConfig { // Initialize Operation paths String defaultWeasyPrintPath = isDocker ? "/opt/venv/bin/weasyprint" : "weasyprint"; String defaultUnoConvertPath = isDocker ? "/opt/venv/bin/unoconvert" : "unoconvert"; - String defaultCalibrePath = isDocker ? "/usr/bin/ebook-convert" : "ebook-convert"; + String defaultCalibrePath = isDocker ? "/opt/calibre/ebook-convert" : "ebook-convert"; + String defaultOcrMyPdfPath = isDocker ? "/usr/bin/ocrmypdf" : "ocrmypdf"; + String defaultSOfficePath = isDocker ? "/usr/bin/soffice" : "soffice"; - Operations operations = properties.getSystem().getCustomPaths().getOperations(); + Operations operations = customPaths.getOperations(); this.weasyPrintPath = resolvePath( defaultWeasyPrintPath, @@ -72,6 +86,25 @@ public class RuntimePathConfig { this.calibrePath = resolvePath( defaultCalibrePath, operations != null ? operations.getCalibre() : null); + this.ocrMyPdfPath = + resolvePath( + defaultOcrMyPdfPath, operations != null ? operations.getOcrmypdf() : null); + this.sOfficePath = + resolvePath( + defaultSOfficePath, operations != null ? operations.getSoffice() : null); + + // Initialize Tesseract data path + String defaultTessDataPath = + isDocker ? 
"/usr/share/tesseract-ocr/5/tessdata" : "/usr/share/tessdata"; + + String tessPath = system.getTessdataDir(); + String tessdataDir = java.lang.System.getenv("TESSDATA_PREFIX"); + + this.tessDataPath = + resolvePath( + defaultTessDataPath, + (tessPath != null && !tessPath.isEmpty()) ? tessPath : tessdataDir); + log.info("Using Tesseract data path: {}", this.tessDataPath); } private String resolvePath(String defaultPath, String customPath) { diff --git a/app/common/src/main/java/stirling/software/common/model/ApplicationProperties.java b/app/common/src/main/java/stirling/software/common/model/ApplicationProperties.java index e8606b1f9..fbd36c6d4 100644 --- a/app/common/src/main/java/stirling/software/common/model/ApplicationProperties.java +++ b/app/common/src/main/java/stirling/software/common/model/ApplicationProperties.java @@ -372,6 +372,8 @@ public class ApplicationProperties { private String weasyprint; private String unoconvert; private String calibre; + private String ocrmypdf; + private String soffice; } } @@ -454,10 +456,10 @@ public class ApplicationProperties { @Override public String toString() { return """ - Driver { - driverName='%s' - } - """ + Driver { + driverName='%s' + } + """ .formatted(driverName); } } diff --git a/app/common/src/main/java/stirling/software/common/service/PostHogService.java b/app/common/src/main/java/stirling/software/common/service/PostHogService.java index 6c42e093f..199890f18 100644 --- a/app/common/src/main/java/stirling/software/common/service/PostHogService.java +++ b/app/common/src/main/java/stirling/software/common/service/PostHogService.java @@ -25,6 +25,7 @@ import org.springframework.stereotype.Service; import com.posthog.java.PostHog; +import stirling.software.common.configuration.RuntimePathConfig; import stirling.software.common.model.ApplicationProperties; @Service @@ -33,6 +34,7 @@ public class PostHogService { private final String uniqueId; private final String appVersion; private final ApplicationProperties 
applicationProperties; + private final RuntimePathConfig runtimePathConfig; private final UserServiceInterface userService; private final Environment env; private boolean configDirMounted; @@ -43,12 +45,14 @@ public class PostHogService { @Qualifier("configDirMounted") boolean configDirMounted, @Qualifier("appVersion") String appVersion, ApplicationProperties applicationProperties, + RuntimePathConfig runtimePathConfig, @Autowired(required = false) UserServiceInterface userService, Environment env) { this.postHog = postHog; this.uniqueId = uuid; this.appVersion = appVersion; this.applicationProperties = applicationProperties; + this.runtimePathConfig = runtimePathConfig; this.userService = userService; this.env = env; this.configDirMounted = configDirMounted; @@ -313,10 +317,7 @@ public class PostHogService { properties, "system_customHTMLFiles", applicationProperties.getSystem().isCustomHTMLFiles()); - addIfNotEmpty( - properties, - "system_tessdataDir", - applicationProperties.getSystem().getTessdataDir()); + addIfNotEmpty(properties, "system_tessdataDir", runtimePathConfig.getTessDataPath()); addIfNotEmpty( properties, "system_enableAlphaFunctionality", diff --git a/app/common/src/main/java/stirling/software/common/util/PDFToFile.java b/app/common/src/main/java/stirling/software/common/util/PDFToFile.java index 32f2cc874..b00cdae86 100644 --- a/app/common/src/main/java/stirling/software/common/util/PDFToFile.java +++ b/app/common/src/main/java/stirling/software/common/util/PDFToFile.java @@ -27,15 +27,22 @@ import io.github.pixee.security.Filenames; import lombok.extern.slf4j.Slf4j; +import stirling.software.common.configuration.RuntimePathConfig; import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult; @Slf4j public class PDFToFile { private final TempFileManager tempFileManager; + private final RuntimePathConfig runtimePathConfig; public PDFToFile(TempFileManager tempFileManager) { + this(tempFileManager, null); + } + + public 
PDFToFile(TempFileManager tempFileManager, RuntimePathConfig runtimePathConfig) { this.tempFileManager = tempFileManager; + this.runtimePathConfig = runtimePathConfig; } public ResponseEntity<byte[]> processPdfToMarkdown(MultipartFile inputFile) @@ -241,31 +248,65 @@ public class PDFToFile { byte[] fileBytes; String fileName; + Path libreOfficeProfile = null; try (TempFile inputFileTemp = new TempFile(tempFileManager, ".pdf"); TempDirectory outputDirTemp = new TempDirectory(tempFileManager)) { Path tempInputFile = inputFileTemp.getPath(); Path tempOutputDir = outputDirTemp.getPath(); + Path unoOutputFile = + tempOutputDir.resolve( + pdfBaseName + "." + resolvePrimaryExtension(outputFormat)); // Save the uploaded file to a temporary location inputFile.transferTo(tempInputFile); // Run the LibreOffice command - List<String> command = - new ArrayList<>( - Arrays.asList( - "soffice", - "--headless", - "--nologo", - "--infilter=" + libreOfficeFilter, - "--convert-to", - outputFormat, - "--outdir", - tempOutputDir.toString(), - tempInputFile.toString())); - ProcessExecutorResult returnCode = - ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE) - .runCommandWithOutputHandling(command); + ProcessExecutorResult returnCode = null; + IOException unoconvertException = null; + + if (isUnoConvertEnabled()) { + try { + List<String> unoCommand = + buildUnoConvertCommand( + tempInputFile, unoOutputFile, outputFormat, libreOfficeFilter); + returnCode = + ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE) + .runCommandWithOutputHandling(unoCommand); + } catch (IOException e) { + unoconvertException = e; + log.warn( + "Unoconvert command failed ({}). 
Falling back to soffice command.", + e.getMessage()); + } + } + + if (returnCode == null) { + // Run the LibreOffice command as a fallback + libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_"); + List<String> command = new ArrayList<>(); + command.add(runtimePathConfig.getSOfficePath()); + command.add("-env:UserInstallation=" + libreOfficeProfile.toUri().toString()); + command.add("--headless"); + command.add("--nologo"); + command.add("--infilter=" + libreOfficeFilter); + command.add("--convert-to"); + command.add(outputFormat); + command.add("--outdir"); + command.add(tempOutputDir.toString()); + command.add(tempInputFile.toString()); + + try { + returnCode = + ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE) + .runCommandWithOutputHandling(command); + } catch (IOException e) { + if (unoconvertException != null) { + e.addSuppressed(unoconvertException); + } + throw e; + } + } // Get output files List<File> outputFiles = Arrays.asList(tempOutputDir.toFile().listFiles()); @@ -300,8 +341,42 @@ public class PDFToFile { fileBytes = byteArrayOutputStream.toByteArray(); } + } finally { + if (libreOfficeProfile != null) { + FileUtils.deleteQuietly(libreOfficeProfile.toFile()); + } } return WebResponseUtils.bytesToWebResponse( fileBytes, fileName, MediaType.APPLICATION_OCTET_STREAM); } + + private boolean isUnoConvertEnabled() { + return runtimePathConfig != null + && runtimePathConfig.getUnoConvertPath() != null + && !runtimePathConfig.getUnoConvertPath().isBlank(); + } + + private List<String> buildUnoConvertCommand( + Path inputFile, Path outputFile, String outputFormat, String libreOfficeFilter) { + List<String> command = new ArrayList<>(); + command.add(runtimePathConfig.getUnoConvertPath()); + command.add("--port"); + command.add("2003"); + command.add("--convert-to"); + command.add(outputFormat); + if (libreOfficeFilter != null && !libreOfficeFilter.isBlank()) { + command.add("--input-filter=" + libreOfficeFilter); + } + command.add(inputFile.toString()); + 
command.add(outputFile.toString()); + return command; + } + + private String resolvePrimaryExtension(String outputFormat) { + if (outputFormat == null) { + return ""; + } + int colonIndex = outputFormat.indexOf(':'); + return colonIndex > 0 ? outputFormat.substring(0, colonIndex) : outputFormat; + } } diff --git a/app/common/src/test/java/stirling/software/common/util/PDFToFileTest.java b/app/common/src/test/java/stirling/software/common/util/PDFToFileTest.java index 19c3d4322..528125eac 100644 --- a/app/common/src/test/java/stirling/software/common/util/PDFToFileTest.java +++ b/app/common/src/test/java/stirling/software/common/util/PDFToFileTest.java @@ -32,6 +32,7 @@ import org.springframework.web.multipart.MultipartFile; import io.github.pixee.security.ZipSecurity; +import stirling.software.common.configuration.RuntimePathConfig; import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult; /** @@ -48,6 +49,7 @@ class PDFToFileTest { @Mock private ProcessExecutor mockProcessExecutor; @Mock private ProcessExecutorResult mockExecutorResult; @Mock private TempFileManager mockTempFileManager; + @Mock private RuntimePathConfig mockRuntimePathConfig; @BeforeEach void setUp() throws IOException { @@ -61,7 +63,9 @@ class PDFToFileTest { .when(mockTempFileManager.createTempDirectory()) .thenAnswer(invocation -> Files.createTempDirectory("test")); - pdfToFile = new PDFToFile(mockTempFileManager); + lenient().when(mockRuntimePathConfig.getSOfficePath()).thenReturn("/usr/bin/soffice"); + + pdfToFile = new PDFToFile(mockTempFileManager, mockRuntimePathConfig); } @Test @@ -363,7 +367,8 @@ class PDFToFileTest { when(mockProcessExecutor.runCommandWithOutputHandling( argThat( args -> - args.contains("--convert-to") + args != null + && args.contains("--convert-to") && args.contains("docx")))) .thenAnswer( invocation -> { @@ -424,7 +429,11 @@ class PDFToFileTest { .thenReturn(mockProcessExecutor); when(mockProcessExecutor.runCommandWithOutputHandling( - argThat(args 
-> args.contains("--convert-to") && args.contains("odp")))) + argThat( + args -> + args != null + && args.contains("--convert-to") + && args.contains("odp")))) .thenAnswer( invocation -> { // When command is executed, find the output directory argument @@ -513,7 +522,8 @@ class PDFToFileTest { when(mockProcessExecutor.runCommandWithOutputHandling( argThat( args -> - args.contains("--convert-to") + args != null + && args.contains("--convert-to") && args.contains("txt:Text")))) .thenAnswer( invocation -> { @@ -611,4 +621,110 @@ class PDFToFileTest { .contains("output.docx")); } } + + @Test + void testProcessPdfToOfficeFormat_UsesUnoconvertWhenConfigured() + throws IOException, InterruptedException { + when(mockRuntimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert"); + PDFToFile pdfToFileWithUno = new PDFToFile(mockTempFileManager, mockRuntimePathConfig); + + try (MockedStatic<ProcessExecutor> mockedStaticProcessExecutor = + mockStatic(ProcessExecutor.class)) { + MultipartFile pdfFile = + new MockMultipartFile( + "file", + "document.pdf", + MediaType.APPLICATION_PDF_VALUE, + "Fake PDF content".getBytes()); + + mockedStaticProcessExecutor + .when(() -> ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)) + .thenReturn(mockProcessExecutor); + + when(mockProcessExecutor.runCommandWithOutputHandling( + argThat(args -> args != null && args.contains("/custom/unoconvert")))) + .thenAnswer( + invocation -> { + List<String> args = invocation.getArgument(0); + String outputPath = args.get(args.size() - 1); + Files.write(Path.of(outputPath), "Fake DOCX content".getBytes()); + return mockExecutorResult; + }); + + ResponseEntity<byte[]> response = + pdfToFileWithUno.processPdfToOfficeFormat(pdfFile, "docx", "writer_pdf_import"); + + assertEquals(HttpStatus.OK, response.getStatusCode()); + assertNotNull(response.getBody()); + assertTrue(response.getBody().length > 0); + assertTrue( + response.getHeaders() + .getContentDisposition() + .toString() + .contains("document.docx")); + } + 
} + + @Test + void testProcessPdfToOfficeFormat_FallsBackWhenUnoconvertFails() + throws IOException, InterruptedException { + when(mockRuntimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert"); + PDFToFile pdfToFileWithUno = new PDFToFile(mockTempFileManager, mockRuntimePathConfig); + + try (MockedStatic<ProcessExecutor> mockedStaticProcessExecutor = + mockStatic(ProcessExecutor.class)) { + MultipartFile pdfFile = + new MockMultipartFile( + "file", + "document.pdf", + MediaType.APPLICATION_PDF_VALUE, + "Fake PDF content".getBytes()); + + mockedStaticProcessExecutor + .when(() -> ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)) + .thenReturn(mockProcessExecutor); + + when(mockProcessExecutor.runCommandWithOutputHandling( + argThat(args -> args != null && args.contains("/custom/unoconvert")))) + .thenThrow(new IOException("Conversion failed")); + + when(mockProcessExecutor.runCommandWithOutputHandling( + argThat( + args -> + args != null + && args.stream() + .anyMatch( + arg -> + arg.contains( + "soffice"))))) + .thenAnswer( + invocation -> { + List<String> args = invocation.getArgument(0); + String outDir = null; + for (int i = 0; i < args.size(); i++) { + if ("--outdir".equals(args.get(i)) && i + 1 < args.size()) { + outDir = args.get(i + 1); + break; + } + } + assertNotNull(outDir); + Files.write( + Path.of(outDir, "document.docx"), + "Fallback DOCX content".getBytes()); + return mockExecutorResult; + }); + + ResponseEntity<byte[]> response = + pdfToFileWithUno.processPdfToOfficeFormat(pdfFile, "docx", "writer_pdf_import"); + + assertEquals(HttpStatus.OK, response.getStatusCode()); + assertNotNull(response.getBody()); + assertTrue(response.getBody().length > 0); + assertTrue( + response.getHeaders() + .getContentDisposition() + .toString() + .contains("document.docx")); + } + } } diff --git a/app/core/src/main/java/stirling/software/SPDF/config/ExternalAppDepConfig.java b/app/core/src/main/java/stirling/software/SPDF/config/ExternalAppDepConfig.java index 
22ad2adf4..7dd806aa7 100644 --- a/app/core/src/main/java/stirling/software/SPDF/config/ExternalAppDepConfig.java +++ b/app/core/src/main/java/stirling/software/SPDF/config/ExternalAppDepConfig.java @@ -41,6 +41,8 @@ public class ExternalAppDepConfig { private final String weasyprintPath; private final String unoconvPath; private final String calibrePath; + private final String ocrMyPdfPath; + private final String sOfficePath; /** * Map of command(binary) -> affected groups (e.g. "gs" -> ["Ghostscript"]). Immutable to avoid @@ -58,11 +60,13 @@ public class ExternalAppDepConfig { this.weasyprintPath = runtimePathConfig.getWeasyPrintPath(); this.unoconvPath = runtimePathConfig.getUnoConvertPath(); this.calibrePath = runtimePathConfig.getCalibrePath(); + this.ocrMyPdfPath = runtimePathConfig.getOcrMyPdfPath(); + this.sOfficePath = runtimePathConfig.getSOfficePath(); Map<String, List<String>> tmp = new HashMap<>(); tmp.put("gs", List.of("Ghostscript")); - tmp.put("ocrmypdf", List.of("OCRmyPDF")); - tmp.put("soffice", List.of("LibreOffice")); + tmp.put(ocrMyPdfPath, List.of("OCRmyPDF")); + tmp.put(sOfficePath, List.of("LibreOffice")); tmp.put(weasyprintPath, List.of("Weasyprint")); tmp.put("pdftohtml", List.of("Pdftohtml")); tmp.put(unoconvPath, List.of("Unoconvert")); diff --git a/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertOfficeController.java b/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertOfficeController.java index a4ce59380..167293ef1 100644 --- a/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertOfficeController.java +++ b/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertOfficeController.java @@ -93,6 +93,7 @@ public class ConvertOfficeController { Files.copy(inputFile.getInputStream(), inputPath, StandardCopyOption.REPLACE_EXISTING); } + Path libreOfficeProfile = null; try { ProcessExecutorResult result; // Run Unoconvert command @@ -112,8 +113,10 @@ public 
class ConvertOfficeController { .runCommandWithOutputHandling(command); } // Run soffice command else { + libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_"); List<String> command = new ArrayList<>(); - command.add("soffice"); + command.add(runtimePathConfig.getSOfficePath()); + command.add("-env:UserInstallation=" + libreOfficeProfile.toUri().toString()); command.add("--headless"); command.add("--nologo"); command.add("--convert-to"); @@ -169,6 +172,9 @@ public class ConvertOfficeController { } catch (IOException e) { log.warn("Failed to delete temp input file: {}", inputPath, e); } + if (libreOfficeProfile != null) { + FileUtils.deleteQuietly(libreOfficeProfile.toFile()); + } } } diff --git a/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToHtml.java b/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToHtml.java index 76414ca57..e64746872 100644 --- a/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToHtml.java +++ b/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToHtml.java @@ -13,6 +13,7 @@ import io.swagger.v3.oas.annotations.tags.Tag; import lombok.RequiredArgsConstructor; +import stirling.software.common.configuration.RuntimePathConfig; import stirling.software.common.model.api.PDFFile; import stirling.software.common.util.PDFToFile; import stirling.software.common.util.TempFileManager; @@ -24,6 +25,7 @@ import stirling.software.common.util.TempFileManager; public class ConvertPDFToHtml { private final TempFileManager tempFileManager; + private final RuntimePathConfig runtimePathConfig; @PostMapping(consumes = MediaType.MULTIPART_FORM_DATA_VALUE, value = "/pdf/html") @Operation( @@ -32,7 +34,7 @@ public class ConvertPDFToHtml { "This endpoint converts a PDF file to HTML format. 
Input:PDF Output:HTML Type:SISO") public ResponseEntity<byte[]> processPdfToHTML(@ModelAttribute PDFFile file) throws Exception { MultipartFile inputFile = file.getFileInput(); - PDFToFile pdfToFile = new PDFToFile(tempFileManager); + PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig); return pdfToFile.processPdfToHtml(inputFile); } } diff --git a/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToOffice.java b/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToOffice.java index d9538de58..753b8a075 100644 --- a/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToOffice.java +++ b/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToOffice.java @@ -20,6 +20,7 @@ import lombok.RequiredArgsConstructor; import stirling.software.SPDF.model.api.converters.PdfToPresentationRequest; import stirling.software.SPDF.model.api.converters.PdfToTextOrRTFRequest; import stirling.software.SPDF.model.api.converters.PdfToWordRequest; +import stirling.software.common.configuration.RuntimePathConfig; import stirling.software.common.model.api.PDFFile; import stirling.software.common.service.CustomPDFDocumentFactory; import stirling.software.common.util.GeneralUtils; @@ -35,6 +36,7 @@ public class ConvertPDFToOffice { private final CustomPDFDocumentFactory pdfDocumentFactory; private final TempFileManager tempFileManager; + private final RuntimePathConfig runtimePathConfig; @PostMapping(consumes = MediaType.MULTIPART_FORM_DATA_VALUE, value = "/pdf/presentation") @Operation( @@ -47,7 +49,7 @@ public class ConvertPDFToOffice { throws IOException, InterruptedException { MultipartFile inputFile = request.getFileInput(); String outputFormat = request.getOutputFormat(); - PDFToFile pdfToFile = new PDFToFile(tempFileManager); + PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig); return pdfToFile.processPdfToOfficeFormat(inputFile, 
outputFormat, "impress_pdf_import");
 }
@@ -72,7 +74,7 @@ public class ConvertPDFToOffice {
 MediaType.TEXT_PLAIN);
 }
 } else {
- PDFToFile pdfToFile = new PDFToFile(tempFileManager);
+ PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
 return pdfToFile.processPdfToOfficeFormat(inputFile, outputFormat, "writer_pdf_import");
 }
 }
@@ -87,7 +89,7 @@ public class ConvertPDFToOffice {
 throws IOException, InterruptedException {
 MultipartFile inputFile = request.getFileInput();
 String outputFormat = request.getOutputFormat();
- PDFToFile pdfToFile = new PDFToFile(tempFileManager);
+ PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
 return pdfToFile.processPdfToOfficeFormat(inputFile, outputFormat, "writer_pdf_import");
 }
@@ -100,7 +102,7 @@ public class ConvertPDFToOffice {
 public ResponseEntity processPdfToXML(@ModelAttribute PDFFile file) throws Exception {
 MultipartFile inputFile = file.getFileInput();
- PDFToFile pdfToFile = new PDFToFile(tempFileManager);
+ PDFToFile pdfToFile = new PDFToFile(tempFileManager, runtimePathConfig);
 return pdfToFile.processPdfToOfficeFormat(inputFile, "xml", "writer_pdf_import");
 }
 }
diff --git a/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToPDFA.java b/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToPDFA.java
index bdc57a984..5c388b504 100644
--- a/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToPDFA.java
+++ b/app/core/src/main/java/stirling/software/SPDF/controller/api/converters/ConvertPDFToPDFA.java
@@ -71,9 +71,11 @@ import io.swagger.v3.oas.annotations.Operation;
 import io.swagger.v3.oas.annotations.tags.Tag;
 import lombok.Getter;
+import lombok.RequiredArgsConstructor;
 import lombok.extern.slf4j.Slf4j;
 import stirling.software.SPDF.model.api.converters.PdfToPdfARequest;
+import stirling.software.common.configuration.RuntimePathConfig;
 import
stirling.software.common.util.ExceptionUtils;
 import stirling.software.common.util.ProcessExecutor;
 import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
@@ -83,8 +85,11 @@ import stirling.software.common.util.WebResponseUtils;
 @RequestMapping("/api/v1/convert")
 @Slf4j
 @Tag(name = "Convert", description = "Convert APIs")
+@RequiredArgsConstructor
 public class ConvertPDFToPDFA {
+ private final RuntimePathConfig runtimePathConfig;
+
 private static final String ICC_RESOURCE_PATH = "/icc/sRGB2014.icc";
 private static final int PDFA_COMPATIBILITY_POLICY = 1;
@@ -1043,26 +1048,33 @@ public class ConvertPDFToPDFA {
 ? "pdf:writer_pdf_Export:{\"SelectPdfVersion\":{\"type\":\"long\",\"value\":\"2\"}}"
 : "pdf:writer_pdf_Export:{\"SelectPdfVersion\":{\"type\":\"long\",\"value\":\"1\"}}";
- // Prepare LibreOffice command
- List command =
- new ArrayList<>(
- Arrays.asList(
- "soffice",
- "--headless",
- "--nologo",
- "--convert-to",
- pdfFilter,
- "--outdir",
- tempOutputDir.toString(),
- tempInputFile.toString()));
+ Path libreOfficeProfile = Files.createTempDirectory("libreoffice_profile_");
+ try {
+ // Prepare LibreOffice command
+ List command =
+ new ArrayList<>(
+ Arrays.asList(
+ runtimePathConfig.getSOfficePath(),
+ "-env:UserInstallation="
+ + libreOfficeProfile.toUri().toString(),
+ "--headless",
+ "--nologo",
+ "--convert-to",
+ pdfFilter,
+ "--outdir",
+ tempOutputDir.toString(),
+ tempInputFile.toString()));
- ProcessExecutorResult returnCode =
- ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
- .runCommandWithOutputHandling(command);
+ ProcessExecutorResult returnCode =
+ ProcessExecutor.getInstance(ProcessExecutor.Processes.LIBRE_OFFICE)
+ .runCommandWithOutputHandling(command);
- if (returnCode.getRc() != 0) {
- log.error("PDF/A conversion failed with return code: {}", returnCode.getRc());
- throw ExceptionUtils.createPdfaConversionFailedException();
+ if (returnCode.getRc() != 0) {
+ log.error("PDF/A conversion failed
with return code: {}", returnCode.getRc());
+ throw ExceptionUtils.createPdfaConversionFailedException();
+ }
+ } finally {
+ FileUtils.deleteQuietly(libreOfficeProfile.toFile());
 }
 // Get the output file
diff --git a/app/core/src/main/java/stirling/software/SPDF/controller/api/misc/OCRController.java b/app/core/src/main/java/stirling/software/SPDF/controller/api/misc/OCRController.java
index 291624629..e9f558e7d 100644
--- a/app/core/src/main/java/stirling/software/SPDF/controller/api/misc/OCRController.java
+++ b/app/core/src/main/java/stirling/software/SPDF/controller/api/misc/OCRController.java
@@ -37,10 +37,17 @@ import lombok.extern.slf4j.Slf4j;
 import stirling.software.SPDF.config.EndpointConfiguration;
 import stirling.software.SPDF.model.api.misc.ProcessPdfWithOcrRequest;
+import stirling.software.common.configuration.RuntimePathConfig;
 import stirling.software.common.model.ApplicationProperties;
 import stirling.software.common.service.CustomPDFDocumentFactory;
-import stirling.software.common.util.*;
+import stirling.software.common.util.ExceptionUtils;
+import stirling.software.common.util.GeneralUtils;
+import stirling.software.common.util.ProcessExecutor;
 import stirling.software.common.util.ProcessExecutor.ProcessExecutorResult;
+import stirling.software.common.util.TempDirectory;
+import stirling.software.common.util.TempFile;
+import stirling.software.common.util.TempFileManager;
+import stirling.software.common.util.WebResponseUtils;
 @RestController
 @RequestMapping("/api/v1/misc")
@@ -53,6 +60,7 @@ public class OCRController {
 private final CustomPDFDocumentFactory pdfDocumentFactory;
 private final TempFileManager tempFileManager;
 private final EndpointConfiguration endpointConfiguration;
+ private final RuntimePathConfig runtimePathConfig;
 private boolean isOcrMyPdfEnabled() {
 return endpointConfiguration.isGroupEnabled("OCRmyPDF");
@@ -64,7 +72,7 @@ public class OCRController {
 /** Gets the list of available Tesseract languages from the tessdata
directory */
 public List getAvailableTesseractLanguages() {
- String tessdataDir = applicationProperties.getSystem().getTessdataDir();
+ String tessdataDir = runtimePathConfig.getTessDataPath();
 File[] files = new File(tessdataDir).listFiles();
 if (files == null) {
 return Collections.emptyList();
@@ -80,9 +88,10 @@ public class OCRController {
 @Operation(
 summary = "Process a PDF file with OCR",
 description =
- "This endpoint processes a PDF file using OCR (Optical Character Recognition). "
- + "Users can specify languages, sidecar, deskew, clean, cleanFinal, ocrType, ocrRenderType, and removeImagesAfter options. "
- + "Uses OCRmyPDF if available, falls back to Tesseract. Input:PDF Output:PDF Type:SI-Conditional")
+ "This endpoint processes a PDF file using OCR (Optical Character Recognition). Users can"
+ + " specify languages, sidecar, deskew, clean, cleanFinal, ocrType, ocrRenderType,"
+ + " and removeImagesAfter options. Uses OCRmyPDF if available, falls back to"
+ + " Tesseract. Input:PDF Output:PDF Type:SI-Conditional")
 public ResponseEntity processPdfWithOCR(
 @ModelAttribute ProcessPdfWithOcrRequest request)
 throws IOException, InterruptedException {
@@ -217,7 +226,7 @@ public class OCRController {
 List command =
 new ArrayList<>(
 Arrays.asList(
- "ocrmypdf",
+ runtimePathConfig.getOcrMyPdfPath(),
 "--verbose",
 "2",
 "--output-type",
diff --git a/app/core/src/main/java/stirling/software/SPDF/controller/web/OtherWebController.java b/app/core/src/main/java/stirling/software/SPDF/controller/web/OtherWebController.java
index 09dd46cec..9549ac4db 100644
--- a/app/core/src/main/java/stirling/software/SPDF/controller/web/OtherWebController.java
+++ b/app/core/src/main/java/stirling/software/SPDF/controller/web/OtherWebController.java
@@ -14,16 +14,20 @@ import io.swagger.v3.oas.annotations.Hidden;
 import io.swagger.v3.oas.annotations.tags.Tag;
 import lombok.RequiredArgsConstructor;
+import lombok.extern.slf4j.Slf4j;
+import
stirling.software.common.configuration.RuntimePathConfig;
 import stirling.software.common.model.ApplicationProperties;
 import stirling.software.common.util.CheckProgramInstall;
 @Controller
 @Tag(name = "Misc", description = "Miscellaneous APIs")
 @RequiredArgsConstructor
+@Slf4j
 public class OtherWebController {
 private final ApplicationProperties applicationProperties;
+ private final RuntimePathConfig runtimePathConfig;
 @GetMapping("/compress-pdf")
 @Hidden
@@ -120,7 +124,7 @@ public class OtherWebController {
 }
 public List getAvailableTesseractLanguages() {
- String tessdataDir = applicationProperties.getSystem().getTessdataDir();
+ String tessdataDir = runtimePathConfig.getTessDataPath();
 File[] files = new File(tessdataDir).listFiles();
 if (files == null) {
 return Collections.emptyList();
diff --git a/app/core/src/main/resources/settings.yml.template b/app/core/src/main/resources/settings.yml.template
index 734f3f793..64ddaa857 100644
--- a/app/core/src/main/resources/settings.yml.template
+++ b/app/core/src/main/resources/settings.yml.template
@@ -115,7 +115,7 @@ system:
 showUpdate: false # see when a new update is available
 showUpdateOnlyAdmin: false # only admins can see when a new update is available, depending on showUpdate it must be set to 'true'
 customHTMLFiles: false # enable to have files placed in /customFiles/templates override the existing template HTML files
- tessdataDir: /usr/share/tessdata # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.
+ tessdataDir: "" # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.
 enableAnalytics: null # Master toggle for analytics: set to 'true' to enable all analytics, 'false' to disable all analytics, or leave as 'null' to prompt admin on first launch
 enablePosthog: null # Enable PostHog analytics (open-source product analytics): set to 'true' to enable, 'false' to disable, or 'null' to enable by default when analytics is enabled
 enableScarf: null # Enable Scarf pixel: set to 'true' to enable, 'false' to disable, or 'null' to enable by default when analytics is enabled
@@ -150,6 +150,8 @@ system:
 weasyprint: '' # Defaults to /opt/venv/bin/weasyprint
 unoconvert: '' # Defaults to /opt/venv/bin/unoconvert
 calibre: '' # Defaults to /usr/bin/ebook-convert
+ ocrmypdf: '' # Defaults to /usr/bin/ocrmypdf
+ soffice: '' # Defaults to /usr/bin/soffice
 fileUploadLimit: '' # Defaults to "". No limit when string is empty. Set a number, between 0 and 999, followed by one of the following strings to set a limit. "KB", "MB", "GB".
 tempFileManagement:
 baseTmpDir: '' # Defaults to java.io.tmpdir/stirling-pdf
diff --git a/app/core/src/test/java/stirling/software/SPDF/config/ExternalAppDepConfigTest.java b/app/core/src/test/java/stirling/software/SPDF/config/ExternalAppDepConfigTest.java
index 7247f5f0e..e2c6a326f 100644
--- a/app/core/src/test/java/stirling/software/SPDF/config/ExternalAppDepConfigTest.java
+++ b/app/core/src/test/java/stirling/software/SPDF/config/ExternalAppDepConfigTest.java
@@ -32,6 +32,8 @@ class ExternalAppDepConfigTest {
 void setUp() {
 when(runtimePathConfig.getWeasyPrintPath()).thenReturn("/custom/weasyprint");
 when(runtimePathConfig.getUnoConvertPath()).thenReturn("/custom/unoconvert");
+ when(runtimePathConfig.getCalibrePath()).thenReturn("/custom/calibre");
+ when(runtimePathConfig.getOcrMyPdfPath()).thenReturn("/custom/ocrmypdf");
 lenient()
 .when(endpointConfiguration.getEndpointsForGroup(anyString()))
 .thenReturn(Set.of());
@@ -45,6 +47,8 @@ class ExternalAppDepConfigTest {
 assertEquals(List.of("Weasyprint"),
mapping.get("/custom/weasyprint"));
 assertEquals(List.of("Unoconvert"), mapping.get("/custom/unoconvert"));
+ assertEquals(List.of("Calibre"), mapping.get("/custom/calibre"));
+ assertEquals(List.of("OCRmyPDF"), mapping.get("/custom/ocrmypdf"));
 assertEquals(List.of("Ghostscript"), mapping.get("gs"));
 }
diff --git a/exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml b/exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml
index 827de1e19..96d366c99 100644
--- a/exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml
+++ b/exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml
@@ -1,8 +1,8 @@
-
 services:
 stirling-pdf:
 container_name: Stirling-PDF-Fat-Disable-Endpoints
- image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+ # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+ image: ghcr.io/stirling-tools/stirling-pdf-test:fat
 deploy:
 resources:
 limits:
diff --git a/exampleYmlFiles/docker-compose-latest-fat-security.yml b/exampleYmlFiles/docker-compose-latest-fat-security.yml
index 5b07420ff..57169923d 100644
--- a/exampleYmlFiles/docker-compose-latest-fat-security.yml
+++ b/exampleYmlFiles/docker-compose-latest-fat-security.yml
@@ -1,7 +1,8 @@
 services:
 stirling-pdf:
 container_name: Stirling-PDF-Security-Fat
- image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+ # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+ image: ghcr.io/stirling-tools/stirling-pdf-test:fat
 deploy:
 resources:
 limits:
diff --git a/exampleYmlFiles/docker-compose-latest-security-with-sso.yml b/exampleYmlFiles/docker-compose-latest-security-with-sso.yml
index 55ea0893d..e1716d817 100644
--- a/exampleYmlFiles/docker-compose-latest-security-with-sso.yml
+++ b/exampleYmlFiles/docker-compose-latest-security-with-sso.yml
@@ -1,7 +1,8 @@
 services:
 stirling-pdf:
 container_name: Stirling-PDF-Security
- image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+ # image:
docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+ image: ghcr.io/stirling-tools/stirling-pdf-test:latest
 deploy:
 resources:
 limits:
@@ -22,9 +23,9 @@ services:
 SECURITY_ENABLELOGIN: "true"
 SECURITY_OAUTH2_ENABLED: "true"
 SECURITY_OAUTH2_AUTOCREATEUSER: "true" # This is set to true to allow auto-creation of non-existing users in Stirling-PDF
- SECURITY_OAUTH2_ISSUER: "https://accounts.google.com" # Change with any other provider that supports OpenID Connect Discovery (/.well-known/openid-configuration) end-point
+ SECURITY_OAUTH2_ISSUER: "https://accounts.google.com" # Change with any other provider that supports OpenID Connect Discovery (/.well-known/openid-configuration) end-point
 SECURITY_OAUTH2_CLIENTID: ".apps.googleusercontent.com" # Client ID from your provider
- SECURITY_OAUTH2_CLIENTSECRET: "" # Client Secret from your provider
+ SECURITY_OAUTH2_CLIENTSECRET: "" # Client Secret from your provider
 SECURITY_OAUTH2_SCOPES: "openid,profile,email" # Expected OAuth2 Scope
 SECURITY_OAUTH2_USEASUSERNAME: "email" # Default is 'email'; custom fields can be used as the username
 SECURITY_OAUTH2_PROVIDER: "google" # Set this to your OAuth provider's name, e.g., 'google' or 'keycloak'
diff --git a/exampleYmlFiles/docker-compose-latest-security.yml b/exampleYmlFiles/docker-compose-latest-security.yml
index c6589ab9c..61315b885 100644
--- a/exampleYmlFiles/docker-compose-latest-security.yml
+++ b/exampleYmlFiles/docker-compose-latest-security.yml
@@ -1,7 +1,8 @@
 services:
 stirling-pdf:
 container_name: Stirling-PDF-Security
- image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+ # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+ image: ghcr.io/stirling-tools/stirling-pdf-test:latest
 deploy:
 resources:
 limits:
diff --git a/exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml b/exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml
index fe839d941..237ac0047 100644
---
a/exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml
+++ b/exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml
@@ -1,7 +1,8 @@
 services:
 stirling-pdf:
 container_name: Stirling-PDF-Ultra-Lite-Security
- image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
+ # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
+ image: ghcr.io/stirling-tools/stirling-pdf-test:ultra-lite
 deploy:
 resources:
 limits:
diff --git a/exampleYmlFiles/docker-compose-latest-ultra-lite.yml b/exampleYmlFiles/docker-compose-latest-ultra-lite.yml
index a3710ad82..d6a187307 100644
--- a/exampleYmlFiles/docker-compose-latest-ultra-lite.yml
+++ b/exampleYmlFiles/docker-compose-latest-ultra-lite.yml
@@ -1,7 +1,8 @@
 services:
 stirling-pdf:
 container_name: Stirling-PDF-Ultra-Lite
- image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
+ # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite
+ image: ghcr.io/stirling-tools/stirling-pdf-test:ultra-lite
 deploy:
 resources:
 limits:
diff --git a/exampleYmlFiles/docker-compose-latest.yml b/exampleYmlFiles/docker-compose-latest.yml
index a68da538a..a3985ff10 100644
--- a/exampleYmlFiles/docker-compose-latest.yml
+++ b/exampleYmlFiles/docker-compose-latest.yml
@@ -1,7 +1,8 @@
 services:
 stirling-pdf:
 container_name: Stirling-PDF
- image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+ # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest
+ image: ghcr.io/stirling-tools/stirling-pdf-test:latest
 deploy:
 resources:
 limits:
diff --git a/exampleYmlFiles/test_cicd.yml b/exampleYmlFiles/test_cicd.yml
index 086f862d5..7f58fc1a2 100644
--- a/exampleYmlFiles/test_cicd.yml
+++ b/exampleYmlFiles/test_cicd.yml
@@ -1,7 +1,8 @@
 services:
 stirling-pdf:
 container_name: Stirling-PDF-Security-Fat-with-login
- image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+ # image: docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat
+
image: ghcr.io/stirling-tools/stirling-pdf-test:fat
 deploy:
 resources:
 limits:
diff --git a/scripts/init-without-ocr.sh b/scripts/init-without-ocr.sh
index 73d9feb4a..d34010363 100644
--- a/scripts/init-without-ocr.sh
+++ b/scripts/init-without-ocr.sh
@@ -1,42 +1,188 @@
 #!/bin/bash
+# This script initializes Stirling PDF without OCR features.
+set -euo pipefail
-export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
-echo "running with JAVA_TOOL_OPTIONS ${JAVA_BASE_OPTS} ${JAVA_CUSTOM_OPTS}"
+log() { printf '%s\n' "$*" >&2; }
+command_exists() { command -v "$1" >/dev/null 2>&1; }
-# Update the user and group IDs as per environment variables
-if [ ! -z "$PUID" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
- usermod -o -u "$PUID" stirlingpdfuser || true
+SU_EXEC_BIN=""
+if command_exists su-exec; then
+ SU_EXEC_BIN="su-exec"
+elif command_exists gosu; then
+ SU_EXEC_BIN="gosu"
 fi
+CURRENT_USER="$(id -un)"
+CURRENT_UID="$(id -u)"
+SWITCH_USER_WARNING_EMITTED=false
-if [ ! -z "$PGID" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
- groupmod -o -g "$PGID" stirlingpdfgroup || true
-fi
-umask "$UMASK" || true
+warn_switch_user_once() {
+ if [ "$SWITCH_USER_WARNING_EMITTED" = false ]; then
+ log "WARNING: Unable to switch to user ${RUNTIME_USER:-stirlingpdfuser}; running command as ${CURRENT_USER}."
+ SWITCH_USER_WARNING_EMITTED=true
+ fi
+}
-if [[ "$INSTALL_BOOK_AND_ADVANCED_HTML_OPS" == "true" && "$FAT_DOCKER" != "true" ]]; then
- echo "issue with calibre in current version, feature currently disabled on Stirling-PDF"
- #apk add --no-cache calibre@testing
+run_as_runtime_user() {
+ if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
+ "$@"
+ elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
+ "$SU_EXEC_BIN" "$RUNTIME_USER" "$@"
+ else
+ warn_switch_user_once
+ "$@"
+ fi
+}
+
+# ---------- VERSION_TAG ----------
+# Load VERSION_TAG from file if not provided via environment.
+if [ -z "${VERSION_TAG:-}" ] && [ -f /etc/stirling_version ]; then
+ VERSION_TAG="$(tr -d '\r\n' < /etc/stirling_version)"
+ export VERSION_TAG
 fi
-if [[ "$FAT_DOCKER" != "true" ]]; then
- /scripts/download-security-jar.sh
-fi
+# ---------- JAVA_OPTS ----------
+# Configure Java runtime options.
+export JAVA_TOOL_OPTIONS="${JAVA_BASE_OPTS:-} ${JAVA_CUSTOM_OPTS:-}"
+export JAVA_TOOL_OPTIONS="-Djava.awt.headless=true ${JAVA_TOOL_OPTIONS}"
+log "running with JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
+log "Running Stirling PDF with DISABLE_ADDITIONAL_FEATURES=${DISABLE_ADDITIONAL_FEATURES:-} and VERSION_TAG=${VERSION_TAG:-}"
-if [[ -n "$LANGS" ]]; then
- /scripts/installFonts.sh $LANGS
-fi
+# ---------- UMASK ----------
+# Set default permissions mask.
+UMASK_VAL="${UMASK:-022}"
+umask "$UMASK_VAL" 2>/dev/null || umask 022
-echo "Setting permissions and ownership for necessary directories..."
-# Ensure temp directory exists and has correct permissions
-mkdir -p /tmp/stirling-pdf || true
-# Attempt to change ownership of directories and files
-if chown -R stirlingpdfuser:stirlingpdfgroup $HOME /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar; then
- chmod -R 755 /logs /scripts /usr/share/fonts/opentype/noto /configs /customFiles /pipeline /tmp/stirling-pdf /app.jar || true
- # If chown succeeds, execute the command as stirlingpdfuser
- exec su-exec stirlingpdfuser "$@"
+# ---------- XDG_RUNTIME_DIR ----------
+# Create the runtime directory, respecting UID/GID settings.
+RUNTIME_USER="stirlingpdfuser"
+if id -u "$RUNTIME_USER" >/dev/null 2>&1; then
+ RUID="$(id -u "$RUNTIME_USER")"
+ RGRP="$(id -gn "$RUNTIME_USER")"
 else
- # If chown fails, execute the command without changing the user context
- echo "[WARN] Chown failed, running as host user"
- exec "$@"
+ RUID="$(id -u)"
+ RGRP="$(id -gn)"
+ RUNTIME_USER="$(id -un)"
+fi
+CURRENT_USER="$(id -un)"
+CURRENT_UID="$(id -u)"
+
+export XDG_RUNTIME_DIR="/tmp/xdg-${RUID}"
+mkdir -p "${XDG_RUNTIME_DIR}" || true
+if [ "$(id -u)" -eq 0 ]; then
+ chown "${RUNTIME_USER}:${RGRP}" "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+fi
+chmod 700 "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+log "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}"
+
+# ---------- Optional ----------
+# Disable advanced HTML operations if required.
+if [[ "${INSTALL_BOOK_AND_ADVANCED_HTML_OPS:-false}" == "true" && "${FAT_DOCKER:-true}" != "true" ]]; then
+ log "issue with calibre in current version, feature currently disabled on Stirling-PDF"
+fi
+
+# Download security JAR in non-fat builds.
+if [[ "${FAT_DOCKER:-true}" != "true" && -x /scripts/download-security-jar.sh ]]; then
+ /scripts/download-security-jar.sh || true
+fi
+
+# ---------- UID/GID remap ----------
+# Remap user/group IDs to match container runtime settings.
+if [ "$(id -u)" -eq 0 ]; then
+ if id -u stirlingpdfuser >/dev/null 2>&1; then
+ if [ -n "${PUID:-}" ] && [ "$PUID" != "$(id -u stirlingpdfuser)" ]; then
+ usermod -o -u "$PUID" stirlingpdfuser || true
+ chown stirlingpdfuser:stirlingpdfgroup "${XDG_RUNTIME_DIR}" 2>/dev/null || true
+ fi
+ fi
+ if getent group stirlingpdfgroup >/dev/null 2>&1; then
+ if [ -n "${PGID:-}" ] && [ "$PGID" != "$(getent group stirlingpdfgroup | cut -d: -f3)" ]; then
+ groupmod -o -g "$PGID" stirlingpdfgroup || true
+ fi
+ fi
+fi
+
+# ---------- Permissions ----------
+# Ensure required directories exist and set correct permissions.
+log "Setting permissions..."
+mkdir -p /tmp/stirling-pdf /logs /configs /customFiles /pipeline || true
+CHOWN_PATHS=("$HOME" "/logs" "/scripts" "/configs" "/customFiles" "/pipeline" "/tmp/stirling-pdf" "/app.jar")
+[ -d /usr/share/fonts/truetype ] && CHOWN_PATHS+=("/usr/share/fonts/truetype")
+CHOWN_OK=true
+for p in "${CHOWN_PATHS[@]}"; do
+ if [ -e "$p" ]; then
+ chown -R "stirlingpdfuser:stirlingpdfgroup" "$p" 2>/dev/null || CHOWN_OK=false
+ chmod -R 755 "$p" 2>/dev/null || true
+ fi
+done
+
+# ---------- Xvfb ----------
+# Start a virtual framebuffer for GUI-based LibreOffice interactions.
+if command_exists Xvfb; then
+ log "Starting Xvfb on :99"
+ Xvfb :99 -screen 0 1024x768x24 -ac +extension GLX +render -noreset > /dev/null 2>&1 &
+ export DISPLAY=:99
+ sleep 1
+else
+ log "Xvfb not installed; skipping virtual display setup"
+fi
+
+# ---------- unoserver ----------
+# Start LibreOffice UNO server for document conversions.
+UNOSERVER_BIN="$(command -v unoserver || true)"
+UNOCONVERT_BIN="$(command -v unoconvert || true)"
+UNOSERVER_PID=""
+
+if [ -n "$UNOSERVER_BIN" ] && [ -n "$UNOCONVERT_BIN" ]; then
+ LIBREOFFICE_PROFILE="${HOME:-/home/${RUNTIME_USER}}/.libreoffice_uno_${RUID}"
+ run_as_runtime_user mkdir -p "$LIBREOFFICE_PROFILE"
+
+ log "Starting unoserver on 127.0.0.1:2003"
+ run_as_runtime_user "$UNOSERVER_BIN" \
+ --interface 127.0.0.1 \
+ --port 2003 \
+ --uno-port 2004 \
+ &
+ UNOSERVER_PID=$!
+ log "unoserver PID: $UNOSERVER_PID (Profile: $LIBREOFFICE_PROFILE)"
+
+ # Wait until UNO server is ready.
+ log "Waiting for unoserver..."
+ for _ in {1..20}; do
+ if run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
+ log "unoserver is ready!"
+ break
+ fi
+ sleep 1
+ done
+
+ if ! run_as_runtime_user "$UNOCONVERT_BIN" --version >/dev/null 2>&1; then
+ log "ERROR: unoserver failed!"
+ if [ -n "$UNOSERVER_PID" ]; then
+ kill "$UNOSERVER_PID" 2>/dev/null || true
+ wait "$UNOSERVER_PID" 2>/dev/null || true
+ fi
+ exit 1
+ fi
+else
+ log "unoserver/unoconvert not installed; skipping UNO setup"
+fi
+
+# ---------- Java ----------
+# Start Stirling PDF Java application.
+log "Starting Stirling PDF"
+JAVA_CMD=(
+ java
+ -Dfile.encoding=UTF-8
+ -Djava.io.tmpdir=/tmp/stirling-pdf
+ -jar /app.jar
+)
+
+if [ "$CURRENT_USER" = "$RUNTIME_USER" ]; then
+ exec "${JAVA_CMD[@]}"
+elif [ "$CURRENT_UID" -eq 0 ] && [ -n "$SU_EXEC_BIN" ]; then
+ exec "$SU_EXEC_BIN" "$RUNTIME_USER" "${JAVA_CMD[@]}"
+else
+ warn_switch_user_once
+ exec "${JAVA_CMD[@]}"
 fi
diff --git a/scripts/init.sh b/scripts/init.sh
index 24ca66cbe..80ed42015 100644
--- a/scripts/init.sh
+++ b/scripts/init.sh
@@ -1,36 +1,110 @@
 #!/bin/bash
+# This script initializes environment variables and paths,
+# prepares Tesseract data directories, and then runs the main init script.
-# Copy the original tesseract-ocr files to the volume directory without overwriting existing files
-echo "Copying original files without overwriting existing files"
-mkdir -p /usr/share/tessdata
-cp -rn /usr/share/tessdata-original/* /usr/share/tessdata
+set -euo pipefail
-if [ -d /usr/share/tesseract-ocr/4.00/tessdata ]; then
- cp -r /usr/share/tesseract-ocr/4.00/tessdata/* /usr/share/tessdata || true;
+append_env_path() {
+ local target="$1" current="$2" separator=":"
+ if [ -d "$target" ] && [[ ":${current}:" != *":${target}:"* ]]; then
+ if [ -n "$current" ]; then
+ printf '%s' "${target}${separator}${current}"
+ else
+ printf '%s' "${target}"
+ fi
+ else
+ printf '%s' "$current"
+ fi
+}
+
+python_site_dir() {
+ local venv_dir="$1"
+ local python_bin="$venv_dir/bin/python"
+ if [ -x "$python_bin" ]; then
+ local py_tag
+ if py_tag="$("$python_bin" -c 'import sys; print(f"python{sys.version_info.major}.{sys.version_info.minor}")' 2>/dev/null)" \
+ && [ -n "$py_tag" ] \
+ && [ -d "$venv_dir/lib/$py_tag/site-packages" ]; then
+ printf '%s' "$venv_dir/lib/$py_tag/site-packages"
+ fi
+ fi
+}
+
+# === LD_LIBRARY_PATH ===
+# Adjust the library path depending on CPU architecture.
+ARCH=$(uname -m)
+case "$ARCH" in
+ x86_64)
+ [ -d /usr/lib/x86_64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+ ;;
+ aarch64)
+ [ -d /usr/lib/aarch64-linux-gnu ] && export LD_LIBRARY_PATH="/usr/lib/aarch64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+ ;;
+esac
+
+# Add LibreOffice program directory to library path if available.
+if [ -d /usr/lib/libreoffice/program ]; then
+ export LD_LIBRARY_PATH="/usr/lib/libreoffice/program${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
+fi
+
+# === Python PATH ===
+# Add virtual environments to PATH and PYTHONPATH.
+for dir in /opt/venv/bin /opt/unoserver-venv/bin; do
+ PATH="$(append_env_path "$dir" "$PATH")"
+done
+export PATH
+
+PYTHON_PATH_ENTRIES=()
+for venv in /opt/venv /opt/unoserver-venv; do
+ if [ -d "$venv" ]; then
+ site_dir="$(python_site_dir "$venv")"
+ [ -n "${site_dir:-}" ] && PYTHON_PATH_ENTRIES+=("$site_dir")
+ fi
+done
+if [ ${#PYTHON_PATH_ENTRIES[@]} -gt 0 ]; then
+ PYTHONPATH="$(IFS=:; printf '%s' "${PYTHON_PATH_ENTRIES[*]}")${PYTHONPATH:+:$PYTHONPATH}"
+ export PYTHONPATH
+fi
+
+# # === tessdata ===
+# # Prepare Tesseract OCR data directory.
+REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
+SEC_TESSDATA="/usr/share/tessdata"
+
+log_warn() {
+ echo "[init][warn] $*" >&2
+}
+
+if [ -d "$REAL_TESSDATA" ] && [ -w "$REAL_TESSDATA" ]; then
+ log_warn "Skipping tessdata adjustments; directory writable: $REAL_TESSDATA"
+else
+ log_warn "Skipping tessdata adjustments; directory missing or not writable: $REAL_TESSDATA"
 fi
 if [ -d /usr/share/tesseract-ocr/5/tessdata ]; then
- cp -r /usr/share/tesseract-ocr/5/tessdata/* /usr/share/tessdata || true;
+ REAL_TESSDATA="/usr/share/tesseract-ocr/5/tessdata"
+ log_warn "Using /usr/share/tesseract-ocr/5/tessdata as TESSDATA_PREFIX"
+elif [ -d /usr/share/tessdata ]; then
+ REAL_TESSDATA="/usr/share/tessdata"
+ log_warn "Using /usr/share/tessdata as TESSDATA_PREFIX"
+elif [ -d /tessdata ]; then
+ REAL_TESSDATA="/tessdata"
+ log_warn "Using /tessdata as TESSDATA_PREFIX"
+else
+ REAL_TESSDATA=""
+ log_warn "No tessdata directory found"
 fi
-# Check if TESSERACT_LANGS environment variable is set and is not empty
-if [[ -n "$TESSERACT_LANGS" ]]; then
- # Convert comma-separated values to a space-separated list
- SPACE_SEPARATED_LANGS=$(echo $TESSERACT_LANGS | tr ',' ' ')
- pattern='^[a-zA-Z]{2,4}(_[a-zA-Z]{2,4})?$'
- # Install each language pack
- for LANG in $SPACE_SEPARATED_LANGS; do
- if [[ $LANG =~ $pattern ]]; then
- apk add --no-cache "tesseract-ocr-data-$LANG"
- else
- echo "Skipping invalid language code"
- fi
- done
+if [ -n "$REAL_TESSDATA" ]; then
+ export TESSDATA_PREFIX="$REAL_TESSDATA"
 fi
-# Ensure temp directory exists with correct permissions before running main init
-mkdir -p /tmp/stirling-pdf || true
+# === Temp dir ===
+# Ensure the temporary directory exists and has proper permissions.
+mkdir -p /tmp/stirling-pdf
 chown -R stirlingpdfuser:stirlingpdfgroup /tmp/stirling-pdf || true
 chmod -R 755 /tmp/stirling-pdf || true
-/scripts/init-without-ocr.sh "$@"
+# === Start application ===
+# Run the main init script that handles the full startup logic.
+exec /scripts/init-without-ocr.sh
diff --git a/testing/allEndpointsRemovedSettings.yml b/testing/allEndpointsRemovedSettings.yml
index 58e7fd9f9..bcc1e6e37 100644
--- a/testing/allEndpointsRemovedSettings.yml
+++ b/testing/allEndpointsRemovedSettings.yml
@@ -140,6 +140,9 @@ system:
 operations:
 weasyprint: '' # Defaults to /opt/venv/bin/weasyprint
 unoconvert: '' # Defaults to /opt/venv/bin/unoconvert
+ calibre: '' # Defaults to /usr/bin/ebook-convert
+ ocrmypdf: '' # Defaults to /usr/bin/ocrmypdf
+ soffice: '' # Defaults to /usr/bin/soffice
 fileUploadLimit: '' # Defaults to "". No limit when string is empty. Set a number, between 0 and 999, followed by one of the following strings to set a limit. "KB", "MB", "GB".
 tempFileManagement:
 baseTmpDir: '' # Defaults to java.io.tmpdir/stirling-pdf
diff --git a/testing/test.sh b/testing/test.sh
index d4adce375..cb24db902 100644
--- a/testing/test.sh
+++ b/testing/test.sh
@@ -16,27 +16,47 @@ find_root() {
 PROJECT_ROOT=$(find_root)
-# Function to check the health of the service with a timeout of 80 seconds
+# Function to check application readiness via HTTP instead of Docker's health status
 check_health() {
- local service_name=$1
+ local container_name=$1 # real container name
 local compose_file=$2
- local end=$((SECONDS+60))
+ local timeout=80 # total timeout in seconds
+ local interval=3 # poll interval in seconds
+ local end=$((SECONDS + timeout))
+ local last_code="000"
- echo -n "Waiting for $service_name to become healthy..."
- until [ "$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}healthy{{end}}' "$service_name")" == "healthy" ] || [ $SECONDS -ge $end ]; do
- sleep 3
- echo -n "."
- if [ $SECONDS -ge $end ]; then
- echo -e "\n$service_name health check timed out after 80 seconds."
- echo "Printing logs for $service_name:"
- docker logs "$service_name"
- return 1
+ echo "Waiting for $container_name to become reachable on http://localhost:8080/ (timeout ${timeout}s)..."
+    while [ $SECONDS -lt $end ]; do
+        # Optional: check if container is running at all (nice for debugging)
+        if ! docker ps --format '{{.Names}}' | grep -Fxq "$container_name"; then
+            echo " Container $container_name not running yet (still waiting)..."
         fi
+
+        # Try simple HTTP GET on the root page
+        last_code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:8080/") || last_code="000"
+
+        # Treat any 2xx or 3xx as "ready"
+        if [ "$last_code" -ge 200 ] && [ "$last_code" -lt 400 ]; then
+            echo "$container_name is reachable over HTTP (status $last_code)."
+            echo "Printing logs for $container_name:"
+            docker logs "$container_name" || true
+            return 0
+        fi
+
+        echo " Still waiting for HTTP readiness, current status: $last_code"
+        sleep "$interval"
     done
-    echo -e "\n$service_name is healthy!"
-    echo "Printing logs for $service_name:"
-    docker logs "$service_name"
-    return 0
+
+    echo "$container_name did not become HTTP-ready within ${timeout}s (last HTTP status: $last_code)."
+
+    # For extra debugging: show Docker health status, but DO NOT depend on it
+    local docker_health
+    docker_health=$(docker inspect --format='{{if .State.Health}}{{.State.Health.Status}}{{else}}(no healthcheck){{end}}' "$container_name" 2>/dev/null || echo "inspect failed")
+    echo "Docker-reported health status for $container_name: $docker_health"
+
+    echo "Printing logs for $container_name:"
+    docker logs "$container_name" || true
+    return 1
 }
 # Function to capture file list from a Docker container
@@ -48,7 +68,7 @@ capture_file_list() {
     # Get all files in one command, output directly from Docker to avoid path issues
     # Skip proc, sys, dev, and the specified LibreOffice config directory
     # Also skip PDFBox and LibreOffice temporary files
-    docker exec $container_name sh -c "find / -type f \
+    docker exec "$container_name" sh -c "find / -type f \
         -not -path '*/proc/*' \
         -not -path '*/sys/*' \
         -not -path '*/dev/*' \
@@ -69,7 +89,7 @@ capture_file_list() {
         echo "Trying alternative approach..."
         # Alternative simpler approach - just get paths as a fallback
-        docker exec $container_name sh -c "find / -type f \
+        docker exec "$container_name" sh -c "find / -type f \
             -not -path '*/proc/*' \
             -not -path '*/sys/*' \
             -not -path '*/dev/*' \
@@ -106,14 +126,8 @@ compare_file_lists() {
     # Check if files exist and have content
     if [ ! -s "$before_file" ] || [ ! -s "$after_file" ]; then
         echo "WARNING: One or both file lists are empty."
-
-        if [ ! -s "$before_file" ]; then
-            echo "Before file is empty: $before_file"
-        fi
-
-        if [ ! -s "$after_file" ]; then
-            echo "After file is empty: $after_file"
-        fi
+        if [ ! -s "$before_file" ]; then echo "Before file is empty: $before_file"; fi
+        if [ ! -s "$after_file" ]; then echo "After file is empty: $after_file"; fi
         # Create empty diff file
         > "$diff_file"
@@ -132,7 +146,6 @@ compare_file_lists() {
                 echo "No temporary files found in the after snapshot."
             fi
         fi
-
         return 0
     fi
@@ -169,7 +182,6 @@ compare_file_lists() {
     else
         echo "No file changes detected during test."
     fi
-
     return 0
 }
@@ -220,19 +232,33 @@ verify_app_version() {
 # Function to test a Docker Compose configuration
 test_compose() {
     local compose_file=$1
-    local service_name=$2
+    local test_name=$2
     local status=0
-    echo "Testing $compose_file configuration..."
+    echo "Testing ${compose_file} configuration..."
     # Start up the Docker Compose service
     docker-compose -f "$compose_file" up -d
-    # Wait for the service to become healthy
-    if check_health "$service_name" "$compose_file"; then
-        echo "$service_name test passed."
+    # Wait a moment for containers to appear
+    sleep 3
+
+    local container_name
+    container_name=$(docker-compose -f "$compose_file" ps --format '{{.Names}}' --filter "status=running" | head -n1)
+
+    if [[ -z "$container_name" ]]; then
+        echo "ERROR: No running container found for ${compose_file}"
+        docker-compose -f "$compose_file" ps
+        return 1
+    fi
+
+    echo "Started container: $container_name"
+
+    # Wait for the service to become healthy (HTTP-based)
+    if check_health "$container_name" "$compose_file"; then
+        echo "${test_name} test passed."
     else
-        echo "$service_name test failed."
+        echo "${test_name} test failed."
         status=1
     fi
@@ -246,7 +272,6 @@ declare -a failed_tests
 run_tests() {
     local test_name=$1
     local compose_file=$2
-
     if test_compose "$compose_file" "$test_name"; then
         passed_tests+=("$test_name")
     else
@@ -254,18 +279,18 @@ run_tests() {
     fi
 }
-
 # Main testing routine
 main() {
     SECONDS=0
-
     cd "$PROJECT_ROOT"
     export DOCKER_CLI_EXPERIMENTAL=enabled
     export COMPOSE_DOCKER_CLI_BUILD=0
-    export DISABLE_ADDITIONAL_FEATURES=true
-    # Run the gradlew build command and check if it fails
+    # ==================================================================
+    # 1. Ultra-Lite (no additional features)
+    # ==================================================================
+    export DISABLE_ADDITIONAL_FEATURES=true
     if ! ./gradlew clean build; then
         echo "Gradle build failed with security disabled, exiting script."
         exit 1
@@ -276,11 +301,12 @@ main() {
     EXPECTED_VERSION=$(get_expected_version)
     echo "Expected version: $EXPECTED_VERSION"
-    # Building Docker images
-    # docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .
-    docker build --build-arg VERSION_TAG=alpha -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .
+    # Build Ultra-Lite image (GHCR tag, matching docker-compose-latest-ultra-lite.yml)
+    docker build --build-arg VERSION_TAG=alpha \
+        -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:ultra-lite \
+        -f ./Dockerfile.ultra-lite .
-    # Test each configuration
+    # Test Ultra-Lite configuration
     run_tests "Stirling-PDF-Ultra-Lite" "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml"
     echo "Testing webpage accessibility..."
@@ -302,36 +328,27 @@ main() {
         echo "Version verification failed for Stirling-PDF-Ultra-Lite"
     fi
-    docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml" down
-
-    # run_tests "Stirling-PDF" "./exampleYmlFiles/docker-compose-latest.yml"
-    # docker-compose -f "./exampleYmlFiles/docker-compose-latest.yml" down
+    docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite.yml" down -v
+    # ==================================================================
+    # 2. Full Fat + Security
+    # ==================================================================
     export DISABLE_ADDITIONAL_FEATURES=false
-    # Run the gradlew build command and check if it fails
     if ! ./gradlew clean build; then
         echo "Gradle build failed with security enabled, exiting script."
         exit 1
     fi
-    # Get expected version after the security-enabled build
     echo "Getting expected version from Gradle (security enabled)..."
     EXPECTED_VERSION=$(get_expected_version)
     echo "Expected version with security enabled: $EXPECTED_VERSION"
-    # Building Docker images with security enabled
-    # docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .
-    # docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .
-    docker build --no-cache --pull --build-arg VERSION_TAG=alpha -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:latest-fat -f ./Dockerfile.fat .
-
-
-    # Test each configuration with security
-    # run_tests "Stirling-PDF-Ultra-Lite-Security" "./exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml"
-    # docker-compose -f "./exampleYmlFiles/docker-compose-latest-ultra-lite-security.yml" down
-    # run_tests "Stirling-PDF-Security" "./exampleYmlFiles/docker-compose-latest-security.yml"
-    # docker-compose -f "./exampleYmlFiles/docker-compose-latest-security.yml" down
-
+    # Build Fat (Security) image for GHCR tag used in all 'fat' compose files
+    docker build --no-cache --pull --build-arg VERSION_TAG=alpha \
+        -t docker.stirlingpdf.com/stirlingtools/stirling-pdf:fat \
+        -f ./Dockerfile.fat .
+    # Test fat + security compose
     run_tests "Stirling-PDF-Security-Fat" "./exampleYmlFiles/docker-compose-latest-fat-security.yml"
     echo "Testing webpage accessibility..."
@@ -353,54 +370,50 @@ main() {
         echo "Version verification failed for Stirling-PDF-Security-Fat"
     fi
-    docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-security.yml" down
+    docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-security.yml" down -v
+    # ==================================================================
+    # 3. Regression test with login (test_cicd.yml)
+    # ==================================================================
     run_tests "Stirling-PDF-Security-Fat-with-login" "./exampleYmlFiles/test_cicd.yml"
-    if [ $? -eq 0 ]; then
-        # Create directory for file snapshots if it doesn't exist
+    # Only run behave tests if the container started successfully
+    if [[ " ${passed_tests[*]} " =~ "Stirling-PDF-Security-Fat-with-login" ]]; then
+
+        CONTAINER_NAME=$(docker-compose -f "./exampleYmlFiles/test_cicd.yml" ps --format '{{.Names}}' --filter "status=running" | head -n1)
+
         SNAPSHOT_DIR="$PROJECT_ROOT/testing/file_snapshots"
         mkdir -p "$SNAPSHOT_DIR"
-        # Capture file list before running behave tests
         BEFORE_FILE="$SNAPSHOT_DIR/files_before_behave.txt"
         AFTER_FILE="$SNAPSHOT_DIR/files_after_behave.txt"
         DIFF_FILE="$SNAPSHOT_DIR/files_diff.txt"
-        # Define container name variable for consistency
-        CONTAINER_NAME="Stirling-PDF-Security-Fat-with-login"
-
         capture_file_list "$CONTAINER_NAME" "$BEFORE_FILE"
         cd "testing/cucumber"
         if python -m behave; then
-            # Wait 10 seconds before capturing the file list after tests
             echo "Waiting 5 seconds for any file operations to complete..."
             sleep 5
-            # Capture file list after running behave tests
             cd "$PROJECT_ROOT"
             capture_file_list "$CONTAINER_NAME" "$AFTER_FILE"
-            # Compare file lists
             if compare_file_lists "$BEFORE_FILE" "$AFTER_FILE" "$DIFF_FILE" "$CONTAINER_NAME"; then
                 echo "No unexpected temporary files found."
-                passed_tests+=("Stirling-PDF-Regression")
+                passed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
             else
                 echo "WARNING: Unexpected temporary files detected after behave tests!"
                 failed_tests+=("Stirling-PDF-Regression-Temp-Files")
             fi
-
-            passed_tests+=("Stirling-PDF-Regression")
+            passed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
         else
-            failed_tests+=("Stirling-PDF-Regression")
+            failed_tests+=("Stirling-PDF-Regression $CONTAINER_NAME")
             echo "Printing docker logs of failed regression"
             docker logs "$CONTAINER_NAME"
             echo "Printed docker logs of failed regression"
-            # Still capture file list after failure for analysis
-            # Wait 10 seconds before capturing the file list
-            echo "Waiting 5 seconds before capturing file list..."
+            echo "Waiting 10 seconds before capturing file list..."
             sleep 10
             cd "$PROJECT_ROOT"
@@ -408,9 +421,11 @@ main() {
             compare_file_lists "$BEFORE_FILE" "$AFTER_FILE" "$DIFF_FILE" "$CONTAINER_NAME"
         fi
     fi
+    docker-compose -f "./exampleYmlFiles/test_cicd.yml" down -v
-    docker-compose -f "./exampleYmlFiles/test_cicd.yml" down
-
+    # ==================================================================
+    # 4. Disabled Endpoints Test
+    # ==================================================================
     run_tests "Stirling-PDF-Fat-Disable-Endpoints" "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml"
     echo "Testing disabled endpoints..."
@@ -430,27 +445,27 @@ main() {
         echo "Version verification failed for Stirling-PDF-Fat-Disable-Endpoints"
     fi
-    docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml" down
+    docker-compose -f "./exampleYmlFiles/docker-compose-latest-fat-endpoints-disabled.yml" down -v
-    # Report results
+    # ==================================================================
+    # Final Report
+    # ==================================================================
     echo "All tests completed in $SECONDS seconds."
-
     if [ ${#passed_tests[@]} -ne 0 ]; then
         echo "Passed tests:"
+        for test in "${passed_tests[@]}"; do
+            echo -e "\e[32m$test\e[0m"
+        done
     fi
-    for test in "${passed_tests[@]}"; do
-        echo -e "\e[32m$test\e[0m" # Green color for passed tests
-    done
     if [ ${#failed_tests[@]} -ne 0 ]; then
         echo "Failed tests:"
+        for test in "${failed_tests[@]}"; do
+            echo -e "\e[31m$test\e[0m"
+        done
     fi
-    for test in "${failed_tests[@]}"; do
-        echo -e "\e[31m$test\e[0m" # Red color for failed tests
-    done
-
-    # Check if there are any failed tests and exit with an error code if so
     if [ ${#failed_tests[@]} -ne 0 ]; then
         echo "Some tests failed."
         exit 1