feat(aot): add aot-diagnostics.sh for AOT cache diagnostics and validation (#5848)

# Description of Changes



This pull request reworks the Docker build for the embedded Stirling-PDF
image to improve build efficiency, shrink the runtime image, and ease
maintenance. Key changes include upgrading Calibre, adding optional
stripping of Calibre's WebEngine to reduce image size, consolidating
ImageMagick layers, and moving the Python environment build into its own
stage. The runtime image is now leaner, with a clearer separation between
build-time and runtime dependencies and better layer caching for faster
builds and pulls. It also adds `scripts/aot-diagnostics.sh`, a diagnostic
tool for the Project Leyden AOT cache.

**Build and Dependency Management Improvements**
* Upgraded Calibre to version `9.4.0` and added support for the
`TARGETPLATFORM` build argument for multi-platform builds.
* Added an optional `CALIBRE_STRIP_WEBENGINE` build argument to strip
Chromium/WebEngine from Calibre, saving ~80 MB when PDF output via
Calibre is not needed.
* Consolidated ImageMagick outputs into a single staging directory
(`/magick-export`) to reduce Docker layers and improve caching
efficiency.
* Refactored Python virtual environment build: now built in a dedicated
stage with pre-built wheels and copied into the runtime image,
eliminating the need for build tools and pip installs at runtime.
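The `/magick-export` staging approach works because the staging directory mirrors the final image layout, so a single `COPY --link --from=pdf-tools-build /magick-export/ /` drops every file at its destination, including `magick`, which moves from `/usr/local/bin` to `/usr/bin`. The sketch below shows that layout rule in plain shell. It assumes a hypothetical `stage_files` helper that is not part of the PR, uses a throwaway directory in place of the real build stage, and does not handle globs such as `libMagick*.so*`; the caller would have to expand those.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the PR): copy files from SRC_ROOT into
# STAGE_ROOT so the staging tree mirrors the final image layout.
# Each SPEC is either "/path" (same location in the image) or
# "/src/path=/dest/path" (relocate, like magick -> /usr/bin/magick).
stage_files() {
  local src_root="$1" stage_root="$2"
  shift 2
  local spec src dest
  for spec in "$@"; do
    if [[ "$spec" == *=* ]]; then
      src="${spec%%=*}"
      dest="${spec#*=}"
    else
      src="$spec"
      dest="$spec"
    fi
    mkdir -p "${stage_root}$(dirname "$dest")"
    cp -a "${src_root}${src}" "${stage_root}${dest}"
  done
}

# Demo: a temp tree stands in for the pdf-tools-build stage.
tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/build/usr/local/bin" "$tmp/build/usr/local/etc/ImageMagick-7"
echo 'fake binary' > "$tmp/build/usr/local/bin/magick"
echo '<policymap/>' > "$tmp/build/usr/local/etc/ImageMagick-7/policy.xml"

stage_files "$tmp/build" "$tmp/magick-export" \
  "/usr/local/bin/magick=/usr/bin/magick" \
  "/usr/local/etc/ImageMagick-7"

# Every staged file now sits at its final path relative to magick-export/.
find "$tmp/magick-export" -type f | sed "s|^$tmp/magick-export||" | sort
```

Because each entry is copied to its final path up front, the runtime stage needs one `--link` layer instead of four, and that layer can be cached and pulled independently of the rest of the image.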

**Runtime Image Optimization**
* Reduced installed system packages to only what is needed at runtime;
Python build tools and dev packages (`python3-dev`, `python3-venv`) are no
longer installed.
* Cleaned up unnecessary runtime files, including removal of build-only
Python artifacts and system headers, for a smaller and more secure
image.
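The Python-artifact pruning now runs in the `python-venv-build` stage, as the `find`/`rm -rf` chain after `pip install`. The sketch below packages the same steps as a reusable function. It assumes a hypothetical `prune_venv` helper and a throwaway directory standing in for `/opt/venv`. It omits the stage's final `strip --strip-unneeded` pass over `.so` files, and it uses `-prune` to avoid descending into directories it is deleting instead of relying on `2>/dev/null || true`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the cleanup chain in python-venv-build:
# drop bytecode caches, type stubs and the installer packages from a venv.
prune_venv() {
  local venv="$1"
  # __pycache__ dirs are recreated on import unless PYTHONDONTWRITEBYTECODE=1.
  find "$venv" -type d -name __pycache__ -prune -exec rm -rf {} +
  # Stray bytecode and .pyi type stubs are not needed to run anything.
  find "$venv" -type f \( -name '*.pyc' -o -name '*.pyi' \) -delete
  # pip/setuptools are dead weight once the venv is frozen into an image.
  rm -rf "$venv"/lib/python*/site-packages/{pip,setuptools} \
         "$venv"/lib/python*/site-packages/{pip,setuptools}-*.dist-info
}

# Demo: a fake venv in a temp dir stands in for /opt/venv.
venv="$(mktemp -d)"
trap 'rm -rf "$venv"' EXIT
sp="$venv/lib/python3.12/site-packages"
mkdir -p "$sp/pkg/__pycache__" "$sp/pip" "$sp/pip-24.0.dist-info"
touch "$sp/pkg/__init__.py" "$sp/pkg/__init__.pyi" \
      "$sp/pkg/__pycache__/__init__.cpython-312.pyc"
prune_venv "$venv"
find "$venv" -mindepth 1 | sed "s|^$venv/||" | sort
```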

**Layer and Copy Optimization**
* Switched to `COPY --link` for all major external tool layers and
application files, enabling independent layer caching and parallel pulls
for faster builds.

**Runtime Configuration and Health**
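* Improved runtime directory structure and permissions, adding persistent
`/configs/cache` and `/configs/heap_dumps` directories: the Project Leyden
AOT cache is now stored at `/configs/cache/stirling.aot`, and JVM heap
dumps go to `/configs/heap_dumps` instead of `/tmp/stirling-pdf/heap_dumps`,
so both survive restarts when `/configs` is a mounted volume.
* Wrote the version tag to `/etc/stirling_version` so scripts can read it
without relying on environment-variable inheritance.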
* Relaxed the healthcheck for slower startups: `--start-period` raised from
40s to 120s, `--timeout` from 10s to 15s, and `--retries` from 3 to 5;
`curl` now also gets a 10s `--max-time`.
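According to the Dockerfile comment, `init-without-ocr.sh` reads `/etc/stirling_version` for the AOT cache fingerprint. `aot-diagnostics.sh` recomputes a fingerprint from the JDK banner, architecture, JVM flags, app signature and version, then hashes it with `md5sum` truncated to 16 hex characters. The sketch below reduces that idea to three inputs. It assumes hypothetical `read_version` and `aot_fingerprint` helpers and resolves the version in the order the diagnostics script reports it (file first, then `VERSION_TAG`). Because it covers only a subset of the inputs, it will not reproduce the fingerprint the real scripts store.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: /etc/stirling_version first (written at build time),
# then $VERSION_TAG, else "unknown". The path is a parameter for testing.
read_version() {
  local file="${1:-/etc/stirling_version}"
  if [ -f "$file" ]; then
    tr -d '\r\n' < "$file"
  else
    printf '%s' "${VERSION_TAG:-unknown}"
  fi
}

# Hypothetical, simplified fingerprint: any change to these inputs yields a
# different hash, which marks the cache as stale. aot-diagnostics.sh also folds
# in the CompactObjectHeaders/CompressedOops flags and an app signature.
aot_fingerprint() {
  local jdk="$1" arch="$2" version="$3"
  local fp="jdk:${jdk};arch:${arch};ver:${version};"
  if command -v md5sum >/dev/null 2>&1; then
    printf '%s' "$fp" | md5sum | cut -c1-16
  else
    printf '%s' "$fp" | sha256sum | cut -c1-16
  fi
}

# Demo: a temp file stands in for /etc/stirling_version; "1.2.3" is a placeholder.
tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
printf '1.2.3\n' > "$tmp/stirling_version"
ver="$(read_version "$tmp/stirling_version")"
fp_a="$(aot_fingerprint 'openjdk version "25"' aarch64 "$ver")"
fp_b="$(aot_fingerprint 'openjdk version "25"' x86_64 "$ver")"
echo "version=$ver aarch64=$fp_a x86_64=$fp_b"
```

Reading the version from a file instead of the environment means the value is available to any process in the container, including scripts launched without the image's `ENV` block.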


---

## Checklist

### General

- [ ] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### Translations (if applicable)

- [ ] I ran
[`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.

---------

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
Balázs Szücs
2026-03-03 20:06:46 +01:00
committed by GitHub
parent 1b68a513a9
commit 9ac260ee92
5 changed files with 856 additions and 225 deletions


@@ -1,8 +1,9 @@
# Stirling-PDF - Full version (embedded frontend)
FROM ubuntu:noble AS calibre-build
ARG CALIBRE_VERSION=9.3.1
ARG TARGETPLATFORM
ARG CALIBRE_VERSION=9.4.0
ARG CALIBRE_STRIP_WEBENGINE=false
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
set -eux; \
@@ -27,7 +28,7 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
tar xJf /tmp/calibre.txz -C /opt/calibre; \
rm /tmp/calibre.txz; \
\
# We only need Qt6 WebEngine (Chromium) for ebookPDF output.
# We only need Qt6 WebEngine (Chromium) for ebook->PDF output.
# PDF INPUT now uses the pdftohtml engine (poppler), not Qt.
rm -f /opt/calibre/lib/libQt6Designer* \
/opt/calibre/lib/libQt6Multimedia* \
@@ -229,7 +230,7 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
find /opt/calibre -name '*.pyc' -delete 2>/dev/null || true; \
\
# ── Verify conversion still works ──
# NOTE: txtepub used intentionally NOT txtpdf.
# NOTE: txt->epub used intentionally NOT txt->pdf.
# Calibre 7+ uses WebEngine (Chromium) for PDF output, which requires kernel
# capabilities unavailable in Docker RUN steps and segfaults under QEMU.
# epub output exercises the same Python/plugin stack without touching WebEngine.
@@ -242,6 +243,21 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
pdftohtml -v >/dev/null 2>&1 && echo "pdftohtml OK" || { echo "ERROR: pdftohtml not found"; exit 1; }; \
echo "=== Calibre stripped successfully ==="
# Optional: strip Chromium/WebEngine (~80 MB savings) when PDF output via Calibre is not needed.
# Build with --build-arg CALIBRE_STRIP_WEBENGINE=true to enable.
RUN if [ "${CALIBRE_STRIP_WEBENGINE}" = "true" ]; then \
echo "Stripping Calibre WebEngine (Chromium), PDF output via Calibre will be disabled"; \
rm -rf /opt/calibre/lib/qt6/libexec/QtWebEngineProcess \
/opt/calibre/lib/qt6/resources \
/opt/calibre/lib/libQt6WebEngine*.so.* \
/opt/calibre/lib/libQt6Quick*.so.* \
/opt/calibre/lib/libQt6Qml*.so.* \
/opt/calibre/translations/qtwebengine_locales 2>/dev/null || true; \
echo "WebEngine stripped, Calibre PDF output disabled"; \
else \
echo "CALIBRE_STRIP_WEBENGINE=false, keeping WebEngine for PDF output"; \
fi
# Build the Java application and frontend.
FROM gradle:9.3.1-jdk25 AS app-build
@@ -294,6 +310,7 @@ RUN java -Djarmode=tools -jar app.jar extract --layers --destination /layers
# Build Ghostscript 10.06.0 from source in an isolated stage (avoids library conflicts).
FROM ubuntu:noble AS gs-build
ARG TARGETPLATFORM
ARG GS_VERSION=10.06.0
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/tmp/gs-build,id=gs-build-${TARGETPLATFORM:-local} \
@@ -316,6 +333,7 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
# Build PDF Tools (QPDF and ImageMagick 7).
FROM ubuntu:noble AS pdf-tools-build
ARG TARGETPLATFORM
ARG QPDF_VERSION=12.3.2
ARG IM_VERSION=7.1.2-13
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
@@ -346,6 +364,44 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
cd .. && \
ldconfig /usr/local/lib
# Stage ImageMagick outputs into a single directory so runtime can import them with one COPY
# (reduces 4 separate COPY layers to 1 independent --link layer).
RUN mkdir -p /magick-export/usr/bin \
/magick-export/usr/local/lib \
/magick-export/usr/local/etc && \
cp /usr/local/bin/magick /magick-export/usr/bin/ && \
cp -a /usr/local/lib/libMagick*.so* /magick-export/usr/local/lib/ && \
cp -a /usr/local/lib/ImageMagick-7* /magick-export/usr/local/lib/ && \
cp -a /usr/local/etc/ImageMagick-7 /magick-export/usr/local/etc/
# Build Python venv in an isolated stage so runtime image never needs build tools.
# Packages with native extensions (opencv, cryptography) use pre-built wheels (--prefer-binary).
# python3-uno is intentionally NOT installed here, it is a system package in the runtime stage
# and accessed via --system-site-packages at runtime.
FROM ubuntu:noble AS python-venv-build
ARG TARGETPLATFORM
ARG UNOSERVER_VERSION=3.6
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update && apt-get install -y --no-install-recommends \
python3 python3-venv ca-certificates binutils && \
rm -rf /var/lib/apt/lists/*
RUN --mount=type=cache,target=/root/.cache/pip,sharing=locked \
python3 -m venv /opt/venv --system-site-packages && \
/opt/venv/bin/pip install --no-cache-dir --prefer-binary \
weasyprint pdf2image opencv-python-headless ocrmypdf \
cryptography \
"unoserver==${UNOSERVER_VERSION}" && \
find /opt/venv -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true && \
find /opt/venv \( -name '*.pyc' -o -name '*.pyi' \) -delete 2>/dev/null || true && \
rm -rf /opt/venv/lib/python*/site-packages/pip \
/opt/venv/lib/python*/site-packages/pip-*.dist-info \
/opt/venv/lib/python*/site-packages/setuptools \
/opt/venv/lib/python*/site-packages/setuptools-*.dist-info && \
find /opt/venv -name '*.so' -exec strip --strip-unneeded {} + 2>/dev/null || true
# Final runtime image.
FROM eclipse-temurin:25-jre-noble AS runtime
@@ -377,10 +433,11 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
fonts-crosextra-caladea fonts-crosextra-carlito \
fonts-noto-core fonts-noto-mono fonts-noto-extra \
fonts-noto-cjk poppler-data \
# We install these via apt to avoid downloading "fat wheels" from pip
# python3-full replaced with minimal set
python3 python3-dev python3-venv python3-uno \
# Python dependencies via pip to avoid conflicts, so we don't install them here
# python3-uno required for UNO bridge (accessed by venv via --system-site-packages)
# python3-venv is NOT needed: the copied /opt/venv works without it at runtime
# python3-dev is NOT needed, venv is pre-built in python-venv-build stage
python3 python3-uno \
# Python packages are in /opt/venv (copied from python-venv-build stage below)
# OCR
tesseract-ocr tesseract-ocr-eng tesseract-ocr-deu tesseract-ocr-fra \
tesseract-ocr-por tesseract-ocr-chi-sim \
@@ -401,36 +458,21 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
; \
\
\
# Note: We do NOT install numpy/pillow/cv2 here; it uses the system versions
python3 -m venv /opt/venv --system-site-packages; \
/opt/venv/bin/pip install --no-cache-dir \
weasyprint pdf2image opencv-python-headless ocrmypdf \
cryptography \
"unoserver==${UNOSERVER_VERSION}"; \
\
ln -sf /opt/venv/bin/unoconvert /usr/local/bin/unoconvert; \
ln -sf /opt/venv/bin/unoserver /usr/local/bin/unoserver; \
\
# Verify and fix LibreOffice
libreoffice --version; \
soffice --version 2>/dev/null || true; \
# Rebuild UNO bridge type database
/usr/lib/libreoffice/program/soffice.bin --headless --convert-to pdf /dev/null 2>/dev/null || true; \
# Force font cache rebuild and verify filters are available
# Force font cache rebuild
fc-cache -f -v 2>&1 | awk 'NR <= 20'; \
/opt/venv/bin/python -c "import cv2; print('OpenCV', cv2.__version__)"; \
/opt/venv/bin/python -c "import ocrmypdf; print('ocrmypdf OK')"; \
\
# Cleanup stage.
\
# Remove build-only packages no longer needed at runtime.
apt-get remove --purge -y software-properties-common python3-dev || true; \
# Remove PPA helper, no longer needed after apt-get update
apt-get remove --purge -y software-properties-common || true; \
apt-get autoremove --purge -y || true; \
rm -rf /var/lib/apt/lists/*; \
\
# Remove C/C++ headers (no longer needed after pip install)
rm -rf /usr/include/*; \
\
# Docs / man / info / icons / themes / GUI assets (headless server)
rm -rf /usr/share/doc/* /usr/share/man/* /usr/share/info/* \
/usr/share/lintian/* /usr/share/linda/* \
@@ -499,15 +541,6 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
/usr/lib/libreoffice/program/libdbu* \
/usr/lib/libreoffice/program/libreport* 2>/dev/null || true; \
\
find /opt/venv -type d -name __pycache__ \
-exec rm -rf {} + 2>/dev/null || true; \
find /opt/venv \
\( -name '*.pyc' -o -name '*.pyi' \) -delete 2>/dev/null || true; \
rm -rf /opt/venv/lib/python*/site-packages/pip \
/opt/venv/lib/python*/site-packages/pip-*.dist-info \
/opt/venv/lib/python*/site-packages/setuptools \
/opt/venv/lib/python*/site-packages/setuptools-*.dist-info; \
\
rm -rf /usr/lib/python3.12/test \
/usr/lib/python3.12/idlelib \
/usr/lib/python3.12/tkinter \
@@ -524,8 +557,6 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
/usr/lib/python3/dist-packages/_cffi_backend*.so \
/usr/lib/python3/dist-packages/_cffi_backend*.cpython*.so \
2>/dev/null || true; \
/opt/venv/bin/python -c "import cffi; print('cffi OK:', cffi.__version__)" \
|| { echo 'ERROR: cffi broken after system package cleanup'; exit 1; }; \
\
# Strip debug symbols from ALL shared libraries
find /usr/lib -name '*.so*' -type f \
@@ -597,7 +628,7 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
# to be rebuilt without --enable-libflite (not worth the complexity).
\
# ── dpkg metadata cleanup (~14MB) ──
# Not needed at runtime container won't run apt-get.
# Not needed at runtime, container won't run apt-get.
rm -rf /var/lib/dpkg/info/*.list \
/var/lib/dpkg/info/*.md5sums \
/var/lib/dpkg/info/*.conffiles \
@@ -613,17 +644,23 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
# Misc caches
rm -rf /var/cache/fontconfig/* /tmp/*
# Calibre and QPDF tools.
COPY --from=calibre-build /opt/calibre /opt/calibre
COPY --from=pdf-tools-build /usr/local/bin/qpdf /usr/bin/qpdf
COPY --from=pdf-tools-build /usr/local/bin/magick /usr/bin/magick
COPY --from=pdf-tools-build /usr/local/lib/libMagick*.so* /usr/local/lib/
# Copy loadable coder/filter modules (required when built with --with-modules)
COPY --from=pdf-tools-build /usr/local/lib/ImageMagick-7* /usr/local/lib/
COPY --from=pdf-tools-build /usr/local/etc/ImageMagick-7 /usr/local/etc/ImageMagick-7
COPY --from=gs-build /usr/local/bin/gs /usr/local/bin/gs
COPY --from=gs-build /usr/local/share/ghostscript /usr/local/share/ghostscript
RUN ldconfig /usr/local/lib
# External tool layers, all use --link for independent layer caching and parallel pulls.
COPY --link --from=calibre-build /opt/calibre /opt/calibre
COPY --link --from=pdf-tools-build /usr/local/bin/qpdf /usr/bin/qpdf
# ImageMagick: 4 layers collapsed to 1 via the magick-export staging dir in pdf-tools-build
COPY --link --from=pdf-tools-build /magick-export/ /
COPY --link --from=gs-build /usr/local/bin/gs /usr/local/bin/gs
COPY --link --from=gs-build /usr/local/share/ghostscript /usr/local/share/ghostscript
# Python venv pre-built in python-venv-build (no pip install at runtime, no build tools needed)
COPY --link --from=python-venv-build /opt/venv /opt/venv
RUN ldconfig /usr/local/lib && \
PYTHONDONTWRITEBYTECODE=1 \
/opt/venv/bin/python -c "import cffi; print('cffi OK:', cffi.__version__)" && \
PYTHONDONTWRITEBYTECODE=1 \
/opt/venv/bin/python -c "import cv2; print('OpenCV', cv2.__version__)" && \
PYTHONDONTWRITEBYTECODE=1 \
/opt/venv/bin/python -c "import ocrmypdf; print('ocrmypdf OK')" && \
find /opt/venv -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true
# ---
# Non-root user
@@ -646,16 +683,16 @@ RUN set -eux; \
# Application files.
WORKDIR /app
COPY --from=jar-extract --chown=1000:1000 /layers/dependencies/ /app/
COPY --from=jar-extract --chown=1000:1000 /layers/spring-boot-loader/ /app/
COPY --from=jar-extract --chown=1000:1000 /layers/snapshot-dependencies/ /app/
COPY --from=jar-extract --chown=1000:1000 /layers/application/ /app/
COPY --link --from=jar-extract --chown=1000:1000 /layers/dependencies/ /app/
COPY --link --from=jar-extract --chown=1000:1000 /layers/spring-boot-loader/ /app/
COPY --link --from=jar-extract --chown=1000:1000 /layers/snapshot-dependencies/ /app/
COPY --link --from=jar-extract --chown=1000:1000 /layers/application/ /app/
COPY --from=app-build --chown=1000:1000 \
COPY --link --from=app-build --chown=1000:1000 \
/app/build/libs/restart-helper.jar /restart-helper.jar
COPY --chown=1000:1000 scripts/ /scripts/
COPY --link --chown=1000:1000 scripts/ /scripts/
# Fonts go to system dir root ownership is correct (world-readable)
# Fonts go to system dir, root ownership is correct (world-readable)
COPY app/core/src/main/resources/static/fonts/*.ttf /usr/share/fonts/truetype/
# Permissions and configuration.
@@ -667,7 +704,7 @@ RUN set -eux; \
ln -sf /opt/venv/bin/weasyprint /usr/local/bin/weasyprint; \
ln -sf /opt/venv/bin/unoping /usr/local/bin/unoping; \
chmod +x /scripts/*; \
mkdir -p /configs /logs /customFiles \
mkdir -p /configs /configs/cache /configs/heap_dumps /logs /customFiles \
/pipeline/watchedFolders /pipeline/finishedFolders \
/tmp/stirling-pdf/heap_dumps; \
# Create symlinks to allow app to find these in /app/
@@ -684,15 +721,21 @@ RUN set -eux; \
chmod 750 /tmp/stirling-pdf/heap_dumps; \
fc-cache -f
# NOTE: Project Leyden AOT cache is generated in the background on first boot
# by init-without-ocr.sh. The cache is picked up on subsequent boots for
# 15-25% faster startup. See: JEP 483 + 514 + 515 (JDK 25).
# by init-without-ocr.sh and stored in /configs/cache/stirling.aot (persistent volume).
# The cache is picked up on subsequent boots for 15-25% faster startup.
# See: JEP 483 + 514 + 515 (JDK 25).
# Environment variables.
ARG VERSION_TAG
# Write version to a file so it is readable by scripts without env-var inheritance.
# init-without-ocr.sh reads /etc/stirling_version for the AOT cache fingerprint.
RUN echo "${VERSION_TAG:-dev}" > /etc/stirling_version
ENV VERSION_TAG=$VERSION_TAG \
STIRLING_AOT_ENABLE="false" \
STIRLING_JVM_PROFILE="balanced" \
_JVM_OPTS_BALANCED="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/stirling-pdf/heap_dumps -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=4m -XX:G1PeriodicGCInterval=60000 -XX:+UseStringDeduplication -XX:+UseCompactObjectHeaders -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_PERFORMANCE="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/stirling-pdf/heap_dumps -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:+UseCompactObjectHeaders -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_BALANCED="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/configs/heap_dumps -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=4m -XX:G1PeriodicGCInterval=60000 -XX:+UseStringDeduplication -XX:+UseCompactObjectHeaders -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_PERFORMANCE="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/configs/heap_dumps -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:+UseCompactObjectHeaders -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=${PUID} \
@@ -724,8 +767,8 @@ LABEL org.opencontainers.image.title="Stirling-PDF" \
EXPOSE 8080/tcp
STOPSIGNAL SIGTERM
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD curl -fs --show-error http://localhost:8080/api/v1/info/status || exit 1
HEALTHCHECK --interval=30s --timeout=15s --start-period=120s --retries=5 \
CMD curl -fs --max-time 10 http://localhost:8080/api/v1/info/status || exit 1
ENTRYPOINT ["tini", "--", "/scripts/init.sh"]
CMD []


@@ -3,7 +3,7 @@
FROM ubuntu:noble AS calibre-build
ARG CALIBRE_VERSION=9.3.1
ARG CALIBRE_VERSION=9.4.0
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
set -eux; \
@@ -562,9 +562,10 @@ RUN set -eux; \
# Environment variables.
ARG VERSION_TAG
ENV VERSION_TAG=$VERSION_TAG \
STIRLING_AOT_ENABLE="false" \
STIRLING_JVM_PROFILE="balanced" \
_JVM_OPTS_BALANCED="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/stirling-pdf/heap_dumps -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=4m -XX:G1PeriodicGCInterval=60000 -XX:+UseStringDeduplication -XX:+UseCompactObjectHeaders -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_PERFORMANCE="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/stirling-pdf/heap_dumps -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:+UseCompactObjectHeaders -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_BALANCED="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/configs/heap_dumps -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=4m -XX:G1PeriodicGCInterval=60000 -XX:+UseStringDeduplication -XX:+UseCompactObjectHeaders -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_PERFORMANCE="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/configs/heap_dumps -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:+UseCompactObjectHeaders -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=${PUID} \


@@ -69,9 +69,10 @@ LABEL org.opencontainers.image.title="Stirling-PDF Ultra-Lite" \
# NOTE: Memory flags (InitialRAMPercentage, MaxRAMPercentage, MaxMetaspaceSize)
# are computed dynamically by init-without-ocr.sh based on container memory limits.
ENV VERSION_TAG=$VERSION_TAG \
STIRLING_AOT_ENABLE="false" \
STIRLING_JVM_PROFILE="balanced" \
_JVM_OPTS_BALANCED="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/stirling-pdf/heap_dumps -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=4m -XX:G1PeriodicGCInterval=60000 -XX:+UseStringDeduplication -XX:+UseCompactObjectHeaders -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_PERFORMANCE="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/stirling-pdf/heap_dumps -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:+UseCompactObjectHeaders -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_BALANCED="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/configs/heap_dumps -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=4m -XX:G1PeriodicGCInterval=60000 -XX:+UseStringDeduplication -XX:+UseCompactObjectHeaders -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
_JVM_OPTS_PERFORMANCE="-XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/configs/heap_dumps -XX:+UseShenandoahGC -XX:ShenandoahGCMode=generational -XX:+UseCompactObjectHeaders -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -Dspring.threads.virtual.enabled=true -Djava.awt.headless=true" \
JAVA_CUSTOM_OPTS="" \
HOME=/home/stirlingpdfuser \
PUID=1000 \

scripts/aot-diagnostics.sh Executable file

@@ -0,0 +1,380 @@
#!/bin/bash
# aot-diagnostics.sh - Project Leyden AOT cache diagnostic tool for Stirling-PDF
#
# Diagnoses AOT cache generation failures, especially on ARM64 (aarch64).
# Reports JVM feature support, memory limits, cache state, and fingerprint validity.
#
# Usage:
# aot-diagnostics.sh [--test] [--cache PATH]
#
# --test Run a quick AOT RECORD smoke test (~10-30s). Shows exactly
# what error the JVM produces, useful for ARM debugging.
# --cache PATH Override the AOT cache path (default: /configs/cache/stirling.aot)
#
# Symlink aliases set up by init-without-ocr.sh: aot-diag, aot-diagnostics
set -euo pipefail
AOT_CACHE_DEFAULT="/configs/cache/stirling.aot"
RUN_SMOKE_TEST=false
AOT_CACHE_PATH=""
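# Parse flags with a while/shift loop so "--cache PATH" can consume its value
# (shift inside "for arg in $@" does not advance the for loop).
while [ $# -gt 0 ]; do
case "$1" in
--test) RUN_SMOKE_TEST=true ;;
--cache=*) AOT_CACHE_PATH="${1#--cache=}" ;;
--cache) AOT_CACHE_PATH="${2:-}"; if [ $# -ge 2 ]; then shift; fi ;;
-h|--help)
# Print only the header comment block (line 2 up to "set -euo pipefail")
sed -n '2,/^set /{ /^#/{ s/^# \{0,1\}//; p } }' "$0"
exit 0
;;
esac
shift
done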
AOT_CACHE="${AOT_CACHE_PATH:-$AOT_CACHE_DEFAULT}"
AOT_FP="${AOT_CACHE}.fingerprint"
# ── Terminal colours ──────────────────────────────────────────────────────────
if [ -t 1 ]; then
C_RED='\033[0;31m' C_GRN='\033[0;32m' C_YLW='\033[0;33m'
C_CYN='\033[0;36m' C_BLD='\033[1m' C_RST='\033[0m'
else
C_RED='' C_GRN='' C_YLW='' C_CYN='' C_BLD='' C_RST=''
fi
PASS=0; WARN=0; FAIL=0
pass() { printf "${C_GRN}[PASS]${C_RST} %s\n" "$*"; PASS=$((PASS+1)); }
warn() { printf "${C_YLW}[WARN]${C_RST} %s\n" "$*"; WARN=$((WARN+1)); }
fail() { printf "${C_RED}[FAIL]${C_RST} %s\n" "$*"; FAIL=$((FAIL+1)); }
info() { printf "${C_CYN}[INFO]${C_RST} %s\n" "$*"; }
hdr() { printf "\n${C_BLD}=== %s ===${C_RST}\n" "$*"; }
command_exists() { command -v "$1" >/dev/null 2>&1; }
# ── Section 1: Environment ────────────────────────────────────────────────────
hdr "Environment"
info "Date: $(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || date)"
info "Hostname: $(hostname 2>/dev/null || echo unknown)"
info "Architecture: $(uname -m)"
info "Kernel: $(uname -r)"
if [ -f /etc/stirling_version ]; then
info "Version: $(tr -d '\r\n' < /etc/stirling_version)"
elif [ -n "${VERSION_TAG:-}" ]; then
info "Version: ${VERSION_TAG}"
else
warn "VERSION_TAG not set and /etc/stirling_version not found"
fi
if [ -f /etc/os-release ]; then
info "OS: $(. /etc/os-release; echo "${PRETTY_NAME:-${NAME:-unknown}}")"
fi
# Warn about external JVM option vars — these break AOT training if set
for _jvm_var in JAVA_TOOL_OPTIONS JDK_JAVA_OPTIONS _JAVA_OPTIONS; do
_jvm_val="$(eval echo "\${${_jvm_var}:-}")"
if [ -n "$_jvm_val" ]; then
warn "${_jvm_var}='${_jvm_val}'"
warn " External JVM options are cleared during AOT training (fixed), but may"
warn " affect the running app. Ensure they are compatible with -Xmx limits."
fi
done
unset _jvm_var _jvm_val
# ── Section 2: JVM Detection ──────────────────────────────────────────────────
hdr "JVM Detection"
if ! command_exists java; then
fail "java not found in PATH. PATH=${PATH}"
exit 1
fi
JDK_VER="$(JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= java -version 2>&1 | head -1)"
info "JDK: ${JDK_VER}"
info "java binary: $(command -v java)"
ARCH="$(uname -m)"
# --- AOTMode support (Project Leyden) ---
AOT_SUPPORTED=false
if java -XX:AOTMode=off -version >/dev/null 2>&1; then
pass "AOTMode supported (-XX:AOTMode=off accepted)"
AOT_SUPPORTED=true
else
fail "AOTMode NOT supported on this JVM build ($(uname -m))"
fail " This JDK does not support Project Leyden (JEP 483/514/515)."
fail " AOT cache generation will be skipped."
if [[ "$ARCH" == "aarch64" ]]; then
warn " ARM64: some vendor JDK 25 builds omit Leyden. Try eclipse-temurin:25-jre."
fi
fi
# --- CompactObjectHeaders support (Project Lilliput) ---
COMPACT_HEADERS_FLAG=""
if java -XX:+UseCompactObjectHeaders -version >/dev/null 2>&1; then
pass "UseCompactObjectHeaders supported (Project Lilliput active)"
COMPACT_HEADERS_FLAG="-XX:+UseCompactObjectHeaders"
else
warn "UseCompactObjectHeaders NOT supported on $(uname -m)"
warn " AOT training will run without this flag. Runtime must also omit it."
if [[ "$ARCH" == "aarch64" ]]; then
warn " This is the most common cause of ARM AOT failures: the flag was"
warn " hardcoded in training but unsupported at runtime (or vice-versa)."
fi
fi
# --- CompressedOops ---
COMPRESSED_OOPS_FLAG="-XX:+UseCompressedOops"
if java -XX:+UseCompressedOops -version >/dev/null 2>&1; then
pass "UseCompressedOops accepted by JVM"
else
warn "UseCompressedOops flag not accepted — will use -XX:-UseCompressedOops"
COMPRESSED_OOPS_FLAG="-XX:-UseCompressedOops"
fi
# ── Section 3: Memory Limits ──────────────────────────────────────────────────
hdr "Memory Limits"
MEM_MB=0
if [ -f /sys/fs/cgroup/memory.max ]; then
RAW="$(cat /sys/fs/cgroup/memory.max 2>/dev/null || echo '')"
if [ "$RAW" = "max" ]; then
info "cgroup v2 memory.max: unlimited"
elif [ -n "$RAW" ] && [ "$RAW" -gt 0 ] 2>/dev/null; then
MEM_MB=$(( RAW / 1048576 ))
info "cgroup v2 memory.max: ${MEM_MB}MB"
fi
elif [ -f /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
RAW="$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || echo '')"
if [ "${#RAW}" -ge 19 ]; then
info "cgroup v1 limit: unlimited (max uint64)"
elif [ -n "$RAW" ] && [ "$RAW" -gt 0 ] 2>/dev/null; then
MEM_MB=$(( RAW / 1048576 ))
info "cgroup v1 limit: ${MEM_MB}MB"
fi
else
info "No cgroup memory limit detected"
fi
if [ "$MEM_MB" -eq 0 ] && [ -f /proc/meminfo ]; then
MEM_MB=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo 2>/dev/null || echo 0)
info "System MemTotal: ${MEM_MB}MB"
fi
MIN_MEM=768
if [ "$ARCH" = "aarch64" ]; then
MIN_MEM=1024
fi
if [ "$MEM_MB" -eq 0 ]; then
warn "Could not determine container memory. AOT generation may be skipped."
elif [ "$MEM_MB" -le "$MIN_MEM" ]; then
warn "Available memory (${MEM_MB}MB) is at or below AOT generation minimum (${MIN_MEM}MB on ${ARCH})."
warn " AOT background generation will be skipped for this architecture."
warn " Increase container memory above ${MIN_MEM}MB to enable AOT cache generation."
else
pass "Memory OK: ${MEM_MB}MB available, minimum ${MIN_MEM}MB for ${ARCH}"
fi
if command_exists free; then
FREE_MB="$(free -m 2>/dev/null | awk '/^Mem:/ {print $7}')"
info "Available (free+cache): ${FREE_MB:-?}MB"
fi
# ── Section 4: AOT Cache State ────────────────────────────────────────────────
hdr "AOT Cache State"
info "Cache path: ${AOT_CACHE}"
info "Fingerprint path: ${AOT_FP}"
if [ -f "${AOT_CACHE}" ]; then
CACHE_SIZE="$(du -h "${AOT_CACHE}" 2>/dev/null | cut -f1 || echo '?')"
CACHE_MTIME="$(stat -c '%y' "${AOT_CACHE}" 2>/dev/null | cut -d. -f1 || echo '?')"
info "Cache exists: ${CACHE_SIZE} (modified ${CACHE_MTIME})"
if [ -s "${AOT_CACHE}" ]; then
pass "Cache file is non-empty"
else
fail "Cache file is empty — will be regenerated on next boot"
rm -f "${AOT_CACHE}" "${AOT_FP}" 2>/dev/null || true
fi
else
warn "No cache file at ${AOT_CACHE}"
info " Cache will be generated in background on next boot."
if [ ! -d "$(dirname "${AOT_CACHE}")" ]; then
warn " Parent directory $(dirname "${AOT_CACHE}") does not exist."
warn " Ensure /configs is volume-mounted and writable."
fi
fi
# --- Fingerprint validation ---
if [ -f "${AOT_FP}" ]; then
STORED_FP="$(tr -d '\r\n' < "${AOT_FP}" 2>/dev/null || echo '')"
info "Stored fingerprint: ${STORED_FP}"
# Recompute fingerprint using the same logic as init-without-ocr.sh
FP=""
FP+="jdk:$(JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= java -version 2>&1 | head -1);"
FP+="arch:${ARCH};"
FP+="compact:${COMPACT_HEADERS_FLAG:-none};"
FP+="oops:${COMPRESSED_OOPS_FLAG:-none};"
if [ -f /app/app.jar ]; then
FP+="app:$(stat -c '%s-%Y' /app/app.jar 2>/dev/null || echo unknown);"
elif [ -f /app.jar ]; then
FP+="app:$(stat -c '%s-%Y' /app.jar 2>/dev/null || echo unknown);"
elif [ -d /app/lib ]; then
FP+="app:$(ls -la /app/lib/ 2>/dev/null | md5sum 2>/dev/null | cut -c1-16 || echo unknown);"
fi
FP+="ver:${VERSION_TAG:-unknown};"
if command_exists md5sum; then
EXPECTED_FP="$(printf '%s' "$FP" | md5sum | cut -c1-16)"
elif command_exists sha256sum; then
EXPECTED_FP="$(printf '%s' "$FP" | sha256sum | cut -c1-16)"
else
EXPECTED_FP="$(printf '%s' "$FP" | cksum | cut -d' ' -f1)"
fi
info "Expected fingerprint: ${EXPECTED_FP}"
if [ "$STORED_FP" = "$EXPECTED_FP" ]; then
pass "Fingerprint valid — cache matches current JDK/arch/app"
else
fail "Fingerprint mismatch — cache is stale"
info " The cache was built with a different JDK, arch, flags, or app version."
info " It will be automatically removed and regenerated on next boot."
# Print a diff of fingerprint components for easier debugging
printf " Stored FP string: (run with --test to regenerate)\n"
printf " Expected FP string: %s\n" "$FP"
fi
else
if [ -f "${AOT_CACHE}" ]; then
warn "Cache exists but no fingerprint file found"
warn " Cache will be treated as stale and regenerated on next boot."
else
info "No fingerprint file (expected — cache not yet generated)"
fi
fi
# ── Section 5: JAR Layout Detection ──────────────────────────────────────────
hdr "JAR Layout"
if [ -f /app/app.jar ] && [ -d /app/lib ]; then
pass "Spring Boot 4 layered layout: /app/app.jar + /app/lib/"
info " Classpath: -cp /app/app.jar:/app/lib/* stirling.software.SPDF.SPDFApplication"
JAR_LAYOUT="layered"
elif [ -f /app.jar ]; then
pass "Single JAR layout: /app.jar"
info " Invocation: -jar /app.jar"
JAR_LAYOUT="single"
elif [ -d /app/BOOT-INF ]; then
pass "Exploded Spring Boot 3 layout: /app/BOOT-INF"
info " Classpath: -cp /app org.springframework.boot.loader.launch.JarLauncher"
JAR_LAYOUT="exploded"
else
fail "No recognisable JAR layout found. Looked for:"
fail " /app/app.jar + /app/lib/ (Spring Boot 4 layered)"
fail " /app.jar (single fat JAR)"
fail " /app/BOOT-INF/ (Spring Boot 3 exploded)"
JAR_LAYOUT="unknown"
fi
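The layout probe above can be tested against a scratch directory if its paths are made relative to a root. This is a minimal sketch that assumes a POSIX shell with `local`. `detect_jar_layout` and its root argument are hypothetical additions and are not part of the script:

```bash
#!/usr/bin/env bash
# Hypothetical helper: same precedence as the JAR Layout section, but every
# path is prefixed with an optional root so the probe can run on a temp tree.
detect_jar_layout() {
  local root="${1:-}"
  if [ -f "$root/app/app.jar" ] && [ -d "$root/app/lib" ]; then
    echo layered      # Spring Boot 4 layered: /app/app.jar + /app/lib/
  elif [ -f "$root/app.jar" ]; then
    echo single       # single fat JAR: /app.jar
  elif [ -d "$root/app/BOOT-INF" ]; then
    echo exploded     # Spring Boot 3 exploded: /app/BOOT-INF/
  else
    echo unknown
  fi
}
```

When called with no argument, it checks the real `/app` paths in the same order as the diagnostic.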
# ── Section 6: Disk Space ─────────────────────────────────────────────────────
hdr "Disk Space"
CACHE_DIR="$(dirname "${AOT_CACHE}")"
if [ -d "$CACHE_DIR" ]; then
DF="$(df -h "$CACHE_DIR" 2>/dev/null | tail -1 || echo '')"
info "Volume ($CACHE_DIR): $DF"
USED_PCT="$(df -P "$CACHE_DIR" 2>/dev/null | awk 'NR==2{print $5}' | tr -d '%')"
if [ -n "$USED_PCT" ] && [ "$USED_PCT" -ge 95 ]; then
fail "Disk almost full (${USED_PCT}% used). AOT cache creation will fail."
elif [ -n "$USED_PCT" ] && [ "$USED_PCT" -ge 85 ]; then
warn "Disk usage high (${USED_PCT}% used). AOT cache is typically 50-150MB."
else
pass "Sufficient disk space available"
fi
else
warn "Cache directory ${CACHE_DIR} does not exist."
warn " /configs must be volume-mounted. AOT cache will not persist across restarts."
fi
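The disk check reads column 5 of `df`. That column holds the *used* percentage (`Use%`/`Capacity`), not the available percentage. The sketch below expresses the same thresholds as two small functions and assumes a POSIX shell with `local`. `disk_used_pct` and `disk_verdict` are hypothetical names. The sketch passes `-P` so that `df` keeps each filesystem on a single line even when the device name is long:

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage USED on the filesystem holding $1.
# -P forces POSIX output, so the data row is always line 2 with Use% in column 5.
disk_used_pct() {
  df -P "$1" 2>/dev/null | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: map a used percentage onto the diagnostic's thresholds.
disk_verdict() {
  local used="$1"
  if [ -z "$used" ]; then
    echo unknown
  elif [ "$used" -ge 95 ]; then
    echo fail
  elif [ "$used" -ge 85 ]; then
    echo warn
  else
    echo pass
  fi
}
```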
# ── Section 7: Optional Smoke Test ───────────────────────────────────────────
if [ "$RUN_SMOKE_TEST" = true ]; then
hdr "AOT RECORD Smoke Test"
if [ "$AOT_SUPPORTED" = false ]; then
warn "Skipping smoke test — AOTMode not supported on this JVM"
elif [ "$JAR_LAYOUT" = "unknown" ]; then
warn "Skipping smoke test — could not determine JAR layout"
else
info "Running minimal AOT RECORD phase (this may take 10-30s on ARM)..."
SMOKE_CONF="/tmp/aot-diag-smoke.aotconf"
SMOKE_LOG="/tmp/aot-diag-smoke.log"
rm -f "$SMOKE_CONF" "$SMOKE_LOG"
SMOKE_CMD=(java -Xmx256m ${COMPACT_HEADERS_FLAG:-} ${COMPRESSED_OOPS_FLAG}
-Xlog:aot=info
-XX:AOTMode=record
-XX:AOTConfiguration="$SMOKE_CONF"
-Dspring.main.banner-mode=off
-Dspring.context.exit=onRefresh
-Dstirling.datasource.url="jdbc:h2:mem:aotsmoke;DB_CLOSE_DELAY=-1;MODE=PostgreSQL")
case "$JAR_LAYOUT" in
layered) SMOKE_CMD+=(-cp "/app/app.jar:/app/lib/*" stirling.software.SPDF.SPDFApplication) ;;
single) SMOKE_CMD+=(-jar /app.jar) ;;
exploded) SMOKE_CMD+=(-cp /app org.springframework.boot.loader.launch.JarLauncher) ;;
esac
info "Command: ${SMOKE_CMD[*]}"
SMOKE_EXIT=0
if command_exists timeout; then
JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= \
timeout 120s "${SMOKE_CMD[@]}" >"$SMOKE_LOG" 2>&1 || SMOKE_EXIT=$?
else
JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= \
"${SMOKE_CMD[@]}" >"$SMOKE_LOG" 2>&1 || SMOKE_EXIT=$?
fi
case "$SMOKE_EXIT" in
0|1)
if [ -f "$SMOKE_CONF" ] && [ -s "$SMOKE_CONF" ]; then
CONF_SIZE="$(du -h "$SMOKE_CONF" | cut -f1)"
pass "RECORD phase succeeded (exit=${SMOKE_EXIT}, conf=${CONF_SIZE})"
info " AOT cache generation should work on this system."
else
fail "RECORD phase exit=${SMOKE_EXIT} but no .aotconf produced"
info " Last 30 lines of AOT output:"
tail -30 "$SMOKE_LOG" 2>/dev/null | while IFS= read -r line; do
printf " %s\n" "$line"
done
fi
;;
124)
fail "RECORD phase timed out after 120s"
warn " On ARM under QEMU or slow storage this can happen."
warn " Try running with more memory or on native ARM hardware."
;;
137)
fail "RECORD phase OOM-killed (exit 137)"
warn " Increase container memory. Minimum for ARM AOT training: 1GB."
;;
*)
fail "RECORD phase failed (exit=${SMOKE_EXIT})"
info " Last 30 lines of AOT output:"
tail -30 "$SMOKE_LOG" 2>/dev/null | while IFS= read -r line; do
printf " %s\n" "$line"
done
;;
esac
rm -f "$SMOKE_CONF" "$SMOKE_LOG"
fi
fi
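The exit statuses handled above follow standard shell conventions. GNU coreutils `timeout` exits with 124 when its deadline expires. A process killed by a signal reports 128 plus the signal number, so SIGKILL (9), which the kernel OOM killer sends, appears as 137. The sketch below shows that mapping and assumes a POSIX shell. `classify_record_exit` is a hypothetical name. Treating 0 and 1 as success mirrors the diagnostic's `0|1)` branch; it is not a JVM guarantee:

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the smoke-test exit-code cases.
classify_record_exit() {
  case "$1" in
    0|1) echo ok ;;        # accepted by the diagnostic (onRefresh shutdown)
    124) echo timeout ;;   # GNU coreutils `timeout` hit its deadline
    137) echo killed ;;    # 128 + 9: terminated by SIGKILL (e.g. OOM killer)
    *)   echo failed ;;
  esac
}
```

For example, with GNU coreutils, `rc=0; timeout 1s sleep 5 || rc=$?` leaves `rc` at 124.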
# ── Summary ───────────────────────────────────────────────────────────────────
printf "\n${C_BLD}=== Summary: PASS=%d WARN=%d FAIL=%d ===${C_RST}\n" \
"$PASS" "$WARN" "$FAIL"
if [ "$FAIL" -gt 0 ]; then
printf "${C_RED}AOT cache has issues. See FAIL items above.${C_RST}\n"
printf "To disable AOT: omit STIRLING_AOT_ENABLE (default is off) or set STIRLING_AOT_ENABLE=false\n"
exit 1
elif [ "$WARN" -gt 0 ]; then
printf "${C_YLW}AOT cache may not function optimally. See WARN items above.${C_RST}\n"
exit 0
else
printf "${C_GRN}All AOT checks passed.${C_RST}\n"
exit 0
fi


@@ -23,6 +23,11 @@ if [ -x /scripts/stirling-diagnostics.sh ]; then
ln -sf /scripts/stirling-diagnostics.sh /usr/local/bin/debug
ln -sf /scripts/stirling-diagnostics.sh /usr/local/bin/diagnostic
fi
if [ -x /scripts/aot-diagnostics.sh ] && [ "${STIRLING_AOT_ENABLE:-false}" = "true" ]; then
mkdir -p /usr/local/bin
ln -sf /scripts/aot-diagnostics.sh /usr/local/bin/aot-diag
ln -sf /scripts/aot-diagnostics.sh /usr/local/bin/aot-diagnostics
fi
print_versions() {
set +o pipefail
@@ -46,17 +51,45 @@ print_versions() {
}
cleanup() {
# Prevent re-entrance from double signals
trap '' SIGTERM EXIT
log "Shutdown signal received. Cleaning up..."
# Kill background AOT generation if still running
[ -n "${AOT_GEN_PID:-}" ] && kill -TERM "$AOT_GEN_PID" 2>/dev/null || true
# Kill background processes (unoservers, watchdog, Xvfb)
pkill -P $$ || true
# Kill Java if it was backgrounded (though it handles its own shutdown)
[ -n "${JAVA_PID:-}" ] && kill -TERM "$JAVA_PID" 2>/dev/null || true
# Kill background AOT generation first (least important, clean up tmp files)
if [ -n "${AOT_GEN_PID:-}" ] && kill -0 "$AOT_GEN_PID" 2>/dev/null; then
kill -TERM "$AOT_GEN_PID" 2>/dev/null || true
wait "$AOT_GEN_PID" 2>/dev/null || true
fi
# Signal unoserver instances to shut down
for pid in "${UNOSERVER_PIDS[@]:-}"; do
[ -n "$pid" ] && kill -TERM "$pid" 2>/dev/null || true
done
# Signal Java to shut down gracefully; Spring Boot handles SIGTERM cleanly
if [ -n "${JAVA_PID:-}" ] && kill -0 "$JAVA_PID" 2>/dev/null; then
kill -TERM "$JAVA_PID" 2>/dev/null || true
# Wait up to 30s for graceful shutdown before forcing
local _i=0
while [ "$_i" -lt 30 ] && kill -0 "$JAVA_PID" 2>/dev/null; do
sleep 1
_i=$((_i + 1))
done
if kill -0 "$JAVA_PID" 2>/dev/null; then
log "Java did not exit within 30s, sending SIGKILL"
kill -KILL "$JAVA_PID" 2>/dev/null || true
fi
fi
# Kill any remaining children (watchdog, Xvfb, etc.)
pkill -P $$ 2>/dev/null || true
log "Cleanup complete."
}
trap cleanup SIGTERM EXIT
trap cleanup SIGTERM
trap cleanup EXIT
print_versions
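The shutdown sequence in `cleanup()` sends SIGTERM, polls with `kill -0`, and escalates to SIGKILL after a grace period. That sequence can be factored into a reusable helper. This is a minimal sketch that assumes a POSIX shell with `local`. `stop_with_grace` is a hypothetical name. The final `wait` only reaps the process when it is a child of the calling shell:

```bash
#!/usr/bin/env bash
# Hypothetical helper: SIGTERM, wait up to $2 seconds, then SIGKILL.
stop_with_grace() {
  local pid="$1" grace="${2:-30}" i=0
  # Nothing to do if the process is already gone.
  kill -0 "$pid" 2>/dev/null || return 0
  kill -TERM "$pid" 2>/dev/null || true
  # Poll once per second until the process exits or the grace period elapses.
  while [ "$i" -lt "$grace" ] && kill -0 "$pid" 2>/dev/null; do
    sleep 1
    i=$((i + 1))
  done
  # Still alive after the grace period: force it.
  if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid" 2>/dev/null || true
  fi
  # Reap the exit status (only effective for children of this shell).
  wait "$pid" 2>/dev/null || true
}
```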
@@ -321,6 +354,10 @@ if [ -z "${VERSION_TAG:-}" ] && [ -f /etc/stirling_version ]; then
export VERSION_TAG
fi
# ---------- AOT ----------
# OFF by default. Set STIRLING_AOT_ENABLE=true to opt in.
AOT_ENABLED="${STIRLING_AOT_ENABLE:-false}"
# ---------- Dynamic Memory Detection ----------
# Detects the container memory limit (in MB) from cgroups v2/v1 or /proc/meminfo.
detect_container_memory_mb() {
@@ -408,9 +445,9 @@ compute_dynamic_memory() {
# ---------- Project Leyden AOT Cache (JEP 483 + 514 + 515) ----------
# Replaces legacy AppCDS with JDK 25's AOT cache. Uses the three-step workflow:
# 1. RECORD runs Spring context init, captures class loading + method profiles
# 2. CREATE builds the AOT cache file (does NOT start the app)
# 3. RUNTIME java -XX:AOTCache=... starts with pre-linked classes + compiled methods
# 1. RECORD: runs Spring context init, captures class loading + method profiles
# 2. CREATE: builds the AOT cache file (does NOT start the app)
# 3. RUNTIME: java -XX:AOTCache=... starts with pre-linked classes + compiled methods
# Constraints:
# - Cache must be generated on the same JDK build + OS + arch as production (satisfied
# because we generate inside the same container image at runtime)
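The three-step workflow described above maps onto three `java` invocations that share the same launch arguments. The sketch below is a dry run that only prints those invocations and assumes a POSIX shell with `local`. `leyden_commands` is a hypothetical helper. Its output is meant for reading, not for `eval`, because arguments containing spaces are not re-quoted. The flags are the `-XX:AOTMode`, `-XX:AOTConfiguration`, and `-XX:AOTCache` options the script itself passes:

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the RECORD, CREATE and RUNTIME commands
# for a configuration file ($1), cache file ($2) and launch arguments ($3...).
leyden_commands() {
  local conf="$1" cache="$2"
  shift 2
  printf '%s\n' \
    "java -XX:AOTMode=record -XX:AOTConfiguration=$conf $*" \
    "java -XX:AOTMode=create -XX:AOTConfiguration=$conf -XX:AOTCache=$cache $*" \
    "java -XX:AOTCache=$cache $*"
}
```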
@@ -426,64 +463,193 @@ generate_aot_cache() {
mkdir -p "$aot_dir" 2>/dev/null || true
local aot_conf="/tmp/stirling.aotconf"
local arch
arch=$(uname -m)
log "AOT: Phase 1/2 — Recording class loading + method profiles..."
# ── ARM-aware heap sizing ──
# ARM devices (Raspberry Pi, Ampere) often have tighter memory.
# Scale training heap down to avoid OOM-killing the background generation.
local record_xmx="512m"
local create_xmx="256m"
if [ "${CONTAINER_MEM_MB:-0}" -gt 0 ] && [ "${CONTAINER_MEM_MB}" -le 1024 ]; then
record_xmx="256m"
create_xmx="128m"
fi
# RECORD — starts Spring context, observes class loading + collects method profiles (JEP 515).
# -Dspring.context.exit=onRefresh stops after Spring context loads (good training coverage).
# Uses -Xmx512m: enough for Spring context init without starving the running application.
# -Xlog:aot=error suppresses harmless "Skipping"/"Preload Warning" messages for proxies,
# signed JARs (BouncyCastle), JFR events, CGLIB classes, etc. The JVM handles all of
# these internally; they are informational, not errors.
# Non-zero exit is expected — onRefresh triggers controlled shutdown.
# Uses in-memory H2 database to avoid file-lock conflicts with the running application.
# Note: DatabaseConfig reads System.getProperty("stirling.datasource.url") to override
# the default file-based H2 URL. We use MODE=PostgreSQL to match the production config.
# Redirect both stdout and stderr to suppress duplicate startup logs (banner + Spring init).
# IMPORTANT: COMPRESSED_OOPS_FLAG must match the runtime setting to avoid AOT cache
# invalidation on restart ("saved state of UseCompressedOops ... is different" error).
java -Xmx512m -XX:+UseCompactObjectHeaders ${COMPRESSED_OOPS_FLAG} \
-Xlog:aot=error \
-XX:AOTMode=record \
-XX:AOTConfiguration="$aot_conf" \
-Dspring.main.banner-mode=off \
-Dspring.context.exit=onRefresh \
-Dstirling.datasource.url="jdbc:h2:mem:aottraining;DB_CLOSE_DELAY=-1;MODE=PostgreSQL" \
"$@" >/tmp/aot-record.log 2>&1 || true
# ── ARM-aware timeouts ──
# ARM under QEMU or on slow SD/eMMC can take much longer than x86_64.
local record_timeout=300
local create_timeout=180
if [ "$arch" = "aarch64" ]; then
record_timeout=600
create_timeout=300
fi
log "AOT: arch=${arch} mem=${CONTAINER_MEM_MB:-?}MB heap=${record_xmx} timeouts=${record_timeout}s/${create_timeout}s"
log "AOT: COMPACT_HEADERS='${COMPACT_HEADERS_FLAG:-<none>}' COMPRESSED_OOPS='${COMPRESSED_OOPS_FLAG}'"
log "AOT: Phase 1/2, Recording class loading + method profiles..."
# RECORD: starts Spring context, observes class loading + collects method profiles (JEP 515).
# Non-zero exit is expected: -Dspring.context.exit=onRefresh triggers controlled shutdown.
# Uses in-memory H2 to avoid file-lock conflicts with the running app.
# COMPACT_HEADERS_FLAG/COMPRESSED_OOPS_FLAG must exactly match the runtime invocation.
# Clear all JVM option env vars so external settings (e.g. _JAVA_OPTIONS=-Xms14G) cannot
# conflict with the explicit -Xmx we pass here. Training uses its own minimal flag set.
local record_exit=0
if command_exists timeout; then
JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= \
timeout "${record_timeout}s" \
java "-Xmx${record_xmx}" ${COMPACT_HEADERS_FLAG:-} ${COMPRESSED_OOPS_FLAG} \
-Xlog:aot=error \
-XX:AOTMode=record \
-XX:AOTConfiguration="$aot_conf" \
-Dspring.main.banner-mode=off \
-Dspring.context.exit=onRefresh \
-Dstirling.datasource.url="jdbc:h2:mem:aottraining;DB_CLOSE_DELAY=-1;MODE=PostgreSQL" \
"$@" >/tmp/aot-record.log 2>&1 || record_exit=$?
else
JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= \
java "-Xmx${record_xmx}" ${COMPACT_HEADERS_FLAG:-} ${COMPRESSED_OOPS_FLAG} \
-Xlog:aot=error \
-XX:AOTMode=record \
-XX:AOTConfiguration="$aot_conf" \
-Dspring.main.banner-mode=off \
-Dspring.context.exit=onRefresh \
-Dstirling.datasource.url="jdbc:h2:mem:aottraining;DB_CLOSE_DELAY=-1;MODE=PostgreSQL" \
"$@" >/tmp/aot-record.log 2>&1 || record_exit=$?
fi
if [ "$record_exit" -eq 124 ]; then
log "AOT: RECORD phase timed out after ${record_timeout}s, skipping"
rm -f "$aot_conf" /tmp/aot-record.log
return 1
fi
if [ "$record_exit" -eq 137 ]; then
log "AOT: RECORD phase OOM-killed (exit 137), container memory too low for training"
log "AOT: Set STIRLING_AOT_ENABLE=false or increase container memory above 1GB"
rm -f "$aot_conf" /tmp/aot-record.log
return 1
fi
if [ ! -f "$aot_conf" ]; then
log "AOT: Training produced no configuration file."
tail -5 /tmp/aot-record.log 2>/dev/null | while IFS= read -r line; do log " $line"; done
log "AOT: Training produced no configuration file (exit=${record_exit}), last 30 lines:"
tail -30 /tmp/aot-record.log 2>/dev/null | while IFS= read -r line; do log " $line"; done
rm -f /tmp/aot-record.log
return 1
fi
log "AOT: Phase 1 complete, conf $(du -h "$aot_conf" 2>/dev/null | cut -f1)"
log "AOT: Phase 2/2, Creating AOT cache from recorded profile..."
log "AOT: Phase 2/2 — Creating AOT cache from recorded profile..."
# CREATE — does NOT start the application. Processes the recorded configuration
# to build the AOT cache with pre-linked classes and optimized native code.
# Uses less memory than the training run.
# -Xlog:aot=error: same as record phase — suppress harmless skip/preload warnings.
# Redirect both stdout and stderr to avoid polluting container logs.
# IMPORTANT: COMPRESSED_OOPS_FLAG must match both RECORD and RUNTIME.
if java -Xmx256m -XX:+UseCompactObjectHeaders ${COMPRESSED_OOPS_FLAG} \
-Xlog:aot=error \
-XX:AOTMode=create \
-XX:AOTConfiguration="$aot_conf" \
-XX:AOTCache="$aot_path" \
"$@" >/tmp/aot-create.log 2>&1; then
local cache_size
cache_size=$(du -h "$aot_path" 2>/dev/null | cut -f1)
log "AOT: Cache created successfully: $aot_path ($cache_size)"
rm -f "$aot_conf" /tmp/aot-record.log /tmp/aot-create.log
return 0
# CREATE: does NOT start the application; builds pre-linked class + method data.
local create_exit=0
if command_exists timeout; then
JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= \
timeout "${create_timeout}s" \
java "-Xmx${create_xmx}" ${COMPACT_HEADERS_FLAG:-} ${COMPRESSED_OOPS_FLAG} \
-Xlog:aot=error \
-XX:AOTMode=create \
-XX:AOTConfiguration="$aot_conf" \
-XX:AOTCache="$aot_path" \
"$@" >/tmp/aot-create.log 2>&1 || create_exit=$?
else
log "AOT: Cache creation failed."
tail -5 /tmp/aot-create.log 2>/dev/null | while IFS= read -r line; do log " $line"; done
JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= \
java "-Xmx${create_xmx}" ${COMPACT_HEADERS_FLAG:-} ${COMPRESSED_OOPS_FLAG} \
-Xlog:aot=error \
-XX:AOTMode=create \
-XX:AOTConfiguration="$aot_conf" \
-XX:AOTCache="$aot_path" \
"$@" >/tmp/aot-create.log 2>&1 || create_exit=$?
fi
if [ "$create_exit" -eq 124 ]; then
log "AOT: CREATE phase timed out after ${create_timeout}s"
rm -f "$aot_conf" "$aot_path" /tmp/aot-record.log /tmp/aot-create.log
return 1
fi
if [ "$create_exit" -eq 137 ]; then
log "AOT: CREATE phase OOM-killed (exit 137)"
rm -f "$aot_conf" "$aot_path" /tmp/aot-record.log /tmp/aot-create.log
return 1
fi
if [ "$create_exit" -eq 0 ] && [ -f "$aot_path" ] && [ -s "$aot_path" ]; then
local cache_size
cache_size=$(du -h "$aot_path" 2>/dev/null | cut -f1)
log "AOT: Cache created successfully: $aot_path ($cache_size)"
chmod 644 "$aot_path" 2>/dev/null || true
save_aot_fingerprint "$aot_path"
rm -f "$aot_conf" /tmp/aot-record.log /tmp/aot-create.log
return 0
else
log "AOT: Cache creation failed (exit=${create_exit}), last 30 lines:"
tail -30 /tmp/aot-create.log 2>/dev/null | while IFS= read -r line; do log " $line"; done
rm -f "$aot_conf" "$aot_path" /tmp/aot-record.log /tmp/aot-create.log
return 1
fi
}
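The sizing rules in `generate_aot_cache()` reduce to a small table keyed on architecture and detected memory. The sketch below returns the four values instead of setting locals and assumes a POSIX shell with `local`. `aot_training_budget` is a hypothetical name. As in the script, a memory value of 0 means "unknown" and keeps the larger heaps:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints "record_xmx create_xmx record_timeout_s create_timeout_s"
# for an architecture ($1, as reported by uname -m) and memory in MB ($2, 0 = unknown).
aot_training_budget() {
  local arch="$1" mem_mb="${2:-0}"
  local record_xmx=512m create_xmx=256m record_timeout=300 create_timeout=180
  # Small containers (<= 1 GB) get smaller training heaps.
  if [ "$mem_mb" -gt 0 ] && [ "$mem_mb" -le 1024 ]; then
    record_xmx=256m
    create_xmx=128m
  fi
  # aarch64 gets longer timeouts (QEMU, slow SD/eMMC storage).
  if [ "$arch" = "aarch64" ]; then
    record_timeout=600
    create_timeout=300
  fi
  echo "$record_xmx $create_xmx $record_timeout $create_timeout"
}
```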
# ---------- AOT Cache Fingerprinting ----------
# Detects stale caches automatically when the app JAR, JDK version, arch, or JVM flags change.
# Stores a short hash alongside the cache file; mismatch → cache is deleted and regenerated.
compute_aot_fingerprint() {
local fp=""
# Clear JAVA_TOOL_OPTIONS / JDK_JAVA_OPTIONS so the JVM does not prepend
# "Picked up JAVA_TOOL_OPTIONS: ..." to stderr before the version line.
# Those vars are exported by the time the background subshell runs
# save_aot_fingerprint, but are NOT yet set when validate_aot_cache runs on
# the next boot, so head -1 would return a different line at save time than at validation time.
fp+="jdk:$(JAVA_TOOL_OPTIONS= JDK_JAVA_OPTIONS= _JAVA_OPTIONS= java -version 2>&1 | head -1);"
fp+="arch:$(uname -m);"
fp+="compact:${COMPACT_HEADERS_FLAG:-none};"
fp+="oops:${COMPRESSED_OOPS_FLAG:-none};"
# App identity: size+mtime is fast (avoids hashing 200MB JARs)
if [ -f /app/app.jar ]; then
fp+="app:$(stat -c '%s-%Y' /app/app.jar 2>/dev/null || echo unknown);"
elif [ -f /app.jar ]; then
fp+="app:$(stat -c '%s-%Y' /app.jar 2>/dev/null || echo unknown);"
elif [ -d /app/lib ]; then
fp+="app:$(ls -la /app/lib/ 2>/dev/null | md5sum 2>/dev/null | cut -c1-16 || echo unknown);"
fi
fp+="ver:${VERSION_TAG:-unknown};"
if command_exists md5sum; then
printf '%s' "$fp" | md5sum | cut -c1-16
elif command_exists sha256sum; then
printf '%s' "$fp" | sha256sum | cut -c1-16
else
printf '%s' "$fp" | cksum | cut -d' ' -f1
fi
}
validate_aot_cache() {
local cache_path="$1"
local fp_file="${cache_path}.fingerprint"
[ -f "$cache_path" ] || return 1
if [ ! -s "$cache_path" ]; then
log "AOT: Cache file is empty, removing."
rm -f "$cache_path" "$fp_file"
return 1
fi
local expected_fp stored_fp=""
expected_fp=$(compute_aot_fingerprint)
[ -f "$fp_file" ] && stored_fp=$(cat "$fp_file" 2>/dev/null || true)
if [ "$stored_fp" != "$expected_fp" ]; then
log "AOT: Fingerprint mismatch (stored=${stored_fp:-<none>} expected=${expected_fp})."
log "AOT: JAR, JDK, arch, or flags changed, removing stale cache."
rm -f "$cache_path" "$fp_file"
return 1
fi
log "AOT: Cache fingerprint valid (${expected_fp})"
return 0
}
save_aot_fingerprint() {
local cache_path="$1"
local fp_file="${cache_path}.fingerprint"
compute_aot_fingerprint > "$fp_file" 2>/dev/null || true
chmod 644 "$fp_file" 2>/dev/null || true
}
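The fingerprint is an MD5 prefix computed over a `key:value;` string. Changing any one component therefore yields a different 16-character value and triggers regeneration. The sketch below passes the components in explicitly instead of probing `java`, `uname`, and `stat`, and assumes `md5sum` is available. `aot_fingerprint` is a hypothetical name. Its string layout copies `compute_aot_fingerprint()` for the case where an app component is present, and empty flags become `none`, as they do there:

```bash
#!/usr/bin/env bash
# Hypothetical helper: hash explicitly supplied fingerprint components.
# Args: jdk-version-line arch compact-flag oops-flag app-identity version-tag
aot_fingerprint() {
  local jdk="$1" arch="$2" compact="${3:-none}" oops="${4:-none}" app="$5" ver="${6:-unknown}"
  printf 'jdk:%s;arch:%s;compact:%s;oops:%s;app:%s;ver:%s;' \
    "$jdk" "$arch" "$compact" "$oops" "$app" "$ver" | md5sum | cut -c1-16
}
```

In the script, the `jdk` component comes from `java -version` with `JAVA_TOOL_OPTIONS`, `JDK_JAVA_OPTIONS`, and `_JAVA_OPTIONS` cleared. Without that, a "Picked up ..." line could replace the version line and change the hash.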
# ---------- Memory Detection ----------
@@ -493,24 +659,18 @@ compute_dynamic_memory "$CONTAINER_MEM_MB" "$JVM_PROFILE"
MEMORY_FLAGS="-XX:InitialRAMPercentage=${DYNAMIC_INITIAL_RAM_PCT} -XX:MaxRAMPercentage=${DYNAMIC_MAX_RAM_PCT} -XX:MaxMetaspaceSize=${DYNAMIC_MAX_METASPACE}m"
# ---------- Compressed Oops Detection ----------
# AOT/CDS cache is sensitive to UseCompressedOops. The setting must be identical
# between the training run (generate_aot_cache) and all subsequent runtime boots.
# With small -Xmx during training the JVM defaults to +UseCompressedOops, but at
# runtime a large MaxRAMPercentage (e.g. 50% of 64GB ≈ 32GB) may cause the JVM to
# disable it, invalidating the cache. We compute the expected max heap and lock the
# flag so every invocation agrees.
if [ "$CONTAINER_MEM_MB" -gt 0 ] 2>/dev/null; then
MAX_HEAP_MB=$((CONTAINER_MEM_MB * DYNAMIC_MAX_RAM_PCT / 100))
# JVM disables compressed oops when max heap >= ~32 GB (exact threshold varies
# by alignment / JVM build). Use a conservative 31744 MB (~31 GB) cutoff.
if [ "$MAX_HEAP_MB" -ge 31744 ]; then
COMPRESSED_OOPS_FLAG="-XX:-UseCompressedOops"
# Only needed for AOT cache consistency (training and runtime must agree on this flag).
if [ "$AOT_ENABLED" = "true" ]; then
if [ "$CONTAINER_MEM_MB" -gt 0 ] 2>/dev/null; then
MAX_HEAP_MB=$((CONTAINER_MEM_MB * DYNAMIC_MAX_RAM_PCT / 100))
if [ "$MAX_HEAP_MB" -ge 31744 ]; then
COMPRESSED_OOPS_FLAG="-XX:-UseCompressedOops"
else
COMPRESSED_OOPS_FLAG="-XX:+UseCompressedOops"
fi
else
COMPRESSED_OOPS_FLAG="-XX:+UseCompressedOops"
fi
else
# Cannot detect memory — default matches small-heap behaviour
COMPRESSED_OOPS_FLAG="-XX:+UseCompressedOops"
fi
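Once the container limit and `MaxRAMPercentage` are known, the compressed-oops decision is integer arithmetic against the 31744 MB cutoff. For example, 50% of a 65536 MB (64 GB) limit is 32768 MB. That is at or above the cutoff, so the flag becomes `-XX:-UseCompressedOops`. This is a minimal sketch that assumes a POSIX shell with `local`. `compressed_oops_flag` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: $1 = container memory in MB (0 = unknown),
# $2 = MaxRAMPercentage. Prints the flag training and runtime must share.
compressed_oops_flag() {
  local mem_mb="$1" max_ram_pct="$2"
  # Integer MB estimate of the max heap, compared with the conservative cutoff.
  if [ "$mem_mb" -gt 0 ] && [ $((mem_mb * max_ram_pct / 100)) -ge 31744 ]; then
    printf '%s\n' "-XX:-UseCompressedOops"
  else
    # Unknown memory or a smaller heap: keep the small-heap default.
    printf '%s\n' "-XX:+UseCompressedOops"
  fi
}
```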
# ---------- JVM Profile Selection ----------
@@ -563,58 +723,63 @@ else
fi
fi
# Check if Project Lilliput is supported (standard in Java 25+)
# Check if Project Lilliput is supported (standard in Java 25+, but experimental on some ARM builds)
# COMPACT_HEADERS_FLAG is used by generate_aot_cache() to ensure training/runtime consistency.
if java -XX:+UseCompactObjectHeaders -version >/dev/null 2>&1; then
COMPACT_HEADERS_FLAG="-XX:+UseCompactObjectHeaders"
# Only append if not already present in JAVA_BASE_OPTS
case "${JAVA_BASE_OPTS}" in
*UseCompactObjectHeaders*) ;;
*)
log "JVM supports Compact Object Headers. Enabling Project Lilliput..."
log "JVM supports Compact Object Headers ($(uname -m)). Enabling Project Lilliput..."
JAVA_BASE_OPTS="${JAVA_BASE_OPTS} -XX:+UseCompactObjectHeaders"
;;
esac
else
log "JVM does not support Compact Object Headers. Skipping Project Lilliput flags."
COMPACT_HEADERS_FLAG=""
log "JVM does not support Compact Object Headers on $(uname -m). Skipping Project Lilliput flags."
fi
# ---------- AOT Support Check ----------
AOT_SUPPORTED=false
if [ "$AOT_ENABLED" = "true" ]; then
AOT_SUPPORTED=true
if ! java -XX:AOTMode=off -version >/dev/null 2>&1; then
log "AOT: JVM on $(uname -m) does not support -XX:AOTMode, AOT cache disabled"
AOT_SUPPORTED=false
fi
fi
# ---------- Clean deprecated/invalid JVM flags ----------
# Remove UseCompressedClassPointers (deprecated in Java 25+ with Lilliput)
JAVA_BASE_OPTS=$(echo "$JAVA_BASE_OPTS" | sed -E 's/-XX:[+-]UseCompressedClassPointers//g')
# Remove any existing UseCompressedOops (we manage it explicitly for AOT consistency)
JAVA_BASE_OPTS=$(echo "$JAVA_BASE_OPTS" | sed -E 's/-XX:[+-]UseCompressedOops//g')
# Append the computed compressed oops flag (must match AOT training)
JAVA_BASE_OPTS="${JAVA_BASE_OPTS} ${COMPRESSED_OOPS_FLAG}"
# Manage UseCompressedOops explicitly only when AOT is enabled (training/runtime must agree)
if [ "$AOT_ENABLED" = "true" ]; then
JAVA_BASE_OPTS=$(echo "$JAVA_BASE_OPTS" | sed -E 's/-XX:[+-]UseCompressedOops//g')
JAVA_BASE_OPTS="${JAVA_BASE_OPTS} ${COMPRESSED_OOPS_FLAG}"
fi
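Stripping the flag and then re-appending it guarantees that exactly one `UseCompressedOops` setting, the computed one, reaches the JVM. The sketch below shows that normalisation and assumes a POSIX shell with `sed -E`. `normalize_oops_opts` is a hypothetical helper. Its whitespace collapsing stands in for the script's separate "Collapse duplicate whitespace" step:

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove every -XX:+/-UseCompressedOops from $1,
# append the computed flag $2, and tidy the resulting whitespace.
normalize_oops_opts() {
  local opts="$1" flag="$2"
  opts=$(printf '%s\n' "$opts" | sed -E 's/-XX:[+-]UseCompressedOops//g')
  printf '%s\n' "$opts $flag" | tr -s ' ' | sed -E 's/^ //; s/ $//'
}
```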
# ---------- AOT Cache Management (Project Leyden) ----------
# Strip any legacy CDS/AOT references from base opts (we manage AOT dynamically below)
JAVA_BASE_OPTS=$(echo "$JAVA_BASE_OPTS" | sed -E \
's/-XX:SharedArchiveFile=[^ ]*//g;
s/-Xshare:(auto|on|off)//g;
s/-XX:AOTCache=[^ ]*//g')
AOT_CACHE="/app/stirling.aot"
AOT_CACHE="/configs/cache/stirling.aot"
AOT_GENERATE_BACKGROUND=false
# Support both new (STIRLING_AOT_DISABLE) and legacy (STIRLING_CDS_DISABLE) env vars
AOT_DISABLED="${STIRLING_AOT_DISABLE:-${STIRLING_CDS_DISABLE:-false}}"
if [ "$AOT_ENABLED" = "true" ]; then
# Strip any legacy CDS/AOT references from base opts (managed dynamically here)
JAVA_BASE_OPTS=$(echo "$JAVA_BASE_OPTS" | sed -E \
's/-XX:SharedArchiveFile=[^ ]*//g;
s/-Xshare:(auto|on|off)//g;
s/-XX:AOTCache=[^ ]*//g')
if [ -f "$AOT_CACHE" ]; then
# Cache exists from a previous boot — use it.
# If the file is corrupt or from a different JDK build, the JVM issues a warning
# and continues without the cache (graceful degradation, no crash).
log "AOT cache found: $AOT_CACHE"
JAVA_BASE_OPTS="${JAVA_BASE_OPTS} -XX:AOTCache=${AOT_CACHE}"
# Clean up legacy .jsa if still present
rm -f /app/stirling.jsa 2>/dev/null || true
elif [ "$AOT_DISABLED" = "true" ]; then
log "AOT cache disabled via STIRLING_AOT_DISABLE=true"
else
# No cache exists — schedule background generation after app starts.
# The app starts immediately (no training delay). The AOT cache will be
# ready for the NEXT boot, giving 15-25% faster startup from then on.
log "No AOT cache found. Will generate in background after app starts."
AOT_GENERATE_BACKGROUND=true
if [ "$AOT_SUPPORTED" = false ]; then
log "AOT: Not supported on this JVM/platform, skipping"
elif validate_aot_cache "$AOT_CACHE"; then
log "AOT cache valid: $AOT_CACHE"
JAVA_BASE_OPTS="${JAVA_BASE_OPTS} -XX:AOTCache=${AOT_CACHE}"
rm -f /app/stirling.jsa /app/stirling.aot /app/stirling.aot.fingerprint 2>/dev/null || true
else
log "No valid AOT cache found. Will generate in background after app starts."
AOT_GENERATE_BACKGROUND=true
fi
fi
# Collapse duplicate whitespace
@@ -688,7 +853,7 @@ fi
# ---------- Permissions ----------
# Ensure required directories exist and set correct permissions.
log "Setting permissions..."
mkdir -p /tmp/stirling-pdf /tmp/stirling-pdf/heap_dumps /logs /configs /configs/heap_dumps /customFiles /pipeline || true
mkdir -p /tmp/stirling-pdf /tmp/stirling-pdf/heap_dumps /logs /configs /configs/heap_dumps /configs/cache /customFiles /pipeline || true
CHOWN_PATHS=("$HOME" "/logs" "/scripts" "/configs" "/customFiles" "/pipeline" "/tmp/stirling-pdf" "/app.jar")
[ -d /usr/share/fonts/truetype ] && CHOWN_PATHS+=("/usr/share/fonts/truetype")
CHOWN_OK=true
@@ -705,6 +870,7 @@ if command_exists Xvfb; then
log "Starting Xvfb on :99"
Xvfb :99 -screen 0 1024x768x24 -ac +extension GLX +render -noreset > /dev/null 2>&1 &
export DISPLAY=:99
# Brief pause so Xvfb accepts connections before unoserver tries to attach
sleep 1
else
log "Xvfb not installed; skipping virtual display setup"
@@ -712,44 +878,22 @@ fi
# ---------- unoserver ----------
# Start LibreOffice UNO server for document conversions.
# Java and unoserver start in parallel; do NOT block here waiting for readiness.
# Readiness is verified after Java is launched; the watchdog handles any restarts.
UNOSERVER_BIN="$(command -v unoserver || true)"
UNOCONVERT_BIN="$(command -v unoconvert || true)"
UNOPING_BIN="$(command -v unoping || true)"
if [ -n "$UNOSERVER_BIN" ] && [ -n "$UNOCONVERT_BIN" ]; then
LIBREOFFICE_PROFILE="${HOME:-/home/${RUNTIME_USER}}/.libreoffice_uno_${RUID}"
run_as_runtime_user mkdir -p "$LIBREOFFICE_PROFILE"
start_unoserver_pool
log "unoserver pool started (Profile: $LIBREOFFICE_PROFILE)"
# Wait until UNO server is ready.
log "Waiting for unoserver..."
for _ in {1..20}; do
# Pass 'silent' to check_unoserver_ready to suppress unoping failure logs during wait
if check_unoserver_ready "silent"; then
log "unoserver is ready!"
break
fi
sleep 1
done
start_unoserver_watchdog
if ! check_unoserver_ready; then
log "ERROR: unoserver failed!"
for pid in "${UNOSERVER_PIDS[@]}"; do
kill "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
done
exit 1
fi
log "unoserver pool started (Profile: $LIBREOFFICE_PROFILE), Java starting in parallel"
else
log "unoserver/unoconvert not installed; skipping UNO setup"
fi
# ---------- Java ----------
# Start Stirling PDF Java application.
# Start Stirling PDF Java application immediately (parallel with unoserver startup).
log "Starting Stirling PDF"
JAVA_CMD=(
java
@@ -780,46 +924,108 @@ fi
JAVA_PID=$!
# ---------- Unoserver Readiness + Watchdog ----------
# Now that Java is running, check unoserver readiness and start the watchdog.
# Runs in the main shell (not a subshell) so UNOSERVER_PIDS/PORTS arrays are accessible.
# Java tolerates unoserver being temporarily unavailable, so a readiness timeout here is not fatal.
if [ "${#UNOSERVER_PORTS[@]}" -gt 0 ]; then
log "Waiting for unoserver (Java already starting in parallel)..."
UNOSERVER_READY=false
for _ in {1..30}; do
if check_unoserver_ready "silent"; then
log "unoserver is ready!"
UNOSERVER_READY=true
break
fi
sleep 1
done
start_unoserver_watchdog
if [ "$UNOSERVER_READY" = false ] && ! check_unoserver_ready; then
log "WARNING: unoserver not ready after 30s. Watchdog will manage restarts. Document conversion may be temporarily unavailable."
fi
fi
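The readiness wait above is a bounded poll: up to 30 checks, one second apart, with a timeout treated as a warning rather than a failure. The sketch below is a generic version that assumes a POSIX shell with `local`. `wait_until` is a hypothetical helper that runs any command as the check:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "$@" up to $1 times, sleeping $2 seconds between
# failed attempts. Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  local attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the pointless sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Under these assumptions, `wait_until 30 1 check_unoserver_ready silent` would reproduce the loop above.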
# ---------- Background AOT Cache Generation ----------
# On first boot (no existing cache), generate the AOT cache in the background
# so the app starts immediately. The cache is picked up on the next boot.
# Only runs on containers with >768MB memory to avoid starving the main process.
# On first boot (no valid cache), generate the AOT cache in the background so the app
# starts immediately. The cache is ready for the NEXT boot (15-25% faster startup).
AOT_GEN_PID=""
if [ "$AOT_GENERATE_BACKGROUND" = true ]; then
if [ "$CONTAINER_MEM_MB" -gt 768 ] || [ "$CONTAINER_MEM_MB" -eq 0 ]; then
(
# Wait for the app to finish starting before competing for resources.
# This avoids CPU/memory contention during Spring Boot initialization.
sleep 45
# ARM devices need more memory for training due to JIT differences
_aot_min_mem=768
if [ "$(uname -m)" = "aarch64" ]; then
_aot_min_mem=1024
fi
if [ "$CONTAINER_MEM_MB" -gt "$_aot_min_mem" ] || [ "$CONTAINER_MEM_MB" -eq 0 ]; then
(
# Wait for Spring Boot to finish initializing before competing for CPU/memory.
# ARM devices (Raspberry Pi 4, Ampere) need extra time (90s vs 45s on x86_64).
_startup_wait=45
if [ "$(uname -m)" = "aarch64" ]; then
_startup_wait=90
log "AOT: ARM, waiting ${_startup_wait}s for app stabilization before training"
fi
sleep "$_startup_wait"
# Verify the main app is still running before investing in cache generation
if ! kill -0 "$JAVA_PID" 2>/dev/null; then
log "AOT: Main process exited; skipping cache generation."
exit 0
fi
log "AOT: Starting background cache generation for next boot..."
if [ -f /app/app.jar ] && [ -d /app/lib ]; then
generate_aot_cache "$AOT_CACHE" -cp "/app/app.jar:/app/lib/*" stirling.software.SPDF.SPDFApplication
elif [ -f /app.jar ]; then
generate_aot_cache "$AOT_CACHE" -jar /app.jar
elif [ -d /app/BOOT-INF ]; then
# Spring Boot exploded layer layout (produced by 'java -Djarmode=tools extract --layers').
# The actual JAVA_CMD uses JarLauncher with default classpath = CWD (/app).
# Mirror that exactly: -cp /app resolves the same classes.
generate_aot_cache "$AOT_CACHE" -cp /app org.springframework.boot.loader.launch.JarLauncher
else
log "AOT: Cannot determine JAR layout; skipping cache generation."
fi
_attempt=1
_max_attempts=2
while [ "$_attempt" -le "$_max_attempts" ]; do
log "AOT: Background cache generation attempt ${_attempt}/${_max_attempts}..."
_gen_rc=0
if [ -f /app/app.jar ] && [ -d /app/lib ]; then
generate_aot_cache "$AOT_CACHE" \
-cp "/app/app.jar:/app/lib/*" stirling.software.SPDF.SPDFApplication || _gen_rc=$?
elif [ -f /app.jar ]; then
generate_aot_cache "$AOT_CACHE" -jar /app.jar || _gen_rc=$?
elif [ -d /app/BOOT-INF ]; then
# Spring Boot exploded layer layout; mirror the exact JAVA_CMD classpath
generate_aot_cache "$AOT_CACHE" \
-cp /app org.springframework.boot.loader.launch.JarLauncher || _gen_rc=$?
else
log "AOT: Cannot determine JAR layout; skipping cache generation."
exit 0
fi
if [ "$_gen_rc" -eq 0 ] && [ -f "$AOT_CACHE" ]; then
log "AOT: Cache ready for next boot!"
exit 0
fi
log "AOT: Attempt ${_attempt} failed (rc=${_gen_rc})"
_attempt=$((_attempt + 1))
if [ "$_attempt" -le "$_max_attempts" ]; then
if ! kill -0 "$JAVA_PID" 2>/dev/null; then
log "AOT: Main process exited during retry; aborting."
exit 0
fi
log "AOT: Retrying in 30s..."
sleep 30
fi
done
log "AOT: All attempts failed. App runs normally without cache."
log "AOT: To disable, set STIRLING_AOT_ENABLE=false (or omit it, default is off)"
) &
AOT_GEN_PID=$!
log "AOT: Background cache generation scheduled (PID $AOT_GEN_PID)"
log "AOT: Background generation scheduled (PID $AOT_GEN_PID, arch=$(uname -m))"
else
log "AOT: Container memory (${CONTAINER_MEM_MB}MB) too low for background generation (need >768MB). Cache will not be created."
log "AOT: Container memory (${CONTAINER_MEM_MB}MB) below minimum (${_aot_min_mem}MB on $(uname -m)), skipping cache generation"
fi
fi
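The background generator's loop is a bounded retry that also gives up when the main Java process has exited. The sketch below shows that shape and assumes a POSIX shell with `local`. `retry_while_alive` is a hypothetical name. Returning a distinct status (2) when the guard process is gone is a choice made here; the script simply exits 0 in that case:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "$@" up to $1 times with $2 seconds between
# attempts, stopping early if the guard process $3 is no longer alive.
# Returns 0 on success, 1 if all attempts failed, 2 if the guard exited.
retry_while_alive() {
  local max="$1" pause="$2" guard_pid="$3" attempt=1 rc
  shift 3
  while [ "$attempt" -le "$max" ]; do
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max" ]; then
      # Do not keep working for a main process that has already gone away.
      kill -0 "$guard_pid" 2>/dev/null || return 2
      sleep "$pause"
    fi
  done
  return 1
}
```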
wait "$JAVA_PID"
exit_code=0
wait "$JAVA_PID" || exit_code=$?
# Propagate Java's actual exit code so container orchestrators can detect crashes
case "$exit_code" in
0) log "Stirling PDF exited normally." ;;
137) log "Stirling PDF was OOM-killed (exit 137). Check container memory limits." ;;
143) log "Stirling PDF terminated by SIGTERM (normal orchestrator shutdown)." ;;
*) log "Stirling PDF exited with code ${exit_code}." ;;
esac
# Propagate exit code so orchestrators can detect crashes vs clean shutdowns
exit "${exit_code}"
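Capturing a background process's status needs care when the script must also survive a non-zero `wait`. In `wait "$pid" || true; rc=$?`, `$?` is the status of `true`, so it is always 0. The demonstration below is minimal and self-contained, and assumes a POSIX shell with `sh` on `PATH` for the child process:

```bash
#!/usr/bin/env bash
# Demonstrates why the exit status must be captured inside the || branch.
set -e

sh -c 'exit 137' &
pid=$!
wait "$pid" || true
lost=$?                  # status of `true`, not of the child: always 0

sh -c 'exit 137' &
pid=$!
kept=0
wait "$pid" || kept=$?   # status of `wait`, i.e. the child's exit code

printf 'lost=%s kept=%s\n' "$lost" "$kept"   # prints lost=0 kept=137
```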