Add default languages to OCR, fix compression for QPDF and embedded images (#3202)

# Description of Changes

This pull request includes several changes to the codebase, focusing on
enhancing OCR support, improving endpoint management, and adding new
functionality for PDF compression. The most important changes are
detailed below.

### Enhancements to OCR support:

* `Dockerfile` and `Dockerfile.fat`: Added default support for several new OCR
languages: Chinese (Simplified), German, French, and Portuguese (together with
English, our top five languages).
[[1]](diffhunk://#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R69-R72)
[[2]](diffhunk://#diff-571631582b988e88c52c86960cc083b0b8fa63cf88f056f26e9e684195221c27L78-R81)

### Improvements to endpoint management:

* [`src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java`](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dR51-R66):
Added a new method `isGroupEnabled` to check whether a group of endpoints is
enabled.
* [`src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java`](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL179-L193):
Updated endpoint groups and removed redundant qpdf endpoints.
[[1]](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL179-L193)
[[2]](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL243-L244)
* [`src/main/java/stirling/software/SPDF/config/EndpointInspector.java`](diffhunk://#diff-845de13e140bb1264014539714860f044405274ad2a9481f38befdd1c1333818R1-R291):
Introduced a new `EndpointInspector` class to discover and validate GET
endpoints dynamically.
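
The PR only names the new `isGroupEnabled` method; as a rough illustration of the idea (a hypothetical sketch, not the actual `EndpointConfiguration` code — the class name `EndpointGroups` and its fields are invented), a group can be treated as enabled only when every endpoint registered under it is enabled:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a group-to-endpoints registry backing isGroupEnabled.
public class EndpointGroups {
    private final Map<String, Set<String>> groups = new HashMap<>();
    private final Map<String, Boolean> endpointEnabled = new HashMap<>();

    public void addEndpointToGroup(String group, String endpoint) {
        groups.computeIfAbsent(group, k -> new HashSet<>()).add(endpoint);
        endpointEnabled.putIfAbsent(endpoint, true);
    }

    public void disableEndpoint(String endpoint) {
        endpointEnabled.put(endpoint, false);
    }

    // A group counts as enabled only when it exists, is non-empty,
    // and every endpoint in it is enabled.
    public boolean isGroupEnabled(String group) {
        Set<String> endpoints = groups.get(group);
        if (endpoints == null || endpoints.isEmpty()) return false;
        return endpoints.stream().allMatch(e -> endpointEnabled.getOrDefault(e, false));
    }
}
```

This mirrors how `CompressController` consumes the result: it asks once for `isGroupEnabled("qpdf")` at construction time and skips the QPDF pass when the group is off.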

### New functionality for PDF compression:

* [`src/main/java/stirling/software/SPDF/controller/api/misc/CompressController.java`](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R10):
Enhanced the `CompressController` to handle nested images within form
XObjects, improving the accuracy of image compression in PDFs, and removed
the compression pipeline's hard dependency on QPDF.
[[1]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R10)
[[2]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R28-R44)
[[3]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805L49-R61)
[[4]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R77-R99)
[[5]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805L92-R191)
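
The compression flow in this controller hinges on hashing image content so that duplicated images (direct or nested in form XObjects) are compressed only once and every reference is repointed to the shared compressed version. A minimal standalone sketch of that dedup step (hypothetical helper names; the real controller hashes `PDImageXObject` stream data with MD5 via `generateImageHash`):

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch, not the actual CompressController code:
// group image references by a content hash so each unique image
// is compressed exactly once.
public class ImageDedup {
    // Hash raw image bytes; identical content yields an identical key.
    public static String hash(byte[] data) throws Exception {
        byte[] md5 = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : md5) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Map content hash -> indices of all references sharing that content.
    public static Map<String, List<Integer>> groupByContent(List<byte[]> images)
            throws Exception {
        Map<String, List<Integer>> unique = new LinkedHashMap<>();
        for (int i = 0; i < images.size(); i++) {
            unique.computeIfAbsent(hash(images.get(i)), k -> new ArrayList<>()).add(i);
        }
        return unique;
    }
}
```

After grouping, the controller compresses the first instance of each hash and substitutes the compressed `PDImageXObject` at every reference, which is why the stats distinguish unique images from duplicated instances.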
Closes #(issue_number)

---

## Checklist

### General

- [ ] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md)
(if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md#6-testing)
for more details.

---------

Co-authored-by: a <a>
This commit is contained in:
Anthony Stirling
2025-03-20 09:39:57 +00:00
committed by GitHub
parent 748ac494e6
commit d8cca66560
15 changed files with 840 additions and 321 deletions


@@ -7,6 +7,7 @@ import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
@@ -29,8 +30,8 @@ import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ModelAttribute;
import org.springframework.web.bind.annotation.PostMapping;
@@ -44,13 +45,14 @@ import io.swagger.v3.oas.annotations.tags.Tag;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.NoArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import stirling.software.SPDF.config.EndpointConfiguration;
import stirling.software.SPDF.model.api.misc.OptimizePdfRequest;
import stirling.software.SPDF.service.CustomPDFDocumentFactory;
import stirling.software.SPDF.utils.GeneralUtils;
import stirling.software.SPDF.utils.ImageProcessingUtils;
import stirling.software.SPDF.utils.ProcessExecutor;
import stirling.software.SPDF.utils.ProcessExecutor.ProcessExecutorResult;
import stirling.software.SPDF.utils.WebResponseUtils;
@@ -62,10 +64,13 @@ import stirling.software.SPDF.utils.WebResponseUtils;
public class CompressController {
private final CustomPDFDocumentFactory pdfDocumentFactory;
private final boolean qpdfEnabled;
@Autowired
public CompressController(CustomPDFDocumentFactory pdfDocumentFactory) {
public CompressController(
CustomPDFDocumentFactory pdfDocumentFactory,
EndpointConfiguration endpointConfiguration) {
this.pdfDocumentFactory = pdfDocumentFactory;
this.qpdfEnabled = endpointConfiguration.isGroupEnabled("qpdf");
}
@Data
@@ -76,10 +81,30 @@ public class CompressController {
COSName name; // The name used to reference this image
}
@Data
@EqualsAndHashCode(callSuper = true)
@AllArgsConstructor
@NoArgsConstructor
private static class NestedImageReference extends ImageReference {
COSName formName; // Name of the form XObject containing the image
COSName imageName; // Name of the image within the form
}
// Tracks compression stats for reporting
private static class CompressionStats {
int totalImages = 0;
int nestedImages = 0;
int uniqueImagesCount = 0;
int compressedImages = 0;
int skippedImages = 0;
long totalOriginalBytes = 0;
long totalCompressedBytes = 0;
}
public Path compressImagesInPDF(
Path pdfFile, double scaleFactor, float jpegQuality, boolean convertToGrayscale)
throws Exception {
Path newCompressedPDF = Files.createTempFile("compressedPDF", ".pdf");
long originalFileSize = Files.size(pdfFile);
log.info(
"Starting image compression with scale factor: {}, JPEG quality: {}, grayscale: {} on file size: {}",
@@ -89,146 +114,29 @@ public class CompressController {
GeneralUtils.formatBytes(originalFileSize));
try (PDDocument doc = pdfDocumentFactory.load(pdfFile)) {
// Find all unique images in the document
Map<String, List<ImageReference>> uniqueImages = findImages(doc);
// Collect all unique images by content hash
Map<String, List<ImageReference>> uniqueImages = new HashMap<>();
Map<String, PDImageXObject> compressedVersions = new HashMap<>();
// Get statistics
CompressionStats stats = new CompressionStats();
stats.uniqueImagesCount = uniqueImages.size();
calculateImageStats(uniqueImages, stats);
int totalImages = 0;
// Create compressed versions of unique images
Map<String, PDImageXObject> compressedVersions =
createCompressedImages(
doc, uniqueImages, scaleFactor, jpegQuality, convertToGrayscale, stats);
for (int pageNum = 0; pageNum < doc.getNumberOfPages(); pageNum++) {
PDPage page = doc.getPage(pageNum);
PDResources res = page.getResources();
if (res == null || res.getXObjectNames() == null) continue;
for (COSName name : res.getXObjectNames()) {
PDXObject xobj = res.getXObject(name);
if (!(xobj instanceof PDImageXObject)) continue;
totalImages++;
PDImageXObject image = (PDImageXObject) xobj;
String imageHash = generateImageHash(image);
// Store only page number and name reference
ImageReference ref = new ImageReference();
ref.pageNum = pageNum;
ref.name = name;
uniqueImages.computeIfAbsent(imageHash, k -> new ArrayList<>()).add(ref);
}
}
int uniqueImagesCount = uniqueImages.size();
int duplicatedImages = totalImages - uniqueImagesCount;
log.info(
"Found {} unique images and {} duplicated instances across {} pages",
uniqueImagesCount,
duplicatedImages,
doc.getNumberOfPages());
// SECOND PASS: Process each unique image exactly once
int compressedImages = 0;
int skippedImages = 0;
long totalOriginalBytes = 0;
long totalCompressedBytes = 0;
for (Entry<String, List<ImageReference>> entry : uniqueImages.entrySet()) {
String imageHash = entry.getKey();
List<ImageReference> references = entry.getValue();
if (references.isEmpty()) continue;
// Get the first instance of this image
ImageReference firstRef = references.get(0);
PDPage firstPage = doc.getPage(firstRef.pageNum);
PDResources firstPageResources = firstPage.getResources();
PDImageXObject originalImage =
(PDImageXObject) firstPageResources.getXObject(firstRef.name);
// Track original size
int originalSize = (int) originalImage.getCOSObject().getLength();
totalOriginalBytes += originalSize;
// Process this unique image once
BufferedImage processedImage =
processAndCompressImage(
originalImage, scaleFactor, jpegQuality, convertToGrayscale);
if (processedImage != null) {
// Convert to bytes for storage
byte[] compressedData = convertToBytes(processedImage, jpegQuality);
// Check if compression is beneficial
if (compressedData.length < originalSize || convertToGrayscale) {
// Create a single compressed version
PDImageXObject compressedImage =
PDImageXObject.createFromByteArray(
doc,
compressedData,
originalImage.getCOSObject().toString());
// Store the compressed version only once in our map
compressedVersions.put(imageHash, compressedImage);
// Report compression stats
double reductionPercentage =
100.0 - ((compressedData.length * 100.0) / originalSize);
log.info(
"Image hash {}: Compressed from {} to {} (reduced by {}%)",
imageHash,
GeneralUtils.formatBytes(originalSize),
GeneralUtils.formatBytes(compressedData.length),
String.format("%.1f", reductionPercentage));
// Replace ALL instances with the compressed version
for (ImageReference ref : references) {
// Get the page and resources when needed
PDPage page = doc.getPage(ref.pageNum);
PDResources resources = page.getResources();
resources.put(ref.name, compressedImage);
log.info(
"Replaced image on page {} with compressed version",
ref.pageNum + 1);
}
totalCompressedBytes += compressedData.length * references.size();
compressedImages++;
} else {
log.info("Image hash {}: Compression not beneficial, skipping", imageHash);
totalCompressedBytes += originalSize * references.size();
skippedImages++;
}
} else {
log.info("Image hash {}: Not suitable for compression, skipping", imageHash);
totalCompressedBytes += originalSize * references.size();
skippedImages++;
}
}
// Replace all instances with compressed versions
replaceImages(doc, uniqueImages, compressedVersions, stats);
// Log compression statistics
double overallImageReduction =
totalOriginalBytes > 0
? 100.0 - ((totalCompressedBytes * 100.0) / totalOriginalBytes)
: 0;
log.info(
"Image compression summary - Total unique: {}, Compressed: {}, Skipped: {}, Duplicates: {}",
uniqueImagesCount,
compressedImages,
skippedImages,
duplicatedImages);
log.info(
"Total original image size: {}, compressed: {} (reduced by {}%)",
GeneralUtils.formatBytes(totalOriginalBytes),
GeneralUtils.formatBytes(totalCompressedBytes),
String.format("%.1f", overallImageReduction));
logCompressionStats(stats, originalFileSize);
// Free memory before saving
compressedVersions.clear();
uniqueImages.clear();
// Save the document
log.info("Saving compressed PDF to {}", newCompressedPDF.toString());
doc.save(newCompressedPDF.toString());
@@ -242,7 +150,315 @@ public class CompressController {
String.format("%.1f", overallReduction));
return newCompressedPDF;
}
}
// Find all images in the document, both direct and nested within forms
private Map<String, List<ImageReference>> findImages(PDDocument doc) throws IOException {
Map<String, List<ImageReference>> uniqueImages = new HashMap<>();
// Scan through all pages in the document
for (int pageNum = 0; pageNum < doc.getNumberOfPages(); pageNum++) {
PDPage page = doc.getPage(pageNum);
PDResources res = page.getResources();
if (res == null || res.getXObjectNames() == null) continue;
// Process all XObjects on the page
for (COSName name : res.getXObjectNames()) {
PDXObject xobj = res.getXObject(name);
// Direct image
if (isImage(xobj)) {
addDirectImage(pageNum, name, (PDImageXObject) xobj, uniqueImages);
log.info(
"Found direct image '{}' on page {} - {}x{}",
name.getName(),
pageNum + 1,
((PDImageXObject) xobj).getWidth(),
((PDImageXObject) xobj).getHeight());
}
// Form XObject that may contain nested images
else if (isForm(xobj)) {
checkFormForImages(pageNum, name, (PDFormXObject) xobj, uniqueImages);
}
}
}
return uniqueImages;
}
private boolean isImage(PDXObject xobj) {
return xobj instanceof PDImageXObject;
}
private boolean isForm(PDXObject xobj) {
return xobj instanceof PDFormXObject;
}
private ImageReference addDirectImage(
int pageNum,
COSName name,
PDImageXObject image,
Map<String, List<ImageReference>> uniqueImages)
throws IOException {
ImageReference ref = new ImageReference();
ref.pageNum = pageNum;
ref.name = name;
String imageHash = generateImageHash(image);
uniqueImages.computeIfAbsent(imageHash, k -> new ArrayList<>()).add(ref);
return ref;
}
// Look for images inside form XObjects
private void checkFormForImages(
int pageNum,
COSName formName,
PDFormXObject formXObj,
Map<String, List<ImageReference>> uniqueImages)
throws IOException {
PDResources formResources = formXObj.getResources();
if (formResources == null || formResources.getXObjectNames() == null) {
return;
}
log.info(
"Checking form XObject '{}' on page {} for nested images",
formName.getName(),
pageNum + 1);
// Process all XObjects within the form
for (COSName nestedName : formResources.getXObjectNames()) {
PDXObject nestedXobj = formResources.getXObject(nestedName);
if (isImage(nestedXobj)) {
PDImageXObject nestedImage = (PDImageXObject) nestedXobj;
log.info(
"Found nested image '{}' in form '{}' on page {} - {}x{}",
nestedName.getName(),
formName.getName(),
pageNum + 1,
nestedImage.getWidth(),
nestedImage.getHeight());
// Create specialized reference for the nested image
NestedImageReference nestedRef = new NestedImageReference();
nestedRef.pageNum = pageNum;
nestedRef.formName = formName;
nestedRef.imageName = nestedName;
String imageHash = generateImageHash(nestedImage);
uniqueImages.computeIfAbsent(imageHash, k -> new ArrayList<>()).add(nestedRef);
}
}
}
// Count total images and nested images
private void calculateImageStats(
Map<String, List<ImageReference>> uniqueImages, CompressionStats stats) {
for (List<ImageReference> references : uniqueImages.values()) {
for (ImageReference ref : references) {
stats.totalImages++;
if (ref instanceof NestedImageReference) {
stats.nestedImages++;
}
}
}
}
// Create compressed versions of all unique images
private Map<String, PDImageXObject> createCompressedImages(
PDDocument doc,
Map<String, List<ImageReference>> uniqueImages,
double scaleFactor,
float jpegQuality,
boolean convertToGrayscale,
CompressionStats stats)
throws IOException {
Map<String, PDImageXObject> compressedVersions = new HashMap<>();
// Process each unique image exactly once
for (Entry<String, List<ImageReference>> entry : uniqueImages.entrySet()) {
String imageHash = entry.getKey();
List<ImageReference> references = entry.getValue();
if (references.isEmpty()) continue;
// Get the first instance of this image
PDImageXObject originalImage = getOriginalImage(doc, references.get(0));
// Track original size
int originalSize = (int) originalImage.getCOSObject().getLength();
stats.totalOriginalBytes += originalSize;
// Process this unique image
PDImageXObject compressedImage =
compressImage(
doc,
originalImage,
originalSize,
scaleFactor,
jpegQuality,
convertToGrayscale);
if (compressedImage != null) {
// Store the compressed version in our map
compressedVersions.put(imageHash, compressedImage);
stats.compressedImages++;
// Update compression stats
int compressedSize = (int) compressedImage.getCOSObject().getLength();
stats.totalCompressedBytes += compressedSize * references.size();
double reductionPercentage = 100.0 - ((compressedSize * 100.0) / originalSize);
log.info(
"Image hash {}: Compressed from {} to {} (reduced by {}%)",
imageHash,
GeneralUtils.formatBytes(originalSize),
GeneralUtils.formatBytes(compressedSize),
String.format("%.1f", reductionPercentage));
} else {
log.info("Image hash {}: Not suitable for compression, skipping", imageHash);
stats.totalCompressedBytes += originalSize * references.size();
stats.skippedImages++;
}
}
return compressedVersions;
}
// Get original image from a reference
private PDImageXObject getOriginalImage(PDDocument doc, ImageReference ref) throws IOException {
if (ref instanceof NestedImageReference) {
// Get the nested image from within a form XObject
NestedImageReference nestedRef = (NestedImageReference) ref;
PDPage page = doc.getPage(nestedRef.pageNum);
PDResources pageResources = page.getResources();
// Get the form XObject
PDFormXObject formXObj = (PDFormXObject) pageResources.getXObject(nestedRef.formName);
// Get the nested image from the form's resources
PDResources formResources = formXObj.getResources();
return (PDImageXObject) formResources.getXObject(nestedRef.imageName);
} else {
// Get direct image from page resources
PDPage page = doc.getPage(ref.pageNum);
PDResources resources = page.getResources();
return (PDImageXObject) resources.getXObject(ref.name);
}
}
// Try to compress an image if it makes sense
private PDImageXObject compressImage(
PDDocument doc,
PDImageXObject originalImage,
int originalSize,
double scaleFactor,
float jpegQuality,
boolean convertToGrayscale)
throws IOException {
// Process and compress the image
BufferedImage processedImage =
processAndCompressImage(
originalImage, scaleFactor, jpegQuality, convertToGrayscale);
if (processedImage == null) {
return null;
}
// Convert to bytes for storage
byte[] compressedData = convertToBytes(processedImage, jpegQuality);
// Check if compression is beneficial
if (compressedData.length < originalSize || convertToGrayscale) {
// Create a compressed version
return PDImageXObject.createFromByteArray(
doc, compressedData, originalImage.getCOSObject().toString());
}
return null;
}
// Replace all instances of original images with their compressed versions
private void replaceImages(
PDDocument doc,
Map<String, List<ImageReference>> uniqueImages,
Map<String, PDImageXObject> compressedVersions,
CompressionStats stats)
throws IOException {
for (Entry<String, List<ImageReference>> entry : uniqueImages.entrySet()) {
String imageHash = entry.getKey();
List<ImageReference> references = entry.getValue();
// Skip if no compressed version exists
PDImageXObject compressedImage = compressedVersions.get(imageHash);
if (compressedImage == null) continue;
// Replace ALL instances with the compressed version
for (ImageReference ref : references) {
replaceImageReference(doc, ref, compressedImage);
}
}
}
// Replace a specific image reference with a compressed version
private void replaceImageReference(
PDDocument doc, ImageReference ref, PDImageXObject compressedImage) throws IOException {
if (ref instanceof NestedImageReference) {
// Replace nested image within form XObject
NestedImageReference nestedRef = (NestedImageReference) ref;
PDPage page = doc.getPage(nestedRef.pageNum);
PDResources pageResources = page.getResources();
// Get the form XObject
PDFormXObject formXObj = (PDFormXObject) pageResources.getXObject(nestedRef.formName);
// Replace the nested image in the form's resources
PDResources formResources = formXObj.getResources();
formResources.put(nestedRef.imageName, compressedImage);
log.info(
"Replaced nested image '{}' in form '{}' on page {} with compressed version",
nestedRef.imageName.getName(),
nestedRef.formName.getName(),
nestedRef.pageNum + 1);
} else {
// Replace direct image in page resources
PDPage page = doc.getPage(ref.pageNum);
PDResources resources = page.getResources();
resources.put(ref.name, compressedImage);
log.info("Replaced direct image on page {} with compressed version", ref.pageNum + 1);
}
}
// Log final stats about the compression
private void logCompressionStats(CompressionStats stats, long originalFileSize) {
// Calculate image reduction percentage
double overallImageReduction =
stats.totalOriginalBytes > 0
? 100.0 - ((stats.totalCompressedBytes * 100.0) / stats.totalOriginalBytes)
: 0;
int duplicatedImages = stats.totalImages - stats.uniqueImagesCount;
log.info(
"Image compression summary - Total unique: {}, Compressed: {}, Skipped: {}, Duplicates: {}, Nested: {}",
stats.uniqueImagesCount,
stats.compressedImages,
stats.skippedImages,
duplicatedImages,
stats.nestedImages);
log.info(
"Total original image size: {}, compressed: {} (reduced by {}%)",
GeneralUtils.formatBytes(stats.totalOriginalBytes),
GeneralUtils.formatBytes(stats.totalCompressedBytes),
String.format("%.1f", overallImageReduction));
}
private BufferedImage convertToGrayscale(BufferedImage image) {
@@ -257,10 +473,7 @@ public class CompressController {
return grayImage;
}
/**
* Processes and compresses an image if beneficial. Returns the processed image if compression
* is worthwhile, null otherwise.
*/
// Resize and optionally convert to grayscale
private BufferedImage processAndCompressImage(
PDImageXObject image, double scaleFactor, float jpegQuality, boolean convertToGrayscale)
throws IOException {
@@ -342,10 +555,7 @@ public class CompressController {
return scaledImage;
}
/**
* Converts a BufferedImage to a byte array with specified JPEG quality. Checks if compression
* is beneficial compared to original.
*/
// Convert image to byte array with quality settings
private byte[] convertToBytes(BufferedImage scaledImage, float jpegQuality) throws IOException {
String format = scaledImage.getColorModel().hasAlpha() ? "png" : "jpeg";
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
@@ -376,7 +586,7 @@ public class CompressController {
return outputStream.toByteArray();
}
/** Modified hash function to consistently identify identical image content */
// Hash function to identify identical images
private String generateImageHash(PDImageXObject image) {
try {
// Create a stream for the raw stream data
@@ -414,43 +624,26 @@ public class CompressController {
}
}
private byte[] generateImageMD5(PDImageXObject image) throws IOException {
return generatMD5(ImageProcessingUtils.getImageData(image.getImage()));
}
/** Generates a hash string from a byte array */
private String generateHashFromBytes(byte[] data) {
try {
// Use the existing method to generate MD5 hash
byte[] hash = generatMD5(data);
return bytesToHexString(hash);
} catch (Exception e) {
log.error("Error generating hash from bytes", e);
// Return a unique string as fallback
return "fallback-" + System.identityHashCode(data);
}
}
// Updated scale factor method for levels 4-9
// Scale factors for different optimization levels
private double getScaleFactorForLevel(int optimizeLevel) {
return switch (optimizeLevel) {
case 4 -> 0.9; // 90% of original size - lite image compression
case 5 -> 0.8; // 80% of original size - lite image compression
case 6 -> 0.7; // 70% of original size - lite image compression
case 7 -> 0.6; // 60% of original size - intense image compression
case 8 -> 0.5; // 50% of original size - intense image compression
case 9, 10 -> 0.4; // 40% of original size - intense image compression
default -> 1.0; // No image scaling for levels 1-3
case 4 -> 0.9; // 90% - lite compression
case 5 -> 0.8; // 80% - lite compression
case 6 -> 0.7; // 70% - lite compression
case 7 -> 0.6; // 60% - intense compression
case 8 -> 0.5; // 50% - intense compression
case 9, 10 -> 0.4; // 40% - intense compression
default -> 1.0; // No scaling for levels 1-3
};
}
// New method for JPEG quality based on optimization level
// JPEG quality for different optimization levels
private float getJpegQualityForLevel(int optimizeLevel) {
return switch (optimizeLevel) {
case 7 -> 0.8f; // 80% quality - intense compression
case 8 -> 0.6f; // 60% quality - more intense compression
case 9, 10 -> 0.4f; // 40% quality - most intense compression
default -> 0.7f; // 70% quality for levels 1-6 (higher quality)
case 7 -> 0.8f; // 80% quality
case 8 -> 0.6f; // 60% quality
case 9, 10 -> 0.4f; // 40% quality
default -> 0.7f; // 70% quality for levels 1-6
};
}
@@ -478,17 +671,17 @@ public class CompressController {
}
// Create initial input file
Path originalFile = Files.createTempFile("input_", ".pdf");
Path originalFile = Files.createTempFile("original_", ".pdf");
inputFile.transferTo(originalFile.toFile());
long inputFileSize = Files.size(originalFile);
// Start with original as current working file
Path currentFile = originalFile;
Path currentFile = Files.createTempFile("working_", ".pdf");
Files.copy(originalFile, currentFile, StandardCopyOption.REPLACE_EXISTING);
// Keep track of all temporary files for cleanup
List<Path> tempFiles = new ArrayList<>();
tempFiles.add(originalFile);
tempFiles.add(currentFile);
try {
if (autoMode) {
double sizeReductionRatio = expectedOutputSize / (double) inputFileSize;
@@ -499,93 +692,56 @@ public class CompressController {
boolean imageCompressionApplied = false;
boolean qpdfCompressionApplied = false;
if (qpdfEnabled && optimizeLevel <= 3) {
optimizeLevel = 4;
}
while (!sizeMet && optimizeLevel <= 9) {
// Apply image compression for levels 4-9
if ((optimizeLevel >= 4 || Boolean.TRUE.equals(convertToGrayscale))
&& !imageCompressionApplied) {
double scaleFactor = getScaleFactorForLevel(optimizeLevel);
float jpegQuality = getJpegQualityForLevel(optimizeLevel);
// Use the returned path from compressImagesInPDF
Path compressedImageFile = compressImagesInPDF(
currentFile,
scaleFactor,
jpegQuality,
Boolean.TRUE.equals(convertToGrayscale));
// Add to temp files list and update current file
// Compress images
Path compressedImageFile =
compressImagesInPDF(
currentFile,
scaleFactor,
jpegQuality,
Boolean.TRUE.equals(convertToGrayscale));
tempFiles.add(compressedImageFile);
currentFile = compressedImageFile;
imageCompressionApplied = true;
}
// Apply QPDF compression for all levels
if (!qpdfCompressionApplied) {
long preQpdfSize = Files.size(currentFile);
log.info("Pre-QPDF file size: {}", GeneralUtils.formatBytes(preQpdfSize));
// Map optimization levels to QPDF compression levels
int qpdfCompressionLevel = optimizeLevel <= 3
? optimizeLevel * 3 // Level 1->3, 2->6, 3->9
: 9; // Max compression for levels 4-9
// Create output file for QPDF
Path qpdfOutputFile = Files.createTempFile("qpdf_output_", ".pdf");
tempFiles.add(qpdfOutputFile);
// Run QPDF optimization
List<String> command = new ArrayList<>();
command.add("qpdf");
if (request.getNormalize()) {
command.add("--normalize-content=y");
}
if (request.getLinearize()) {
command.add("--linearize");
}
command.add("--recompress-flate");
command.add("--compression-level=" + qpdfCompressionLevel);
command.add("--compress-streams=y");
command.add("--object-streams=generate");
command.add(currentFile.toString());
command.add(qpdfOutputFile.toString());
ProcessExecutorResult returnCode = null;
try {
returnCode = ProcessExecutor.getInstance(ProcessExecutor.Processes.QPDF)
.runCommandWithOutputHandling(command);
qpdfCompressionApplied = true;
// Update current file to the QPDF output
currentFile = qpdfOutputFile;
long postQpdfSize = Files.size(currentFile);
double qpdfReduction = 100.0 - ((postQpdfSize * 100.0) / preQpdfSize);
log.info(
"Post-QPDF file size: {} (reduced by {}%)",
GeneralUtils.formatBytes(postQpdfSize),
String.format("%.1f", qpdfReduction));
} catch (Exception e) {
if (returnCode != null && returnCode.getRc() != 3) {
throw e;
}
// If QPDF fails, keep using the current file
log.warn("QPDF compression failed, continuing with current file");
if (!qpdfCompressionApplied && qpdfEnabled) {
applyQpdfCompression(request, optimizeLevel, currentFile, tempFiles);
qpdfCompressionApplied = true;
} else if (!qpdfCompressionApplied) {
// If QPDF is disabled, mark as applied and log
if (!qpdfEnabled) {
log.info("Skipping QPDF compression as QPDF group is disabled");
}
qpdfCompressionApplied = true;
}
// Check if file size is within expected size or not auto mode
// Check if target size reached or not in auto mode
long outputFileSize = Files.size(currentFile);
if (outputFileSize <= expectedOutputSize || !autoMode) {
sizeMet = true;
} else {
int newOptimizeLevel = incrementOptimizeLevel(
optimizeLevel, outputFileSize, expectedOutputSize);
int newOptimizeLevel =
incrementOptimizeLevel(
optimizeLevel, outputFileSize, expectedOutputSize);
// Check if we can't increase the level further
if (newOptimizeLevel == optimizeLevel) {
if (autoMode) {
log.info("Maximum optimization level reached without meeting target size.");
log.info(
"Maximum optimization level reached without meeting target size.");
sizeMet = true;
}
} else {
@@ -597,18 +753,19 @@ public class CompressController {
}
}
// Check if optimized file is larger than the original
// Use original if optimized file is somehow larger
long finalFileSize = Files.size(currentFile);
if (finalFileSize > inputFileSize) {
log.warn("Optimized file is larger than the original. Using the original file instead.");
// Use the stored reference to the original file
if (finalFileSize >= inputFileSize) {
log.warn(
"Optimized file is larger than the original. Using the original file instead.");
currentFile = originalFile;
}
String outputFilename = Filenames.toSimpleFileName(inputFile.getOriginalFilename())
String outputFilename =
Filenames.toSimpleFileName(inputFile.getOriginalFilename())
.replaceFirst("[.][^.]+$", "")
+ "_Optimized.pdf";
return WebResponseUtils.pdfDocToWebResponse(
pdfDocumentFactory.load(currentFile.toFile()), outputFilename);
@@ -624,6 +781,65 @@ public class CompressController {
}
}
// Run QPDF compression
private void applyQpdfCompression(
OptimizePdfRequest request, int optimizeLevel, Path currentFile, List<Path> tempFiles)
throws IOException {
long preQpdfSize = Files.size(currentFile);
log.info("Pre-QPDF file size: {}", GeneralUtils.formatBytes(preQpdfSize));
// Map optimization levels to QPDF compression levels
int qpdfCompressionLevel =
optimizeLevel <= 3
? optimizeLevel * 3 // Level 1->3, 2->6, 3->9
: 9; // Max compression for levels 4-9
// Create output file for QPDF
Path qpdfOutputFile = Files.createTempFile("qpdf_output_", ".pdf");
tempFiles.add(qpdfOutputFile);
// Build QPDF command
List<String> command = new ArrayList<>();
command.add("qpdf");
if (request.getNormalize()) {
command.add("--normalize-content=y");
}
if (request.getLinearize()) {
command.add("--linearize");
}
command.add("--recompress-flate");
command.add("--compression-level=" + qpdfCompressionLevel);
command.add("--compress-streams=y");
command.add("--object-streams=generate");
command.add(currentFile.toString());
command.add(qpdfOutputFile.toString());
ProcessExecutorResult returnCode = null;
try {
returnCode =
ProcessExecutor.getInstance(ProcessExecutor.Processes.QPDF)
.runCommandWithOutputHandling(command);
// Update current file to the QPDF output
Files.copy(qpdfOutputFile, currentFile, StandardCopyOption.REPLACE_EXISTING);
long postQpdfSize = Files.size(currentFile);
double qpdfReduction = 100.0 - ((postQpdfSize * 100.0) / preQpdfSize);
log.info(
"Post-QPDF file size: {} (reduced by {}%)",
GeneralUtils.formatBytes(postQpdfSize), String.format("%.1f", qpdfReduction));
} catch (Exception e) {
if (returnCode != null && returnCode.getRc() != 3) {
throw new IOException("QPDF command failed", e);
}
// If QPDF fails, keep using the current file
log.warn("QPDF compression failed, continuing with current file", e);
}
}
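
For reference, the argument list assembled by `applyQpdfCompression` can be reproduced in isolation. This is a minimal sketch (the class and helper names are made up for illustration, and `normalize`/`linearize` stand in for the request fields); the flags mirror the method above and are standard qpdf options:

```java
import java.util.ArrayList;
import java.util.List;

public class QpdfCommandSketch {
    // Map the app's 1-9 optimize levels onto qpdf's 1-9 flate levels:
    // 1->3, 2->6, 3->9, and max compression (9) for anything above 3.
    static int toQpdfLevel(int optimizeLevel) {
        return optimizeLevel <= 3 ? optimizeLevel * 3 : 9;
    }

    // Assemble the qpdf argument list the same way applyQpdfCompression does.
    static List<String> buildCommand(
            int optimizeLevel, boolean normalize, boolean linearize, String in, String out) {
        List<String> command = new ArrayList<>();
        command.add("qpdf");
        if (normalize) command.add("--normalize-content=y");
        if (linearize) command.add("--linearize");
        command.add("--recompress-flate");
        command.add("--compression-level=" + toQpdfLevel(optimizeLevel));
        command.add("--compress-streams=y");
        command.add("--object-streams=generate");
        command.add(in);
        command.add(out);
        return command;
    }

    public static void main(String[] args) {
        // prints: qpdf --normalize-content=y --recompress-flate --compression-level=6
        //         --compress-streams=y --object-streams=generate in.pdf out.pdf
        System.out.println(String.join(" ", buildCommand(2, true, false, "in.pdf", "out.pdf")));
    }
}
```

Note that qpdf exit code 3 means "succeeded with warnings", which is why the catch block above tolerates it rather than failing the whole compression pass.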
// Pick optimization level based on target size
private int determineOptimizeLevel(double sizeReductionRatio) {
if (sizeReductionRatio > 0.9) return 1;
if (sizeReductionRatio > 0.8) return 2;
@@ -636,6 +852,7 @@ public class CompressController {
return 9;
}
// Increment optimization level if we need more compression
private int incrementOptimizeLevel(int currentLevel, long currentSize, long targetSize) {
double currentRatio = currentSize / (double) targetSize;
log.info("Current compression ratio: {}", String.format("%.2f", currentRatio));
@@ -647,4 +864,4 @@ public class CompressController {
}
return Math.min(9, currentLevel + 1);
}
}
}
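
The two level-selection helpers can be sketched together. The diff elides the middle thresholds of `determineOptimizeLevel`, so this sketch keeps only the cases visible above and marks the gap (a sketch under those assumptions, not the full method):

```java
public class OptimizeLevelSketch {
    // sizeReductionRatio = targetSize / currentSize: close to 1.0 means
    // only a light touch is needed; small ratios demand heavy compression.
    static int determineOptimizeLevel(double sizeReductionRatio) {
        if (sizeReductionRatio > 0.9) return 1;
        if (sizeReductionRatio > 0.8) return 2;
        // ... intermediate thresholds elided in the diff ...
        return 9; // aggressive compression needed
    }

    // Step the level up by one when the target size is missed, capped at 9.
    static int incrementOptimizeLevel(int currentLevel) {
        return Math.min(9, currentLevel + 1);
    }
}
```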


@@ -5,6 +5,7 @@ import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
@@ -26,8 +27,10 @@ import io.swagger.v3.oas.annotations.tags.Tag;
import lombok.extern.slf4j.Slf4j;
import stirling.software.SPDF.model.PipelineConfig;
import stirling.software.SPDF.model.PipelineOperation;
import stirling.software.SPDF.model.PipelineResult;
import stirling.software.SPDF.model.api.HandleDataRequest;
import stirling.software.SPDF.service.PostHogService;
import stirling.software.SPDF.utils.WebResponseUtils;
@RestController
@@ -40,9 +43,13 @@ public class PipelineController {
private final ObjectMapper objectMapper;
private final PostHogService postHogService;
public PipelineController(
PipelineProcessor processor, ObjectMapper objectMapper, PostHogService postHogService) {
this.processor = processor;
this.objectMapper = objectMapper;
this.postHogService = postHogService;
}
@PostMapping("/handleData")
@@ -55,6 +62,18 @@ public class PipelineController {
}
PipelineConfig config = objectMapper.readValue(jsonString, PipelineConfig.class);
log.info("Received POST request to /handleData with {} files", files.length);
List<String> operationNames =
config.getOperations().stream()
.map(PipelineOperation::getOperation)
.collect(Collectors.toList());
Map<String, Object> properties = new HashMap<>();
properties.put("operations", operationNames);
properties.put("fileCount", files.length);
postHogService.captureEvent("pipeline_api_event", properties);
try {
List<Resource> inputFiles = processor.generateInputFiles(files);
if (inputFiles == null || inputFiles.size() == 0) {
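
The PostHog payload built in `handleData` above is just a plain map of operation names plus a file count. A minimal self-contained sketch (with a stand-in `Operation` record replacing `PipelineOperation`, since only the name is used):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PipelineEventPayloadSketch {
    // Stand-in for PipelineOperation: only the operation name matters here.
    record Operation(String name) {}

    // Build the same properties map the controller passes to captureEvent.
    static Map<String, Object> buildProperties(List<Operation> operations, int fileCount) {
        List<String> operationNames =
                operations.stream().map(Operation::name).collect(Collectors.toList());
        Map<String, Object> properties = new HashMap<>();
        properties.put("operations", operationNames);
        properties.put("fileCount", fileCount);
        return properties;
    }
}
```

`PipelineDirectoryProcessor` below assembles the identical map for its `pipeline_directory_event`, so the two events can be compared directly in analytics.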


@@ -17,7 +17,9 @@ import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;
@@ -34,6 +36,7 @@ import stirling.software.SPDF.config.RuntimePathConfig;
import stirling.software.SPDF.model.PipelineConfig;
import stirling.software.SPDF.model.PipelineOperation;
import stirling.software.SPDF.model.PipelineResult;
import stirling.software.SPDF.service.PostHogService;
import stirling.software.SPDF.utils.FileMonitor;
@Service
@@ -41,15 +44,11 @@ import stirling.software.SPDF.utils.FileMonitor;
public class PipelineDirectoryProcessor {
private final ObjectMapper objectMapper;
private final ApiDocService apiDocService;
private final PipelineProcessor processor;
private final FileMonitor fileMonitor;
private final PostHogService postHogService;
private final String watchedFoldersDir;
private final String finishedFoldersDir;
public PipelineDirectoryProcessor(
@@ -57,13 +56,15 @@ public class PipelineDirectoryProcessor {
ApiDocService apiDocService,
PipelineProcessor processor,
FileMonitor fileMonitor,
PostHogService postHogService,
RuntimePathConfig runtimePathConfig) {
this.objectMapper = objectMapper;
this.apiDocService = apiDocService;
this.processor = processor;
this.fileMonitor = fileMonitor;
this.postHogService = postHogService;
this.watchedFoldersDir = runtimePathConfig.getPipelineWatchedFoldersPath();
this.finishedFoldersDir = runtimePathConfig.getPipelineFinishedFoldersPath();
}
@Scheduled(fixedRate = 60000)
@@ -152,6 +153,14 @@ public class PipelineDirectoryProcessor {
log.debug("No files detected for {} ", dir);
return;
}
List<String> operationNames =
config.getOperations().stream().map(PipelineOperation::getOperation).toList();
Map<String, Object> properties = new HashMap<>();
properties.put("operations", operationNames);
properties.put("fileCount", files.length);
postHogService.captureEvent("pipeline_directory_event", properties);
List<File> filesToProcess = prepareFilesForProcessing(files, processingDir);
runPipelineAgainstFiles(filesToProcess, config, dir, processingDir);
}
@@ -252,8 +261,7 @@ public class PipelineDirectoryProcessor {
try {
Thread.sleep(retryDelayMs * (int) Math.pow(2, attempt - 1));
} catch (InterruptedException e1) {
log.error("prepareFilesForProcessing failure", e);
}
}
}
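
The retry loop above sleeps for `retryDelayMs * 2^(attempt-1)`, i.e. classic exponential backoff. A minimal sketch of that delay computation (names are illustrative):

```java
public class RetryBackoffSketch {
    // Exponential backoff as used above: the base delay doubles on each
    // retry (attempt is 1-based): base, 2*base, 4*base, ...
    static long backoffDelayMs(long retryDelayMs, int attempt) {
        return retryDelayMs * (long) Math.pow(2, attempt - 1);
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + ": " + backoffDelayMs(1000, attempt) + " ms");
        }
    }
}
```

One caveat: the diff casts `Math.pow` to `int` before multiplying, which overflows once `attempt` exceeds 31; a `long` cast, as here, is the safer choice. It is also conventional to call `Thread.currentThread().interrupt()` when catching `InterruptedException`, which the replaced `printStackTrace` code did not do either.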