Mirror of https://github.com/Frooodle/Stirling-PDF.git, synced 2026-03-13 02:18:16 +01:00
Add default languages to OCR, fix compression for QPDF and embedded images (#3202)
# Description of Changes

This pull request enhances OCR support, improves endpoint management, and adds new functionality for PDF compression. The most important changes are detailed below.

### Enhancements to OCR support:

* `Dockerfile` and `Dockerfile.fat`: Added support for several new OCR languages: Chinese (Simplified), German, French, and Portuguese. Together with English, these are our top five languages. [[1]](diffhunk://#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R69-R72) [[2]](diffhunk://#diff-571631582b988e88c52c86960cc083b0b8fa63cf88f056f26e9e684195221c27L78-R81)

### Improvements to endpoint management:

* [`src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java`](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dR51-R66): Added a new method `isGroupEnabled` to check whether a group of endpoints is enabled.
* [`src/main/java/stirling/software/SPDF/config/EndpointConfiguration.java`](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL179-L193): Updated endpoint groups and removed redundant qpdf endpoints. [[1]](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL179-L193) [[2]](diffhunk://#diff-750f31f6ecbd64b025567108a33775cad339e835a04360affff82a09410b697dL243-L244)
* [`src/main/java/stirling/software/SPDF/config/EndpointInspector.java`](diffhunk://#diff-845de13e140bb1264014539714860f044405274ad2a9481f38befdd1c1333818R1-R291): Introduced a new `EndpointInspector` class to discover and validate GET endpoints dynamically.

### New functionality for PDF compression:

* [`src/main/java/stirling/software/SPDF/controller/api/misc/CompressController.java`](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R10): Enhanced the `CompressController` to handle nested images within form XObjects, improving the accuracy of image compression in PDFs.
* Removed compression's dependency on QPDF. [[1]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R10) [[2]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R28-R44) [[3]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805L49-R61) [[4]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805R77-R99) [[5]](diffhunk://#diff-c307589e9f958f2593c9567c5ad9d63cd03788aa4803b3017b1c13b0d0485805L92-R191)

Closes #(issue_number)

---

## Checklist

### General

- [ ] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md) (if applicable)
- [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md) (if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed)
- [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md#6-testing) for more details.

---------

Co-authored-by: a <a>
@@ -141,7 +141,17 @@ function setupFileInput(chooser) {
       allFiles = Array.from(isDragAndDrop ? allFiles : [element.files[0]]);
     }

+    const originalText = inputContainer.querySelector('#fileInputText').innerHTML;
+
+    inputContainer.querySelector('#fileInputText').innerHTML = window.fileInput.loading;
+
     async function checkZipFile() {
+      const hasZipFiles = allFiles.some(file => zipTypes.includes(file.type));
+
+      // Only change to extractPDF message if we actually have zip files
+      if (hasZipFiles) {
+        inputContainer.querySelector('#fileInputText').innerHTML = window.fileInput.extractPDF;
+      }

       const promises = allFiles.map(async (file, index) => {
         try {
@@ -156,13 +166,10 @@ function setupFileInput(chooser) {
       });

       await Promise.all(promises);

     }
-    const originalText = inputContainer.querySelector('#fileInputText').innerHTML;

     const decryptFile = new DecryptFile();

-    inputContainer.querySelector('#fileInputText').innerHTML = window.fileInput.extractPDF;

     await checkZipFile();

     allFiles = await Promise.all(
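The status-text change in the two hunks above can be summarized in a small sketch: the label starts as the loading message and only switches to the extraction message when at least one selected file is a ZIP. The helper name `statusTextFor`, the plain-object stand-ins for `File`, and the contents of `assumedZipTypes` are assumptions made for illustration; the real `zipTypes` list is defined elsewhere in the source and is not shown in this diff.

```javascript
// Assumed ZIP MIME types; the actual `zipTypes` array is not visible in this diff.
const assumedZipTypes = ['application/zip', 'application/x-zip-compressed'];

// Hypothetical helper: choose the text shown in #fileInputText while files are processed.
// `messages` plays the role of window.fileInput ({ loading, extractPDF }).
function statusTextFor(files, messages, zipTypeList = assumedZipTypes) {
  const hasZipFiles = files.some((file) => zipTypeList.includes(file.type));
  // Only show the extraction message when a ZIP is actually present.
  return hasZipFiles ? messages.extractPDF : messages.loading;
}

// Usage with plain objects standing in for File instances.
const messages = { loading: 'Loading...', extractPDF: 'Extracting...' };
console.log(statusTextFor([{ type: 'application/pdf' }], messages));
console.log(statusTextFor([{ type: 'application/zip' }], messages));
```

Because the sketch is a pure function, the branch can be checked without a DOM.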
@@ -224,26 +231,26 @@ function setupFileInput(chooser) {
         .then(function (zip) {
           var extractionPromises = [];

-          zip.forEach(function (relativePath, zipEntry) {
-
-            const promise = zipEntry.async('blob').then(function (content) {
-              // Assuming that folders have size zero
-              if (content.size > 0) {
-                const extension = zipEntry.name.split('.').pop().toLowerCase();
-                const mimeType = mimeTypes[extension];
-
-                // Check for file extension
-                if (mimeType && (mimeType.startsWith(acceptedFileType.split('/')[0]) || acceptedFileType === mimeType)) {
-
-                  var file = new File([content], zipEntry.name, { type: mimeType });
-                  file.uniqueId = UUID.uuidv4();
-                  allFiles.push(file);
-
-                } else {
-                  console.log(`File ${zipEntry.name} skipped. MIME type (${mimeType}) does not match accepted type (${acceptedFileType})`);
-                }
-              }
-            });
+          zip.forEach(function (relativePath, zipEntry) {
+            const promise = zipEntry.async('blob').then(function (content) {
+              // Assuming that folders have size zero
+              if (content.size > 0) {
+                const extension = zipEntry.name.split('.').pop().toLowerCase();
+                const mimeType = mimeTypes[extension] || 'application/octet-stream';
+
+                // Check if we're accepting ONLY ZIP files (in which case extract everything)
+                // or if the file type matches the accepted type
+                if (zipTypes.includes(acceptedFileType) ||
+                    acceptedFileType === '*/*' ||
+                    (mimeType && (mimeType.startsWith(acceptedFileType.split('/')[0]) || acceptedFileType === mimeType))) {
+                  var file = new File([content], zipEntry.name, { type: mimeType });
+                  file.uniqueId = UUID.uuidv4();
+                  allFiles.push(file);
+                } else {
+                  console.log(`File ${zipEntry.name} skipped. MIME type (${mimeType}) does not match accepted type (${acceptedFileType})`);
+                }
+              }
+            });

             extractionPromises.push(promise);
           });

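The acceptance condition introduced in the hunk above can be isolated as a predicate, which makes its three branches easier to reason about. This is a minimal sketch: `shouldExtractEntry` is a hypothetical name, and `ZIP_MIME_TYPES` is an assumed stand-in for the `zipTypes` array that the diff references but does not define.

```javascript
// Assumed ZIP MIME types; the real `zipTypes` list is not shown in this diff.
const ZIP_MIME_TYPES = ['application/zip', 'application/x-zip-compressed'];

// Hypothetical predicate mirroring the `if` condition in the added lines.
function shouldExtractEntry(mimeType, acceptedFileType, zipTypeList = ZIP_MIME_TYPES) {
  // Input accepts only ZIP files: extract every entry.
  if (zipTypeList.includes(acceptedFileType)) return true;
  // Input accepts anything: extract every entry.
  if (acceptedFileType === '*/*') return true;
  // Otherwise the entry's MIME type must share the top-level type or match exactly.
  if (!mimeType) return false;
  return mimeType.startsWith(acceptedFileType.split('/')[0]) || acceptedFileType === mimeType;
}

console.log(shouldExtractEntry('image/png', 'image/*'));
console.log(shouldExtractEntry('text/plain', 'application/pdf'));
```

One consequence worth checking against the intended behavior: because unknown extensions now fall back to `application/octet-stream`, and the prefix test compares only the top-level type, an entry with an unrecognized extension appears to pass whenever the accepted type is any `application/...` type.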
@@ -224,15 +224,20 @@
   window.fileInput = {
     dragAndDropPDF: '[[#{fileChooser.dragAndDropPDF}]]',
     dragAndDropImage: '[[#{fileChooser.dragAndDropImage}]]',
-    extractPDF: '[[#{fileChooser.extractPDF}]]'
+    extractPDF: '[[#{fileChooser.extractPDF}]]',
+    loading: '[[#{loading}]]'
   };</script>
   <div class="custom-file-chooser mb-3"
     th:attr="data-bs-unique-id=${name}, data-bs-element-id=${name+'-input'}, data-bs-element-container-id=${name+'-input-container'}, data-bs-show-uploads=${showUploads}, data-bs-files-selected=#{filesSelected}, data-bs-pdf-prompt=#{pdfPrompt}">
     <div class="mb-3 d-flex flex-row justify-content-center align-items-center flex-wrap input-container"
       th:name="${name}+'-input'" th:id="${name}+'-input-container'" th:data-text="#{fileChooser.hoveredDragAndDrop}">
       <label class="file-input-btn d-none">
-        <input type="file" class="form-control" th:name="${name}" th:id="${name}+'-input'" th:accept="${accept} + ',.zip'"
-          th:attr="multiple=${!disableMultipleFiles}" th:required="${notRequired} ? null : 'required'">
+        <input type="file" class="form-control"
+          th:name="${name}"
+          th:id="${name}+'-input'"
+          th:accept="${accept == null ? '*/*': ((accept == '*/*') ? accept : (accept + ',.zip'))}"
+          th:attr="multiple=${!disableMultipleFiles}"
+          th:required="${notRequired} ? null : 'required'">
         Browse
       </label>
       <div class="d-flex justify-content-start align-items-center" id="fileInputText">

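The new `th:accept` expression in the hunk above is a nested ternary; written out as a function, its three cases read more plainly. This is only an illustrative JavaScript restatement of the Thymeleaf expression, and `resolveAccept` is a hypothetical name that does not exist in the project.

```javascript
// Hypothetical restatement of the th:accept expression from the template.
function resolveAccept(accept) {
  // No accept value supplied: allow any file type.
  if (accept == null) return '*/*';
  // Already a wildcard: leave it untouched, since appending ',.zip' adds nothing.
  if (accept === '*/*') return accept;
  // Specific type: also allow ZIP archives so their contents can be extracted client-side.
  return accept + ',.zip';
}

console.log(resolveAccept(undefined));
console.log(resolveAccept('*/*'));
console.log(resolveAccept('application/pdf'));
```

Compared with the removed line, which always appended `,.zip`, this avoids producing values such as `null,.zip` or `*/*,.zip`.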
@@ -64,7 +64,7 @@
       </div>
       <div class="element-margin">
         <div
-          th:replace="~{fragments/common :: fileSelector(name='fileInput', multipleInputsForSingleRequest=true)}"
+          th:replace="~{fragments/common :: fileSelector(name='fileInput', multipleInputsForSingleRequest=true, accept='*/*')}"
         ></div>
       </div>
       <div class="element-margin text-start">
@@ -93,7 +93,7 @@

   <!-- The Modal -->
   <div class="modal" id="pipelineSettingsModal">
-    <div class="modal-dialog modal-lg">
+    <div class="modal-dialog modal-dialog-centered modal-lg">
       <div class="modal-content dark-card">
         <!-- Modal Header -->
         <div class="modal-header">