Commit Graph

3 Commits

Author SHA1 Message Date
Balázs Szücs
69b035c6e1
[V2] feat(security): add PDF standards verification feature using veraPDF (#4874)
# Description of Changes
- Implemented `PDFVerificationRequest` and `PDFVerificationResult`
models for validation requests and responses
- Developed `VeraPDFService` to validate PDFs against specific or
auto-detected standards
- Added `VerifyPDFController` with an endpoint for PDF verification
- Integrated veraPDF dependencies into project build file
- Deprecated unused `/verify-pdf` form in `SecurityWebController`
- Updated `EndpointConfiguration` to include the new `verify-pdf`
endpoint

This PR introduces a PDF standards verification feature to Stirling-PDF,
powered by the industry-standard veraPDF validation library. This
feature enables users to validate PDF files against multiple PDF
standards including PDF/A (archival), PDF/UA (accessibility), and WTPDF
standards.

### 1. PDF Standards Verification Endpoint
- New API Endpoint: `/api/v1/security/verify-pdf`
- Validates PDF files against multiple standards:
  - PDF/A (1b, 2a, 2b, 2u, 3a, 3b, 3u, 4, 4e, 4f) - Archival standards
  - PDF/UA-1 and PDF/UA-2 - Universal Accessibility standards
  - WTPDF - Well-Tagged PDF standards
- Auto-detection: Automatically detects and validates all standards
declared in the PDF's XMP metadata


### 2. Validation Results
The verification returns detailed JSON results including:
- Compliance status: Whether the PDF meets the standard requirements
- Declared vs validated standards: Shows what the PDF claims to be vs
what it actually is
- Categorized issues:
  - Errors: Critical compliance failures that prevent certification
  - Warnings: Non-critical issues and recommendations
- Detailed issue information:
  - Rule IDs from the specification
  - Descriptive error messages
  - Location within the PDF where the issue occurs
  - Specification references (clause numbers, test numbers)

### 3. Detect Issue Classification
Implements intelligent classification of validation issues:
- Errors: Issues that prevent standard compliance (font problems, color
space issues, structural problems)
- Warnings: Recommended but not required elements (metadata
recommendations, optional features)
- Classification based on:
  - Rule ID patterns
  - Clause number prefixes
  - Message content analysis

### New Files Added

#### Controllers
- VerifyPDFController.java: REST API controller handling PDF
verification requests
  - Handles multipart file uploads
  - Supports both single-standard and auto-detection modes
- Comprehensive error handling for encrypted PDFs, parsing errors, and
validation failures

#### Models
- PDFVerificationRequest.java: Request model for verification API
  - Extends standard PDFFile model
  - Optional `standard` parameter for manual standard selection
  
- PDFVerificationResult.java: Response model containing validation
results
  - Includes standard information and validation profile details
  - Separate lists for errors and warnings
  - Nested `ValidationIssue` class for detailed issue reporting

#### Services
- VeraPDFService.java: Core service implementing veraPDF integration
  - Initializes veraPDF Greenfield engine
  - Extracts declared PDF/A standards from XMP metadata
  - Performs validation against specified or detected standards
  - Converts veraPDF results to application-specific format
  - Implements smart issue classification logic


### Endpoint Configuration Updates

#### EndpointConfiguration.java
- Added `verify-pdf` to the Security group
- Added `verify-pdf` to the Java group (no external tools required)
- Created new veraPDF dependency group for endpoint availability
tracking
- Updated `isToolGroup()` method to recognize veraPDF as a tool
dependency


### Supported Standards

#### PDF/A (Archival)
- PDF/A-1 (a, b): ISO 19005-1:2005
- PDF/A-2 (a, b, u): ISO 19005-2:2011
- PDF/A-3 (a, b, u): ISO 19005-3:2012
- PDF/A-4 (standard, e, f): ISO 19005-4:2020

#### PDF/UA (Universal Accessibility)
- PDF/UA-1: ISO 14289-1:2014
- PDF/UA-2: ISO 14289-2 (latest)

#### WTPDF (Well-Tagged PDF)
- WTPDF 1.0: Tagged PDF for accessibility and structure

### Security Considerations

The following test scenarios should be validated:

1. Valid PDF/A documents (should return compliant)
2. Non-compliant PDF/A documents (should return errors)
3. PDFs without PDF/A declaration (should detect and report)
4. PDF/UA documents (should validate accessibility)
5. Encrypted PDFs (should return appropriate error)
6. Mixed standards (PDF/A + PDF/UA) (should validate both)
7. Empty standard parameter (should auto-detect)
8. Invalid standard parameter (should return error)

### API Usage Examples

```bash
curl -X POST http://localhost:8080/api/v1/security/verify-pdf \
  -F "fileInput=@document.pdf"
```


### Example Response
```json
[
  {
    "standard": "3b",
    "standardName": "PDF/A-ISO 19005-3:2012B compliant",
    "validationProfile": "3b",
    "validationProfileName": "PDF/A-ISO 19005-3:2012B",
    "complianceSummary": "PDF/A-ISO 19005-3:2012B compliant",
    "declaredPdfa": true,
    "compliant": true,
    "totalFailures": 0,
    "totalWarnings": 0,
    "failures": [],
    "warnings": []
  }
]
```

```json
[
  {
    "standard": "2b",
    "standardName": "PDF/A-ISO 19005-2:2011B with errors",
    "validationProfile": "2b",
    "validationProfileName": "PDF/A-ISO 19005-2:2011B",
    "complianceSummary": "PDF/A-ISO 19005-2:2011B with errors",
    "declaredPdfa": true,
    "compliant": false,
    "totalFailures": 2,
    "totalWarnings": 0,
    "failures": [
      {
        "ruleId": "RuleId [specification=ISO 19005-2:2011, clause=6.2.11.4.1, testNumber=1]",
        "message": "The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-1:2008, 9.9",
        "location": "Location [level=CosDocument, context=root/document[0]/pages[0](3 0 obj PDPage)/contentStream[0](105 0 obj PDContentStream)/operators[60]/font[0](ArialMT)]",
        "specification": "ISO 19005-2:2011",
        "clause": "6.2.11.4.1",
        "testNumber": "1"
      },
      {
        "ruleId": "RuleId [specification=ISO 19005-2:2011, clause=6.3.2, testNumber=1]",
        "message": "Except for annotation dictionaries whose Subtype value is Popup, all annotation dictionaries shall contain the F key",
        "location": "Location [level=CosDocument, context=root/document[0]/pages[0](3 0 obj PDPage)/annots[4](107 0 obj PDLinkAnnot)]",
        "specification": "ISO 19005-2:2011",
        "clause": "6.3.2",
        "testNumber": "1"
      }
    ],
    "warnings": []
  }
]
```

<!--
Please provide a summary of the changes, including:

- What was changed
- Why the change was made
- Any challenges encountered

Closes #(issue_number)
-->

---

## Checklist

### General

- [ ] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### Translations (if applicable)

- [ ] I ran
[`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.

---------

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
2025-12-23 22:29:30 +00:00
Balázs Szücs
ec1ac4cb2d
feat(cbr-to-pdf,pdf-to-cbr): add PDF to/from CBR conversion with ebook optimization option (#4581)
# Description of Changes

This pull request adds support for converting CBR (Comic Book RAR) files
to PDF, optimizes CBZ/CBR-to-PDF conversion for e-readers using
Ghostscript, and improves file type detection and image file handling.
It introduces the `CbrUtils` and `PdfToCbrUtils` utility classes,
refactors CBZ conversion logic, and integrates these features into the
API controller. The most important changes are grouped below.

### CBR Support and Conversion:

- Added the `com.github.junrar:junrar` dependency to support RAR/CBR
archive extraction in `build.gradle`. (https://github.com/junrar/junrar
and https://github.com/junrar/junrar?tab=License-1-ov-file#readme for
repo and license)
- Introduced the new utility class `CbrUtils` for converting CBR files
to PDF, including image extraction, sorting, and error handling.
- Added the `PdfToCbrUtils` utility class to convert PDF files into CBR
archives by rendering each page as an image and packaging them.

### CBZ/CBR Conversion Optimization:

- Refactored `CbzUtils.convertCbzToPdf` to support optional Ghostscript
optimization for e-reader compatibility and added a new method for this.
- Added `GeneralUtils.optimizePdfWithGhostscript`, which uses
Ghostscript to optimize PDFs for e-readers, and integrated error
handling.

### API Controller Integration:

- Updated `ConvertImgPDFController` to support CBR conversion, CBZ/CBR
optimization toggling, and Ghostscript availability checks.

### Endpoints
<img width="1298" height="522" alt="image"
src="https://github.com/user-attachments/assets/144d3e03-a637-451a-9c35-f784b2a66dc1"
/>

<img width="1279" height="472" alt="image"
src="https://github.com/user-attachments/assets/879f221d-b775-4224-8edb-a23dbea6a0ca"
/>

### UI

<img width="384" height="105" alt="image"
src="https://github.com/user-attachments/assets/5f861943-0706-4fad-8775-c40a9c1f3170"
/>


### File Type and Image Detection Improvements:

- Improved file extension detection for comic book files and image files
in `CbzUtils` and added a shared regex pattern utility for image files.

### Additional notes:
- Please keep in mind new the dependency, this is not dependency-free
implementation (as opposed to CBZ converter)
- RAR 5 currently not supported. (because JUNRAR does not support it)
- Added the new ebook optimization func to GeneralUtils since we'll soon
(hopefully) at least 3 book/ebook formats (EPUB, CBZ, CBR) all of which
can use it.
- Once again this has been thoroughly tested but can't share actual
"real life" file due to copyright.


Closes: #775
<!--
Please provide a summary of the changes, including:

- What was changed
- Why the change was made
- Any challenges encountered

Closes #(issue_number)
-->

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [x] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### UI Changes (if applicable)

- [x] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [x] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.

---------

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
2025-10-04 11:15:23 +01:00
Ludy
299d52c517
refactor: move modules under app/ directory and update file paths (#3938)
# Description of Changes

- **What was changed:**  
- Renamed top-level directories: `stirling-pdf` → `app/core`, `common` →
`app/common`, `proprietary` → `app/proprietary`.
- Updated all path references in `.gitattributes`, GitHub workflows
(`.github/workflows/*`), scripts (`.github/scripts/*`), `.gitignore`,
Dockerfiles, license files, and template settings to reflect the new
structure.
- Added a new CI job `check-generateOpenApiDocs` to generate and upload
OpenAPI documentation.
- Removed redundant `@Autowired` annotations from `TempFileShutdownHook`
and `UnlockPDFFormsController`.
- Minor formatting and comment adjustments in YAML templates and
resource files.

- **Why the change was made:**  
- To introduce a clear `app/` directory hierarchy for core, common, and
proprietary modules, improving organization and maintainability.
- To ensure continuous integration and Docker builds continue to work
seamlessly with the reorganized structure.
- To automate OpenAPI documentation generation as part of the CI
pipeline.

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.
2025-07-14 20:53:11 +01:00