Balázs Szücs 69b035c6e1 [V2] feat(security): add PDF standards verification feature using veraPDF (#4874)
# Description of Changes
- Implemented `PDFVerificationRequest` and `PDFVerificationResult`
models for validation requests and responses
- Developed `VeraPDFService` to validate PDFs against specific or
auto-detected standards
- Added `VerifyPDFController` with an endpoint for PDF verification
- Integrated veraPDF dependencies into project build file
- Deprecated unused `/verify-pdf` form in `SecurityWebController`
- Updated `EndpointConfiguration` to include the new `verify-pdf`
endpoint

This PR introduces a PDF standards verification feature to Stirling-PDF,
powered by the industry-standard veraPDF validation library. This
feature enables users to validate PDF files against multiple PDF
standards including PDF/A (archival), PDF/UA (accessibility), and WTPDF
standards.

### 1. PDF Standards Verification Endpoint
- New API Endpoint: `/api/v1/security/verify-pdf`
- Validates PDF files against multiple standards:
  - PDF/A (1a, 1b, 2a, 2b, 2u, 3a, 3b, 3u, 4, 4e, 4f) - Archival standards
  - PDF/UA-1 and PDF/UA-2 - Universal Accessibility standards
  - WTPDF - Well-Tagged PDF standards
- Auto-detection: Automatically detects and validates all standards
declared in the PDF's XMP metadata
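
As a rough illustration of the auto-detection idea, the sketch below pulls the PDF/A claim (`pdfaid:part` and `pdfaid:conformance`) out of raw XMP text. The class `XmpPdfaDetector` is hypothetical and not taken from this PR. The regex approach is a simplification that assumes the conventional `pdfaid` prefix. A robust implementation would use a namespace-aware XML parser or veraPDF's own metadata handling.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the PR: reads the PDF/A claim from raw XMP text. */
final class XmpPdfaDetector {
    // Matches both serializations: <pdfaid:part>3</pdfaid:part> and pdfaid:part="3".
    private static final Pattern PART =
            Pattern.compile("pdfaid:part\\s*(?:=\\s*[\"']|>)\\s*(\\d)");
    private static final Pattern CONFORMANCE =
            Pattern.compile("pdfaid:conformance\\s*(?:=\\s*[\"']|>)\\s*([A-Za-z])");

    /** Returns an identifier such as "3b" or "4", or empty if no PDF/A part is declared. */
    static Optional<String> detect(String xmp) {
        Matcher part = PART.matcher(xmp);
        if (!part.find()) {
            return Optional.empty();
        }
        // The conformance level is optional (for example, plain PDF/A-4 has none).
        Matcher level = CONFORMANCE.matcher(xmp);
        String suffix = level.find() ? level.group(1).toLowerCase(Locale.ROOT) : "";
        return Optional.of(part.group(1) + suffix);
    }
}
```

A PDF whose metadata yields an empty result corresponds to the "no declaration" case, which the service would then report rather than validate.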


### 2. Validation Results
The verification returns detailed JSON results including:
- Compliance status: Whether the PDF meets the standard requirements
- Declared vs validated standards: Shows what the PDF claims to be vs
what it actually is
- Categorized issues:
  - Errors: Critical compliance failures that prevent certification
  - Warnings: Non-critical issues and recommendations
- Detailed issue information:
  - Rule IDs from the specification
  - Descriptive error messages
  - Location within the PDF where the issue occurs
  - Specification references (clause numbers, test numbers)

### 3. Issue Classification
Implements intelligent classification of validation issues:
- Errors: Issues that prevent standard compliance (font problems, color
space issues, structural problems)
- Warnings: Recommended but not required elements (metadata
recommendations, optional features)
- Classification based on:
  - Rule ID patterns
  - Clause number prefixes
  - Message content analysis
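
The three signals listed above could be combined roughly as follows. This is a minimal sketch, not the PR's actual logic: the class `IssueClassifier`, the `"recommendation"` rule ID marker, and the caller-supplied list of advisory clause prefixes are all assumptions made for illustration. The only fixed convention used is the ISO drafting rule that "shall" marks a requirement and "should" a recommendation.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical classifier; the concrete rules here are illustrative only. */
final class IssueClassifier {
    enum Severity { ERROR, WARNING }

    static Severity classify(String ruleId, String clause, String message,
                             List<String> advisoryClausePrefixes) {
        String id = ruleId == null ? "" : ruleId.toLowerCase(Locale.ROOT);
        String text = message == null ? "" : message.toLowerCase(Locale.ROOT);

        // 1. Rule ID pattern (assumed marker, not a veraPDF convention).
        if (id.contains("recommendation")) {
            return Severity.WARNING;
        }
        // 2. Clause number prefix, supplied by the caller.
        if (clause != null) {
            for (String prefix : advisoryClausePrefixes) {
                if (clause.startsWith(prefix)) {
                    return Severity.WARNING;
                }
            }
        }
        // 3. Message content: "shall" = requirement, "should" = recommendation.
        boolean requirement = text.contains("shall");
        boolean recommendation = text.contains("should") || text.contains("recommended");
        if (recommendation && !requirement) {
            return Severity.WARNING;
        }
        // Default to ERROR so that unknown issues are never silently downgraded.
        return Severity.ERROR;
    }
}
```

Defaulting to `ERROR` is a deliberate choice in this sketch: a misclassified warning is less harmful than a compliance failure reported as advisory.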

### New Files Added

#### Controllers
- VerifyPDFController.java: REST API controller handling PDF
verification requests
  - Handles multipart file uploads
  - Supports both single-standard and auto-detection modes
  - Comprehensive error handling for encrypted PDFs, parsing errors, and
    validation failures

#### Models
- PDFVerificationRequest.java: Request model for verification API
  - Extends standard PDFFile model
  - Optional `standard` parameter for manual standard selection
  
- PDFVerificationResult.java: Response model containing validation
results
  - Includes standard information and validation profile details
  - Separate lists for errors and warnings
  - Nested `ValidationIssue` class for detailed issue reporting

#### Services
- VeraPDFService.java: Core service implementing veraPDF integration
  - Initializes veraPDF Greenfield engine
  - Extracts declared PDF/A standards from XMP metadata
  - Performs validation against specified or detected standards
  - Converts veraPDF results to application-specific format
  - Implements smart issue classification logic


### Endpoint Configuration Updates

#### EndpointConfiguration.java
- Added `verify-pdf` to the Security group
- Added `verify-pdf` to the Java group (no external tools required)
- Created new veraPDF dependency group for endpoint availability
tracking
- Updated `isToolGroup()` method to recognize veraPDF as a tool
dependency


### Supported Standards

#### PDF/A (Archival)
- PDF/A-1 (a, b): ISO 19005-1:2005
- PDF/A-2 (a, b, u): ISO 19005-2:2011
- PDF/A-3 (a, b, u): ISO 19005-3:2012
- PDF/A-4 (standard, e, f): ISO 19005-4:2020

#### PDF/UA (Universal Accessibility)
- PDF/UA-1: ISO 14289-1:2014
- PDF/UA-2: ISO 14289-2:2024

#### WTPDF (Well-Tagged PDF)
- WTPDF 1.0: Tagged PDF for accessibility and structure

### Testing Considerations

The following test scenarios should be validated:

1. Valid PDF/A documents (should return compliant)
2. Non-compliant PDF/A documents (should return errors)
3. PDFs without a PDF/A declaration (should detect and report the missing declaration)
4. PDF/UA documents (should validate accessibility)
5. Encrypted PDFs (should return appropriate error)
6. Mixed standards (PDF/A + PDF/UA) (should validate both)
7. Empty standard parameter (should auto-detect)
8. Invalid standard parameter (should return error)
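
Scenarios 7 and 8 depend on how the optional `standard` parameter is interpreted. A minimal sketch of that decision, assuming the parameter uses the short identifiers seen in the example responses (such as `3b`): the class `StandardParam` is hypothetical, and the spellings for PDF/UA and WTPDF identifiers are left out because this description does not show them.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical parameter handling; covers the PDF/A identifiers only. */
final class StandardParam {
    static final Set<String> PDFA_IDS =
            Set.of("1a", "1b", "2a", "2b", "2u", "3a", "3b", "3u", "4", "4e", "4f");

    /**
     * Empty result means "auto-detect from metadata"; an unknown value is rejected
     * so the controller can turn it into an error response.
     */
    static Optional<String> normalize(String raw) {
        if (raw == null || raw.isBlank()) {
            return Optional.empty();
        }
        String id = raw.trim().toLowerCase(Locale.ROOT);
        if (!PDFA_IDS.contains(id)) {
            throw new IllegalArgumentException("Unsupported standard: " + raw);
        }
        return Optional.of(id);
    }
}
```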

### API Usage Examples

```bash
curl -X POST http://localhost:8080/api/v1/security/verify-pdf \
  -F "fileInput=@document.pdf"
```


### Example Response
```json
[
  {
    "standard": "3b",
    "standardName": "PDF/A-ISO 19005-3:2012B compliant",
    "validationProfile": "3b",
    "validationProfileName": "PDF/A-ISO 19005-3:2012B",
    "complianceSummary": "PDF/A-ISO 19005-3:2012B compliant",
    "declaredPdfa": true,
    "compliant": true,
    "totalFailures": 0,
    "totalWarnings": 0,
    "failures": [],
    "warnings": []
  }
]
```

```json
[
  {
    "standard": "2b",
    "standardName": "PDF/A-ISO 19005-2:2011B with errors",
    "validationProfile": "2b",
    "validationProfileName": "PDF/A-ISO 19005-2:2011B",
    "complianceSummary": "PDF/A-ISO 19005-2:2011B with errors",
    "declaredPdfa": true,
    "compliant": false,
    "totalFailures": 2,
    "totalWarnings": 0,
    "failures": [
      {
        "ruleId": "RuleId [specification=ISO 19005-2:2011, clause=6.2.11.4.1, testNumber=1]",
        "message": "The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-1:2008, 9.9",
        "location": "Location [level=CosDocument, context=root/document[0]/pages[0](3 0 obj PDPage)/contentStream[0](105 0 obj PDContentStream)/operators[60]/font[0](ArialMT)]",
        "specification": "ISO 19005-2:2011",
        "clause": "6.2.11.4.1",
        "testNumber": "1"
      },
      {
        "ruleId": "RuleId [specification=ISO 19005-2:2011, clause=6.3.2, testNumber=1]",
        "message": "Except for annotation dictionaries whose Subtype value is Popup, all annotation dictionaries shall contain the F key",
        "location": "Location [level=CosDocument, context=root/document[0]/pages[0](3 0 obj PDPage)/annots[4](107 0 obj PDLinkAnnot)]",
        "specification": "ISO 19005-2:2011",
        "clause": "6.3.2",
        "testNumber": "1"
      }
    ],
    "warnings": []
  }
]
```
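
In the responses above, the `ruleId` string repeats the `specification`, `clause`, and `testNumber` fields. For clients that only keep the `ruleId` (for example in logs), a small parser can recover the parts. This is a sketch that assumes the string format stays exactly as shown in the example. `RuleIdParser` and `RuleRef` are hypothetical names, and records require Java 16 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the ruleId text shown in the example response. */
final class RuleIdParser {
    record RuleRef(String specification, String clause, int testNumber) {}

    private static final Pattern RULE_ID = Pattern.compile(
            "RuleId \\[specification=(.+?), clause=(.+?), testNumber=(\\d+)\\]");

    static RuleRef parse(String ruleId) {
        Matcher m = RULE_ID.matcher(ruleId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized ruleId: " + ruleId);
        }
        return new RuleRef(m.group(1), m.group(2), Integer.parseInt(m.group(3)));
    }
}
```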


---------

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
2025-12-23 22:29:30 +00:00

Stirling PDF - The Open-Source PDF Platform

Stirling PDF is a powerful, open-source PDF editing platform. Run it as a personal desktop app, in the browser, or deploy it on your own servers with a private API. Edit, sign, redact, convert, and automate PDFs without sending documents to external services.

Key Capabilities

  • Everywhere you work - Desktop client, browser UI, and self-hosted server with a private API.
  • 50+ PDF tools - Edit, merge, split, sign, redact, convert, OCR, compress, and more.
  • Automation & workflows - No-code pipelines directly in the UI, with APIs to process millions of PDFs.
  • Enterprise-grade - SSO, auditing, and flexible on-prem deployments.
  • Developer platform - REST APIs available for nearly all tools to integrate into your existing systems.
  • Global UI - Interface available in 40+ languages.

For a full feature list, see the docs: https://docs.stirlingpdf.com

Quick Start

docker run -p 8080:8080 docker.stirlingpdf.com/stirlingtools/stirling-pdf

Then open: http://localhost:8080

For full installation options (including desktop and Kubernetes), see our Documentation Guide.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

For development setup, see the Developer Guide.

For adding translations, see the Translation Guide.

License

Stirling PDF is open-core. See LICENSE for details.

Description
locally hosted web application that allows you to perform various operations on PDF files