Stirling-PDF/app/core/build.gradle
Balázs Szücs 69b035c6e1
[V2] feat(security): add PDF standards verification feature using veraPDF (#4874)
# Description of Changes
- Implemented `PDFVerificationRequest` and `PDFVerificationResult`
models for validation requests and responses
- Developed `VeraPDFService` to validate PDFs against specific or
auto-detected standards
- Added `VerifyPDFController` with an endpoint for PDF verification
- Integrated veraPDF dependencies into project build file
- Deprecated unused `/verify-pdf` form in `SecurityWebController`
- Updated `EndpointConfiguration` to include the new `verify-pdf`
endpoint

This PR introduces a PDF standards verification feature to Stirling-PDF,
powered by the industry-standard veraPDF validation library. This
feature enables users to validate PDF files against multiple PDF
standards including PDF/A (archival), PDF/UA (accessibility), and WTPDF
standards.

### 1. PDF Standards Verification Endpoint
- New API Endpoint: `/api/v1/security/verify-pdf`
- Validates PDF files against multiple standards:
  - PDF/A (1b, 2a, 2b, 2u, 3a, 3b, 3u, 4, 4e, 4f) - Archival standards
  - PDF/UA-1 and PDF/UA-2 - Universal Accessibility standards
  - WTPDF - Well-Tagged PDF standards
- Auto-detection: Automatically detects and validates all standards
declared in the PDF's XMP metadata


### 2. Validation Results
The verification returns detailed JSON results including:
- Compliance status: Whether the PDF meets the standard requirements
- Declared vs validated standards: Shows what the PDF claims to be vs
what it actually is
- Categorized issues:
  - Errors: Critical compliance failures that prevent certification
  - Warnings: Non-critical issues and recommendations
- Detailed issue information:
  - Rule IDs from the specification
  - Descriptive error messages
  - Location within the PDF where the issue occurs
  - Specification references (clause numbers, test numbers)

### 3. Detect Issue Classification
Implements intelligent classification of validation issues:
- Errors: Issues that prevent standard compliance (font problems, color
space issues, structural problems)
- Warnings: Recommended but not required elements (metadata
recommendations, optional features)
- Classification based on:
  - Rule ID patterns
  - Clause number prefixes
  - Message content analysis

### New Files Added

#### Controllers
- VerifyPDFController.java: REST API controller handling PDF
verification requests
  - Handles multipart file uploads
  - Supports both single-standard and auto-detection modes
- Comprehensive error handling for encrypted PDFs, parsing errors, and
validation failures

#### Models
- PDFVerificationRequest.java: Request model for verification API
  - Extends standard PDFFile model
  - Optional `standard` parameter for manual standard selection
  
- PDFVerificationResult.java: Response model containing validation
results
  - Includes standard information and validation profile details
  - Separate lists for errors and warnings
  - Nested `ValidationIssue` class for detailed issue reporting

#### Services
- VeraPDFService.java: Core service implementing veraPDF integration
  - Initializes veraPDF Greenfield engine
  - Extracts declared PDF/A standards from XMP metadata
  - Performs validation against specified or detected standards
  - Converts veraPDF results to application-specific format
  - Implements smart issue classification logic


### Endpoint Configuration Updates

#### EndpointConfiguration.java
- Added `verify-pdf` to the Security group
- Added `verify-pdf` to the Java group (no external tools required)
- Created new veraPDF dependency group for endpoint availability
tracking
- Updated `isToolGroup()` method to recognize veraPDF as a tool
dependency


### Supported Standards

#### PDF/A (Archival)
- PDF/A-1 (a, b): ISO 19005-1:2005
- PDF/A-2 (a, b, u): ISO 19005-2:2011
- PDF/A-3 (a, b, u): ISO 19005-3:2012
- PDF/A-4 (standard, e, f): ISO 19005-4:2020

#### PDF/UA (Universal Accessibility)
- PDF/UA-1: ISO 14289-1:2014
- PDF/UA-2: ISO 14289-2 (latest)

#### WTPDF (Well-Tagged PDF)
- WTPDF 1.0: Tagged PDF for accessibility and structure

### Security Considerations

The following test scenarios should be validated:

1. Valid PDF/A documents (should return compliant)
2. Non-compliant PDF/A documents (should return errors)
3. PDFs without PDF/A declaration (should detect and report)
4. PDF/UA documents (should validate accessibility)
5. Encrypted PDFs (should return appropriate error)
6. Mixed standards (PDF/A + PDF/UA) (should validate both)
7. Empty standard parameter (should auto-detect)
8. Invalid standard parameter (should return error)

### API Usage Examples

```bash
curl -X POST http://localhost:8080/api/v1/security/verify-pdf \
  -F "fileInput=@document.pdf"
```


### Example Response
```json
[
  {
    "standard": "3b",
    "standardName": "PDF/A-ISO 19005-3:2012B compliant",
    "validationProfile": "3b",
    "validationProfileName": "PDF/A-ISO 19005-3:2012B",
    "complianceSummary": "PDF/A-ISO 19005-3:2012B compliant",
    "declaredPdfa": true,
    "compliant": true,
    "totalFailures": 0,
    "totalWarnings": 0,
    "failures": [],
    "warnings": []
  }
]
```

```json
[
  {
    "standard": "2b",
    "standardName": "PDF/A-ISO 19005-2:2011B with errors",
    "validationProfile": "2b",
    "validationProfileName": "PDF/A-ISO 19005-2:2011B",
    "complianceSummary": "PDF/A-ISO 19005-2:2011B with errors",
    "declaredPdfa": true,
    "compliant": false,
    "totalFailures": 2,
    "totalWarnings": 0,
    "failures": [
      {
        "ruleId": "RuleId [specification=ISO 19005-2:2011, clause=6.2.11.4.1, testNumber=1]",
        "message": "The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-1:2008, 9.9",
        "location": "Location [level=CosDocument, context=root/document[0]/pages[0](3 0 obj PDPage)/contentStream[0](105 0 obj PDContentStream)/operators[60]/font[0](ArialMT)]",
        "specification": "ISO 19005-2:2011",
        "clause": "6.2.11.4.1",
        "testNumber": "1"
      },
      {
        "ruleId": "RuleId [specification=ISO 19005-2:2011, clause=6.3.2, testNumber=1]",
        "message": "Except for annotation dictionaries whose Subtype value is Popup, all annotation dictionaries shall contain the F key",
        "location": "Location [level=CosDocument, context=root/document[0]/pages[0](3 0 obj PDPage)/annots[4](107 0 obj PDLinkAnnot)]",
        "specification": "ISO 19005-2:2011",
        "clause": "6.3.2",
        "testNumber": "1"
      }
    ],
    "warnings": []
  }
]
```

<!--
Please provide a summary of the changes, including:

- What was changed
- Why the change was made
- Any challenges encountered

Closes #(issue_number)
-->

---

## Checklist

### General

- [ ] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [ ] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [ ] I have performed a self-review of my own code
- [ ] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### Translations (if applicable)

- [ ] I ran
[`scripts/counter_translation.py`](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/docs/counter_translation.md)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.

---------

Signed-off-by: Balázs Szücs <bszucs1209@gmail.com>
2025-12-23 22:29:30 +00:00

306 lines
11 KiB
Groovy

apply plugin: 'org.springframework.boot'
import org.apache.tools.ant.taskdefs.condition.Os
repositories {
maven { url = 'https://build.shibboleth.net/maven/releases' }
maven { url = 'https://maven.pkg.github.com/jcefmaven/jcefmaven' }
}
configurations {
developmentOnly
runtimeClasspath {
extendsFrom developmentOnly
}
}
spotless {
java {
target 'src/**/java/**/*.java'
targetExclude 'src/main/resources/static/**'
googleJavaFormat(googleJavaFormatVersion).aosp().reorderImports(false)
importOrder("java", "javax", "org", "com", "net", "io", "jakarta", "lombok", "me", "stirling")
toggleOffOn()
trimTrailingWhitespace()
leadingTabsToSpaces()
endWithNewline()
}
yaml {
target '**/*.yml', '**/*.yaml'
targetExclude 'src/main/resources/static/**'
trimTrailingWhitespace()
leadingTabsToSpaces()
endWithNewline()
}
format 'gradle', {
target '**/gradle/*.gradle', '**/*.gradle'
targetExclude 'src/main/resources/static/**'
trimTrailingWhitespace()
leadingTabsToSpaces()
endWithNewline()
}
}
dependencies {
if (System.getenv('STIRLING_PDF_DESKTOP_UI') != 'false'
|| (project.hasProperty('STIRLING_PDF_DESKTOP_UI')
&& project.getProperty('STIRLING_PDF_DESKTOP_UI') != 'false')) {
implementation 'org.openjfx:javafx-controls:21'
implementation 'org.openjfx:javafx-swing:21'
}
if (System.getenv('DISABLE_ADDITIONAL_FEATURES') != 'true'
|| (project.hasProperty('DISABLE_ADDITIONAL_FEATURES')
&& System.getProperty('DISABLE_ADDITIONAL_FEATURES') != 'true')) {
implementation project(':proprietary')
}
implementation project(':common')
implementation 'org.springframework.boot:spring-boot-starter-jetty'
implementation 'com.posthog.java:posthog:1.2.0'
implementation 'commons-io:commons-io:2.21.0'
implementation "org.bouncycastle:bcprov-jdk18on:$bouncycastleVersion"
implementation "org.bouncycastle:bcpkix-jdk18on:$bouncycastleVersion"
implementation 'io.micrometer:micrometer-core:1.16.0'
implementation 'com.google.zxing:core:3.5.4'
implementation "org.commonmark:commonmark:$commonmarkVersion" // https://mvnrepository.com/artifact/org.commonmark/commonmark
implementation "org.commonmark:commonmark-ext-gfm-tables:$commonmarkVersion"
// General PDF dependencies
implementation "org.apache.pdfbox:preflight:$pdfboxVersion"
implementation "org.apache.pdfbox:xmpbox:$pdfboxVersion"
implementation 'org.verapdf:validation-model:1.28.2'
// veraPDF still uses javax.xml.bind, not the new jakarta namespace
implementation 'javax.xml.bind:jaxb-api:2.3.1'
implementation 'com.sun.xml.bind:jaxb-impl:2.3.9'
implementation 'com.sun.xml.bind:jaxb-core:2.3.0.1'
// https://mvnrepository.com/artifact/technology.tabula/tabula
implementation ('technology.tabula:tabula:1.0.5') {
exclude group: 'org.slf4j', module: 'slf4j-simple'
exclude group: 'org.bouncycastle', module: 'bcprov-jdk15on'
exclude group: 'com.google.code.gson', module: 'gson'
}
implementation 'org.apache.pdfbox:jbig2-imageio:3.0.4'
implementation 'com.opencsv:opencsv:5.12.0' // https://mvnrepository.com/artifact/com.opencsv/opencsv
// Batik
implementation 'org.apache.xmlgraphics:batik-all:1.19'
// TwelveMonkeys
runtimeOnly "com.twelvemonkeys.imageio:imageio-batik:$imageioVersion"
runtimeOnly "com.twelvemonkeys.imageio:imageio-bmp:$imageioVersion"
runtimeOnly "com.twelvemonkeys.imageio:imageio-jpeg:$imageioVersion"
runtimeOnly "com.twelvemonkeys.imageio:imageio-tiff:$imageioVersion"
runtimeOnly "com.twelvemonkeys.imageio:imageio-webp:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-hdr:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-icns:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-iff:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-pcx:$imageioVersion@
// runtimeOnly "com.twelvemonkeys.imageio:imageio-pict:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-pnm:$imageioVersion"
runtimeOnly "com.twelvemonkeys.imageio:imageio-psd:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-sgi:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-tga:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-thumbsdb:$imageioVersion"
// runtimeOnly "com.twelvemonkeys.imageio:imageio-xwd:$imageioVersion"
developmentOnly 'org.springframework.boot:spring-boot-devtools'
}
sourceSets {
main {
resources {
srcDirs += ['../configs']
}
java {
if (System.getenv('STIRLING_PDF_DESKTOP_UI') == 'false') {
exclude 'stirling/software/SPDF/UI/impl/**'
}
}
}
test {
java {
if (System.getenv('STIRLING_PDF_DESKTOP_UI') == 'false') {
exclude 'stirling/software/SPDF/UI/impl/**'
}
}
}
}
// Disable regular jar
jar {
enabled = false
}
// Configure and enable bootJar for this project
bootJar {
enabled = true
duplicatesStrategy = DuplicatesStrategy.EXCLUDE
zip64 = true
// Don't include all dependencies directly like the old jar task did
// from {
// configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
// }
// Exclude signature files to prevent "Invalid signature file digest" errors
exclude 'META-INF/*.SF'
exclude 'META-INF/*.DSA'
exclude 'META-INF/*.RSA'
exclude 'META-INF/*.EC'
manifest {
attributes(
'Implementation-Title': 'Stirling-PDF',
'Implementation-Version': project.version
)
}
}
// Configure main class for Spring Boot
springBoot {
mainClass = 'stirling.software.SPDF.SPDFApplication'
}
// Frontend build tasks - only enabled with -PbuildWithFrontend=true
def buildWithFrontend = project.hasProperty('buildWithFrontend') && project.property('buildWithFrontend') == 'true'
def frontendDir = file('../../frontend')
def frontendDistDir = file('../../frontend/dist')
def resourcesStaticDir = file('src/main/resources/static')
def generatedFrontendPaths = [
'assets',
'index.html',
'locales',
'Login',
'classic-logo',
'modern-logo',
'og_images',
'samples',
'manifest-classic.json'
]
tasks.register('npmInstall', Exec) {
enabled = buildWithFrontend
group = 'frontend'
description = 'Install frontend dependencies'
workingDir frontendDir
commandLine = Os.isFamily(Os.FAMILY_WINDOWS) ? ['cmd', '/c', 'npm', 'ci', '--prefer-offline'] : ['npm', 'ci', '--prefer-offline']
inputs.file(new File(frontendDir, 'package.json'))
inputs.file(new File(frontendDir, 'package-lock.json'))
outputs.dir(new File(frontendDir, 'node_modules'))
// Show live output
standardOutput = System.out
errorOutput = System.err
// Skip if node_modules exists and is up-to-date
onlyIf {
def nodeModules = new File(frontendDir, 'node_modules')
if (!nodeModules.exists()) {
println "node_modules not found, will install..."
return true
}
def packageJson = new File(frontendDir, 'package.json')
def packageLock = new File(frontendDir, 'package-lock.json')
def isOutdated = nodeModules.lastModified() < packageJson.lastModified() ||
nodeModules.lastModified() < packageLock.lastModified()
if (isOutdated) {
println "package.json or package-lock.json changed, will reinstall..."
} else {
println "node_modules is up-to-date, skipping npm install"
}
return isOutdated
}
doFirst {
println "Installing npm dependencies in ${frontendDir}..."
}
}
tasks.register('npmBuild', Exec) {
enabled = buildWithFrontend
group = 'frontend'
description = 'Build frontend application'
workingDir frontendDir
commandLine = Os.isFamily(Os.FAMILY_WINDOWS) ? ['cmd', '/c', 'npm', 'run', 'build'] : ['npm', 'run', 'build']
dependsOn npmInstall
inputs.dir(new File(frontendDir, 'src'))
inputs.file(new File(frontendDir, 'package.json'))
outputs.dir(frontendDistDir)
// Show live output
standardOutput = System.out
errorOutput = System.err
// Override VITE_API_BASE_URL to use relative paths for production builds
// This ensures JARs work regardless of how they're deployed (direct, proxied, etc.)
environment 'VITE_API_BASE_URL', '/'
doFirst {
println "Building frontend application for production (VITE_API_BASE_URL=/)"
}
}
tasks.register('copyFrontendAssets', Copy) {
enabled = buildWithFrontend
group = 'frontend'
description = 'Copy frontend build to static resources'
dependsOn npmBuild
from(frontendDistDir) {
// Exclude files that conflict with backend static resources
exclude 'robots.txt' // Backend already has this
exclude 'favicon.ico' // Backend already has this
}
into resourcesStaticDir
duplicatesStrategy = DuplicatesStrategy.INCLUDE // Let frontend overwrite when needed
doFirst {
println "Copying frontend build from ${frontendDistDir} to ${resourcesStaticDir}..."
println "Backend static resources will be preserved"
}
doLast {
println "Frontend assets copied successfully!"
}
}
tasks.register('cleanFrontendAssets', Delete) {
group = 'frontend'
description = 'Remove previously generated frontend assets from static resources'
delete generatedFrontendPaths.collect { new File(resourcesStaticDir, it) }
}
tasks.register('copyApiLandingPage', Copy) {
group = 'frontend'
description = 'Copy API landing page to index.html for backend-only mode'
from(new File(resourcesStaticDir, 'api-landing.html'))
into(resourcesStaticDir)
rename('api-landing.html', 'index.html')
dependsOn cleanFrontendAssets
doFirst {
println "Copying API landing page to index.html for backend-only mode..."
}
}
// Ensure copyFrontendAssets runs after spotless tasks
tasks.named('copyFrontendAssets').configure {
mustRunAfter tasks.matching { it.name.startsWith('spotless') }
}
if (buildWithFrontend) {
println "Frontend build enabled - JAR will include React frontend"
processResources.dependsOn copyFrontendAssets
} else {
println "Frontend build disabled - JAR will be backend-only with API landing page"
// When not building the UI, ensure any stale frontend assets are removed and use API landing page
processResources.dependsOn copyApiLandingPage
}
bootJar.dependsOn ':common:jar'
bootJar.dependsOn ':proprietary:jar'