# Description of Changes refactor(eml-to-pdf): Enhance compliance with PDF/ISO standards and MIME specifications This commit refactors the EML-to-PDF conversion utility to improve standards compliance, implementing requirements from multiple RFCs and ISO specifications: ### Standards Compliance Implemented: • **PDF Standards (ISO 32000-1:2008)**: Added PDF version validation in `attachFilesToPdf()` to ensure 1.7+ compatibility for Unicode file embeddings • **MIME Processing (RFC 2045/2046)**: Implemented case-insensitive MIME type handling in `processPartAdvanced()` with `toLowerCase(Locale.ROOT)` normalization • **Content Encoding (RFC 2047)**: Enhanced `safeMimeDecode()` with UTF-8→ISO-8859-1 charset fallback chains for robust header decoding • **Content-ID Processing (RFC 2392)**: Added proper Content-ID stripping with `replaceAll("[<>]", "")` for embedded image references • **Multipart Safety (RFC 2046)** (best practice, not compliance related): Implemented recursion depth limiting (max 10 levels) • **processMultipartAdvanced()**, setCatalogViewerPreferences used to set PageMode.USE_ATTACHMENTS, but PDF spec 12.2 (Viewer Preferences) requires a /ViewerPreferences dictionary for full control (e.g., /DisplayDocTitle). Docs suggested setting additional prefs like /NonFullScreenPageMode to ensure attachments panel opens reliably across viewers • **addAttachmentAnnotationToPage**, annotations are set to /Invisible=true but must remain interactive. PDF spec 12.5.6.15 (File Attachment Annotations) requires /F flags to control print/view (e.g., NoPrint if not printable). ### Technical Improvements: • **Coordinate System Handling**: Added rotation-aware coordinate transformations in PDF annotation placement following ISO 32000-1 Section 8.3 • **Charset Fallbacks**: Implemented progressive charset detection with UTF-8 primary and ISO-8859-1 fallback in MIME decoding • **Error Resilience**: Enhanced exception handling with specific error types and proper resource cleanup using try-with-resources patterns • **HTML5 Compliance**: Updated email HTML generation with proper DOCTYPE and charset declarations for browser compatibility ### Security & Robustness: • **Input Validation**: Added comprehensive null checks and boundary validation throughout attachment and multipart processing • **XSS Prevention**: All user content now processed through `escapeHtml()` or `CustomHtmlSanitizer` before HTML generation ### Code Quality: • **Method Signatures**: Updated `processMultipartAdvanced()` to include depth parameter for recursion tracking • **Switch Expressions**: Modernized switch statements to use Java 17+ arrow syntax where applicable • **Documentation**: Added inline RFC/ISO references for compliance-critical sections All changes maintain backward compatibility while significantly improving standards adherence. Tested with various EML formats. No major change. No change in tests. No change in aesthetic of the resulting PDF. No change change in "user space" (except when user relied on compliance of aforementioned stuff then a major improvement) <!-- Please provide a summary of the changes, including: - What was changed - Why the change was made - Any challenges encountered Closes #(issue_number) --> --- ## Checklist ### General - [x] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md) - [x] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md) (if applicable) - [ ] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md) (if applicable) - [x] I have performed a self-review of my own code - [ ] My changes generate no new warnings ### Documentation - [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed) - [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only) ### UI Changes (if applicable) - [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR) ### Testing (if applicable) - [x] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing) for more details. |
||
---|---|---|
.claude | ||
.devcontainer | ||
.github | ||
.vscode | ||
app | ||
devGuide | ||
devTools | ||
docs | ||
exampleYmlFiles | ||
gradle/wrapper | ||
images | ||
scripts | ||
testing | ||
.editorconfig | ||
.git-blame-ignore-revs | ||
.gitattributes | ||
.gitignore | ||
.pre-commit-config.yaml | ||
build.gradle | ||
CONTRIBUTING.md | ||
DATABASE.md | ||
Dockerfile | ||
Dockerfile.dev | ||
Dockerfile.fat | ||
Dockerfile.ultra-lite | ||
gradle.properties | ||
gradlew | ||
gradlew.bat | ||
HowToUseOCR.md | ||
launch4jConfig.xml | ||
LICENSE | ||
README.md | ||
SECURITY.md | ||
settings.gradle |
Stirling-PDF
Stirling-PDF is a robust, locally hosted web-based PDF manipulation tool using Docker. It enables you to carry out various operations on PDF files, including splitting, merging, converting, reorganizing, adding images, rotating, compressing, and more. This locally hosted web application has evolved to encompass a comprehensive set of features, addressing all your PDF requirements.
All files and PDFs exist either exclusively on the client side, reside in server memory only during task execution, or temporarily reside in a file solely for the execution of the task. Any file downloaded by the user will have been deleted from the server by that point.
Homepage: https://stirlingpdf.com
All documentation available at https://docs.stirlingpdf.com/
Features
- 50+ PDF Operations
- Parallel file processing and downloads
- Dark mode support
- Custom download options
- Custom 'Pipelines' to run multiple features in a automated queue
- API for integration with external scripts
- Optional Login and Authentication support (see here for documentation)
- Database Backup and Import (see here for documentation)
- Enterprise features like SSO (see here for documentation)
PDF Features
Page Operations
- View and modify PDFs - View multi-page PDFs with custom viewing, sorting, and searching. Plus, on-page edit features like annotating, drawing, and adding text and images. (Using PDF.js with Joxit and Liberation fonts)
- Full interactive GUI for merging/splitting/rotating/moving PDFs and their pages
- Merge multiple PDFs into a single resultant file
- Split PDFs into multiple files at specified page numbers or extract all pages as individual files
- Reorganize PDF pages into different orders
- Rotate PDFs in 90-degree increments
- Remove pages
- Multi-page layout (format PDFs into a multi-paged page)
- Scale page contents size by set percentage
- Adjust contrast
- Crop PDF
- Auto-split PDF (with physically scanned page dividers)
- Extract page(s)
- Convert PDF to a single page
- Overlay PDFs on top of each other
- PDF to a single page
- Split PDF by sections
Conversion Operations
- Convert PDFs to and from images
- Convert any common file to PDF (using LibreOffice)
- Convert PDF to Word/PowerPoint/others (using LibreOffice)
- Convert HTML to PDF
- Convert PDF to XML
- Convert PDF to CSV
- URL to PDF
- Markdown to PDF
Security & Permissions
- Add and remove passwords
- Change/set PDF permissions
- Add watermark(s)
- Certify/sign PDFs
- Sanitize PDFs
- Auto-redact text
Other Operations
- Add/generate/write signatures
- Split by Size or PDF
- Repair PDFs
- Detect and remove blank pages
- Compare two PDFs and show differences in text
- Add images to PDFs
- Compress PDFs to decrease their filesize (using qpdf)
- Extract images from PDF
- Remove images from PDF
- Extract images from scans
- Remove annotations
- Add page numbers
- Auto-rename files by detecting PDF header text
- OCR on PDF (using Tesseract OCR)
- PDF/A conversion (using LibreOffice)
- Edit metadata
- Flatten PDFs
- Get all information on a PDF to view or export as JSON
- Show/detect embedded JavaScript
📖 Get Started
Visit our comprehensive documentation at docs.stirlingpdf.com for:
- Installation guides for all platforms
- Configuration options
- Feature documentation
- API reference
- Security setup
- Enterprise features
Supported Languages
Stirling-PDF currently supports 40 languages!
Stirling PDF Enterprise
Stirling PDF offers an Enterprise edition of its software. This is the same great software but with added features, support and comforts. Check out our Enterprise docs
🤝 Looking to contribute?
Join our community: