Stirling-PDF/HowToUseOCR.md
Tarun Kumar S 2fe39cc85d
Touch up tests via TestDriver (#2243)
* use transform instead of change

* disable workflows

* updates expect commands as well

* try "highlighted"

* update versions

* update workflow files

* Remove pro badge if enabled

* add some wait time

* Fix metricCollection

* Update PostHogService.java

* Update messages_de_DE.properties (#2070)

* Update messages_de_DE.properties

Completed translations for German language.

* Update messages_de_DE.properties

* Update messages_it_IT.properties (#2077)

* extract and apply the image orientation from exif data in imageToPdf (#2073)

* Update 3rd Party Licenses (#2080)

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>

* visual certificate signing (#2084)

add visual digital signature

* added some missing translations (#2085)

* Auto detect presence of external dependencies (LibreOffice etc) and disable/enable features dynamically (#2082)

* Create ExternalAppDepConfig.java

* Update EndpointConfiguration.java

* Hardening suggestions for Stirling-PDF / ExternalAppDepConfig (#2083)

Switch order of literals to prevent NullPointerException

Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>

---------

Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>

* 📝 Update README: Translation Progress Table (#2072)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Spanish translate (#2102)

* Spanish translate

* Added blank line

---------

Co-authored-by: Manu <manuel@fusiontelecom.co>

* Optimierung der SAML2-Integration und Verbesserung der Zertifikats- und Fehlerbehandlung (#2105)

* certificate processing

* Hides dialog when provider list is empty

* removed: unused

* Modernize and secure temp file creation (#2106)

Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>

* Bump springBootVersion from 3.3.4 to 3.3.5 (#2117)

Bumps `springBootVersion` from 3.3.4 to 3.3.5.

Updates `org.springframework.boot:spring-boot-starter-web` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

Updates `org.springframework.boot:spring-boot-starter-jetty` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

Updates `org.springframework.boot:spring-boot-starter-thymeleaf` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

Updates `org.springframework.boot:spring-boot-starter-security` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

Updates `org.springframework.boot:spring-boot-starter-data-jpa` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

Updates `org.springframework.boot:spring-boot-starter-oauth2-client` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

Updates `org.springframework.boot:spring-boot-starter-test` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

Updates `org.springframework.boot:spring-boot-starter-actuator` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

Updates `org.springframework.boot:spring-boot-devtools` from 3.3.4 to 3.3.5
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

---
updated-dependencies:
- dependency-name: org.springframework.boot:spring-boot-starter-web
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.springframework.boot:spring-boot-starter-jetty
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.springframework.boot:spring-boot-starter-thymeleaf
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.springframework.boot:spring-boot-starter-security
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.springframework.boot:spring-boot-starter-data-jpa
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.springframework.boot:spring-boot-starter-oauth2-client
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.springframework.boot:spring-boot-starter-test
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.springframework.boot:spring-boot-starter-actuator
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.springframework.boot:spring-boot-devtools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump org.springframework.boot from 3.3.4 to 3.3.5 (#2118)

Bumps [org.springframework.boot](https://github.com/spring-projects/spring-boot) from 3.3.4 to 3.3.5.
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.3.4...v3.3.5)

---
updated-dependencies:
- dependency-name: org.springframework.boot
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* 📝 Update README: Translation Progress Table (#2103)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update 3rd Party Licenses (#2119)

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>

* Add new french translations (#2120)

Add new french translations and improve simple quote

* Feature/298 improve compare performance (#2124)

* Implement Diff.js

* Compare feature - add service worker and improve efficiency for large files

* Compare - messages updated to be compatable with language packs

* Compare - Acknowledge Diff.js usage

* Add message warning there is  no text in uploaded pdf to messages file

---------

Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>

* Update translation files (#2125)

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>

* 📝 Update README: Translation Progress Table (#2121)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix csrf (#2126)

* apply fix

* Fixes empty th:action

* Update build.gradle

* fix

* formatting

---------

Co-authored-by: Dimitrios Kaitantzidis <james_k23@hotmail.gr>

* Update id_ID Translation and fix some grammars (#2108)

* Update id_ID Translation and fix some grammars

* sync lines to fix build warning

* get back new line at end of file

---------

Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>

* 📝 Update README: Translation Progress Table (#2129)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Feature/save signs (#2127)

* apply fix

* Fixes empty th:action

* Update build.gradle

* fix

* formatting

* Save signatures

* Fix code scanning alert no. 42: Uncontrolled data used in path expression

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fix UserServiceInterface

* Merge branch 'feature/saveSigns' of
git@github.com:Stirling-Tools/Stirling-PDF.git into feature/saveSigns

* 0.31.0 bump and further csrf

* formatting

* preview name

* add

* sign doc

* Update translation files (#2128)

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>

---------

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: Dimitrios Kaitantzidis <james_k23@hotmail.gr>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: a <a>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com>

* 📝 Update README: Translation Progress Table (#2133)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* 💾 Update Version (#2132)

💾 Sync Versions
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Bump org.springframework.security:spring-security-saml2-service-provider from 6.3.3 to 6.3.4 (#2052)

Bump org.springframework.security:spring-security-saml2-service-provider

Bumps [org.springframework.security:spring-security-saml2-service-provider](https://github.com/spring-projects/spring-security) from 6.3.3 to 6.3.4.
- [Release notes](https://github.com/spring-projects/spring-security/releases)
- [Changelog](https://github.com/spring-projects/spring-security/blob/main/RELEASE.adoc)
- [Commits](https://github.com/spring-projects/spring-security/compare/6.3.3...6.3.4)

---
updated-dependencies:
- dependency-name: org.springframework.security:spring-security-saml2-service-provider
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update 3rd Party Licenses (#2134)

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>

* chore(helm): bump chart version according to semver (#2109)

Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>

* Update messages_it_IT.properties (#2135)

* [bug fix] Update compress-pdf.html (#2138)

Update compress-pdf.html

* Update build.gradle

* 💾 Update Version (#2139)

💾 Sync Versions
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update README.md

* navbar.css: prevent overlapping of elements (#2140)

go-pro-link is overlapping the settings button

* Update pull_request_template.md

* fix signature logo not loading and add option to disable it (#2143)

* fix signature logo not loading and add option to disable it

* Hardening suggestions for Stirling-PDF / fix-sig-logo (#2144)

Modernize and secure temp file creation

Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>

---------

Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>

* Update translation files (#2145)

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>

* 📝 Update README: Translation Progress Table (#2136)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update messages_it_IT.properties (#2146)

* fixed minor bugs in Markdown (#2152)

* re-config labeler & add new labels (#2153)

Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>

* Fix: redeclaration of const and add: tranlation placeholder for Session Expiry Messages (#2158)

Fix: redeclaration of const

* Fix: Path correction to draggable.js #2154 + little makeup (#2159)

* Fix: Add missing .map file for minified files (#2156)

* Update messages_it_IT.properties (#2161)

* Fix: Auto language detection #2122 (#2148)

* Fix: Auto language detection #2122

* add LanguageService and AdditionalLanguageJsController

* hidden swagger

* 📝 Update README: Translation Progress Table (#2160)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Completed translations for 19 languages using AI (#2164)

Created translations for various languages using AI

* Corrects AI generated translation (#2166)

* Fix: Navbar layout overflow (#2162)

Fix: Navbar layout overflow using Bootstrap class .navbar-expand-xl

Co-authored-by: Harshad Marathe <harshad@DESKTOP-1MNKUHA>

* 📝 Update README: Translation Progress Table (#2165)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* feat: add helm chart github action (#2113)

* feat: add helm chart github action

Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>

* fix: remove test branch

Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>

* fix: run helm-docs-built after syncing version

Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>

* fix: helm repo url

---------

Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>

* fix remmeber me (#2184)

* fix remmeber me

* remove uselss comment

* Update translation files (#2185)

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>

---------

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com>

* Added input sanitization to fix self-xss issue (#2189)

* Update and improve zh_TW Traditional Chinese locale (#2188)

* 📝 Update README: Translation Progress Table (#2190)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* [Feature] Set Executor Instances limits dynamically from properties (#2193)

* Update 'ProcessExecutor.java' to use dynamic process limits from properties

* Move limits location out of 'application.properties'

* Rename 'SemaphoreLimit' to 'SessionLimit' and bundle with 'Timeout...' into one parent class

* Searchbar in nav auto select, and exe nolonger disable CLI (#2197)

* fix remmeber me

* remove uselss comment

* Update translation files (#2185)

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>

* exe no longer disable CLI

---------

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: a <a>

* Add option to insert blank page between pages in Multi-tool (#2194) (#2201)

* Fix: Card has no favorite icon (#2203)

fixes the bug if the card has no favorite icon

* Rename lint-helm-charts.yml to lint-helm-charts.yml-disabled

* Rename release-helm-charts.yml to release-helm-charts.yml-disabled

* Update build.gradle

* 💾 Update Version (#2204)

💾 Sync Versions
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update build.gradle

* 💾 Update Version (#2205)

💾 Sync Versions
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix: missing opener for View PDF #2206 (#2207)

Fix missing opener for View PDF #2206

* feat: move helm chart to https://github.com/Stirling-Tools/Stirling-PDF-chart (#2208)

* feat: remove helm chart

Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>

* feat: mention kubernetes in install doc

Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>

---------

Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>

* Update Version-groups.md

* Fix: Reading the username based on the login method. (#2211)

* Update messages_ca_CA.properties (#2210)

* Update messages_ca_CA.properties

Partial Catalan Translation Contribution for Stirling PDF

Hi,

I’ve completed a partial Catalan translation for Stirling PDF, covering all strings up to the Pipeline section. I focused on maintaining consistency in terminology to ensure a smooth user experience in Catalan.

* Update messages_ca_CA.properties

Update on Catalan Translation Verification – Test 2 Passed

Hi [Developer’s Name],

I’ve now completed the verification for Test 2 and ensured that all keys in messages_en_GB.properties align with those in messages_ca_CA.properties. The files should now be fully synchronized with no missing or extra keys.

I’ll proceed to re-run the tests to confirm everything is in order.

Please feel free to review the updated pull request, and let me know if there’s anything further you’d like me to adjust.

Thank you for your support!

Best regards,

---------

Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>

* Update get-info-on-pdf.html #2212

* 📝 Update README: Translation Progress Table (#2214)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update README.md

* Update HowToUseOCR.md

* Restricting file input to .md files for Markdown to PDF conversion (#2219)

Co-authored-by: Harshad Marathe <harshad@DESKTOP-1MNKUHA>

* Removes references to nonexistent endpoint (#2223)

* Catalan Translation - Stirling PDF String Updates (#2222)

* Update messages_ca_CA.properties

Partial Catalan Translation Contribution for Stirling PDF

Hi,

I’ve completed a partial Catalan translation for Stirling PDF, covering all strings up to the Pipeline section. I focused on maintaining consistency in terminology to ensure a smooth user experience in Catalan.

* Update messages_ca_CA.properties

Update on Catalan Translation Verification – Test 2 Passed

Hi [Developer’s Name],

I’ve now completed the verification for Test 2 and ensured that all keys in messages_en_GB.properties align with those in messages_ca_CA.properties. The files should now be fully synchronized with no missing or extra keys.

I’ll proceed to re-run the tests to confirm everything is in order.

Please feel free to review the updated pull request, and let me know if there’s anything further you’d like me to adjust.

Thank you for your support!

Best regards,

* Catalan Translation - Stirling PDF String Updates

Hi,

I have worked on the Catalan translation for some of the text strings in the Stirling PDF project. Attached are my contributions, which include the relevant strings for various parts of the system. I’ve made a few small adjustments to ensure the translation is as accurate and coherent as possible in technical contexts.

Changes made:
	1.	Translation of strings related to PDF manipulation tools (e.g., Add Watermark, Split PDF, etc.).
	2.	Adjustments of terms for better accuracy, such as using “Eliminar” instead of “Treure” or “Esborrar”.
	3.	Review of technical translations to ensure consistency, such as using “Nombre” instead of “Quantitat” for referring to the number of documents or pages.

Attached are the modified strings for your review:
	•	[Attach the modified strings file]

If you have any questions or need further adjustments, I’m happy to help.

Thank you for your attention and for all your work on the project!

Best regards,

* Catalan Translation - Stirling PDF String Updates

Hi,

I have worked on the Catalan translation for some of the text strings in the Stirling PDF project. Attached are my contributions, which include the relevant strings for various parts of the system. I’ve made a few small adjustments to ensure the translation is as accurate and coherent as possible in technical contexts.

Changes made:
	1.	Translation of strings related to PDF manipulation tools (e.g., Add Watermark, Split PDF, etc.).
	2.	Adjustments of terms for better accuracy, such as using “Eliminar” instead of “Treure” or “Esborrar”.
	3.	Review of technical translations to ensure consistency, such as using “Nombre” instead of “Quantitat” for referring to the number of documents or pages.

Attached are the modified strings for your review:
	•	[Attach the modified strings file]

If you have any questions or need further adjustments, I’m happy to help.

Thank you for your attention and for all your work on the project!

Best regards,

* Catalan Translation - Stirling PDF String Updates

Hi,

I have worked on the Catalan translation for some of the text strings in the Stirling PDF project. Attached are my contributions, which include the relevant strings for various parts of the system. I’ve made a few small adjustments to ensure the translation is as accurate and coherent as possible in technical contexts.

Changes made:
	1.	Translation of strings related to PDF manipulation tools (e.g., Add Watermark, Split PDF, etc.).
	2.	Adjustments of terms for better accuracy, such as using “Eliminar” instead of “Treure” or “Esborrar”.
	3.	Review of technical translations to ensure consistency, such as using “Nombre” instead of “Quantitat” for referring to the number of documents or pages.

Attached are the modified strings for your review:
	•	[Attach the modified strings file]

If you have any questions or need further adjustments, I’m happy to help.

Thank you for your attention and for all your work on the project!

Best regards,

* Catalan Translation - Stirling PDF String Updates

* 📝 Sync README
> Made via sync_files.yml

---------

Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* adds missing dependencies in the endpoints (#2224)

* Mention HTTP error 413 in FAQ (#2226)

* Fix canvas crop (#2221)

* WIP: fixes canvas and rect to crop - small problem in smaller screens - neew to fix re render page on resize

* Closes #2209

* Increase watermark coverage to fill page (#2049) (#2220)

* Increase watermark coverage to fill page (#2049)

* Increase watermark coverage to fill page with the new calculation (#2049)

* Update pull_request_template.md

* Setup new docker org stirlingtools/stirling-pdf (#2232)

* Update push-docker.yml

* Update push-docker.yml

* Update push-docker.yml

* Feature/1976/multi tool multiple pages (#2200)

* Multitool - Select multiple pages for rotation tool

* Multitool multi select delete feature

* Multitool multi select UI improvements and big fixes

* Multitool multi select select all and UI improvements

* Multi tool multi select, download selected, clean up and bug fixes

* Comments

* Update buttons for page selection

* Update translation files

Signed-off-by: GitHub Action <action@github.com>

* Multitool multiselect split functionality and UI updates

* Download selected button, additional tooltips

* Update translation files

Signed-off-by: GitHub Action <action@github.com>

* revert CertSignController

* remove material icons

* restore to previous certsigncontroller

* Update CertSignController.java

---------

Signed-off-by: GitHub Action <action@github.com>
Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>

* 📝 Update README: Translation Progress Table (#2236)

📝 Sync README
> Made via sync_files.yml

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix: Ensure backend receives false when checkbox is unchecked in split-pdf-by-chapters feature (#2234)

* Implemented hidden input tags to resolve issue with file input handling

* Cleanup: Remove log statements for production readiness

---------

Co-authored-by: Harshad Marathe <harshad@DESKTOP-1MNKUHA>
Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>

* Update the Tests (#2066)

* use transform instead of change

* disable workflows

* updates expect commands as well

* try "highlighted"

* update versions

* update workflow files

* Remove pro badge if enabled

* add some wait time

---------

Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>
Co-authored-by: a <a>

* try this

---------

Signed-off-by: GitHub Action <action@github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>
Co-authored-by: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>
Co-authored-by: a <a>
Co-authored-by: Corbinian Grimm <23664150+pixma140@users.noreply.github.com>
Co-authored-by: albanobattistella <34811668+albanobattistella@users.noreply.github.com>
Co-authored-by: Eric <71648843+sbplat@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: swanemar <107953493+swanemar@users.noreply.github.com>
Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Manuel Mora Gordillo <manuito@gmail.com>
Co-authored-by: Manu <manuel@fusiontelecom.co>
Co-authored-by: Ludy <Ludy87@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Florian Fish <florian@poissonmail.fr>
Co-authored-by: reecebrowne <74901996+reecebrowne@users.noreply.github.com>
Co-authored-by: Dimitrios Kaitantzidis <james_k23@hotmail.gr>
Co-authored-by: Rania Amina <reaamina@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Ludovic Ortega <ludovic.ortega@adminafk.fr>
Co-authored-by: Philip H. <47042125+pheiduck@users.noreply.github.com>
Co-authored-by: Saud Fatayerji <Sf298@users.noreply.github.com>
Co-authored-by: MaratheHarshad <97970262+MaratheHarshad@users.noreply.github.com>
Co-authored-by: Harshad Marathe <harshad@DESKTOP-1MNKUHA>
Co-authored-by: ninjat <hotanya.r@gmail.com>
Co-authored-by: Peter Dave Hello <hsu@peterdavehello.org>
Co-authored-by: Rafael Encinas <rafael.encinas@encora.com>
Co-authored-by: Renan <82916964+thisisrenan@users.noreply.github.com>
Co-authored-by: leo-jmateo <128976497+leo-jmateo@users.noreply.github.com>
Co-authored-by: S. Neuhaus <neuhaus@users.noreply.github.com>
Co-authored-by: Dimitris Kaitantzidis <44621809+DimK10@users.noreply.github.com>
2024-11-15 13:06:09 +00:00

3.5 KiB

OCR Language Packs and Setup

This document provides instructions on how to add additional language packs for the OCR tab in Stirling-PDF, both inside and outside of Docker.

My OCR used to work and now doesn't!

The paths have changed for the tessdata locations on new Docker images. Please use /usr/share/tessdata (Others should still work for backward compatibility but might not).

How does the OCR Work

Stirling-PDF uses OCRmyPDF, which in turn uses Tesseract for its text recognition. All credit goes to them for this awesome work!

Language Packs

Tesseract OCR supports a variety of languages. You can find additional language packs in the Tesseract GitHub repositories:

  • tessdata_fast: These language packs are smaller and faster to load but may provide lower recognition accuracy.
  • tessdata: These language packs are larger and provide better recognition accuracy, but may take longer to load.

Depending on your requirements, you can choose the appropriate language pack for your use case. By default, Stirling-PDF uses tessdata_fast for English, but this can be replaced.

Installing Language Packs

  1. Download the desired language pack(s) by selecting the .traineddata file(s) for the language(s) you need.
  2. Place the .traineddata files in the Tesseract tessdata directory: /usr/share/tessdata

DO NOT REMOVE EXISTING eng.traineddata, IT'S REQUIRED.

Docker Setup

If you are using Docker, you need to expose the Tesseract tessdata directory as a volume in order to use the additional language packs.

Docker Compose

Modify your docker-compose.yml file to include the following volume configuration:

services:
  your_service_name:
    image: your_docker_image_name
    volumes:
      - /location/of/trainingData:/usr/share/tessdata

Docker Run

Add the following to your existing Docker run command:

-v /location/of/trainingData:/usr/share/tessdata

Non-Docker Setup

If you are not using Docker, you need to install the OCR components, including the ocrmypdf app. You can see the OCRmyPDF install guide.

For Debian-based systems, install languages with this command:

sudo apt update &&\
# All languages
# sudo apt install -y 'tesseract-ocr-*'

# Find languages:
apt search tesseract-ocr-

# View installed languages:
dpkg-query -W tesseract-ocr- | sed 's/tesseract-ocr-//g'

For Fedora:

# All languages
# sudo dnf install -y tesseract-langpack-*

# Find languages:
dnf search -C tesseract-langpack-

# View installed languages:
rpm -qa | grep tesseract-langpack | sed 's/tesseract-langpack-//g'

For Windows:

Ensure ocrmypdf in installed with pip install ocrmypdf

Additional languages must be downloaded manually: Download desired .traineddata files from tessdata or tessdata_fast Place them in the tessdata folder within your Tesseract installation directory (e.g., C:\Program Files\Tesseract-OCR\tessdata)

Verify installation: tesseract --list-langs

You must then edit your /configs/settings.yml and change the system.tessdataDir to match the directory containing lang files

system:
 tessdataDir: C:/Program Files/Tesseract-OCR/tessdata # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.