From 0dd4456ae80ce1f6a96f3ff10fdaec8d02906792 Mon Sep 17 00:00:00 2001
From: Anthony Stirling <77850077+Frooodle@users.noreply.github.com>
Date: Tue, 12 Nov 2024 13:31:34 +0000
Subject: [PATCH] Update HowToUseOCR.md

---
 HowToUseOCR.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/HowToUseOCR.md b/HowToUseOCR.md
index ecb187c5..6f168111 100644
--- a/HowToUseOCR.md
+++ b/HowToUseOCR.md
@@ -80,3 +80,23 @@ dnf search -C tesseract-langpack-
 # View installed languages:
 rpm -qa | grep tesseract-langpack | sed 's/tesseract-langpack-//g'
 ```
+
+For Windows:
+
+Ensure ocrmypdf is installed with
+``pip install ocrmypdf``
+
+Additional languages must be downloaded manually:
+Download desired .traineddata files from tessdata or tessdata_fast
+Place them in the tessdata folder within your Tesseract installation directory
+(e.g., C:\Program Files\Tesseract-OCR\tessdata)
+
+Verify installation:
+``tesseract --list-langs``
+
+You must then edit your ``/configs/settings.yml`` and change the system.tessdataDir to match the directory containing lang files
+```
+system:
+  tessdataDir: C:/Program Files/Tesseract-OCR/tessdata # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.
+```
+
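
As a complement to `tesseract --list-langs`, the directory set in `system.tessdataDir` can be checked directly. The following is a minimal Python sketch, assuming language files follow the usual `<lang>.traineddata` naming. The function name `list_traineddata_langs` is hypothetical and is not part of this project, Tesseract, or ocrmypdf.

```python
from pathlib import Path


def list_traineddata_langs(tessdata_dir):
    """Return the sorted language codes found in a tessdata directory.

    A language code is taken to be the file name without the
    ".traineddata" suffix (an assumption based on the usual naming).
    """
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # A missing directory usually means tessdataDir points to the wrong place.
        raise FileNotFoundError(f"tessdata directory not found: {directory}")
    return sorted(
        path.stem
        for path in directory.glob("*.traineddata")
        if path.is_file()
    )
```

Calling `list_traineddata_langs("C:/Program Files/Tesseract-OCR/tessdata")` with the example path from the instructions above should return the codes of the languages placed there. An empty list suggests the `.traineddata` files were copied to a different folder.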