The paths have changed for the tessdata locations on new Docker images. Please use `/usr/share/tessdata` (Others should still work for backward compatibility but might not).
Stirling-PDF uses [qpdf](https://github.com/qpdf/qpdf), which in turn uses Tesseract for its text recognition. All credit goes to them for this awesome work!
- [tessdata_fast](https://github.com/tesseract-ocr/tessdata_fast): These language packs are smaller and faster to load but may provide lower recognition accuracy.
- [tessdata](https://github.com/tesseract-ocr/tessdata): These language packs are larger and provide better recognition accuracy, but may take longer to load.
Depending on your requirements, you can choose the appropriate language pack for your use case. By default, Stirling-PDF uses `tessdata_fast` for English, but this can be replaced.
If you are not using Docker, you need to install the OCR components, including the `qpdf` app. You can see the [qpdf install guide](https://qpdf.readthedocs.io/en/latest/installation.html).
Download desired .traineddata files from tessdata or tessdata_fast
Place them in the tessdata folder within your Tesseract installation directory
(e.g., C:\Program Files\Tesseract-OCR\tessdata)
Verify installation:
``tesseract --list-langs``
You must then edit your ``/configs/settings.yml`` and change the system.tessdataDir to match the directory containing lang files
```
system:
tessdataDir: C:/Program Files/Tesseract-OCR/tessdata # path to the directory containing the Tessdata files. This setting is relevant for Windows systems. For Windows users, this path should be adjusted to point to the appropriate directory where the Tessdata files are stored.