Commit Graph

8 Commits

Author SHA1 Message Date
Omar Ahmed Hassan
afad06bed4
Extract tables from PDF to CSV using Tabula (#2312)
* Add Tabula dependency and exclude slf4j-simple

- Add tabula-java dependency to extract tables into CSV.
- Exclude slf4j-simple due to Logback

* Add a flexible CSVWriter

- Add FlexibleCSVWriter which extends CSVWriter to pass a custom CSVFormat, as CSVWriter's parameterized constructor (that allows changing CSVFormat) is protected.

* Use Tabula in extracting tables from PDF

- Use Tabula in extracting tables from PDF instead of the existing implementation

* Delete PDFTableStripper as It is unneeded

- Delete PDFTableStripper as It is unneeded as Tabula-Java is used instead.

* Use correct class in ExtractCSVController logger

* Exclude gson and bcprov-jdk15on dependencies from tabula

- Exclude gson and bcprov-jdk15on from tabula-java due to detected security vulnerabilities.
2024-11-23 23:28:44 +00:00
Anthony Stirling
c85463bc18
Frooodle/license (#1994) 2024-10-14 22:34:41 +01:00
Anthony Stirling
2fff3083ae
Update TextFinder.java (#980) 2024-03-26 19:25:16 +00:00
sbplat
53afb865c5 refactor: replace ImageFinder with getAllImages using strategy behind ExtractImagesController 2024-01-29 11:23:58 -05:00
Anthony Stirling
5f771b7851 formatting 2023-12-30 19:11:27 +00:00
Anthony Stirling
0f3df6e92b cleanup imports 2023-08-27 00:39:22 +01:00
Anthony Stirling
cd0e1a3962 redact 2023-08-24 23:23:25 +01:00
Anthony Stirling
1b45ab7222 util move around 2023-05-31 20:15:48 +01:00