refactor(tests): move & expand TextFinder/RedactController tests; fix TextFinder empty search-term handling; update token filtering API (#4264)

# Description of Changes

- **What was changed**
  - Relocated and refactored unit tests:
- `TextFinderTest` and `RedactControllerTest` moved under
`app/core/src/test/...` to align with module structure.
- Expanded test coverage: whole-word vs. partial matches, complex
regexes (emails, SSNs, IPs, currency), international/accented
characters, multi-page documents, malformed PDFs, operator preservation,
color decoding, and performance assertions.
  - **API adjustments in redaction flow**:
- `createTokensWithoutTargetText(...)` now accepts the `PDDocument`
alongside `PDPage` to properly manage resources/streams.
- Introduced/used `createPlaceholderWithFont(...)` to maintain text
width with explicit font context.
  - **Bug fix in `TextFinder`**:
- Early-return when the (trimmed) search term is empty to prevent
unnecessary processing and avoid false positives/errors.
- Minor cleanup (removed redundant `super()` call) and improved guard
logic around regex/whole-word wrapping.

- **Why the change was made**
- Improve reliability and determinism of PDF redaction and text finding
by exercising real-world patterns and edge cases.
- Ensure structural PDF operators (graphics/positioning) are preserved
during token filtering.
- Prevent crashes or misleading matches when users provide
empty/whitespace-only search terms.
- Align tests with the current project layout and increase
maintainability.

---

## Checklist

### General

- [x] I have read the [Contribution
Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md)
- [x] I have read the [Stirling-PDF Developer
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md)
(if applicable)
- [ ] I have read the [How to add new languages to
Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md)
(if applicable)
- [x] I have performed a self-review of my own code
- [x] My changes generate no new warnings

### Documentation

- [ ] I have updated relevant docs on [Stirling-PDF's doc
repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/)
(if functionality has heavily changed)
- [ ] I have read the section [Add New Translation
Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/HowToAddNewLanguage.md#add-new-translation-tags)
(for new translation tags only)

### UI Changes (if applicable)

- [ ] Screenshots or videos demonstrating the UI changes are attached
(e.g., as comments or direct attachments in the PR)

### Testing (if applicable)

- [ ] I have tested my changes locally. Refer to the [Testing
Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/devGuide/DeveloperGuide.md#6-testing)
for more details.
This commit is contained in:
Ludy
2025-08-24 22:20:28 +02:00
committed by GitHub
parent 2baa258e11
commit 9779c75df4
3 changed files with 419 additions and 234 deletions

View File

@@ -27,7 +27,6 @@ public class TextFinder extends PDFTextStripper {
public TextFinder(String searchTerm, boolean useRegex, boolean wholeWordSearch)
throws IOException {
super();
this.searchTerm = searchTerm;
this.useRegex = useRegex;
this.wholeWordSearch = wholeWordSearch;
@@ -68,11 +67,15 @@ public class TextFinder extends PDFTextStripper {
}
String processedSearchTerm = this.searchTerm.trim();
if (processedSearchTerm.isEmpty()) {
super.endPage(page);
return;
}
String regex = this.useRegex ? processedSearchTerm : "\\Q" + processedSearchTerm + "\\E";
if (this.wholeWordSearch) {
if (processedSearchTerm.length() == 1
&& Character.isDigit(processedSearchTerm.charAt(0))) {
regex = "(?<![\\w])" + regex + "(?![\\w])";
regex = "(?<![\\w])(?<!\\d[\\.,])" + regex + "(?![\\w])(?![\\.,]\\d)";
} else if (processedSearchTerm.length() == 1) {
regex = "(?<![\\w])" + regex + "(?![\\w])";
} else {