clean up and more (#2756)

# Description of Changes This PR introduces multiple updates across various files and workflows: ### **What was changed:** 1. **Deleted Scripts:** - `check_duplicates.py`: Removed script that checked for duplicate keys in properties files. - `check_tabulator.py`: Removed script that ensured no tabulators existed in HTML, CSS, or JS files. 2. **Updated GitHub Actions Workflow (`pre_commit.yml`):** - Added a weekly schedule trigger (`cron`) for the pre-commit workflow. - Updated the `create-pull-request` action to exclude certain files (`.github/workflows/.*`) from formatting. - Improved detection and handling of staged changes during commit creation. 3. **`.pre-commit-config.yaml`:** - Adjusted regex for file matching in `ruff` and `codespell` hooks to ensure better file filtering. - Removed local hooks that relied on deleted scripts. 4. **Scripts (`counter_translation.py`):** - Updated file writing methods to enforce consistent newline characters (`newline="\n"`). ### **Why the change was made:** - To simplify the repository by removing unnecessary or outdated scripts (`check_duplicates.py` and `check_tabulator.py`). - To enhance the workflow automation by introducing a scheduled run for pre-commit checks. - To improve code formatting and file consistency by addressing newline character issues and refining file exclusions in `pre-commit`. ### **Challenges encountered:** - Ensuring that all references to deleted scripts were properly removed from configuration files. - Verifying that workflow and pre-commit changes do not introduce regressions in existing automation. Closes # (issue_number) --- ## Checklist ### General - [x] I have read the [Contribution Guidelines](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/CONTRIBUTING.md) - [x] I have read the [Stirling-PDF Developer Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md) (if applicable) - [x] I have read the [How to add new languages to Stirling-PDF](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md) (if applicable) - [x] I have performed a self-review of my own code - [x] My changes generate no new warnings ### Documentation - [ ] I have updated relevant docs on [Stirling-PDF's doc repo](https://github.com/Stirling-Tools/Stirling-Tools.github.io/blob/main/docs/) (if functionality has heavily changed) - [ ] I have read the section [Add New Translation Tags](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/HowToAddNewLanguage.md#add-new-translation-tags) (for new translation tags only) ### UI Changes (if applicable) - [ ] Screenshots or videos demonstrating the UI changes are attached (e.g., as comments or direct attachments in the PR) ### Testing (if applicable) - [x] I have tested my changes locally. Refer to the [Testing Guide](https://github.com/Stirling-Tools/Stirling-PDF/blob/main/DeveloperGuide.md#6-testing) for more details.
2025-06-30 01:16:39 +02:00 · 2025-01-21 12:09:01 +01:00 · 2025-01-21 12:09:01 +01:00 · 05add001fb
commit 05add001fb
parent abc3ff3529
5 changed files with 9 additions and 159 deletions
--- a/.github/scripts/check_duplicates.py
+++ b/.github/scripts/check_duplicates.py
@ -1,51 +0,0 @@
-import sys
-
-
-def find_duplicate_keys(file_path):
-    """
-    Finds duplicate keys in a properties file and returns their occurrences.
-
-    This function reads a properties file, identifies any keys that occur more than
-    once, and returns a dictionary with these keys and the line numbers of their occurrences.
-
-    Parameters:
-    file_path (str): The path to the properties file to be checked.
-
-    Returns:
-    dict: A dictionary where each key is a duplicated key in the file, and the value is a list
-          of line numbers where the key occurs.
-    """
-    with open(file_path, "r", encoding="utf-8") as file:
-        lines = file.readlines()
-
-    keys = {}
-    duplicates = {}
-
-    for line_number, line in enumerate(lines, start=1):
-        line = line.strip()
-        if line and not line.startswith("#") and "=" in line:
-            key = line.split("=", 1)[0].strip()
-            if key in keys:
-                # If the key already exists, add the current line number
-                duplicates.setdefault(key, []).append(line_number)
-                # Also add the first instance of the key if not already done
-                if keys[key] not in duplicates[key]:
-                    duplicates[key].insert(0, keys[key])
-            else:
-                # Store the line number of the first instance of the key
-                keys[key] = line_number
-
-    return duplicates
-
-
-if __name__ == "__main__":
-    failed = False
-    for ar in sys.argv[1:]:
-        duplicates = find_duplicate_keys(ar)
-        if duplicates:
-            for key, lines in duplicates.items():
-                lines_str = ", ".join(map(str, lines))
-                print(f"{key} duplicated in {ar} on lines {lines_str}")
-                failed = True
-    if failed:
-        sys.exit(1)
--- a/.github/scripts/check_tabulator.py
+++ b/.github/scripts/check_tabulator.py
@ -1,85 +0,0 @@
-"""check_tabulator.py"""
-
-import argparse
-import sys
-
-
-def check_tabs(file_path):
-    """
-    Checks for tabs in the specified file.
-
-    Args:
-        file_path (str): The path to the file to be checked.
-
-    Returns:
-        bool: True if tabs are found, False otherwise.
-    """
-    with open(file_path, "r", encoding="utf-8") as file:
-        content = file.read()
-
-    if "\t" in content:
-        print(f"Tab found in {file_path}")
-        return True
-    return False
-
-
-def replace_tabs_with_spaces(file_path, replace_with="  "):
-    """
-    Replaces tabs with a specified number of spaces in the file.
-
-    Args:
-        file_path (str): The path to the file where tabs will be replaced.
-        replace_with (str): The character(s) to replace tabs with. Defaults to two spaces.
-    """
-    with open(file_path, "r", encoding="utf-8") as file:
-        content = file.read()
-
-    updated_content = content.replace("\t", replace_with)
-
-    with open(file_path, "w", encoding="utf-8") as file:
-        file.write(updated_content)
-
-
-def main():
-    """
-    Main function to replace tabs with spaces in the provided files.
-    The replacement character and files to check are taken from command line arguments.
-    """
-    # Create ArgumentParser instance
-    parser = argparse.ArgumentParser(
-        description="Replace tabs in files with specified characters."
-    )
-
-    # Define optional argument `--replace_with`
-    parser.add_argument(
-        "--replace_with",
-        default="  ",
-        help="Character(s) to replace tabs with. Default is two spaces.",
-    )
-
-    # Define argument for file paths
-    parser.add_argument("files", metavar="FILE", nargs="+", help="Files to process.")
-
-    # Parse arguments
-    args = parser.parse_args()
-
-    # Extract replacement characters and files from the parsed arguments
-    replace_with = args.replace_with
-    files_checked = args.files
-
-    error = False
-
-    for file_path in files_checked:
-        if check_tabs(file_path):
-            replace_tabs_with_spaces(file_path, replace_with)
-            error = True
-
-    if error:
-        print("Error: Originally found tabs in HTML files, now replaced.")
-        sys.exit(1)
-
-    sys.exit(0)
-
-
-if __name__ == "__main__":
-    main()
--- a/.github/workflows/pre_commit.yml
+++ b/.github/workflows/pre_commit.yml
@ -2,6 +2,8 @@ name: Pre-commit

 on:
  workflow_dispatch:
+  schedule:
+    - cron: "0 0 * * 1"

 permissions:
  contents: read
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@ -6,10 +6,10 @@ repos:
        args:
          - --fix
          - --line-length=127
-        files: ^((.github/scripts|scripts)/.+)?[^/]+\.py$
+        files: ^((\.github/scripts|scripts)/.+)?[^/]+\.py$
        exclude: (split_photos.py)
      - id: ruff-format
-        files: ^((.github/scripts|scripts)/.+)?[^/]+\.py$
+        files: ^((\.github/scripts|scripts)/.+)?[^/]+\.py$
        exclude: (split_photos.py)
  - repo: https://github.com/codespell-project/codespell
    rev: v2.3.0
@ -19,7 +19,7 @@ repos:
          - --ignore-words-list=
          - --skip="./.*,*.csv,*.json,*.ambr"
          - --quiet-level=2
-        files: \.(properties|html|css|js|py|md)$
+        files: \.(html|css|js|py|md)$
        exclude: (.vscode|.devcontainer|src/main/resources|Dockerfile|.*/pdfjs.*|.*/thirdParty.*|bootstrap.*|.*\.min\..*|.*diff\.js)
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.22.0
@ -35,23 +35,7 @@ repos:
    hooks:
      - id: end-of-file-fixer
        files: ^.*(\.js|\.java|\.py|\.yml)$
-        exclude: ^(.*/pdfjs.*|.*/thirdParty.*|bootstrap.*|.*\.min\..*|.*diff\.js$)
+        exclude: ^(.*/pdfjs.*|.*/thirdParty.*|bootstrap.*|.*\.min\..*|.*diff\.js|\.github/workflows/.*$)
      - id: trailing-whitespace
        files: ^.*(\.js|\.java|\.py|\.yml)$
-        exclude: ^(.*/pdfjs.*|.*/thirdParty.*|bootstrap.*|.*\.min\..*|.*diff\.js$)
-
-  - repo: local
-    hooks:
-      - id: check-duplicate-properties-keys
-        name: Check Duplicate Properties Keys
-        entry: python .github/scripts/check_duplicates.py
-        language: python
-        files: ^(src)/.+\.properties$
-      - id: check-html-tabs
-        name: Check HTML for tabs
-        description: Ensures HTML/CSS/JS files do not contain tab characters
-        # args: ["--replace_with=  "]
-        entry: python .github/scripts/check_tabulator.py
-        language: python
-        exclude: ^(.*/pdfjs.*|.*/thirdParty.*|bootstrap.*|.*\.min\..*|.*diff\.js$)
-        files: ^.*(\.html|\.css|\.js)$
+        exclude: ^(.*/pdfjs.*|.*/thirdParty.*|bootstrap.*|.*\.min\..*|.*diff\.js|\.github/workflows/.*$)
--- a/scripts/counter_translation.py
+++ b/scripts/counter_translation.py
@ -75,7 +75,7 @@ def write_readme(progress_list: list[tuple[str, int]]) -> None:
                        f"![{value}%](https://geps.dev/progress/{value})",
                    )

-    with open("README.md", "w", encoding="utf-8") as file:
+    with open("README.md", "w", encoding="utf-8", newline="\n") as file:
        file.writelines(content)


@ -196,7 +196,7 @@ def compare_files(
            )
        )
    ignore_translation = convert_to_multiline(sort_ignore_translation)
-    with open(ignore_translation_file, "w", encoding="utf-8") as file:
+    with open(ignore_translation_file, "w", encoding="utf-8", newline="\n") as file:
        file.write(tomlkit.dumps(ignore_translation))

    unique_data = list(set(result_list))