llm #3580
program4results
started this conversation in
Show and tell
Hi all. I'm not sure whether this functionality is already in Stirling.
I'm trying to add two features:
1. Validate PDFs before processing, i.e. flag those that are corrupt, those that are PDF/A, PDF/E, or PDF/X, and those that are password-protected, since these need special attention.
2. Merge and split the thousands of PDFs that pass test 1 above into 200 MB chunks, i.e. NotebookLM-compliant.
I'm also running into problems with Stirling hanging: it just bombs out while processing 1,500 PDFs despite its RAM being maxed out. I have hundreds of thousands of PDFs to process. I can't allocate more CPU because of the restrictions of my Synology NAS SA3400, which has about 100 GB of free RAM. These are the settings I added under Environment Variables → Add:
JAVA_TOOL_OPTIONS=-Xms8g -Xmx28g -XX:+UseG1GC
STIRLING_PDF_THREADS=12
PIPELINE_MAX_CONCURRENT=6
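For feature 2, the grouping step can be sketched independently of Stirling. The snippet below is a minimal sketch, not part of Stirling-PDF: `chunk_by_size` is a hypothetical helper that greedily packs `(path, size)` pairs into groups whose summed input size stays under a limit. It assumes the merged output is roughly the sum of the input sizes, which is only an approximation, so leaving some headroom (for example 180 MB for a 200 MB limit) seems prudent.

```python
from typing import Iterable, List, Tuple


def chunk_by_size(files: Iterable[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Greedily group files, in order, so each group's total size <= max_bytes.

    A file larger than max_bytes is placed alone in its own group; it would
    still need to be split by some other step before upload.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    for path, size in files:
        if size > max_bytes:
            # Flush what we have so input order is preserved, then isolate
            # the oversized file.
            if current:
                chunks.append(current)
                current, current_size = [], 0
            chunks.append([path])
            continue

        if current and current_size + size > max_bytes:
            # Adding this file would overflow the group: start a new one.
            chunks.append(current)
            current, current_size = [], 0

        current.append(path)
        current_size += size

    if current:
        chunks.append(current)
    return chunks
```

Each resulting group could then be handed to a merge step one at a time, which also keeps the number of PDFs open at once small.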
I also need a browse-and-add PDF picker:
// NEW CONTROLLER: src/main/java/stirling/software/SPDF/controller/api/FileBrowserController.java
@RestController
@RequestMapping("/api/v1/file-browser")
@Tag(name = "File Browser", description = "Server-side file browsing and PDF validation")
public class FileBrowserController {
}

// NEW SERVICE: src/main/java/stirling/software/SPDF/service/PDFValidationService.java
@Service
public class PDFValidationService {
}

// NEW MODELS: src/main/java/stirling/software/SPDF/model/
public class DirectoryListing {
    private String currentPath;
    private String parentPath;
    private List<String> directories = new ArrayList<>(); // element type assumed
    private List<PDFInfo> pdfs = new ArrayList<>();
}

public class PDFInfo {
    private String name;
    private String path;
    private long size;
    private Date lastModified;
    private boolean selected;
}

public class ValidationResult {
    private String filePath;
    private String fileName;
    private boolean overallValid;
    private boolean pdfboxValid;
    private boolean qpdfValid;
    private boolean structureValid;
    private int pageCount;
    private long fileSize;
    private String error;
}

public class BatchProcessRequest {
    private List<String> validPdfPaths; // element type assumed
    private String outputFolderName;
}
// FRONTEND TEMPLATE: src/main/resources/templates/pdf-folder-browser.html
<title>PDF Folder Browser &amp; Validator</title>
This is the pre-validation step:
python stirling_preprocessor.py \
    "/path/to/your/1700/pdfs" \
    "/volume1/docker/stirling/pipeline/watchedFolders" \
    --max-size 180
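As a rough idea of what a cheap first-pass check in this step could look like, here is a minimal, standard-library-only sketch. `quick_pdf_check` is a hypothetical helper, not the code in `stirling_preprocessor.py`. It only looks for the `%PDF-` header, the `%%EOF` marker, and an `/Encrypt` key near the end of the file, so it is a heuristic: it will miss many kinds of corruption, may miss encryption in some files, and does not detect PDF/A, PDF/E, or PDF/X. A real parser such as pikepdf or PyMuPDF would still be needed for anything that passes.

```python
from pathlib import Path
from typing import Tuple


def quick_pdf_check(path: Path, tail_bytes: int = 2048) -> Tuple[bool, str]:
    """Heuristic pre-filter: returns (ok, reason) without parsing the PDF."""
    size = path.stat().st_size
    with open(path, "rb") as fh:
        head = fh.read(1024)
        # Read only the end of the file, where the trailer normally lives.
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read()

    if b"%PDF-" not in head:
        return False, "missing %PDF- header"
    if b"%%EOF" not in tail:
        return False, "missing %%EOF marker (possibly truncated)"
    if b"/Encrypt" in tail:
        # Assumption: the encryption dictionary is referenced near the end
        # of the file; this is common but not guaranteed.
        return False, "appears to be encrypted"
    return True, "ok"
```

Files rejected here can be set aside for manual attention, so that only plausible candidates are copied into the watched folder.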
This is the Docker Compose file:
version: '3.8'
services:
  stirling-pdf:
    image: frooodle/s-pdf:latest
    container_name: stirling-pdf-turbo
    ports:
      - '8092:8080'
networks:
  default:
    driver: bridge
Here is the preprocessor code:
#!/usr/bin/env python3
"""
Stirling PDF Pre-processor
Validates PDFs and prepares them for Stirling pipeline processing
"""
import os
import sys
import logging
import json
import shutil
from pathlib import Path
from typing import List, Tuple, Dict
import subprocess
from datetime import datetime

# Optional dependencies: record availability instead of failing at import time.
try:
    import fitz  # PyMuPDF
    PYMUPDF_AVAILABLE = True
except ImportError:
    PYMUPDF_AVAILABLE = False

try:
    import pikepdf
    PIKEPDF_AVAILABLE = True
except ImportError:
    PIKEPDF_AVAILABLE = False


class StirlingPreProcessor:
    def __init__(self, source_dir: str, stirling_watch_dir: str, max_size_mb: int = 180):
        self.source_dir = Path(source_dir)
        self.stirling_watch_dir = Path(stirling_watch_dir)
        self.max_size_bytes = max_size_mb * 1024 * 1024  # MB to bytes


def main():
    import argparse


if __name__ == "__main__":
    main()