What's Next for DocuDigger? 🚀
Hi everyone,
I wanted to share an update on the future of DocuDigger and the changes currently underway.
Current Challenges
The current implementation of the scraping process has become extremely complex and heavily hard-coded, leading to several issues:
Difficulty in adding new websites: The integration process is cumbersome and time-consuming.
Bug fixes are resource-intensive: Addressing issues often requires significant effort.
Unmaintainable codebase: Over time, the code has become hard to extend and maintain.
The Plan: A Complete Rework with Improved Architecture
To address these challenges, I’ve started working on a full rework of the project. The new approach aims to make scraping more flexible and manageable. Here's an outline of the upcoming changes:
1. JSON Definitions for Generic Actions
Instead of specializing everything for individual websites, there will be generic, composable actions.
These actions will be defined in JSON and modeled together into a complete scraping process.
This architecture should make it easier to build a modeling interface later on. It also opens the door to a "store" where you can get precomposed scrapes for known websites or add your own.
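To make the idea of JSON-defined, composable actions more concrete, here is a minimal sketch in TypeScript. The final schema has not been published yet, so the action names (`open`, `click`, `fill`, `download`), the `ScrapeDefinition` shape, the helper functions, and the example URL and selectors are all assumptions made up for illustration, not DocuDigger's actual format.

```typescript
// Hypothetical action vocabulary; DocuDigger's real schema may look different.
type Action =
  | { type: "open"; url: string }
  | { type: "click"; selector: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "download"; selector: string };

// A complete scrape is a named, ordered list of generic actions.
interface ScrapeDefinition {
  name: string;
  actions: Action[];
}

const KNOWN_TYPES = new Set(["open", "click", "fill", "download"]);

// Parse a JSON string and reject definitions that use unknown action types.
function parseDefinition(json: string): ScrapeDefinition {
  const raw = JSON.parse(json);
  if (typeof raw.name !== "string" || !Array.isArray(raw.actions)) {
    throw new Error("definition needs a name and an actions array");
  }
  for (const action of raw.actions) {
    if (!KNOWN_TYPES.has(action.type)) {
      throw new Error(`unknown action type: ${action.type}`);
    }
  }
  return raw as ScrapeDefinition;
}

// Turn one action into a human-readable step (stand-in for real browser calls).
function describeAction(action: Action): string {
  switch (action.type) {
    case "open":
      return `open ${action.url}`;
    case "click":
      return `click ${action.selector}`;
    case "fill":
      return `fill ${action.selector}`;
    case "download":
      return `download via ${action.selector}`;
  }
}

// Compose the actions of a definition into an ordered plan.
function plan(definition: ScrapeDefinition): string[] {
  return definition.actions.map(describeAction);
}

// Made-up definition for a made-up website.
const exampleJson = `{
  "name": "example-invoices",
  "actions": [
    { "type": "open", "url": "https://example.com/login" },
    { "type": "fill", "selector": "#user", "value": "alice" },
    { "type": "download", "selector": "a.invoice" }
  ]
}`;

console.log(plan(parseDefinition(exampleJson)));
```

Because a definition like this is plain data, the same file could be produced by hand, by a modeling interface, or fetched from a store, and a single generic runner would execute it.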
2. Switch to a NestJS API
The project will transition to a NestJS-based API, providing a robust backend structure that simplifies building future interfaces or apps.
Since NestJS supports CLIs through nest-commander, we will be removing oclif from the project.
As a result, the npm package will no longer exist in its current form.
3. Service-Based Structure
The specialized scraping logic is already being refactored into services.
Eventually, we may further extract these components for better modularity and maintainability.
I’m excited about this new direction and would love to hear your thoughts, feedback, or suggestions. Stay tuned for more updates!