Bug: A couple of bugs noticed during usage, listed here. #36
Comments
This seems interesting; I've never encountered these patterns before. I'll work on enhancing the parser. Edit: Was 'sample Mission Impossible Multi UHD 10bit TrueHD DTOne (1996)/sample Mission Impossible Multi UHD 10bit TrueHD DTOne (1996) - [x265][HDR][5.1].mkv' an actual sample file with a small size, or is it just a naming convention?
The file is a sample, but the title should not be affected. There are also a couple of libraries whose implementations you can check for ideas. I'm listing them here; they are in JS, but can easily be ported to Python.
This seems good but even with this parser we still need to double filter it,
output: {
My current approach is to refine our existing parser and explore integrating the JavaScript parser as a fallback. Do you have any alternative suggestions?
I was trying the following approach.
I've been trying multiple filename parsers and overriding them with custom regexes, but none gives 100% accuracy. Even if we can reach above 95%, it's a major win. Another possibility I've tried is to leverage GenAI prompts. This is the most promising so far, but it comes with the following limitation.
I've also tried to build one from scratch by taking a pre-trained model and fine-tuning its last couple of layers, but that requires a lot of data, and clean data is very hard to get. If possible, we could probably do one thing,
Doing this recursively builds a large, tested regex repository for our use case. The only thing to be cautious of is organising these regexes so that they can be easily debugged, probably by enforcing named-group capturing in every regex. Let me know if you need assistance planning the execution and splitting the work; I can help on weekends. I'm also sharing a dump of file paths I gathered from my repo and from a couple of friends. It's very diverse and will help boost regex accuracy and stability.
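To make the named-group idea concrete, here is a minimal sketch of what such a regex registry could look like. The labels, the two patterns, and the `parse_name` helper are all hypothetical and not taken from the project; the point is only that every pattern uses named groups and every result records which pattern matched, which is what makes a large pattern collection debuggable.

```python
import re
from typing import Optional

# Hypothetical registry: (label, compiled regex). Every regex uses named
# groups only. None of these patterns come from the project itself.
PATTERNS = [
    ("episode_sxxeyy", re.compile(
        r"^(?P<title>.+?)[ ._-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,3})",
        re.IGNORECASE)),
    ("movie_year", re.compile(
        r"^(?P<title>.+?)[ ._(\[-]+(?P<year>(?:19|20)\d{2})(?=[ ._)\]-]|$)")),
]


def parse_name(name: str) -> Optional[dict]:
    """Return the named groups of the first matching pattern, or None."""
    for label, pattern in PATTERNS:
        match = pattern.search(name)
        if match:
            fields = match.groupdict()
            # Turn dot/underscore separators in the title into spaces.
            fields["title"] = re.sub(r"[._]+", " ", fields["title"]).strip()
            # Record which pattern fired so a bad parse can be traced back.
            fields["matched_by"] = label
            return fields
    return None
```

Patterns are tried in order, so more specific ones should come first. Running the shared file-path dump through `parse_name` and grouping the failures by `matched_by` (or by `None`) would show which regex needs attention.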
Thanks! I'll check them out and we can discuss how to proceed with this further. If you have any additional suggestions or updates, reach out on Discord as well.
- Added logic to handle cases where the result is a tuple.
- Fixes IMDb/TMDB formatting issue by directly appending IDs to the movie name.
Addresses #36
- Add advanced query cleaning function with:
  * Configurable max word limit
  * Better handling of TV/movie title variations
- Expand show episode pattern matching to support "series.X.YofZ" format
- Enhance movie title cleaning with better technical term filtering
- Fix proper name propagation in movie processing results
Addresses #36
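For readers unfamiliar with the "series.X.YofZ" naming, the sketch below shows one way such a pattern could be matched. It assumes the shape `<title>.series.<season>.<episode>of<total>`; the regex, the `parse_series_of` helper, and the example file name are illustrative assumptions, not the code from the commit.

```python
import re
from typing import Optional

# Assumed shape: "<title>.series.<season>.<episode>of<total>",
# for example "Some.Show.series.2.3of6.mkv" (a made-up name).
SERIES_OF_RE = re.compile(
    r"^(?P<title>.*?)[ ._-]*series[ ._-]+(?P<season>\d+)"
    r"[ ._-]+(?P<episode>\d+)of(?P<total>\d+)",
    re.IGNORECASE,
)


def parse_series_of(name: str) -> Optional[dict]:
    """Extract title, season, episode and episode count, or return None."""
    m = SERIES_OF_RE.search(name)
    if m is None:
        return None
    return {
        "title": re.sub(r"[._]+", " ", m.group("title")).strip(),
        "season": int(m.group("season")),
        "episode": int(m.group("episode")),
        "total_episodes": int(m.group("total")),
    }
```

Names that do not contain the `series` keyword fall through with `None`, so this check can sit alongside the usual `SxxEyy` matching without interfering with it.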
@arao, I've updated some of the logic to improve accuracy. Please give these changes a try and let me know if they make a difference. The code is available in the anime-fix branch. If you need a Docker image, you can use sureshfizzy/cinesync:anime-fix.
** Currently, the parsed name is used as-is for the directory structure. Torrents being torrents, a simple difference in letter case causes multiple folders to be created for the same content.
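One way to avoid duplicate folders caused by letter case is to compare titles through a normalized key and reuse the first spelling seen. The sketch below is an assumption about how that could be done; `folder_key` and `FolderRegistry` are hypothetical names and not part of the project.

```python
from typing import Dict


def folder_key(title: str) -> str:
    """Case- and whitespace-insensitive key for comparing parsed titles."""
    return " ".join(title.casefold().split())


class FolderRegistry:
    """Maps every spelling of a title to the first folder name seen for it."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def resolve(self, parsed_title: str) -> str:
        # setdefault keeps the first spelling and returns it for later variants.
        return self._names.setdefault(folder_key(parsed_title), parsed_title.strip())
```

In a real run the registry would need to be seeded from the folders that already exist on disk, otherwise the "first spelling seen" changes between runs.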
** Invalid keyword picked from the directory name instead of the file name
A possible solution could be to skip unnecessary keywords such as "sample", but this requires further investigation.
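A rough sketch of the keyword-skipping idea follows. It works on the file name only (not the parent directory) and strips a noise word only when it appears at the very start or end of the name. The noise list, the regexes, and the `strip_noise` helper are assumptions for illustration; the thread only names "sample" explicitly, and paths are assumed to use forward slashes.

```python
import re
from pathlib import PurePosixPath

# Hypothetical noise list; only "sample" is mentioned in this issue.
NOISE_WORDS = ("sample",)

_ALT = "|".join(re.escape(word) for word in NOISE_WORDS)
LEADING_NOISE = re.compile(rf"^(?:{_ALT})[ ._-]+", re.IGNORECASE)
TRAILING_NOISE = re.compile(rf"[ ._-]+(?:{_ALT})$", re.IGNORECASE)


def strip_noise(path: str) -> str:
    """Return the file stem with a leading or trailing noise word removed."""
    stem = PurePosixPath(path).stem  # file name without directory or extension
    stem = LEADING_NOISE.sub("", stem)
    stem = TRAILING_NOISE.sub("", stem)
    return stem
```

Limiting the match to the edges of the name is a deliberate trade-off: it leaves titles that contain the word in the middle untouched, but a title that genuinely starts with "Sample" would still be damaged, which is part of why this needs further investigation.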
wrong parsing
We could remove non-ASCII characters (probably Mandarin) from the file name.
Possible solution: anything in "[]" at the start of the title is probably not worth picking.
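Both suggestions can be sketched in a few lines. The example below drops any bracketed groups at the start of a name and removes CJK ideographs. The Unicode ranges and the `clean_title` helper are assumptions; other scripts would need their own ranges, and this is not the project's actual cleaning code.

```python
import re

# One or more "[...]" groups at the very start of the name.
LEADING_BRACKETS = re.compile(r"^(?:\s*\[[^\]]*\])+\s*")
# CJK Unified Ideographs (basic block plus Extension A). This range is an
# assumption about which characters appear in the affected file names.
CJK_CHARS = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]+")


def clean_title(name: str) -> str:
    """Drop leading bracket groups and CJK characters, then tidy whitespace."""
    name = LEADING_BRACKETS.sub("", name)
    name = CJK_CHARS.sub(" ", name)
    return " ".join(name.split())
```

Brackets later in the name, such as a trailing `[HDR]`, are left alone, since those often carry quality information the parser may still want. A name consisting only of bracket groups comes back empty, so the caller should fall back to the original in that case.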
Let me know if a directory map is required; I can share it. BTW, great initiative to map the media directory. I've been looking for something similar for some time.