[[Cohost] but also every other website] Stop putting post text in file names, or at least properly escape it #6262

Open
Scripter17 opened this issue Oct 1, 2024 · 2 comments

@Scripter17
Contributor

While archiving cohost I found that some of the posts try to save files with, for example, newlines in the name

This doesn't work as they're not escaped by the default extractor.cohost.filename config value

I think it's just a bad idea to put post text in the file name at all. It's just begging for weird bugs involving path lengths (one of the files was a bunch of 👀 and it was too long to save) and invalid file names

Fixing this is a breaking change but it really should be fixed

Though to prevent redownloading entire accounts, #1673 would need to be implemented for people like me who still don't use an archive file for some reason. It would still be a breaking change, but that would soften it

@mikf
Owner

mikf commented Oct 1, 2024

> newlines in the name
> This doesn't work as they're not escaped

But they are: https://gdl-org.github.io/docs/configuration.html#extractor-path-remove
ASCII control characters, including newlines, do get removed by default.
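
For illustration only, here is a rough sketch of what stripping ASCII control characters from a name looks like. This is not gallery-dl's actual code; the function name `strip_control_chars` is made up, and the character range (U+0000 to U+001F plus DEL) is an assumption about what "ASCII control characters" covers.

```python
import re

# Assumption: "ASCII control characters" means U+0000-U+001F plus U+007F (DEL).
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def strip_control_chars(name: str) -> str:
    """Return `name` with every ASCII control character removed."""
    return CONTROL_CHARS.sub("", name)


# A post headline containing a newline and a tab (hypothetical example data).
raw = "12345_first line\nsecond line\t.png"
print(repr(strip_control_chars(raw)))
```

Note that the characters are dropped rather than replaced, so the two lines of the headline end up joined without a separator.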

> It's just begging for weird bugs involving path lengths

The length is restricted to 100 characters, but I guess it should have been bytes instead.

filename_fmt = ("{postId}_{headline|plainTextBody:?/_/[:100]}"
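
To show why characters versus bytes matters here, a minimal sketch (not gallery-dl code; `truncate_utf8` is a hypothetical helper, and the 255-byte limit is an assumption that matches many common filesystems but not all of them). A name made of 100 👀 emoji passes a 100-character cut untouched, yet takes 400 bytes in UTF-8.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten `text` so its UTF-8 encoding fits in `max_bytes`.

    A multi-byte character that would be cut in half is dropped entirely.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" discards the incomplete byte sequence left at the cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


eyes = "👀" * 100                      # hypothetical post text
by_chars = eyes[:100]                  # character-based cut, as in "[:100]"
by_bytes = truncate_utf8(eyes, 255)    # assumed 255-byte filename limit

print(len(by_chars), len(by_chars.encode("utf-8")))
print(len(by_bytes), len(by_bytes.encode("utf-8")))
```

The character-based cut leaves all 100 emoji in place, while the byte-based cut keeps only as many whole emoji as fit under the limit.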

@mikf
Owner

mikf commented Oct 1, 2024

Also, --rename might be useful here:

  --rename FORMAT             Rename previously downloaded files from FORMAT
                              to the current filename format
