Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Remove tubeup host IP address from info.json prior to IA upload #192

Open
brandongalbraith opened this issue Apr 3, 2022 · 6 comments
Assignees
Labels

Comments

@brandongalbraith
Copy link
Collaborator

brandongalbraith commented Apr 3, 2022

Previous PR on the topic: #146

Had a bit of an epiphany on this topic while working on another problem; we shouldn't attempt to pattern match for an IP. We should wholesale parse the video URL in info.json and then remove the ip query param in the url key that contains the temporary video URL generated.

image

Stackoverflow link on the topic with hints on code for the solution for my notes: https://stackoverflow.com/questions/7734569/how-do-i-remove-a-query-string-from-url-using-python

@vxbinaca
Copy link
Collaborator

I like this solution, said a variant of it a while back, just drop that element. We don't need it even for diagnosing problems on our end. YT-DLP would need that though, but we don't.

@vxbinaca
Copy link
Collaborator

vxbinaca commented Jun 13, 2022

also I think YTDL guys did the filename thing to try to see if filesystem limits would trigger a failed rip. So like "()" in NTFS are illegal, as are raw Twitter video dumps because the extractor puts the content of the tweet into the filename which can break on both NTFS and EXT4 (this is because of filename length). You'd have to filter output to 'extractor-videoID'.

Anyway, we don't need that diagnostic information. I've always thought dropping those elements in the JSON and keeping the rest was good.

Thanks, Brandon.

@vxbinaca
Copy link
Collaborator

We've discussed this with Pukaden and theres no satisfactory way to sanitize IPs from the JSON without negativly affecting the video description or other false flags.

Yes there is a secondary copy of the description but I still don't like it's touching the JSON that messes with IAs bots that fix items.

I move to close as not feasibly fixable @brandongalbraith what do you think?

@brandongalbraith
Copy link
Collaborator Author

@vxbinaca Let me spike this out next week and if I don't get any traction in short order, we'll close it out as WONTFIX.

@brandongalbraith
Copy link
Collaborator Author

My apologies to all, I have not had the bandwidth to work on this. Please consider this comment a Request for Contributions; contributors can propose a fiat amount for the contribution and I will consider reasonable offers.

@vxbinaca
Copy link
Collaborator

we have lives it's fine

drzraf added a commit to drzraf/tubeup that referenced this issue Sep 17, 2023
This reverts commit 0911477.

urllib3 is needed to postprocess/parse/URL for sanitization and privacy purpose (bibanon#192)
vxbinaca pushed a commit that referenced this issue Sep 18, 2023
* Revert "Remove urllib3 dependency"

This reverts commit 0911477.

urllib3 is needed to postprocess/parse/URL for sanitization and privacy purpose (#192)

* IA currently leaks the IP address of the submitter. This is bad.

We fix this by carefully redacting the IP address in the JSON fields known to contain it.

* added tests

* flake8 linting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants