-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace IP addresses with 0.0.0.0 #146
Conversation
@antonizoon @brandongalbraith If neither of you have qualms about this I'm going to merge, I'm a little confused about how it finds the actual value but it does replace it. |
@imabritishcow Would you mind adding a corresponding test for this change? You might also consider replacing the IP address with With regards to the regex pattern, it's a bit unwieldy, but I'm happy to ship a meh solution now and a more refined solution later so that IP addresses of tubeup users isn't being committed permanently to the Internet Archive. |
I have changed it to replace with REDACTED but i'm not really sure how to create a test. |
you need to edit either the Travis.cl code or if we switch to Github tests (ahem anyone else is wanna look over that PR) it should test each PR. |
@imabritishcow Regarding a test for this enhancement, we want to ensure we're confident we're only replacing the user's IP address by using the regex provided. Did you test the provided regex pattern on common IP address formats? |
Yes, I have tested the regex on both IPv4 and IPv6 addresses |
This isn't a great solution, i'll work on something better. |
Appreciate it @imabritishcow 🙏. TubeUp is executed not just from the command line, but is also baked into pipelines and other automated systems folks are running for archival purposes. Definitely want to ensure we're not munging data or erroring out unexpectedly. |
This would have removed any IP addresses in the description or title. I'm working on a solution that will edit only the formats section. |
Good catch! We'll keep an eye out for an updated review when your new commits land. |
Yeah please be really careful with metadata. Half of the uploads from this script are automated (mostly me). I have limited mental bandwidth to vet that a change doesn't have knockon effects down the line. I'm open to a solution but not a board one. Maybe have it look for a value in JSON and snip only that value. |
Improved IP removal and created a new test for it. |
The IPv6 addresses in the file are URL encoded, the regex won't match regular IPv6 addresses but it works on URL encoded IPv6 like FE80%3A0000%3A0000%3A0000%3A0202%3AB3FF%3AFE1E%3A8329 |
@vxbinaca @brandongalbraith What do you think? Should I fix the old solution and use that or use the new solution? |
I think a better way to implement this would be to redact based on a match of the public IP address of the network. A regular expression could end up overwriting something else, and introduces a lot of complexity, especially if we want to extend it to IPv6. |
Why can't you just go in and post-process the JSON, strip out that or whatever other element you want? Or, maybe use tubeup on a VPS like the README says. Or take the privacy L and accept it and run it on your desktop. These seem to be easier solutions than this. I'll keep the issue open but this seems to be a lot of work and a lot of risk of destroying metadata than just dropping elemnets within the JSON entirely, changing behavoir/mindset or just accepting the privacy loss. |
The README doesn't actually tell you to use a VPS. I've personally been using a VPS but it's not straightforward because you have to mess around with the source code if the VPS doesn't have storage for the videos. I rented 100GB from digital ocean but it was mounted elsewhere so I had to change the source code to save the files in the hard drive. I think youtube is the only offender here. Personally I just deleted those fields that contain the ip address and moved on with my life. @XXLuigiMario I don't think that's a bad idea. Though I remember in the past ipinfo restricted the amount of requests you could make. Not sure if they still do that, but something to keep in mind. |
Digital Oceans problem. Tubeup expects |
IP addresses in the info.json is now replaced with 0.0.0.0. Tested with IPv4 and IPv6.
Example

Test uploads
https://archive.org/download/youtube-hh82cgqftgM
https://archive.org/details/youtube-Ti-DCmmS07Q
https://archive.org/details/youtube-sKoUtJpubqc