Skip to content

Conversation

@talanc
Copy link
Contributor

@talanc talanc commented Aug 8, 2021

  • I carefully read the contribution guidelines and agree to them.
  • I have tested the API against NewPipe.
  • I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API.

Extractor fix for issue TeamNewPipe/NewPipe#6757

This is the extractor PR for supporting the subscriptions.csv file from Google Takeout.
Also supports selecting a ZIP file (which will in turn find the subscriptions.csv file)
I introduced a new method on SubscriptionExtractor to determine what the stream is (e.g. json, zip, or csv):
List<SubscriptionItem> fromInputStream(@Nonnull final InputStream contentInputStream, String contentType)

Ready for review and happy to make any changes to get this accepted.

Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this needed usability improvement!


/**
* Extract subscriptions from a Google takout export (the user has to get the JSON out of the zip)
* Extract subscriptions from a Google takeout export (the user has to get the JSON out of the zip)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Extract subscriptions from a Google takeout export (the user has to get the JSON out of the zip)
* Extract subscriptions from a Google takeout export

Comment on lines 124 to 127
public static final String[] ENTRY_PATHS = {
"Takeout/YouTube and YouTube Music/subscriptions/subscriptions.csv",
"Takeout/YouTube y YouTube Music/suscripciones/suscripciones.csv" // There is a <NBSP> between YouTube and Music
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't rely on a list of paths. It will be different for each language. I'd navigate the whole zip searching for a valid file instead.

Copy link

@tbjgolden tbjgolden Aug 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible way to find this file is to look for files in the glob Takeout/YouTube*/*/*.csv and find one where the file starts with a match for this regex (this regex is a JS one, unsure of the Java equiv):

/^.*\n([0-9A-Za-z_-]{20,}),[^,]*?\/\1,.+\n/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ intuition, this regex skips the first line, checks the second line to see if it starts with something that looks like a channel id, then looks for / followed by that same id again followed by a comma ,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, it should work out fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done--brief summary of changes:

CSV: after 5 lines, checks to make sure it's found subscription items (ensures the 2nd entry is a channel url)
if there's no items, then it exits out of the function, returning an empty list.

ZIP: iterates over every CSV file, gets the subscription items from that CSV file, if there's an error or no items in the list, it goes to the next CSV file in the zip.

@talanc
Copy link
Contributor Author

talanc commented Aug 10, 2021

Thanks for your feedback -- ready for review again.

@opusforlife2 opusforlife2 requested a review from Stypox August 10, 2021 17:03
Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this approach looks good to me! Have you tested it? Also, please put final in front of every variable you never change after declaration.

@talanc
Copy link
Contributor Author

talanc commented Aug 14, 2021

@Stypox
Testing:
Yes.
It has unit tests for the CSV and ZIP import.
I've manually tested (via an emulator) with two different subscription files.
And I'm sure there will be another round of testing when it gets merged and the UI layer is tested: TeamNewPipe/NewPipe#6882

Final:
I've added final to variables where appropriate.

Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good ;-)

Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good code-wise :-)

Copy link
Member

@litetex litetex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good 😄
Test could be in theory parametrized, however because this is pain in JUnit 4 and creates a bunch of overhead, it's okay for me to go with a foreach-loop.

@talanc
Copy link
Contributor Author

talanc commented Aug 17, 2021

I think this is good to be merged now.

The failed tests are unrelated to my changes (a Peertube test is failing).

@Stypox
Copy link
Member

Stypox commented Aug 24, 2021

@TobiGr please merge this, I don't have permission as apparently snyk failed. And I don't have access to Snyk.

@XiangRongLin
Copy link
Collaborator

@Stypox Synk fails because of outdated jsoup version having a vulnerability. The version was already updated and merged. So if this is rebased, the check should succeed

@Stypox Stypox force-pushed the dev branch 2 times, most recently from d5e74f3 to ca6d64c Compare August 24, 2021 13:42
csv:
Improved error messages
Exits early if it hasnt found any items in the first few lines

zip:
Now checks all CSV files instead of hard-coded paths

final qualifiers for immutable locals and parameters

Co-authored-by: litetex <[email protected]>
@Stypox Stypox changed the title YoutubeSubscriptionExtractor: Support for subscriptions.csv (and zip) [YouTube] Support csv and zip subscription import (Google Takeout) Aug 24, 2021
@Stypox
Copy link
Member

Stypox commented Aug 24, 2021

Rebased, thank you!

@Stypox Stypox merged commit db6c729 into TeamNewPipe:dev Aug 24, 2021
@Simon311
Copy link

Importing the CSV file doesn't work in the latest NewPipe available from F-Droid.
But perhaps more importantly, my Google export only returned 6 channels out of hundreds. I know there's nothing you guys can do about it, but has anyone had this same issue? Is there a way to report this to Google, since this is a clear GDPR violation?

@opusforlife2
Copy link
Collaborator

Importing the CSV file doesn't work in the latest NewPipe available from F-Droid.

@Simon311 The version with this fix isn't released yet. You can install and test the Release Candidate by checking the pinned issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Issue is related to a bug youtube service, https://www.youtube.com/

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants