sources.takeout: add support for new youtube csv format #436

Merged (3 commits), Mar 31, 2024

Conversation

purarue
Contributor

@purarue purarue commented Mar 10, 2024

Google Takeout recently changed the format for YouTube comments to CSV files; I added support for it to google_takeout_parser a few weeks ago.

I haven't taken a stab at de-duping comments that exist in both the old HTML format and the new CSV one yet (it's on my todo list), but I thought it would be good to get this in so that new people making an export can at least get access to their comments. There might be some duplication, but that's better than erroring or the comments not showing up at all.

This is very basic right now: it does not have any error checking, so if the user is on an old version of google_takeout_parser, this will just error. Should I add a warning message on ImportError reminding them to upgrade? Wasn't sure if that was too much.

If there's anything else you think should be changed/added for this, let me know.

@purarue
Contributor Author

purarue commented Mar 10, 2024

hmm, looks like hypothesis test data may be gone:

  Error: fatal: repository 'https://github.com/judell/Hypothesis.git/' not found
  Error: fatal: clone of 'https://github.com/judell/Hypothesis.git' into submodule path '/home/runner/work/promnesia/promnesia/tests/testdata/hypexport/src/hypexport/Hypothesis' failed
  Failed to clone 'src/hypexport/Hypothesis' a second time, aborting

@karlicoss
Owner

Yeah, also just noticed the CI stuff -- fixed here karlicoss/hypexport@b9f1cab (has some explanation of why I used a submodule in the first place). If you rebase, it should hopefully all be good!

@karlicoss
Owner

And thanks for the change! Haven't seen this data yet I think, but haven't done exports for some months
Yeah, I think it's worth making these new imports more defensive, otherwise the whole data source will go down. I would probably try to import the new ones separately; if that fails, warn/emit an exception, and could also assign the 'new' imports to some dummy class
e.g.

class dummy:
    pass

CSVYoutubeLiveChat = dummy

that way the rest of the code with isinstance checks won't need changes
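
A minimal sketch of the pattern suggested above, not the code that was merged. It assumes the new models are importable from `google_takeout_parser.models`; `CSVYoutubeComment` is an assumed name (only `CSVYoutubeLiveChat` is mentioned in this thread), and `_MissingModel` and `is_csv_youtube_event` are hypothetical helpers made up for illustration.

```python
import warnings


class _MissingModel:
    """Hypothetical placeholder: no parsed takeout event is an instance of this,
    so isinstance checks against it are simply False."""


try:
    # Assumption: these names only exist in newer google_takeout_parser releases.
    from google_takeout_parser.models import CSVYoutubeComment, CSVYoutubeLiveChat
except ImportError:
    # Library missing or too old: warn, but keep the rest of the source working.
    warnings.warn(
        "google_takeout_parser is missing or too old for the CSV YouTube format; "
        "consider upgrading it to pick up those comments"
    )
    CSVYoutubeComment = _MissingModel  # type: ignore
    CSVYoutubeLiveChat = _MissingModel  # type: ignore


def is_csv_youtube_event(event: object) -> bool:
    # Works unchanged whether the real models or the placeholder are bound.
    return isinstance(event, (CSVYoutubeComment, CSVYoutubeLiveChat))
```

With the fallback in place, an old install only loses the new CSV events instead of taking down the whole takeout source.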

@purarue
Contributor Author

purarue commented Mar 13, 2024

yep, gotcha

I'm a bit busy for the next few days but will get to that when I have some time

@purarue
Contributor Author

purarue commented Mar 15, 2024

have not tested on an old version yet, but I think something like this should work

will test on old/new versions of google_takeout_parser later and let you know

it does look like it at least works on the new version:

[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: browser +92
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: error -2
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: promnesia_sean.sources.zsh +2
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: takeout +154
[ ~ ] $

@karlicoss
Owner

whoops, forgot to press merge! thanks

@karlicoss karlicoss merged commit 31ee24b into karlicoss:master Mar 31, 2024
15 checks passed