Custom outputting for pupa data #299

Closed
wants to merge 81 commits into from

doubleswirve

Apologies if this is a little rough around the edges (not sure what the etiquette is on some of this stuff), but I wanted to submit this as a work-in-progress PR to get feedback.

This PR allows pupa data to be sent to targets other than a file on disk. It initially adds an output option for Google Cloud Pub/Sub (thanks @showerst for the initial implementation), but it could also be extended to additional targets/services (e.g., Kafka).

The basic idea is to hook into the __init__ method of the Scraper class and set up an instance variable, output_target, which defaults to self (the default file writing). Depending on the OUTPUT_TARGET environment variable, we then call save_object on either the default Scraper instance or the alternative output target instance (e.g., a Pub/Sub instance). So the only requirement is for the alternative output target class to implement a save_object method.
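To make the dispatch concrete, here is a minimal sketch of the idea, not the actual code in this PR. The PubSubOutput class, the OUTPUT_TARGETS registry, the "pubsub" value, and the emit helper are all hypothetical names used for illustration, and no real Pub/Sub client is involved.

```python
import os


class PubSubOutput:
    """Hypothetical stand-in for a Pub/Sub output target (no real client)."""

    def __init__(self):
        self.sent = []

    def save_object(self, obj):
        # A real target would publish the object to the service here.
        self.sent.append(obj)


# Hypothetical registry mapping OUTPUT_TARGET values to target classes.
OUTPUT_TARGETS = {"pubsub": PubSubOutput}


class Scraper:
    def __init__(self):
        self.saved = []
        target_cls = OUTPUT_TARGETS.get(os.environ.get("OUTPUT_TARGET"))
        # Default to self so the built-in file-writing path is used.
        self.output_target = target_cls() if target_cls else self

    def save_object(self, obj):
        # Stand-in for the default behaviour of writing the object to a file.
        self.saved.append(obj)

    def emit(self, obj):
        # Hypothetical helper: callers never need to know which target is active.
        self.output_target.save_object(obj)
```

With OUTPUT_TARGET unset, emit goes through the Scraper's own save_object; with OUTPUT_TARGET=pubsub it goes to the PubSubOutput instance instead.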

So far this works pretty well; however, a few steps currently have to be repeated in each output target:

  • obj.pre_save call
  • info/debug logging prior to writing to file/sending to service/etc
  • object validation (i.e., obj.validate())
  • obj._related iterating/saving

Seems like some of these could be moved to methods in the Scraper class so alternative output target classes wouldn't need to include them.
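One possible shape for that refactor, as a rough sketch only: the Scraper keeps the shared steps and the output target is left with just the write. The process_object method name, the constructor arguments, and the jurisdiction_id argument passed to pre_save are assumptions for illustration and may not match the real signatures.

```python
import logging


class Scraper:
    def __init__(self, output_target=None, jurisdiction_id=None):
        self.logger = logging.getLogger("scraper-sketch")
        self.jurisdiction_id = jurisdiction_id  # assumed argument for pre_save
        # Fall back to self for the default file-writing behaviour.
        self.output_target = output_target or self

    def save_object(self, obj):
        # Stand-in for the default file writing; intentionally a no-op here.
        pass

    def process_object(self, obj):
        # Shared steps live here so output targets do not repeat them.
        obj.pre_save(self.jurisdiction_id)
        self.logger.info("save %s", obj)
        obj.validate()
        # The only target-specific step: write or send the object.
        self.output_target.save_object(obj)
        # Related objects go through the same pipeline.
        for related in obj._related:
            self.process_object(related)
```

Under this arrangement, an alternative target's save_object would only contain the code that sends the object to its service.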

This probably needs some unit tests as well. Anyway, I'm open to ideas and look forward to getting your feedback. Thanks!

doubleswirve and others added 28 commits January 18, 2018 23:33
…date meta data from google pubsub as it is already part of the message object (during subscription)
Google Pub/Sub env var adjustments and helper methods

3 participants