Specify unique id #23
Comments
Thanks for this! I like this idea and think it is definitely worth exploring.
I'm thinking of tackling this in 3 pieces:
I'm realizing that 1 and 2 are actually already nearly possible, but the UX here could be better (or at least better documented).
I'm wondering if you have any thoughts about the UX here, as I'm thinking at least the first two pieces can fit into 0.9. I'm undecided on the modified-date piece, since it seems like it'll come with a whole new set of edge cases, and the mentioned use case of checking results into git already works as long as the output itself doesn't change.
Thanks @jamesturk
Yes, I think... Another option, though it may not always be available, is using the column name for the primary key.
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
When persisting to disk, currently each `scrape` run creates a new folder, and each item (row in a table) has a randomly generated unique ID. If I were able to specify which column in the table had a unique ID, then:
Describe alternatives you've considered
Alternatives include using a database, which would involve a lot more development overhead. This solution is, IMO, more lightweight: it basically uses the local file system as the database. It does depend on the data provider having some kind of unique ID, and on the scraper developer being able to identify and use that ID.
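A minimal sketch of the "file system as the database" idea, assuming items are plain JSON-serializable dicts. `persist_item` and its `id_field` parameter are hypothetical names for illustration, not part of spatula's API: each item is written to a fixed directory under a file named after its own unique ID, so a re-run overwrites the same files instead of creating a new folder with random IDs.

```python
import json
from pathlib import Path


def persist_item(item: dict, output_dir: Path, id_field: str) -> Path:
    """Write one scraped item to output_dir/<id>.json.

    Hypothetical helper (not spatula's API): the file name comes from the
    item's own unique ID column rather than a randomly generated ID, and
    output_dir is a fixed location reused across runs.
    """
    item_id = str(item[id_field])  # raises KeyError if the ID column is missing
    # Keep an ID such as "a/b" from being treated as a subdirectory.
    safe_id = item_id.replace("/", "_")
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{safe_id}.json"
    # sort_keys + a fixed indent give byte-identical files for identical data,
    # so re-running the scrape produces no spurious diffs when checked into git.
    path.write_text(json.dumps(item, indent=2, sort_keys=True) + "\n")
    return path
```

For example, persisting `{"id": 101, "name": "Alice"}` with `id_field="id"` always lands in `101.json`, so a second run updates that file in place. The sketch assumes the chosen column really is unique; two items sharing an ID would silently overwrite each other.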
Additional context
Additionally, it would be good if, on subsequent runs, spatula could read in the already persisted JSON, compare it to what is being scraped, and only persist if there are differences.
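The compare-before-write step could look roughly like this. It is a sketch assuming one JSON file per item (as in a per-ID layout); `persist_if_changed` is a hypothetical helper, not an existing spatula function. It parses the file already on disk and compares it with the freshly scraped item, writing only when they differ, which also leaves the file's modification time untouched for unchanged items.

```python
import json
from pathlib import Path


def persist_if_changed(item: dict, path: Path) -> bool:
    """Write item to path only if it differs from the JSON already there.

    Hypothetical helper (not spatula's API). Returns True if the file was
    (re)written, False if the existing content already matched.
    """
    if path.exists():
        try:
            existing = json.loads(path.read_text())
        except json.JSONDecodeError:
            existing = None  # corrupt or partial file: treat it as changed
        if existing == item:
            return False  # unchanged: don't rewrite, so the mtime is preserved
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(item, indent=2, sort_keys=True) + "\n")
    return True
```

Comparing parsed data rather than raw text means key order doesn't matter. One caveat of this design: values must survive a JSON round trip unchanged. A tuple, for instance, comes back as a list and would be reported as changed on every run.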