How to do paging with scan-parallel? #17
Hi Ulrik, sorry for the delay responding to this. Just to clarify: have you used :limit successfully with I haven't tested it (don't have any db creds with me atm) - but I can't think of a reason why :limit shouldn't work with
(defn scan-parallel
  "Like `scan` but starts a number of worker threads and automatically handles
  parallel scan options (:total-segments and :segment). Returns a vector of
  `scan` results.
  Ref. http://goo.gl/KLwnn (official parallel scan documentation)."
  [creds table total-segments & [opts]]
  (let [opts (assoc opts :total-segments total-segments)]
    (->> (mapv (fn [seg] (future (scan creds table (assoc opts :segment seg))))
               (range total-segments))
         (mapv deref))))
As for using So to implement paging you'd want to do something like this [untested, don't have a db with me]:
(scan creds :my-table {:limit 2 :attr-conds {:age [:in [24 27]]}})
=> [{:age 24, :name "Steve"} {:age 27, :name "Susan"}]
(scan creds :my-table {:last-prim-kvs {:age 24 :name "Susan"} :attr-conds {:age [:in [24 27]]}})
Does that help? |
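The single-scan paging idea described in this comment (pass a limit, then feed the last key back in as the start of the next call) can be sketched as a lazy sequence of pages. This is only a rough illustration under stated assumptions: `fake-scan-page` is a hypothetical in-memory stand-in and not Faraday's API, and the `{:items ... :last-key ...}` return shape is invented for the example. With Faraday, `fetch-page` would presumably wrap `scan` with `:limit` and `:last-prim-kvs`; how to read the resume key off a real result is not covered in this thread.

```clojure
;; Hypothetical in-memory stand-in for one page of a table scan (NOT Faraday's API).
;; Returns up to `limit` items after `start-key`, plus the key to resume from.
(defn fake-scan-page
  [items {:keys [limit start-key]}]
  (let [remaining (if start-key
                    (drop-while #(<= (:id %) (:id start-key)) items)
                    items)
        page      (vec (take limit remaining))
        more?     (> (count remaining) (count page))]
    {:items    page
     :last-key (when more? (select-keys (peek page) [:id]))}))

(defn paged-seq
  "Lazy seq of pages. `fetch-page` (an assumed contract) takes a start key,
  nil for the first page, and returns {:items [...] :last-key key-or-nil}."
  [fetch-page]
  (letfn [(step [start-key]
            (lazy-seq
              (let [{:keys [items last-key]} (fetch-page start-key)]
                ;; Keep going only while the previous page reported a resume key.
                (cons items (when last-key (step last-key))))))]
    (step nil)))

;; Usage against the stand-in: 7 items, 3 per page.
(def table (mapv (fn [i] {:id i}) (range 1 8)))
(def pages (paged-seq #(fake-scan-page table {:limit 3 :start-key %})))
```

Because the sequence is lazy, each page is fetched only when it is consumed, which fits the "don't read everything into memory at once" use case, provided the caller does not hold on to the head of the sequence.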
I am using I have around a million entries that I want to process, and I don't want to read them all into memory at once. I'm currently using |
Sorry I don't have any test dbs on hand atm - it'd help if you could be a little more specific. Are you seeing an error when you replace
It should be possible. Unless I'm misunderstanding something about what you're trying to do - it should literally be as simple as replacing |
I didn't want to provide lots of details if I was completely misunderstanding the functionality of
|
I am using
(scan creds :my-table {:limit 1})
This will give me a vector containing the first page of entries (the actual number depends, in my case 5):
[{:id 1, :x "a"} {:id 2, :x "b"} {:id 3, :x "c"} {:id 4, :x "d"} {:id 5, :x "e"}]
In the next call, I set
(scan creds :my-table {:last-prim-kvs {:id 5} :limit 1})
This will give me the next page of entries:
[{:id 6, :x "f"} {:id 7, :x "g"} {:id 8, :x "h"} {:id 9, :x ""} {:id 10, :x "i"}]
I can't understand how to do it with
(scan-parallel creds :my-table 2 {:limit 1})
This will give me a vector of size 2, where each element is a vector containing some page of entries, not necessarily page one and two:
[
 [{:id 1, :x "a"} {:id 2, :x "b"} {:id 3, :x "c"} {:id 4, :x "d"} {:id 5, :x "e"}]
 [{:id 16, :x "s"} {:id 17, :x "r"} {:id 18, :x "k"} {:id 19, :x "q"} {:id 20, :x "p"}]
]
What about the subsequent calls for the remaining pages? How do I specify
user=> (scan-parallel creds :my-table 2 {:last-prim-kvs {:id "5"} :limit 1})
AmazonServiceException The provided starting key is invalid: Invalid ExclusiveStartKey.
Please use ExclusiveStartKey with correct Segment. TotalSegments: 2 Segment: 1
com.amazonaws.http.AmazonHttpClient.handleErrorResponse (AmazonHttpClient.java:679)
You're saying that |
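The error message above suggests that a start key is only valid for the segment that produced it, so one shared `:last-prim-kvs` cannot serve every segment. A minimal sketch of the bookkeeping this implies is to keep one cursor per segment and fetch one page per unfinished segment in each round. Everything here is an assumption for illustration: `fake-segment-page` is a hypothetical in-memory stand-in (not Faraday or DynamoDB), and it assigns items to segments by `mod`, whereas DynamoDB decides segment membership itself.

```clojure
;; Hypothetical stand-in for one page of one segment of a parallel scan.
;; A segment here is "ids where (mod id total-segments) = segment".
(defn fake-segment-page
  [items {:keys [segment total-segments limit start-key]}]
  (let [mine      (filter #(= segment (mod (:id %) total-segments)) items)
        remaining (if start-key
                    (drop-while #(<= (:id %) (:id start-key)) mine)
                    mine)
        page      (vec (take limit remaining))]
    {:items    page
     :last-key (when (> (count remaining) (count page))
                 (select-keys (peek page) [:id]))}))

(defn next-round
  "Fetch one page per unfinished segment in parallel.
  `cursors` maps segment -> start key (nil means start from the beginning);
  finished segments are absent. Returns {:items [...] :cursors {...}}."
  [fetch-page cursors]
  (let [results (->> cursors
                     (mapv (fn [[seg k]] [seg (future (fetch-page seg k))]))
                     (mapv (fn [[seg f]] [seg @f])))]
    {:items   (into [] (mapcat (comp :items second)) results)
     ;; Only segments that reported a resume key stay in the cursor map.
     :cursors (into {} (for [[seg {:keys [last-key]}] results
                             :when last-key]
                         [seg last-key]))}))

;; Usage against the stand-in: 10 items, 2 segments, 2 items per page.
(def table (mapv (fn [i] {:id i}) (range 10)))
(defn fetch [seg k]
  (fake-segment-page table {:segment seg :total-segments 2 :limit 2 :start-key k}))

(def all-items
  (loop [cursors {0 nil, 1 nil}, acc []]
    (if (empty? cursors)
      acc
      (let [{:keys [items cursors]} (next-round fetch cursors)]
        (recur cursors (into acc items))))))
```

The caller can stop after any round and persist the `:cursors` map to resume later, which is roughly what "paging" means for a parallel scan in this sketch.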
Hi, closing this - assuming it's gone stale? |
I couldn't get it to work, but I still don't know if I did something wrong or if there is something missing in faraday. |
Yeah, sorry - I'm actually not using DynamoDB myself at the moment. Not sure off hand, and don't have any test dbs handy to look into this quickly. Would need to spend some proper time to dig into the DDB docs + API to confirm: may be a DDB limitation, or a Faraday limitation that needs fixing. Will reopen in case I do find some time in future, or someone else has some input. Really sorry to leave you hanging on this, wasn't intentional. |
Quick Google yielded this: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Scan.html
So it seems like parallel scans should be pageable, but Faraday's scan implementation would need some work to allow this to be automatic. Have made a TODO note in the code though realistically don't think I'll personally have time to look into this near-term. You may be able to use PRs super welcome if you (or anyone else) feels like taking a stab at this! Cheers :-) |
Is this something worth fixing for a Faraday noob? Are you open to reviewing a PR on this? Any thoughts about a reasonable solution direction? |
I think it would be nice to get this fixed @barkanido given we have the beginning of an implementation, so I expect PRs would be welcomed by the community. Having said that however, you could probably just manage the threads and paging yourself and just call I have an implementation of a |
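One way to read "manage the threads and paging yourself" is one worker per segment, each paging through its own segment and handing every page to a callback so that nothing accumulates in memory. The sketch below is not the implementation mentioned in this comment (which is not shown in the thread); `drain-segments!`, `fetch-page` and `handle-page!` are hypothetical names, and the in-memory `fetch-page` stands in for a real per-segment scan call.

```clojure
(defn drain-segments!
  "Start one worker per segment. Each worker pages through its own segment,
  passes every page to `handle-page!`, and returns its item count.
  `fetch-page` is an assumed (fn [segment start-key]) returning
  {:items [...] :last-key key-or-nil}."
  [fetch-page total-segments handle-page!]
  (->> (range total-segments)
       (mapv (fn [seg]
               (future
                 (loop [start-key nil, n 0]
                   (let [{:keys [items last-key]} (fetch-page seg start-key)
                         n (+ n (count items))]
                     (handle-page! seg items)
                     (if last-key (recur last-key n) n))))))
       ;; Block until every segment worker has finished.
       (mapv deref)))

;; Hypothetical in-memory stand-in: segment = (mod id 2), pages of 2 items.
(def table (mapv (fn [i] {:id i}) (range 10)))
(defn fetch-page [seg start-key]
  (let [mine      (filter #(= seg (mod (:id %) 2)) table)
        remaining (if start-key
                    (drop-while #(<= (:id %) (:id start-key)) mine)
                    mine)
        page      (vec (take 2 remaining))]
    {:items    page
     :last-key (when (> (count remaining) 2)
                 (select-keys (peek page) [:id]))}))

;; Usage: collect items in an atom only to show that every page was handled.
(def seen (atom []))
(def counts
  (drain-segments! fetch-page 2 (fn [_seg items] (swap! seen into items))))
```

In real use `handle-page!` would process or write out each page rather than collect it, and a long-running program using `future` should call `shutdown-agents` before exit.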
@kipz fair enough. Maybe your example deserves a place here as an example people can refer to. Or even in the README. Just a thought. Anyway I was just looking for a way to contribute and found this issue. Maybe a task from the TODO is of higher priority? |
@joelittlejohn what are your thoughts on all this? |
@barkanido Re your question about whether this ticket is a good one for a Faraday noob to tackle, it's probably not 🙂 The existing paging implementation is one of the most complex parts of Faraday and as @kipz mentions people have often found that they prefer to avoid the paging feature altogether and implement their own solution (over which they have more control) outside Faraday. Is this a feature you need or were you just interested in making a useful contribution? I think the most useful thing to be done for Faraday is better documentation. Better docstrings and/or I think it would be very useful to have a list of examples that show real-world usage covering all typical ways to use Faraday's functions. |
For a Faraday noob that wants to contribute something useful, I recommend using the library for a while on a few projects and over time you will inevitably uncover something you'd like but is missing. |
I'm trying to do paging with scan-parallel using :limit, but I'm not sure how to specify :last-prim-kvs in subsequent calls. Each segment needs its own last key, I presume. Am I missing something, or is paging not implemented for parallel scan?