LG-7470 | Very crude start of background job [WIP] #7085
Conversation
```ruby
key = key(timestamp)
redis_pool.with do |client|
  # see client.hscan which refs https://redis.io/commands/scan/
  # but it's... a lil' bit weird.
```
Arguably, this change would be the big win, in allowing us to read and write in batches rather than fetching everything into memory. Will need to play around with this.
I think a good task for tomorrow is to write something quick to bulk-generate events. We're going to want something like that for generating a large sample file for the IRS as well.
My gut feeling is that hgetall won't be a great choice with a huge number of events, and they'll all be loaded in memory. I like the idea of being able to hscan and write them to the file as we fetch them.
It occurred to me tonight that we've talked about doing this "background processing" to fetch all the events, put them in a flat file, and then store that in Redis rather than S3. Is that actually saving us anything over what we have now? All the more reason to generate a ton of events and bang on this.
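To make the hscan idea concrete, here is a rough sketch of streaming events to a file batch-by-batch as the cursor advances, rather than loading the whole hash with hgetall. `FakeRedis` and `dump_events` are hypothetical names used only for illustration; the fake client just mimics HSCAN's cursor contract so the loop shape is visible without a live Redis.

```ruby
require 'tempfile'

# FakeRedis is a hypothetical in-memory stand-in, used here only to
# illustrate the HSCAN cursor loop; a real client's hscan has the same
# shape: pass the cursor back in until it returns "0".
class FakeRedis
  def initialize(hash)
    @hash = hash
  end

  # Returns [next_cursor, [[field, value], ...]], mirroring HSCAN.
  def hscan(_key, cursor, count: 2)
    entries = @hash.to_a
    batch = entries[cursor.to_i, count] || []
    next_cursor = cursor.to_i + count
    next_cursor = 0 if next_cursor >= entries.size
    [next_cursor.to_s, batch]
  end
end

# Stream events to a temp file one batch at a time instead of pulling
# the entire hash into memory with hgetall.
def dump_events(client, key)
  file = Tempfile.new('events')
  cursor = '0'
  loop do
    cursor, batch = client.hscan(key, cursor)
    batch.each { |_field, event| file.puts(event) }
    break if cursor == '0'
  end
  file.close
  file.path
end
```

With a real client the write-as-you-fetch behavior keeps memory flat no matter how many events are in the hour's hash, which is the win over hgetall.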
```ruby
# Get this to run at the early part of the hour

def perform(subject_timestamp)
  puts 'Howdy, partner'
```
Obviously this should not be merged; just an easy way to see when this runs.
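On the "run at the early part of the hour" note: assuming a sidekiq-cron-style YAML schedule (the scheduler, file path, and job class name here are all hypothetical; this repo's setup may differ), the standard cron syntax covers it:

```yaml
# config/schedule.yml (hypothetical path)
# "5 * * * *" fires at five minutes past every hour, leaving the
# previous hour's events settled before the job reads them.
event_file_job:
  cron: "5 * * * *"
  class: "EventFileJob"  # hypothetical class name
```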
```ruby
    file.write event
  end
  file.close
  file.path
```
Commenting in case someone else ends up picking this up.
Right now, this reads all the events for a given hour out of Redis, writes them to a temp file, and then returns the file path (but nothing is looking at the return value, except me in the console). This is not very useful.
We want to figure out where to store this. We've discussed either S3 or Redis. With multiple servers behind a load balancer, and instances periodically recycled, we can't rely on saving the file locally.
We also want to apply encryption and gzip to this file. See the fetch_events rake task, at least for the encryption bit.
The other bit of work on this is, once all that's working, change the endpoint to return that file, wherever it's stored, rather than generating it on the fly.
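For the gzip half of that, a minimal stdlib sketch (the encryption step and the eventual S3/Redis upload are left out, and `gzip_events` is our name, not anything in the codebase):

```ruby
require 'tempfile'
require 'zlib'

# Write events into a gzipped temp file. The encryption wrapper (see the
# fetch_events rake task) would apply on top of this, and the resulting
# file would be pushed to S3 or Redis rather than kept on local disk,
# since instances behind the load balancer get recycled.
def gzip_events(events)
  file = Tempfile.new(['events', '.gz'])
  Zlib::GzipWriter.open(file.path) do |gz|
    events.each { |event| gz.puts(event) }
  end
  file.path
end
```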
Can/should we close this in favor of #7259?
I think so. Matt is aware that I took this on.
🎫 Ticket
LG-7470 TK
🛠 Summary of changes
This is currently so barebones as to be useless, but I wanted to get something up.
📜 Testing Plan
Provide a checklist of steps to confirm the changes.
👀 Screenshots
If relevant, include a screenshot or screen capture of the changes.
Before:
After:
🚀 Notes for Deployment
Include any special instructions for deployment.