31 changes: 31 additions & 0 deletions app/jobs/irs_attempt_events_batch_job.rb
@@ -0,0 +1,31 @@
class IrsAttemptEventsBatchJob < ApplicationJob

  require 'tempfile'

  # copypasta
  include JobHelpers::StaleJobHelper
  queue_as :default
  discard_on JobHelpers::StaleJobHelper::StaleJobError

  # Get this to run at the early part of the hour

  def perform(subject_timestamp)
    puts 'Howdy, partner'
Contributor Author

Obviously this should not be merged; just an easy way to see when this runs.


    # Probably want something more durable
    # mktmpdir won't auto-delete if it's not called with a block:
    # https://ruby-doc.org/stdlib-2.5.1/libdoc/tmpdir/rdoc/Dir.html
    dir = Dir.mktmpdir(subject_timestamp.to_s)
    file = File.new("#{dir}/#{Time.now.to_fs(:number)}", 'w')

    events = IrsAttemptsApi::RedisClient.new.read_events(timestamp: subject_timestamp)
    events.each do |event|
      file.write event
    end
    file.close
    file.path
Contributor Author

Commenting in case someone else ends up picking this up.

Right now, this reads all the events for a given hour out of Redis, writes them to a temp file, and then returns the file path (but nothing is looking at the return value, except me in the console). This is not very useful.

We want to figure out where to store this. We've discussed either S3 or Redis. With multiple servers behind a load balancer, and instances periodically recycled, we can't rely on saving the file locally.

We also want to encrypt and gzip this file. See the fetch_events rake task, at least for the encryption part.

The other remaining piece of work, once all that's working, is to change the endpoint to return that file from wherever it's stored, rather than generating it on the fly.
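
As a rough illustration of the gzip step only, here is a minimal sketch. The helper name `write_events_gzipped`, the directory prefix, and the sample event strings are all hypothetical. Encryption is left out because it lives in the fetch_events rake task, and the storage step is only a comment since S3 vs. Redis is still undecided.

```ruby
require 'tmpdir'
require 'zlib'

# Hypothetical helper (not part of this PR): writes each event on its own
# line into a gzip file. `events` is any enumerable of strings.
def write_events_gzipped(events, path)
  Zlib::GzipWriter.open(path) do |gz|
    events.each { |event| gz.puts(event) }
  end
  path
end

# The block form of Dir.mktmpdir removes the directory when the block exits,
# so the file has to be copied somewhere durable before then.
Dir.mktmpdir('irs-attempt-events') do |dir|
  path = write_events_gzipped(%w[event-1 event-2], File.join(dir, 'events.gz'))
  # Uploading to durable storage (S3 or Redis, still undecided) would go here.
  puts Zlib::GzipReader.open(path, &:read)
end
```

Using the block form trades durability for automatic cleanup, which only makes sense once the upload step exists.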

  end

  private

end
2 changes: 2 additions & 0 deletions app/services/irs_attempts_api/redis_client.rb
@@ -20,6 +20,8 @@ def write_event(event_key:, jwe:, timestamp:)
    def read_events(timestamp:)
      key = key(timestamp)
      redis_pool.with do |client|
        # see client.hscan which refs https://redis.io/commands/scan/
        # but it's... a lil' bit weird.
Contributor Author

Arguably, switching to hscan would be the big win, since it would let us read and write in batches rather than fetching everything into memory. I will need to experiment with this.
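
A minimal sketch of what batched reading might look like. The helper name `stream_hash_values` is hypothetical, and the code assumes `client` responds to `hscan(key, cursor, count:)` and returns `[next_cursor, [[field, value], ...]]`, which is the shape redis-rb uses.

```ruby
# Hypothetical helper (not part of this PR): streams the values of a Redis
# hash to an IO one batch at a time instead of loading them all with HGETALL.
# Assumes client.hscan(key, cursor, count:) returns
# [next_cursor, [[field, value], ...]].
def stream_hash_values(client, key, io, count: 1000)
  cursor = '0'
  written = 0
  loop do
    cursor, pairs = client.hscan(key, cursor, count: count)
    pairs.each do |_field, value|
      io.puts(value)
      written += 1
    end
    # Redis signals the end of the iteration by returning cursor "0".
    break if cursor.to_s == '0'
  end
  written
end
```

One caveat: the SCAN family may return the same field more than once during an iteration, so the consumer may need to de-duplicate on the event key.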

Contributor Author

I think a good task for tomorrow is to write something quick to bulk-generate events. We're going to want something like that for generating a large sample file for the IRS as well.

My gut feeling is that hgetall won't be a great choice with a huge number of events, because they'll all be loaded into memory at once. I like the idea of being able to hscan and write them to the file as we fetch them.

It is occurring to me tonight that we've talked about doing this "background processing" to fetch all the events, put them in a flat file, and then store that in Redis rather than S3. Is that actually saving us anything over what we have now? All the more reason to generate a ton of events and bang on this.
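
For the bulk-generation idea, something along these lines could be a starting point. The helper name `bulk_generate_events` and its `payload_bytes` parameter are hypothetical. The only thing taken from the codebase is the `write_event(event_key:, jwe:, timestamp:)` signature, and the payload is random filler rather than a real JWE.

```ruby
require 'securerandom'

# Hypothetical helper for load testing (not part of this PR): writes `count`
# fake events through any object that responds to
# write_event(event_key:, jwe:, timestamp:). The payload is random filler,
# not a real JWE.
def bulk_generate_events(writer, count:, timestamp:, payload_bytes: 512)
  count.times do
    writer.write_event(
      event_key: SecureRandom.uuid,
      jwe: SecureRandom.alphanumeric(payload_bytes),
      timestamp: timestamp,
    )
  end
  count
end
```

In a console this might be called as `bulk_generate_events(IrsAttemptsApi::RedisClient.new, count: 100_000, timestamp: Time.zone.now)`, assuming a Redis instance that is safe to fill with junk.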

        client.hgetall(key)
      end
    end
6 changes: 6 additions & 0 deletions config/initializers/job_configurations.rb
@@ -217,6 +217,12 @@
      class: 'ThreatMetrixJsVerificationJob',
      cron: cron_1h,
    },
    # Batch up IRS Attempts API events
    irs_attempt_events_aggregator: {
      class: 'IrsAttemptEventsBatchJob',
      cron: cron_1h,
      args: -> { [Time.zone.now - 1.hour] },
    },
  }
end
# rubocop:enable Metrics/BlockLength