Use Postgresql to push/pop log files to be processed. #1209
SQS delivers messages at least once, so we need to make sure our job is idempotent. The previous implementation relied on Redis for that, but there was a TTL on the Redis entries, which could cause problems if a message is delivered again after the TTL expires. Besides, we would like to remove Redis from our stack for a few other reasons.
class LogTicket < ActiveRecord::Base
  enum backend: [ :s3, :local ]

  scope :latest_pending, -> { limit(1).lock(true).select("id").where(status: "pending") }
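A note on the scope above: in ActiveRecord, `lock(true)` adds `FOR UPDATE` to the query, so the selected row stays locked until the surrounding transaction ends. The claim step this enables can be sketched in plain Ruby, assuming a `Mutex` as a stand-in for the row lock. `PendingQueue` and `pop` are hypothetical names for illustration, not code from this PR.

```ruby
# In-memory stand-in for "select one pending row FOR UPDATE, then mark it
# processing". Hypothetical illustration only, not the PR's implementation.
class PendingQueue
  def initialize(keys)
    @status = Hash[keys.map { |k| [k, "pending"] }]
    @lock = Mutex.new # plays the role of the database row lock
  end

  # Claim one pending ticket and flip it to "processing" inside a single
  # critical section, so two callers can never claim the same ticket.
  def pop
    @lock.synchronize do
      key, _status = @status.find { |_k, s| s == "pending" }
      @status[key] = "processing" if key
      key
    end
  end
end

queue = PendingQueue.new(["a.log"])

# Two jobs race for the same ticket; only one of them gets it.
claims = Array.new(2) { Thread.new { queue.pop } }.map(&:value)
puts claims.compact.inspect
```

The second caller gets `nil` and can simply exit, which is what makes a redelivered message harmless at this step.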
Also add more tests

@simi I did the two changes you proposed.

This is done, and tested locally with the local FS backend.

Ahh, good work! I just spotted

@simi sounds good. Let's wait for this to be merged, and we can do a PR for that clear afterwards. Thanks ❤️

👍 to replacing Redis with Postgres for this use case

I'm 👍 on this PR, with one code review comment.

create_table "log_tickets", force: :cascade do |t|
  t.string "key"
  t.string "directory"
IMHO, calling S3 buckets directories is confusing b/c key could contain slashes.
If you get rid of the pluggable backends and use S3 stubbing instead, I recommend renaming this to bucket to more clearly map to the S3 domain model.
I didn't want to make this specific to S3. That's why I called it directory.
Indeed, if we removed the backend we could make it specific.
However, as I explained above, we already have the s3/local abstraction in RubygemFS, so I made that work, and it allows us to test the local development environment too, end-to-end: I can just drop a file in a folder, and it will be processed locally.
I agree with @ktheory here, but I can see the value for testing.
…rocess Use Postgresql to push/pop log files to be processed.
Problem
SQS delivers messages at least once, so we need to make sure our
job is idempotent. The previous implementation relied on Redis for that, but
there was a TTL on the Redis entries, which could cause problems if a message is delivered again
after the TTL expires.
Besides, we would like to remove Redis from our stack for a few other
reasons.
[related https://github.com//issues/1208]
Solution
Use Postgresql to guarantee idempotency in our LogProcessor job.
First, we will only queue a LogProcessor job once to process a specific ticket.
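The enqueue-once idea can be sketched in plain Ruby. This is a minimal in-memory illustration, assuming a hash keyed on (directory, key) as a stand-in for the database unique constraint. `TicketTable`, `handle_message`, and the bucket and key values are hypothetical and not taken from this PR.

```ruby
# In-memory stand-in for the log_tickets table (hypothetical, not the PR's code).
class TicketTable
  Ticket = Struct.new(:directory, :key, :status)

  def initialize
    @rows = {}         # [directory, key] => Ticket, mimics the unique index
    @mutex = Mutex.new # serializes writers, as the database would
  end

  # Returns the new ticket, or nil when the (directory, key) pair already exists.
  def push(directory, key)
    @mutex.synchronize do
      id = [directory, key]
      return nil if @rows.key?(id)
      @rows[id] = Ticket.new(directory, key, "pending")
    end
  end

  def size
    @mutex.synchronize { @rows.size }
  end
end

# Hypothetical SQS handler: enqueue a job only when the ticket is new.
def handle_message(table, jobs, directory, key)
  ticket = table.push(directory, key)
  jobs << ticket if ticket
  !ticket.nil?
end

table = TicketTable.new
jobs = []
first  = handle_message(table, jobs, "example-bucket", "logs/example.log")
second = handle_message(table, jobs, "example-bucket", "logs/example.log") # redelivery
puts "first=#{first} second=#{second} jobs=#{jobs.size}"
```

In the real table the rejection would come from the unique index rather than from a hash lookup, but the effect on the handler is the same: a redelivered message never produces a second job.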
How
Every time we get an SQS message, push an entry to a table (log_tickets) as pending and also enqueue a job. That table has a unique constraint on bucket (directory) and key, so the database will guarantee the idempotency of this job.
The job will dequeue that ticket and update its status to processing in an atomic operation, so even if we had two jobs running at the same time for the same bucket/key, that entry would not be processed twice.
After the log is processed, we update the counters and change the ticket status to processed. Ideally, once we move the counters out of Redis, we will be able to do that in an atomic transaction, so the ticket will only be updated if the counters were updated too.
Other features of this implementation
What is in here:
review @ktheory @dwradcliffe @evanphx @qrush @tarcieri
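For the last step described under How (marking the ticket as processed only if the counters were updated too), here is a rough sketch of the intended all-or-nothing behaviour. It assumes an in-memory store whose `transaction` method restores a snapshot on failure, as a stand-in for a database transaction. `Store`, `finish`, and the gem name are hypothetical and not part of this PR.

```ruby
# Hypothetical in-memory store; `transaction` restores state if the block raises.
class Store
  attr_reader :counters, :tickets

  def initialize
    @counters = Hash.new(0)
    @tickets = { "a.log" => "processing" }
  end

  def transaction
    snapshot = [@counters.dup, @tickets.dup]
    yield
  rescue StandardError
    @counters, @tickets = snapshot # roll back both pieces of state together
    raise
  end
end

# Mark the ticket processed and bump the counters as one unit of work.
def finish(store, key, downloads, fail_midway: false)
  store.transaction do
    store.tickets[key] = "processed"
    raise "counter update failed" if fail_midway # simulated failure
    downloads.each { |gem_name, n| store.counters[gem_name] += n }
  end
  true
rescue StandardError
  false
end

store = Store.new
failed = finish(store, "a.log", { "rails" => 2 }, fail_midway: true)
status_after_failure = store.tickets["a.log"]
succeeded = finish(store, "a.log", { "rails" => 2 })
puts "#{failed} #{status_after_failure} #{succeeded} #{store.tickets["a.log"]}"
```

While the counters still live in Redis, the two updates cannot share one database transaction, which is why the description calls this the ideal end state rather than the current behaviour.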