Skip to content

Use Postgresql to push/pop log files to be processed.#1209

Merged
arthurnn merged 12 commits into
masterfrom
arthurnn/postgres_fastly_log_process
Mar 8, 2016
Merged

Use Postgresql to push/pop log files to be processed.#1209
arthurnn merged 12 commits into
masterfrom
arthurnn/postgres_fastly_log_process

Conversation

@arthurnn
Copy link
Copy Markdown
Member

Problem

SQS delivers the messages as at-least-one, so we need to make sure our
job is idempotent. Previous implementation was rely on Redis for that, however
there was a TTL on it, which could cause problems if a message is delivered again
after the TTL.
Besides we would like to remove Redis from our stack, for a few other
reasons.
[related https://github.com//issues/1208]

Solution

Use Postgresql to guarantee idempontency in our LogProcessor job.
First, we will only queue a LoProcessor job once, to process a specific ticket

How

Every time we get a SQS message, push an entry to a table(log_tickets) as pending and also enqueue a job. That table has a unique constraint on bucket(directory), key, so the database will guarantee the idempontency of this job.
The job will unqueue that ticket and update its status to processing in a atomic operation, so even if we had two jobs running at the same time for the bucket/key, that entry would not be processed twice.
After the logger is processed, we update the counters and change the job status to processed, ideally, once we move the counters outside Redis, we will be able to do that in a atomic transaction, so they ticket will only be updated if the counters were updated too.

Other features of this implementation

  • I extracted the logic to lookup the log file, so we can process files locally for tests as well as files from s3.
  • The LogProcessor job doesn't need to be enqueued with a key/bucket, we can just process the tickets in a FIFO manner.
  • We can change a status of a ticket at any time to re process it.

What is in here:

  • Create new schema to store keys and ensure idempontency.
  • Allow multiple backends to process the logs, for easy test locally. allow S3 and local.
  • Write tests
  • Handle failure cases and test them

review @ktheory @dwradcliffe @evanphx @qrush @tarcieri

SQS delivers the messages as at-least-one, so we need to make sure our
job is idempotent. Previous implentation was rely on Redis for that, however
there was a TTL on it, which could cause problems if a message is delivered again
after the TTL.

Besides we would like to remove Redis from our stack, for a few other
resons.
Comment thread app/models/log_ticket.rb Outdated
class LogTicket < ActiveRecord::Base
enum backend: [ :s3, :local ]

scope :latest_pending, -> { limit(1).lock(true).select("id").where(status: "pending") }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't order missing here?

@arthurnn
Copy link
Copy Markdown
Member Author

@simi I did the two changes you proposed.

@arthurnn
Copy link
Copy Markdown
Member Author

This is done, and tested locally with the local FS backend.

@simi
Copy link
Copy Markdown
Contributor

simi commented Feb 28, 2016

Ahh, good work!

I just spotted Dir.mktmpdir calls are not cleaned. But this is not introduced in this change. I can send PR to fix that later.

@arthurnn
Copy link
Copy Markdown
Member Author

@simi sounds good. lets wait for this to be merged, and we can do a PR for that clear afterwards.. thanks ❤️

@tarcieri
Copy link
Copy Markdown

👍 to replacing Redis with Postgres for this use case

@indirect
Copy link
Copy Markdown

indirect commented Mar 1, 2016

I'm 👍 on this PR, with one code review comment.

Comment thread db/schema.rb

create_table "log_tickets", force: :cascade do |t|
t.string "key"
t.string "directory"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, calling S3 buckets directories is confusing b/c key could contain slashes.

If you get rid of the pluggable backends and use S3 stubbing instead, I recommend renaming this to bucket to more clearly map to the S3 domain model.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to make this specific to S3. Thats why I called directory.
Indeed if we removed the backed we could make it specific.

However as I explained above, we already have the s3/local abstraction in RubygemFS, so I made that work, and it allow us to test local development environment too.. End-to-end, so I can just drop a file in a folder, and that will be processed locally.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ktheory here, but I can see the value for testing.

arthurnn pushed a commit that referenced this pull request Mar 8, 2016
…rocess

Use Postgresql to push/pop log files to be processed.
@arthurnn arthurnn merged commit 01fb6ce into master Mar 8, 2016
@arthurnn arthurnn deleted the arthurnn/postgres_fastly_log_process branch March 8, 2016 18:35
@arthurnn arthurnn temporarily deployed to staging March 8, 2016 18:41 Inactive
@dwradcliffe dwradcliffe temporarily deployed to production March 8, 2016 20:10 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants