Use Postgresql to push/pop log files to be processed. #1209
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| class LogTicket < ActiveRecord::Base | ||
| enum backend: [:s3, :local] | ||
|
Member
Can we make this dependent on the environment, rather than on the model/record? We shouldn't ever have both in any one environment. In development it would be pretty easy to flip back and forth if needed.
Member
Author
Not sure about that; especially on staging, we could test both. Also, if we ever need to fix something manually, we could easily use the local FS to do that. |
||
|
|
||
| scope :pending, -> { limit(1).lock(true).select("id").where(status: "pending").order("id ASC") } | ||
|
|
||
| def self.pop(key: nil, directory: nil) | ||
| scope = pending | ||
| scope = scope.where(key: key) if key | ||
| scope = scope.where(directory: directory) if directory | ||
| sql = scope.to_sql | ||
|
|
||
| find_by_sql(["UPDATE #{quoted_table_name} SET status = ? WHERE id IN (#{sql}) RETURNING *", 'processing']).first | ||
| end | ||
|
|
||
| def filesystem | ||
| @fs ||= | ||
| if s3? | ||
|
Contributor
IMHO it would be better to hide this branching and the passing of credentials, for example: `@filesystem ||= RubygemFS.new(directory: directory, s3: s3?)`
Member
Author
Yeah, I will play with that.
Contributor
OK, I also had another idea: `@filesystem ||= RubygemFS.new(directory: directory, backend: backend)`. Just to move the responsibility for selecting the backend to one place.
I agree with moving this into
Member
@arthurnn FYI, you could stub out the S3 responses to use a local log file in tests. That's what I did in the FastlyLogProcessor unit tests. Here's a good blog post all about stubbing AWS responses. That seems simpler than supporting pluggable
Member
Author
The good thing about supporting local files is that I can test in the development environment too, not just in test. That was a simple win, as we already had the API |
||
| RubygemFs::S3.new(bucket: directory) | ||
| else | ||
| RubygemFs::Local.new(directory) | ||
| end | ||
| end | ||
| end | ||
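The review suggestion to keep backend selection in one place could look roughly like the sketch below. This is not the PR's code: `FsFactory`, `FakeS3`, and `FakeLocal` are hypothetical stand-ins for `RubygemFs` and its `S3`/`Local` classes, and only the constructor shapes (a bucket for S3, a directory for local) are taken from the diff above.

```ruby
# Hypothetical stand-ins for RubygemFs::S3 and RubygemFs::Local.
FakeS3    = Struct.new(:bucket)
FakeLocal = Struct.new(:directory)

# Hypothetical factory: the only place that knows how a backend name
# maps to a filesystem class.
module FsFactory
  BUILDERS = {
    "s3"    => ->(directory) { FakeS3.new(directory) },
    "local" => ->(directory) { FakeLocal.new(directory) }
  }.freeze

  # Accepts the backend as a symbol or a string and fails loudly on
  # anything it does not know about.
  def self.build(directory:, backend:)
    builder = BUILDERS.fetch(backend.to_s) do
      raise ArgumentError, "unknown backend: #{backend.inspect}"
    end
    builder.call(directory)
  end
end

fs = FsFactory.build(directory: "logs", backend: :local)
puts fs.class # prints FakeLocal
```

With something like this in place, the model's `filesystem` method would no longer need its own `if s3?` branch; it would only pass `directory` and `backend` along.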
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| class CreateLogTickets < ActiveRecord::Migration | ||
| def change | ||
| create_table :log_tickets do |t| | ||
| t.string :key | ||
| t.string :directory | ||
| t.integer :backend, default: 0 | ||
| t.string :status | ||
|
|
||
| t.timestamps null: false | ||
| end | ||
|
|
||
| add_index :log_tickets, [:directory, :key], unique: true | ||
| end | ||
| end |
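As a rough mental model of what the table and `LogTicket.pop` provide together, here is an in-memory sketch of the queue semantics. It is an analogy only, not the Postgres implementation: `TicketQueue` and its `push` method are hypothetical (the diff shown here contains no push method), and a `Mutex` stands in for the row lock taken by the `UPDATE ... RETURNING` statement.

```ruby
# In-memory analogy only: a Mutex stands in for the row lock that
# Postgres takes inside the UPDATE ... RETURNING statement.
class TicketQueue
  Ticket = Struct.new(:id, :directory, :key, :status)

  def initialize
    @tickets = []
    @next_id = 1
    @lock = Mutex.new
  end

  # Mirrors the unique index on [directory, key]: a duplicate is rejected.
  def push(directory:, key:)
    @lock.synchronize do
      if @tickets.any? { |t| t.directory == directory && t.key == key }
        raise ArgumentError, "duplicate ticket for #{directory}/#{key}"
      end
      ticket = Ticket.new(@next_id, directory, key, "pending")
      @next_id += 1
      @tickets << ticket
      ticket
    end
  end

  # Claims the lowest-id pending ticket, optionally filtered, and marks
  # it "processing" in the same critical section. Returns nil when
  # nothing is pending.
  def pop(directory: nil, key: nil)
    @lock.synchronize do
      ticket = @tickets
        .select { |t| t.status == "pending" }
        .select { |t| directory.nil? || t.directory == directory }
        .select { |t| key.nil? || t.key == key }
        .min_by(&:id)
      ticket.status = "processing" if ticket
      ticket
    end
  end
end

queue = TicketQueue.new
queue.push(directory: "logs", key: "a.log")
queue.push(directory: "logs", key: "b.log")
first = queue.pop
puts "#{first.key} #{first.status}" # prints "a.log processing"
```

The point of selecting and updating in a single step is that two workers can never claim the same ticket; in the PR that guarantee comes from the database rather than from an in-process lock.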
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,7 +11,7 @@ | |
| # | ||
| # It's strongly recommended that you check this file into your version control system. | ||
|
|
||
| ActiveRecord::Schema.define(version: 20150709170542) do | ||
| ActiveRecord::Schema.define(version: 20160227194735) do | ||
|
|
||
| # These are extensions that must be enabled in order to support this database | ||
| enable_extension "plpgsql" | ||
|
|
@@ -76,6 +76,17 @@ | |
|
|
||
| add_index "linksets", ["rubygem_id"], name: "index_linksets_on_rubygem_id", using: :btree | ||
|
|
||
| create_table "log_tickets", force: :cascade do |t| | ||
| t.string "key" | ||
| t.string "directory" | ||
|
Member
IMHO, calling S3 buckets If you get rid of the pluggable backends and use S3 stubbing instead, I recommend renaming this to
Member
Author
I didn't want to make this specific to S3. That's why I called it directory. However, as I explained above, we already have the s3/local abstraction in RubygemFS, so I made that work, and it allows us to test in the local development environment too, end-to-end: I can just drop a file in a folder, and it will be processed locally.
Member
I agree with @ktheory here, but I can see the value for testing. |
||
| t.integer "backend", default: 0 | ||
|
Member
@arthurnn: IMHO, having a facade wrapper around 3rd-party APIs is great, so that's a big 👍 for this PR and OTOH, adding the So my pie-in-the-sky ideal would be a way to support S3 stubbing/local files in dev & test envs, without needing the database.
Member
Author
One solution I could do is, instead of having a Also, the extra tests are like 3 tests, 12 lines of code, as the facade itself (RubygemsFS) is already tested as a unit. I think having a backend per log will give us more flexibility, and it actually hasn't cost much in terms of code, so that's why I am pushing this approach. Also, this allows us to process files from other backends in the future; let's say we start pushing logs to Kafka and want to process them. I know this sounds a bit like crazy talk now, but for the current use case, having two backends sounded like a good abstraction to me. |
||
| t.string "status" | ||
| t.datetime "created_at", null: false | ||
| t.datetime "updated_at", null: false | ||
| end | ||
|
|
||
| add_index "log_tickets", ["directory", "key"], name: "index_log_tickets_on_directory_and_key", unique: true, using: :btree | ||
|
|
||
| create_table "oauth_access_grants", force: :cascade do |t| | ||
| t.integer "resource_owner_id", null: false | ||
| t.integer "application_id", null: false | ||
|
|
||
What does `extra` mean? Can we make this more descriptive?
`extra` means a processor that was enqueued but didn't have any ticket/log to process. That should not happen unless we manually queue jobs.
Ah, okay.
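To make the `extra` case concrete, here is a minimal sketch of a processor that counts a job as `extra` when there is no ticket to pop. The code this thread comments on is not shown above, so everything here is an assumption for illustration: `process_next` is a hypothetical name, `pop` is any callable that returns a ticket or `nil`, and `counters` is a plain Hash standing in for whatever metrics client the project uses.

```ruby
# Hypothetical processor: `pop` is any callable returning a ticket or nil,
# and `counters` is a plain Hash standing in for a metrics client.
def process_next(pop:, counters:)
  ticket = pop.call
  if ticket.nil?
    # The job was enqueued, but no pending ticket existed for it.
    counters[:extra] += 1
    return nil
  end
  counters[:processed] += 1
  ticket
end

counters = Hash.new(0)
pending = ["a.log"]
pop = -> { pending.shift }

process_next(pop: pop, counters: counters) # finds "a.log"
process_next(pop: pop, counters: counters) # nothing left, counted as extra
puts "processed=#{counters[:processed]} extra=#{counters[:extra]}" # prints "processed=1 extra=1"
```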