Not sure if we want any of the other disposable email databases listed in https://github.com/fnando/email_data/tree/main#data-sources. I didn't see any particular benefit to including them, but I'm open to additional input.
I think we should consider a different approach for this. A couple things that stand out to me:
- The data files in the gem are quite large (4.8MB currently), and we'll include them in every build artifact when they're only used once
- We'll likely want to be able to update the database over time rather than only including an initial set of data in a one-time migration
We have an existing pattern for handling large files like geo_data and pwned_passwords, perhaps something similar would be useful here (whether it's ultimately backed by a database table or something else).
lib/load_disposable_domain.rb
```ruby
Faraday.get(url).body.each_line do |line|
  DisposableDomain.find_or_create_by(name: line)
```
I'm pretty sure this one-at-a-time approach will be much slower than a bulk insert. Have we looked into the `INSERT ... ON CONFLICT` approach for a bulk load?
`insert_all` is probably the preferable approach for this.
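A minimal sketch of what a batched `insert_all` load could look like. The helper names `disposable_domain_rows` and `bulk_load_disposable_domains` and the batch size of 10,000 are hypothetical, and the loader assumes a unique index on `name` so that PostgreSQL's `ON CONFLICT DO NOTHING` (which `insert_all` uses) skips domains that are already present. One thing worth noting about the current loop: `each_line` keeps the trailing newline, so `find_or_create_by(name: line)` would store `"example.com\n"` rather than `"example.com"`; the sketch strips each line first.

```ruby
# Hypothetical helper: turns the downloaded text (one domain per line) into
# row hashes for insert_all. each_line keeps the trailing "\n" (or "\r\n"),
# so strip it, then drop blank lines and duplicates within the file.
def disposable_domain_rows(body)
  body.each_line
      .map(&:strip)
      .reject(&:empty?)
      .uniq
      .map { |name| { name: name } }
end

# Hypothetical loader: `model` is any object responding to insert_all, such as
# the DisposableDomain ActiveRecord model. Each batch becomes one multi-row
# INSERT; on PostgreSQL, rows that collide with a unique index are skipped.
# Returns the number of rows submitted (not necessarily newly inserted).
def bulk_load_disposable_domains(model, body, batch_size: 10_000)
  rows = disposable_domain_rows(body)
  rows.each_slice(batch_size) { |batch| model.insert_all(batch) }
  rows.size
end
```

In the loader script this would be called as something like `bulk_load_disposable_domains(DisposableDomain, Faraday.get(url).body)`. Batching keeps each statement a reasonable size for a multi-megabyte list while still avoiding one round trip per domain.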
```ruby
class CreateEmailDataTable < ActiveRecord::Migration[7.0]
  def change
    enable_extension "citext"
```
Am I correct in assuming we won't need anything additional to be installed in deployed environments for this to be enabled?
aduth left a comment
LGTM 👍 I expect we'll want to test this in a lower environment before running the script in production.
Yup! I'll test this in dev first.
```ruby
    t.index ["user_id", "last_used_at"], name: "index_device_user_id_last_used_at"
  end

  create_table "disposable_domains", force: :cascade do |t|
```
It didn't occur to me until now, but it would be nice to incorporate the word "email" in this name. The domains themselves aren't disposable; what matters is that they offer a disposable email service.
```diff
- create_table "disposable_domains", force: :cascade do |t|
+ create_table "disposable_email_domains", force: :cascade do |t|
```
If we don't want to deal with a table rename, we could rename the ActiveRecord model and set its table name to what it is now (like we do with the Gpo models).
🎫 Ticket
LG-11208: disposable email database loaded
🛠 Summary of changes
Disposable Email created and loaded,