Skip to content

LG-11208: disposable email database#9440

Merged
mdiarra3 merged 15 commits intomainfrom
LG-11208-disposable-emails-db
Nov 8, 2023
Merged

LG-11208: disposable email database#9440
mdiarra3 merged 15 commits intomainfrom
LG-11208-disposable-emails-db

Conversation

@mdiarra3
Copy link
Contributor

🎫 Ticket

LG-11208: disposable email database loaded

🛠 Summary of changes

Disposable Email created and loaded,

@mdiarra3 mdiarra3 requested a review from a team October 24, 2023 17:36
@mdiarra3
Copy link
Contributor Author

Not sure if we wanted any of the other disposable email databases that were a part of https://github.com/fnando/email_data/tree/main#data-sources, I didnt see any benefit necessarily for having them but open to additional input

@mdiarra3 mdiarra3 changed the title changelog: Internal, Authentication, Disposable emails database loaded LG-111208: disposable email database Oct 24, 2023
Copy link
Contributor

@mitchellhenke mitchellhenke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider a different approach for this. A couple things that stand out to me:

  1. The data files in the gem are quite large (4.8MB currently), and we'll include them in every build artifact when they're only used once
  2. We'll likely want to be able to update the database over time rather than only including an initial set of data in a one-time migration

We have an existing pattern for handling large files like geo_data and pwned_passwords, perhaps something similar would be useful here (whether it's ultimately backed by a database table or something else).

@mdiarra3 mdiarra3 marked this pull request as draft October 30, 2023 20:31
Comment on lines +3 to +4
Faraday.get(url).body.each_line do |line|
DisposableDomain.find_or_create_by(name: line)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this one-at-a-time will be much slower than a bulk insert, have we looked into the INSERT ... ON CONFLICT approach for a bulk load?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insert_all is probably the preferable approach for this

@mdiarra3 mdiarra3 marked this pull request as ready for review November 3, 2023 13:56
@@ -0,0 +1,11 @@
class CreateEmailDataTable < ActiveRecord::Migration[7.0]
def change
enable_extension "citext"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in assuming we won't need anything additional to be installed in deployed environments for this to be enabled?

Copy link
Contributor

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 I expect we'll want to test this in a lower environment before running the script in production

@mdiarra3
Copy link
Contributor Author

mdiarra3 commented Nov 8, 2023

yup! I will have this tested in dev first

@mdiarra3 mdiarra3 merged commit e68ec07 into main Nov 8, 2023
@mdiarra3 mdiarra3 deleted the LG-11208-disposable-emails-db branch November 8, 2023 14:43
t.index ["user_id", "last_used_at"], name: "index_device_user_id_last_used_at"
end

create_table "disposable_domains", force: :cascade do |t|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't occur to me until now, but it would be nice to incorporate the word "email" in this name. The domains themselves aren't disposable, it's the fact that the domains offer a disposable email service.

Suggested change
create_table "disposable_domains", force: :cascade do |t|
create_table "disposable_email_domains", force: :cascade do |t|

Copy link
Contributor

@zachmargolis zachmargolis Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't want to deal with a table rename, we could rename the active record model and set the table name to what it is now (like we do with the Gpo models)

@aduth aduth changed the title LG-111208: disposable email database LG-11208: disposable email database Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants