[HACKATHON] Initial crack at an IdV matcher#11624

Merged
solipet merged 3 commits into login-hackathon-2024-user-narrative from matthinz/idv-matcher
Dec 12, 2024

Conversation

@matthinz
Contributor

This PR adds a big, giant "IDV" matcher.

The matcher runs through event logs and observes when new attempts at identity verification start. It then tries to describe how they went.

Things it does well:

  • Tells you what Instant Verify checks failed
  • Identifies the case where the MVA does not recognize the submitted license number
  • Catches when a user resets their password while they're waiting for a GPO letter

Things it does not do so well:

  • Pretty much anything else

Ultimately, identity verification is a "big" thing, and having a facility for sub-matchers rather than one monolithic matcher would probably be good. I don't think this PR represents the final form, but it does represent a form, and starts to capture some of the built-in knowledge you need to interpret IDV-related event logs.

Matcher is a state machine that collects IDV "attempts" as they happen and tries to suss out interesting things about them.

[skip changelog]
Comment on lines +68 to +69
attr_reader :current_idv_attempt
attr_reader :idv_attempts
Contributor Author

When we encounter a new "welcome submitted" event, a new IdvAttempt is created and stored in current_idv_attempt. Subsequent events then update the state of that attempt. If we see another "welcome submitted" event, the current_idv_attempt is put in idv_attempts and a new one is started.
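A minimal sketch of the bookkeeping described above. Only `current_idv_attempt` and `idv_attempts` come from the PR; the class name, the `IdvAttemptSketch` struct, the `handle_event` and `finish` methods, and the `'welcome submitted'` event name are placeholders for illustration, not the matcher's actual code.

```ruby
# Hypothetical stand-in for the PR's IdvAttempt; the fields are assumptions.
IdvAttemptSketch = Struct.new(:started_at, :events)

class IdvMatcherSketch
  # Placeholder event name; the real analytics event name may differ.
  WELCOME_SUBMITTED_EVENT = 'welcome submitted'

  attr_reader :current_idv_attempt
  attr_reader :idv_attempts

  def initialize
    @current_idv_attempt = nil
    @idv_attempts = []
  end

  def handle_event(event)
    if event['name'] == WELCOME_SUBMITTED_EVENT
      start_new_idv_attempt(event)
    elsif current_idv_attempt
      # Later events update the attempt that is currently in progress.
      current_idv_attempt.events << event
    end
    # Events seen before any attempt has started are ignored in this sketch.
  end

  # Call after the last event so the in-progress attempt is not dropped.
  def finish
    archive_current_idv_attempt
    idv_attempts
  end

  private

  def start_new_idv_attempt(event)
    archive_current_idv_attempt
    @current_idv_attempt = IdvAttemptSketch.new(event['@timestamp'], [event])
  end

  def archive_current_idv_attempt
    return if current_idv_attempt.nil?

    @idv_attempts << current_idv_attempt
    @current_idv_attempt = nil
  end
end
```

Feeding two "welcome submitted" events through `handle_event` and then calling `finish` yields two attempts: the first is archived when the second starts, and the second is archived by `finish`.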

end

when RATE_LIMIT_REACHED_EVENT
handle_rate_limit_reached(event:)
Contributor Author

This does not use for_current_idv_attempt because not all rate limits are idv-related. We need to figure that out first.


private

def add_significant_event(
Contributor Author

I'm not super happy about introducing another overload of the term "event" here, but the idea is we capture only the most interesting things in a way we can display to the end user

Comment on lines +40 to +42
find_cloudwatch_events do |event|
Time.zone ||= 'America/New_York'

Contributor

probably easier to set the timezone once, instead of for each event?

Suggested change
- find_cloudwatch_events do |event|
- Time.zone ||= 'America/New_York'
+ Time.zone ||= 'America/New_York'
+ find_cloudwatch_events do |event|

Contributor

Done in #11627

Contributor Author

I ran into an issue with this--I think that when the Cloudwatch client uses multiple threads each thread needs its time zone initialized.
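The effect can be illustrated in plain Ruby, assuming the time zone setting is stored per thread (which is my understanding of how ActiveSupport's `Time.zone=` behaves). `ZoneSketch` below is a hypothetical stand-in, not the Rails API: a value set once on the main thread is not visible inside a newly spawned thread, so each worker has to initialize its own.

```ruby
# Hypothetical stand-in for a per-thread setting such as Time.zone.
module ZoneSketch
  DEFAULT = 'America/New_York'

  def self.zone
    Thread.current[:zone_sketch]
  end

  def self.zone=(value)
    Thread.current[:zone_sketch] = value
  end
end

# Set once, on the main thread only.
ZoneSketch.zone = ZoneSketch::DEFAULT

# A worker thread that relies on the main thread's setting sees nothing.
seen_without_init = Thread.new { ZoneSketch.zone }.value

# A worker thread that initializes its own setting sees the default.
seen_with_init = Thread.new do
  ZoneSketch.zone ||= ZoneSketch::DEFAULT
  ZoneSketch.zone
end.value
```

Here `seen_without_init` is `nil` while `seen_with_init` is `'America/New_York'`, which would be consistent with needing the `||=` inside the per-event block when the callback runs on worker threads.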

Contributor

ah, I'll update it to use the configured zone.

Contributor

hmmm maybe we need to update the CW client to call that block back on the main thread only?

Contributor

@solipet left a comment

This is a great start!

@solipet merged commit 9dca95a into login-hackathon-2024-user-narrative Dec 12, 2024
@solipet deleted the matthinz/idv-matcher branch December 12, 2024 17:06
matthinz added a commit that referenced this pull request Jan 8, 2025
* Initial cloudwatch query script to summarize events

* Query cloudwatch and get user events

* add timestamp remove limit

* [Hackathon] Allow sourcing events from stdin (#11619)

* Allow sourcing events from stdin

It may be useful sometimes to take a local cache of cloudwatch events and pipe them into this command.

[skip changelog]

* Add 'limit: 10000' to CW query

This is required for `complete` to work

* [Hackathon] Add ExampleMatcher (#11622)

* Add ExampleMatcher

Add an example matcher that just counts events and outputs how many it saw.

[skip changelog]

* Remove excess whitespace

* Add frozen_string_literal: true

* use optparse to allow command options/defaults (#11627)

* [HACKATHON] Initial crack at an IdV matcher (#11624)

* Initial crack at an IdV matcher

Matcher is a state machine that collects IDV "attempts" as they happen and tries to suss out interesting things about them.

[skip changelog]

* removed unused method

---------

Co-authored-by: Douglas Price <douglas.price@gsa.gov>

* [HACKATHON] Output formatting tweaks (#11635)

* Normalize @timestamp to UTC for each event

Pre-parse it in the script so that matchers don't have to worry about it
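A rough sketch of what this pre-parsing step could look like. The `normalize_timestamp` helper is hypothetical, and it assumes each event is a Hash whose `'@timestamp'` value is either a `Time` or a string that `Time.parse` understands.

```ruby
require 'time'

# Hypothetical helper: returns a copy of the event with '@timestamp' as a UTC Time.
def normalize_timestamp(event)
  raw = event['@timestamp']
  return event if raw.nil?

  parsed = raw.is_a?(Time) ? raw : Time.parse(raw.to_s)
  # getutc returns a new Time rather than mutating the caller's object.
  event.merge('@timestamp' => parsed.getutc)
end
```

Note that `Time.parse` interprets a string without an explicit offset in the process's local time zone, so inputs that omit the offset would need separate handling.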

* Slightly improve output

- Include timestamps where possible

[skip changelog]

* [HACKATHON] Minor tweaks (#11637)

* Don't crash if no events found

* Tweak handling of --end-date

- Use a dash rather than underscore
- Make sure we respect it if it's passed in

* Sort events on stdin before processing

Events from Cloudwatch queries will be sorted, but stdin is not guaranteed.

Processing unsorted events can lead to weird, weird outcomes
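One way the sorting step might look, as a sketch only. It assumes the input is newline-delimited JSON with an `'@timestamp'` string on every event; the `read_sorted_events` name and the input format are assumptions, not taken from the script.

```ruby
require 'json'
require 'time'

# Hypothetical helper: reads one JSON event per line and returns them oldest first.
def read_sorted_events(io)
  events = io.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
  # Compare parsed times rather than raw strings so mixed offsets sort correctly.
  events.sort_by { |event| Time.parse(event.fetch('@timestamp')) }
end
```

Calling `read_sorted_events($stdin)` would then hand the matchers events in chronological order regardless of how the local cache was written.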

* report on TrueID success/failure (#11638)

* Try to identify IDV abandonment (#11639)

If the user:

- Has not completed the initial workflow and
- Does not have an idv-related event newer than 1 hour

Call their attempt abandoned
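The rule above could be sketched as follows, reading it as "the initial workflow is incomplete and the most recent IdV-related event is more than one hour old". The `abandoned?` method, its keyword arguments, and the constant are hypothetical names for illustration.

```ruby
# One hour, per the rule above; the constant name is an assumption.
ABANDONMENT_WINDOW_SECONDS = 60 * 60

# Hypothetical predicate: true when an unfinished attempt has gone quiet.
def abandoned?(workflow_complete:, last_idv_event_at:, now: Time.now)
  return false if workflow_complete
  # With no IdV-related event to measure from, this sketch declines to judge.
  return false if last_idv_event_at.nil?

  (now - last_idv_event_at) > ABANDONMENT_WINDOW_SECONDS
end
```

Passing `now:` explicitly keeps the check deterministic when replaying historical events, where "now" is more naturally the timestamp of the last event in the log than the wall clock.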

* Login hackathon 2024 user narrative account deletion (#11629)

* include timestamp

* add account deletion narrative matcher

* remove unneeded matcher requirement

* add deletion matcher

* lint

* rename account deletion

* read events from file without changing stdin

* remove ipp from gpo code submission event

* update example documentation in script

* Update lib/event_summarizer/vendor_result_evaluators/aamva.rb

Co-authored-by: Zach Margolis <zachmargolis@users.noreply.github.com>

* Start writing a spec

* Tidy up logic in IV result evaluator

* Set event['name'] if not already set

* Fix typo

* Use Eastern time zone by default

* Update pluralization code + add spec

* Start on spec for summarize-user-events command

* Protect rubocop's delicate sensibilities

* Add more specs

Add some specs around option parsing, time parsing, and actually running the program

* Look at banner michael

---------

Co-authored-by: Malick Diarra <malick.diarra@gsa.gov>
Co-authored-by: Doug Price <douglas.price@gsa.gov>
Co-authored-by: Eileen <eileenmcfarland@navapbc.com>
Co-authored-by: eileen-nava <80347702+eileen-nava@users.noreply.github.com>
Co-authored-by: Zach Margolis <zachmargolis@users.noreply.github.com>
4 participants