Skip to content

Protocols report#10537

Merged
vrajmohan merged 1 commit intomainfrom
protocols-report
May 8, 2024
Merged

Protocols report#10537
vrajmohan merged 1 commit intomainfrom
protocols-report

Conversation

@vrajmohan
Copy link
Contributor

@vrajmohan vrajmohan commented May 1, 2024

🎫 Ticket

Link to the relevant ticket:
https://gitlab.login.gov/lg-people/lg-people-appdev/protocols/common/-/issues/1

🛠 Summary of changes

Added a CloudWatch report of:

  1. authentication requests by protocol
  2. SAML signatures - missing and invalid

📜 Testing Plan

  • Run the report using aws-vault exec $DEPLOYED_ENVIRONMENT-$LOGIN_IAM_PROFILE -- bundle exec rails runner lib/reporting/protocols_report.rb --date=$SOMEDATE.

  • Confirm the validity of the reported metrics by querying CloudWatch logs from the AWS console

@vrajmohan vrajmohan requested a review from Sgtpluck May 1, 2024 14:35
Comment on lines 98 to 152
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider formatting this for shorter lines for easier readability, also using consistent casing for keywords like AS

Suggested change
format(<<~QUERY, params)
fields name as protocol, properties.event_properties.service_provider as issuer, properties.event_properties.request_signed = 1 AS signed, properties.event_properties.request_signed != 1 AS not_signed, isempty(properties.event_properties.matching_cert_serial) and signed as invalid_signature
| filter name IN %{event}
| stats count(*) as request_count, sum(signed) as signed_count, sum(not_signed) as unsigned_count, sum(invalid_signature) as invalid_signature_count by protocol, issuer
QUERY
format(<<~QUERY, params)
fields
name AS protocol
, properties.event_properties.service_provider AS issuer
, properties.event_properties.request_signed = 1 AS signed
, properties.event_properties.request_signed != 1 AS not_signed
, isempty(properties.event_properties.matching_cert_serial) AND signed AS invalid_signature
| filter name IN %{event}
| stats
count(*) AS request_count
, sum(signed) AS signed_count
, sum(not_signed) AS unsigned_count
, sum(invalid_signature) AS invalid_signature_count
by
protocol
, issuer
QUERY

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't like the leading commas - it hurts my eyes :-). However, if it is a standard here, I am happy to adopt it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% consistent but we do have many files formatted that way:

identity-idp> git grep -l  "  , " | wc -l
      18

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in some of our standups, i think this report will end up growing beyond these initial needs, that will require different fields/manipulation of the events. so i was originally envisioning this to be a query for the events with some specific fields that are then manipulated in the data method (example: https://github.com/18F/identity-idp/blob/main/lib/reporting/drop_off_report.rb#L410-L421). so this will probably change in the future, but i think it's fine for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is also to have multiple separate queries that get combined in one email or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done that by pulling out the LOA ACR metrics that was also asked for in this ticket into a separate ticket (and consequently separate report). I do see requests by protocols and SAML signature issues as being separate concerns. I didn't want to do too much given the Data Warehouse plan. I also see that some of these reports are perfectly fine as CloudWatch reports run from the AWS web console...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to go with the separate queries after all (rationale - would have increased code duplication with separate reports). Will also be bringing in the LOA ACR query into this report.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should nil be []? (also on line 162)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is nil so that Array#compact can remove it. I don't know if this is idiomatic Ruby. I miss my list comprehensions.

Copy link
Contributor

@Sgtpluck Sgtpluck May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see, i misread what was happening here. i think it would be easier/more readable to use keep_if than collect.

issuers_with_invalid_signatures = data
  .keep_if { |slice| slice['invalid_signature_count'].to_i > 0 }
  .uniq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is still not enough. I would still need to do collect or map. It would have to be:

issuers_with_invalid_signatures = data
  .keep_if { |slice| slice['invalid_signature_count'].to_i > 0 }
  .collect { |slice| slice['issuer'] }
  .uniq

Would you prefer that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally I like select and map but otherwise same same

issuers_with_invalid_signatures = data
  .select { |slice| slice['invalid_signature_count'].to_i > 0 }
  .map { |slice| slice['issuer'] }
  .uniq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, Rubocop wants "Place the . on the previous line, together with the method call receiver."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, i don't like that particular rubocop option, but it's what we have standardized

@vrajmohan vrajmohan force-pushed the protocols-report branch 3 times, most recently from 1c8e397 to 0b75172 Compare May 1, 2024 21:09
@vrajmohan vrajmohan marked this pull request as ready for review May 1, 2024 22:28
@vrajmohan vrajmohan requested a review from Sgtpluck May 3, 2024 14:32
Copy link
Contributor

@Sgtpluck Sgtpluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine, and if it works that's great. i have a couple suggestions about moving some of the data transformation into the data blocks so the rest of the class can ingest the data as it needs it, but i know this has been sitting for awhile so we it's not blocking (can always move things around later)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do some data transformation here, since we are only using the protocol_data method in sum_protocol_data_by? i think we could use this method to create a hash that looks like

{
  saml: $SAML_COUNT,
  oidc: $OIDC_COUNT
}

and reduce the number of methods/abstraction to get the data we need (i believe the other reporting files have examples of stuff like this, i think the inject method with a ternary is a bit hard to reason about)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to my suggestion about the protocol_data, i wonder if this makes more sense in the saml_signature_data block. if not, this is very similar to saml_issuers_with_invalid_signatures, and i wonder if that should be a method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vrajmohan this works for me!

@vrajmohan vrajmohan requested a review from Sgtpluck May 8, 2024 16:27
@vrajmohan vrajmohan force-pushed the protocols-report branch from 7cfed37 to d44dee9 Compare May 8, 2024 17:03
See https://gitlab.login.gov/lg-people/lg-people-appdev/protocols/common/-/issues/1

**Why**:
-  We would like to like have some baseline information about the state
   of our partner API consumption.

**How**:
- Using a similar pattern to other reports, query our CloudWatch logs to
  report from the analytics events log.

changelog: Internal, Reporting, Create protocols report
@vrajmohan vrajmohan force-pushed the protocols-report branch from d44dee9 to 87a552b Compare May 8, 2024 20:02
@vrajmohan vrajmohan merged commit 659f2bf into main May 8, 2024
@vrajmohan vrajmohan deleted the protocols-report branch May 8, 2024 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants