Skip to content

Add audit log stream to stackdriver filter#2970

Merged
istio-testing merged 28 commits intoistio:masterfrom
davidraskin:master
Aug 20, 2020
Merged

Add audit log stream to stackdriver filter#2970
istio-testing merged 28 commits intoistio:masterfrom
davidraskin:master

Conversation

@davidraskin
Copy link
Contributor

This adds an audit log stream to stackdriver. The goal would be to have requests to be auditted will be filtered through the audit action on Istio Authorization Policy.
See: istio/api#1552, Audit log stream proposal

cc @liminw @gargnupur @kyessenov @mandarjog

This is WIP. It should only be merged after envoyproxy/envoy#11705

Signed-off-by: davidraskin <draskin@google.com>
Signed-off-by: davidraskin <draskin@google.com>
@davidraskin davidraskin requested review from a team, gargnupur and kyessenov August 6, 2020 17:14
@istio-policy-bot
Copy link

😊 Welcome @davidraskin! This is either your first contribution to the Istio proxy repo, or it's been
awhile since you've been here.

You can learn more about the Istio working groups, code of conduct, and contributing guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Aug 6, 2020
@istio-testing istio-testing added needs-rebase Indicates a PR needs to be rebased before being merged size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 6, 2020
Signed-off-by: davidraskin <draskin@google.com>
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Aug 6, 2020
@davidraskin davidraskin marked this pull request as draft August 6, 2020 17:24
@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Aug 6, 2020
Signed-off-by: davidraskin <draskin@google.com>
@davidraskin davidraskin marked this pull request as ready for review August 6, 2020 17:42
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Aug 6, 2020
bool disable_server_access_logging = 1;

// Optional. Controls whether to export audit log.
bool enable_server_access_auditting = 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we name this audit_server_access ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.


// Name of the HTTP server access log.
constexpr char kServerAccessLogName[] = "server-accesslog-stackdriver";
constexpr char kAuditAccessLogName[] = "server-accesslog-stackdriver-audit";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this naming convention important? does it need to match the historic name for the other access logs, or should we consider calling this server-audit-log (or something else less verbose)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's important. I just went the simplest route. I'll rename it.

@bianpengyuan
Copy link
Contributor

This might be a dumb question, but isn't it that request should not proceed until audit log succeed? Looks like the current implementation is still logging at Log callback which happens after request finishes.

@davidraskin
Copy link
Contributor Author

The request is not exported if there are no audit log entries added to it. This works the same way as with regular log entries currently.

log_entries_request_ =
std::make_unique<google::logging::v2::WriteLogEntriesRequest>();

audit_entries_request_ =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check this out... I have made it a map: https://github.com/istio/proxy/pull/2955/files#diff-d116ca95f1292dece4d352210d220a13 so would be easier to add many types of entries now...

@bianpengyuan
Copy link
Contributor

bianpengyuan commented Aug 6, 2020

@davidraskin Yeah, but current access log implementation is for telemetry and at best effort. The exporting could have transient failure which would leave some requests not logged. Is this fine for audit log? I thought the assumption of audit log is that every request reaches application must be logged.

@davidraskin
Copy link
Contributor Author

davidraskin commented Aug 6, 2020

@bianpengyuan I see what you're saying. I think it's still important to see the result of the request, and whether it succeeds or fails. But maybe it should fail closed rather than open.


// Name of the HTTP server access log.
constexpr char kServerAccessLogName[] = "server-accesslog-stackdriver";
constexpr char kAuditAccessLogName[] = "server-accesslog-stackdriver-audit";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, have you thought about fill in protoPayload in logEntry, which supports google.cloud.audit.AuditLog: https://cloud.google.com/logging/docs/reference/audit/auditlog/rest/Shared.Types/AuditLog. Sound like that is a more standard way to do audit log at Google?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the AuditLog payload is used primarily for auditting GCP service calls, it doesn't apply to application logging, afaik.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bianpengyuan @davidraskin can you find out if google.cloud.audit.AuditLog cannot be used to audit other services?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mandarjog Maybe it can theoretically be used, because I think the only difference is the proto payload, but the Audit Log docs state that it is for GCP API calls. Additionally, configuring the Data Access Audit Logs is done through the Cloud Console, which could be confusing since that wouldn't be the case here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these logs be actually called audit logs? https://cloud.google.com/logging/docs/audit cloud audit logs looks like are logs for telling who or what accessed a GCP resource... This is more about capturing auth info for application?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're still audit logs I think. Just ones at the application level, separate from GCP audit logs. We can call them IstioAudit logs to reduce confusion.

Signed-off-by: davidraskin <draskin@google.com>
Signed-off-by: davidraskin <draskin@google.com>
Signed-off-by: davidraskin <draskin@google.com>
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Aug 11, 2020
Signed-off-by: davidraskin <draskin@google.com>
Signed-off-by: davidraskin <draskin@google.com>
Signed-off-by: davidraskin <draskin@google.com>
@davidraskin
Copy link
Contributor Author

/retest

Copy link
Contributor

@bianpengyuan bianpengyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code lgtm, the only comment is about adding more details on audit logging. Also can we make sure that it is ok for audit log happens after request finishes? Sorry for being paranoid here, but I still don't feel this is quite right... @liminw Can you comment on this?

false /* audit */);
}

if (enableAuditLog() && shouldAuditThisRequest()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to call ::Wasm::Common::populateExtendedRequestInfo(&request_info); here too?

auto log_entry_type = GetLogEntryType(outbound);
const ::Wasm::Common::FlatNode& local_node_info, bool outbound,
bool audit) {
const LogEntryType log_entry_type = GetLogEntryType(outbound, audit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for const with enum

google::logging::v2::LogEntry* log_entry) {
const ::Wasm::Common::FlatNode& peer_node_info,
google::logging::v2::LogEntry* log_entry, bool outbound, bool audit) {
auto& entries_request =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be const auto& i think...

@gargnupur
Copy link
Contributor

code looks ok, with some minor nits.. can you also add an integration test (https://github.com/istio/proxy/blob/master/test/envoye2e/stackdriver_plugin/stackdriver_test.go) ?

@liminw
Copy link

liminw commented Aug 19, 2020

@davidraskin Yeah, but current access log implementation is for telemetry and at best effort. The exporting could have transient failure which would leave some requests not logged. Is this fine for audit log? I thought the assumption of audit log is that every request reaches application must be logged.

Peter, you are totally correct. Audit log has strong reliability requirement. This is the first step of enabling configurable audit feature. Going forward, we plan to improve the reliability of audit stream, which requires a separate design.

@bianpengyuan
Copy link
Contributor

@liminw Ok, thanks for clarification! Can you elaborate a bit on the first step of enabling configurable audit feature? It does not look right to me to check in code that is not going to used for production. It will likely just become tech debt so hope we can avoid that here.

@davidraskin
Copy link
Contributor Author

@bianpengyuan If request completion is the concern, we can just do the logging in the requestHeaders callback, and fail if there is some issue, and/or have an extra verification step to see that the log is received.

@bianpengyuan
Copy link
Contributor

we can just do the logging in the requestHeaders callback

That would be ideal IMO. thanks!

@liminw
Copy link

liminw commented Aug 19, 2020

@liminw Ok, thanks for clarification! Can you elaborate a bit on the first step of enabling configurable audit feature? It does not look right to me to check in code that is not going to used for production. It will likely just become tech debt so hope we can avoid that here.

I don't think we are close to claim audit feature production ready. As far as I know, Cloud audit team has done a lot of work to improve the reliability of audit stream. We still need to learn from them and design for it. But this can be done incrementally.

Signed-off-by: davidraskin <draskin@google.com>
@davidraskin davidraskin requested a review from a team August 19, 2020 19:51
@istio-testing istio-testing added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 19, 2020
Signed-off-by: davidraskin <draskin@google.com>
@davidraskin
Copy link
Contributor Author

/retest

Signed-off-by: davidraskin <draskin@google.com>
respCode string
logEntryCount int
}{
{"StackdriverAudit", "200", 5},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As there is only one testcase, we don't need this?

Copy link
Contributor

@gargnupur gargnupur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have one minor nit.. adding "do-not-merge" for that and approval from @bianpengyuan

@gargnupur gargnupur added the do-not-merge Block automatic merging of a PR. label Aug 19, 2020
Signed-off-by: davidraskin <draskin@google.com>
@gargnupur gargnupur removed the do-not-merge Block automatic merging of a PR. label Aug 20, 2020
@istio-testing istio-testing merged commit 40c3904 into istio:master Aug 20, 2020
@elfinhe
Copy link
Member

elfinhe commented Sep 14, 2020

/cherry-pick release-1.7

@istio-testing
Copy link
Collaborator

@elfinhe: #2970 failed to apply on top of branch "release-1.7":

Applying: Initial auditEntry implementation
Using index info to reconstruct a base tree...
M	extensions/stackdriver/log/logger.cc
M	extensions/stackdriver/log/logger.h
Falling back to patching base and 3-way merge...
Auto-merging extensions/stackdriver/log/logger.h
Auto-merging extensions/stackdriver/log/logger.cc
Applying: Initial audit log implementation
Using index info to reconstruct a base tree...
M	extensions/common/context.cc
M	extensions/common/context.h
M	extensions/stackdriver/log/logger.cc
M	extensions/stackdriver/log/logger.h
Falling back to patching base and 3-way merge...
Auto-merging extensions/stackdriver/log/logger.h
CONFLICT (content): Merge conflict in extensions/stackdriver/log/logger.h
Auto-merging extensions/stackdriver/log/logger.cc
CONFLICT (content): Merge conflict in extensions/stackdriver/log/logger.cc
Auto-merging extensions/common/context.h
Auto-merging extensions/common/context.cc
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Initial audit log implementation
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherry-pick release-1.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@istio-testing
Copy link
Collaborator

@elfinhe: new issue could not be created for failed cherrypick: status code 410 not one of [201], body: {"message":"Issues are disabled for this repo","documentation_url":"https://docs.github.com/v3/issues/"}

Details

In response to this:

/cherry-pick release-1.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants