
[v18] Add vnet config audit events#62617

Closed
tangyatsu wants to merge 1700 commits into master from tangyatsu/add-vnet-config-audit-events-backport-v18

Backport #62383 to branch/v18

hugoShaka and others added 30 commits November 21, 2025 21:04
…ready (#61620)

* Add a way to announce which services should be expected (#59667)

I was looking into tying auth readiness to its cache health (so we
don't end up in a state with no ready cache across all auths during a
rollout) and I saw that we are currently reporting ready as soon as one
of the services heartbeats. We don't keep track of which services should
be heartbeating/reporting ready.

This PR introduces a new `process.ExpectService(component)` function to
declare early that we are starting a service and that the process should
not be ready without it.
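
The readiness rule described above can be sketched as a small tracker: the process counts as ready only once every service declared through an `ExpectService`-style call has reported ready. This is a minimal sketch under stated assumptions; `readinessTracker`, `ReportReady`, and `Ready` are hypothetical names and do not mirror the actual `process.ExpectService` implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// readinessTracker is a hypothetical sketch: the process is ready only after
// every expected service has reported ready at least once.
type readinessTracker struct {
	mu       sync.Mutex
	expected map[string]bool // service name -> has reported ready
}

func newReadinessTracker() *readinessTracker {
	return &readinessTracker{expected: make(map[string]bool)}
}

// ExpectService declares early that a service is starting and must report
// ready before the whole process is considered ready.
func (r *readinessTracker) ExpectService(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.expected[name]; !ok {
		r.expected[name] = false
	}
}

// ReportReady marks an expected service as ready. In this sketch, reports
// from services that were never expected are ignored.
func (r *readinessTracker) ReportReady(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.expected[name]; ok {
		r.expected[name] = true
	}
}

// Ready returns true only when at least one service is expected and every
// expected service has reported ready.
func (r *readinessTracker) Ready() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.expected) == 0 {
		return false
	}
	for _, ok := range r.expected {
		if !ok {
			return false
		}
	}
	return true
}

func main() {
	tr := newReadinessTracker()
	tr.ExpectService("auth")
	tr.ExpectService("proxy")
	tr.ReportReady("auth")
	fmt.Println(tr.Ready()) // false: proxy has not reported yet
	tr.ReportReady("proxy")
	fmt.Println(tr.Ready()) // true
}
```

The key difference from "ready on first heartbeat" is that readiness is computed over the declared set, not over whichever services happen to have reported so far.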

* disable expect service in backport

* Expect the auth backend/cache to be initialized before turning ready (#59907)
…61623)

* Add nested AL tests + change IsAccessListMember signature

* address marek's feedback

* fixup! address marek's feedback

* Apply suggestions from code review



* address pawel's feedback

---------

Co-authored-by: Pawel Kopiczko <pawel.kopiczko@goteleport.com>

* add missing tctl autoupdates agents docs

* Update docs/pages/reference/cli/tctl.mdx

Co-authored-by: Hugo Shaka <hugo.hervieux@goteleport.com>

* Update tctl.mdx

---------

Co-authored-by: Hugo Shaka <hugo.hervieux@goteleport.com>

* [v18] proto/accessgraph: Add RPC for sending k8s audit logs to Access Graph

Add a `KubeAuditLogsStream()` rpc to the `AccessGraphService` for
streaming Kubernetes apiserver audit logs from the Teleport discovery
service to access graph. This is intended for EKS audit logs which are
made available via CloudWatch, but can accommodate other k8s services.

The audit log messages are represented as a `google.protobuf.Struct` so
as to not depend on the k8s.io .proto files, but also as k8s typically
uses protos internally only - the expectation is that we'll receive the
apiserver audit logs as JSON-encoded strings. These encode easily as a
`google.protobuf.Struct`.

* [v18] proto: Generate protos for accessgraph

Generate proto and grpc code for changes to accessgraph/v1alpha1:

    make grpc/host

These changes add the `KubeAuditLogsStream()` rpc and associated types.

* [v18] proto/types: Add AccessGraphAWSSyncEKSAuditLogs message

Add the `AccessGraphAWSSyncEKSAuditLogs` message used by new field in
`AccessGraphAWSSync` for specifying which EKS clusters should have
apiserver audit logs fetched and sent to Access Graph.

* [v18] proto: Regenerate protos for types

Generate proto code for `AccessGraphAWSSyncEKSAuditLogs` message:

    make grpc/host
    make derive

* [v18] lib/config: Add static config for AccessGraph EKS audit logs

Extend the static config for Access Graph discovery to be able to
specify the EKS cluster for which apiserver audit logs should be fetched
and sent to Access Graph.

* [v18] discovery: Add AWS EKS audit log fetching for Access Graph

Add a watcher to start fetchers for all access graph EKS clusters that
are configured to have Kubernetes apiserver audit logs fetched; each
fetcher sends those logs to access graph. The watcher receives the set
of clusters to fetch audit logs for from the AWS resource syncer as it
discovers EKS clusters.
Those clusters are reconciled against the current set of log fetchers,
with no-longer-needed fetchers stopped and new fetchers started as
needed.

This commit requires go.mod be updated with:

    go get github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs@latest

It is left out of this commit for now as it makes rebasing/merging
master easier.

* [v18] Add github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs to go.mod

Run:

    go get github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs@latest
    make go-mod-tidy-all
    # Manually move the go.mod line back to the first section!?!?

This commit is kept separate for easier merging/rebasing.

* [v18] discovery: Refactor eks audit log fetching for testing

Refactor the eksAuditLog{Watcher,Fetcher} and the aws_sync.Fetcher
cloudwatchlogs to be more testable:

* factor away eksAuditLogFetcher from eksAuditLogWatcher. The watcher
  just needs a factory function to create a fetcher, and all the watcher
  needs from that fetcher is a `Run()` method. Lift the cancel func out
  of the fetcher and store it directly in the watcher, as only the
  watcher uses it.
* factor away aws_sync.Fetcher from eksAuditLogFetcher. All it needs
  from the sync fetcher it calls is one method to fetch cloudwatch logs.
  Make that an interface and use just that. This allows a fake source of
  cloudwatch logs to be provided for testing. While here, use protobuf
  getters rather than accessing fields directly.
* Use protobuf getters in aws_sync.Fetcher cloudwatchlogs instead of
  accessing fields directly. In future, we could pass in an interface
  with those getters to make the code more testable.

* [v18] discovery: Add eks audit log tests

Add tests for `eksAuditLogWatcher` and `eksAuditLogFetcher`. Copy the
grpc stream testing util from the access graph repo into teleport as it
is useful for the bidirectional streaming methods used by access graph,
and makes it easier to test on the client side.

* [v18] accessgraph sync: Add AWS IAM role for EKS audit logs

Update the `teleport integration configure access-graph aws-iam` command
to add a permission to access EKS audit logs via CloudWatch Logs if the
`--eks-audit-logs` flag is passed. This is necessary so that an
integration can pull the EKS audit logs if so configured in a discovery
access graph matcher.

* [v18] web: Add web eksAuditLogs to integration configure endpoint

Extend the web endpoint for the webscript for integrations configure
access-graph-cloud-sync-iam.sh to add the `eksAuditLogs` query param to
configure with EKS audit logs enabled. Add tests for this endpoint as
there were none.

* [v18] Regenerate AWS regions.go file

Run `make go-generate` to update the `lib/utils/aws/region/regions.go`
file as something in the backport of EKS audit logs for Identity
Activity Center has changed what would be generated.

The changes here are meant to reduce resources consumed by
integration tests and eliminate the footgun of connecting to
a host before it's been populated in the appropriate caches.

`(integrationTestSuite) defaultServiceConfig` now disables more services
by default. Host user creation, the Web UI and Web Service, and the
Database Proxy are all now turned off by default. The tests that require
them can opt into turning them on.

`(TeleInstance) Start` now blocks until the process's SSH server is present
in caches before returning. Most tests were already calling this manually
after Start, however, not all of them were. This is frequently forgotten
in new tests and causes flakes if client connections are attempted as
soon as Start returns. There are still cases where calling WaitForNodeCount
directly applies - tests that spin up additional nodes or tests that create
nodes in a leaf cluster.

The test relied on all the output of a session being
replicated to all parties before a 5s timer closed the session.
To make the test more resilient, it was rewritten so that it
first checks one party's output for the expected content, then
exits the session, and, once all parties have been closed,
examines the entire output of the other parties to validate
their output.

Fixes #55044

Reduce repetition in order to make the sidebar easier to read. This
change edits labels in the following sidebar sections:

- reference/agent-services/database-access-reference
- reference/agent-services/desktop-access-reference
- reference/agent-services
- reference/architecture
- reference/deployment
- reference/machine-workload-identity/workload-identity
- reference

* Add entra ID metrics (#60537)

* Add entra ID metrics

This commit adds metrics for entra ID sync. This is the OSS part, it
contains the msgraph client metrics.

As many different parts of Teleport are using the msgraph client and
might not have access to a metric registerer yet, the client gracefully
handles not being given a metric registry. In this case it won't
register its metrics; we don't want to continue polluting the global
metrics registry.
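
The "no registry, no registration" behavior described above can be sketched with a nil check. To keep the example dependency-free, `registerer` is a hypothetical stand-in for a Prometheus-style registerer interface, and the metric names are invented for illustration; the real client's registry type is not shown in this description.

```go
package main

import "fmt"

// registerer is a hypothetical stand-in for a metrics registry interface.
type registerer interface {
	Register(name string) error
}

// mapRegistry records registered metric names and rejects duplicates, like a
// real registry would.
type mapRegistry struct{ names map[string]bool }

func (m *mapRegistry) Register(name string) error {
	if m.names[name] {
		return fmt.Errorf("duplicate metric %q", name)
	}
	m.names[name] = true
	return nil
}

// client sketches a component that registers metrics only when given a
// registry; with a nil registry it skips registration instead of falling
// back to a global registry.
type client struct {
	metricsRegistered bool
}

func newClient(reg registerer) (*client, error) {
	c := &client{}
	if reg == nil {
		return c, nil // no registry: do not touch any global registry
	}
	// Hypothetical metric names, for illustration only.
	for _, name := range []string{"msgraph_requests_total", "msgraph_request_duration_seconds"} {
		if err := reg.Register(name); err != nil {
			return nil, err
		}
	}
	c.metricsRegistered = true
	return c, nil
}

func main() {
	c, _ := newClient(nil)
	fmt.Println(c.metricsRegistered) // false
	c, _ = newClient(&mapRegistry{names: map[string]bool{}})
	fmt.Println(c.metricsRegistered) // true
}
```

Registering twice into the same registry fails, which is the same kind of duplicate/conflict problem the global registry would cause in tests.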

* lint

* add optional reconciler metrics (#60581)

* expose TeleportProcess metrics registry (#60654)

* test setting a non-nil registry in config

* expose teleport process metric registry

* remove metric config

* fixup! remove metric config

* Add support in process for additional metrics gatherers (#60852)

* Add support in process for additional metrics gatherers

Before this change, we were gathering from 2 metrics gatherers:
- the process registry
- the global registry

There are cases where we must add and remove metrics (e.g. plugins).
We could throw them into the global registry but:
- this would pollute the global registry and cause duplicates/conflicts
  in tests
- this would conflate all metrics from the same plugin kind. We support
  several instances of the same hosted plugin and we might want to
  keep distinct metrics.

This change makes the gatherers a list and adds a function so teleport.e
can add its own gatherer. A teleport.e PR using this mechanism will
follow.

* Protect gatherer slice with a mutex

* Fix the generic reconciler metric API (#60853)

When implementing reconciler metrics in #60581,
I did not realize that some GenericReconciler usages, including the one I
wanted to observe, were short-lived. The implementation had 2 blatant
issues:
- metrics were lost on each invocation
- creating a new reconciler would attempt to register the metric a second
  time and cause a conflict

This PR changes the reconciler metrics API so the caller is responsible
for creating and registering the metrics beforehand. This allows the
caller to create the metric struct once and pass them to successive
`NewGenericReconciler` calls.
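
The revised API, in which the caller owns the metrics and hands them to each short-lived reconciler, can be illustrated as follows. `reconcilerMetrics`, `newReconciler`, and `run` are hypothetical names, and an atomic counter stands in for a registered Prometheus counter; this is not the actual `NewGenericReconciler` signature.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// reconcilerMetrics is created (and, in real code, registered) once by the caller.
type reconcilerMetrics struct {
	reconciliations atomic.Int64
}

// reconciler is short-lived; it only holds a pointer to the shared metrics,
// so creating many reconcilers never re-registers anything and counts accumulate.
type reconciler struct {
	metrics *reconcilerMetrics
}

func newReconciler(m *reconcilerMetrics) *reconciler {
	return &reconciler{metrics: m}
}

func (r *reconciler) run() {
	// ... reconcile resources here ...
	r.metrics.reconciliations.Add(1)
}

func main() {
	m := &reconcilerMetrics{} // created once, outside the loop
	for i := 0; i < 3; i++ {
		newReconciler(m).run() // a fresh reconciler per invocation
	}
	fmt.Println(m.reconciliations.Load()) // 3
}
```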

* Introduce metrics.Registry to pass down registries (#61239)

* Introduce metrics.Registry and use it

* Update lib/metrics/registry.go

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* BlackHole -> BlackHoleRegistry

* merge lib/metrics and lib/observability/metrics

* lint

* address noah's feedback

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

* metrics.Registry.Wrap() handle empty subsystems properly (#61392)

* handle empty subsystems properly

* appeasing our italian engineering team

* Fix build after rebase

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>

Backports #61632

* Generate an Access Monitoring event reference

Closes #60074

Use the getters we declared to access protobuf-generated event structs
in order to create Athena views for Access Monitoring events, populating
a template with the event structs and including examples of `tctl audit
query exec` queries.

Add Make targets to generate the page and check that it was generated.
Add a step to the `Lint (Proto)` job to check that this was generated.

* Clean up the event schema reference generator

- Remove unnecessary line breaks.
- Move `teleport` requirement into the main `require` block.
- Embed the template.
- Accommodate characters that exceed one byte in `prepareDescription`.
- Use a no-allocation alternative to strings.Join in `colNameList`.
…61731)

The tests establish trust between two clusters, connect via SSH
to a host in the leaf, tear down trust, reestablish trust, and
then attempt another SSH connection. The flakiness stemmed from
tearing down and reestablishing trust too quickly without
first properly verifying that the leaf cluster disappeared from
the root before reestablishing trust. While the test did make
use of `helpers.WaitForClusters`, it is satisfied if tunnels
for clusters were healthy within the last 10s. But this could
be true of a stale cluster that was not yet cleaned up. To
remedy this the test was updated to first wait for the reverse
tunnel for the leaf cluster to be completely gone before
reestablishing trust and proceeding.

Closes #60836.
Closes #48823.
As previously written, the error suggests that RemoteFX is not enabled.
This has confused users in cases where RemoteFX looks to be correctly
enabled.

Change the wording to both:
- suggest that a bitmap (i.e. non-RemoteFX) frame was received
- add a little more info about the data that was received

Updates #61061
* Port `gitlab` join method to new join service

This ports the `gitlab` join method to the new join service.

The `gitlab` package was moved to `lib/join/` with minimal changes,
token verification was moved into `lib/join/gitlab` so it could be
reused between both legacy and new endpoints, and a small adapter was
added to provide backwards compatibility.

See also: [RFD 27e](https://github.com/gravitational/teleport.e/blob/master/rfd/0027e-auth-assigned-uuids.md)

* Fix imports

* Add gitlab to whitelist, reorder entries alphabetically

* Rename checkAndSetDefaults() to validate()
* feat: add User Details view

* cleanup
…61656)

* [v18] Fix `TestRevocationService_CRL` flakes with `testing/synctest`

Backport #61277 to branch/v18.

* Fix test hanging on context cancellation
* Add cross-version synctest utility package

* Use Go 1.25 synctest api in tests

* Add depguard rule against synctest

* Appease linter, now that it actually runs
…1643)

The domain field doesn't cause any problems, since we ignore it
when `ad: false`, but it's technically incorrect so we shouldn't
be setting the field at all.
Backports #61682

Fix duplicate meta descriptions and titles in the following guides:
- Encrypted session recording
- AWS IAM Identity Center
- MWI introduction and index page
- Terraform provider reference and how-to guide index pages
- Device Trust guides
… ports (#61812)

* refactor: make get database func required for dbcmd

* refactor: code review suggestions

* refactor: make get database func a required arg
…g semaphore, and preventing early termination of the uploader on failed uploads (#61774)
TestNotifications has been flaky, and I suspect part of it is due
to the way we handle time. Notifications use UUIDv7, which include
a timestamp, so our tests were mixing a fake clock and a real clock.
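
To see why the two clocks collide: per RFC 9562, a UUIDv7 stores a 48-bit big-endian Unix timestamp in milliseconds in its first six bytes, so IDs generated from the real clock carry real time regardless of any fake clock in the test. The helper below is an illustrative sketch using only the standard library; it is not Teleport code, and `uuidV7Time` is a hypothetical name.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
	"time"
)

// uuidV7Time extracts the embedded Unix millisecond timestamp from a UUIDv7
// in its canonical hyphenated string form.
func uuidV7Time(s string) (time.Time, error) {
	b, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil {
		return time.Time{}, err
	}
	if len(b) != 16 {
		return time.Time{}, fmt.Errorf("want 16 bytes, got %d", len(b))
	}
	// The version lives in the high nibble of byte 6.
	if v := b[6] >> 4; v != 7 {
		return time.Time{}, fmt.Errorf("not a version 7 UUID (version %d)", v)
	}
	var ms int64
	for _, x := range b[:6] { // first 48 bits: big-endian unix_ts_ms
		ms = ms<<8 | int64(x)
	}
	return time.UnixMilli(ms).UTC(), nil
}

func main() {
	// Example UUIDv7 from RFC 9562.
	t, err := uuidV7Time("017F22E2-79B0-7CC3-98C4-DC0C0C07398F")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(t.Format(time.RFC3339)) // 2022-02-22T19:22:22Z
}
```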

This commit tests notifications using synctest without a fake clock
in sight.

Additionally, TestNotifications was testing too many things (pagination,
matching, RBAC, state mutation, etc). This commit breaks the test up
into several more-focused tests.

Updates #58392
doggydogworld and others added 26 commits December 29, 2025 20:02
* Adding changelog entry for private release

* Update CHANGELOG.md

Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>

---------

Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>
This consolidates all references to BoringCrypto and FIPS-140-2 to
a single section of the docs, using more generic FIPS terminology
elsewhere. As a result, we only need to change one place when we
update our FIPS module in the future.

In addition, mention the specific versions of BoringCrypto and
associated CMVP certificate numbers.
* types: Add `ComponentFeatures` proto types

* feat: Add handling for `ComponentFeatures` to Auth, Proxy, AppServer

- Impl `ComponentFeatures` for Auth, Proxy, AppServer
- Add helper funcs
- Add/update tests
- Update WebUI types

* docs: Update operator crd manifests, tf docs
* Clean up tctl commands for joining agents

- Rename template var from auth_server to proxy_server in cases
  where we know we're always passing a proxy address.
- Include CA pins when --format flag specifies json or yaml format
- Prefer the proxy address in the joining instructions we print.
  Previously, we would prefer the auth server but would suggest the
  proxy server when we knew we were targeting a cloud cluster.

* Update tool/tctl/common/app_command.go



---------

Co-authored-by: Gus Luxton <gus@goteleport.com>
This change fixes the handling of non-absolute paths in
moderated SFTP.
Add a runtime architecture suffix to the node buildbox OCI image name so
that we do not keep rebuilding it when doing an ARM64 release. ARM64
releases are built on ARM64 hosts, but the node buildbox is for AMD64 so
the ARM64 build always has to rebuild it. Adding a runtime architecture
suffix should prevent this rebuild and make the release faster.
If Identity Security (AKA "Policy" in the entitlement) is not enabled
on the cluster, then the summarizer gRPC service is never instantiated.

This is expected behavior, but we log a warning that has caused
confusion. Remove the warning since there's nothing exceptional about
this case.

Closes #62310
…62579)

* fix: add availability checks for instance metadata methods

* clarify comment

* ensure no race conditions occur
…62544)

* ui: indicate which join tokens are managed by Teleport Cloud

Join tokens with the "teleport.internal/cloud/token" label will be
marked as "managed by cloud" and shown as a different color in the UI.
In addition, the edit button is disabled with a tooltip that explains
the token is managed by us.
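
The UI rule above boils down to a label check. The sketch below expresses it in Go for illustration only (the actual change is in the web UI). It assumes that the mere presence of the `teleport.internal/cloud/token` label, regardless of its value, marks a token as cloud-managed; `isCloudManaged` is a hypothetical name.

```go
package main

import "fmt"

// cloudTokenLabel is the label named in the change description.
const cloudTokenLabel = "teleport.internal/cloud/token"

// isCloudManaged reports whether a join token's labels mark it as managed by
// Teleport Cloud. Assumption: presence of the label is enough.
func isCloudManaged(labels map[string]string) bool {
	_, ok := labels[cloudTokenLabel]
	return ok
}

func main() {
	fmt.Println(isCloudManaged(map[string]string{cloudTokenLabel: "true"})) // true
	fmt.Println(isCloudManaged(map[string]string{"env": "prod"}))          // false
	fmt.Println(isCloudManaged(nil))                                       // false
}
```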

Updates gravitational/cloud#15362

* Update the join token table style

Mute the text in the disabled row.

This requires an override because the table itself hardcodes the
color in the <td> styles.
#62451)

* Add support for standard TLS secret key names for Event Handler helm chart

* Add support for Teleport Cluster helm chart

* Add support for Teleport Operator and Kube Agent; update comments for Teleport Cluster

* Re-render/update docs

* Minor fix for Teleport Cluster chart docs

* Remove redundant default in templates

* Add newline

* Add required and error msgs; standardize teleport-relay chart

* Re-render docs
* docs: Add docs for EKS audit log discovery

Add docs for configuring discovery to discover EKS clusters for
importing audit logs into Access Graph Identity Activity Center.

As support is added for more classes of Kubernetes deployment types
(i.e. more than just EKS), this is expected to move to its own page.
For now, it can be placed with the other Access Graph discovery config
documentation.

* fixup! docs: Add docs for EKS audit log discovery

* fixup! docs: Add docs for EKS audit log discovery

* fixup! docs: Add docs for EKS audit log discovery

* fixup! docs: Add docs for EKS audit log discovery

Add EKS to title and to introductory paragraphs.

* fixup! docs: Add docs for EKS audit log discovery

* fixup! docs: Add docs for EKS audit log discovery
* [terraform] add v8 support to role resource

* Remove V8_SUPPORT_CHANGES.md & update docs

* Update kube_resources tests

* Update tests to use v8 roles

* Implement plan_modifier for kubernetes_resources

* make gen-tfschema

* Re-enable TestRoleVersionUpgrade test

* Revert computed_fields

---------

Co-authored-by: James Goodhouse <4684194+jamesgoodhouse@users.noreply.github.com>
Co-authored-by: teleport-post-release-automation[bot] <128860004+teleport-post-release-automation[bot]@users.noreply.github.com>
Backports #61492

The SSO documentation currently consists of:
- One long conceptual guide
- IdP-specific how-to guides

The conceptual guide holds a collection of topics with no unifying
theme besides Teleport's SSO support, making it difficult to discover
topics for readers who have not made their way through the entire guide.
This change aims to improve the discoverability of SSO topics by:
- Splitting the conceptual guide into multiple pages
- Moving IdP-specific guides into their own subsection
- Moving the Login Rules docs, which only make sense if a user is
  setting up SSO, into the SSO section of the docs.

To split the main conceptual guide, this change:
- Turns the SSO for MFA section into a how-to guide. This section already
  mostly had the structure of a how-to guide, but we couldn't use this
  structure because this text was a section of the conceptual SSO guide.
  By splitting the SSO guide, we can make this a how-to guide.
- Move the "Changing callback address" section into a partial that we
  can include in the IdP-specific how-to guides, since the level of detail within this
  section is consistent with those guides.
- Move "Working with an external identity". This belongs in Role
  Templates, and has the wrong level of detail for an overview of SSO.
  To get this to work, add H3s to the interpolation section of the Role
  Templates guide.
- Move all content re: configuring specific authentication connector
  fields into the section index page for the IdP-specific guides.
- Remove the Troubleshooting section. This text is repeated using a
  partial in every IdP-specific guide, and is the wrong level of detail
  for a general overview.
- Move the Login Rules example to where we include other Login Rule
  examples, in the Login Rules guide.
- Remove the tabbed examples of authentication connectors from the SSO
  index page. There is not nearly enough context on this page to help a
  reader make use of these examples. Instead, refer the reader to the
  IdP-specific how-to guides.
- Move "multiple SSO provider" discussion to its own page.
* Cloud Client IP Restrictions Docs

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: marie <marie.mcallister@goteleport.com>

* client ip restrictions

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>

* Address feedback

* remove image

* remove link

* Update docs/pages/cloud-client-ip-restrictions.mdx

Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>

* update location

* Updates

* add extra explicitly cloud only callout

---------

Co-authored-by: Logan Davis <logan.davis@goteleport.com>
Co-authored-by: Logan Davis <38335829+logand22@users.noreply.github.com>
Co-authored-by: marie <marie.mcallister@goteleport.com>
Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>
@tangyatsu tangyatsu closed this Jan 5, 2026
@tangyatsu tangyatsu deleted the tangyatsu/add-vnet-config-audit-events-backport-v18 branch January 5, 2026 18:21
@tangyatsu tangyatsu restored the tangyatsu/add-vnet-config-audit-events-backport-v18 branch January 5, 2026 18:21