
Teleport 10 Test Plan #13340

Closed
Tracked by #13341
r0mant opened this issue Jun 9, 2022 · 45 comments
Labels: bug, test-plan (a list of tasks required to ship a successful product release)


r0mant commented Jun 9, 2022

Manual Testing Plan

Below are the items that should be manually tested with each release of Teleport.
These tests should be run on both a fresh install of the version to be released
as well as an upgrade of the previous version of Teleport.

  • Adding nodes to a cluster @avatus

    • Adding Nodes via Valid Static Token
    • Adding Nodes via Valid Short-lived Tokens
    • Adding Nodes via Invalid Token Fails
    • Revoking Node Invitation
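The token items above can be driven with tctl command templates; a sketch, assuming a running cluster and an authenticated tctl:

```shell
# static tokens are defined in teleport.yaml (auth_service.tokens);
# short-lived tokens can be minted on the fly:
tctl tokens add --type=node --ttl=5m

# list outstanding tokens, then revoke one to test "Revoking Node Invitation":
tctl tokens ls
tctl tokens rm <token>
```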
  • Labels @avatus

    • Static Labels
    • Dynamic Labels
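Static and dynamic labels are configured on the node side; a minimal ssh_service fragment (the label names and command are examples):

```yaml
ssh_service:
  enabled: yes
  # static labels
  labels:
    env: dev
  # dynamic labels, re-evaluated on a schedule
  commands:
  - name: hostname
    command: [hostname]
    period: 1m
```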
  • Trusted Clusters @EdwardDowling @hugoShaka

    • Adding Trusted Cluster Valid Static Token
    • Adding Trusted Cluster Valid Short-lived Token
    • Adding Trusted Cluster Invalid Token
    • Removing Trusted Cluster
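The trusted-cluster items can be exercised by creating a trusted_cluster resource on the leaf with tctl create; a sketch (addresses and token are placeholders):

```yaml
kind: trusted_cluster
version: v2
metadata:
  name: root-cluster
spec:
  enabled: true
  # join token issued by the root cluster (static or short-lived)
  token: <join-token>
  tunnel_addr: root.example.com:3024
  web_proxy_addr: root.example.com:3080
  role_map:
  - remote: access
    local: [access]
```

Removing the trusted cluster can then be tested with tctl rm against the same resource.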
  • RBAC @alistanis

    Make sure that invalid and valid attempts are reflected in audit log.

    • Successfully connect to node with correct role
    • Unsuccessfully connect to a node in a role restricting access by label
    • Unsuccessfully connect to a node in a role restricting access by invalid SSH login
    • Allow/deny role option: SSH agent forwarding
    • Allow/deny role option: Port forwarding
  • Verify that custom PAM environment variables are available as expected. @xacrimon

  • Users @codingllama

    With every user combination, try to login and signup with invalid second
    factor, invalid password to see how the system reacts.

    WebAuthn in the release tsh binary is implemented using libfido2. Ask for
    a statically built pre-release binary for realistic tests. (tsh fido2 diag
    should work in our binary.)

    Touch ID requires a signed tsh, ask for a signed pre-release binary so you
    may run the tests.

    • Adding Users Password Only

    • Adding Users OTP

    • Adding Users WebAuthn

    • Adding Users Touch ID

    • Managing MFA devices

      • Add an OTP device with tsh mfa add
      • Add a WebAuthn device with tsh mfa add
      • Add a Touch ID device with tsh mfa add
      • List MFA devices with tsh mfa ls
      • Remove an OTP device with tsh mfa rm
      • Remove a WebAuthn device with tsh mfa rm
      • Attempt removing the last MFA device on the user
        • with second_factor: on in auth_service, should fail
        • with second_factor: optional in auth_service, should succeed
    • Login Password Only

    • Login with MFA

      • Add an OTP, a WebAuthn and a Touch ID device with tsh mfa add
      • Login via OTP
      • Login via WebAuthn
      • Login via Touch ID
      • Login via WebAuthn using a U2F device

      U2F devices must be registered in a previous version of Teleport.

      Using Teleport v9, set auth_service.authentication.second_factor = u2f,
      restart the server and then register a U2F device (tsh mfa add). Upgrade
      the install to the current Teleport version (one major at a time) and try to
      login using the U2F device as your second factor - it should work.

    • Login OIDC @Tener

    • Login SAML @Tener

    • Login GitHub @Tener

    • Deleting Users @Tener

  • Backends

  • Session Recording @gabrielcorado

    • Session recording can be disabled
    • Sessions can be recorded at the node
      • Sessions in remote clusters are recorded in remote clusters
    • Sessions can be recorded at the proxy
      • Sessions on remote clusters are recorded in the local cluster
      • Enable/disable host key checking.
  • Audit Log @gabrielcorado

    • Failed login attempts are recorded

    • Interactive sessions have the correct Server ID

      • Server ID is the ID of the node in "session_recording: node" mode
      • Server ID is the ID of the proxy in "session_recording: proxy" mode

      Node/Proxy ID may be found at /var/lib/teleport/host_uuid in the
      corresponding machine.

      Node IDs may also be queried via tctl nodes ls.
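Command templates for the ID cross-checks above:

```shell
# on the node/proxy machine:
cat /var/lib/teleport/host_uuid

# from the auth server:
tctl nodes ls
```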

    • Exec commands are recorded

    • scp commands are recorded

    • Subsystem results are recorded

      Subsystem testing may be achieved using both
      Recording Proxy mode
      and
      OpenSSH integration.

      Assuming the proxy is proxy.example.com:3023 and node1 is a node running
      OpenSSH/sshd, you may use the following command to trigger a subsystem audit
      log:

      sftp -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" root@node1
  • Interact with a cluster using tsh @alistanis @hugoShaka

    These commands should ideally be tested in both recording and non-recording modes, as they are implemented in different ways.

    • tsh ssh <regular-node>
    • tsh ssh <node-remote-cluster>
    • tsh ssh -A <regular-node>
    • tsh ssh -A <node-remote-cluster>
    • tsh ssh <regular-node> ls
    • tsh ssh <node-remote-cluster> ls
    • tsh join <regular-node>
    • tsh join <node-remote-cluster>
    • tsh play <regular-node>
    • tsh play <node-remote-cluster>
    • tsh scp <regular-node>
    • tsh scp <node-remote-cluster>
    • tsh ssh -L <regular-node>
    • tsh ssh -L <node-remote-cluster>
    • tsh ls
    • tsh clusters
  • Interact with a cluster using ssh @Joerger
    Make sure to test both recording and regular proxy modes.

    • ssh <regular-node>
    • ssh <node-remote-cluster>
    • ssh -A <regular-node>
    • ssh -A <node-remote-cluster>
    • ssh <regular-node> ls
    • ssh <node-remote-cluster> ls
    • scp <regular-node>
    • scp <node-remote-cluster>
    • ssh -L <regular-node>
    • ssh -L <node-remote-cluster>
  • Verify proxy jump functionality @Joerger
    Log into leaf cluster via root, shut down the root proxy and verify proxy jump works.

    • tsh ssh -J <leaf-proxy>
    • ssh -J <leaf-proxy>
  • Interact with a cluster using the Web UI @Joerger

    • Connect to a Teleport node
    • Connect to an OpenSSH node
    • Check agent forwarding is correct based on role and proxy mode.

User accounting @xacrimon

  • Verify that active interactive sessions are tracked in /var/run/utmp on Linux.
  • Verify that interactive sessions are logged in /var/log/wtmp on Linux.
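A sketch for checking the accounting databases, assuming a Linux host with standard tooling:

```shell
# active interactive sessions (reads /var/run/utmp)
who

# historical logins (reads /var/log/wtmp)
last -f /var/log/wtmp
```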

Combinations @capnspacehook

For some manual testing, many combinations need to be tested. For example, for
interactive sessions the 12 combinations are below.

  • Connect to an OpenSSH node in a local cluster using OpenSSH.
  • Connect to an OpenSSH node in a local cluster using Teleport.
  • Connect to an OpenSSH node in a local cluster using the Web UI.
  • Connect to a Teleport node in a local cluster using OpenSSH.
  • Connect to a Teleport node in a local cluster using Teleport.
  • Connect to a Teleport node in a local cluster using the Web UI.
  • Connect to an OpenSSH node in a remote cluster using OpenSSH.
  • Connect to an OpenSSH node in a remote cluster using Teleport.
  • Connect to an OpenSSH node in a remote cluster using the Web UI.
  • Connect to a Teleport node in a remote cluster using OpenSSH.
  • Connect to a Teleport node in a remote cluster using Teleport.
  • Connect to a Teleport node in a remote cluster using the Web UI.

Teleport with EKS/GKE @tigrato

  • Deploy Teleport on a single EKS cluster
  • Deploy Teleport on two EKS clusters and connect them via trusted cluster feature
  • Deploy Teleport Proxy outside of GKE cluster fronting connections to it (use this script to generate a kubeconfig)
  • Deploy Teleport Proxy outside of EKS cluster fronting connections to it (use this script to generate a kubeconfig)

Teleport with multiple Kubernetes clusters @tigrato

Note: you can use GKE or EKS or minikube to run Kubernetes clusters.
The only caveat is minikube: it's not reachable publicly, so don't run a proxy there.

  • Deploy combo auth/proxy/kubernetes_service outside of a Kubernetes cluster, using a kubeconfig
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy/kubernetes_service inside of a Kubernetes cluster
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy_service outside of the Kubernetes cluster and kubernetes_service inside of a Kubernetes cluster, connected over a reverse tunnel
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy a second kubernetes_service inside of another Kubernetes cluster, connected over a reverse tunnel
    • Login with tsh login, check that tsh kube ls has both clusters
    • Switch to a second cluster using tsh kube login
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh on the new cluster
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy/kubernetes_service outside of a Kubernetes cluster, using a kubeconfig with multiple clusters in it
    • Login with tsh login, check that tsh kube ls has all clusters
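Each deployment variant above can be exercised with the same command sequence; a template (cluster and pod names are placeholders):

```shell
tsh login --proxy=proxy.example.com --user=<username>
tsh kube ls                     # expect the registered cluster(s) to be listed
tsh kube login <kube-cluster>   # switch clusters when more than one is registered
kubectl get nodes
kubectl exec -it $SOME_POD -- sh
```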
  • Test Kubernetes screen in the web UI (tab is located on left side nav on dashboard):
    • Verify that all kubes registered are shown with correct name and labels
    • Verify that clicking on a row's Connect button renders a dialogue with manual instructions, with the Step 2 login value matching the row's Name column
    • Verify searching for name or labels in the search bar works
    • Verify you can sort by name column

Teleport with FIPS mode @alistanis @r0mant

  • Perform trusted clusters, Web and SSH sanity check with all teleport components deployed in FIPS mode.

ACME @rudream

  • Teleport can fetch TLS certificate automatically using ACME protocol.
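A minimal proxy_service fragment enabling ACME (public_addr and email are placeholders):

```yaml
proxy_service:
  enabled: yes
  public_addr: teleport.example.com:443
  acme:
    enabled: yes
    email: admin@example.com
```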

Migrations @hugoShaka

  • Migrate trusted clusters from 9.3 to 10.0
    • Migrate auth server on main cluster, then rest of the servers on main cluster
      SSH should work for both main and old clusters
    • Migrate auth server on remote cluster, then rest of the remote cluster
      SSH should work

Command Templates

When interacting with a cluster, the following command templates are useful:

OpenSSH

# when connecting to the recording proxy, `-o 'ForwardAgent yes'` is required.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# the above command only forwards the agent to the proxy, to forward the agent
# to the target node, `-o 'ForwardAgent yes'` needs to be passed twice.
ssh -o "ForwardAgent yes" \
  -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# when connecting to a remote cluster using OpenSSH, the subsystem request is
# updated with the name of the remote cluster.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p@foo.com" \
  node.foo.com

Teleport

# when connecting to an OpenSSH node, remember `-p 22` needs to be passed.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -p 22 node.example.com

# an agent can be forwarded to the target node with `-A`
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -A -p 22 node.example.com

# the --cluster flag is used to connect to a node in a remote cluster.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh --cluster=foo.com -p 22 node.foo.com

Teleport with SSO Providers @ptgott @Tener

  • G Suite install instructions work
    • G Suite Screenshots are up to date
  • Azure Active Directory (AD) install instructions work
    • Azure Active Directory (AD) Screenshots are up to date
  • ActiveDirectory (ADFS) install instructions work
    • Active Directory (ADFS) Screenshots are up to date
  • Okta install instructions work
    • Okta Screenshots are up to date
  • OneLogin install instructions work
    • OneLogin Screenshots are up to date
  • GitLab install instructions work
    • GitLab Screenshots are up to date
  • OIDC install instructions work
    • OIDC Screenshots are up to date
  • All providers with guides in docs are covered in this test plan

tctl sso family of commands @Tener

tctl sso configure helps to construct a valid connector definition:

  • tctl sso configure github ... creates valid connector definitions
  • tctl sso configure oidc ... creates valid connector definitions
  • tctl sso configure saml ... creates valid connector definitions

tctl sso test tests a provided connector definition, which can be loaded from a
file or piped in from tctl sso configure or tctl get --with-secrets. Valid
connectors are accepted; invalid ones are rejected with sensible error messages.

  • Connectors can be tested with tctl sso test.
    • GitHub
    • SAML
    • OIDC
      • Google Workspace
      • Non-Google IdP
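Command templates for chaining the two commands, per the description above (the `...` stands for the connector-specific flags):

```shell
# pipe a freshly generated definition straight into the tester
tctl sso configure github ... | tctl sso test

# or test an existing connector definition exported with secrets
tctl get connectors --with-secrets > connector.yaml
tctl sso test connector.yaml
```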

Teleport Plugins @marcoandredinis

  • Test receiving a message via Teleport Slackbot
  • Test receiving a new Jira Ticket via Teleport Jira

AWS Node Joining @nklaassen

Docs

  • On EC2 instance with ec2:DescribeInstances permissions for local account:
    TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin
  • On EC2 instance with any attached role:
    TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin
  • EC2 Join method in IoT mode with node and auth in different AWS accounts
  • IAM Join method in IoT mode with node and auth in different AWS accounts
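The join methods above are driven by a token resource; a sketch for the IAM method (account and ARN values are placeholders):

```yaml
kind: token
version: v2
metadata:
  name: iam-join-token
spec:
  roles: [Node]
  join_method: iam
  allow:
  - aws_account: "111111111111"
    aws_arn: "arn:aws:sts::111111111111:assumed-role/<role>/*"
```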

Passwordless @r0mant @espadolini

Passwordless requires tsh compiled with libfido2 for most operations (apart
from Touch ID). Ask for a statically-built tsh binary for realistic tests.

Touch ID requires a properly built and signed tsh binary. Ask for a
pre-release binary so you may run the tests.

This section complements "Users -> Managing MFA devices". Ideally both macOS
and Linux tsh binaries are tested for FIDO2 items.

  • Diagnostics

    Both commands should pass all tests.

    • tsh fido2 diag
    • tsh touchid diag
  • Registration

    • Register a passwordless FIDO2 key (tsh mfa add, choose WEBAUTHN and
      passwordless)
    • Register a Touch ID credential (tsh mfa add, choose TOUCHID)
  • Login

    • Passwordless login using FIDO2 (tsh login --auth=passwordless)
    • Passwordless login using Touch ID (tsh login --auth=passwordless)
    • tsh login --auth=passwordless --mfa-mode=cross-platform uses FIDO2
    • tsh login --auth=passwordless --mfa-mode=platform uses Touch ID
    • tsh login --auth=passwordless --mfa-mode=auto prefers Touch ID
    • Passwordless disable switch works
      (auth_service.authentication.passwordless = false)
    • Cluster in passwordless mode defaults to passwordless
      (auth_service.authentication.connector_name = passwordless)
    • Cluster in passwordless mode allows MFA login
      (tsh login --auth=local)
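The passwordless toggles referenced above live under auth_service.authentication; a fragment for reference (rp_id is a placeholder):

```yaml
auth_service:
  authentication:
    type: local
    second_factor: on
    webauthn:
      rp_id: example.com
    # disable switch for the test above
    passwordless: false
    # or, for a cluster in passwordless mode:
    # connector_name: passwordless
```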
  • Touch ID support commands

    • tsh touchid ls works
    • tsh touchid rm works (careful, may lock you out!)

WEB UI @kimlisa @rudream @hatched

Main

For main, test with a role that has access to all resources.

Top Nav

  • Verify that cluster selector displays all (root + leaf) clusters
  • Verify that user name is displayed
  • Verify that the user menu shows logout, help & support, and account settings (for local users)

Side Nav

  • Verify that each item has an icon
  • Verify that Collapse/Expand works: collapsed shows the > icon and expanded shows the v icon
  • Verify that it automatically expands and highlights the item on page refresh

Servers aka Nodes

  • Verify that "Servers" table shows all joined nodes
  • Verify that "Connect" button shows a list of available logins
  • Verify that "Hostname", "Address" and "Labels" columns show the current values
  • Verify that "Search" by hostname, address, labels works
  • Verify that terminal opens when clicking on one of the available logins
  • Verify that clicking on the Add Server button renders a dialogue set to the Automatically view
    • Verify clicking on Regenerate Script regenerates token value in the bash command
    • Verify using the bash command successfully adds the server (refresh server list)
    • Verify that clicking on Manually tab renders manual steps
    • Verify that clicking back to Automatically tab renders bash command

Applications

  • Verify that clicking on the Add Application button renders a dialogue
    • Verify input validation (prevent empty value and invalid url)
    • Verify after input and clicking on Generate Script, bash command is rendered
    • Verify clicking on Regenerate button regenerates token value in bash command

Databases

  • Verify that clicking on the Add Database button renders a dialogue with manual instructions:
    • Verify selecting different options on Step 4 changes Step 5 commands

Active Sessions

  • Verify that "empty" state is handled
  • Verify that it displays the session when session is active
  • Verify that "Description", "Session ID", "Users", "Nodes" and "Duration" columns show correct values
  • Verify that the "OPTIONS" button allows you to join a session

Audit log

  • Verify that time range button is shown and works
  • Verify that clicking on the Session Ended event icon takes the user to the session player
  • Verify event detail dialogue renders when clicking on events details button
  • Verify that searching by type, description, and created time works

Users

  • Verify that users are shown
  • Verify that creating a new user works
  • Verify that editing user roles works
  • Verify that removing a user works
  • Verify resetting a user's password works
  • Verify search by username, roles, and type works

Auth Connectors

  • Verify that the empty state renders when there are no connectors
  • Verify that creating OIDC/SAML/GITHUB connectors works
  • Verify that editing OIDC/SAML/GITHUB connectors works
  • Verify that error is shown when saving an invalid YAML
  • Verify that correct hint text is shown on the right side
  • Verify that encrypted SAML assertions work with an identity provider that supports it (Azure).
  • Verify that created GitHub, SAML, and OIDC cards have their icons

Roles

  • Verify that roles are shown
  • Verify that "Create New Role" dialog works
  • Verify that deleting and editing works
  • Verify that error is shown when saving an invalid YAML
  • Verify that correct hint text is shown on the right side

Managed Clusters

  • Verify that it displays a list of clusters (root + leaf)
  • Verify that every menu item works: nodes, apps, audit events, session recordings, etc.

Help & Support

  • Verify that all URLs work and are correct (no 404s)

Access Requests

Access Requests are an Enterprise feature and are not available in OSS.

Creating Access Requests (Role Based)

Create a role named allow-roles-and-nodes with limited permissions. This role allows you to see the Roles screen and ssh into all nodes.

kind: role
metadata:
  name: allow-roles-and-nodes
spec:
  allow:
    logins:
    - root
    node_labels:
      '*': '*'
    rules:
    - resources:
      - role
      verbs:
      - list
      - read
  options:
    max_session_ttl: 8h0m0s
version: v5

Create another role named allow-users-with-short-ttl with limited permissions. This role's session expires in 4 minutes; it allows you to see the Users screen and denies access to all nodes.

kind: role
metadata:
  name: allow-users-with-short-ttl
spec:
  allow:
    rules:
    - resources:
      - user
      verbs:
      - list
      - read
  deny:
    node_labels:
      '*': '*'
  options:
    max_session_ttl: 4m0s
version: v5

Create a user that has no access to anything but allows you to request roles:

kind: role
metadata:
  name: test-role-based-requests
spec:
  allow:
    request:
      roles:
      - allow-roles-and-nodes
      - allow-users-with-short-ttl
      suggested_reviewers:
      - random-user-1
      - random-user-2
version: v5
  • Verify that under requestable roles, only allow-roles-and-nodes and allow-users-with-short-ttl are listed
  • Verify you can select/input/modify reviewers
  • Verify you can view the request you created from the request list (it should be in the pending state)
  • Verify there is a list of the reviewers you selected (the list is empty if none were selected AND suggested_reviewers wasn't defined)
  • Verify you can't review your own requests

Creating Access Requests (Search Based)

Create a role with access to searchable resources (apps, dbs, kubes, nodes, desktops). The template searcheable-resources is below.

kind: role
metadata:
  name: searcheable-resources
spec:
  allow:
    app_labels:  # example labels
      label1-key: label1-value
      env: [dev, staging]
    db_labels:
      '*': '*'   # asterisks give the user access to everything
    kubernetes_labels:
      '*': '*' 
    node_labels:
      '*': '*'
    windows_desktop_labels:
      '*': '*'
version: v5

Create a user that has no access to resources, but allows you to search them:

kind: role
metadata:
  name: test-search-based-requests
spec:
  allow:
    request:
      search_as_roles:
      - searcheable-resources
      suggested_reviewers:
      - random-user-1
      - random-user-2
version: v5
  • Verify that a user can see resources based on the searcheable-resources rules
  • Verify you can select/input/modify reviewers
  • Verify you can view the request you created from the request list (it should be in the pending state)
  • Verify there is a list of the reviewers you selected (the list is empty if none were selected AND suggested_reviewers wasn't defined)
  • Verify you can't review your own requests
  • Verify that you can't mix adding resources from different clusters (there should be a warning dialogue that clears the selected list)

Viewing & Approving/Denying Requests

Create a user with the role reviewer, which allows you to review all requests and delete them.

kind: role
version: v3
metadata:
  name: reviewer
spec:
  allow:
    review_requests:
      roles: ['*']
  • Verify you can view access request from request list
  • Verify you can approve a request with message, and immediately see updated state with your review stamp (green checkmark) and message box
  • Verify you can deny a request, and immediately see updated state with your review stamp (red cross)
  • Verify that deleting a denied request removes it from the list

Assuming Approved Requests (Role Based)

  • Verify that assuming allow-roles-and-nodes allows you to see roles screen and ssh into nodes
  • After assuming allow-roles-and-nodes, verify that assuming allow-users-with-short-ttl allows you to see the Users screen and denies access to nodes
    • Verify a switchback banner is rendered with the roles assumed and a countdown to when it expires
    • Verify switching back goes back to your default static role
    • Verify that after re-assuming the allow-users-with-short-ttl role, the user is automatically logged out after the expiry is met (4 minutes)

Assuming Approved Requests (Search Based)

  • Verify that assuming an approved request allows you to see the resources you've requested.

Assuming Approved Requests (Both)

  • Verify assume buttons are only present for approved requests belonging to the logged-in user
  • Verify that after clicking the assume button, it is disabled both in the list and in the request view
  • Verify that after re-login, requests that are approved and not expired are assumable again

Access Request Waiting Room

Strategy Reason

Create the following role:

kind: role
metadata:
  name: waiting-room
spec:
  allow:
    request:
      roles:
      - <some other role to assign user after approval>
  options:
    max_session_ttl: 8h0m0s
    request_access: reason
    request_prompt: <some custom prompt to show in reason dialogue>
version: v3
  • Verify that after login, the reason dialogue is rendered with the prompt set to the request_prompt setting
  • Verify after clicking send request, pending dialogue renders
  • Verify after approving a request, dashboard is rendered
  • Verify the correct role was assigned

Strategy Always

With the previous role you created from Strategy Reason, change request_access to always:

  • Verify after login, pending dialogue is auto rendered
  • Verify after approving a request, dashboard is rendered
  • Verify after denying a request, access denied dialogue is rendered
  • Verify a switchback banner is rendered with the roles assumed and a countdown to when it expires
  • Verify switchback button says Logout and clicking goes back to the login screen

Strategy Optional

With the previous role you created from Strategy Reason, change request_access to optional:

  • Verify after login, dashboard is rendered as normal

Terminal

  • Verify that top nav has a user menu (Main and Logout)
  • Verify that switching between tabs works with alt+[1...9]

Node List Tab

  • Verify that Cluster selector works (URL should change too)
  • Verify that Quick launcher input works
  • Verify that Quick launcher input handles input errors
  • Verify that "Connect" button shows a list of available logins
  • Verify that "Hostname", "Address" and "Labels" columns show the current values
  • Verify that "Search" by hostname, address, labels work
  • Verify that new tab is created when starting a session

Session Tab

  • Verify that session and browser tabs both show the title with login and node name
  • Verify that terminal resize works
    • Install midnight commander on the node you ssh into: $ sudo apt-get install mc
    • Run the program: $ mc
    • Resize the terminal to see if panels resize with it
  • Verify that session tab shows/updates number of participants when a new user joins the session
  • Verify that tab automatically closes on "$ exit" command
  • Verify that SCP Upload works
  • Verify that SCP Upload handles invalid paths and network errors
  • Verify that SCP Download works
  • Verify that SCP Download handles invalid paths and network errors

Session Player

  • Verify that it can replay a session
  • Verify that when playing, the scroller auto-scrolls to the bottommost content
  • Verify that when resizing the player to a small screen, the scroller appears and works
  • Verify that an error message is displayed (enter an invalid SID in the URL)

Invite and Reset Form

  • Verify that input validates
  • Verify that invite works with 2FA disabled
  • Verify that invite works with OTP enabled
  • Verify that invite works with U2F enabled
  • Verify that invite works with WebAuthn enabled
  • Verify that error message is shown if an invite is expired/invalid

Login Form and Change Password

  • Verify that input validates
  • Verify that login works with 2FA disabled
  • Verify that changing passwords works for 2FA disabled
  • Verify that login works with OTP enabled
  • Verify that changing passwords works for OTP enabled
  • Verify that login works with U2F enabled
  • Verify that changing passwords works for U2F enabled
  • Verify that login works with WebAuthn enabled
  • Verify that changing passwords works for WebAuthn enabled
  • Verify that login works for Github/SAML/OIDC
  • Verify that redirect to original URL works after successful login
  • Verify that account is locked after several unsuccessful login attempts
  • Verify that account is locked after several unsuccessful change password attempts

Multi-factor Authentication (mfa)

Create/modify teleport.yaml and set the following authentication settings under auth_service

authentication:
  type: local
  second_factor: optional
  require_session_mfa: yes
  webauthn:
    rp_id: example.com

MFA invite, login, password reset, change password

  • Verify that during invite/reset, the second factor list shows all auth types: none, hardware key, and authenticator app
  • Verify registration works with all option types
  • Verify login with all option types
  • Verify changing password with all option types
  • Change the second_factor type to on and verify that MFA is required (no "none" option in the dropdown)

MFA require auth

Go to Account Settings > Two-Factor Devices and register a new device

Using the same user as above:

  • Verify logging in with registered WebAuthn key works
  • Verify that connecting to an ssh node prompts you to tap your registered WebAuthn key
  • Verify in the web terminal, you can scp upload/download files

MFA Management

  • Verify adding first device works without requiring re-authentication
  • Verify re-authenticating with a WebAuthn device works
  • Verify re-authenticating with a U2F device works
  • Verify re-authenticating with an OTP device works
  • Verify adding a WebAuthn device works
  • Verify adding a U2F device works
  • Verify adding an OTP device works
  • Verify removing a device works
  • Verify second_factor set to off disables adding devices

Passwordless

  • Pure passwordless registrations and resets are possible
  • Verify adding a passwordless device (WebAuthn)
  • Verify passwordless logins

Cloud

From your cloud staging account, change the field teleportVersion to the test version.

$ kubectl -n <namespace> edit tenant

Recovery Code Management

  • Verify generating recovery codes for local accounts with email usernames works
  • Verify local accounts with non-email usernames are not able to generate recovery codes
  • Verify SSO accounts are not able to generate recovery codes

Invite/Reset

  • Verify that email usernames render the recovery codes dialog
  • Verify that non-email usernames do not render the recovery codes dialog

Recovery Flow: Add new mfa device

  • Verify recovering (adding) a new hardware key device with password
  • Verify recovering (adding) a new otp device with password
  • Verify viewing and deleting any old device (but not the one just added)
  • Verify new recovery codes are rendered at the end of flow

Recovery Flow: Change password

  • Verify recovering password with any mfa device
  • Verify new recovery codes are rendered at the end of flow

Recovery Email

  • Verify receiving email for link to start recovery
  • Verify receiving email for successfully recovering
  • Verify email link is invalid after successful recovery
  • Verify receiving email for locked account when max attempts reached

RBAC

Create a role, with no allow.rules defined:

kind: role
metadata:
  name: rbac
spec:
  allow:
    app_labels:
      '*': '*'
    logins:
    - root
    node_labels:
      '*': '*'
  options:
    max_session_ttl: 8h0m0s
version: v3
  • Verify that a user has access only to: "Servers", "Applications", "Databases", "Kubernetes", "Active Sessions", "Access Requests" and "Manage Clusters"
  • Verify there is no Add Server, Application, Database, or Kubernetes button in each respective view
  • Verify only Servers, Apps, Databases, and Kubernetes are listed under options button in Manage Clusters

Note: the user has read/create access_request access to their own requests, despite the resource settings above.

Add the following under spec.allow.rules to enable read access to the audit log:

  - resources:
      - event
      verbs:
      - list
  • Verify that the Audit Log and Session Recordings are accessible
  • Verify that playing a recorded session is denied

Add the following to enable read access to recorded sessions:

  - resources:
      - session
      verbs:
      - read
  • Verify that a user can re-play a session (session.end)

Add the following to enable read access to the roles:

  - resources:
      - role
      verbs:
      - list
      - read
  • Verify that a user can see the roles
  • Verify that a user cannot create/delete/update a role

Add the following to enable read access to the auth connectors

  - resources:
      - auth_connector
    verbs:
      - list
      - read
  • Verify that a user can see the list of auth connectors.
  • Verify that a user cannot create/delete/update the connectors

Add the following to enable read access to users

  - resources:
      - user
    verbs:
      - list
      - read
  • Verify that a user can access the "Users" screen
  • Verify that a user cannot reset password and create/delete/update a user

Add the following to enable read access to trusted clusters

  - resources:
      - trusted_cluster
    verbs:
      - list
      - read
  • Verify that a user can access the "Trust" screen
  • Verify that a user cannot create/delete/update a trusted cluster.
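Taken together, the incremental rule additions above produce a role like the following (a sketch assembled from the snippets in this section):

```yaml
kind: role
version: v3
metadata:
  name: rbac
spec:
  allow:
    app_labels:
      '*': '*'
    logins:
      - root
    node_labels:
      '*': '*'
    rules:
      - resources: [event]
        verbs: [list]
      - resources: [session]
        verbs: [read]
      - resources: [role]
        verbs: [list, read]
      - resources: [auth_connector]
        verbs: [list, read]
      - resources: [user]
        verbs: [list, read]
      - resources: [trusted_cluster]
        verbs: [list, read]
  options:
    max_session_ttl: 8h0m0s
```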

Performance/Soak Test @rosstimothy @espadolini

Using the tsh bench tool, perform the soak and benchmark tests on the following configurations:

  • Cluster with 10K nodes in normal (non-IOT) node mode with ETCD

  • Cluster with 10K nodes in normal (non-IOT) mode with DynamoDB

  • Cluster with 1K IOT nodes with ETCD

  • Cluster with 1K IOT nodes with DynamoDB

  • Cluster with 500 trusted clusters with ETCD

  • Cluster with 500 trusted clusters with DynamoDB

Soak Tests

Run a 4-hour soak test with a mix of interactive/non-interactive sessions:

tsh bench --duration=4h user@teleport-monster-6757d7b487-x226b ls
tsh bench -i --duration=4h user@teleport-monster-6757d7b487-x226b ps uax

Observe Prometheus metrics for goroutines, open files, RAM, CPU, and timers, and make sure there are no leaks.

  • Verify that prometheus metrics are accurate.
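Prometheus can only observe these metrics if Teleport exposes them; a sketch, assuming Teleport was started with --diag-addr=0.0.0.0:3000 and that the target hostname below is a placeholder:

```yaml
# prometheus.yml fragment - scrape the Teleport diagnostics endpoint
scrape_configs:
  - job_name: teleport
    scrape_interval: 15s
    static_configs:
      - targets: ['teleport-host:3000']  # placeholder host; port from --diag-addr
```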

Breaking load tests

Load the system to capacity with tsh bench and publish the maximum number of concurrent sessions for both
interactive and non-interactive tsh bench loads.

Teleport with Cloud Providers

AWS @lxea

GCP @EdwardDowling

  • Deploy Teleport to GCP. Using Cloud Firestore & Cloud Storage
  • Deploy Teleport to GKE. Google Kubernetes engine.
  • Deploy Teleport Enterprise to GCP.

IBM @r0mant

  • Deploy Teleport to IBM Cloud. Using IBM Database for etcd & IBM Object Store
  • Deploy Teleport to IBM Cloud Kubernetes.
  • Deploy Teleport Enterprise to IBM Cloud.

Application Access @strideynet

  • Run an application within local cluster.
    • Verify the debug application debug_app: true works.
    • Verify an application can be configured with command line flags.
    • Verify an application can be configured from file configuration.
    • Verify that applications are available at the auto-generated addresses name.rootProxyPublicAddr as well as at publicAddr.
  • Run an application within a trusted cluster.
    • Verify that applications are available at auto-generated addresses name.rootProxyPublicAddr.
  • Verify Audit Records.
    • app.session.start and app.session.chunk events are created in the Audit Log.
    • app.session.chunk points to a 5 minute session archive with multiple app.session.request events inside.
    • tsh play <chunk-id> can fetch and print a session chunk archive.
  • Verify JWT using verify-jwt.go.
  • Verify RBAC.
  • Verify CLI access with tsh app login.
  • Verify AWS console access.
    • Can log into AWS web console through the web UI.
    • Can interact with AWS using tsh aws commands.
  • Verify dynamic registration.
    • Can register a new app using tctl create.
    • Can update registered app using tctl create -f.
    • Can delete registered app using tctl rm.
  • Test Applications screen in the web UI (tab is located on left side nav on dashboard):
    • Verify that all apps registered are shown
    • Verify that clicking on the app icon takes you to another tab
    • Verify using the bash command produced from Add Application dialogue works (refresh app screen to see it registered)
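For the dynamic registration checks, a minimal app resource to feed to tctl create might look like this (a sketch; the name, URI, and labels are placeholders):

```yaml
kind: app
version: v3
metadata:
  name: grafana
  description: Test application
  labels:
    env: test
spec:
  uri: http://localhost:3000
```

Register it with tctl create app.yaml, update it with tctl create -f app.yaml, and remove it with tctl rm app/grafana.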

Database Access @smallinsky

  • Connect to a database within a local cluster.
  • Connect to a database within a remote cluster via a trusted cluster.
  • Verify audit events @Tener
    • db.session.start is emitted when you connect.
    • db.session.end is emitted when you disconnect.
    • db.session.query is emitted when you execute a SQL query.
  • Verify RBAC @smallinsky
    • tsh db ls shows only databases matching role's db_labels.
    • Can only connect as users from db_users.
    • (Postgres only) Can only connect to databases from db_names.
      • db.session.start is emitted when connection attempt is denied.
    • (MongoDB only) Can only execute commands in databases from db_names.
      • db.session.query is emitted when command fails due to permissions.
    • Can configure per-session MFA.
      • MFA tap is required on each tsh db connect.
  • Verify dynamic registration @Tener
    • Can register a new database using tctl create.
    • Can update registered database using tctl create -f.
    • Can delete registered database using tctl rm.
  • Verify discovery @greedy52
    • Can detect and register RDS instances.
    • Can detect and register Aurora clusters, and their reader and custom endpoints.
    • Can detect and register Redshift clusters.
    • Can detect and register ElastiCache Redis clusters.
  • Test Databases screen in the web UI (tab is located on left side nav on dashboard): @gabrielcorado
    • Verify that all dbs registered are shown with correct name, description, type, and labels
    • Verify that clicking a row's Connect button renders a dialog with manual connection instructions, with the Step 2 login value matching the row's name column
    • Verify searching for all columns in the search bar works
    • Verify you can sort by all columns except labels
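As with apps, the dynamic registration checks can start from a minimal database resource (a sketch; the name, URI, and labels are placeholders):

```yaml
kind: db
version: v3
metadata:
  name: example-postgres
  description: Test PostgreSQL instance
  labels:
    env: test
spec:
  protocol: postgres
  uri: localhost:5432
```

Register it with tctl create db.yaml, update it with tctl create -f db.yaml, and remove it with tctl rm db/example-postgres.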

TLS Routing @smallinsky

  • Verify that teleport proxy v2 configuration starts only a single listener.
    version: v2
    proxy_service:
      enabled: "yes"
      public_addr: ['root.example.com']
      web_listen_addr: 0.0.0.0:3080
  • Run Teleport Proxy in multiplex mode auth_service.proxy_listener_mode: "multiplex"
    • Trusted cluster
      • Setup trusted clusters using single port setup web_proxy_addr == tunnel_addr
      kind: trusted_cluster
      spec:
        ...
        web_proxy_addr: root.example.com:443
        tunnel_addr: root.example.com:443
        ...
      
  • Database Access
    • Verify that tsh db connect works through proxy running in multiplex mode
      • Postgres
      • MySQL
      • MariaDB
      • MongoDB
      • CockroachDB
    • Verify connecting to a database through TLS ALPN SNI local proxy tsh db proxy with a GUI client.
  • Application Access
    • Verify app access through proxy running in multiplex mode
  • SSH Access
    • Connect to an OpenSSH server through a local ssh proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" [email protected]
    • Connect to an OpenSSH server on a leaf cluster through a local ssh proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" [email protected]
    • Verify tsh ssh access through proxy running in multiplex mode
  • Kubernetes access: @GavinFrazar
    • Verify kubernetes access through proxy running in multiplex mode
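The ad-hoc ProxyCommand invocations above can also be made persistent in ~/.ssh/config, so that plain ssh goes through the local proxy (a sketch; the host pattern and cluster name are placeholders):

```
# ~/.ssh/config fragment
Host *.root.example.com
    ForwardAgent yes
    ProxyCommand tsh proxy ssh --user=%r --cluster=root %h:%p
```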

Desktop Access

Basic Sessions (@LKozlowski)

  • Direct mode (set listen_addr):
    • Can connect to desktop defined in static hosts section.
    • Can connect to desktop discovered via LDAP
  • IoT mode (reverse tunnel through proxy):
    • Can connect to desktop defined in static hosts section.
    • Can connect to desktop discovered via LDAP
  • Connect multiple windows_desktop_services to the same Teleport cluster,
    verify that connections to desktops on different AD domains works. (Attempt to
    connect several times to verify that you are routed to the correct
    windows_desktop_service)

User Input (@ibeckermayer)

  • Verify user input

    • Download Keyboard Key Info and
      verify all keys are processed correctly in each supported browser. Known
      issues: F11 cannot be captured by the browser without
      special configuration
      on MacOS.
    • Left click and right click register as Windows clicks. (Right click on
      the desktop should show a Windows menu, not a browser context menu)
    • Vertical and horizontal scroll work.
      Horizontal Scroll Test
  • Locking and access (@ibeckermayer)

    • Verify that placing a user lock terminates an active desktop session.
    • Verify that placing a desktop lock terminates an active desktop session.
    • Verify that placing a role lock terminates an active desktop session.
    • Verify that connecting to a locked desktop fails.
    • Set client_idle_timeout to a small value and verify that idle sessions
      are terminated (the session should end and an audit event will confirm it
      was due to idle connection)
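The lock scenarios above can be driven with tctl lock or by creating a lock resource directly; a sketch of a user-targeted lock (the name, user, and message are placeholders; swap the target for a role or windows_desktop to cover the other cases):

```yaml
kind: lock
version: v2
metadata:
  name: test-plan-lock
spec:
  target:
    user: alice
  message: "locked for desktop access testing"
```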
  • Labeling (@LKozlowski)

    • All desktops have teleport.dev/origin label.
    • Dynamic desktops have additional teleport.dev labels for OS, OS
      Version, DNS hostname, and OU.
    • Regexp-based host labeling applies across all desktops, regardless of
      origin.
    • LDAP attribute labeling functions correctly
  • RBAC (@zmb3)

    • RBAC denies access to a Windows desktop due to labels
    • RBAC denies access to a Windows desktop with the wrong OS-login.
  • Clipboard Support (@zmb3)

    • When a user has a role with clipboard sharing enabled and is using a chromium based browser
      • Going to a desktop when clipboard permissions are in "Ask" mode (aka "prompt") causes the browser to show a prompt while the UI shows a spinner
      • X-ing out of the prompt (causing the clipboard permission to remain in "Ask" mode) causes the prompt to show up again
      • Denying clipboard permissions brings up a relevant error alert (with "Clipboard Sharing Disabled" in the top bar)
      • Allowing clipboard permissions allows you to see the desktop session, with "Clipboard Sharing Enabled" highlighted in the top bar
      • Copy text from local workstation, paste into remote desktop
      • Copy text from remote desktop, paste into local workstation
      • Copying unicode text also works in both directions
    • When a user has a role with clipboard sharing enabled and is not using a chromium based browser
      • The UI shows a relevant alert and "Clipboard Sharing Disabled" is highlighted in the top bar
    • When a user has a role with clipboard sharing disabled and is using a chromium and non-chromium based browser (confirm both)
      • The live session should show disabled in the top bar and copy/paste should not work between your workstation and the remote desktop.
  • Per-Session MFA (try webauthn on each of Chrome, Safari, and Firefox) @zmb3

    • Attempting to start a session with no keys registered shows an error message
    • Attempting to start a session with a webauthn device registered pops up the "Verify Your Identity" dialog
      • Hitting "Cancel" shows an error message
      • Hitting "Verify" causes your browser to prompt you for MFA
      • Cancelling that browser MFA prompt shows an error
      • Successful MFA verification allows you to connect
  • Session Recording (@LKozlowski)

    • Verify sessions are not recorded if all of a user's roles disable recording
    • Verify sync recording (mode: node-sync or mode: proxy-sync)
    • Verify async recording (mode: node or mode: proxy)
    • Sessions show up in session recordings UI with desktop icon
    • Sessions can be played back, including play/pause functionality
    • A session that ends with a TDP error message can be played back, ends by displaying the error message,
      and the progress bar progresses to the end.
    • Attempting to play back a session that doesn't exist (i.e. by entering a non-existing session id in the url) shows
      a relevant error message.
    • RBAC for sessions: ensure users can only see their own recordings when
      using the RBAC rule from our
      docs
  • Audit Events (check these after performing the above tests) (@ibeckermayer)

    • windows.desktop.session.start (TDP00I) emitted on start
    • windows.desktop.session.start (TDP00W) emitted when session fails to
      start (due to RBAC, for example)
    • windows.desktop.session.end (TDP01I) emitted on end
    • desktop.clipboard.send (TDP02I) emitted for local copy -> remote
      paste
    • desktop.clipboard.receive (TDP03I) emitted for remote copy -> local
      paste

Binaries compatibility @fheinecke

  • Verify that teleport/tsh/tctl/tbot run on:
    • CentOS 7
    • CentOS 8
    • Ubuntu 18.04
    • Ubuntu 20.04
    • Debian 9
  • Verify tsh runs on:
    • Windows 10
    • MacOS

Machine ID @timothyb89

SSH

With a default Teleport instance configured with a SSH node:

  • Verify you are able to create a new bot user with tctl bots add robot --roles=access. Follow the instructions provided in the output to start tbot
  • Verify you are able to connect to the SSH node using openssh with the generated ssh_config in the destination directory
  • Verify that after the renewal period (default 20m, but this can be reduced via configuration), newly generated certificates are placed in the destination directory
  • Verify that sending both SIGUSR1 and SIGHUP to a running tbot process causes a renewal and new certificates to be generated

Ensure the above tests are completed for both:

  • Directly connecting to the auth server
  • Connecting to the auth server via the proxy reverse tunnel
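For reference, a minimal tbot.yaml covering both modes might look like the following (a sketch; the addresses, token, and paths are placeholders - point auth_server at the auth service for the direct case, or at the proxy for the reverse tunnel case):

```yaml
auth_server: auth.example.com:3025  # or proxy.example.com:3080 for the tunnel case
onboarding:
  join_method: token
  token: abcd1234  # placeholder; printed by tctl bots add
storage:
  directory: /var/lib/teleport/bot
destinations:
  - directory: /opt/machine-id
```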

DB Access

With a default Postgres DB instance, a Teleport instance configured with DB access and a bot user configured:

  • Verify you are able to connect to and interact with a database using tbot db while tbot start is running

Teleport Connect @ravicious @gzdunek @avatus

  • Auth methods @ravicious
    • Verify that the app supports clusters using different auth settings
      (auth_service.authentication in the cluster config):
      • type: local, second_factor: "off"
      • type: local, second_factor: "otp"
      • type: local, second_factor: "webauthn"
      • type: local, second_factor: "optional", log in without MFA
      • type: local, second_factor: "optional", log in with OTP
      • type: local, second_factor: "optional", log in with hardware key
      • type: local, second_factor: "on", log in with OTP
      • type: local, second_factor: "on", log in with hardware key
      • Authentication connectors: @ravicious
        • For those you might want to use clusters that are deployed on the web, specified in parens.
          Or set up the connectors on a local enterprise cluster following the guide from our wiki.
        • GitHub (asteroid)
          • local login on a GitHub-enabled cluster
        • SAML (platform cluster)
        • OIDC (e-demo)
  • Shell @gzdunek
    • Verify that the shell is pinned to the correct cluster (for root clusters and leaf clusters).
      • That is, opening new shell sessions in other workspaces or other clusters within the same
        workspace should have no impact on the original shell session.
    • Verify that the local shell is opened with correct env vars.
      • TELEPORT_PROXY and TELEPORT_CLUSTER should pin the session to the correct cluster.
      • TELEPORT_HOME should point to ~/Library/Application Support/Teleport Connect/tsh.
      • PATH should include /Applications/Teleport Connect.app/Contents/Resources/bin.
    • Verify that the working directory in the tab title is updated when you change the directory
      (only for local terminals).
    • Verify that terminal resize works for both local and remote shells.
      • Install midnight commander on the node you ssh into: $ sudo apt-get install mc
      • Run the program: $ mc
      • Resize Teleport Connect to see if the panels resize with it
    • Verify that the tab automatically closes on $ exit command.
  • State restoration @ravicious
    • Verify that the app asks about restoring the previous tabs when launched and restores them
      properly.
    • Verify that the app opens with the cluster that was active when you closed the app.
    • Verify that the app remembers size & position after restart.
    • Verify that reopening a cluster that has no workspace assigned
      works.
    • Verify that reopening the app after removing ~/Library/Application Support/Teleport Connect/tsh
      doesn't crash the app.
    • Verify that reopening the app after removing ~/Library/Application Support/Teleport Connect/app_state.json
      but not the tsh dir doesn't crash the app.
    • Verify that logging out of a cluster and then logging in to the same cluster doesn't
      remember previous tabs (they should be cleared on logout).
  • Connections picker @ravicious
    • Verify that the connections picker shows new connections when ssh & db tabs are opened.
    • Check if those connections are available after the app restart.
    • Check that those connections are removed after you log out of the root cluster that they
      belong to.
    • Verify that reopening a db connection from the connections picker remembers last used port & database name.
  • Cluster resources (servers/databases) @gzdunek
    • Verify that the app shows the same resources as the Web UI.
    • Verify that search is working for the resources lists.
    • Verify that you can connect to these resources.
    • Verify that clicking "Connect" shows available logins and db usernames.
      • Logins and db usernames are taken from the role, under spec.allow.logins and
        spec.allow.db_users.
    • Repeat the above steps for resources in leaf clusters. @ravicious
    • Verify that tabs have correct titles set.
    • Verify that the port number remains the same for a db connection between app restarts.
    • Create a db connection, close the app, run tsh proxy db with the same port, start the app.
      Verify that the app doesn't crash and the db connection tab shows you the error (address in
      use) and offers a way to retry creating the connection.
  • Shortcuts @gzdunek
    • Verify that switching between tabs works on Cmd+[1...9].
    • Verify that other shortcuts are shown after you close all tabs.
    • Verify that the other shortcuts work and each of them is shown on hover on relevant UI
      elements.
  • Workspaces @ravicious
    • Verify that logging in to a new cluster adds it to the identity switcher and switches to the
      workspace of that cluster automatically.
    • Verify that the state of the current workspace is preserved when you change the workspace (by
      switching to another cluster) and return to the previous workspace.
  • Command bar & autocomplete @gzdunek
    • Do the steps for the root cluster, then switch to a leaf cluster and repeat them.
    • Verify that the autocomplete for tsh ssh filters SSH logins and autocompletes them.
    • Verify that the autocomplete for tsh ssh filters SSH hosts by name and label and
      autocompletes them.
    • Verify that launching an invalid tsh ssh command shows the error in a new tab.
    • Verify that launching a valid tsh ssh command opens a new tab with the session opened.
    • Verify that the autocomplete for tsh proxy db filters databases by name and label and
      autocompletes them.
    • Verify that launching a tsh proxy db command opens a new local shell with the command
      running.
    • Verify that the autocomplete for tsh ssh doesn't break when you cut/paste commands in
      various points.
    • Verify that manually typing out what the autocomplete would suggest doesn't break the
      command bar.
    • Verify that launching any other command that's not supported by the autocomplete opens a new
      local shell with that command running.
  • Resilience when resources become unavailable @gzdunek
    • For each scenario, create at least one tab for each available kind (minus k8s for now).
    • For each scenario, first do the external action, then click "Sync" on the relevant cluster tab.
      Verify that no unrecoverable error was raised. Then restart the app and verify that it was
      restarted gracefully (no unrecoverable error on restart, the user can continue using the app).
      • Stop the root cluster.
      • Stop a leaf cluster.
      • Disconnect your device from the internet.
  • Refreshing certs @gzdunek
    • To test scenarios from this section, create a user with a role that has TTL of 1m
      (spec.options.max_session_ttl).
    • Log in, create a db connection and run the CLI command; wait for the cert to expire, click
      "Sync" on the cluster tab.
      • Verify that after successfully logging in:
        • the cluster info is synced
        • the connection in the running CLI db client wasn't dropped; try executing select now();, the client should be able to automatically reinstantiate the connection.
        • the database proxy is able to handle new connections; click "Run" in the db tab and see
          if it connects without problems. You might need to resync the cluster again in case they
          managed to expire.
      • Verify that closing the login modal without logging in shows an error related to syncing
        the cluster.
    • Log in; wait for the cert to expire, click "Connect" next to a db in the cluster tab.
      • Verify that clicking "Connect" and then navigating to a different tab before the request
        completes doesn't show the login modal and instead immediately shows the error.
      • For this one, you might want to use a server in our Cloud if the introduced latency is high
        enough. Perhaps enabling throttling in dev tools can help too.
    • Log in; create two db connections, then remove access to one of the db servers for that
      user; wait for the cert to expire, click "Sync", verify that the db tab with no access shows an
      appropriate error and that the other db tab still handles old and new connections.
  • Verify that logs are collected for all processes (main, renderer, shared, tshd) under
    ~/Library/Application\ Support/Teleport\ Connect/logs. @ravicious
  • Verify that the password from the login form is not saved in the renderer log. @ravicious
  • Log in to a cluster, then log out and log in again as a different user. Verify that the app
    works properly after that. @gzdunek

Host users creation @jakule

Host users creation docs
Host users creation RFD

  • Verify host users creation functionality
    • non-existing users are created automatically
    • users are added to groups
      • non-existing configured groups are created
      • created users are added to the teleport-system group
    • users are cleaned up after their session ends
      • cleanup occurs if a program was left running after session ends
    • sudoers file creation is successful
      • Invalid sudoers files are not created
    • existing host users are not modified
    • setting disable_create_host_user: true stops user creation from occurring
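The behaviors above are controlled by role options and allow rules; a sketch of a role enabling host user creation (the logins, groups, and sudoers entries are placeholders):

```yaml
kind: role
version: v5
metadata:
  name: auto-host-users
spec:
  options:
    # enables automatic creation of host users for this role's sessions
    create_host_user: true
  allow:
    logins: [alice]
    host_groups: [ubuntu, nginx]
    host_sudoers: ["ALL=(ALL) NOPASSWD: ALL"]
```

The disable_create_host_user: true opt-out referenced above is set per node, under ssh_service in teleport.yaml.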

CA rotations @espadolini

  • Verify the CA rotation functionality itself (by checking in the backend or with tctl get cert_authority)
    • standby phase: only active_keys, no additional_trusted_keys
    • init phase: active_keys and additional_trusted_keys
    • update_clients and update_servers phases: the certs from the init phase are swapped
    • standby phase: only the new certs remain in active_keys, nothing in additional_trusted_keys
    • rollback phase (second pass, after completing a regular rotation): same content as in the init phase
    • standby phase after rollback: same content as in the previous standby phase
  • Verify functionality in all phases (clients might have to log in again in lieu of waiting for credentials to expire between phases)
    • SSH session in tsh from a previous phase
    • SSH session in web UI from a previous phase
    • New SSH session with tsh
    • New SSH session with web UI
    • New SSH session in a child cluster on the same major version
    • New SSH session in a child cluster on the previous major version - blocked on #13793 (v9 leaf clusters with a v10 root malfunction because of the db authority)
    • New SSH session from a parent cluster
    • Application access through a browser
    • Application access through curl with tsh app login
    • kubectl get po after tsh kube login
    • Database access (no configuration change should be necessary if the database CA isn't rotated, other Teleport functionality should not be affected if only the database CA is rotated)

IP-based validation

SSH @probakowski

  • Verify IP-based validation works for SSH
    • pin_source_ip: true option can be added in role definition
    • tsh ssh works when invoked from the same machine/IP that was used for logging in
    • tsh ssh prompts for relogin when invoked from different machine (copy certs after login)
    • connecting to sshd server works as above in both cases
    • ssh works as above in both cases
    • SSH access from WebUI works with IP pinning enabled
  • tsh status -d shows pinned IP
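The pin_source_ip option referenced above lives in the role spec; a sketch (the role name and logins are placeholders):

```yaml
kind: role
version: v5
metadata:
  name: pinned-ip-access
spec:
  options:
    pin_source_ip: true
  allow:
    logins: [alice]
    node_labels:
      '*': '*'
```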

zmb3 commented Jun 15, 2022

Looks like we "regressed" and increased the GLIBC dependency again.

Edit: this appears to be related to the Rust version. Reverting to 1.58.1 seems to fix it.

I will downgrade for now: #13544


codingllama commented Jun 15, 2022

A few preliminary findings:

  1. tctl and teleport always print the warning below on macOS, which I think could be downgraded:
$ tctl -c ./teleport.yaml users ls
> 2022-06-15T17:29:04-03:00 WARN             Disabling host user creation as this feature is only available on Linux config/configuration.go:998

$ teleport start -c ./teleport.yaml
> 2022-06-15T17:28:58-03:00 WARN             Disabling host user creation as this feature is only available on Linux config/configuration.go:998
  2. tctl still mentions the (removed) "admin" role:
$ tctl -c ./teleport.yaml users add --help
(...)
Examples:

  > tctl users add --roles=admin,dba joe

  This creates a Teleport account 'joe' who will assume the roles 'admin' and 'dba'
  To see the permissions of 'admin' role, execute 'tctl get role/admin'
  3. tsh Touch ID authn isn't respecting users and is picking the "oldest" credential

Repro by adding >1 credential and then >1 users. 😢

I'll focus on (3), (1) and (2) are easy pickings if someone wants to fix them.


r0mant commented Jun 15, 2022

@lxea Could you take a look at "1" and "2" from Alan's comment above?


GavinFrazar commented Jun 16, 2022

I noticed in the audit log that when I do anything on my database (mysql) the log entries always show [undefined], even if I select a database explicitly during my session with "use <db>". Looks like this:

User [remote-alice-cluster1] has executed query [show tables] in database [undefined] on [testmysql]
User [remote-alice-cluster1] has executed query [show databases] in database [undefined] on [testmysql]
User [remote-alice-cluster1] has changed default database to [foodb] on [testmysql]

edit: found an issue for this #5903

It appears the behavior is to always show the database name used on login.

So if I do $ tsh db login --db-name=foodb testmysql or tsh db connect --db-name=foodb testmysql then all audit logs in that session will show [foodb] as the database. If I switch databases in mysql with use otherdb, then audit log continues to show actions as if they were done in [foodb]. If I don't specify any --db-name with login/connect then it's always [undefined].


Joerger commented Jun 16, 2022

I found a tsh ssh -J regression related to TLS routing - #13554

@strideynet

tsh play <chunk-id> can fetch and print a session chunk archive.

Not concerned this is a blocker, and may actually just be the test plan being incorrect. This command fails with offset 0 not found for session. This is because by default tsh play attempts to play a session back to the PTY which is not compatible with application access session recordings. Running the command with --format json succeeds. Looking at the blame of the code, it doesn't look like this is a recent regression, and may have always been the case.

Do we want to update the test plan with the correct command? I imagine eventually it would be nice if users didn't have to provide this flag for the command to work, but given how we currently switch in the implementation between two modes, it will probably involve rewriting onPlay to support that.


strideynet commented Jun 16, 2022

Discovered a regression with using the configuration output by teleport configure: #13558

I'll write a fix for this today and we should be able to get it merged down asap.


This fix has been merged down to branch/v10 and I can confirm the regression appears to be fixed.


rosstimothy commented Jun 16, 2022

Discovered some backwards incompatibility with SSO login: #13575

Edit (Joerger): Fixed in #13589


Joerger commented Jun 16, 2022

Found a regression in tsh join, I'll try fixing it.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x17b3fc0]

goroutine 1 [running]:
github.com/gravitational/teleport/api/types.(*SessionTrackerV1).GetAddress(0x390c700?)
	/home/bjoerger/gravitational/teleport/api/types/session_tracker.go:274
github.com/gravitational/teleport/lib/client.(*TeleportClient).Join(0xc00025e700, {0x3931f90, 0xc0000541f8}, {0x341aee2, 0x4}, {0x3426c0b?, 0x7}, {0x7ffd260b70f7, 0x24}, {0x0, ...})
	/home/bjoerger/gravitational/teleport/lib/client/api.go:1976 +0x6f2
main.onJoin.func1()
	/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:2584 +0x65
github.com/gravitational/teleport/lib/client.RetryWithRelogin({0x3932000, 0xc000a4c4b0}, 0xc00025e700, 0xc000b3e550)
	/home/bjoerger/gravitational/teleport/lib/client/api.go:719 +0x4e
main.onJoin(0xc0006ac000)
	/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:2583 +0x1b5
main.Run({0x39330d8, 0xc0002ae780}, {0xc00004e090, 0x3, 0x3}, {0x0, 0x0, 0xc0000021a0?})
	/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:859 +0x12445
main.main()
	/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:396 +0x318

Edit: fixed in #13596


Joerger commented Jun 16, 2022

Possible regression: I can't join/view my own sessions despite having permissions to do so. Am I missing something in https://goteleport.com/docs/ver/10.0/access-controls/reference/?

#13595

@GavinFrazar

Some issues I ran into while testing kube access locally:

  1. tsh kube exec --tty --stdin shell-demo /bin/sh leads to panic:
Example:

> tsh kube exec --tty --stdin shell-demo /bin/sh 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x106905790]

goroutine 1 [running]:
main.(*StreamOptions).SetupTTY(0x14000abe410)
	/Users/gavin/work/teleport/tool/tsh/kube.go:281 +0x180
main.(*ExecOptions).Run(0x14000abe410)
	/Users/gavin/work/teleport/tool/tsh/kube.go:356 +0x280
main.(*kubeExecCommand).run(0x14000674600, 0x0?)
	/Users/gavin/work/teleport/tool/tsh/kube.go:467 +0x388
main.Run({0x1075eac90, 0x140006f1540}, {0x140001b6010, 0x6, 0x6}, {0x0, 0x0, 0x300000002?})
	/Users/gavin/work/teleport/tool/tsh/tsh.go:896 +0x12e98
main.main()
	/Users/gavin/work/teleport/tool/tsh/tsh.go:396 +0x2c0
 [19:16:57] gavin@mac ~ [SIGINT] 
> kubectl exec -it shell-demo -- /bin/sh
# whoami
root

  2. tsh kube credentials issue when --teleport-cluster flag does not match $TELEPORT_CLUSTER
  • On first use, you get an error. If you immediately run the same command, it prints the credentials.
    I ran into this because teleport modifies kubeconfig to execute this command to authenticate, if you're not already logged into teleport. So essentially, rm -rf ~/.tsh && kubectl get pods prompts me for my password and then prints an error message, but if I just run kubectl get pods again, it works.
Click for example

[19:08:20] gavin@mac ~ [1] 
> rm -rf ~/.tsh
[19:09:07] gavin@mac ~  
> tenv show
TELEPORT_CLUSTER=cluster2
TELEPORT_DEV_OUT=/tmp/out2.log
TELEPORT_CONFIG_FILE=/Users/gavin/teleport-config/nodes/cluster2.yaml
TELEPORT_USER=alice
TELEPORT_DEV_CONFIG_FILE=/Users/gavin/teleport-config/nodes/cluster2.yaml
TELEPORT_PROXY=proxy2.local.gd:4080
[19:09:11] gavin@mac ~  
> bat ~/.kube/config | rg "exec" -A 10
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - kube
      - credentials
      - --kube-cluster=minikube
      - --teleport-cluster=cluster1
      - --proxy=proxy1.local.gd:3080
      - --insecure
      command: /Users/gavin/work/teleport/build/tsh
      env: null
[19:09:41] gavin@mac ~  
> kubectl get pods
Enter password for Teleport user alice:
WARNING: You are using insecure connection to SSH proxy https://proxy1.local.gd:3080
ERROR: SSH cert not available

Unable to connect to the server: getting credentials: exec: executable /Users/gavin/work/teleport/build/tsh failed with exit code 1
[19:09:57] gavin@mac ~ [1] 
> kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
shell-demo   1/1     Running   0          75m

@Tener
Contributor

Tener commented Jun 17, 2022

@GavinFrazar

tsh kube credentials issue when --teleport-cluster flag does not match $TELEPORT_CLUSTER

I'm not sure if this would fix the outlined issue, but I noticed recently that a couple of the --cluster and --teleport-cluster flag definitions are missing a .Envar(clusterEnvVar) call. At the time I didn't realise this could cause issues like the one you outlined, but perhaps the fix is as simple as adding that call where appropriate. For example:

	c.Flag("teleport-cluster", "Name of the teleport cluster to get credentials for.").Required().StringVar(&c.teleportCluster)

becomes

	c.Flag("teleport-cluster", "Name of the teleport cluster to get credentials for.").Required().Envar(clusterEnvVar).StringVar(&c.teleportCluster)

and similarly

	ssh.Flag("cluster", clusterHelp).Short('c').StringVar(&cf.SiteName)

becomes

	ssh.Flag("cluster", clusterHelp).Envar(clusterEnvVar).Short('c').StringVar(&cf.SiteName)

@Tener
Contributor

Tener commented Jun 17, 2022

@atburke

Regression due to #12934:

Basically the logic between onListDatabases and listDatabasesAllClusters is out of sync. The former contains the correct code to fetch roles:

teleport/tool/tsh/db.go

Lines 81 to 104 in 77b35b8

// get roles and traits. default to the set from profile, try to get up-to-date version from server point of view.
roles := profile.Roles
traits := profile.Traits
// GetCurrentUser() may not be implemented, fail gracefully.
user, err := cluster.GetCurrentUser(cf.Context)
if err == nil {
	roles = user.GetRoles()
	traits = user.GetTraits()
} else {
	log.Debugf("Failed to fetch current user information: %v.", err)
}
// get the role definition for all roles of user.
// this may only fail if the role which we are looking for does not exist, or we don't have access to it.
// example scenario when this may happen:
// 1. we have set of roles [foo bar] from profile.
// 2. the cluster is remote and maps the [foo, bar] roles to single role [guest]
// 3. the remote cluster doesn't implement GetCurrentUser(), so we have no way to learn of [guest].
// 4. services.FetchRoles([foo bar], ..., ...) fails as [foo bar] does not exist on remote cluster.
roleSet, err := services.FetchRoles(roles, cluster, traits)
if err != nil {
	log.Debugf("Failed to fetch user roles: %v.", err)
}

The latter does not, using the stale profile.Roles instead:

teleport/tool/tsh/db.go

Lines 163 to 167 in 77b35b8

roleSet, err := services.FetchRoles(profile.Roles, cluster, profile.Traits)
if err != nil {
	errors = append(errors, err)
	continue
}

The result is that we try to fetch definitions for roles that do not exist in the leaf cluster, and we may not have permission to do so.

For example, given clusters boson.tener.io and quark.tener.io, with a trusted cluster role mapping that grants only the access role:

kind: trusted_cluster
metadata:
  id: 1655472056507184000
  name: boson.tener.io
spec:
  enabled: true
  role_map:
  - local:
    - access
    remote: access
  token: foo
  tunnel_addr: boson.tener.io:3080
  web_proxy_addr: boson.tener.io:3080
version: v2

We get errors when tsh tries to read the editor and auditor roles from quark.tener.io, since the mapping grants only the access role. The code in onListDatabases correctly handles that case.

$ tsh clusters
Cluster Name   Status Cluster Type Labels Selected
-------------- ------ ------------ ------ --------
boson.tener.io online root                *
quark.tener.io online leaf

$ tsh db ls
Name Description Allowed Users Labels Connect
---- ----------- ------------- ------ -------

$ tsh db --cluster=quark.tener.io ls
Name                            Description         Allowed Users     Labels  Connect
------------------------------- ------------------- ----------------- ------- ------------------------------------------------------------------------
> qmongo (user: alice)                              [alice bob tener]         tsh db connect --cluster=quark.tener.io --db-name=<name> qmongo
> qmongo-insecure (user: alice)                     [alice bob tener]         tsh db connect --cluster=quark.tener.io --db-name=<name> qmongo-insecure
redisquark                      Quark Redis example [alice bob tener] env=dev

$ tsh db --cluster=quark.tener.io ls --all
ERROR: access denied to perform action "read" on "role"

I'm unlikely to have the time to fix it before my PTO.

@jakule
Contributor

jakule commented Jun 20, 2022

I found two issues related to the host user creations #13663 #13662

@nklaassen
Contributor

found an issue with the "Instance" role and the EC2 join method #13677

@LKozlowski
Contributor

I found an issue with LDAP attribute labeling - it does not work correctly: #13680

@LKozlowski
Contributor

Regexp-based host labeling applies across all desktops, regardless of origin.

I don't know if this is an issue or not, but I had a hard time figuring out why it does not work the way I would expect. There is an inconsistency between how we treat LDAP-discovered hosts vs static hosts.

Scenario 1: LDAP hosts

windows_desktop_service:
...
  discovery:
    base_dn: "*"
  host_labels:
    - match: '^.*\.example\.com$'
      labels:
        environment: dev

With this configuration, if the discovered host's DNS host name is EXAMPLE-82K6DLP.example.com,
the regexp matches and that host gets the extra label environment: dev.

Scenario 2: Static hosts

windows_desktop_service:
...
  hosts:
    - EXAMPLE-82K6DLP.example.com
  host_labels:
    - match: '^.*\.example\.com$'
      labels:
        environment: dev

With this configuration, using the same regexp and the same DNS host name for a static host, the regexp does not match and the host gets no extra label.

The reason is that for static hosts we match the regexp against hostname:port. In our example we compare the regexp with EXAMPLE-82K6DLP.example.com:3389, which fails to match because of the $ at the end of the regexp.

addr := netAddr.String()
name, err := s.nameForStaticHost(addr)
if err != nil {
	return nil, trace.Wrap(err)
}
// for static hosts, we match against the host's addr,
// as the name is a randomly generated UUID
labels := getHostLabels(addr)

Since I don't know whether this was intended or whether we should change the behavior to match against the host without the port, it would be great if @zmb3 could take a look, as I believe he is the author of this functionality.

@zmb3
Collaborator

zmb3 commented Jun 21, 2022

@LKozlowski I don't think we ever noticed this before, but technically regex-based labeling is working as intended; we're just not clear in the docs or examples that the port is included.

Feels like the simplest thing would be to remove the $ from the examples and mention in the docs that the port is included in the match for static hosts.

@espadolini
Contributor

espadolini commented Jun 21, 2022

That will end up matching anything with an example.com prefix, though; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).

@ibeckermayer
Contributor

I found an issue with desktop access scroll behavior: #13690

@zmb3
Collaborator

zmb3 commented Jun 21, 2022

That will end up match anything with an example.com prefix tho; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).

Sure, that works. I'd also be fine with matching against just the host and not the port.

I don't see this as a major issue since it has always been this way, and few people use static hosts.

codingllama added a commit that referenced this issue Jun 21, 2022
Favor newer Touch ID credentials in the allowed set for MFA, or just the newer
credential for passwordless.

Fixes a capture-by-reference bug and adds coverage for it.

Issue #13340.

* Add tests for Touch ID credential-choosing logic
* Favor newer Touch ID credentials within the allowed set
* Warn about origin vs RPID mismatch
@atburke
Contributor

atburke commented Jun 21, 2022

@nklaassen #13529 should fix the EC2 labels error.

@LKozlowski
Contributor

That will end up match anything with an example.com prefix tho; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).

Sure, that works. I'd also be fine with matching against just the host and not the port.

I don't see this as a major issue since it has always been this way, and few people use static hosts.

I just wanted to bring it up as it wasn't clear to me when I was testing it, but I agree that it is working fine. As you said, we just need to either update the docs or slightly update the code. Anyway, I'll mark it in the test plan as working and we'll improve it later so it doesn't block the v10 release.

codingllama added a commit that referenced this issue Jun 22, 2022
Favor newer Touch ID credentials in the allowed set for MFA, or just the newer
credential for passwordless.

Fixes a capture-by-reference bug and adds coverage for it.

Issue #13340.

* Add tests for Touch ID credential-choosing logic
* Favor newer Touch ID credentials within the allowed set
* Warn about origin vs RPID mismatch
codingllama added a commit that referenced this issue Jun 22, 2022
#13712)

Favor newer Touch ID credentials in the allowed set for MFA, or just the newer
credential for passwordless.

Fixes a capture-by-reference bug and adds coverage for it.

Issue #13340.

Backports #13672 and #13761.

* Add tests for Touch ID credential-choosing logic
* Favor newer Touch ID credentials within the allowed set
* Warn about origin vs RPID mismatch
* Do not dereference assertion before checking for nil
@espadolini
Contributor

Found a compatibility issue between v9 leafs and v10 roots related to the new database CA:

@ravicious
Member

Is tsh status supposed to report -teleport-internal-join as one of the SSH logins? I can see it in the logins list for v10 clusters but not for the ones running older versions of Teleport.

@espadolini
Contributor

Is tsh status supposed to report -teleport-internal-join as one of the SSH logins?

We should probably filter out that one and the -teleport-nologin-<uuid> ones.

@Joerger
Contributor

Joerger commented Jun 24, 2022

ssh -J <teleport-proxy> doesn't work with tls routing (since v8.0.0) - #13833

@fheinecke
Contributor

tsh does not work on Debian 9 due to glibc 2.25 dependency - #13894

@zmb3
Collaborator

zmb3 commented Jun 27, 2022

I'm seeing a "session data" event that I'm not used to seeing, which renders with a missing session ID in the audit log.

image

It's not just a UI thing, the JSON for the event has "sid": "".

@rosstimothy
Contributor

Direct Dial Nodes unreachable because they are reporting an address of [::]:3022 #13898

@rosstimothy
Contributor

Reverse Tunnel Nodes getting stuck initializing and not connecting: #13911

@rosstimothy
Contributor

rosstimothy commented Jun 27, 2022

etcd 500 TC Scaling Test

image

https://teleportcoreteam.grafana.net/goto/m-ivFEqnk?orgId=1

@codingllama
Contributor

Something minor I just noticed: my (idle) local teleport was spamming a session recording warning (shutdown logs included):

2022-06-27T17:58:47-03:00 [UPLOAD]    WARN Skipped session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar. error:[session file could be corrupted or is using unsupported format: session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f is either corrupted or is using unsupported format, remove the file /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar to correct the problem, remove the /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.error file to retry the upload] filesessions/fileasync.go:253
^C2022-06-27T17:58:51-03:00 [PROC:1]    INFO Got signal "interrupt", exiting immediately. pid:27917.1 service/signals.go:83
2022-06-27T17:58:51-03:00 [PROC:1]    WARN Sync rotation state cycle failed. Retrying in ~10s pid:27917.1 service/connect.go:682
2022-06-27T17:58:51-03:00 [AUDIT:1]   INFO File uploader is shutting down. pid:27917.1 service/service.go:2480
2022-06-27T17:58:51-03:00 [AUDIT:1]   INFO File uploader has shut down. pid:27917.1 service/service.go:2482

I didn't do anything special with the cluster today, other than a few login attempts. Posting here in case it rings a bell for someone.

@avatus
Contributor

avatus commented Jun 27, 2022

Something minor I just noticed: my (idle) local teleport was spamming a session recording warning (shutdown logs included):

2022-06-27T17:58:47-03:00 [UPLOAD]    WARN Skipped session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar. error:[session file could be corrupted or is using unsupported format: session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f is either corrupted or is using unsupported format, remove the file /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar to correct the problem, remove the /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.error file to retry the upload] filesessions/fileasync.go:253
^C2022-06-27T17:58:51-03:00 [PROC:1]    INFO Got signal "interrupt", exiting immediately. pid:27917.1 service/signals.go:83
2022-06-27T17:58:51-03:00 [PROC:1]    WARN Sync rotation state cycle failed. Retrying in ~10s pid:27917.1 service/connect.go:682
2022-06-27T17:58:51-03:00 [AUDIT:1]   INFO File uploader is shutting down. pid:27917.1 service/service.go:2480
2022-06-27T17:58:51-03:00 [AUDIT:1]   INFO File uploader has shut down. pid:27917.1 service/service.go:2482

I didn't do anything special with the cluster today, other than a few login attempts. Posting here in case it rings a bell for someone.

This happened to me as well, and adding auth_service.session_recording = off to the config failed to stop the warning, if that provides any further context.
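For reference, the dotted key above corresponds to this teleport.yaml fragment (quoting "off" so YAML doesn't parse it as a boolean):

```yaml
auth_service:
  session_recording: "off"
```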

@espadolini
Contributor

my (idle) local teleport was spamming a session recording warning

Should be fixed by #13826; fixing the warning in a running cluster involves manually deleting the file in the recordings directory, I think.

@r0mant
Collaborator Author

r0mant commented Jun 29, 2022

Can't get passwordless scenario to work as described in the test plan:

  1. Adding touchid device using tsh mfa add
  2. Touchid device is visible in tsh mfa ls and tsh touchid ls (the latter also brings up touchid prompt) ✅
  3. Running tsh -d login --proxy=root.gravitational.io:3080 --auth=passwordless doesn't work; it asks to tap a security key (though I didn't register one separately) ❌
➜  e git:(afa3414) ✗ tsh login --proxy=root.gravitational.io:3080 --auth=passwordless
Tap your security key
^CERROR: context canceled

Logs:

➜  e git:(afa3414) ✗ tsh -d login --proxy=root.gravitational.io:3080 --auth=passwordless
DEBU [CLIENT]    open /Users/r0mant/.tsh/root.gravitational.io.yaml: no such file or directory client/api.go:1052
INFO [CLIENT]    No teleport login given. defaulting to r0mant client/api.go:1394
INFO [CLIENT]    no host login given. defaulting to r0mant client/api.go:1404
INFO [CLIENT]    [KEY AGENT] Connected to the system agent: "/private/tmp/com.apple.launchd.0G1kn68Tdf/Listeners" client/api.go:3934
DEBU [CLIENT]    attempting to use loopback pool for local proxy addr: root.gravitational.io:3080 client/api.go:3892
DEBU [CLIENT]    reading self-signed certs from: /var/lib/teleport/webproxy_cert.pem client/api.go:3900
DEBU [CLIENT]    could not open any path in: /var/lib/teleport/webproxy_cert.pem client/api.go:3904
DEBU             Attempting GET root.gravitational.io:3080/webapi/ping/passwordless webclient/webclient.go:115
DEBU [CLIENT]    attempting to use loopback pool for local proxy addr: root.gravitational.io:3080 client/api.go:3892
DEBU [CLIENT]    reading self-signed certs from: /var/lib/teleport/webproxy_cert.pem client/api.go:3900
DEBU [CLIENT]    could not open any path in: /var/lib/teleport/webproxy_cert.pem client/api.go:3904
DEBU [CLIENT]    HTTPS client init(proxyAddr=root.gravitational.io:3080, insecure=false) client/weblogin.go:233
DEBU             Attempting platform login webauthncli/api.go:97
DEBU             Platform login failed, falling back to cross-platform error:[credential not found] webauthncli/api.go:103
DEBU             FIDO2: Using libfido2 for assertion webauthncli/api.go:113
DEBU             FIDO2: Info for device ioreg://4294970624: &libfido2.DeviceInfo{Versions:[]string{"U2F_V2", "FIDO_2_0", "FIDO_2_1_PRE"}, Extensions:[]string{"credProtect", "hmac-secret"}, AAGUID:[]uint8{0xee, 0x88, 0x28, 0x79, 0x72, 0x1c, 0x49, 0x13, 0x97, 0x75, 0x3d, 0xfc, 0xce, 0x97, 0x7, 0x2a}, Options:[]libfido2.Option{libfido2.Option{Name:"rk", Value:"true"}, libfido2.Option{Name:"up", Value:"true"}, libfido2.Option{Name:"plat", Value:"false"}, libfido2.Option{Name:"clientPin", Value:"false"}, libfido2.Option{Name:"credentialMgmtPreview", Value:"true"}}, Protocols:[]uint8{0x1}} webauthncli/fido2.go:658
DEBU             FIDO2: Device ioreg://4294970624: filtered due to lack of UV webauthncli/fido2.go:137
Tap your security key
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
^C

cc @codingllama

@codingllama
Contributor

Can't get passwordless scenario to work as described in the test plan:

@r0mant could you double-check that you are using tsh from the signed/notarized/etc tsh.app bundle? I downloaded the tsh-v10.0.0-alpha.2.pkg installer and cleared the testplan without problems using it. Hit me up on Slack if you still have issues.

@espadolini
Contributor

@codingllama @r0mant all clear on the passwordless test plan for me on macOS.

@rosstimothy
Contributor

etcd Soak Test

kubectl logs -n loadtest-tross soaktest-pvnlr-6gv5f -f
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth -l root ls -f names
node-65c8f5c9db-5zzfd
iot-node-5b4f7757f8-f2966

----Direct Dial Node Test----
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-65c8f5c9db-5zzfd ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         157 ms
50         162 ms
75         168 ms
90         174 ms
95         178 ms
99         193 ms
100        474 ms

+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-65c8f5c9db-5zzfd ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         159 ms
50         164 ms
75         170 ms
90         175 ms
95         180 ms
99         195 ms
100        5179 ms

+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-5b4f7757f8-f2966 ls
----Reverse Tunnel Node Test----

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         155 ms
50         160 ms
75         166 ms
90         172 ms
95         178 ms
99         193 ms
100        418 ms

+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-5b4f7757f8-f2966 ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         154 ms
50         159 ms
75         165 ms
90         170 ms
95         175 ms
99         192 ms
100        5171 ms

etcd 10k Reverse Tunnel Nodes

image

https://teleportcoreteam.grafana.net/goto/vJFIH33nk?orgId=1

etcd 10k Direct Dial Nodes

image

https://teleportcoreteam.grafana.net/goto/yky9Oqqnz?orgId=1

@russjones
Contributor

russjones commented Jul 2, 2022

Aggregate last 3 releases.

| Backend  | Cluster Size | Mode    | PTY | 8.0     | 9.0    | 10.0                   |
|----------|--------------|---------|-----|---------|--------|------------------------|
| etcd     | 10k          | Regular | No  | 3335 ms | 700 ms | 474 ms                 |
| etcd     | 10k          | Regular | Yes | 4647 ms | 393 ms | 5179 ms (99%: 195 ms)  |
| etcd     | 10k          | Tunnel  | No  | 4259 ms | 143 ms | 418 ms                 |
| etcd     | 10k          | Tunnel  | Yes | 3143 ms | 799 ms | 5171 ms (99%: 192 ms)  |
| DynamoDB | 10k          | Regular | No  |         |        | 5147 ms                |
| DynamoDB | 10k          | Regular | Yes |         |        | 222 ms                 |
| DynamoDB | 10k          | Tunnel  | No  |         |        | 235 ms                 |
| DynamoDB | 10k          | Tunnel  | Yes |         |        | 198 ms                 |
| DynamoDB | 1            | Regular | No  |         |        | 1824 ms                |
| DynamoDB | 1            | Regular | Yes |         |        | 1483 ms                |
| DynamoDB | 1            | Tunnel  | No  |         |        | 2125 ms                |
| DynamoDB | 1            | Tunnel  | Yes |         |        | 2002 ms                |

@fspmarshall
Contributor

500 TC Scaling Test (DynamoDB)

500-tc


Note: initial Dynamo 10k tests are not complete yet due to issues with the test automation, but I've gotten up to a 6k Dynamo cluster without any issues on Teleport's end. Working on re-running with different automation.

@fspmarshall
Contributor

fspmarshall commented Jul 6, 2022

10K Dynamo IoT

edit: See #13340 (comment) for updated bench numbers.

tsh bench --duration=30m root@node-848df68b94-zzxjg ls

* Requests originated: 17934
* Requests failed: 109
* Last error: EOF

Histogram

Percentile Response Duration 
---------- ----------------- 
25         5939 ms           
50         9655 ms           
75         13911 ms          
90         16655 ms          
95         17519 ms          
99         18351 ms          
100        55071 ms
tsh bench --duration=30m --interactive root@node-848df68b94-zzw65 ps aux

* Requests originated: 17903
* Requests failed: 22
* Last error: failed connecting to node node-848df68b94-zzw65. 

Histogram

Percentile Response Duration 
---------- ----------------- 
25         6115 ms           
50         9879 ms           
75         14103 ms          
90         16751 ms          
95         17583 ms          
99         18431 ms          
100        45471 ms

10k-dynamo-iot


Note: these benches were run concurrently with scaling and against nodes in a different region/cloud, which I think explains the differences in response duration. Looking into it.

@fspmarshall
Contributor

fspmarshall commented Jul 8, 2022

10K Dynamo Non-IoT

tsh bench --duration=30m [email protected] ls

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         185 ms            
50         197 ms            
75         211 ms            
90         232 ms            
95         251 ms            
99         358 ms            
100        2161 ms
tsh bench --duration=30m --interactive [email protected] ps aux

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         193 ms            
50         206 ms            
75         221 ms            
90         240 ms            
95         260 ms            
99         418 ms            
100        4579 ms

10k-dynamo


Note: these benches were run against individual bare-metal nodes within a 2-node cluster with tsh located within the same vpc as the auth, proxy, and nodes.

@r0mant r0mant closed this as completed Jul 9, 2022
@fspmarshall
Contributor

DynamoDB Small Cluster Bench

(previously posted dynamodb bench numbers were from a 10k cluster with sub-optimal network conditions, and therefore not particularly useful for comparison)

tsh bench --duration=30m root@ip-172-31-4-81-us-west-2-compute-internal ls

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         198 ms            
50         210 ms            
75         222 ms            
90         238 ms            
95         255 ms            
99         372 ms            
100        3495 ms
tsh bench --duration=30m --interactive root@ip-172-31-9-206-us-west-2-compute-internal ps aux

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         221 ms            
50         231 ms            
75         244 ms            
90         262 ms            
95         280 ms            
99         466 ms            
100        2003 ms
