
Teleport 10 Test Plan #13340

Closed
Tracked by #13341
r0mant opened this issue Jun 9, 2022 · 45 comments
Labels: bug, test-plan (a list of tasks required to ship a successful product release)


r0mant commented Jun 9, 2022

Manual Testing Plan

Below are the items that should be manually tested with each release of Teleport.
These tests should be run on both a fresh install of the version to be released
as well as an upgrade of the previous version of Teleport.

  • Adding nodes to a cluster @avatus

    • Adding Nodes via Valid Static Token
    • Adding Nodes via Valid Short-lived Tokens
    • Adding Nodes via Invalid Token Fails
    • Revoking Node Invitation
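The token items above can be driven with tctl command templates; a sketch, assuming a running cluster and an authenticated tctl:

```shell
# static tokens are defined in teleport.yaml (auth_service.tokens);
# short-lived tokens can be minted on the fly:
tctl tokens add --type=node --ttl=5m

# list outstanding tokens, then revoke one to test "Revoking Node Invitation":
tctl tokens ls
tctl tokens rm <token>
```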
  • Labels @avatus

    • Static Labels
    • Dynamic Labels
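Static and dynamic labels are configured on the node side; a minimal ssh_service fragment (the label names and command are examples):

```yaml
ssh_service:
  enabled: yes
  # static labels
  labels:
    env: dev
  # dynamic labels, re-evaluated on a schedule
  commands:
  - name: hostname
    command: [hostname]
    period: 1m
```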
  • Trusted Clusters @EdwardDowling @hugoShaka

    • Adding Trusted Cluster Valid Static Token
    • Adding Trusted Cluster Valid Short-lived Token
    • Adding Trusted Cluster Invalid Token
    • Removing Trusted Cluster
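The trusted-cluster items can be exercised by creating a trusted_cluster resource on the leaf with tctl create; a sketch (addresses and token are placeholders):

```yaml
kind: trusted_cluster
version: v2
metadata:
  name: root-cluster
spec:
  enabled: true
  # join token issued by the root cluster (static or short-lived)
  token: <join-token>
  tunnel_addr: root.example.com:3024
  web_proxy_addr: root.example.com:3080
  role_map:
  - remote: access
    local: [access]
```

Removing the trusted cluster can then be tested with tctl rm against the same resource.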
  • RBAC @alistanis

    Make sure that invalid and valid attempts are reflected in audit log.

    • Successfully connect to node with correct role
    • Unsuccessfully connect to a node in a role restricting access by label
    • Unsuccessfully connect to a node in a role restricting access by invalid SSH login
    • Allow/deny role option: SSH agent forwarding
    • Allow/deny role option: Port forwarding
  • Verify that custom PAM environment variables are available as expected. @xacrimon

  • Users @codingllama

    With every user combination, try to login and signup with invalid second
    factor, invalid password to see how the system reacts.

    WebAuthn in the release tsh binary is implemented using libfido2. Ask for
    a statically built pre-release binary for realistic tests. (tsh fido2 diag
    should work in our binary.)

    Touch ID requires a signed tsh, ask for a signed pre-release binary so you
    may run the tests.

    • Adding Users Password Only

    • Adding Users OTP

    • Adding Users WebAuthn

    • Adding Users Touch ID

    • Managing MFA devices

      • Add an OTP device with tsh mfa add
      • Add a WebAuthn device with tsh mfa add
      • Add a Touch ID device with tsh mfa add
      • List MFA devices with tsh mfa ls
      • Remove an OTP device with tsh mfa rm
      • Remove a WebAuthn device with tsh mfa rm
      • Attempt removing the last MFA device on the user
        • with second_factor: on in auth_service, should fail
        • with second_factor: optional in auth_service, should succeed
    • Login Password Only

    • Login with MFA

      • Add an OTP, a WebAuthn and a Touch ID device with tsh mfa add
      • Login via OTP
      • Login via WebAuthn
      • Login via Touch ID
      • Login via WebAuthn using a U2F device

      U2F devices must be registered in a previous version of Teleport.

      Using Teleport v9, set auth_service.authentication.second_factor = u2f,
      restart the server and then register a U2F device (tsh mfa add). Upgrade
      the install to the current Teleport version (one major at a time) and try to
      login using the U2F device as your second factor - it should work.

    • Login OIDC @Tener

    • Login SAML @Tener

    • Login GitHub @Tener

    • Deleting Users @Tener

  • Backends

  • Session Recording @gabrielcorado

    • Session recording can be disabled
    • Sessions can be recorded at the node
      • Sessions in remote clusters are recorded in remote clusters
    • Sessions can be recorded at the proxy
      • Sessions on remote clusters are recorded in the local cluster
      • Enable/disable host key checking.
  • Audit Log @gabrielcorado

    • Failed login attempts are recorded

    • Interactive sessions have the correct Server ID

      • Server ID is the ID of the node in "session_recording: node" mode
      • Server ID is the ID of the proxy in "session_recording: proxy" mode

      Node/Proxy ID may be found at /var/lib/teleport/host_uuid in the
      corresponding machine.

      Node IDs may also be queried via tctl nodes ls.
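Command templates for the ID cross-checks above:

```shell
# on the node/proxy machine:
cat /var/lib/teleport/host_uuid

# from the auth server:
tctl nodes ls
```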

    • Exec commands are recorded

    • scp commands are recorded

    • Subsystem results are recorded

      Subsystem testing may be achieved using both
      Recording Proxy mode
      and
      OpenSSH integration.

      Assuming the proxy is proxy.example.com:3023 and node1 is a node running
      OpenSSH/sshd, you may use the following command to trigger a subsystem audit
      log:

      sftp -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" root@node1
  • Interact with a cluster using tsh @alistanis @hugoShaka

    These commands should ideally be tested in both recording and non-recording modes, as they are implemented in different ways.

    • tsh ssh <regular-node>
    • tsh ssh <node-remote-cluster>
    • tsh ssh -A <regular-node>
    • tsh ssh -A <node-remote-cluster>
    • tsh ssh <regular-node> ls
    • tsh ssh <node-remote-cluster> ls
    • tsh join <regular-node>
    • tsh join <node-remote-cluster>
    • tsh play <regular-node>
    • tsh play <node-remote-cluster>
    • tsh scp <regular-node>
    • tsh scp <node-remote-cluster>
    • tsh ssh -L <regular-node>
    • tsh ssh -L <node-remote-cluster>
    • tsh ls
    • tsh clusters
  • Interact with a cluster using ssh @Joerger
    Make sure to test both recording and regular proxy modes.

    • ssh <regular-node>
    • ssh <node-remote-cluster>
    • ssh -A <regular-node>
    • ssh -A <node-remote-cluster>
    • ssh <regular-node> ls
    • ssh <node-remote-cluster> ls
    • scp <regular-node>
    • scp <node-remote-cluster>
    • ssh -L <regular-node>
    • ssh -L <node-remote-cluster>
  • Verify proxy jump functionality @Joerger
    Log into leaf cluster via root, shut down the root proxy and verify proxy jump works.

    • tsh ssh -J <leaf-proxy>
    • ssh -J <leaf-proxy>
  • Interact with a cluster using the Web UI @Joerger

    • Connect to a Teleport node
    • Connect to an OpenSSH node
    • Check agent forwarding is correct based on role and proxy mode.

User accounting @xacrimon

  • Verify that active interactive sessions are tracked in /var/run/utmp on Linux.
  • Verify that interactive sessions are logged in /var/log/wtmp on Linux.
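A sketch for checking the accounting databases, assuming a Linux host with standard tooling:

```shell
# active interactive sessions (reads /var/run/utmp)
who

# historical logins (reads /var/log/wtmp)
last -f /var/log/wtmp
```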

Combinations @capnspacehook

For some manual testing, many combinations need to be tested. For example, for
interactive sessions the 12 combinations are below.

  • Connect to an OpenSSH node in a local cluster using OpenSSH.
  • Connect to an OpenSSH node in a local cluster using Teleport.
  • Connect to an OpenSSH node in a local cluster using the Web UI.
  • Connect to a Teleport node in a local cluster using OpenSSH.
  • Connect to a Teleport node in a local cluster using Teleport.
  • Connect to a Teleport node in a local cluster using the Web UI.
  • Connect to an OpenSSH node in a remote cluster using OpenSSH.
  • Connect to an OpenSSH node in a remote cluster using Teleport.
  • Connect to an OpenSSH node in a remote cluster using the Web UI.
  • Connect to a Teleport node in a remote cluster using OpenSSH.
  • Connect to a Teleport node in a remote cluster using Teleport.
  • Connect to a Teleport node in a remote cluster using the Web UI.

Teleport with EKS/GKE @tigrato

  • Deploy Teleport on a single EKS cluster
  • Deploy Teleport on two EKS clusters and connect them via trusted cluster feature
  • Deploy Teleport Proxy outside of GKE cluster fronting connections to it (use this script to generate a kubeconfig)
  • Deploy Teleport Proxy outside of EKS cluster fronting connections to it (use this script to generate a kubeconfig)

Teleport with multiple Kubernetes clusters @tigrato

Note: you can use GKE or EKS or minikube to run Kubernetes clusters.
The only caveat is minikube: it's not reachable publicly, so don't run a proxy there.

  • Deploy combo auth/proxy/kubernetes_service outside of a Kubernetes cluster, using a kubeconfig
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy/kubernetes_service inside of a Kubernetes cluster
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy_service outside of the Kubernetes cluster and kubernetes_service inside of a Kubernetes cluster, connected over a reverse tunnel
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy a second kubernetes_service inside of another Kubernetes cluster, connected over a reverse tunnel
    • Login with tsh login, check that tsh kube ls has both clusters
    • Switch to a second cluster using tsh kube login
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh on the new cluster
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy/kubernetes_service outside of a Kubernetes cluster, using a kubeconfig with multiple clusters in it
    • Login with tsh login, check that tsh kube ls has all clusters
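Each deployment variant above can be exercised with the same command sequence; a template (cluster and pod names are placeholders):

```shell
tsh login --proxy=proxy.example.com --user=<username>
tsh kube ls                     # expect the registered cluster(s) to be listed
tsh kube login <kube-cluster>   # switch clusters when more than one is registered
kubectl get nodes
kubectl exec -it $SOME_POD -- sh
```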
  • Test Kubernetes screen in the web UI (tab is located on left side nav on dashboard):
    • Verify that all kubes registered are shown with correct name and labels
    • Verify that clicking on a row's Connect button renders a dialogue with manual instructions, with the Step 2 login value matching the row's Name column
    • Verify searching for name or labels in the search bar works
    • Verify you can sort by name column

Teleport with FIPS mode @alistanis @r0mant

  • Perform trusted clusters, Web and SSH sanity check with all teleport components deployed in FIPS mode.

ACME @rudream

  • Teleport can fetch TLS certificate automatically using ACME protocol.
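A minimal proxy_service fragment enabling ACME (public_addr and email are placeholders):

```yaml
proxy_service:
  enabled: yes
  public_addr: teleport.example.com:443
  acme:
    enabled: yes
    email: admin@example.com
```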

Migrations @hugoShaka

  • Migrate trusted clusters from 9.3 to 10.0
    • Migrate auth server on main cluster, then rest of the servers on main cluster
      SSH should work for both main and old clusters
    • Migrate auth server on remote cluster, then rest of the remote cluster
      SSH should work

Command Templates

When interacting with a cluster, the following command templates are useful:

OpenSSH

# when connecting to the recording proxy, `-o 'ForwardAgent yes'` is required.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# the above command only forwards the agent to the proxy, to forward the agent
# to the target node, `-o 'ForwardAgent yes'` needs to be passed twice.
ssh -o "ForwardAgent yes" \
  -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# when connecting to a remote cluster using OpenSSH, the subsystem request is
# updated with the name of the remote cluster.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p@foo.com" \
  node.foo.com

Teleport

# when connecting to an OpenSSH node, remember `-p 22` needs to be passed.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -p 22 node.example.com

# an agent can be forwarded to the target node with `-A`
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -A -p 22 node.example.com

# the --cluster flag is used to connect to a node in a remote cluster.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh --cluster=foo.com -p 22 node.foo.com

Teleport with SSO Providers @ptgott @Tener

  • G Suite install instructions work
    • G Suite Screenshots are up to date
  • Azure Active Directory (AD) install instructions work
    • Azure Active Directory (AD) Screenshots are up to date
  • ActiveDirectory (ADFS) install instructions work
    • Active Directory (ADFS) Screenshots are up to date
  • Okta install instructions work
    • Okta Screenshots are up to date
  • OneLogin install instructions work
    • OneLogin Screenshots are up to date
  • GitLab install instructions work
    • GitLab Screenshots are up to date
  • OIDC install instructions work
    • OIDC Screenshots are up to date
  • All providers with guides in docs are covered in this test plan

tctl sso family of commands @Tener

tctl sso configure helps to construct a valid connector definition:

  • tctl sso configure github ... creates valid connector definitions
  • tctl sso configure oidc ... creates valid connector definitions
  • tctl sso configure saml ... creates valid connector definitions

tctl sso test tests a provided connector definition, which can be loaded from a
file or piped in from tctl sso configure or tctl get --with-secrets. Valid
connectors are accepted; invalid ones are rejected with sensible error messages.

  • Connectors can be tested with tctl sso test.
    • GitHub
    • SAML
    • OIDC
      • Google Workspace
      • Non-Google IdP
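Command templates for chaining the two commands, per the description above (the `...` stands for the connector-specific flags):

```shell
# pipe a freshly generated definition straight into the tester
tctl sso configure github ... | tctl sso test

# or test an existing connector definition exported with secrets
tctl get connectors --with-secrets > connector.yaml
tctl sso test connector.yaml
```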

Teleport Plugins @marcoandredinis

  • Test receiving a message via Teleport Slackbot
  • Test receiving a new Jira Ticket via Teleport Jira

AWS Node Joining @nklaassen

Docs

  • On EC2 instance with ec2:DescribeInstances permissions for local account:
    TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin
  • On EC2 instance with any attached role:
    TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin
  • EC2 Join method in IoT mode with node and auth in different AWS accounts
  • IAM Join method in IoT mode with node and auth in different AWS accounts
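The join methods above are driven by a token resource; a sketch for the IAM method (account and ARN values are placeholders):

```yaml
kind: token
version: v2
metadata:
  name: iam-join-token
spec:
  roles: [Node]
  join_method: iam
  allow:
  - aws_account: "111111111111"
    aws_arn: "arn:aws:sts::111111111111:assumed-role/<role>/*"
```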

Passwordless @r0mant @espadolini

Passwordless requires tsh compiled with libfido2 for most operations (apart
from Touch ID). Ask for a statically-built tsh binary for realistic tests.

Touch ID requires a properly built and signed tsh binary. Ask for a
pre-release binary so you may run the tests.

This section complements "Users -> Managing MFA devices". Ideally both macOS
and Linux tsh binaries are tested for FIDO2 items.

  • Diagnostics

    Both commands should pass all tests.

    • tsh fido2 diag
    • tsh touchid diag
  • Registration

    • Register a passwordless FIDO2 key (tsh mfa add, choose WEBAUTHN and
      passwordless)
    • Register a Touch ID credential (tsh mfa add, choose TOUCHID)
  • Login

    • Passwordless login using FIDO2 (tsh login --auth=passwordless)
    • Passwordless login using Touch ID (tsh login --auth=passwordless)
    • tsh login --auth=passwordless --mfa-mode=cross-platform uses FIDO2
    • tsh login --auth=passwordless --mfa-mode=platform uses Touch ID
    • tsh login --auth=passwordless --mfa-mode=auto prefers Touch ID
    • Passwordless disable switch works
      (auth_service.authentication.passwordless = false)
    • Cluster in passwordless mode defaults to passwordless
      (auth_service.authentication.connector_name = passwordless)
    • Cluster in passwordless mode allows MFA login
      (tsh login --auth=local)
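The passwordless toggles referenced above live under auth_service.authentication; a fragment for reference (rp_id is a placeholder):

```yaml
auth_service:
  authentication:
    type: local
    second_factor: on
    webauthn:
      rp_id: example.com
    # disable switch for the test above
    passwordless: false
    # or, for a cluster in passwordless mode:
    # connector_name: passwordless
```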
  • Touch ID support commands

    • tsh touchid ls works
    • tsh touchid rm works (careful, may lock you out!)

WEB UI @kimlisa @rudream @hatched

Main

For main, test with a role that has access to all resources.

Top Nav

  • Verify that cluster selector displays all (root + leaf) clusters
  • Verify that user name is displayed
  • Verify that the user menu shows logout, help & support, and account settings (for local users)

Side Nav

  • Verify that each item has an icon
  • Verify that Collapse/Expand works: collapsed shows the > icon and expanded shows the v icon
  • Verify that it automatically expands and highlights the item on page refresh

Servers aka Nodes

  • Verify that "Servers" table shows all joined nodes
  • Verify that "Connect" button shows a list of available logins
  • Verify that "Hostname", "Address" and "Labels" columns show the current values
  • Verify that "Search" by hostname, address, labels works
  • Verify that terminal opens when clicking on one of the available logins
  • Verify that clicking on the Add Server button renders a dialogue set to the Automatically view
    • Verify clicking on Regenerate Script regenerates token value in the bash command
    • Verify using the bash command successfully adds the server (refresh server list)
    • Verify that clicking on Manually tab renders manual steps
    • Verify that clicking back to Automatically tab renders bash command

Applications

  • Verify that clicking on the Add Application button renders a dialogue
    • Verify input validation (prevent empty value and invalid url)
    • Verify after input and clicking on Generate Script, bash command is rendered
    • Verify clicking on Regenerate button regenerates token value in bash command

Databases

  • Verify that clicking on the Add Database button renders a dialogue with manual instructions:
    • Verify selecting different options on Step 4 changes Step 5 commands

Active Sessions

  • Verify that "empty" state is handled
  • Verify that it displays the session when session is active
  • Verify that "Description", "Session ID", "Users", "Nodes" and "Duration" columns show correct values
  • Verify that the "OPTIONS" button allows you to join a session

Audit log

  • Verify that time range button is shown and works
  • Verify that clicking on the Session Ended event icon takes the user to the session player
  • Verify event detail dialogue renders when clicking on events details button
  • Verify that searching by type, description, and created time works

Users

  • Verify that users are shown
  • Verify that creating a new user works
  • Verify that editing user roles works
  • Verify that removing a user works
  • Verify resetting a user's password works
  • Verify search by username, roles, and type works

Auth Connectors

  • Verify that the empty state renders when there are no connectors
  • Verify that creating OIDC/SAML/GITHUB connectors works
  • Verify that editing OIDC/SAML/GITHUB connectors works
  • Verify that error is shown when saving an invalid YAML
  • Verify that correct hint text is shown on the right side
  • Verify that encrypted SAML assertions work with an identity provider that supports it (Azure).
  • Verify that created GitHub, SAML, and OIDC cards have their icons

Roles

  • Verify that roles are shown
  • Verify that "Create New Role" dialog works
  • Verify that deleting and editing works
  • Verify that error is shown when saving an invalid YAML
  • Verify that correct hint text is shown on the right side

Managed Clusters

  • Verify that it displays a list of clusters (root + leaf)
  • Verify that every menu item works: nodes, apps, audit events, session recordings, etc.

Help & Support

  • Verify that all URLs work and are correct (no 404s)

Access Requests

Access Requests are an Enterprise feature and are not available in OSS.

Creating Access Requests (Role Based)

Create a role named allow-roles-and-nodes with limited permissions. This role allows you to see the Roles screen and ssh into all nodes.

kind: role
metadata:
  name: allow-roles-and-nodes
spec:
  allow:
    logins:
    - root
    node_labels:
      '*': '*'
    rules:
    - resources:
      - role
      verbs:
      - list
      - read
  options:
    max_session_ttl: 8h0m0s
version: v5

Create another role named allow-users-with-short-ttl with limited permissions. This role's session expires in 4 minutes; it allows you to see the Users screen and denies access to all nodes.

kind: role
metadata:
  name: allow-users-with-short-ttl
spec:
  allow:
    rules:
    - resources:
      - user
      verbs:
      - list
      - read
  deny:
    node_labels:
      '*': '*'
  options:
    max_session_ttl: 4m0s
version: v5

Create a user that has no access to anything but allows you to request roles:

kind: role
metadata:
  name: test-role-based-requests
spec:
  allow:
    request:
      roles:
      - allow-roles-and-nodes
      - allow-users-with-short-ttl
      suggested_reviewers:
      - random-user-1
      - random-user-2
version: v5
  • Verify that under requestable roles, only allow-roles-and-nodes and allow-users-with-short-ttl are listed
  • Verify you can select/input/modify reviewers
  • Verify you can view the request you created from the request list (it should be in the pending state)
  • Verify there is a list of the reviewers you selected (the list is empty if none were selected AND suggested_reviewers wasn't defined)
  • Verify you can't review your own requests

Creating Access Requests (Search Based)

Create a role with access to searchable resources (apps, dbs, kubes, nodes, desktops). The template searcheable-resources is below.

kind: role
metadata:
  name: searcheable-resources
spec:
  allow:
    app_labels:  # example labels
      label1-key: label1-value
      env: [dev, staging]
    db_labels:
      '*': '*'   # asterisks give the user access to everything
    kubernetes_labels:
      '*': '*' 
    node_labels:
      '*': '*'
    windows_desktop_labels:
      '*': '*'
version: v5

Create a user that has no access to resources, but allows you to search them:

kind: role
metadata:
  name: test-search-based-requests
spec:
  allow:
    request:
      search_as_roles:
      - searcheable-resources
      suggested_reviewers:
      - random-user-1
      - random-user-2
version: v5
  • Verify that a user can see resources based on the searcheable-resources rules
  • Verify you can select/input/modify reviewers
  • Verify you can view the request you created from the request list (it should be in the pending state)
  • Verify there is a list of the reviewers you selected (the list is empty if none were selected AND suggested_reviewers wasn't defined)
  • Verify you can't review your own requests
  • Verify that you can't mix adding resources from different clusters (there should be a warning dialogue that clears the selected list)

Viewing & Approving/Denying Requests

Create a user with the role reviewer, which allows you to review all requests and delete them.

kind: role
version: v3
metadata:
  name: reviewer
spec:
  allow:
    review_requests:
      roles: ['*']
  • Verify you can view access request from request list
  • Verify you can approve a request with message, and immediately see updated state with your review stamp (green checkmark) and message box
  • Verify you can deny a request, and immediately see updated state with your review stamp (red cross)
  • Verify that deleting a denied request removes it from the list

Assuming Approved Requests (Role Based)

  • Verify that assuming allow-roles-and-nodes allows you to see roles screen and ssh into nodes
  • After assuming allow-roles-and-nodes, verify that assuming allow-users-with-short-ttl allows you to see the Users screen and denies access to nodes
    • Verify a switchback banner is rendered with the roles assumed and a countdown to when it expires
    • Verify switching back goes back to your default static role
    • Verify that after re-assuming the allow-users-with-short-ttl role, the user is automatically logged out after the expiry is met (4 minutes)

Assuming Approved Requests (Search Based)

  • Verify that assuming an approved request allows you to see the resources you've requested.

Assuming Approved Requests (Both)

  • Verify assume buttons are only present for approved requests belonging to the logged-in user
  • Verify that after clicking the assume button, it is disabled both in the list and in the request view
  • Verify that after re-login, requests that are approved and not expired are assumable again

Access Request Waiting Room

Strategy Reason

Create the following role:

kind: role
metadata:
  name: waiting-room
spec:
  allow:
    request:
      roles:
      - <some other role to assign user after approval>
  options:
    max_session_ttl: 8h0m0s
    request_access: reason
    request_prompt: <some custom prompt to show in reason dialogue>
version: v3
  • Verify that after login, the reason dialogue is rendered with the prompt set to the request_prompt setting
  • Verify after clicking send request, pending dialogue renders
  • Verify after approving a request, dashboard is rendered
  • Verify the correct role was assigned

Strategy Always

With the previous role you created from Strategy Reason, change request_access to always:

  • Verify after login, pending dialogue is auto rendered
  • Verify after approving a request, dashboard is rendered
  • Verify after denying a request, access denied dialogue is rendered
  • Verify a switchback banner is rendered with the roles assumed and a countdown to when it expires
  • Verify switchback button says Logout and clicking goes back to the login screen

Strategy Optional

With the previous role you created from Strategy Reason, change request_access to optional:

  • Verify after login, dashboard is rendered as normal

Terminal

  • Verify that top nav has a user menu (Main and Logout)
  • Verify that switching between tabs works with alt+[1...9]

Node List Tab

  • Verify that Cluster selector works (URL should change too)
  • Verify that Quick launcher input works
  • Verify that Quick launcher input handles input errors
  • Verify that "Connect" button shows a list of available logins
  • Verify that "Hostname", "Address" and "Labels" columns show the current values
  • Verify that "Search" by hostname, address, labels work
  • Verify that new tab is created when starting a session

Session Tab

  • Verify that session and browser tabs both show the title with login and node name
  • Verify that terminal resize works
    • Install midnight commander on the node you ssh into: $ sudo apt-get install mc
    • Run the program: $ mc
    • Resize the terminal to see if panels resize with it
  • Verify that session tab shows/updates number of participants when a new user joins the session
  • Verify that tab automatically closes on "$ exit" command
  • Verify that SCP Upload works
  • Verify that SCP Upload handles invalid paths and network errors
  • Verify that SCP Download works
  • Verify that SCP Download handles invalid paths and network errors

Session Player

  • Verify that it can replay a session
  • Verify that when playing, the scroller auto-scrolls to the bottommost content
  • Verify that when resizing the player to a small screen, the scroller appears and works
  • Verify that an error message is displayed (enter an invalid SID in the URL)

Invite and Reset Form

  • Verify that input validates
  • Verify that invite works with 2FA disabled
  • Verify that invite works with OTP enabled
  • Verify that invite works with U2F enabled
  • Verify that invite works with WebAuthn enabled
  • Verify that error message is shown if an invite is expired/invalid

Login Form and Change Password

  • Verify that input validates
  • Verify that login works with 2FA disabled
  • Verify that changing passwords works for 2FA disabled
  • Verify that login works with OTP enabled
  • Verify that changing passwords works for OTP enabled
  • Verify that login works with U2F enabled
  • Verify that changing passwords works for U2F enabled
  • Verify that login works with WebAuthn enabled
  • Verify that changing passwords works for WebAuthn enabled
  • Verify that login works for Github/SAML/OIDC
  • Verify that redirect to original URL works after successful login
  • Verify that account is locked after several unsuccessful login attempts
  • Verify that account is locked after several unsuccessful change password attempts

Multi-factor Authentication (mfa)

Create/modify teleport.yaml and set the following authentication settings under auth_service

authentication:
  type: local
  second_factor: optional
  require_session_mfa: yes
  webauthn:
    rp_id: example.com

MFA invite, login, password reset, change password

  • Verify that during invite/reset, the second factor list shows all auth types: none, hardware key, and authenticator app
  • Verify registration works with all option types
  • Verify login with all option types
  • Verify changing password with all option types
  • Change the second_factor type to on and verify that MFA is required (no "none" option in the dropdown)

MFA require auth

Go to Account Settings > Two-Factor Devices and register a new device

Using the same user as above:

  • Verify logging in with registered WebAuthn key works
  • Verify that connecting to an ssh node prompts you to tap your registered WebAuthn key
  • Verify in the web terminal, you can scp upload/download files

MFA Management

  • Verify adding first device works without requiring re-authentication
  • Verify re-authenticating with a WebAuthn device works
  • Verify re-authenticating with a U2F device works
  • Verify re-authenticating with an OTP device works
  • Verify adding a WebAuthn device works
  • Verify adding a U2F device works
  • Verify adding an OTP device works
  • Verify removing a device works
  • Verify second_factor set to off disables adding devices

Passwordless

  • Pure passwordless registrations and resets are possible
  • Verify adding a passwordless device (WebAuthn)
  • Verify passwordless logins

Cloud

From your cloud staging account, change the field teleportVersion to the test version.

$ kubectl -n <namespace> edit tenant

Recovery Code Management

  • Verify generating recovery codes for local accounts with email usernames works
  • Verify local accounts with non-email usernames are not able to generate recovery codes
  • Verify SSO accounts are not able to generate recovery codes

Invite/Reset

  • Verify that email usernames render the recovery codes dialog
  • Verify that non-email usernames do not render the recovery codes dialog

Recovery Flow: Add new mfa device

  • Verify recovering (adding) a new hardware key device with password
  • Verify recovering (adding) a new otp device with password
  • Verify viewing and deleting any old device (but not the one just added)
  • Verify new recovery codes are rendered at the end of flow

Recovery Flow: Change password

  • Verify recovering password with any mfa device
  • Verify new recovery codes are rendered at the end of flow

Recovery Email

  • Verify receiving email for link to start recovery
  • Verify receiving email for successfully recovering
  • Verify email link is invalid after successful recovery
  • Verify receiving email for locked account when max attempts reached

RBAC

Create a role, with no allow.rules defined:

kind: role
metadata:
  name: rbac
spec:
  allow:
    app_labels:
      '*': '*'
    logins:
    - root
    node_labels:
      '*': '*'
  options:
    max_session_ttl: 8h0m0s
version: v3
  • Verify that a user has access only to: "Servers", "Applications", "Databases", "Kubernetes", "Active Sessions", "Access Requests" and "Manage Clusters"
  • Verify there is no Add Server, Application, Database, or Kubernetes button in each respective view
  • Verify only Servers, Apps, Databases, and Kubernetes are listed under options button in Manage Clusters

Note: the user has read/create access_request access to their own requests, despite the resource settings above.

Add the following under spec.allow.rules to enable read access to the audit log:

  - resources:
      - event
      verbs:
      - list
  • Verify that the Audit Log and Session Recordings are accessible
  • Verify that playing a recorded session is denied

Add the following to enable read access to recorded sessions:

  - resources:
      - session
      verbs:
      - read
  • Verify that a user can re-play a session (session.end)

Add the following to enable read access to the roles:

  - resources:
      - role
      verbs:
      - list
      - read
  • Verify that a user can see the roles
  • Verify that a user cannot create/delete/update a role

Add the following to enable read access to the auth connectors

  - resources:
      - auth_connector
    verbs:
      - list
      - read
  • Verify that a user can see the list of auth connectors.
  • Verify that a user cannot create/delete/update the connectors

Add the following to enable read access to users

  - resources:
      - user
    verbs:
      - list
      - read
  • Verify that a user can access the "Users" screen
  • Verify that a user cannot reset password and create/delete/update a user

Add the following to enable read access to trusted clusters

  - resources:
      - trusted_cluster
    verbs:
      - list
      - read
  • Verify that a user can access the "Trust" screen
  • Verify that a user cannot create/delete/update a trusted cluster.
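Taken together, the incremental rule additions above produce a role like the following (a sketch assembled from the snippets in this section):

```yaml
kind: role
version: v3
metadata:
  name: rbac
spec:
  allow:
    app_labels:
      '*': '*'
    logins:
      - root
    node_labels:
      '*': '*'
    rules:
      - resources: [event]
        verbs: [list]
      - resources: [session]
        verbs: [read]
      - resources: [role]
        verbs: [list, read]
      - resources: [auth_connector]
        verbs: [list, read]
      - resources: [user]
        verbs: [list, read]
      - resources: [trusted_cluster]
        verbs: [list, read]
  options:
    max_session_ttl: 8h0m0s
```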

Performance/Soak Test @rosstimothy @espadolini

Using the tsh bench tool, perform the soak and benchmark tests on the following configurations:

  • Cluster with 10K nodes in normal (non-IOT) node mode with ETCD

  • Cluster with 10K nodes in normal (non-IOT) mode with DynamoDB

  • Cluster with 1K IOT nodes with ETCD

  • Cluster with 1K IOT nodes with DynamoDB

  • Cluster with 500 trusted clusters with ETCD

  • Cluster with 500 trusted clusters with DynamoDB

Soak Tests

Run a 4-hour soak test with a mix of interactive/non-interactive sessions:

tsh bench --duration=4h user@teleport-monster-6757d7b487-x226b ls
tsh bench -i --duration=4h user@teleport-monster-6757d7b487-x226b ps uax

Observe Prometheus metrics for goroutines, open files, RAM, CPU, and timers, and make sure there are no leaks.

  • Verify that prometheus metrics are accurate.
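Prometheus can only observe these metrics if Teleport exposes them; a sketch, assuming Teleport was started with --diag-addr=0.0.0.0:3000 and that the target hostname below is a placeholder:

```yaml
# prometheus.yml fragment - scrape the Teleport diagnostics endpoint
scrape_configs:
  - job_name: teleport
    scrape_interval: 15s
    static_configs:
      - targets: ['teleport-host:3000']  # placeholder host; port from --diag-addr
```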

Breaking load tests

Load the system to capacity with tsh bench and publish the maximum number of concurrent sessions for both
interactive and non-interactive tsh bench loads.

Teleport with Cloud Providers

AWS @lxea

GCP @EdwardDowling

  • Deploy Teleport to GCP. Using Cloud Firestore & Cloud Storage
  • Deploy Teleport to GKE. Google Kubernetes engine.
  • Deploy Teleport Enterprise to GCP.

IBM @r0mant

  • Deploy Teleport to IBM Cloud. Using IBM Database for etcd & IBM Object Store
  • Deploy Teleport to IBM Cloud Kubernetes.
  • Deploy Teleport Enterprise to IBM Cloud.

Application Access @strideynet

  • Run an application within local cluster.
    • Verify the debug application debug_app: true works.
    • Verify an application can be configured with command line flags.
    • Verify an application can be configured from file configuration.
    • Verify that applications are available at the auto-generated addresses name.rootProxyPublicAddr as well as at publicAddr.
  • Run an application within a trusted cluster.
    • Verify that applications are available at auto-generated addresses name.rootProxyPublicAddr.
  • Verify Audit Records.
    • app.session.start and app.session.chunk events are created in the Audit Log.
    • app.session.chunk points to a 5 minute session archive with multiple app.session.request events inside.
    • tsh play <chunk-id> can fetch and print a session chunk archive.
  • Verify JWT using verify-jwt.go.
  • Verify RBAC.
  • Verify CLI access with tsh app login.
  • Verify AWS console access.
    • Can log into AWS web console through the web UI.
    • Can interact with AWS using tsh aws commands.
  • Verify dynamic registration.
    • Can register a new app using tctl create.
    • Can update registered app using tctl create -f.
    • Can delete registered app using tctl rm.
  • Test Applications screen in the web UI (tab is located on left side nav on dashboard):
    • Verify that all apps registered are shown
    • Verify that clicking on the app icon takes you to another tab
    • Verify using the bash command produced from Add Application dialogue works (refresh app screen to see it registered)
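For the dynamic registration checks, a minimal app resource to feed to tctl create might look like this (a sketch; the name, URI, and labels are placeholders):

```yaml
kind: app
version: v3
metadata:
  name: grafana
  description: Test application
  labels:
    env: test
spec:
  uri: http://localhost:3000
```

Register it with tctl create app.yaml, update it with tctl create -f app.yaml, and remove it with tctl rm app/grafana.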

Database Access @smallinsky

  • Connect to a database within a local cluster.
  • Connect to a database within a remote cluster via a trusted cluster.
  • Verify audit events @Tener
    • db.session.start is emitted when you connect.
    • db.session.end is emitted when you disconnect.
    • db.session.query is emitted when you execute a SQL query.
  • Verify RBAC @smallinsky
    • tsh db ls shows only databases matching role's db_labels.
    • Can only connect as users from db_users.
    • (Postgres only) Can only connect to databases from db_names.
      • db.session.start is emitted when connection attempt is denied.
    • (MongoDB only) Can only execute commands in databases from db_names.
      • db.session.query is emitted when command fails due to permissions.
    • Can configure per-session MFA.
      • MFA tap is required on each tsh db connect.
  • Verify dynamic registration @Tener
    • Can register a new database using tctl create.
    • Can update registered database using tctl create -f.
    • Can delete registered database using tctl rm.
  • Verify discovery @greedy52
    • Can detect and register RDS instances.
    • Can detect and register Aurora clusters, and their reader and custom endpoints.
    • Can detect and register Redshift clusters.
    • Can detect and register ElastiCache Redis clusters.
  • Test Databases screen in the web UI (tab is located on left side nav on dashboard): @gabrielcorado
    • Verify that all dbs registered are shown with correct name, description, type, and labels
    • Verify that clicking a row's Connect button renders a dialog with manual connection instructions, with the Step 2 login value matching the row's name column
    • Verify searching for all columns in the search bar works
    • Verify you can sort by all columns except labels
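As with apps, the dynamic registration checks can start from a minimal database resource (a sketch; the name, URI, and labels are placeholders):

```yaml
kind: db
version: v3
metadata:
  name: example-postgres
  description: Test PostgreSQL instance
  labels:
    env: test
spec:
  protocol: postgres
  uri: localhost:5432
```

Register it with tctl create db.yaml, update it with tctl create -f db.yaml, and remove it with tctl rm db/example-postgres.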

TLS Routing @smallinsky

  • Verify that teleport proxy v2 configuration starts only a single listener.
    version: v2
    proxy_service:
      enabled: "yes"
      public_addr: ['root.example.com']
      web_listen_addr: 0.0.0.0:3080
  • Run Teleport Proxy in multiplex mode auth_service.proxy_listener_mode: "multiplex"
    • Trusted cluster
      • Setup trusted clusters using single port setup web_proxy_addr == tunnel_addr
      kind: trusted_cluster
      spec:
        ...
        web_proxy_addr: root.example.com:443
        tunnel_addr: root.example.com:443
        ...
      
  • Database Access
    • Verify that tsh db connect works through proxy running in multiplex mode
      • Postgres
      • MySQL
      • MariaDB
      • MongoDB
      • CockroachDB
    • Verify connecting to a database through TLS ALPN SNI local proxy tsh db proxy with a GUI client.
  • Application Access
    • Verify app access through proxy running in multiplex mode
  • SSH Access
    • Connect to an OpenSSH server through a local ssh proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" [email protected]
    • Connect to an OpenSSH server on a leaf cluster through a local ssh proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" [email protected]
    • Verify tsh ssh access through proxy running in multiplex mode
  • Kubernetes access: @GavinFrazar
    • Verify kubernetes access through proxy running in multiplex mode
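The ad-hoc ProxyCommand invocations above can also be made persistent in ~/.ssh/config, so that plain ssh goes through the local proxy (a sketch; the host pattern and cluster name are placeholders):

```
# ~/.ssh/config fragment
Host *.root.example.com
    ForwardAgent yes
    ProxyCommand tsh proxy ssh --user=%r --cluster=root %h:%p
```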

Desktop Access

Basic Sessions (@LKozlowski)

  • Direct mode (set listen_addr):
    • Can connect to desktop defined in static hosts section.
    • Can connect to desktop discovered via LDAP
  • IoT mode (reverse tunnel through proxy):
    • Can connect to desktop defined in static hosts section.
    • Can connect to desktop discovered via LDAP
  • Connect multiple windows_desktop_services to the same Teleport cluster,
    verify that connections to desktops on different AD domains works. (Attempt to
    connect several times to verify that you are routed to the correct
    windows_desktop_service)

User Input (@ibeckermayer)

  • Verify user input

    • Download Keyboard Key Info and
      verify all keys are processed correctly in each supported browser. Known
      issues: F11 cannot be captured by the browser without
      special configuration
      on MacOS.
    • Left click and right click register as Windows clicks. (Right click on
      the desktop should show a Windows menu, not a browser context menu)
    • Vertical and horizontal scroll work.
      Horizontal Scroll Test
  • Locking and access (@ibeckermayer)

    • Verify that placing a user lock terminates an active desktop session.
    • Verify that placing a desktop lock terminates an active desktop session.
    • Verify that placing a role lock terminates an active desktop session.
    • Verify that connecting to a locked desktop fails.
    • Set client_idle_timeout to a small value and verify that idle sessions
      are terminated (the session should end and an audit event will confirm it
      was due to idle connection)
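The lock scenarios above can be driven with tctl lock or by creating a lock resource directly; a sketch of a user-targeted lock (the name, user, and message are placeholders; swap the target for a role or windows_desktop to cover the other cases):

```yaml
kind: lock
version: v2
metadata:
  name: test-plan-lock
spec:
  target:
    user: alice
  message: "locked for desktop access testing"
```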
  • Labeling (@LKozlowski)

    • All desktops have teleport.dev/origin label.
    • Dynamic desktops have additional teleport.dev labels for OS, OS
      Version, DNS hostname, and OU.
    • Regexp-based host labeling applies across all desktops, regardless of
      origin.
    • LDAP attribute labeling functions correctly
  • RBAC (@zmb3)

    • RBAC denies access to a Windows desktop due to labels
    • RBAC denies access to a Windows desktop with the wrong OS-login.
  • Clipboard Support (@zmb3)

    • When a user has a role with clipboard sharing enabled and is using a chromium based browser
      • Going to a desktop when clipboard permissions are in "Ask" mode (aka "prompt") causes the browser to show a prompt while the UI shows a spinner
      • X-ing out of the prompt (causing the clipboard permission to remain in "Ask" mode) causes the prompt to show up again
      • Denying clipboard permissions brings up a relevant error alert (with "Clipboard Sharing Disabled" in the top bar)
      • Allowing clipboard permissions allows you to see the desktop session, with "Clipboard Sharing Enabled" highlighted in the top bar
      • Copy text from local workstation, paste into remote desktop
      • Copy text from remote desktop, paste into local workstation
      • Copying unicode text also works in both directions
    • When a user has a role with clipboard sharing enabled and is not using a chromium based browser
      • The UI shows a relevant alert and "Clipboard Sharing Disabled" is highlighted in the top bar
    • When a user has a role with clipboard sharing disabled and is using a chromium and non-chromium based browser (confirm both)
      • The live session should show disabled in the top bar and copy/paste should not work between your workstation and the remote desktop.
  • Per-Session MFA (try webauthn on each of Chrome, Safari, and Firefox) @zmb3

    • Attempting to start a session with no keys registered shows an error message
    • Attempting to start a session with a webauthn device registered pops up the "Verify Your Identity" dialog
      • Hitting "Cancel" shows an error message
      • Hitting "Verify" causes your browser to prompt you for MFA
      • Cancelling that browser MFA prompt shows an error
      • Successful MFA verification allows you to connect
  • Session Recording (@LKozlowski)

    • Verify sessions are not recorded if all of a user's roles disable recording
    • Verify sync recording (mode: node-sync or mode: proxy-sync)
    • Verify async recording (mode: node or mode: proxy)
    • Sessions show up in session recordings UI with desktop icon
    • Sessions can be played back, including play/pause functionality
    • A session that ends with a TDP error message can be played back, ends by displaying the error message,
      and the progress bar progresses to the end.
    • Attempting to play back a session that doesn't exist (i.e. by entering a non-existing session id in the url) shows
      a relevant error message.
    • RBAC for sessions: ensure users can only see their own recordings when
      using the RBAC rule from our
      docs
  • Audit Events (check these after performing the above tests) (@ibeckermayer)

    • windows.desktop.session.start (TDP00I) emitted on start
    • windows.desktop.session.start (TDP00W) emitted when session fails to
      start (due to RBAC, for example)
    • windows.desktop.session.end (TDP01I) emitted on end
    • desktop.clipboard.send (TDP02I) emitted for local copy -> remote
      paste
    • desktop.clipboard.receive (TDP03I) emitted for remote copy -> local
      paste

Binaries compatibility @fheinecke

  • Verify that teleport/tsh/tctl/tbot run on:
    • CentOS 7
    • CentOS 8
    • Ubuntu 18.04
    • Ubuntu 20.04
    • Debian 9
  • Verify tsh runs on:
    • Windows 10
    • MacOS

Machine ID @timothyb89

SSH

With a default Teleport instance configured with a SSH node:

  • Verify you are able to create a new bot user with tctl bots add robot --roles=access. Follow the instructions provided in the output to start tbot
  • Verify you are able to connect to the SSH node using openssh with the generated ssh_config in the destination directory
  • Verify that after the renewal period (default 20m, but this can be reduced via configuration), newly generated certificates are placed in the destination directory
  • Verify that sending both SIGUSR1 and SIGHUP to a running tbot process causes a renewal and new certificates to be generated

Ensure the above tests are completed for both:

  • Directly connecting to the auth server
  • Connecting to the auth server via the proxy reverse tunnel
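For reference, a minimal tbot.yaml covering both modes might look like the following (a sketch; the addresses, token, and paths are placeholders - point auth_server at the auth service for the direct case, or at the proxy for the reverse tunnel case):

```yaml
auth_server: auth.example.com:3025  # or proxy.example.com:3080 for the tunnel case
onboarding:
  join_method: token
  token: abcd1234  # placeholder; printed by tctl bots add
storage:
  directory: /var/lib/teleport/bot
destinations:
  - directory: /opt/machine-id
```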

DB Access

With a default Postgres DB instance, a Teleport instance configured with DB access and a bot user configured:

  • Verify you are able to connect to and interact with a database using tbot db while tbot start is running

Teleport Connect @ravicious @gzdunek @avatus

  • Auth methods @ravicious
    • Verify that the app supports clusters using different auth settings
      (auth_service.authentication in the cluster config):
      • type: local, second_factor: "off"
      • type: local, second_factor: "otp"
      • type: local, second_factor: "webauthn"
      • type: local, second_factor: "optional", log in without MFA
      • type: local, second_factor: "optional", log in with OTP
      • type: local, second_factor: "optional", log in with hardware key
      • type: local, second_factor: "on", log in with OTP
      • type: local, second_factor: "on", log in with hardware key
      • Authentication connectors: @ravicious
        • For those you might want to use clusters that are deployed on the web, specified in parens.
          Or set up the connectors on a local enterprise cluster following the guide from our wiki.
        • GitHub (asteroid)
          • local login on a GitHub-enabled cluster
        • SAML (platform cluster)
        • OIDC (e-demo)
  • Shell @gzdunek
    • Verify that the shell is pinned to the correct cluster (for root clusters and leaf clusters).
      • That is, opening new shell sessions in other workspaces or other clusters within the same
        workspace should have no impact on the original shell session.
    • Verify that the local shell is opened with correct env vars.
      • TELEPORT_PROXY and TELEPORT_CLUSTER should pin the session to the correct cluster.
      • TELEPORT_HOME should point to ~/Library/Application Support/Teleport Connect/tsh.
      • PATH should include /Applications/Teleport Connect.app/Contents/Resources/bin.
    • Verify that the working directory in the tab title is updated when you change the directory
      (only for local terminals).
    • Verify that terminal resize works for both local and remote shells.
      • Install midnight commander on the node you ssh into: $ sudo apt-get install mc
      • Run the program: $ mc
      • Resize Teleport Connect to see if the panels resize with it
    • Verify that the tab automatically closes on $ exit command.
  • State restoration @ravicious
    • Verify that the app asks about restoring the previous tabs when launched and restores them
      properly.
    • Verify that the app opens with the cluster that was active when you closed the app.
    • Verify that the app remembers size & position after restart.
    • Verify that reopening a cluster that has no workspace assigned
      works.
    • Verify that reopening the app after removing ~/Library/Application Support/Teleport Connect/tsh
      doesn't crash the app.
    • Verify that reopening the app after removing ~/Library/Application Support/Teleport Connect/app_state.json
      but not the tsh dir doesn't crash the app.
    • Verify that logging out of a cluster and then logging in to the same cluster doesn't
      remember previous tabs (they should be cleared on logout).
  • Connections picker @ravicious
    • Verify that the connections picker shows new connections when ssh & db tabs are opened.
    • Check if those connections are available after the app restart.
    • Check that those connections are removed after you log out of the root cluster that they
      belong to.
    • Verify that reopening a db connection from the connections picker remembers last used port & database name.
  • Cluster resources (servers/databases) @gzdunek
    • Verify that the app shows the same resources as the Web UI.
    • Verify that search is working for the resources lists.
    • Verify that you can connect to these resources.
    • Verify that clicking "Connect" shows available logins and db usernames.
      • Logins and db usernames are taken from the role, under spec.allow.logins and
        spec.allow.db_users.
    • Repeat the above steps for resources in leaf clusters. @ravicious
    • Verify that tabs have correct titles set.
    • Verify that the port number remains the same for a db connection between app restarts.
    • Create a db connection, close the app, run tsh proxy db with the same port, start the app.
      Verify that the app doesn't crash and the db connection tab shows you the error (address in
      use) and offers a way to retry creating the connection.
  • Shortcuts @gzdunek
    • Verify that switching between tabs works on Cmd+[1...9].
    • Verify that other shortcuts are shown after you close all tabs.
    • Verify that the other shortcuts work and each of them is shown on hover on relevant UI
      elements.
  • Workspaces @ravicious
    • Verify that logging in to a new cluster adds it to the identity switcher and switches to the
      workspace of that cluster automatically.
    • Verify that the state of the current workspace is preserved when you change the workspace (by
      switching to another cluster) and return to the previous workspace.
  • Command bar & autocomplete @gzdunek
    • Do the steps for the root cluster, then switch to a leaf cluster and repeat them.
    • Verify that the autocomplete for tsh ssh filters SSH logins and autocompletes them.
    • Verify that the autocomplete for tsh ssh filters SSH hosts by name and label and
      autocompletes them.
    • Verify that launching an invalid tsh ssh command shows the error in a new tab.
    • Verify that launching a valid tsh ssh command opens a new tab with the session opened.
    • Verify that the autocomplete for tsh proxy db filters databases by name and label and
      autocompletes them.
    • Verify that launching a tsh proxy db command opens a new local shell with the command
      running.
    • Verify that the autocomplete for tsh ssh doesn't break when you cut/paste commands in
      various points.
    • Verify that manually typing out what the autocomplete would suggest doesn't break the
      command bar.
    • Verify that launching any other command that's not supported by the autocomplete opens a new
      local shell with that command running.
  • Resilience when resources become unavailable @gzdunek
    • For each scenario, create at least one tab for each available kind (minus k8s for now).
    • For each scenario, first do the external action, then click "Sync" on the relevant cluster tab.
      Verify that no unrecoverable error was raised. Then restart the app and verify that it was
      restarted gracefully (no unrecoverable error on restart, the user can continue using the app).
      • Stop the root cluster.
      • Stop a leaf cluster.
      • Disconnect your device from the internet.
  • Refreshing certs @gzdunek
    • To test scenarios from this section, create a user with a role that has TTL of 1m
      (spec.options.max_session_ttl).
    • Log in, create a db connection and run the CLI command; wait for the cert to expire, click
      "Sync" on the cluster tab.
      • Verify that after successfully logging in:
        • the cluster info is synced
        • the connection in the running CLI db client wasn't dropped; try executing select now();, the client should be able to automatically reinstantiate the connection.
        • the database proxy is able to handle new connections; click "Run" in the db tab and see
          if it connects without problems. You might need to resync the cluster again in case they
          managed to expire.
      • Verify that closing the login modal without logging in shows an error related to syncing
        the cluster.
    • Log in; wait for the cert to expire, click "Connect" next to a db in the cluster tab.
      • Verify that clicking "Connect" and then navigating to a different tab before the request
        completes doesn't show the login modal and instead immediately shows the error.
      • For this one, you might want to use a server in our Cloud if the introduced latency is high
        enough. Perhaps enabling throttling in dev tools can help too.
    • Log in; create two db connections, then remove access to one of the db servers for that
      user; wait for the cert to expire, click "Sync", verify that the db tab with no access shows an
      appropriate error and that the other db tab still handles old and new connections.
  • Verify that logs are collected for all processes (main, renderer, shared, tshd) under
    ~/Library/Application\ Support/Teleport\ Connect/logs. @ravicious
  • Verify that the password from the login form is not saved in the renderer log. @ravicious
  • Log in to a cluster, then log out and log in again as a different user. Verify that the app
    works properly after that. @gzdunek

Host users creation @jakule

Host users creation docs
Host users creation RFD

  • Verify host users creation functionality
    • non-existing users are created automatically
    • users are added to groups
      • non-existing configured groups are created
      • created users are added to the teleport-system group
    • users are cleaned up after their session ends
      • cleanup occurs if a program was left running after session ends
    • sudoers file creation is successful
      • Invalid sudoers files are not created
    • existing host users are not modified
    • setting disable_create_host_user: true stops user creation from occurring
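The behaviors above are controlled by role options and allow rules; a sketch of a role enabling host user creation (the logins, groups, and sudoers entries are placeholders):

```yaml
kind: role
version: v5
metadata:
  name: auto-host-users
spec:
  options:
    # enables automatic creation of host users for this role's sessions
    create_host_user: true
  allow:
    logins: [alice]
    host_groups: [ubuntu, nginx]
    host_sudoers: ["ALL=(ALL) NOPASSWD: ALL"]
```

The disable_create_host_user: true opt-out referenced above is set per node, under ssh_service in teleport.yaml.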

CA rotations @espadolini

  • Verify the CA rotation functionality itself (by checking in the backend or with tctl get cert_authority)
    • standby phase: only active_keys, no additional_trusted_keys
    • init phase: active_keys and additional_trusted_keys
    • update_clients and update_servers phases: the certs from the init phase are swapped
    • standby phase: only the new certs remain in active_keys, nothing in additional_trusted_keys
    • rollback phase (second pass, after completing a regular rotation): same content as in the init phase
    • standby phase after rollback: same content as in the previous standby phase
  • Verify functionality in all phases (clients might have to log in again in lieu of waiting for credentials to expire between phases)
    • SSH session in tsh from a previous phase
    • SSH session in web UI from a previous phase
    • New SSH session with tsh
    • New SSH session with web UI
    • New SSH session in a child cluster on the same major version
    • New SSH session in a child cluster on the previous major version - blocked on #13793 (v9 leaf clusters with a v10 root malfunction because of the db authority)
    • New SSH session from a parent cluster
    • Application access through a browser
    • Application access through curl with tsh app login
    • kubectl get po after tsh kube login
    • Database access (no configuration change should be necessary if the database CA isn't rotated, other Teleport functionality should not be affected if only the database CA is rotated)

IP-based validation

SSH @probakowski

  • Verify IP-based validation works for SSH
    • pin_source_ip: true option can be added in role definition
    • tsh ssh works when invoked from the same machine/IP that was used for logging in
    • tsh ssh prompts for relogin when invoked from different machine (copy certs after login)
    • connecting to sshd server works as above in both cases
    • ssh works as above in both cases
    • SSH access from WebUI works with IP pinning enabled
  • tsh status -d shows pinned IP
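The pin_source_ip option referenced above lives in the role spec; a sketch (the role name and logins are placeholders):

```yaml
kind: role
version: v5
metadata:
  name: pinned-ip-access
spec:
  options:
    pin_source_ip: true
  allow:
    logins: [alice]
    node_labels:
      '*': '*'
```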

zmb3 commented Jun 15, 2022

Looks like we "regressed" and increased the GLIBC dependency again.

Edit: this appears to be related to the Rust version. Reverting to 1.58.1 seems to fix it.

I will downgrade for now: #13544


codingllama commented Jun 15, 2022

A few preliminary findings:

  1. tctl and teleport always print the warning below on macOS, which I think could be downgraded:
$ tctl -c ./teleport.yaml users ls
> 2022-06-15T17:29:04-03:00 WARN             Disabling host user creation as this feature is only available on Linux config/configuration.go:998

$ teleport start -c ./teleport.yaml
> 2022-06-15T17:28:58-03:00 WARN             Disabling host user creation as this feature is only available on Linux config/configuration.go:998
  2. tctl still mentions the (removed) "admin" role:
$ tctl -c ./teleport.yaml users add --help
(...)
Examples:

  > tctl users add --roles=admin,dba joe

  This creates a Teleport account 'joe' who will assume the roles 'admin' and 'dba'
  To see the permissions of 'admin' role, execute 'tctl get role/admin'
  3. tsh Touch ID authn isn't respecting users and is picking the "oldest" credential

Repro by adding >1 credential and then >1 users. 😢

I'll focus on (3), (1) and (2) are easy pickings if someone wants to fix them.


r0mant commented Jun 15, 2022

@lxea Could you take a look at "1" and "2" from Alan's comment above?


GavinFrazar commented Jun 16, 2022

I noticed in the audit log that when I do anything on my database (mysql) the log entries always show [undefined], even if I select a database explicitly during my session with "use <db>". Looks like this:

User [remote-alice-cluster1] has executed query [show tables] in database [undefined] on [testmysql]
User [remote-alice-cluster1] has executed query [show databases] in database [undefined] on [testmysql]
User [remote-alice-cluster1] has changed default database to [foodb] on [testmysql]

edit: found an issue for this #5903

It appears the behavior is to always show the database name used on login.

So if I do $ tsh db login --db-name=foodb testmysql or tsh db connect --db-name=foodb testmysql then all audit logs in that session will show [foodb] as the database. If I switch databases in mysql with use otherdb, then audit log continues to show actions as if they were done in [foodb]. If I don't specify any --db-name with login/connect then it's always [undefined].


Joerger commented Jun 16, 2022

I found a tsh ssh -J regression related to TLS routing - #13554

@strideynet

tsh play <chunk-id> can fetch and print a session chunk archive.

Not concerned this is a blocker, and may actually just be the test plan being incorrect. This command fails with offset 0 not found for session. This is because by default tsh play attempts to play a session back to the PTY which is not compatible with application access session recordings. Running the command with --format json succeeds. Looking at the blame of the code, it doesn't look like this is a recent regression, and may have always been the case.

Do we want to update the test plan with the correct command? I imagine eventually it would be nice if users didn't have to provide this flag for the command to work, but given how we currently switch in the implementation between two modes, it will probably involve rewriting onPlay to support that.


strideynet commented Jun 16, 2022

Discovered a regression with using the configuration output by teleport configure: #13558

I'll write a fix for this today and we should be able to get it merged down asap.


This fix has been merged down to branch/v10 and I can confirm the regression appears to be fixed.


rosstimothy commented Jun 16, 2022

Discovered some backwards incompatibility with SSO login: #13575

Edit (Joerger): Fixed in #13589


Joerger commented Jun 16, 2022

Found a regression in tsh join, I'll try fixing it.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0x17b3fc0]

goroutine 1 [running]:
github.com/gravitational/teleport/api/types.(*SessionTrackerV1).GetAddress(0x390c700?)
	/home/bjoerger/gravitational/teleport/api/types/session_tracker.go:274
github.com/gravitational/teleport/lib/client.(*TeleportClient).Join(0xc00025e700, {0x3931f90, 0xc0000541f8}, {0x341aee2, 0x4}, {0x3426c0b?, 0x7}, {0x7ffd260b70f7, 0x24}, {0x0, ...})
	/home/bjoerger/gravitational/teleport/lib/client/api.go:1976 +0x6f2
main.onJoin.func1()
	/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:2584 +0x65
github.com/gravitational/teleport/lib/client.RetryWithRelogin({0x3932000, 0xc000a4c4b0}, 0xc00025e700, 0xc000b3e550)
	/home/bjoerger/gravitational/teleport/lib/client/api.go:719 +0x4e
main.onJoin(0xc0006ac000)
	/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:2583 +0x1b5
main.Run({0x39330d8, 0xc0002ae780}, {0xc00004e090, 0x3, 0x3}, {0x0, 0x0, 0xc0000021a0?})
	/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:859 +0x12445
main.main()
	/home/bjoerger/gravitational/teleport/tool/tsh/tsh.go:396 +0x318

Edit: fixed in #13596


Joerger commented Jun 16, 2022

Possible regression: I can't join/view my own sessions despite having permissions to do so. Am I missing something in https://goteleport.com/docs/ver/10.0/access-controls/reference/?

#13595

@GavinFrazar

Some issues I ran into while testing kube access locally:

  1. tsh kube exec --tty --stdin shell-demo /bin/sh leads to panic:
Example:

> tsh kube exec --tty --stdin shell-demo /bin/sh 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x106905790]

goroutine 1 [running]:
main.(*StreamOptions).SetupTTY(0x14000abe410)
	/Users/gavin/work/teleport/tool/tsh/kube.go:281 +0x180
main.(*ExecOptions).Run(0x14000abe410)
	/Users/gavin/work/teleport/tool/tsh/kube.go:356 +0x280
main.(*kubeExecCommand).run(0x14000674600, 0x0?)
	/Users/gavin/work/teleport/tool/tsh/kube.go:467 +0x388
main.Run({0x1075eac90, 0x140006f1540}, {0x140001b6010, 0x6, 0x6}, {0x0, 0x0, 0x300000002?})
	/Users/gavin/work/teleport/tool/tsh/tsh.go:896 +0x12e98
main.main()
	/Users/gavin/work/teleport/tool/tsh/tsh.go:396 +0x2c0
 [19:16:57] gavin@mac ~ [SIGINT] 
> kubectl exec -it shell-demo -- /bin/sh
# whoami
root

  2. tsh kube credentials issue when --teleport-cluster flag does not match $TELEPORT_CLUSTER
  • On first use, you get an error. If you immediately run the same command, it prints the credentials.
    I ran into this because teleport modifies kubeconfig to execute this command to authenticate, if you're not already logged into teleport. So essentially, rm -rf ~/.tsh && kubectl get pods prompts me for my password and then prints an error message, but if I just run kubectl get pods again, it works.
Click for example

[19:08:20] gavin@mac ~ [1] 
> rm -rf ~/.tsh
[19:09:07] gavin@mac ~  
> tenv show
TELEPORT_CLUSTER=cluster2
TELEPORT_DEV_OUT=/tmp/out2.log
TELEPORT_CONFIG_FILE=/Users/gavin/teleport-config/nodes/cluster2.yaml
TELEPORT_USER=alice
TELEPORT_DEV_CONFIG_FILE=/Users/gavin/teleport-config/nodes/cluster2.yaml
TELEPORT_PROXY=proxy2.local.gd:4080
[19:09:11] gavin@mac ~  
> bat ~/.kube/config | rg "exec" -A 10
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      args:
      - kube
      - credentials
      - --kube-cluster=minikube
      - --teleport-cluster=cluster1
      - --proxy=proxy1.local.gd:3080
      - --insecure
      command: /Users/gavin/work/teleport/build/tsh
      env: null
[19:09:41] gavin@mac ~  
> kubectl get pods
Enter password for Teleport user alice:
WARNING: You are using insecure connection to SSH proxy https://proxy1.local.gd:3080
ERROR: SSH cert not available

Unable to connect to the server: getting credentials: exec: executable /Users/gavin/work/teleport/build/tsh failed with exit code 1
[19:09:57] gavin@mac ~ [1] 
> kubectl get pods
NAME         READY   STATUS    RESTARTS   AGE
shell-demo   1/1     Running   0          75m

@Tener
Contributor

Tener commented Jun 17, 2022

@GavinFrazar

tsh kube credentials issue when --teleport-cluster flag does not match $TELEPORT_CLUSTER

I'm not sure if this would fix the outlined issue, but I noticed recently that a couple of the --cluster and --teleport-cluster flag definitions are missing a .Envar(clusterEnvVar) call. At the time I didn't realise this could cause issues like the one you outlined, but perhaps the fix is as simple as adding that call where appropriate. For example:

	c.Flag("teleport-cluster", "Name of the teleport cluster to get credentials for.").Required().StringVar(&c.teleportCluster)

becomes

	c.Flag("teleport-cluster", "Name of the teleport cluster to get credentials for.").Required().Envar(clusterEnvVar).StringVar(&c.teleportCluster)

and similarly

	ssh.Flag("cluster", clusterHelp).Short('c').StringVar(&cf.SiteName)

becomes

	ssh.Flag("cluster", clusterHelp).Envar(clusterEnvVar).Short('c').StringVar(&cf.SiteName)

@Tener
Contributor

Tener commented Jun 17, 2022

@atburke

Regression due to #12934:

Basically the logic between onListDatabases and listDatabasesAllClusters is out of sync. The former contains the correct code to fetch roles:

teleport/tool/tsh/db.go

Lines 81 to 104 in 77b35b8

// get roles and traits. default to the set from profile, try to get up-to-date version from server point of view.
roles := profile.Roles
traits := profile.Traits
// GetCurrentUser() may not be implemented, fail gracefully.
user, err := cluster.GetCurrentUser(cf.Context)
if err == nil {
	roles = user.GetRoles()
	traits = user.GetTraits()
} else {
	log.Debugf("Failed to fetch current user information: %v.", err)
}
// get the role definition for all roles of user.
// this may only fail if the role which we are looking for does not exist, or we don't have access to it.
// example scenario when this may happen:
// 1. we have set of roles [foo bar] from profile.
// 2. the cluster is remote and maps the [foo, bar] roles to single role [guest]
// 3. the remote cluster doesn't implement GetCurrentUser(), so we have no way to learn of [guest].
// 4. services.FetchRoles([foo bar], ..., ...) fails as [foo bar] does not exist on remote cluster.
roleSet, err := services.FetchRoles(roles, cluster, traits)
if err != nil {
	log.Debugf("Failed to fetch user roles: %v.", err)
}

The latter does not, using the stale profile.Roles instead:

teleport/tool/tsh/db.go

Lines 163 to 167 in 77b35b8

roleSet, err := services.FetchRoles(profile.Roles, cluster, profile.Traits)
if err != nil {
	errors = append(errors, err)
	continue
}

The result is that we try to fetch definitions for roles that do not exist in the leaf cluster, and we may not have permission to do so.

For example, given clusters boson.tener.io and quark.tener.io, with a trusted cluster role mapping that grants only the access role:

kind: trusted_cluster
metadata:
  id: 1655472056507184000
  name: boson.tener.io
spec:
  enabled: true
  role_map:
  - local:
    - access
    remote: access
  token: foo
  tunnel_addr: boson.tener.io:3080
  web_proxy_addr: boson.tener.io:3080
version: v2

We get errors when tsh tries to read the editor and auditor roles from quark.tener.io, since the mapping grants only the access role. The code in onListDatabases correctly handles that case.

$ tsh clusters
Cluster Name   Status Cluster Type Labels Selected
-------------- ------ ------------ ------ --------
boson.tener.io online root                *
quark.tener.io online leaf

$ tsh db ls
Name Description Allowed Users Labels Connect
---- ----------- ------------- ------ -------

$ tsh db --cluster=quark.tener.io ls
Name                            Description         Allowed Users     Labels  Connect
------------------------------- ------------------- ----------------- ------- ------------------------------------------------------------------------
> qmongo (user: alice)                              [alice bob tener]         tsh db connect --cluster=quark.tener.io --db-name=<name> qmongo
> qmongo-insecure (user: alice)                     [alice bob tener]         tsh db connect --cluster=quark.tener.io --db-name=<name> qmongo-insecure
redisquark                      Quark Redis example [alice bob tener] env=dev

$ tsh db --cluster=quark.tener.io ls --all
ERROR: access denied to perform action "read" on "role"

I'm unlikely to have the time to fix it before my PTO.

@jakule
Contributor

jakule commented Jun 20, 2022

I found two issues related to the host user creations #13663 #13662

@nklaassen
Contributor

found an issue with the "Instance" role and the EC2 join method #13677

@LKozlowski
Contributor

I found an issue with LDAP attribute labeling - it does not work correctly: #13680

@LKozlowski
Contributor

Regexp-based host labeling applies across all desktops, regardless of origin.

I don't know if this is an issue or not, but I had a hard time figuring out why it does not work the way I would expect. There is an inconsistency between how we treat LDAP-discovered hosts vs static hosts.

Scenario 1: LDAP hosts

windows_desktop_service:
...
  discovery:
    base_dn: "*"
  host_labels:
    - match: '^.*\.example\.com$'
      labels:
        environment: dev

With this configuration, if the discovered host's DNS host name is EXAMPLE-82K6DLP.example.com,
the regexp matches and that host gets the extra label environment: dev.

Scenario 2: Static hosts

windows_desktop_service:
...
  hosts:
    - EXAMPLE-82K6DLP.example.com
  host_labels:
    - match: '^.*\.example\.com$'
      labels:
        environment: dev

With this configuration, using the same regexp and the same DNS host name for a static host, the regexp does not match and the host gets no extra label.

The reason is that for static hosts we match the regexp against hostname:port. In our example we compare the regexp with EXAMPLE-82K6DLP.example.com:3389, which fails to match because of the $ at the end of the regexp.

addr := netAddr.String()
name, err := s.nameForStaticHost(addr)
if err != nil {
	return nil, trace.Wrap(err)
}
// for static hosts, we match against the host's addr,
// as the name is a randomly generated UUID
labels := getHostLabels(addr)

Since I don't know whether this was intended or whether we should change the behavior to match against the host without the port, it would be great if @zmb3 could take a look, as I believe he is the author of this functionality.

@zmb3
Collaborator

zmb3 commented Jun 21, 2022

@LKozlowski I don't think we ever noticed this before, but technically regex-based labeling is working as intended; we're just not clear in the docs or examples that the port is included.

Feels like the simplest thing would be to remove the $ from the examples and mention in the docs that the port is included in the match for static hosts.

@espadolini
Contributor

espadolini commented Jun 21, 2022

That will end up matching anything with an example.com prefix, though; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).

@ibeckermayer
Contributor

I found an issue with desktop access scroll behavior: #13690

@zmb3
Collaborator

zmb3 commented Jun 21, 2022

That will end up match anything with an example.com prefix tho; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).

Sure, that works. I'd also be fine with matching against just the host and not the port.

I don't see this as a major issue since it has always been this way, and few people use static hosts.

codingllama added a commit that referenced this issue Jun 21, 2022
Favor newer Touch ID credentials in the allowed set for MFA, or just the newer
credential for passwordless.

Fixes a capture-by-reference bug and adds coverage for it.

Issue #13340.

* Add tests for Touch ID credential-choosing logic
* Favor newer Touch ID credentials within the allowed set
* Warn about origin vs RPID mismatch
@atburke
Contributor

atburke commented Jun 21, 2022

@nklaassen #13529 should fix the EC2 labels error.

@LKozlowski
Contributor

That will end up match anything with an example.com prefix tho; perhaps the docs should add a (:3389)? before the $ instead, if that works (or a (:\d+)?, if we want to be pedantic).

Sure, that works. I'd also be fine with matching against just the host and not the port.

I don't see this as a major issue since it has always been this way, and few people use static hosts.

I just wanted to bring it up as it wasn't clear to me when I was testing it, but I agree that it is working fine. As you said, we just need to either update the docs or slightly update the code. Anyway, I'll mark it in the test plan as working and we'll improve it later so it doesn't block the v10 release.

codingllama added a commit that referenced this issue Jun 22, 2022
Favor newer Touch ID credentials in the allowed set for MFA, or just the newer
credential for passwordless.

Fixes a capture-by-reference bug and adds coverage for it.

Issue #13340.

* Add tests for Touch ID credential-choosing logic
* Favor newer Touch ID credentials within the allowed set
* Warn about origin vs RPID mismatch
codingllama added a commit that referenced this issue Jun 22, 2022
#13712)

Favor newer Touch ID credentials in the allowed set for MFA, or just the newer
credential for passwordless.

Fixes a capture-by-reference bug and adds coverage for it.

Issue #13340.

Backports #13672 and #13761.

* Add tests for Touch ID credential-choosing logic
* Favor newer Touch ID credentials within the allowed set
* Warn about origin vs RPID mismatch
* Do not dereference assertion before checking for nil
@espadolini
Contributor

Found a compatibility issue between v9 leafs and v10 roots related to the new database CA:

@ravicious
Member

Is tsh status supposed to report -teleport-internal-join as one of the SSH logins? I can see it in the logins list for v10 clusters but not for the ones running older versions of Teleport.

@espadolini
Contributor

Is tsh status supposed to report -teleport-internal-join as one of the SSH logins?

We should probably filter out that one and the -teleport-nologin-<uuid> ones.

@Joerger
Contributor

Joerger commented Jun 24, 2022

ssh -J <teleport-proxy> doesn't work with tls routing (since v8.0.0) - #13833

@fheinecke
Contributor

tsh does not work on Debian 9 due to glibc 2.25 dependency - #13894

@zmb3
Collaborator

zmb3 commented Jun 27, 2022

I'm seeing a "session data" event that I'm not used to seeing, which renders with a missing session ID in the audit log.

image

It's not just a UI thing, the JSON for the event has "sid": "".

@rosstimothy
Contributor

Direct Dial Nodes unreachable because they are reporting an address of [::]:3022 #13898

@rosstimothy
Contributor

Reverse Tunnel Nodes getting stuck initializing and not connecting: #13911

@rosstimothy
Contributor

rosstimothy commented Jun 27, 2022

etcd 500 TC Scaling Test

image

https://teleportcoreteam.grafana.net/goto/m-ivFEqnk?orgId=1

@codingllama
Contributor

Something minor I just noticed: my (idle) local teleport was spamming a session recording warning (shutdown logs included):

2022-06-27T17:58:47-03:00 [UPLOAD]    WARN Skipped session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar. error:[session file could be corrupted or is using unsupported format: session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f is either corrupted or is using unsupported format, remove the file /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar to correct the problem, remove the /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.error file to retry the upload] filesessions/fileasync.go:253
^C2022-06-27T17:58:51-03:00 [PROC:1]    INFO Got signal "interrupt", exiting immediately. pid:27917.1 service/signals.go:83
2022-06-27T17:58:51-03:00 [PROC:1]    WARN Sync rotation state cycle failed. Retrying in ~10s pid:27917.1 service/connect.go:682
2022-06-27T17:58:51-03:00 [AUDIT:1]   INFO File uploader is shutting down. pid:27917.1 service/service.go:2480
2022-06-27T17:58:51-03:00 [AUDIT:1]   INFO File uploader has shut down. pid:27917.1 service/service.go:2482

I didn't do anything special with the cluster today, other than a few login attempts. Posting here in case it rings a bell for someone.

@avatus
Contributor

avatus commented Jun 27, 2022

Something minor I just noticed: my (idle) local teleport was spamming a session recording warning (shutdown logs included):

2022-06-27T17:58:47-03:00 [UPLOAD]    WARN Skipped session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar. error:[session file could be corrupted or is using unsupported format: session recording 25366a4e-03f8-47e6-a4ea-6c54d1290c4f is either corrupted or is using unsupported format, remove the file /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.tar to correct the problem, remove the /path/to/teleport/log/upload/streaming/default/25366a4e-03f8-47e6-a4ea-6c54d1290c4f.error file to retry the upload] filesessions/fileasync.go:253
^C2022-06-27T17:58:51-03:00 [PROC:1]    INFO Got signal "interrupt", exiting immediately. pid:27917.1 service/signals.go:83
2022-06-27T17:58:51-03:00 [PROC:1]    WARN Sync rotation state cycle failed. Retrying in ~10s pid:27917.1 service/connect.go:682
2022-06-27T17:58:51-03:00 [AUDIT:1]   INFO File uploader is shutting down. pid:27917.1 service/service.go:2480
2022-06-27T17:58:51-03:00 [AUDIT:1]   INFO File uploader has shut down. pid:27917.1 service/service.go:2482

I didn't do anything special with the cluster today, other than a few login attempts. Posting here in case it rings a bell for someone.

This happened to me as well, and adding auth_service.session_recording = off to the config failed to stop the warning, if that provides any further context.
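For reference, the dotted key above corresponds to this teleport.yaml fragment (quoting "off" so YAML doesn't parse it as a boolean):

```yaml
auth_service:
  session_recording: "off"
```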

@espadolini
Contributor

my (idle) local teleport was spamming a session recording warning

Should be fixed by #13826; fixing the warning in a running cluster involves manually deleting the file in the recordings directory, I think.

@r0mant
Collaborator Author

r0mant commented Jun 29, 2022

Can't get passwordless scenario to work as described in the test plan:

  1. Adding touchid device using tsh mfa add
  2. Touchid device is visible in tsh mfa ls and tsh touchid ls (the latter also brings up touchid prompt) ✅
  3. Running tsh -d login --proxy=root.gravitational.io:3080 --auth=passwordless doesn't work; it asks to tap a security key (though I didn't register one separately) ❌
➜  e git:(afa3414) ✗ tsh login --proxy=root.gravitational.io:3080 --auth=passwordless
Tap your security key
^CERROR: context canceled

Logs:

➜  e git:(afa3414) ✗ tsh -d login --proxy=root.gravitational.io:3080 --auth=passwordless
DEBU [CLIENT]    open /Users/r0mant/.tsh/root.gravitational.io.yaml: no such file or directory client/api.go:1052
INFO [CLIENT]    No teleport login given. defaulting to r0mant client/api.go:1394
INFO [CLIENT]    no host login given. defaulting to r0mant client/api.go:1404
INFO [CLIENT]    [KEY AGENT] Connected to the system agent: "/private/tmp/com.apple.launchd.0G1kn68Tdf/Listeners" client/api.go:3934
DEBU [CLIENT]    attempting to use loopback pool for local proxy addr: root.gravitational.io:3080 client/api.go:3892
DEBU [CLIENT]    reading self-signed certs from: /var/lib/teleport/webproxy_cert.pem client/api.go:3900
DEBU [CLIENT]    could not open any path in: /var/lib/teleport/webproxy_cert.pem client/api.go:3904
DEBU             Attempting GET root.gravitational.io:3080/webapi/ping/passwordless webclient/webclient.go:115
DEBU [CLIENT]    attempting to use loopback pool for local proxy addr: root.gravitational.io:3080 client/api.go:3892
DEBU [CLIENT]    reading self-signed certs from: /var/lib/teleport/webproxy_cert.pem client/api.go:3900
DEBU [CLIENT]    could not open any path in: /var/lib/teleport/webproxy_cert.pem client/api.go:3904
DEBU [CLIENT]    HTTPS client init(proxyAddr=root.gravitational.io:3080, insecure=false) client/weblogin.go:233
DEBU             Attempting platform login webauthncli/api.go:97
DEBU             Platform login failed, falling back to cross-platform error:[credential not found] webauthncli/api.go:103
DEBU             FIDO2: Using libfido2 for assertion webauthncli/api.go:113
DEBU             FIDO2: Info for device ioreg://4294970624: &libfido2.DeviceInfo{Versions:[]string{"U2F_V2", "FIDO_2_0", "FIDO_2_1_PRE"}, Extensions:[]string{"credProtect", "hmac-secret"}, AAGUID:[]uint8{0xee, 0x88, 0x28, 0x79, 0x72, 0x1c, 0x49, 0x13, 0x97, 0x75, 0x3d, 0xfc, 0xce, 0x97, 0x7, 0x2a}, Options:[]libfido2.Option{libfido2.Option{Name:"rk", Value:"true"}, libfido2.Option{Name:"up", Value:"true"}, libfido2.Option{Name:"plat", Value:"false"}, libfido2.Option{Name:"clientPin", Value:"false"}, libfido2.Option{Name:"credentialMgmtPreview", Value:"true"}}, Protocols:[]uint8{0x1}} webauthncli/fido2.go:658
DEBU             FIDO2: Device ioreg://4294970624: filtered due to lack of UV webauthncli/fido2.go:137
Tap your security key
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
DEBU             FIDO2: Selecting devices error:[no suitable devices found] webauthncli/fido2.go:612
^C

cc @codingllama

@codingllama
Contributor

Can't get passwordless scenario to work as described in the test plan:

@r0mant could you double-check that you are using tsh from the signed/notarized/etc tsh.app bundle? I downloaded the tsh-v10.0.0-alpha.2.pkg installer and cleared the testplan without problems using it. Hit me up on Slack if you still have issues.

@espadolini
Contributor

@codingllama @r0mant all clear on the passwordless test plan for me on macOS.

@rosstimothy
Contributor

etcd Soak Test

kubectl logs -n loadtest-tross soaktest-pvnlr-6gv5f -f
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth -l root ls -f names
node-65c8f5c9db-5zzfd
iot-node-5b4f7757f8-f2966

----Direct Dial Node Test----
+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-65c8f5c9db-5zzfd ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         157 ms
50         162 ms
75         168 ms
90         174 ms
95         178 ms
99         193 ms
100        474 ms

+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-65c8f5c9db-5zzfd ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         159 ms
50         164 ms
75         170 ms
90         175 ms
95         180 ms
99         195 ms
100        5179 ms

+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-5b4f7757f8-f2966 ls
----Reverse Tunnel Node Test----

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         155 ms
50         160 ms
75         166 ms
90         172 ms
95         178 ms
99         193 ms
100        418 ms

+ tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-5b4f7757f8-f2966 ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         154 ms
50         159 ms
75         165 ms
90         170 ms
95         175 ms
99         192 ms
100        5171 ms

etcd 10k Reverse Tunnel Nodes

image

https://teleportcoreteam.grafana.net/goto/vJFIH33nk?orgId=1

etcd 10k Direct Dial Nodes

image

https://teleportcoreteam.grafana.net/goto/yky9Oqqnz?orgId=1

@russjones
Contributor

russjones commented Jul 2, 2022

Aggregate last 3 releases.

| Backend  | Cluster Size | Mode    | PTY | 8.0     | 9.0    | 10.0                   |
|----------|--------------|---------|-----|---------|--------|------------------------|
| etcd     | 10k          | Regular | No  | 3335 ms | 700 ms | 474 ms                 |
| etcd     | 10k          | Regular | Yes | 4647 ms | 393 ms | 5179 ms (99%: 195 ms)  |
| etcd     | 10k          | Tunnel  | No  | 4259 ms | 143 ms | 418 ms                 |
| etcd     | 10k          | Tunnel  | Yes | 3143 ms | 799 ms | 5171 ms (99%: 192 ms)  |
| DynamoDB | 10k          | Regular | No  |         |        | 5147 ms                |
| DynamoDB | 10k          | Regular | Yes |         |        | 222 ms                 |
| DynamoDB | 10k          | Tunnel  | No  |         |        | 235 ms                 |
| DynamoDB | 10k          | Tunnel  | Yes |         |        | 198 ms                 |
| DynamoDB | 1            | Regular | No  |         |        | 1824 ms                |
| DynamoDB | 1            | Regular | Yes |         |        | 1483 ms                |
| DynamoDB | 1            | Tunnel  | No  |         |        | 2125 ms                |
| DynamoDB | 1            | Tunnel  | Yes |         |        | 2002 ms                |

@fspmarshall
Contributor

500 TC Scaling Test (DynamoDB)

500-tc


Note: initial Dynamo 10k tests are not complete yet due to issues with the test automation, but I've gotten up to a 6k Dynamo cluster without any issues on Teleport's end. Working on re-running with different automation.

@fspmarshall
Contributor

fspmarshall commented Jul 6, 2022

10K Dynamo IoT

edit: See #13340 (comment) for updated bench numbers.

tsh bench --duration=30m root@node-848df68b94-zzxjg ls

* Requests originated: 17934
* Requests failed: 109
* Last error: EOF

Histogram

Percentile Response Duration 
---------- ----------------- 
25         5939 ms           
50         9655 ms           
75         13911 ms          
90         16655 ms          
95         17519 ms          
99         18351 ms          
100        55071 ms
tsh bench --duration=30m --interactive root@node-848df68b94-zzw65 ps aux

* Requests originated: 17903
* Requests failed: 22
* Last error: failed connecting to node node-848df68b94-zzw65. 

Histogram

Percentile Response Duration 
---------- ----------------- 
25         6115 ms           
50         9879 ms           
75         14103 ms          
90         16751 ms          
95         17583 ms          
99         18431 ms          
100        45471 ms

10k-dynamo-iot


Note: these benches were run concurrently with scaling and against nodes in a different region/cloud, which I think explains the differences in response duration. Looking into it.

@fspmarshall
Contributor

fspmarshall commented Jul 8, 2022

10K Dynamo Non-IoT

tsh bench --duration=30m [email protected] ls

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         185 ms            
50         197 ms            
75         211 ms            
90         232 ms            
95         251 ms            
99         358 ms            
100        2161 ms
tsh bench --duration=30m --interactive [email protected] ps aux

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         193 ms            
50         206 ms            
75         221 ms            
90         240 ms            
95         260 ms            
99         418 ms            
100        4579 ms

10k-dynamo


Note: these benches were run against individual bare-metal nodes within a 2-node cluster with tsh located within the same vpc as the auth, proxy, and nodes.

@r0mant r0mant closed this as completed Jul 9, 2022
@fspmarshall
Contributor

DynamoDB Small Cluster Bench

(previously posted dynamodb bench numbers were from a 10k cluster with sub-optimal network conditions, and therefore not particularly useful for comparison)

tsh bench --duration=30m root@ip-172-31-4-81-us-west-2-compute-internal ls

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         198 ms            
50         210 ms            
75         222 ms            
90         238 ms            
95         255 ms            
99         372 ms            
100        3495 ms
tsh bench --duration=30m --interactive root@ip-172-31-9-206-us-west-2-compute-internal ps aux

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         221 ms            
50         231 ms            
75         244 ms            
90         262 ms            
95         280 ms            
99         466 ms            
100        2003 ms
