@amisevsk
Collaborator

What does this PR do?

Adds support for an annotation (controller.devfile.io/debug-start: true) on devworkspaces that disables scale-to-zero when a workspace fails. This leaves the workspace deployment/pod on the cluster, allowing for debugging issues with startup.

Notes

  • Annotation vs Attribute: I opted to use annotations for this purpose since it's a DWO-specific feature and shouldn't be included in devfiles by default.
  • Annotation name: Instead of something more specific like disable-scale-to-zero I named the annotation controller.devfile.io/debug-start. This annotation could be used for enabling additional debug information in the future.

What issues does this PR fix or reference?

Closes #418

Is it tested? How?

cat <<EOF | kubectl apply -f -
kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: plain
  annotations:
    controller.devfile.io/debug-start: "true"
spec:
  started: true
  routingClass: 'basic'
  template:
    components:
      - name: web-terminal
        container:
          image: quay.io/wto/web-terminal-tooling:latest
          memoryLimit: 512Mi
          mountSources: true
          command:
           - "exit"
           - "1"
EOF
kubectl get dw -w
  • Deployment and pod should be left on cluster once workspace fails

@openshift-ci openshift-ci bot requested a review from JPinkney May 11, 2021 19:51

// DevWorkspaceDebugStartAnnotation enables debugging workspace startup if set to "true". If a workspace with this annotation
// fails to start (i.e. enters the "Failed" phase), its deployment will not be scaled down in order to allow viewing logs, etc.
DevWorkspaceDebugStartAnnotation = "controller.devfile.io/debug-start"
Member

It seems like a good time (or maybe too late) to agree on a naming convention for the annotations/labels we use.
We currently mix - and _: stopped-by and debug-start, but endpoint_name, devworkspace_name, and devworkspace_id.
Either way, we need to stick with one format going forward, even if we leave the existing names untouched.

Collaborator Author

Ah yeah this is tricky -- stopped-by is used in Web Terminal, but endpoint_name is used by Theia, so it's hard to bring them in sync without requiring changes elsewhere. I think I'm responsible for our naming on both sides (_ and -) :D

I'm on the side of using dashes as the convention going forward, since it's similar to common k8s labels such as app.kubernetes.io/part-of (and it's also easier to type).

Member

+1 to using - for new annotations and properties.


// Stop failed workspaces
if workspace.Status.Phase == dw.DevWorkspaceStatusFailed && workspace.Spec.Started {
// If debug annotation is present, leave the deployment in place to let users
Member

Leaving the deployment in place is the result, but what the code actually does is skip initiating the stop of a failed/failing workspace.

Do I understand correctly that, with https://github.com/devfile/devworkspace-operator/pull/424/files, users are supposed to manually stop the workspace before starting it again if debug mode is activated?

Collaborator Author

That would be the requirement, yes -- it leaves failed workspaces with spec.started=true so that you can debug issues (basically it treats workspaces with the annotation similar to how they were treated before #362).

The other way to implement this would be to leave the deployment in place for stopped workspaces when the debug annotation is present, which I think is more confusing.

This PR will need to be reworked slightly once #424 is merged.
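
For illustration, the behavior described in this thread could be sketched roughly as below. This is only a minimal sketch, not the operator's actual reconcile code: the `workspace` struct, the `phaseFailed` constant, and `shouldStopFailed` are hypothetical stand-ins; only the annotation name comes from this PR.

```go
package main

import "fmt"

// Annotation name taken from the PR. Everything else in this file is a
// simplified, hypothetical stand-in for the operator's real types.
const debugStartAnnotation = "controller.devfile.io/debug-start"

// phaseFailed is an assumed string value for the "Failed" phase.
const phaseFailed = "Failed"

// workspace is a pared-down, hypothetical view of a DevWorkspace.
type workspace struct {
	Annotations map[string]string
	Started     bool
	Phase       string
}

// shouldStopFailed reports whether a failed workspace should be stopped,
// i.e. have spec.started set to false so its deployment scales to zero.
// A workspace carrying the debug annotation set to "true" is left alone.
func shouldStopFailed(w workspace) bool {
	if w.Phase != phaseFailed || !w.Started {
		return false
	}
	// Reading from a nil map is safe in Go and yields the empty string.
	return w.Annotations[debugStartAnnotation] != "true"
}

func main() {
	plain := workspace{Started: true, Phase: phaseFailed}
	debug := workspace{
		Annotations: map[string]string{debugStartAnnotation: "true"},
		Started:     true,
		Phase:       phaseFailed,
	}
	fmt.Println(shouldStopFailed(plain)) // true
	fmt.Println(shouldStopFailed(debug)) // false
}
```

With the annotation set, the workspace keeps spec.started=true, which is why a manual stop is needed before restarting it.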

Member

@sleshchenko sleshchenko left a comment

LGTM, but I haven't tested it since it's going to be reworked after the PR with the failing state is merged.

Add annotation 'controller.devfile.io/debug-start' that disables the
"stop workspace on failure" functionality. This is to make sure the
workspace deployment stays on the cluster, allowing users to view logs
and debug issues.

Signed-off-by: Angel Misevski <[email protected]>
@amisevsk
Collaborator Author

Currently, the way the debug annotation works is:

  1. Apply debug annotation and start failing workspace
  2. Workspace enters failing state
  3. Debug annotation blocks stopping and entering failed state, leaving deployment on cluster.
  4. Debug issues with deployment by checking logs, etc.
  5. Once problem is figured out, remove debug annotation from DevWorkspace and workspace enters failed state, deployment is scaled to zero.
  6. Workspace can be restarted normally.

I think this is an okay flow. Any suggestions for improvement? I considered allowing workspaces to still enter the failed state, but after thinking about it more, that may be more confusing and would also make the flow above harder to use.
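
The numbered flow above can be modeled as a small phase transition. This is a hedged sketch only: `nextPhase` is a hypothetical helper, and the phase strings "Failing" and "Failed" are assumed values rather than the operator's real constants.

```go
package main

import "fmt"

// Annotation name taken from the PR.
const debugStartAnnotation = "controller.devfile.io/debug-start"

// nextPhase is a hypothetical helper. Given a workspace's current phase and
// annotations, it returns the phase the workspace should move to. Only the
// "Failing" phase is affected by the debug annotation in this sketch.
func nextPhase(current string, annotations map[string]string) string {
	if current != "Failing" {
		return current
	}
	if annotations[debugStartAnnotation] == "true" {
		// Steps 2-4: hold in the failing state so the deployment stays
		// on the cluster and its logs can be inspected.
		return "Failing"
	}
	// Step 5: without the annotation the workspace enters the failed
	// state and its deployment is scaled to zero.
	return "Failed"
}

func main() {
	annotations := map[string]string{debugStartAnnotation: "true"}
	phase := "Failing"

	phase = nextPhase(phase, annotations)
	fmt.Println(phase) // Failing

	// Removing the annotation lets the workspace proceed to "Failed".
	delete(annotations, debugStartAnnotation)
	phase = nextPhase(phase, annotations)
	fmt.Println(phase) // Failed
}
```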

Member

@sleshchenko sleshchenko left a comment

LGTM

I recently had a case where I could have suggested that QE use this to investigate a failure.

@sleshchenko
Member

/test v7-devworkspaces-operator-e2e, v7-devworkspace-happy-path

@openshift-ci

openshift-ci bot commented Jun 24, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, JPinkney, sleshchenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JPinkney,amisevsk,sleshchenko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@amisevsk
Collaborator Author

> I recently had a case where I could have suggested that QE use this to investigate a failure.

I've had to temporarily cherry-pick this commit into multiple PR branches to figure out why my workspace wasn't starting :D

Successfully merging this pull request may close these issues.

Add annotation/attribute to disable scale-to-zero on DevWorkspace failure