Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Telemetry] Add basic telemetry to Jupyter server and begin emitting server events #364

Closed
wants to merge 55 commits into from

Conversation

kiendang
Copy link
Contributor

@kiendang kiendang commented Dec 17, 2020

original pr #233

This is for emitting server events. For client events using the eventlog endpoint see #501.

yuvipanda and others added 20 commits May 19, 2020 14:42
Bump telemetry extension commit as well
Experiments here informed the schema naming
recommendations in jupyter/telemetry#11
python 3.5 is still supported
This lets us add detailed documentation & description
to our schemas, which is very hard to do in JSON.

We also add a lot of documentation to the one
JSON schema we have
Primary advantage over JSON is that we can do multi-line strings
for more detailed documentation. We also expect humans to read &
write these, so YAML is a much better format there. All JSON
is also valid YAML, so that helps.

Depends on jupyter/telemetry#13
@robertpyke
Copy link

Hi all, I was just curious if this is planned for a specific release / date? I noticed a few issues have been replaced by this one.

We are investigating adding telemetry event logging for certain actions, and it looks like this work covers use cases we would want to cover.

We're still planning how to approach our work, but we use the zero-to-jupyterhub-k8s for a deployment on AWS. We planned to do something to the effect of configure the event handler (like shown here: https://jupyter-telemetry.readthedocs.io/en/latest/pages/configure.html )

c.EventLog.handlers = [handler]

to use a CloudWatch or S3 EventLogger (as we're in AWS). e.g. https://pypi.org/project/cloudwatch/ (cloudwatch log handler).

Then we would import https://github.com/jupyterlab/jupyterlab-telemetry , and start injecting our events via our own extension (by subscribing to "signals", or customizing code as necessary).

Is that approach aligned with the work you're doing here? I assume we'd just start getting the additional events you're adding, once this code launches.

@kiendang
Copy link
Contributor Author

Hi there's currently no specific release date for this but this is something that we've been actively working on and trying to get done as soon as we can. Could you elaborate more about your use cases? What events are you looking to record? These could potentially help us make better informed design decisions. Thanks!

@tmlohman
Copy link

Hello! I work with @robertpyke and can speak to the details of this project. We are in the early stages and still nailing down specific requirements. But I can give a bit of context. We have a large user base on JupyterHub and are trying to get a better idea of usage patterns to inform our development. For instance, we'd like to know when people are using R kernel vs Python. When we have a better list I'll let you guys know.

@kiendang
Copy link
Contributor Author

Our plan is to have telemetry events for Jupyterhub and Jupyter server REST APIs. Events for Jupyter server would be tracked under this PR while those for Jupyterhub would be under jupyterhub/jupyterhub#3218.

For instance, we'd like to know when people are using R kernel vs Python.

Starting a kernel would be one of those events being recorded, along with information on the user and kernel. That should work for your example use case?

When we have a better list I'll let you guys know.

Great!

Currently we are working on the telemetry backend itself. Once that is settled we will come back to adding telemetry events under these two aforementioned PRs.

Copy link
Contributor

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, but then I wrote some good chunk of it :D Minor comments, but I'd also love to get this merged.

self.init_server_extensions()
self.init_configurables()
self.init_components()
self.init_eventlog()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being called twice.

@@ -0,0 +1,61 @@
Eventlogging and Telemetry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a security note about not relying on this for security or auditing purposes, unless the user can categorically not mess with the python environment this is running in? Otherwise you can just change the code to emit whatever you want. Likewise, user code can just hit the eventlogging endpoint to add any spurious events it wants.

Almost everyone I talk to wants to use this for security, and this caveat is very important.


# Profile, may need to move to a background thread if this is problematic
try:
self.eventlog.record_event(schema_name, version, event)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires the schema to be registered serverside before clients can post to it. How do we do that? Perhaps a traitlet, with corresponding entrypoint based loading as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let me create a separate PR for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinged you on gitter about this but Imma just copy paste here.

This means something like a jupyter telemetry [register|remove|list] <schema> command?
register would put the schema file in a event-schemas directory. The directory structure would be something like event-schemas/<name>/version/<schema>.[json|yaml]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great! I'm thinking that most things that want to submit events will be classic notebook extensions or jupyterlab extensions. These will be python packages, so perhaps we can begin by just allowing entrypoints? You call it, and it returns a bunch of schemas to register.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that makes more sense. I was thinking of another type of entrypoint. I'll make a separate PR on this one.

6. delete
Delete a file or empty directory at given path
path:
category: personally-identifiable-information
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to use URIs for category filtering, and there are a few well-known ones defined in https://jupyter-telemetry.readthedocs.io/en/latest/pages/schemas.html#property-categories.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires changes in jupyter/telemetry master branch that haven't been made a released yet. Should we make a new release for telemetry?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, making a new release there seems appropriate. Would you be able to do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope I don't have the permission.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this one and use the latest commit on jupyter/telemetry master for now, will change it once we have a new release.

Also should the unrestricted category be URI or just a generic one? Seems like we're currently using a universal unrestricted category
https://github.com/jupyter/telemetry/blob/d2c19ffa03fde7d1903759de4a265ef711b333ef/jupyter_telemetry/eventlog.py#L165-L176

While we're on the topic of URI may I have your thoughts on jupyter/telemetry#56? Thanks!

@kiendang kiendang marked this pull request as ready for review April 26, 2021 10:07
@kiendang kiendang requested a review from yuvipanda April 26, 2021 10:07
@Zsailer Zsailer self-assigned this Apr 27, 2021
@Zsailer
Copy link
Member

Zsailer commented Apr 27, 2021

Thanks, @kiendang. I've added this to the Jupyter Server Team Meeting agenda for this week.

In the meantime, I'm going move this PR to "draft" state until we merge and release jupyter/telemetry#60.

@Zsailer Zsailer marked this pull request as draft April 28, 2021 18:21
Move to a separate branch
@kiendang
Copy link
Contributor Author

kiendang commented May 3, 2021

related jupyter/telemetry#61

@Zsailer Zsailer changed the title Jupyter telemetry Add basic telemetry to Jupyter server and begin emitting server events May 3, 2021
@Zsailer Zsailer changed the title Add basic telemetry to Jupyter server and begin emitting server events [Telemetry] Add basic telemetry to Jupyter server and begin emitting server events May 3, 2021
@Zsailer Zsailer added the Telemetry Discussion or issues related to Telemetry/Event-logging label Jun 14, 2021
@mlucool
Copy link

mlucool commented Nov 30, 2021

Is this still being worked on?

@kiendang
Copy link
Contributor Author

kiendang commented Mar 24, 2022

@Zsailer sorry for not working on this for quite a while. I have been busy for the last year. I do have some free time now and would like to resume working on this and getting this merged.

The original todo list was

Does that plan still look valid? anything new/changed regarding telemetry?

@Zsailer
Copy link
Member

Zsailer commented Aug 29, 2022

This work has been superceded by #954 and #862.

Closing this PR in favor of the others.

@Zsailer Zsailer closed this Aug 29, 2022
@kiendang kiendang deleted the jupyter_telemetry branch August 29, 2022 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Telemetry Discussion or issues related to Telemetry/Event-logging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants