Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[🚀 Feature]: Implement download tracking in selenium manager #11211

Closed
titusfortner opened this issue Nov 2, 2022 · 34 comments · Fixed by #13173
Closed

[🚀 Feature]: Implement download tracking in selenium manager #11211

titusfortner opened this issue Nov 2, 2022 · 34 comments · Fixed by #13173

Comments

@titusfortner
Copy link
Member

titusfortner commented Nov 2, 2022

Feature and motivation

The idea is to get basic usage information about who is using Selenium Manager.

We could create a basic page on https://www.selenium.dev/selenium/manager. The Selenium Manager would route all requests for downloads through that page.

Something like:

https://selenium.dev/selenium/download/manager?language=ruby&os=linux&driver=geckodriver&driver_version=v0.32.0

Which would redirect and return the correct file, but would allow us to see what language/os/browser/browser versions are being used.

@bonigarcia
Copy link
Member

I agree that it would be very interesting to know how many users are using Selenium Manager. But I am unsure if proxying each Selenium Manager request is the best choice since it can be problematic. In the best case, the performance of downloading drivers will be higher by putting an intermediate page intercepting the traffic. Moreover, the selenium.dev server could be a bottleneck. Finally, we will need backend support to track all these requests.

Maybe an alternative way is to continue downloading directly from each driver repository (chromedriver, geckodriver, etc.), and create a tracking endpoint in selenium.dev which is invoked after resolving a driver by Selenium Manager. But even in that case, we are creating additional overhead for the sake of monitoring the use of Selenium Manager.

@krmahadevan
Copy link
Contributor

Just curious. What do we intend to do with this data?

@diemol
Copy link
Member

diemol commented Jan 31, 2023

As a start, we want to track usage, know how many users in which languages are now relying on Selenium Manager. We have invested a great amount of time, and it'd be very useful to know if the code we are shipping is being used. After learning usage, we can start figuring out what else we want to measure and how.

Also, following what @bonigarcia mentions, coding and/or hosting ourselves a tracking solution is shooting ourselves in the foot. It would be the ideal solution because we could ensure some privacy first. Currently we use https://plausible.io/ to track selenium.dev, but the quota we are paying would not be enough to hold the data we would capture.

I guess the easiest solution would be to use Google Analytics , I implemented a client to track Zalenium usage back then, but we had help from the legal area in my previous company to draft the terms and conditions to be compliant. If we want to use that, we need to check with the PLC and ask the SFC for their help with the legal terms.

@titusfortner
Copy link
Member Author

How much extra traffic are we expecting if each driver download sends a http call with a few bytes of data that doesn't expect a response?

@titusfortner
Copy link
Member Author

Just saying it might be worth paying plausible a little more if it can track that info. Would still want to double check with SFC on privacy

@diemol
Copy link
Member

diemol commented Feb 1, 2023

I will check with Plausible and see how much they could charge us.

@bonigarcia
Copy link
Member

I am lost with this feature. Is it still required?

@diemol
Copy link
Member

diemol commented Aug 2, 2023

It would be nice to have. But it adds overhead to each request. Maybe we can wait until we release Selenium Manager officially to track downloads in that way? Tracking every single request might be too much.

@bonigarcia
Copy link
Member

I totally agree that it would be great to monitor the use of Selenium Manager, but adding that overhead is too much, IMO.

@diemol
Copy link
Member

diemol commented Aug 2, 2023

I just realized that having an official release won't track downloads since we package SM in the bindings.

@bonigarcia
Copy link
Member

I just realized that having an official release won't track downloads since we package SM in the bindings.

Yes, that's true.

@krmahadevan
Copy link
Contributor

@diemol @bonigarcia Would it make sense if selenium manager emits a beacon that can contain the tracking information and which can be collected separately (optionally ofcourse). That way we wouldn't be interfering with the primary responsibility of selenium manager, nor would we be creating a single point of failure/bottleneck, but still we would be able to gather the usage data. But yes, it would involve a bit more work and wont necessarily be the need of the hour right now.

@diemol
Copy link
Member

diemol commented Aug 2, 2023

@diemol @bonigarcia Would it make sense if selenium manager emits a beacon that can contain the tracking information and which can be collected separately (optionally ofcourse). That way we wouldn't be interfering with the primary responsibility of selenium manager, nor would we be creating a single point of failure/bottleneck, but still we would be able to gather the usage data. But yes, it would involve a bit more work and wont necessarily be the need of the hour right now.

That is the overhead we want to avoid. Besides we need a place where to store that.

@titusfortner
Copy link
Member Author

We could implement a stupid basic page with a couple lines of JS that just redirects a the download request and records what was requested: https://selenium-manager.dev/download?path=https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/115.0.5790.102/mac-x64/chromedriver-mac-x64.zip

Could add additional metadata:
https://selenium-manager.dev/download?path=https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/115.0.5790.102/mac-x64/chromedriver-mac-x64.zip&language=ruby&seversion=4.11.2

@titusfortner
Copy link
Member Author

What if we do it as a straight url redirect:
https://selenium.dev/selenium/manager_redirect.html?redirect=https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/118.0.5993.70/mac-x64/chromedriver-mac-x64.zip&language=rb&version=0.4.14&os=linux

I really do think this needs to be part of our final release. We currently have no way of knowing what Selenium usage looks like. If that moves us to a new Plausible tier, I think it's worth it. If it adds a fraction of a second to the download, I think that's worth it as well.

@diemol
Copy link
Member

diemol commented Oct 22, 2023

I would avoid using selenium.dev because that relies on GitHub infrastructure, and if there is an outage, I am not sure how much priority GitHub pages have. Let's find a better service for that.

@titusfortner
Copy link
Member Author

I think github is more reliable than most services?

Also we can have a fallback in Selenium Manager. The advantage of selenium.dev is also that we already have Plausible hooked up.

@diemol
Copy link
Member

diemol commented Oct 22, 2023

We should send requests directly to Plausible then.

@titusfortner
Copy link
Member Author

That works

@diemol
Copy link
Member

diemol commented Nov 13, 2023

I reached out to Plausible, and they said we can start sending events. With that, they can figure out the new pricing. We can go over our quota for 2 months without incurring extra costs.

This is the events API https://plausible.io/docs/events-api

@titusfortner
Copy link
Member Author

Don't we want something we can disable, though?
Once we've released a version of SM that will always ping plausible and we'll never be able to change it.

I was thinking we could start with: https://plausible.io/docs/custom-query-params

and then use: https://www.selenium.dev/manager/0.4.16/stats.html?browser=chrome&version=119.x.x.x&os=macos&arch=amd64&lang=ruby

If something unforeseen happens, we can turn off tracking to https://www.selenium.dev/manager/0.4.16/stats.html on our end.

@diemol
Copy link
Member

diemol commented Nov 14, 2023

Right, we need to write down first what we want to collect.

@titusfortner
Copy link
Member Author

  • Selenium Version
  • Language Binding
  • Browser
  • Browser Version
  • Operating System
  • Architecture

@bonigarcia
Copy link
Member

I've been analyzing how to implement this issue and have different questions.

If I'm not wrong, first, we need to create an HTML page in the Selenium site, for instance, called stats.html. @titusfortner proposed to store it in the path /manager/0.4.16, but I don't understand why this page should be versioned:

https://www.selenium.dev/manager/0.4.16/stats.html

Wouldn't it be better to store this page directly in the manager folder, i.e., without the version? I mean:

https://www.selenium.dev/manager/stats.html

Then, the stats.html will have some script to track the visits to that page by Plausible. Is that right?

In the Rust, each time Selenium Manager is executed, SM sends a HTTP request to something like:

https://www.selenium.dev/manager/stats.html?browser=chrome&browser_version=119&os=macos&arch=amd64&lang=ruby

Regarding the parameters proposed by @titusfortner, browser, browser_version, operating_system, and architecture are known in the Rust side.

Regarding the language_binding, it is currently unknown in Rust. So we need to implement a new argument in Rust (like --language-binding) which should be specified in all the languages (e.g. --language-binding java, --language-binding python, etc.).

Regarding the selenium_version, I see two alternatives. Since SM knows its own version (e.g., 0.4.16), it can be set automatically. For instance, when SM is 0.4.16, the value in the URL will be 4.16 (since the minor is unknown in Rust). If we also want to track the minor Selenium version, an additional flag would be required in Rust (like --selenium-version), and that argument needs to be included in the bindings when SM is executed.

I have further comments but would like to resolve these questions first.

@titusfortner
Copy link
Member Author

The reason I suggested that it have a version is to make it so we can turn off tracking for a specific version if there was something wrong with it, without affecting everything else. If something in a release got formatted incorrectly or logged something differently in a way we don't want... Maybe it isn't necessary, but that was my thought process.

The rest makes sense to me, use SM version & add --language-binding flag

@diemol
Copy link
Member

diemol commented Nov 16, 2023

I don't think we should build a page to parse the values we send in the query parameters when we can build all those in a payload and send it to Plausible. Maybe I am missing something?

@bonigarcia
Copy link
Member

Indeed, building a custom webpage on our site won't work since that page is supposed to be parsed in a browser that executes the JavaScript.

@diemol Maybe your idea is that Selenium Manager creates a POST request as specified in the event API doc to a non-existing URL as follows:

curl -i -X POST https://plausible.io/api/event \
  -H 'User-Agent: Selenium Manager 0.4.16' \
  -H 'Content-Type: application/json' \
  --data '{"name":"pageview","url":"https://selenium.dev/manager/stats.html?browser=chrome&browser_version=119&os=macos&arch=amd64&lang=ruby","domain":"selenium.dev"}'

Is that correct?

@diemol
Copy link
Member

diemol commented Nov 16, 2023

Yes, that was the initial idea. Using a different URL, though. So the stats do not mix with the website stats.

@bonigarcia
Copy link
Member

What URL then?

Also. Do the Selenium project has already an account on Plausible?

@diemol
Copy link
Member

diemol commented Nov 16, 2023

Yes, we do. We can create a new subdomain, like sm.selenium.dev.

@titusfortner
Copy link
Member Author

I'm going to open a separate ticket for the bindings to send language info.

Primarily we need to make sure the data is showing up properly in Plausible from the SM in the PR before we close this out.

@joerg1985
Copy link
Member

@bonigarcia I think there should be a connection / read timeout set to only a few seconds to avoid issues in cases the service is offline or not reachable. As i read the code and the docs there is currently no timeout at all and it could block the selenium manager?

@bonigarcia
Copy link
Member

As explained in the comments of #13173:

The call to Plausible (.stats()) is done in a different thread. This way, even if Plausible is offline or any other unexpected error happens, the regular behavior of Selenium Manager will not be affected since the .setup() logic is executed in the main thread. I have tested it.

Moreover, the call to Plausible already has a little timeout of 3 seconds. See line 68 in stats.rs of PR https://github.com/SeleniumHQ/selenium/pull/13173/files#diff-fbd0d9979819832c3551e0d67289259e237ee62e38908df4901f2c7e4131f1aa

bonigarcia added a commit that referenced this issue Dec 13, 2023
…3173)

* [rust] Send stats to Plausible (#11211)

* [rust] Read selenium version from clap cargo feature

* [rust] Set language binding in test

* [rust] Include assertion about stats error message

* [rust] Use selenium domain (selenium.dev) as plausible stats target

* [rust] Make asynchronous request to plausible

* [rust] Include argument --avoid-stats to not send statistics to plausible.io

* [rust] Include request timeout and improve error handling

* [rust] Spawn new thread for calling to stats function

* [rust] Use message channel for passing errors when sending stats

* [rust] Update checksum un lock file

* Update rust/src/stats.rs

* [rust] Check if SM is offline before sending the stats

* [rust] Send custom properties to plausible in lowercase

---------

Co-authored-by: Diego Molina <[email protected]>
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants