Implement $queryprune parameter #760

Closed
8 tasks done
mtxadmin opened this issue Oct 17, 2019 · 120 comments
Labels
enhancement New feature or request fixed issue has been addressed

Comments

@mtxadmin

Prerequisites

  • I verified that this is not a filter issue
  • This is not a support issue or a question
  • I performed a cursory search of the issue tracker to avoid opening a duplicate issue
    • Your issue may already be reported.
  • I tried to reproduce the issue when...
    • uBlock Origin is the only extension
    • uBlock Origin with default lists/settings
    • using a new, unmodified browser profile
  • I am running the latest version of uBlock Origin
  • I checked the documentation to understand that the issue I report is not a normal behavior

Description

In #46, about the $rewrite parameter, the author said:

I won't be implementing this filter option, I see too many issues with it. I am however open to implementing a different filter option with a similar purpose, but which would not suffer from the issues I see with how rewrite has been designed.
...
I see a better way to implement similar option but with a more focused purpose: to remove specific query parameters from a URL:

||content.uplynk.com/ext/.m3u8?$querystrip=

Where the querystrip option would mean: "remove all query parameters matching the given lists of tokens or pattern".

Will the $querystrip parameter be implemented?

@uBlock-user added the enhancement label Oct 17, 2019
@gorhill
Member

gorhill commented Oct 17, 2019

I also said:

Anyway, as said I still need more than just one case to be an argument for such filter -- the last thing I want is to add technical debt to uBO for little tangible benefits overall.

I still do not see cases being brought up to justify the feature.

@mtxadmin
Author

mtxadmin commented Oct 17, 2019

utm_* stuff (more like privacy and tracking)

Plus Aliexpress stm marketing parameters, and some others.

Yes, of course, there are specialized URL-redirecting extensions that can strip URL parameters, but they all lack a subscription feature. If you want to block those junk parameters on a new device, you have to install a separate extension and then enter all the parameters manually. Later, when you discover a new junk parameter and want to get rid of it, you have to add it on EVERY device, including those belonging to parents, relatives and friends. Explaining to inexperienced users which buttons to click and which values to add is hell, compared with adding all of that to a neatly auto-updated filter list.

@liamengland1

There are at least two cases in which the implementation of $querystrip can be used to eliminate pesky SSAI.

How $querystrip will fix this:
The video players on the below pages request files (xhr/iframe) with some ad-related parameters. If these parameters are removed, the m3u8 no longer has ads baked in server-side.

Case 1: videos on ABC-owned TV station websites (https://abcotvs.com/index.html)

Example URLs:

https://abc30.com/7285501/
https://6abc.com/7255376/
https://abc7.com/7282142/

Example from https://6abc.com/7255376/:

JSON requested:

https://content.uplynk.com/api/v3/preplay/external/10b98e7c615f43a98b180d51797e74aa/102320-wpvi-fyi4u-halloween-CC-vid.json?platform=web&ad.tag=fyi-holidays%2Challoween%2Cfyi-philly%2Cfyi-events%2Cphiladelphia%2Ccommunity&ad.vast3=1&ad.v=2&ad.tfcd=0&ad.is_lat=0&ad.npa=0&ad.correlator=49683&ad=wpvi_vod&ad.adUnit=%2Fwpvi%2F6abc.com%2Fweb%2Fcommunity&ad.pp=otv-web-desktop&ad.vid=otv-7275653&ad.description_url=https%3A%2F%2F6abc.com%2F7255376%2F&ad.sz=640x480&ad.ppid=&ad.vpi=1&accountID=10b98e7c615f43a98b180d51797e74aa&externalID=102320-wpvi-fyi4u-halloween-CC-vid&euid=_000_0_001_SF&ad.cust_params=accesslevel%3D0%26beacTyp%3Dssai%26isAuth%3D0%26aff%3Dwpvi%26lang%3Den%26ait%3Dssai%26chan%3Dmisc%26objid%3D7255376%26isAutoplay%3D1%26isDnt%3D0%26isMute%3D0%26pgtyp%3Dpost%26vdm%3Dvod%26fb_token%3D%26plt%3Ddesktop%26stp%3Dvdms%26swid%3D%26dtci_yxbk%3D%26var%3D16x9%26vps%3D640x360%26noad%3D0%26refDomain%3Dhttps%253A%252F%252F6abc.com%252F7255376%252F%26unid%3D

Response: https://gist.github.com/llacb47/b779691ae61947f3d553ab8a8c2e8b3b
If you open the playURL from that JSON in any video player, it has SSAI.

Possible querystrip filter: ||content.uplynk.com/api/v3/preplay/external/$domain=6abc.com,querystrip=/^ad\./

If the hypothetical resulting JSON is requested: https://content.uplynk.com/api/v3/preplay/external/10b98e7c615f43a98b180d51797e74aa/102320-wpvi-fyi4u-halloween-CC-vid.json?platform=web&accountID=10b98e7c615f43a98b180d51797e74aa&externalID=102320-wpvi-fyi4u-halloween-CC-vid&euid=_000_0_001_SF

Response:

{"prefix": "https://content-ause2.uplynk.com", "ads": {"breaks": [], "breakOffsets": [], "placeholderOffsets": []}, "videoView": [], "sid": "88183284ff9643bdbcd677b38be390fe", "playURL": "https://content-ause2.uplynk.com/api/v3/preplay2/712cb2a3041149c79afa3964b00021ca/447659e71fac930d104a56b5f820da5a/12UMPd9E2kAQYHf9eRchJk6JhLspdMc7aUmFVvNrTz4L.m3u8?pbs=88183284ff9643bdbcd677b38be390fe"}

By opening this playURL in any video player, it can be seen that there is no preroll.
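As a rough illustration of what the proposed filter would do to the request URL, here is a minimal Python sketch. It is not uBO code: `prune_query` is a hypothetical helper, the URL is a shortened placeholder, and the sketch matches on the parameter name only, which is an assumption about how the proposed `querystrip=/^ad\./` would behave.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def prune_query(url: str, name_pattern: str) -> str:
    """Drop every query parameter whose name matches name_pattern.

    Hypothetical helper for illustration only; pairs are kept in their
    raw (still percent-encoded) form so nothing is re-encoded.
    """
    parts = urlsplit(url)
    name_re = re.compile(name_pattern)
    kept = [
        pair
        for pair in parts.query.split("&")
        if pair and not name_re.search(pair.split("=", 1)[0])
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


# Shortened placeholder URL, modelled on the uplynk request above.
url = (
    "https://content.uplynk.com/api/v3/preplay/external/ACCOUNT/VIDEO.json"
    "?platform=web&ad.vast3=1&ad.v=2&accountID=ACCOUNT&ad.sz=640x480"
)
print(prune_query(url, r"^ad\."))
```

Note that a bare `ad=...` parameter, as present in the full request above, does not match `^ad\.` and would need its own pattern.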

Case 2: NBC videos

URLs:

https://www.nbc.com/late-night-with-seth-meyers/video/michael-keaton-haim/4249694
https://www.nbc.com/saturday-night-live/video/october-17-issa-rae/4246222
https://www.nbc.com/dateline/video/a-promise-to-helene/4247160

Example from https://www.nbc.com/dateline/video/a-promise-to-helene/4247160:

Page requests an iframe at this URL (probably NA geolocked)

https://player.theplatform.com/p/jujdhC/xkaAQrhkr9IU/select/media/guid/2410887629/4247160?mute=false&autoPlay=true&playbackStartPosition=0&policy=147097231&mParticleId=6082210794167379875&params=mode%3Don-demand%26uuid%3Db2372c28-9c5d-46a9-b293-866a484b2ec5%26did%3Dedb63874-27fd-1287-1a61-c829619aa2f2%26rdid%3Dedb63874-27fd-1287-1a61-c829619aa2f2%26userAgent%3DMozilla%252F5.0%2520%2528Windows%2520NT%252010.0%253B%2520rv%253A78.0%2529%2520Gecko%252F20100101%2520Firefox%252F78.0%26am_crmid%3D6082210794167379875%26am_playerv%3Dnull%26am_sdkv%3Dnull%26am_appv%3Dnull%26am_buildv%3Dnull%26am_stitcherv%3Dpoc%26uoo%3D0%26am_cpsv%3D4.0.0-2%26fw_ae%3D%26metr%3D1023%26csid%3Dnbc_tveverywhere_vod_hub%26am_extmp%3Ddefault%26am_abvrtd%3D0%26am_abtestid%3D0%26nw%3D169843%26_fw_did%3Db2372c28-9c5d-46a9-b293-866a484b2ec5%26prof%3Dnbcu_web_svp_js_https%26afid%3D136164654%26sfid%3D1676939%26policy%3D147097231%26fallbackSiteSectionId%3D9244655%26siteSectionId%3Doneapp_desktop_computer_web_ondemand%26manifest%3Dm3u%26switch%3DHLSOriginSecure%26_fw_vcid2%3D169843%3A6082210794167379875%26_fw_h_referer%3Dwww.nbc.com%26schema%3D2.0&episodetitle=A%20Promise%20to%20Helene&nbcuProfile=false&brand=NBC&show=Dateline&MVPDid=undefined#playerurl=https%3A//www.nbc.com/dateline/video/a-promise-to-helene/4247160

There are prerolls and midrolls inserted server-side. You can see them either by visiting the iframe URL on its own or on the NBC page itself.

Possible $querystrip filter: ||player.theplatform.com/p/*/select/media/guid/$subdocument,domain=nbc.com,querystrip='params'

Resulting URL:

https://player.theplatform.com/p/jujdhC/xkaAQrhkr9IU/select/media/guid/2410887629/4247160?mute=false&autoPlay=true&playbackStartPosition=0&policy=147097231&mParticleId=6082210794167379875&episodetitle=A%20Promise%20to%20Helene&nbcuProfile=false&brand=NBC&show=Dateline&MVPDid=undefined#playerurl=https%3A//www.nbc.com/dateline/video/a-promise-to-helene/4247160

No more prerolls or midrolls.

I think this is more than "little tangible benefits". What do you think, @gorhill?

@gorhill
Member

gorhill commented Oct 24, 2020

@llacb47 Is there an extension to rewrite URLs which you can use to confirm that removing the query parameters does really remove the ads?

@liamengland1

liamengland1 commented Oct 24, 2020

Yes, you can use https://einaregilsson.com/redirector/ to test. I just verified that it works as expected.

You can import the rules I used to test: https://gist.githubusercontent.com/llacb47/23a446ac1cc7763a4574a672420626fb/raw/555966752881f67859205c2ab4c579a61d8ad523/redirect-rules.json

I just found that removing parameters also gets rid of SSAI on Discovery Networks sites as well.

Test with these URLs:

https://go.discovery.com/tv-shows/growing-belushi/full-episodes/a-mission-from-god
https://www.tlc.com/tv-shows/90-day-fiance-the-other-way/full-episodes/ready-or-not
https://watch.hgtv.com/tv-shows/fixer-to-fabulous/full-episodes/dave-jennys-pick-dreary-home-gets-bright-update
https://www.sciencechannel.com/tv-shows/unearthed/full-episodes/secrets-of-the-seven-wonders
https://www.ahctv.com/tv-shows/manhunt-kill-or-capture/full-episodes/whitey-bulger-boston-mob-king

and this rule for redirector: https://gist.githubusercontent.com/llacb47/e0c78ffbb44b203da6c975796e7d6608/raw/d4ed40b979d13d6de4d130fe586cfc98383460b2/discovery-redirector.json

@liamengland1

Any update?

@gorhill
Member

gorhill commented Oct 28, 2020

It's not trivial to implement with as little impact as possible on performance, so this will have to take the time it takes to implement it.

gorhill added a commit to gorhill/uBlock that referenced this issue Oct 31, 2020
Related issue:
- uBlockOrigin/uBlock-issues#760

The purpose of this new network filter option is to remove
query parameters from the URL of network requests.

The name `queryprune` has been picked over `querystrip`
since the purpose of the option is to remove some
parameters from the URL rather than all parameters.

`queryprune` is a modifier option (like `csp`) in that it
does not cause a network request to be blocked but rather
modified before being emitted.

`queryprune` must be assigned a value, which value will
determine which parameters from a query string will be
removed. The syntax for the value is that of regular
expression *except* for the following rules:

- do not wrap the regex directive between `/`
- do not use regex special values `^` and `$`
- do not use literal comma character in the value,
  though you can use hex-encoded version, `\x2c`
- to match the start of a query parameter, prepend `|`
- to match the end of a query parameter, append `|`

`queryprune` regex-like values will be tested against each
key-value parameter pair as `[key]=[value]` string. This
way you can prune according to either the key, the value,
or both.

This commit introduces the concept of modifier filter
options, which as of now are:

- `csp=`
- `queryprune=`

They both work in a similar way when used with `important`
option or when used in exception filters. Modifier
options can apply to any network requests, hence the
logger reports the type of the network requests, and no
longer use the modifier as the type, i.e. `csp` filters
are no longer reported as requests of type `csp`.

Though modifier options can apply to any network requests,
for the time being the `csp=` modifier option still applies
only to top or embedded (frame) documents, just as before.
In some future we may want to apply `csp=` directives to
network requests of type script, to control the behavior
of service workers for example.

A new built-in filter expression has been added to the
logger: "modified", which allows seeing all the network
requests which were modified before being emitted. The
translation work for this new option will be available
in a future commit.
@gorhill changed the title from "Implement $querystrip parameter" to "Implement $queryprune parameter" on Oct 31, 2020
@gorhill
Member

gorhill commented Oct 31, 2020

It's in the latest dev build. See commit message for usage -- I do not want to provide details in release notes yet, I prefer filter list authors to experiment with usage to find out if fine tuning is necessary.
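For filter list authors experimenting with the syntax, the value rules from the commit message can be sketched in Python as below. This is only an approximation under stated assumptions: `queryprune_to_regex` and `prune_query_string` are hypothetical names, Python's `re` stands in for the JavaScript regex engine uBO uses, and only the leading/trailing `|` rules and the `[key]=[value]` matching are modelled (not the comma or `\x2c` handling).

```python
import re


def queryprune_to_regex(value: str) -> re.Pattern:
    """Translate a queryprune value into a compiled regex (sketch).

    A leading `|` anchors at the start of `key=value`, a trailing `|`
    anchors at the end; everything else is treated as regex source.
    """
    anchored_start = value.startswith("|")
    anchored_end = len(value) > 1 and value.endswith("|")
    body = value[1:] if anchored_start else value
    if anchored_end:
        body = body[:-1]
    return re.compile(
        ("^" if anchored_start else "") + body + ("$" if anchored_end else "")
    )


def prune_query_string(query: str, value: str) -> str:
    """Remove every `key=value` pair that the queryprune value matches."""
    rx = queryprune_to_regex(value)
    return "&".join(p for p in query.split("&") if p and not rx.search(p))


print(prune_query_string("utm_source=x&id=1&notutm_=2", "|utm_"))
print(prune_query_string("a=1&fbclid=abc&gclid=z", "fbclid|gclid"))
```

Because each pair is tested as a single `key=value` string, the same value can target the name, the value, or both.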

@uBlock-user
Contributor

uBlock-user commented Oct 31, 2020

Usage example - ||reddit.com^$queryprune=utm_, ||youtube.com^$queryprune=fbclid|gclid

@gorhill
Member

gorhill commented Oct 31, 2020

It is best to avoid queryprune being visited at all. I do hope filter authors will be as careful as possible when crafting queryprune filters as I am at minimizing all overhead in the code, otherwise all the coding efforts go to waste. So typically the query parameter of interest will be part of the filter pattern:

||reddit.com^*utm_$queryprune=|utm_
||youtube.com^*fbclid$queryprune=fbclid
||youtube.com^*gclid$queryprune=gclid

This way uBO will scan the query parameters only when the URL is found to match the targeted query parameters. Mind performance when crafting filters. Your proposed filters force uBO to scan every URL matching reddit.com and youtube.com.

Additionally, prepending queryprune values with | when the match is of the "starts with" kind also helps.
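The performance argument can be pictured with a small Python sketch. This is not how uBO is implemented internally: `prune_if_matched` is a hypothetical function in which a cheap substring test stands in for the filter pattern, and the query string is only parsed when that test succeeds.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def prune_if_matched(url: str, pattern_token: str, param_re: re.Pattern):
    """Return (url, scanned): the query is scanned only if the cheap check hits.

    `pattern_token` plays the role of the `utm_` part of a pattern such as
    `||reddit.com^*utm_`; `param_re` plays the role of the queryprune value.
    """
    if pattern_token not in url:
        # The filter pattern did not match: no query parsing at all.
        return url, False
    parts = urlsplit(url)
    kept = [p for p in parts.query.split("&") if p and not param_re.search(p)]
    return urlunsplit(parts._replace(query="&".join(kept))), True


utm = re.compile(r"^utm_")
print(prune_if_matched("https://www.reddit.com/r/a/?utm_source=x&sort=new", "utm_", utm))
print(prune_if_matched("https://www.reddit.com/r/a/?sort=new", "utm_", utm))
```

With a pattern like `||reddit.com^` alone there is no such early exit, so every reddit.com URL pays for the scan.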

@curiosityseeker

Many other parameters are shown on the Neat-URL site here and here.

@majonezzz

majonezzz commented Nov 1, 2020

This comparison table from ClearURLs Wiki might be useful as well.

@gorhill
Member

gorhill commented Nov 1, 2020

To be clear, the purpose of queryprune is not to replace URL cleaners, so it shouldn't be compared to them. Its purpose is only to remove query parameters, not to rewrite URLs. At most, uBO's queryprune matches what Neat URL does, nothing more.

gorhill added a commit to gorhill/uBlock that referenced this issue Nov 1, 2020
Reported internally.

Regression from:
- 1e2eb03

Related issue:
- uBlockOrigin/uBlock-issues#760

kowith337 added a commit to kowith337/PersonalFilterListCollection that referenced this issue Nov 1, 2020
Hosts & pDNSF rules
- Add multiple `Adjust` tracking domains, in pDNSF will use wildcard instead.

uBO Test list
- Add query prune experiment to enhance privacy (a little bit?)
* Note: Query prune tests require installing a dev build of uBO, version 1.30.9b1 or above; more info can be found at uBlockOrigin/uBlock-issues#760
@gwarser

gwarser commented Nov 1, 2020

Does it make sense to use the separator placeholder (^) to match the boundary of the parameter in the filter matching part?

And how about using = in the filter or option to mark the boundary on the other side?

! .com/?utm_anything=
! .com/?notutm_u=&utm_anything=

||reddit.com^*^utm_*=$queryprune=|utm_

! &fbclid=

||youtube.com^*^fbclid=$queryprune=|fbclid=

WARNING! Using ...^*^.. is not optimal - #760 (comment)

@gwarser

gwarser commented Nov 1, 2020

Filters:

||example.com^*^ga_*=$queryprune=|ga_
||example.com^*^utm_*=$queryprune=|utm_

Address:

http://example.com/?ga_asdf=2&utm_asdf=1

Result:

The page isn’t redirecting properly

Firefox has detected that the server is redirecting the request for this address in a way that will never complete.

1.30.9b3


WARNING! Using ...^*^.. is not optimal - #760 (comment)

@gorhill
Member

gorhill commented Mar 2, 2023

Makes no difference, uBO can extract a token in either case. 1st = connatix, 2nd = connatix or cid.

@Yuki2718

Yuki2718 commented Mar 2, 2023

Okay, so doesn't matter as long as at least one good token can be extracted. Thx!

@MasterKia
Member

@Yuki2718

Yuki2718 commented Mar 2, 2023

Wait, is ||reddit.com^$removeparam=utm_ bad? Token-wise, it's no different from ||reddit.com^*utm_$removeparam=utm_.

@MasterKia
Member

MasterKia commented Mar 2, 2023

||reddit.com^$removeparam=utm_

gorhill: This filter forces uBO to scan every URL matching reddit.com.

||reddit.com^*utm_$removeparam=utm_

gorhill: This way uBO will scan the query parameters only when the URL is found to match the targeted query parameters.

@Yuki2718

Yuki2718 commented Mar 2, 2023

That's essentially what I've asked. @gorhill Maybe the old comment could be edited to prevent confusion as

||reddit.com^*^utm_$queryprune=|utm_
||youtube.com^*^fbclid$queryprune=fbclid
||youtube.com^*^gclid$queryprune=gclid

@mtxadmin
Author

mtxadmin commented Mar 5, 2023

Today I found $removeparam=param is case-sensitive even without regex. Maybe this should be documented in Wiki.

Hmm... unexpectedly...

@mtxadmin
Author

One more caveat

The rule "$removeparam=/word/" will remove any parameter whose value contains "word". For instance, it clears a Google (or similar) search query containing that word. It turns out this is described in the wiki, but it still looks weird.

https://github.com/gorhill/uBlock/wiki/Static-filter-syntax#removeparam :
"When using a literal regular expression, it gets tested against each query parameter name-value pair assembled into a single string as name=value."
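A short Python sketch of the behaviour the wiki sentence describes, assuming each pair is tested as one `name=value` string. `matching_params` is a hypothetical helper and Python's `re` is only a stand-in for uBO's regex handling; it also shows the case sensitivity mentioned earlier, since regex matching is case-sensitive by default.

```python
import re
from typing import List


def matching_params(query: str, literal_regex: str) -> List[str]:
    """Return the `name=value` pairs that the regex would match (sketch)."""
    rx = re.compile(literal_regex)
    return [pair for pair in query.split("&") if rx.search(pair)]


query = "q=some+word&hl=en&word=1"

# An unanchored regex matches in the value as well as in the name.
print(matching_params(query, r"word"))

# Anchoring on `^name=` restricts the match to the parameter name.
print(matching_params(query, r"^word="))

# Matching is case-sensitive: `Word=1` is not matched by `^word=`.
print(matching_params("Word=1", r"^word="))
```

Anchoring the regex to the start of the name, as in `/^word=/`, avoids pruning unrelated parameters whose value merely contains the word.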

@dimisa-RUAdList

If the link contains #, then the removeparam does not work.

Example: https://vk.com/playgroundru?w=wall-100235_593191

uBlock Origin development build 1.50.1rc3 + RU AdList + Counters


@iam-py-test
Contributor

That's because everything after the # is no longer part of the query string: https://developer.mozilla.org/en-US/docs/Web/API/Location/hash
It would be nice to have an additional scriptlet or filter option to remove it, but IMO this is out of scope of removeparam.
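The split between query string and fragment can be checked with Python's standard URL parser. The URL below is a made-up example, and `query_and_fragment` is just an illustrative wrapper around `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def query_and_fragment(url: str):
    """Return the (query, fragment) components of a URL."""
    parts = urlsplit(url)
    return parts.query, parts.fragment


# Hypothetical URL: one real query parameter, tracking data after `#`.
query, fragment = query_and_fragment(
    "https://example.com/page?w=wall-1#utm_source=logo"
)
print(query)     # the part a query-parameter filter can see
print(fragment)  # never part of the query string
```

Anything in the second component is outside the query string, which is why a filter that only prunes query parameters leaves it untouched.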

@gorhill
Member

gorhill commented Jul 19, 2023

If you have tracking stuff after #, it means client-side JS is being used to deal with the tracking data. It is then a matter of finding out whether network requests are fired as a result of that client-side JS parsing the hash portion of the URL, and which ones.


I suppose we could have a scriptlet to deal with this.

@dimisa-RUAdList

I noticed that ClearURLs solves this problem, so I had an idea to improve the removeparam.

@krystian3w

krystian3w commented Jul 20, 2023

Interia has started hiding utm_* tracking parameters after #.

https://www.interia.pl/#utm_source=Wydarzenia&utm_medium=logo&utm_campaign=powrot_z_wew&iwa_source=logo
https://wydarzenia.interia.pl/#iwa_source=logo
https://poczta.interia.pl/#iwa_source=sg_ikona
https://www.pomponik.pl/#iwa_source=sg_ikona
https://programtv.interia.pl/#iwa_source=side_menu

@D4niloMR

How can I remove the $deep_link parameter with $removeparam?

$removeparam=$deep_link is discarded as an invalid filter.

And $removeparam=\$web_only tries to remove the value as is (the backslash is kept).

@uBlock-user
Contributor

@D4niloMR example ?

@D4niloMR

Add

||click.redditmail.com/CL0/$urlskip=/CL0\/.*?(www\.reddit\.com.+?)(?:\?|%3F)/ -uricomponent +https,badfilter
||click.redditmail.com/CL0/$urlskip=/\/CL0\/(http.*?)\/\d\/[a-f0-9-]+\// -uricomponent
||reddit.com^$doc,removeparam=correlation_id
||reddit.com^$doc,removeparam=ref
||reddit.com^$doc,removeparam=ref_campaign
||reddit.com^$doc,removeparam=ref_source
||reddit.com^$doc,removeparam=utm_content

And visit

https://click.redditmail.com/CL0/https:%2F%2Fwww.reddit.com%2F%3F$deep_link=true%26correlation_id=eec69afb-bdc1-47ef-a74b-2e3f0a96ec70%26ref=email_comment_reply%26ref_campaign=email_comment_reply%26ref_source=email/1/0100019288cc8ae6-731d17b5-9584-4504-8704-98a04b1c18f9-000000/fGIrIekyrcrxrhSW3OnhQX9rVeSmrQERcNZ6eRUmC20=374

@gwarser

gwarser commented Dec 13, 2024

$removeparam=/^\$deep_link=/

gorhill added a commit to gorhill/uBlock that referenced this issue Dec 13, 2024
Related feedback:
uBlockOrigin/uBlock-issues#760 (comment)

Using quotes in filter option values is meant to remove ambiguity
when the value contains special characters. This was not working when
the value started with `$`. For example, fixes usage of quotes in:

  $removeparam='$deep_link'

Also, fixed logger output for scriptlets using empty parameters
in quotes.
@gorhill
Member

gorhill commented Dec 13, 2024

I wanted to suggest using quotes, but turns out this didn't work as I expected:

$removeparam='$deep_link'

Fixed with gorhill/uBlock@8ba71f0.

@D4niloMR

It works, but $ can only be the first character. This won't work: $removeparam='$$deep_link'

@gorhill
Member

gorhill commented Dec 13, 2024

$$ is unfortunately an AdGuard-specific delimiter, so when uBO encounters $$ anywhere, it considers the filter invalid.

But you are right anyways, it doesn't handle all cases:

*$removeparam='d$eep_link'

I can probably improve this, but I have to be very careful not to cause regressions in the parser for all existing filters, and I have to ensure that the fix doesn't introduce costlier parsing overhead. At least for the time being one can always use the regex workaround:

*$removeparam=/^d\$eep_link/

@uBlock-user
Contributor

$$ is unfortunately an AdGuard-specific delimiter, so when uBO encounters $$ anywhere, it considers the filter invalid.

A regex will still work: *$removeparam=/^\$\$deep_link/

@ameshkov

@gorhill what do you think about supporting escaping $? I.e. if you encounter ||example.org^$opt=\$ then the value of opt should be $, and if you encounter opt=\\$, then the value is \$. IMO, it seems like a universal approach that people are used to from other places.

@gorhill
Member

gorhill commented Dec 17, 2024

Forcing maintainers to escape \ makes it too inconvenient and more prone to errors to craft regexes, so I prefer to avoid this -- and this would actually break current filters since they were written without the need to escape. In the current case just rewriting as a regex works. Ultimately, finding the real anchor position should take into account whether what follows $ is a valid filter option, i.e. the right-most $ with a valid filter option to its right is the option anchor.
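The "right-most $ with a valid filter option to its right" idea can be sketched in Python as follows. This is a toy illustration, not uBO's parser: `find_option_anchor` is a hypothetical function, `KNOWN_OPTIONS` is a tiny hand-picked subset of option names, and only the first option name after each `$` is checked.

```python
import re

# Tiny illustrative subset of option names (assumption, not uBO's full list).
KNOWN_OPTIONS = {"removeparam", "domain", "script", "doc", "important", "badfilter"}


def find_option_anchor(filter_text: str) -> int:
    """Return the index of the right-most `$` that is followed by a known
    option name, or -1 if no such `$` exists (sketch)."""
    pos = filter_text.rfind("$")
    while pos != -1:
        tail = filter_text[pos + 1:]
        match = re.match(r"~?([a-z0-9_-]+)", tail)
        if match and match.group(1) in KNOWN_OPTIONS:
            return pos
        # Not an option anchor: keep looking further to the left.
        pos = filter_text.rfind("$", 0, pos)
    return -1


print(find_option_anchor("*$removeparam='d$eep_link'"))
print(find_option_anchor("||reddit.com^$doc,removeparam=ref"))
```

A heuristic like this would still misfire when an option value happens to contain `$` followed by a real option name, which hints at why the change has to be made carefully.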
