Skip to content

Commit

Permalink
New cosmetic filter parser using CSSTree library
Browse files Browse the repository at this point in the history
The new parser no longer uses the browser DOM to validate
that a cosmetic filter is valid or not, this is now done
through a JS library, CSSTree.

This means filter list authors will have to be more careful
to ensure that a cosmetic filter is really valid, as there is
no more guarantee that a cosmetic filter which works for a
given browser/version will still work properly on another
browser, or different version of the same browser.

This change has become necessary because of many reasons,
one of them being the flakiness of the previous parser as
exposed by many issues lately:

- uBlockOrigin/uBlock-issues#2262
- uBlockOrigin/uBlock-issues#2228

The new parser introduces breaking changes, there was no way
to do otherwise. Some current procedural cosmetic filters will
be shown as invalid with this change. This occurs because the
CSSTree library gets confused with some syntax which was
previously allowed by the previous parser because it was more
permissive.

Mainly the issue is with the arguments passed to some procedural
cosmetic filters, and these issues can be solved as follow:

Use quotes around the argument. You can use either single or
double-quotes, whichever is most convenient. If your argument
contains a single quote, use double-quotes, and vice versa.

Additionally, try to escape a quote inside an argument using
backslash. THis may work, but if not, use quotes around the
argument.

When the parser encounter quotes around an argument, it will
discard them before trying to process the argument, same with
escaped quotes inside the argument. Examples:

Breakage:

    ...##^script:has-text(toscr')

Fix:

    ...##^script:has-text(toscr\')

Breakage:

    ...##:xpath(//*[contains(text(),"VPN")]):upward(2)

Fix:

    ...##:xpath('//*[contains(text(),"VPN")]'):upward(2)

There are not many filters which break in the default set of
filter lists, so this should be workable for default lists.

Unfortunately those fixes will break the filter for previous
versions of uBO since these to not deal with quoted argument.
In such case, it may be necessary to keep the previous filter,
which will be discarded as broken on newer version of uBO.

THis was a necessary change as the old parser was becoming
more and more flaky after being constantly patched for new
cases arising, The new parser should be far more robust and
stay robist through expanding procedural cosmetic filter
syntax.

Additionally, in the MV3 version, filters are pre-compiled
using a Nodejs script, i.e. outside the browser, so validating
cosmetic filters using a live DOM no longer made sense.

This new parser will have to be tested throughly before stable
release.
  • Loading branch information
gorhill committed Sep 23, 2022
1 parent fe21ce5 commit a71b71e
Show file tree
Hide file tree
Showing 14 changed files with 556 additions and 550 deletions.
5 changes: 5 additions & 0 deletions platform/common/vapi-common.js
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@

// For background page or non-background pages

/* global browser */

'use strict';

/******************************************************************************/
Expand Down Expand Up @@ -89,6 +91,9 @@ vAPI.webextFlavor = {
soup.add('chromium')
.add('user_stylesheet');
flavor.major = parseInt(match[1], 10) || 0;
if ( flavor.major >= 105 ) {
soup.add('native_css_has');
}
}

// Don't starve potential listeners
Expand Down
2 changes: 1 addition & 1 deletion platform/mv3/make-rulesets.js
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ const outputDir = commandLineArgs.get('output') || '.';
const cacheDir = `${outputDir}/../mv3-data`;
const rulesetDir = `${outputDir}/rulesets`;
const scriptletDir = `${rulesetDir}/js`;
const env = [ 'chromium', 'ubol' ];
const env = [ 'chromium', 'ubol', 'native_css_has' ];

/******************************************************************************/

Expand Down
1 change: 1 addition & 0 deletions src/about.html
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
<div class="li"><span><a href="https://github.com/foo123/RegexAnalyzer" target="_blank">Regular Expression Analyzer</a> by <a href="https://github.com/foo123">Nikos M.</a></span></div>
<div class="li"><span><a href="https://github.com/hsluv/hsluv" target="_blank">HSLuv - Human-friendly HSL</a> by <a href="https://github.com/boronine">Alexei Boronine</a></span></div>
<div class="li"><span><a href="https://searchfox.org/mozilla-central/rev/d317e93d9a59c9e4c06ada85fbff9f6a1ceaaad1/browser/extensions/webcompat/shims/google-ima.js" target="_blank">google-ima.js</a> by <a href="https://www.mozilla.org/">Mozilla</a></span></div>
<div class="li"><span><a href="https://github.com/csstree/csstree" target="_blank">CSSTree</a> by <a href="https://github.com/lahmatiy">Roman Dvornov</a></span></div>
</div>
<div class="li"><span data-i18n="aboutCDNs"></span></div>
<div class="liul">
Expand Down
67 changes: 35 additions & 32 deletions src/js/contentscript-extra.js
Original file line number Diff line number Diff line change
Expand Up @@ -95,19 +95,6 @@ class PSelectorMatchesCSSTask extends PSelectorTask {
}
}
}
class PSelectorMatchesCSSAfterTask extends PSelectorMatchesCSSTask {
constructor(task) {
super(task);
this.pseudo = '::after';
}
}

class PSelectorMatchesCSSBeforeTask extends PSelectorMatchesCSSTask {
constructor(task) {
super(task);
this.pseudo = '::before';
}
}

class PSelectorMatchesMediaTask extends PSelectorTask {
constructor(task) {
Expand Down Expand Up @@ -247,6 +234,20 @@ class PSelectorSpathTask extends PSelectorTask {
output.push(node);
}
}
// Helper method for other operators.
static qsa(node, selector) {
const parent = node.parentElement;
if ( parent === null ) { return []; }
let pos = 1;
for (;;) {
node = node.previousElementSibling;
if ( node === null ) { break; }
pos += 1;
}
return parent.querySelectorAll(
`:scope > :nth-child(${pos})${selector}`
);
}
}

class PSelectorUpwardTask extends PSelectorTask {
Expand Down Expand Up @@ -339,23 +340,20 @@ class PSelector {
constructor(o) {
if ( PSelector.prototype.operatorToTaskMap === undefined ) {
PSelector.prototype.operatorToTaskMap = new Map([
[ ':has', PSelectorIfTask ],
[ ':has-text', PSelectorHasTextTask ],
[ ':if', PSelectorIfTask ],
[ ':if-not', PSelectorIfNotTask ],
[ ':matches-css', PSelectorMatchesCSSTask ],
[ ':matches-css-after', PSelectorMatchesCSSAfterTask ],
[ ':matches-css-before', PSelectorMatchesCSSBeforeTask ],
[ ':matches-media', PSelectorMatchesMediaTask ],
[ ':matches-path', PSelectorMatchesPathTask ],
[ ':min-text-length', PSelectorMinTextLengthTask ],
[ ':not', PSelectorIfNotTask ],
[ ':nth-ancestor', PSelectorUpwardTask ],
[ ':others', PSelectorOthersTask ],
[ ':spath', PSelectorSpathTask ],
[ ':upward', PSelectorUpwardTask ],
[ ':watch-attr', PSelectorWatchAttrs ],
[ ':xpath', PSelectorXpathTask ],
[ 'has', PSelectorIfTask ],
[ 'has-text', PSelectorHasTextTask ],
[ 'if', PSelectorIfTask ],
[ 'if-not', PSelectorIfNotTask ],
[ 'matches-css', PSelectorMatchesCSSTask ],
[ 'matches-media', PSelectorMatchesMediaTask ],
[ 'matches-path', PSelectorMatchesPathTask ],
[ 'min-text-length', PSelectorMinTextLengthTask ],
[ 'not', PSelectorIfNotTask ],
[ 'others', PSelectorOthersTask ],
[ 'spath', PSelectorSpathTask ],
[ 'upward', PSelectorUpwardTask ],
[ 'watch-attr', PSelectorWatchAttrs ],
[ 'xpath', PSelectorXpathTask ],
]);
}
this.raw = o.raw;
Expand All @@ -374,7 +372,12 @@ class PSelector {
prime(input) {
const root = input || document;
if ( this.selector === '' ) { return [ root ]; }
return Array.from(root.querySelectorAll(this.selector));
let selector = this.selector;
if ( input !== document && /^ [>+~]/.test(this.selector) ) {
return Array.from(PSelectorSpathTask.qsa(input, this.selector));
}
const elems = root.querySelectorAll(selector);
return Array.from(elems);
}
exec(input) {
let nodes = this.prime(input);
Expand Down Expand Up @@ -453,7 +456,7 @@ class ProceduralFilterer {
let style, styleToken;
if ( selector.action === undefined ) {
style = vAPI.hideStyle;
} else if ( selector.action[0] === ':style' ) {
} else if ( selector.action[0] === 'style' ) {
style = selector.action[1];
}
if ( style !== undefined ) {
Expand Down
6 changes: 3 additions & 3 deletions src/js/cosmetic-filtering.js
Original file line number Diff line number Diff line change
Expand Up @@ -429,7 +429,7 @@ FilterContainer.prototype.compileGenericHideSelector = function(
// https://github.com/uBlockOrigin/uBlock-issues/issues/131
// Support generic procedural filters as per advanced settings.
// TODO: prevent double compilation.
if ( compiled !== raw ) {
if ( compiled.charCodeAt(0) === 0x7B /* '{' */ ) {
if ( µb.hiddenSettings.allowGenericProceduralFilters === true ) {
return this.compileSpecificSelector(parser, '', false, writer);
}
Expand Down Expand Up @@ -830,12 +830,12 @@ FilterContainer.prototype.cssRuleFromProcedural = function(json) {
let mq;
if ( tasks !== undefined ) {
if ( tasks.length > 1 ) { return; }
if ( tasks[0][0] !== ':matches-media' ) { return; }
if ( tasks[0][0] !== 'matches-media' ) { return; }
mq = tasks[0][1];
}
let style;
if ( Array.isArray(action) ) {
if ( action[0] !== ':style' ) { return; }
if ( action[0] !== 'style' ) { return; }
style = action[1];
}
if ( mq === undefined && style === undefined ) { return; }
Expand Down
5 changes: 4 additions & 1 deletion src/js/epicker-ui.js
Original file line number Diff line number Diff line change
Expand Up @@ -834,7 +834,10 @@ const startPicker = function() {
$id('candidateFilters').addEventListener('click', onCandidateClicked);
$stor('#resultsetDepth input').addEventListener('input', onDepthChanged);
$stor('#resultsetSpecificity input').addEventListener('input', onSpecificityChanged);
staticFilteringParser = new StaticFilteringParser({ interactive: true });
staticFilteringParser = new StaticFilteringParser({
interactive: true,
nativeCssHas: vAPI.webextFlavor.env.includes('native_css_has'),
});
};

/******************************************************************************/
Expand Down
4 changes: 3 additions & 1 deletion src/js/messaging.js
Original file line number Diff line number Diff line change
Expand Up @@ -1721,7 +1721,9 @@ const getURLFilteringData = function(details) {
};

const compileTemporaryException = function(filter) {
const parser = new StaticFilteringParser();
const parser = new StaticFilteringParser({
nativeCssHas: vAPI.webextFlavor.env.includes('native_css_has'),
});
parser.analyze(filter);
if ( parser.shouldDiscard() ) { return; }
return staticExtFilteringEngine.compileTemporary(parser);
Expand Down
4 changes: 3 additions & 1 deletion src/js/reverselookup.js
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,9 @@ const fromNetFilter = async function(rawFilter) {
if ( typeof rawFilter !== 'string' || rawFilter === '' ) { return; }

const writer = new CompiledListWriter();
const parser = new StaticFilteringParser();
const parser = new StaticFilteringParser({
nativeCssHas: vAPI.webextFlavor.env.includes('native_css_has'),
});
parser.setMaxTokenLength(staticNetFilteringEngine.MAX_TOKEN_LENGTH);
parser.analyze(rawFilter);

Expand Down
16 changes: 8 additions & 8 deletions src/js/static-dnr-filtering.js
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,7 @@ function addExtendedToDNR(context, parser) {
if ( bad ) { continue; }
if ( hn.endsWith('.*') ) { continue; }
const { compiled, exception } = parser.result;
if ( typeof compiled !== 'string' ) { continue; }
if ( compiled.startsWith('{') ) { continue; }
if ( exception ) { continue; }
let details = context.cosmeticFilters.get(compiled);
Expand Down Expand Up @@ -126,14 +127,14 @@ function addExtendedToDNR(context, parser) {
/******************************************************************************/

function addToDNR(context, list) {
const env = context.env || [];
const writer = new CompiledListWriter();
const lineIter = new LineIterator(
StaticFilteringParser.utils.preparser.prune(
list.text,
context.env || []
)
StaticFilteringParser.utils.preparser.prune(list.text, env)
);
const parser = new StaticFilteringParser();
const parser = new StaticFilteringParser({
nativeCssHas: env.includes('native_css_has'),
});
const compiler = staticNetFilteringEngine.createCompiler(parser);

writer.properties.set('name', list.name);
Expand Down Expand Up @@ -180,10 +181,9 @@ function addToDNR(context, list) {
/******************************************************************************/

async function dnrRulesetFromRawLists(lists, options = {}) {
const context = {};
const context = Object.assign({}, options);
staticNetFilteringEngine.dnrFromCompiled('begin', context);
context.extensionPaths = new Map(options.extensionPaths || []);
context.env = options.env;
context.extensionPaths = new Map(context.extensionPaths || []);
const toLoad = [];
const toDNR = (context, list) => addToDNR(context, list);
for ( const list of lists ) {
Expand Down
Loading

3 comments on commit a71b71e

@uBlock-user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will these breaking changes get fixed eventually ?

@u-RraaLL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the syntax errors for the new version be accordingly highlighted in the picker/my filters or will the difference have to be trial-and-error tested instead (e.g. by using a style operator instead)?

@gorhill
Copy link
Owner Author

@gorhill gorhill commented on a71b71e Sep 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The breaking changes will require the filters to be fixed.

The filter will be shown as invalid filters in the editor. Most of the problematic filters are procedural ones, The rejected filters will be shown in log.txt next time I release a new uBOL version since I am adding procedural cosmetic support to it.

Please sign in to comment.