Regular Expression finds #58287

Open
1 task done
RyanCavanaugh opened this issue Apr 22, 2024 · 11 comments
Labels
Discussion Issues which may not have code impact

Comments

@RyanCavanaugh
Member

RyanCavanaugh commented Apr 22, 2024

Acknowledgement

  • I acknowledge that issues using this template may be closed without further explanation at the maintainer's discretion.

Comment

Note: I eventually gave up on capturing "Not available unless target is ESXXXX" errors since they're not really interesting to look at

Via #58275

This character cannot be escaped in a regular expression.

const image_path_escape = image_path.replace(/\o/g, '/o') //escape string "\o" in "\output"

Named capturing groups are only available when targeting 'ES2018' or later

/^((?<negative>-)|\+)?P((?<years>\d*)Y)?((?<months>\d*)M)?((?<weeks>\d*)W)?((?<days>\d*)D)?((?<time>T)((?<hours>\d*[.,]?\d{1,9})H)?((?<minutes>\d*[.,]?\d{1,9})M)?((?<seconds>\d*[.,]?\d{1,9})S)?)?$/

Named capturing groups are only available when targeting 'ES2018' or later.

const IMPORT_REGEX = /(?<key>import|export)\s+(?:(?<alias>[\w,{}\s*]+)\s+from)?\s*(?:(?<quote>["'])?(?<ref>[@\w\s\\/.-]+)\3?)\s*(?<term>[;\n])/g

Named capturing groups are only available when targeting 'ES2018' or later

const match = text.match(/^(?<description>(.|\n)*)```(?<language>[^\n]+)\n(?<code>(.|\n)+)\n```$/m);

This regular expression flag is only available when targeting 'es2018' or later

return fileContent.replace(/<!--.*?-->/gs, '');

This character cannot be escaped in a regular expression

const fixedId = listItem.id.replace(/\_/g, "/").replace(/\-/g, "+");

Named capturing groups are only available when targeting 'ES2018' or later

const INPUT_EXTENSION_IMPORT_REGEX = /\.(svelte|(lite(\.tsx|\.jsx)?))(?<quote>['"])/g;

Octal escape sequences are not allowed. Use the syntax '\x04'

const propsRegex = /props\s*\.\s*([a-zA-Z0-9_\4]+)\(/;

Named capturing groups are only available when targeting 'ES2018' or later

private static SSH_PATH_RE = new RegExp(
    [
        /^\s*/,
        /(?:(?<proto>[a-z]+):\/\/)?/,
        /(?:(?<user>[a-z_][a-z0-9_-]+)@)?/,
        /(?<domain>[^\s\/\?#:]+)/,
        /(?::(?<port>[0-9]{1,5}))?/,
        /(?:[\/:](?<owner>[^\s\/\?#:]+))?/,
        /(?:[\/:](?<repo>(?:[^\s\?#:.]|\.(?!git\/?\s*$))+))/,
        /(?:.git)?\/?\s*$/,
    ]

Named capturing groups are only available when targeting 'ES2018' or later

const regexp = /\[(?<link>http:\/\/[^\]]+)\]/g

A character class range must not be bounded by another character class

this.relocDataSymNameRe = /^(?<symname>[^\d-+][\w.]*)?\s*(?<addend_or_value>.*)$/;

filepath.replace(/^C:\/Users\/[\w\d-.]*\/AppData\/Local\/Temp\/compiler-explorer-compiler[\w\d-.]*\//, '/app/')

const ATFILELINE_RE = /\s*at ([\w-/.]+):(\d+)/;

const selectedPassRe = /[0-9]*(i|t|r)\.([\w-_]*)/;

Octal escape sequences are not allowed. Use the syntax '\x02'.

const shellChars = /[\002-\011\013-\032\\#?`(){}[\]^*<=>~|; "!$&'\202-\377]/;

This character cannot be escaped in a regular expression.

private readonly nameWithOwner = /(?<owner>-?[a-z0-9][a-z0-9\-\_]*)\/(?<name>(?:\w|\.|\-)+)/;

const isURlCustomFormat = /\.[a-z]+\z/.test(anchor.href);

Octal escape sequences are not allowed. Use the syntax '\x02'

const regexp = /([^\s'"]+(['"])([^\2]*?)\2)|[^\s'"]+|(['"])([^\4]*?)\4/gi;

A character class range must not be bounded by another character class

// Source: https://stackoverflow.com/a/8234912/2013580
const urlRegExp = new RegExp(
  /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=+$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=+$,\w]+@)[A-Za-z0-9.-]+)((?:\/[+~%/.\w-_]*)?\??(?:[-+=&;%@.\w_]*)#?(?:[\w]*))?)/,
);

This regular expression flag is only available when targeting 'es2022' or later

// this regex is different from HASHTAG_REGEX in that it does not look for a
// #+character. It uses a negative look-ahead for `# `
const HASH_REGEX =
  /(?<=^|\s)#(?![ \t#])([0-9]*[\p{L}\p{Emoji_Presentation}\p{N}/_-]*)/dgu;

This regular expression flag is only available when targeting 'es2018' or later

return message.replace(/([{}](?:.*[{}])?)/su, `'$1'`)

This regular expression flag is only available when targeting 'es6' or later

return message.replace(/([{}](?:.*[{}])?)/su, `'$1'`)

Octal escape sequences are not allowed. Use the syntax '\x00'

// Since negative lookbehind isn't supported in all browsers, this leaves out the negative lookbehind condition `(?<!\.lock)` to ensure the branch name doesn't end with `.lock`
const validBranchOrTagRegex = /^[^/](?!.*\/\.)(?!.*\.\.)(?!.*\/\/)(?!.*@\{)[^\000-\037\177 ~^:?*[\\]+[^./]$/;

// Since negative lookbehind isn't supported in all browsers, leave out the negative lookbehind condition `(?<!\.lock)` to ensure the branch name doesn't end with `.lock`
const refRegexShared = /\b((?!.*\/\.)(?!.*\.\.)(?!.*\/\/)(?!.*@\{)[^\000-\037\177 ,~^:?*[\\]+[^ ./])\b/gi;

This regular expression flag is only available when targeting 'es2018' or later

if (!/(\{\{.+?\}\})|(\{#.+?#\})|(\{%.+?%\})/s.test(str)) {

A character class range must not be bounded by another character class

const rUrl = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=+$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=+$,\w]+@)[A-Za-z0-9.-]+)((?:\/[+~%/.\w-_]*)?\??(?:[-+=&;%@.\w_]*)#?(?:[.!/\\w]*))?)/;

This character cannot be escaped in a regular expression

    expect(data['message']).toMatch(
      /Malformed FormData request. \_*Response.formData: Could not parse content as FormData./
    )

This regular expression flag is only available when targeting 'es6' or later

const validBundleID = /^([a-zA-Z]([a-zA-Z0-9_])*\.)+[a-zA-Z]([a-zA-Z0-9_])*$/u

There is nothing available for repetition

const regExp: RegExp = /const foo *= *{0x1: *'bar'};/;

'}' expected

/tag`foo *\${0x1 *\+ *0x1} *bar`;/

A character class range must not be bounded by another character class

if (!/^([\w-.]*)$/.test(name)) {

A character class range must not be bounded by another character class.

return str.replace(/^(\w)|[\s-_:]+(\w)/g, function (match, p1, p2) {

A character class range must not be bounded by another character class

const urlGithubRE = /^(?:https:\/\/(?:github\.com|api\.github\.com\/repos)|(?:\/)?(?:\/)?repos)([\w-.?!=&%*+:@\/]*)/g;

This character cannot be escaped in a regular expression.

const H_REGEX = /(?<tag>[\w\-]+)?(?:#(?<id>[\w\-]+))?(?<class>(?:\.(?:[\w\-]+))*)(?:@(?<name>(?:[\w\_])+))?/;

This regular expression flag is only available when targeting 'es2022' or later.

const markRegex = /\bMARK:\s*(.*)$/d;

Octal escape sequences are not allowed. Use the syntax '\x09'.

function cssEscape(str: string): string {
	return str.replace(/[\11\12\14\15\40]/g, '/'); // HTML class names can not contain certain whitespace characters, use / instead, which doesn't exist in file names.
}

A character class range must not be bounded by another character class.

const fileRegex = /(file:\/\/)?([a-zA-Z]:(\\\\|\\|\/)|(\\\\|\\|\/))?([\w-\._]+(\\\\|\\|\/))+[\w-\._]*/g;

A character class range must not be bounded by another character class.

/^\w([\w-.]*\w)?$/.test(x.preferredUsername)

Named capturing groups are only available when targeting 'ES2018' or later

const deprecation = (propDescriptor.description || '').match(/@deprecated(\s+(?<info>.*))?/);

A character class range must not be bounded by another character class.

let isText = /^[\w-\s.,\t\n]+$/.test(detail)

This character cannot be escaped in a regular expression

return tag.match(/^(?![\.\-])([a-zA-Z0-9\_\.\-])+$/g);

A character class range must not be bounded by another character class

const urlRegex = () =>
  /((?:https?(?::\/\/))(?:www\.)?(?:[a-zA-Z\d-_.]+(?:(?:\.|@)[a-zA-Z\d]{2,})|localhost)(?:(?:[-a-zA-Z\d:%_+.~#!?&//=@]*)(?:[,](?![\s]))*)*)/g;

A character class range must not be bounded by another character class

export function expandDefaultServerVariables(url: string, variables: object = {}) {
  return url.replace(
    /(?:{)([\w-.]+)(?:})/g,
    (match, name) => (variables[name] && variables[name].default) || match,
  );
}

dozens of these in this file, see #58275 (comment)

A decimal escape must refer to an existent capturing group. There are only 1 capturing groups in this regular expression

/([^a-zA-Z0-9\s{(\[<])(?:(?!\2)[^\\]|\\[\s\S])*\2(?:(?!\2)[^\\]|\\[\s\S])*\2/

A character class range must not be bounded by another character class

    // eslint-disable-next-line @typescript-eslint/prefer-regexp-exec
    const githubMatch = location.match(/https:\/\/github.com\/([\w-_]+\/[\w-_]+)/i);

Unicode property value expressions are only available when the Unicode (u) flag or the Unicode Sets (v) flag is set

    slug: ['', unicodePatternValidator(/^[\p{Letter}0-9._-]+$/)],

A character class range must not be bounded by another character class

export const wordPattern = /(#?-?\d*\.\d\w*%?)|([$@#!.:]?[\w-?]+%?)|[$@#!.]/g;

return stream.advanceIfRegExp(/^[_:\w][_:\w-.\d]*/).toLowerCase();
@rbuckton
Member

I expect all of the errors not related to --target are a result of regular expressions that are allowed per Annex B.

@rbuckton
Member

IMO, all of the "Octal escape sequences are not allowed" and "A decimal escape must refer to an existent capturing group" are probably indications of actual errors in user code. They're allowed in Annex B, but the user likely intended to use them as a backreference to a capture group and that's not how Annex B would treat them.
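A minimal sketch of why these two diagnostics tend to point at real bugs, assuming an engine that implements Annex B (V8, for example). The patterns are built with the `RegExp` constructor, which the literal checker does not inspect; the variable names are illustrative and not taken from any of the projects quoted above.

```typescript
// Inside a character class, Annex B reads `\2` as the octal escape for U+0002,
// not as a backreference to capturing group 2.
const octalInClass = new RegExp("^(['\"])[^\\2]*$");
// The class excludes only U+0002, so a quote character is still accepted.
const acceptsQuote = octalInClass.test("'a'b");     // true under Annex B
const rejectsCtrl = octalInClass.test("'a\u0002b"); // false under Annex B

// Outside a class, `\2` with only one capturing group also falls back to octal.
const missingGroup = new RegExp("^(a)\\2$");
const matchesCtrl = missingGroup.test("a\u0002");   // true under Annex B
const matchesRepeat = missingGroup.test("aa");      // false: `\2` is not a backreference here
```

In both cases the pattern is accepted and runs, but it matches a control character instead of repeating an earlier capture, which is rarely what the author meant.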

All of the "A character class range must not be bounded by another character class" errors are probably fine and shouldn't be reported. Annex B allows them and most users wrote something like [\w-.] or the like thinking it meant "word characters, -, and .", which is how Annex B treats them.
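The Annex B reading of `[\w-.]` can be checked at runtime. The sketch below assumes an Annex B engine and uses the `RegExp` constructor so the literal checker is not involved; the sample strings and variable names are made up for illustration.

```typescript
// Without the `u` flag, Annex B treats `[\w-.]` as: word characters, "-", and ".".
const legacy = new RegExp("^[\\w-.]+$");
const legacyAccepts = legacy.test("my-file.name"); // true
const legacyRejects = legacy.test("my file");      // false: space is not in the class

// With the `u` flag, the same class is rejected as a syntax error.
let unicodeError: unknown;
try {
  new RegExp("^[\\w-.]+$", "u");
} catch (e) {
  unicodeError = e;
}
const throwsInUnicodeMode = unicodeError instanceof SyntaxError;

// An unambiguous spelling that is valid in both modes: put "-" last.
const portable = new RegExp("^[\\w.-]+$", "u");
const portableAccepts = portable.test("my-file.name");
```

Moving the hyphen to the end of the class (or escaping it) keeps the intended meaning and also stays valid if the `u` flag is added later.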

@graphemecluster
Contributor

Ah, I did once mention this on my PR and thought it was fine since Ryan reacted to my comment.
#55600 (comment)
I am fine with weakening the grammar. However, keep in mind that we can’t guarantee everything runs on engines with Annex B support, though I understand that this is mostly the case. IMHO, another compiler option is the only realistic way to solve this, unfortunately.

@graphemecluster
Contributor

IMO, all of the "Octal escape sequences are not allowed" and "A decimal escape must refer to an existent capturing group" are probably indications of actual errors in user code.

Yes, I actually thought that there was a consensus on not allowing any octal escapes anywhere per #53198 😅

@nostalic

Great work on adding validation for regexp!

We came across another regression in 5.5 for character class escapes with script extensions that I did not see listed above:

const regexpNonLatin = /\P{Script_Extensions=Latin}+/gu;

The issue seems specific to Script_Extensions and scx; Script works fine. The same behavior is observed for \p and \P.

"٢".match(/\p{Script=Thaana}/u); // OK on 5.5
"٢".match(/\p{Script_Extensions=Thaana}/u); // KO on 5.5

// @ts-ignore can be used to work around the error, as hinted on #58295

Those regexps are part of the samples on https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape; we use something similar in our codebase and ran into this when pretesting our TypeScript upgrade.

Would it be possible to support script extension values in 5.5?
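For reference, a small sketch of the runtime difference between the two properties, following the MDN sample linked above. It builds the patterns with the `RegExp` constructor, which the 5.5 checker does not inspect, so it also doubles as an alternative to `// @ts-ignore`. The variable names are illustrative, and the results assume an engine with Unicode property escape support.

```typescript
// U+0662 ARABIC-INDIC DIGIT TWO, the character used in the report.
const digit = "\u0662";

const byScript = new RegExp("\\p{Script=Thaana}", "u");
const byScriptExtensions = new RegExp("\\p{Script_Extensions=Thaana}", "u");
const byShortAlias = new RegExp("\\p{scx=Thaana}", "u");

// Script only considers the character's single primary script.
const scriptMatches = byScript.test(digit);               // false per the MDN sample
// Script_Extensions also covers other scripts the character is used with.
const extensionsMatch = byScriptExtensions.test(digit);   // true per the MDN sample
// `scx` is the short alias for Script_Extensions.
const aliasMatches = byShortAlias.test(digit);
```

The trade-off of the constructor form is that the pattern is no longer checked at compile time at all.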

Related links:

@graphemecluster
Contributor

@nostalic OMG, that’s totally my fault, I am very bad. I made it empty because the Script_Extensions section in PropertyValueAliases.txt shows nothing, without thinking much.
However, I don’t think the Team will have time to review PRs related to regular expressions in the immediate future; they haven’t even reviewed my short follow-up PRs yet 😅

@nostalic

@graphemecluster This is a great improvement, and the regex validation helps to catch some issues we had, so thanks for implementing it!

The issue can be worked around and as such this is not a blocker for us, though it would be great to have it fixed in 5.5 🙂

@jakebailey
Member

@nostalic OMG, that’s totally my fault, I am very bad. I made it empty because the Script_Extensions section in PropertyValueAliases.txt shows nothing, without thinking much. However, I don’t think the Team will have time to review PRs related to regular expressions in the immediate future; they haven’t even reviewed my short follow-up PRs yet 😅

Please do send things if you have them; I do think we want to get things looked at before 5.5 is branched off.

@jonnytest1

jonnytest1 commented Jun 10, 2024

Thoughts on this: since we already do regex group checking (as per the release notes), shouldn't the resulting match groups be typed?

(tried on playground with 5.5-beta)

@jakebailey
Member

No, the type system does not special case regexes like this. (yet?)
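Until something like that exists, typed groups can be approximated by hand. The sketch below is only an illustration: `namedGroups` is a hypothetical helper, not part of TypeScript or its libraries, and it trusts the caller's list of names rather than checking them against the pattern. It assumes the `ES2018` lib (or later) so that `groups` is declared on the match result.

```typescript
// Hypothetical helper: the caller lists the group names it expects,
// and the result is typed with exactly those keys.
function namedGroups<K extends string>(
  re: RegExp,
  input: string,
  names: readonly K[],
): Record<K, string | undefined> | null {
  const match = re.exec(input);
  if (match === null) return null;
  const result = {} as Record<K, string | undefined>;
  for (const name of names) {
    // `groups` is undefined when the pattern has no named groups at all.
    result[name] = match.groups?.[name];
  }
  return result;
}

// Pattern borrowed from the `@deprecated` sample in the issue body.
const deprecation = namedGroups(
  new RegExp("@deprecated(\\s+(?<info>.*))?"),
  "@deprecated use bar() instead",
  ["info"] as const,
);
const info = deprecation?.info; // typed as string | undefined
```

Accessing `deprecation?.inf` (a misspelled key) would be a compile-time error here, which is the kind of check the question is asking the type system to provide directly.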

@graphemecluster
Contributor

Enabling further implementation of regex type checking is the most vital reason why I implemented regex syntax checking, and it’s gonna be the most exciting part 😆

7 participants
@rbuckton @jakebailey @RyanCavanaugh @nostalic @jonnytest1 @graphemecluster and others