C++ std::regex syntax incompatibility #841

Open
lemire opened this issue Jan 15, 2025 · 1 comment

@lemire

lemire commented Jan 15, 2025

std::regex in C++ supports a modified ECMAScript syntax which differs from the actual ECMAScript regex syntax, causing compatibility issues with JavaScript-like patterns, notably in URL pattern matching.

This problem arises in the tests of the following PR: Node.js Pull Request #56452

The implementation of std::regex rejects certain regex constructs that are valid in JavaScript's ECMAScript syntax:

Example Pattern: "/([\\d&&[0-1]])"

Reproducible Example:

{
  "pattern": [{ "pathname": "/([\\d&&[0-1]])" }],
  "inputs": [{ "pathname": "/0" }],
  "expected_match": {
    "pathname": { "input": "/0", "groups": { "0": "0" } }
  }
}

C++ Code to Simulate Failure:

#include <iostream>
#include <regex>
#include "ada.h" // Assuming ada is a library for URL parsing

int main() {
    ada::url_pattern_init init{.pathname = "/([\\d&&[0-1]])"};
    auto url_pattern = ada::parse_url_pattern(init, nullptr, nullptr);
    if (!url_pattern) {
        std::cout << "URL pattern parsing failure" << std::endl;
    }

    // Simplified regex test
    try {
        std::regex regexPattern("[[0-1]]");
    } catch (const std::regex_error& e) {
        std::cout << "Regex construction error: " << e.what() << std::endl;
    }

    return 0;
}

The regex is somewhat confusing: [[0-1] is a character class that matches three characters ([, 0, and 1), and it is followed by a stray ].

JavaScript handles this by interpreting the final ] as a literal character, because it does not close a character class.

C++ rejects the regex.

To achieve the intended regex behavior in C++, the pattern needs to be adjusted:

std::regex regexPattern("[[0-1]\\]");
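
As a quick sanity check of the adjusted pattern, here is a minimal sketch. The name matches_adjusted is hypothetical and used only for illustration; the sketch assumes the default ECMAScript grammar of std::regex and full-string matching with std::regex_match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (illustration only): checks whether `input` fully
// matches the adjusted pattern, i.e. one character from the class
// {'[', '0', '1'} followed by a literal ']'.
bool matches_adjusted(const std::string& input) {
    // "\\]" in the C++ source is the regex escape \] for a literal ']'.
    static const std::regex adjusted("[[0-1]\\]");
    return std::regex_match(input, adjusted);
}
```

With the escaped bracket, inputs such as "0]" or "[]" should match, while "0" alone should not, since the trailing ] is now a required literal.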

This discrepancy can lead to failures when C++ and JavaScript regex compatibility is assumed or required.
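
One conceivable workaround is to rewrite such patterns before handing them to std::regex. The sketch below is an assumption, not something ada or the standard library provides: escape_stray_brackets is a hypothetical helper that escapes any ] found outside a character class. It deliberately ignores C++-specific bracket constructs such as [[:alpha:]] and does nothing about other differences between the two grammars.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of ada or the standard library): returns a
// copy of `pattern` in which every ']' that does not close a character class
// is escaped as "\]". Existing escape sequences are copied unchanged.
std::string escape_stray_brackets(std::string_view pattern) {
    std::string out;
    out.reserve(pattern.size() + 4);
    bool in_class = false;  // true while scanning between '[' and its ']'
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size()) {
            // Copy the backslash and the escaped character verbatim.
            out += c;
            out += pattern[++i];
            continue;
        }
        if (in_class) {
            if (c == ']') in_class = false;  // closes the current class
        } else if (c == '[') {
            in_class = true;                 // opens a new class
        } else if (c == ']') {
            out += '\\';                     // stray ']': make it a literal
        }
        out += c;
    }
    return out;
}
```

Applied to the simplified pattern from this issue, the helper turns [[0-1]] into [[0-1]\], which is the adjusted form shown above.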

@lemire lemire changed the title C++ std::regex Syntax Incompatibility C++ std::regex syntax incompatibility Jan 15, 2025
@anonrig

anonrig commented Jan 17, 2025

I recommend keeping the std::regex solution and, on top of that, using the regex engine proposal so that others can properly support ECMAScript regex through plugin support.
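
To make the plugin idea concrete, here is a rough sketch of what a pluggable regex engine could look like. Everything in it is hypothetical: std_regex_engine, component_matches, and the create/match interface are illustrative names, not the actual proposal or ada's API. The matching logic is parameterized over an engine type, with std::regex as the default, so an engine with full ECMAScript support could be swapped in.

```cpp
#include <optional>
#include <regex>
#include <string_view>

// Hypothetical default engine backed by std::regex (illustrative names only).
struct std_regex_engine {
    using regex_type = std::regex;

    // Returns std::nullopt when std::regex rejects the pattern.
    static std::optional<regex_type> create(std::string_view pattern) {
        try {
            return std::regex(pattern.data(), pattern.size(),
                              std::regex_constants::ECMAScript);
        } catch (const std::regex_error&) {
            return std::nullopt;
        }
    }

    // Full-string match of `input` against a compiled regex.
    static bool match(const regex_type& re, std::string_view input) {
        return std::regex_match(input.begin(), input.end(), re);
    }
};

// Hypothetical entry point: any type exposing the same create/match
// interface can replace the default engine.
template <class Engine = std_regex_engine>
bool component_matches(std::string_view pattern, std::string_view input) {
    auto re = Engine::create(pattern);
    return re.has_value() && Engine::match(*re, input);
}
```

A caller that needs exact JavaScript semantics could then write component_matches<my_engine>(pattern, input), where my_engine is a user-supplied type wrapping another regex library.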
