Writing custom extensions, length function #171

Open
joepatol opened this issue May 23, 2024 · 2 comments

Comments

@joepatol

Hi,

I have a use-case that doesn't seem to be possible with the current implementation.

Suppose I have some data looking like this:

data = {
    "Items": [
        {"val": "foo", "filter": "a"},
        {"val": "bar", "filter": "a"},
        {"val": "baz", "filter": "b"},
    ]
}

Now I want to get the number of Items where filter == "a" (2 in this case). I used the len function, but that returns the length of each matched object, not the length of the array they are part of.

from jsonpath_ng.jsonpath import JSONPath
from jsonpath_ng.ext import parse

query = '$.Items[?filter == "a"].`len`'
jsonpath_expr: JSONPath = parse(query)
result = jsonpath_expr.find(data)

print(result)

# [DatumInContext(value=2, path=Len(), context=None), DatumInContext(value=2, path=Len(), context=None)]

After some research, this seems to be the expected behavior of JSONPath. I did see some other implementations which provide a length() function, which would be able to do what I want. That seems not to be supported by this library.

Is there a way to achieve what I want with the current implementation in this library?

The readme states: "More generally, this syntax allows "named operators" to extend JSONPath is arbitrary ways", which makes me think I should be able to extend the implementation with my own function. That would also work for me, but I can't find any documentation on how to write extensions for the library.

What is the recommended way, if any, to write custom extensions (like a length function) for this library?

Thanks for helping!

@jg-rp

jg-rp commented May 23, 2024

Even with a custom named operator (like .`len`), I think you'll struggle to address filter (like [?filter == "a"]) results as a single sequence to be able to count them. Internally, the find() method of each selector is called once for each datum (analogous to "Value" or "Node" from RFC 9535), without the option to reference a list of intermediate results (like "Nodelist" in RFC 9535).
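As a rough illustration of that per-datum behaviour, here is a simplified pure-Python model. It is not jsonpath-ng's actual code: Datum, FilterStep, LenStep and run_child are hypothetical names made up for this sketch. It only shows why a step that follows a filter sees one match at a time and never the whole list of matches.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Datum:
    """A single matched value (hypothetical stand-in for DatumInContext)."""

    value: Any


class FilterStep:
    """Yields one datum per list element that satisfies the predicate."""

    def __init__(self, predicate: Callable[[Any], bool]):
        self.predicate = predicate

    def find(self, datum: Datum) -> List[Datum]:
        return [Datum(item) for item in datum.value if self.predicate(item)]


class LenStep:
    """Returns the length of the single datum it is given."""

    def find(self, datum: Datum) -> List[Datum]:
        return [Datum(len(datum.value))]


def run_child(left, right, datum: Datum) -> List[Datum]:
    """Apply `right` separately to every datum produced by `left`."""
    return [out for sub in left.find(datum) for out in right.find(sub)]


items = [
    {"val": "foo", "filter": "a"},
    {"val": "bar", "filter": "a"},
    {"val": "baz", "filter": "b"},
]

results = run_child(
    FilterStep(lambda item: item["filter"] == "a"), LenStep(), Datum(items)
)
print([r.value for r in results])  # [2, 2]: each matched dict has two keys
```

In this model LenStep never receives the two matches together, so it cannot report how many there are. It can only measure each matched object on its own.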

The following - somewhat hacky - example works around this by defining a "flat filter" operator (?* instead of ?), then uses len as normal.

# The parser class is spelled "Extented" in jsonpath_ng.ext.parser, so alias it.
from jsonpath_ng.ext.parser import ExtentedJsonPathParser as ExtendedJsonPathParser
from jsonpath_ng.ext.parser import ExtendedJsonPathLexer
from jsonpath_ng.ext.filter import Filter
from jsonpath_ng.jsonpath import DatumInContext


class MyJSONPathLexer(ExtendedJsonPathLexer):
    """An extended lexer with a "flat" filter operator."""

    tokens = ["FLAT_FILTER"] + ExtendedJsonPathLexer.tokens
    # Matches either the standard "?" or the new "?*" flat filter operator.
    t_FLAT_FILTER = r"\?\*?"


class MyJSONPathParser(ExtendedJsonPathParser):
    """An extended parser with a "flat" filter operator."""

    tokens = MyJSONPathLexer.tokens

    def __init__(self, debug=False):
        super().__init__(debug, MyJSONPathLexer)

    def p_filter(self, p):
        "filter : FLAT_FILTER expressions"
        if p[1] == "?":
            p[0] = Filter(p[2])
        else:
            p[0] = FlatFilter(p[2])


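class FlatFilter(Filter):
    """A filter that returns all matches wrapped in a single list datum."""

    def find(self, data):
        # Collect the matched values into one list so `len` counts the matches.
        return [DatumInContext([d.value for d in super().find(data)])]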


def parse(path, debug=False):
    return MyJSONPathParser(debug=debug).parse(path)


if __name__ == "__main__":
    data = {
        "Items": [
            {"val": "foo", "filter": "a"},
            {"val": "bar", "filter": "a"},
            {"val": "baz", "filter": "b"},
        ]
    }

    query = '$.Items[?*filter == "a"].`len`'
    jsonpath_expr = parse(query)
    result = jsonpath_expr.find(data)
    print([r.value for r in result])  # [2]

Note that RFC 9535 does not handle this use case either. You'd need to use Python's len() function on the query results to retrieve the number of values matched by the filter.

@michaelmior
Collaborator

michaelmior commented Oct 11, 2024

@joepatol Note that if you are willing to modify your path, you could add `parent` right before `len` to get what I think is your desired behavior.
