Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for join expression in array scans #130

Closed
wants to merge 2 commits into from

Conversation

hannes-ucsc
Copy link

This query

SELECT links_id, version, JSON_EXTRACT_SCALAR(link_output, "$.output_id") AS output_id
FROM test_project.snapshot.links AS links
    JOIN UNNEST(JSON_EXTRACT_ARRAY(links.content, '$.links')) AS content_links
        ON JSON_EXTRACT_SCALAR(content_links, '$.link_type') = 'process_link'
    JOIN UNNEST(JSON_EXTRACT_ARRAY(content_links, '$.outputs')) AS link_output
        ON JSON_EXTRACT_SCALAR(link_output, "$.output_id") IN UNNEST(['8f8b9587-237f-4995-9461-c96eac53d615'])

when run in bigquery-emulator against a table whose content column is JSON with indirectly nested arrays, segfaults with

2024-01-11 09:18:35 2024-01-11T17:18:35.253Z    INFO    contentdata/repository.go:167           {"query": "\n            SELECT links_id, version, JSON_EXTRACT_SCALAR(link_output, \"$.output_id\") AS output_id\n            FROM test_project.snapshot.links AS links\n                JOIN UNNEST(JSON_EXTRACT_ARRAY(links.content, '$.links')) AS content_links\n                    ON JSON_EXTRACT_SCALAR(content_links, '$.link_type') = 'process_link'\n                JOIN UNNEST(JSON_EXTRACT_ARRAY(content_links, '$.outputs')) AS link_output\n                    ON JSON_EXTRACT_SCALAR(link_output, \"$.output_id\") IN UNNEST(['8f8b9587-237f-4995-9461-c96eac53d615'])\n        ", "values": []}
2024-01-11 09:18:35 panic: runtime error: invalid memory address or nil pointer dereference
2024-01-11 09:18:35 [signal SIGSEGV: segmentation violation code=0x1 addr=0x70 pc=0x154abf4]
2024-01-11 09:18:35 
2024-01-11 09:18:35 goroutine 2206 [running]:
2024-01-11 09:18:35 github.com/goccy/go-zetasqlite/internal.RegisterFunctions.func2({0x2ab1660?, 0x4e2f540?})
2024-01-11 09:18:35     /go/pkg/mod/github.com/goccy/[email protected]/internal/function_register.go:430 +0x34
2024-01-11 09:18:35 reflect.Value.call({0x2b311e0?, 0x2ef1378?, 0xc000e98340?}, {0x2e58683, 0x4}, {0xc000c8d008, 0x1, 0x4775fa?})
2024-01-11 09:18:35     /usr/local/go/src/reflect/value.go:596 +0xce7
2024-01-11 09:18:35 reflect.Value.Call({0x2b311e0?, 0x2ef1378?, 0x1?}, {0xc000c8d008?, 0x1?, 0x14b3320?})
2024-01-11 09:18:35     /usr/local/go/src/reflect/value.go:380 +0xb9
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.(*functionInfo).Call(0xc000579000, 0x0?, {0x7fff88104a68?, 0x0?, 0x0?})
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:407 +0x75
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.callbackTrampoline(0x0?, 0x1, 0x7fff88104a68)
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/callback.go:39 +0x59
2024-01-11 09:18:35 github.com/mattn/go-sqlite3._Cfunc__sqlite3_step_internal(0x7fff880f2db8)
2024-01-11 09:18:35     _cgo_gotypes.go:367 +0x47
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.(*SQLiteRows).nextSyncLocked.func1(0xc000bac158?)
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:2186 +0x45
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.(*SQLiteRows).nextSyncLocked(0xc001234f60, {0xc000d446f0, 0x3, 0x30dd5d8?})
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:2186 +0x37
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.(*SQLiteRows).Next.func1()
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:2167 +0x2c
2024-01-11 09:18:35 created by github.com/mattn/go-sqlite3.(*SQLiteRows).Next in goroutine 2097
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:2166 +0x189

The first commit in this PR addresses that. The second commit adds support for the ON <joinExpr> clause to array scans such as used in the above query.

I don't know Go and am not familiar with this extremely useful project so I present this PR more as an elaborate issue report as opposed to something I think should or can be merged as is. It probably needs work but it makes my use cases work.

@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2024

Codecov Report

Merging #130 (93ea9da) into main (7d3a82f) will decrease coverage by 0.01%.
The diff coverage is 43.75%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #130      +/-   ##
==========================================
- Coverage   45.82%   45.81%   -0.01%     
==========================================
  Files          47       47              
  Lines       17800    17814      +14     
==========================================
+ Hits         8156     8161       +5     
- Misses       8182     8189       +7     
- Partials     1462     1464       +2     

@hannes-ucsc hannes-ucsc changed the title Add support for join expr in array scans Add support for join expression in array scans Jan 14, 2024
@hannes-ucsc hannes-ucsc reopened this Jan 16, 2024
@ohaibbq
Copy link
Contributor

ohaibbq commented Jan 16, 2024

Thanks for your contribution @hannes-ucsc!

Could you add some test cases to query_test.go that specifies the expected output for this query?

WITH links AS (
  SELECT 1 as links_id, 'v1' as version, JSON '{"links": [{"link_type": "process_link", "outputs": [{"output_id": "8f8b9587-237f-4995-9461-c96eac53d615"}]}]}' AS content
)
SELECT link_output FROM links
    JOIN UNNEST(JSON_EXTRACT_ARRAY(links.content, '$.links')) AS content_links
        ON JSON_EXTRACT_SCALAR(content_links, '$.link_type') = 'process_link'
    JOIN UNNEST(JSON_EXTRACT_ARRAY(content_links, '$.outputs')) AS link_output
        ON JSON_EXTRACT_SCALAR(link_output, "$.output_id") IN UNNEST(['8f8b9587-237f-4995-9461-c96eac53d615'])

@hannes-ucsc
Copy link
Author

May I ask what your involvement is with this project? I'd like to make reasonably sure that further work by me is justified by a the likely prospect of it actually being merged.

@ohaibbq
Copy link
Contributor

ohaibbq commented Jan 17, 2024

Fair question. I'm just another user of the library but will be starting to contribute actively this quarter. We maintain a fork at Recidiviz/go-zetasqlite

@goccy
Copy link
Owner

goccy commented Jan 30, 2024

@hannes-ucsc ( cc @ohaibbq )
As noted in the comments, please add test cases. Also, when the PR is no longer in Draft status, it will be reviewed.

@goccy
Copy link
Owner

goccy commented Mar 9, 2024

Since there is no response, this PR is closed.

@ohaibbq If you can support this problem, I would appreciate your supports 🙏

@goccy goccy closed this Mar 9, 2024
@ohaibbq
Copy link
Contributor

ohaibbq commented Mar 9, 2024

I had forgotten about this PR and provided a fully implemented fix in #180 as it was causing a failing unit test in my other project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants