Subgraph stitching query crashes BQ emulator #5847

Closed
hannes-ucsc opened this issue Jan 11, 2024 · 3 comments
Labels
+ [priority] High
bug [type] A defect preventing use of the system as specified
no demo [process] Not to be demonstrated at the end of the sprint
orange [process] Done by the Azul team
spike:8 [process] Spike estimate of eight points
test [subject] Unit and integration test code

Comments

@hannes-ucsc
Member

hannes-ucsc commented Jan 11, 2024

With this patch …

Index: test/docker_container_test_case.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/docker_container_test_case.py b/test/docker_container_test_case.py
--- a/test/docker_container_test_case.py	(revision 0ed280fc5d28b56d582997e1187d0507aa2c4438)
+++ b/test/docker_container_test_case.py	(date 1705039000741)
@@ -79,7 +79,7 @@
         ports = None if is_sibling else {container_port: ('127.0.0.1', None)}
         container = cls._docker.containers.run(image,
                                                detach=True,
-                                               auto_remove=True,
+                                               auto_remove=False,
                                                ports=ports,
                                                **kwargs)
         try:
Index: test/indexer/test_tdr.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/indexer/test_tdr.py b/test/indexer/test_tdr.py
--- a/test/indexer/test_tdr.py	(revision 0ed280fc5d28b56d582997e1187d0507aa2c4438)
+++ b/test/indexer/test_tdr.py	(date 1705079815796)
@@ -308,28 +308,17 @@
         with self.assertRaises(RequirementError):
             self._test_fetch_bundle(bundle, load_tables=False)
 
-    @patch('azul.plugins.repository.tdr_hca.Plugin._find_upstream_bundles')
-    def test_subgraph_stitching(self, _mock_find_upstream_bundles):
+    def test_subgraph_stitching(self):
         downstream_uuid = '4426adc5-b3c5-5aab-ab86-51d8ce44dfbe'
         upstream_uuids = [
             'b0c2c714-45ee-4759-a32b-8ccbbcf911d4',
             'bd4939c1-a078-43bd-8477-99ae59ceb555',
         ]
-        # FIXME: Fix the crash in bigquery-emulator and remove the mock
-        #        https://github.com/DataBiosphere/azul/issues/5847
-        _mock_find_upstream_bundles.side_effect = [
-            {SourcedBundleFQID(source=self.source,
-                               uuid=uuid,
-                               version='2020-08-10T21:24:26.174274Z')}
-            for uuid in upstream_uuids
-        ]
         bundle = self._load_canned_bundle(SourcedBundleFQID(source=self.source,
                                                             uuid=downstream_uuid,
                                                             version='2020-08-10T21:24:26.174274Z'))
         assert any(e['is_stitched'] for e in bundle.manifest)
         self._test_fetch_bundle(bundle, load_tables=True)
-        self.assertEqual(_mock_find_upstream_bundles.call_count,
-                         len(upstream_uuids))
 
     def _test_fetch_bundle(self,
                            test_bundle: TDRHCABundle,

test_subgraph_stitching hangs with the output …

2024-01-11 09:18:34,010    INFO MainThread test.docker_container_test_case: Launched container lucid_hertz from image 465330168186.dkr.ecr.us-east-1.amazonaws.com/ghcr.io/goccy/bigquery-emulator:0.4.4, with container port 9050 mapped to 127.0.0.1:51782 on the host
2024-01-11 09:18:34,970   DEBUG MainThread azul.plugins.repository.tdr_hca: Retrieving 1 entities of type 'links' ...
2024-01-11 09:18:34,970   DEBUG MainThread azul.terra: Query (338 characters total): '\n            SELECT links_id, BYTE_LENGTH(content) AS content_size, JSON_EXTRACT_SCALAR(content, "$.schema_type") AS schema_type, project_id, version, content\n            FROM test_project.snapshot.links\n            WHERE (links_id, version) IN ((\'4426adc5-b3c5-5aab-ab86-51d8ce44dfbe\', TIMESTAMP(\'2020-08-10T21:24:26.174274Z\')))\n        '
2024-01-11 09:18:35,122   DEBUG MainThread azul.terra: Job info: {"job_id": "0c64c459-1f77-4127-9e80-d83a20cb0a74", "stats": {"totalBytesBilled": "3881", "totalBytesProcessed": "3881"}, "query": "\n            SELECT links_id, BYTE_LENGTH(content) AS content_size, JSON_EXTRACT_SCALAR(content, \"$.schema_type\") AS schema_type, project_id, version, content\n            FROM test_project.snapshot.links\n            WHERE (links_id, version) IN (('4426adc5-b3c5-5aab-ab86-51d8ce44dfbe', TIMESTAMP('2020-08-10T21:24:26.174274Z')))\n        "}
2024-01-11 09:18:35,244   DEBUG MainThread azul.plugins.repository.tdr_hca: Retrieved 3 entities of type 'links'
2024-01-11 09:18:35,244    INFO MainThread azul.plugins.repository.tdr_hca: There are 1 dangling inputs in bundle SourcedBundleFQID(uuid='4426adc5-b3c5-5aab-ab86-51d8ce44dfbe', version='2020-08-10T21:24:26.174274Z', source=SourceRef(id='cafebabe-feed-4bad-dead-beaf8badf00d', spec=TDRSourceSpec(prefix=Prefix(common='', partition=2), project='test_project', name='snapshot', is_snapshot=True)))
2024-01-11 09:18:35,244   DEBUG MainThread azul.plugins.repository.tdr_hca: Dangling inputs in bundle SourcedBundleFQID(uuid='4426adc5-b3c5-5aab-ab86-51d8ce44dfbe', version='2020-08-10T21:24:26.174274Z', source=SourceRef(id='cafebabe-feed-4bad-dead-beaf8badf00d', spec=TDRSourceSpec(prefix=Prefix(common='', partition=2), project='test_project', name='snapshot', is_snapshot=True))): {EntityReference(entity_type='sequence_file', entity_id='8f8b9587-237f-4995-9461-c96eac53d615')}
2024-01-11 09:18:35,244   DEBUG MainThread azul.terra: Query (557 characters total): '\n            SELECT links_id, version, JSON_EXTRACT_SCALAR(link_output, "$.output_id") AS output_id\n            FROM test_project.snapshot.links AS links\n                JOIN UNNEST(JSON_EXTRACT_ARRAY(links.content, \'$.links\')) AS content_links\n                    ON JSON_EXTRACT_SCALAR(content_links, \'$.link_type\') = \'process_link\'\n                JOIN UNNEST(JSON_EXTRACT_ARRAY(content_links, \'$.outputs\')) AS link_output\n                    ON JSON_EXTRACT_SCALAR(link_output, "$.output_id") IN UNNEST([\'8f8b9587-237f-4995-9461-c96eac53d615\'])\n        '

… while the container segfaults with …

2024-01-11 09:18:35 2024-01-11T17:18:35.253Z    INFO    contentdata/repository.go:167           {"query": "\n            SELECT links_id, version, JSON_EXTRACT_SCALAR(link_output, \"$.output_id\") AS output_id\n            FROM test_project.snapshot.links AS links\n                JOIN UNNEST(JSON_EXTRACT_ARRAY(links.content, '$.links')) AS content_links\n                    ON JSON_EXTRACT_SCALAR(content_links, '$.link_type') = 'process_link'\n                JOIN UNNEST(JSON_EXTRACT_ARRAY(content_links, '$.outputs')) AS link_output\n                    ON JSON_EXTRACT_SCALAR(link_output, \"$.output_id\") IN UNNEST(['8f8b9587-237f-4995-9461-c96eac53d615'])\n        ", "values": []}
2024-01-11 09:18:35 panic: runtime error: invalid memory address or nil pointer dereference
2024-01-11 09:18:35 [signal SIGSEGV: segmentation violation code=0x1 addr=0x70 pc=0x154abf4]
2024-01-11 09:18:35 
2024-01-11 09:18:35 goroutine 2206 [running]:
2024-01-11 09:18:35 github.com/goccy/go-zetasqlite/internal.RegisterFunctions.func2({0x2ab1660?, 0x4e2f540?})
2024-01-11 09:18:35     /go/pkg/mod/github.com/goccy/[email protected]/internal/function_register.go:430 +0x34
2024-01-11 09:18:35 reflect.Value.call({0x2b311e0?, 0x2ef1378?, 0xc000e98340?}, {0x2e58683, 0x4}, {0xc000c8d008, 0x1, 0x4775fa?})
2024-01-11 09:18:35     /usr/local/go/src/reflect/value.go:596 +0xce7
2024-01-11 09:18:35 reflect.Value.Call({0x2b311e0?, 0x2ef1378?, 0x1?}, {0xc000c8d008?, 0x1?, 0x14b3320?})
2024-01-11 09:18:35     /usr/local/go/src/reflect/value.go:380 +0xb9
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.(*functionInfo).Call(0xc000579000, 0x0?, {0x7fff88104a68?, 0x0?, 0x0?})
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:407 +0x75
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.callbackTrampoline(0x0?, 0x1, 0x7fff88104a68)
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/callback.go:39 +0x59
2024-01-11 09:18:35 github.com/mattn/go-sqlite3._Cfunc__sqlite3_step_internal(0x7fff880f2db8)
2024-01-11 09:18:35     _cgo_gotypes.go:367 +0x47
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.(*SQLiteRows).nextSyncLocked.func1(0xc000bac158?)
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:2186 +0x45
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.(*SQLiteRows).nextSyncLocked(0xc001234f60, {0xc000d446f0, 0x3, 0x30dd5d8?})
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:2186 +0x37
2024-01-11 09:18:35 github.com/mattn/go-sqlite3.(*SQLiteRows).Next.func1()
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:2167 +0x2c
2024-01-11 09:18:35 created by github.com/mattn/go-sqlite3.(*SQLiteRows).Next in goroutine 2097
2024-01-11 09:18:35     /go/pkg/mod/github.com/mattn/[email protected]/sqlite3.go:2166 +0x189
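
To reproduce the crash outside the Azul test suite, the query from the logs above can be rebuilt by hand. The sketch below is only an illustration: the function name `upstream_links_query` and its parameters are hypothetical and are not part of Azul or the emulator. It assumes the output IDs are UUIDs, as in the log, and only builds the query text. Sending it to the emulator is left out.

```python
import uuid


def upstream_links_query(table: str, output_ids: list[str]) -> str:
    """
    Rebuild the subgraph stitching query seen in the logs for the given
    output IDs. Hypothetical helper, for illustration only.
    """
    # Round-trip every ID through uuid.UUID so that anything that is not a
    # UUID raises ValueError instead of ending up inside the SQL text.
    ids = ', '.join(f"'{uuid.UUID(output_id)}'" for output_id in output_ids)
    return f'''
            SELECT links_id, version, JSON_EXTRACT_SCALAR(link_output, "$.output_id") AS output_id
            FROM {table} AS links
                JOIN UNNEST(JSON_EXTRACT_ARRAY(links.content, '$.links')) AS content_links
                    ON JSON_EXTRACT_SCALAR(content_links, '$.link_type') = 'process_link'
                JOIN UNNEST(JSON_EXTRACT_ARRAY(content_links, '$.outputs')) AS link_output
                    ON JSON_EXTRACT_SCALAR(link_output, "$.output_id") IN UNNEST([{ids}])
        '''


if __name__ == '__main__':
    # Table name and output ID are the ones that appear in the logs above.
    print(upstream_links_query('test_project.snapshot.links',
                               ['8f8b9587-237f-4995-9461-c96eac53d615']))
```

Validating the IDs keeps the sketch from interpolating arbitrary strings into SQL. A real client would more likely pass them as query parameters.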
@hannes-ucsc hannes-ucsc added the orange [process] Done by the Azul team label Jan 11, 2024
hannes-ucsc added a commit that referenced this issue Jan 11, 2024
hannes-ucsc added a commit that referenced this issue Jan 11, 2024
hannes-ucsc added a commit that referenced this issue Jan 11, 2024
@achave11-ucsc achave11-ucsc added bug [type] A defect preventing use of the system as specified test [subject] Unit and integration test code spike:8 [process] Spike estimate of eight points + [priority] High labels Jan 12, 2024
@hannes-ucsc
Member Author

I filed a PR against upstream. Let's see if it gets picked up.

@hannes-ucsc hannes-ucsc removed their assignment Jan 13, 2024
@achave11-ucsc
Member

achave11-ucsc commented Jan 16, 2024

@hannes-ucsc: "PR was not picked up immediately so we may need to go with forking upstream longer term."

@hannes-ucsc
Member Author

hannes-ucsc commented Jan 17, 2024

Someone on the PR asked for a test case.
