🐛 [firestore-bigquery-export] Error initializing raw change log table #2134
This is strange to me, as we don't use that datatype. Do you have any more info on your clustering configuration you're able to provide?
Hi @cabljac, thanks for your quick reply;
Was this schema solely generated by the extension?
I don't really remember, to be honest (this extension has been in place for over 3 years); does it matter? Clustering on JSON-type fields is not possible in BigQuery at all, so our two JSON fields (data, old_data) are not part of the defined clustered fields. The error does not make sense to me; it seems that internally the data/old_data fields are somehow used in a way that is incompatible with the JSON type. What also strikes me as odd is that the errors seem to appear randomly. If this were really related to JSON-type fields, I'd expect the error to occur for every Firestore-to-BigQuery event, but that is not the case.
Could this error be related to the fact that we updated the extension to version 0.1.51? I'm unsure when we updated it, but it must have been recently, as this version was released on June 19th.
I'll look back at the release commits and see if anything could have caused this to start happening. Thanks for providing all this info, by the way! I'll see if we can get this issue prioritised.
Wondering if you have an update on this?
A kind reminder 🙏🏾: do you have an update on a potential fix?
Apologies for chasing you once again; is there an update/ETA for this issue?
Hi @boywijnmaalen, Thank you for providing the detailed context and information about the issue. Based on the schema and table info shared in the screenshots, it appears that clustering is configured correctly on supported fields (document_name, operation, and ingestion) and does not include JSON fields like data or old_data, which aligns with BigQuery's limitations. To help us further debug this issue, I have a few follow-up questions:
Looking forward to your response. Best regards,
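For context on the limitation discussed above: BigQuery only allows clustering on a subset of column types, and JSON columns are not among them. Below is a minimal sketch of a check against the raw change log schema; the helper name `invalidClusteringFields` is illustrative, and the type list reflects BigQuery's documented clustering support but is not exhaustive.

```typescript
// Column types BigQuery accepts for clustering (illustrative list).
// JSON, FLOAT64, BYTES, ARRAY and STRUCT columns cannot be clustered on.
const CLUSTERABLE_TYPES = new Set([
  "STRING",
  "INT64",
  "NUMERIC",
  "BIGNUMERIC",
  "DATE",
  "DATETIME",
  "TIMESTAMP",
  "BOOL",
  "GEOGRAPHY",
]);

// Hypothetical helper: return the clustering fields that are either
// missing from the schema or of a non-clusterable type.
function invalidClusteringFields(
  schema: { name: string; type: string }[],
  clustering: string[]
): string[] {
  const typeByName = new Map(schema.map((f) => [f.name, f.type]));
  return clustering.filter((name) => {
    const type = typeByName.get(name);
    return type === undefined || !CLUSTERABLE_TYPES.has(type);
  });
}

// The raw change log columns from this issue: data/old_data are JSON.
const schema = [
  { name: "document_name", type: "STRING" },
  { name: "operation", type: "STRING" },
  { name: "timestamp", type: "TIMESTAMP" },
  { name: "data", type: "JSON" },
  { name: "old_data", type: "JSON" },
];

invalidClusteringFields(schema, ["document_name", "operation"]); // → []
invalidClusteringFields(schema, ["data", "document_name"]); // → ["data"]
```

In other words, a clustering setting containing `data` is exactly the kind of configuration the error message complains about.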
v0.1.56 was released ~2 weeks ago (it contains a possible fix for another issue I raised). This version has been running on test/acceptance since it was released. The last BQ error occurred Nov 16th; however, after updating the extension, the error mentioned in this thread occurred 13 times. Search query:
No.
No, the errors seem to be totally random. I cannot deduce any patterns, nor do we have an unexpected number of Firestore writes. Our application is a no-peak-traffic app and is mostly busy during business hours (7 days a week).
Yes, the structure remains unchanged
While searching for the logs you requested, I think I noticed something odd. I filtered out one of the errors and then searched for all the logs matching that error's execution ID.
I think I found the issue (circled in red). It seems I added 'data' as a clustered field. This is a mistake; I know full well 'data' is a JSON field.
We've been having this issue for a while, but it seems it only happens when we update the extension.
The document is nested, but not too deep (max 2 levels); as a whole, each document is ~2,100 characters long when converted to JSON.
Come to think of it, it seems to only occur when updating the extension.
While answering all your questions, I realised this remark was related to the issue mentioned in #2133, not this issue; apologies.
There was actually missing data when looking for it manually, but this one is also related to issue #2133.
Nothing odd from an application perspective. However, from time to time we do migrations in which we touch all documents, and during these updates no issues arise.
Already answered; it seems totally random. Tomorrow morning (CEST) I'll reconfigure the extension so the clustered fields setting no longer includes the 'data' column. I think the current value of this setting is most probably the cause of the issue we're seeing. I will report back once I've reconfigured the extension. For now, a final question: if the errors are indeed related to updating the extension, why did the error occur 13 times, and seemingly at no logical intervals (except for the last ~8 errors, which are 1h apart)? Apologies for wasting your time. (You certainly didn't waste mine; answering your questions helped me find the issue, or at least what surely seems to be the issue.) TBC
No idea. I failed to reproduce the issue locally with the information you had provided; that's why I was trying to get more info.
Let me know. I hope it solves it. ;)
Hi @boywijnmaalen, is the issue solved?
I cannot tell for sure, as these errors only occur when the extension is updated. There has not been a new release (that I am aware of) since our last communication.
This could be due to exponential backoff settings in Cloud Tasks. I believe this table is initialised via an onConfigure or onInstall Cloud Tasks queue, and perhaps there's a maximum (1h) to the backoff. I would have to investigate further to confirm.
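The backoff hypothesis above would also explain the retry pattern reported earlier (13 errors at irregular intervals, with the last ~8 exactly one hour apart). A minimal sketch of capped exponential backoff, as used by Cloud Tasks retry policies; the `minBackoff`/`maxBackoff` values here are assumptions for illustration, not the extension's actual queue configuration:

```typescript
// Sketch of exponential backoff with a cap. Cloud Tasks doubles the retry
// delay each attempt until it reaches the queue's maximum backoff.
// minBackoff/maxBackoff below are assumed values, not confirmed settings.
function backoffSeconds(
  attempt: number,
  minBackoff = 10,
  maxBackoff = 3600
): number {
  // Delay doubles each attempt until it hits the cap.
  return Math.min(minBackoff * 2 ** attempt, maxBackoff);
}

const delays = Array.from({ length: 13 }, (_, i) => backoffSeconds(i));
// Early retries come quickly, but from attempt 9 onward
// (10 * 2^9 = 5120 > 3600) every retry is capped at 3600s,
// i.e. one failed attempt per hour.
```

Under these assumed settings, a task that keeps failing would produce a burst of closely spaced errors followed by a steady one-per-hour tail, which matches the observed log pattern.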
An update on this: looking into the code, I'm not sure how this happens. Either way, I have PRs to safeguard against it. With this change, a column with an invalid type specified in the clustering param will cause the extension not to update/add clustering on the table, and to log a warning instead, similar to how the extension currently behaves when a specified clustering field does not exist in the schema.
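The safeguard described above could look roughly like the following. This is a sketch, not the actual PR: `resolveClustering` and the `warn` callback are illustrative names, and only the JSON case from this issue is checked.

```typescript
// Hypothetical sketch of the safeguard: skip clustering (with a warning)
// when a configured field is missing from the schema or has JSON type,
// instead of throwing during raw change log table initialization.
type Field = { name: string; type: string };

function resolveClustering(
  schema: Field[],
  requested: string[],
  warn: (msg: string) => void = console.warn
): { fields: string[] } | undefined {
  const types = new Map(schema.map((f) => [f.name, f.type]));
  const invalid = requested.filter(
    (name) => !types.has(name) || types.get(name) === "JSON"
  );
  if (invalid.length > 0) {
    warn(`Invalid clustering fields ignored: ${invalid.join(", ")}`);
    return undefined; // table is created/updated without clustering
  }
  return { fields: requested };
}
```

With this shape, a misconfigured clustering param (such as including `data`) degrades to "no clustering plus a warning" rather than a failed table initialization and missed syncs.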
Yes, this is exactly what we did. Turning it into a JSON object allowed us to reduce bytes read when selecting only specific properties from our objects. Basically what happened is: we installed the extension, then two years passed, and we turned the data field into a JSON object. We have since updated the extension config so it no longer includes the data column in the clustered fields setting.
That is perfect; that way it will become even clearer. Thanks for your effort and for taking the time to reply to this issue once more 👍🏾
Describe your configuration
BigQuery Dataset location: europe-****
BigQuery Project ID: ***
Collection path: ***
Enable Wildcard Column field with Parent Firestore Document IDs: true
Dataset ID: ***
Table ID: ***
BigQuery SQL table Time Partitioning option type: DAY
BigQuery Time Partitioning column name: timestamp
Firestore Document field name for BigQuery SQL Time Partitioning field option: timestamp
BigQuery SQL Time Partitioning table schema field(column) type: TIMESTAMP
BigQuery SQL table clustering: ***
Maximum number of synced documents per second: 100
Backup Collection Name: Parameter not set
Transform function URL: Parameter not set
Use new query syntax for snapshots: no
Exclude old data payloads: no
Use Collection Group query: no
Cloud KMS key name: Parameter not set
Describe the problem
In the last 30 days I can find 14 occurrences of this error:
Unhandled error Error: Error initializing raw change log table: Field data has type JSON, which is not supported for clustering.
I could find at least 1 missing update (but won't rule out more) for Firestore documents in BQ.
Fix: a manual edit of the Firestore doc in question resulted in a sync to BQ (as expected).
Stack trace:
Unhandled error Error: Error initializing raw change log table: Field data has type JSON, which is not supported for clustering.
    at FirestoreBigQueryEventHistoryTracker.initialize (/workspace/node_modules/@firebaseextensions/firestore-bigquery-change-tracker/lib/bigquery/index.js:192:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /workspace/lib/index.js:134:5
    at async /workspace/node_modules/firebase-functions/lib/common/providers/tasks.js:74:17
Steps to reproduce:
Unsure how to reproduce, as I do not fully understand the problem.
Expected result
to have every update for relevant Firestore docs synced to BQ
Actual result
missing update(s) for Firestore docs in BQ