Some pointers for Bigquery::Storage append_rows #19093
I'm not an expert in the internal protobuf design, but as far as I can tell the situation is this: one would think that there would be a way to construct the "communication" data from the "internal" data. Unfortunately, though I spent some time this morning digging into the protobuf source code, I wasn't able to find one accessible from Ruby. There might be C code in the internal C implementation of protobuf that does it, but the Ruby interface is quite well defined and does not, AFAICT, provide access to such code even if it exists. I'll try to verify internally with the protobuf team, though.

That said, one place the DescriptorProto is present is actually the DSL that is used to define a message type. That DSL (defined here) actually builds a DescriptorProto and then passes it down into the C code, which then defines the internal data structures, including the internal Descriptor object. We can grab a copy of the DescriptorProto in the middle of the DSL, like so:

```ruby
# "Declare" this local variable here, so that it's scoped globally.
# If we don't do this, Ruby will limit the variable's scope to the DSL block
# below, and we won't be able to access it later.
listen_file_descriptor_proto = nil
Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("listen.proto", :syntax => :proto2) do
    add_message "MyRow" do
      required :post_id, :int64, 1
      required :body, :string, 2
      required :timestamp, :string, 3
    end

    # Grab the FileDescriptorProto at the end of the add_file block, after the
    # messages have been populated into it. The internal build() method
    # normalizes and returns the file descriptor proto.
    listen_file_descriptor_proto = build
  end
end

# Retrieve the desired message DescriptorProto from the FileDescriptorProto.
# See https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/descriptor.proto
# for the structure of these objects. In this case, message_type is a repeated field containing
# the messages in the order they were added in the DSL.
my_row_descriptor_proto = listen_file_descriptor_proto.message_type.first
# You should be able to pass this to BigQuery storage:
schema = ::Google::Cloud::Bigquery::Storage::V1::ProtoSchema.new(
proto_descriptor: my_row_descriptor_proto
)
```

This is NOT an official Google-supported answer. It is a hack that I think will work for now, while we investigate better solutions.
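For reference, here is a rough (and equally unofficial) sketch of how the resulting schema might be wired into append_rows, assuming a MyRow message class looked up from the generated pool, a table's _default write stream, and placeholder PROJECT/DATASET/TABLE names:

```ruby
require "google/cloud/bigquery/storage/v1"

# Assumption: MyRow was defined via the DSL above, so its generated class can
# be looked up from the pool.
my_row_class =
  Google::Protobuf::DescriptorPool.generated_pool.lookup("MyRow").msgclass

client = Google::Cloud::Bigquery::Storage::V1::BigQueryWrite::Client.new

# Each entry in serialized_rows is one serialized MyRow message.
rows = Google::Cloud::Bigquery::Storage::V1::ProtoRows.new(
  serialized_rows: [
    my_row_class.encode(
      my_row_class.new(post_id: 1, body: "hello", timestamp: "2023-01-01T00:00:00Z")
    )
  ]
)

request = Google::Cloud::Bigquery::Storage::V1::AppendRowsRequest.new(
  write_stream: "projects/PROJECT/datasets/DATASET/tables/TABLE/streams/_default",
  proto_rows: Google::Cloud::Bigquery::Storage::V1::AppendRowsRequest::ProtoData.new(
    writer_schema: schema, # the ProtoSchema built above
    rows: rows
  )
)

# append_rows is a bidirectional streaming call; the generated client accepts
# an Enumerable of requests and returns an Enumerable of responses.
client.append_rows([request]).each do |response|
  puts response.inspect
end
```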
@dazuma amazing, thank you. That makes more sense, and I was able to upload via Storage with this. Would love to see a more officially-supported solution, but happy for this issue to be closed unless you want to keep it open while we wait for that.
This doesn't appear to work anymore. Do we have any other way to do this? It seems as though there's currently no way to use the Storage Write API with the Ruby library?
I've been struggling to figure out how to use append_rows in Bigquery::Storage.
The AppendRowsRequest has a proto_rows -> writer_schema -> proto_descriptor field, which I've not managed to generate properly with the Ruby SDK.
I've got a protobuf descriptor for my rows, looking something like:
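(The definition isn't reproduced here; as an illustrative sketch, assume a proto2 DSL definition with the same fields as in the snippet above:)

```ruby
require "google/protobuf"

# Illustrative assumption: MyRow defined through the Google::Protobuf DSL.
# A message class generated from a .proto file by protoc would work the same.
Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("listen.proto", :syntax => :proto2) do
    add_message "MyRow" do
      required :post_id, :int64, 1
      required :body, :string, 2
      required :timestamp, :string, 3
    end
  end
end

MyRow = Google::Protobuf::DescriptorPool.generated_pool.lookup("MyRow").msgclass
```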
and `MyRow.descriptor` returns an instance of Google::Protobuf::Descriptor. I've been trying a lot of variations on this:
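(Roughly along these lines; a reconstruction for illustration based on the error below, not the exact snippet:)

```ruby
# Reconstructed attempt: passing the runtime Descriptor object directly.
# This is what triggers the "Invalid type" error, since proto_descriptor
# expects a Google::Protobuf::DescriptorProto message rather than a Descriptor.
schema = Google::Cloud::Bigquery::Storage::V1::ProtoSchema.new(
  proto_descriptor: MyRow.descriptor
)
```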
but that raises, e.g., "Invalid type Google::Protobuf::Descriptor to assign to submessage field 'proto_descriptor'".
I confess I'm a bit fuzzy on the distinction between Descriptor and DescriptorProto. Any chance you could give a hint or two?