Provide API endpoint that returns just the schema #629
Comments
https://stackoverflow.com/questions/44386451/how-to-integrate-hive-avro-tables-with-schema-registry describes this same problem.
@ben-roling this would certainly be a valuable addition. From the design perspective, I would like to handle the response type through headers rather than a query parameter. Would you be interested in taking this forward?
@mageshn Thanks for getting back to me. I am interested in taking this forward, but if the response type is to be specified with a header, it is going to pose a bit of difficulty. The AvroSerDe in Hive doesn't support specifying headers. All that is available is to specify the avro.schema.url. If you're not familiar, have a look over the documentation. Are you willing to reconsider a query parameter or a different URI path? I would prefer not to do this in a way that requires changes to both this project and the Hive project.
There is an alternative here, which is that I could pursue a change to the AvroSerDe to make this enhancement to the schema registry unnecessary. For example, I could seek the addition of a SerDe property named avro.schema.url.response.element (or similar) where the property value identifies the element in the response that contains the schema. Then, when using the SerDe with the Confluent Schema Registry, that property could be set to "schema". Let me know if you have an opinion about which is the best route to go.
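To make the proposed SerDe property concrete, here is a minimal Python sketch of the selection logic it implies. This is not Hive code: `resolve_schema` and its `response_element` argument are hypothetical names that stand in for the suggested avro.schema.url.response.element property, and the sketch assumes the URL returns a JSON object whenever the property is set.

```python
import json
from typing import Optional


def resolve_schema(body: str, response_element: Optional[str] = None) -> str:
    """Return the schema text found in the body fetched from avro.schema.url.

    response_element is a hypothetical stand-in for the proposed
    avro.schema.url.response.element property.
    """
    if response_element is None:
        # Property unset: the whole response body is treated as the schema.
        return body
    # Property set: the body is assumed to be a JSON object, and the named
    # element holds the schema.
    return json.loads(body)[response_element]
```

With the property left unset, the body is passed through unchanged, so existing tables that point at a plain schema file would be unaffected. Setting it to "schema" would unwrap a Schema Registry response.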
@ben-roling Thanks for driving this so far. If there is an option to make a change in the SerDe in Hive, that would work great. If not, we can certainly work towards a solution without sacrificing elegance :)
Thanks @mageshn. I'll see what the Hive community thinks of the SerDe change and see where things go from there.
I wanted to revive this discussion. There wasn't much discussion with Hive around supporting the existing endpoint, and what was there didn't show much desire to support it. As @ben-roling mentioned, a header wouldn't help, as the URL provided to the Hive AvroSerDe needs to be explicit and we don't have control over the HTTP request for a schema. A query param could work, but isn't at the top of my list of solutions (although it would be easy), as it would mean multiple different return types for the REST API. I think a new endpoint would make sense in this case (e.g.
@mageshn, do you have any thoughts on a particular approach for retrieving just the schema?
Hey @mageshn, we're still interested in getting a change for this through. If you could help us figure out the best design, we'd be happy to submit a PR with the change.
I would prefer to take the route of a separate endpoint for this, with appropriate headers (even if it's the default).
is this a duplicate of issue |
@sjdurfey that should be good
@mageshn, cool. I sent the PR for this change.
Avro Hive table DDL now works with
Co-authored-by: Jan Werner <[email protected]>
I would like to have a table in the Hive Metastore that points avro.schema.url to a schema in the Schema Registry. Unfortunately, it appears I cannot do this as there is no endpoint that will return just the schema.
For example, I would like to set avro.schema.url = "http://my-confluent-schema-registry/subjects/my-dataset/versions/latest", but this doesn't work since the schema is buried down in a "schema" attribute in the response body.
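To illustrate the mismatch, here is a minimal Python sketch of what a client currently has to do to get the schema out of a version response. The sample body mirrors the shape the registry returns for a subject version (subject, version, id, and a "schema" field holding the schema as a JSON-encoded string). The record named `Example` and the helper `extract_schema` are made up for illustration.

```python
import json


def extract_schema(response_body: str) -> str:
    """Pull the Avro schema text out of a registry version response."""
    envelope = json.loads(response_body)
    # The schema is itself a JSON document stored as a string in "schema".
    return envelope["schema"]


# Illustrative response body; the record definition is an assumption.
sample_response = json.dumps({
    "subject": "my-dataset",
    "version": 1,
    "id": 1,
    "schema": json.dumps({
        "type": "record",
        "name": "Example",
        "fields": [{"name": "id", "type": "long"}],
    }),
})

raw_schema = extract_schema(sample_response)
print(raw_schema)
```

The AvroSerDe has no such unwrapping step: it treats whatever avro.schema.url returns as the schema itself, which is why the wrapped response cannot be used directly.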
I'm wondering if the Schema Registry could be enhanced to add either a request parameter or a new endpoint altogether where the response body would be purely the schema itself?
One way to do this would be as follows, where I've added a "mode=raw" parameter:
/subjects/my-dataset/versions/latest?mode=raw
Issue #436 and confluentinc/kafka-connect-hdfs#145 talk about this, but suggest writing the schema somewhere else. It seems if the schema could be referred to directly via a URI in the registry, then that would be unnecessary.
I specifically mention /versions/latest, as it could mean the Hive Metastore table automatically interprets the data using the latest schema. This seems like a nice bonus feature on top of not having to store the schema another place.