Provide API endpoint that returns just the schema #629

Closed
ben-roling opened this issue Sep 18, 2017 · 17 comments

@ben-roling

I would like to have a table in the Hive Metastore that points avro.schema.url to a schema in the Schema Registry. Unfortunately, it appears I cannot do this as there is no endpoint that will return just the schema.

For example, I would like to set avro.schema.url = "http://my-confluent-schema-registry/subjects/my-dataset/versions/latest", but this doesn't work since the schema is buried down in a "schema" attribute in the response body.
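To make the problem concrete, here is a minimal Python sketch of what a client sees at that URL. The sample body is an assumption for illustration: it follows the envelope fields the registry returns for a subject version (subject, version, id, schema), but the values and the record definition are made up.

```python
import json

# Illustrative response body for GET /subjects/my-dataset/versions/latest.
# The values are invented; only the shape of the envelope matters here.
SAMPLE_RESPONSE = json.dumps({
    "subject": "my-dataset",
    "version": 3,
    "id": 42,
    "schema": json.dumps({
        "type": "record",
        "name": "MyDataset",
        "fields": [{"name": "id", "type": "long"}],
    }),
})


def looks_like_avro_schema(text: str) -> bool:
    """Rough check: an Avro schema written as a JSON object has a "type" attribute."""
    parsed = json.loads(text)
    return isinstance(parsed, dict) and "type" in parsed


envelope = json.loads(SAMPLE_RESPONSE)
# The schema is a JSON string escaped inside the envelope, not the body itself.
embedded_schema = envelope["schema"]
```

A consumer that treats the whole body as the schema gets the envelope, which has no "type" attribute, whereas the embedded string is what it actually needs.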

I'm wondering if the Schema Registry could be enhanced to add either a request parameter or a new endpoint altogether where the response body would be purely the schema itself?

One way to do this would be the following, where I've added a "mode=raw" parameter:
/subjects/my-dataset/versions/latest?mode=raw

Issue #436 and confluentinc/kafka-connect-hdfs#145 talk about this, but suggest writing the schema somewhere else. It seems that if the schema could be referred to directly via a URI in the registry, that would be unnecessary.

I specifically mention /versions/latest, as it could mean the Hive Metastore table automatically interprets the data using the latest schema. This seems like a nice bonus feature on top of not having to store the schema in another place.

@mageshn
Member

mageshn commented Oct 6, 2017

@ben-roling this would certainly be a valuable addition. From the design perspective, I would like to handle the response type through headers rather than a query parameter. Would you be interested in taking this forward?

@ben-roling
Author

@mageshn Thanks for getting back to me. I am interested in taking this forward, but if the response type is to be specified with a header, that is going to pose a bit of difficulty. The AvroSerDe in Hive doesn't support specifying headers. All that is available is to specify the avro.schema.url. If you're not familiar with it, have a look at the documentation.

Are you willing to reconsider a query parameter or a different URI path?

I would prefer not to do this in a way that requires changes to both this project and the Hive project.

@ben-roling
Author

There is an alternative here, which is that I could pursue a change to the AvroSerDe to make this enhancement to the schema registry unnecessary.

For example, I could seek the addition of a SerDe property named avro.schema.url.response.element (or similar) where the property value identifies the element in the response that contains the schema. Then, when using the SerDe with the confluent schema registry, that property could be set to "schema".
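As a rough illustration of that idea, the Python sketch below shows the lookup such a property would imply. It is not Hive code: the property name is the hypothetical one proposed above, and the function name and the dictionary of table properties are assumptions made for this example.

```python
import json
from typing import Mapping

# Hypothetical property name from the proposal above; Hive does not define it.
RESPONSE_ELEMENT_PROP = "avro.schema.url.response.element"


def schema_from_response(body: str, table_props: Mapping[str, str]) -> str:
    """Return the schema text, optionally unwrapping one named JSON element."""
    element = table_props.get(RESPONSE_ELEMENT_PROP)
    if element is None:
        # Property not set: use the response body as-is, which is the
        # behaviour described earlier in this thread.
        return body
    envelope = json.loads(body)
    if element not in envelope:
        raise KeyError(f"response has no {element!r} element")
    return envelope[element]
```

With the property set to "schema", a registry response would be unwrapped to the embedded schema string; with it unset, existing tables that point at a plain schema file would behave as before.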

Let me know if you have an opinion about which is the best route to go.

@mageshn
Member

mageshn commented Oct 9, 2017

@ben-roling Thanks for driving this so far. If there is an option to make a change in the SerDe in Hive, that would work great. If not, we can certainly work towards a solution without sacrificing elegance :)

@ben-roling
Author

Thanks @mageshn . I'll see what the Hive community thinks of the SerDe change and see where things go from there.

@sjdurfey
Contributor

I wanted to revive this discussion. There wasn't much discussion with the Hive community about supporting the existing endpoint, and what there was didn't show much desire to support it.

As @ben-roling mentioned, a header wouldn't help: the URL provided to the Hive AvroSerDe needs to be explicit, and we don't have control over the HTTP request for a schema. A query param could work and would be easy, but it isn't at the top of my list of solutions, as it would mean multiple different return types for the REST API. I think a new endpoint would make sense in this case (e.g. /subjects/<subject>/versions/latest/schema). But I would be happy to implement either if there is a strong consensus one way or the other.

@sjdurfey
Contributor

@mageshn, do you have any thoughts on a particular approach for retrieving just the schema?

@ben-roling
Author

Hey @mageshn we're still interested in getting a change for this through. If you could help us figure out the best design we'd be happy to submit a PR with the change.

@mageshn
Member

mageshn commented Dec 6, 2017

I would prefer to take the route of a separate endpoint for this, with appropriate headers (even if it's the default).

@dadleyy

dadleyy commented Dec 7, 2017

is this a duplicate of issue 381: Endpoint with schema’s raw JSON?

@sjdurfey
Contributor

sjdurfey commented Dec 7, 2017

@dadleyy, yeah, this does appear to be a duplicate of #381

@mageshn as far as headers are concerned, is there something you had in mind? I was just envisioning an endpoint like /subjects/<subject>/versions/latest/schema that returned the schema as a string, and didn't do anything else.
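From a client's point of view, that endpoint could be used roughly as in the Python sketch below. The helper names are assumptions made for this example, and the fetch function only works against a reachable registry that actually serves the path.

```python
from urllib.parse import quote
from urllib.request import urlopen


def latest_schema_url(registry_base: str, subject: str) -> str:
    """Build the URL of the schema-only endpoint for a subject's latest version."""
    base = registry_base.rstrip("/")
    # Percent-encode the subject so unusual characters cannot break the path.
    return f"{base}/subjects/{quote(subject, safe='')}/versions/latest/schema"


def fetch_latest_schema(registry_base: str, subject: str) -> str:
    """Fetch the schema text; the body is the schema itself, with no envelope."""
    with urlopen(latest_schema_url(registry_base, subject)) as response:
        return response.read().decode("utf-8")
```

Because the body is just the schema, the same URL can be handed to any consumer that only accepts a plain schema location, which is the situation with avro.schema.url.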

@mageshn
Member

mageshn commented Dec 7, 2017

@sjdurfey that should be good

@sjdurfey
Contributor

sjdurfey commented Dec 8, 2017

@mageshn, cool. I sent the PR for this change

@OneCricketeer
Contributor

@mageshn Will this be in 4.1? And can this issue and #381 be closed?

I'm tempted to open another issue for a query param of ?pretty, though.

@ghost

ghost commented Mar 20, 2020

Avro Hive table DDL now works with the /schema endpoint!

CREATE EXTERNAL TABLE foo
STORED AS AVRO
LOCATION
  'foo_location'
TBLPROPERTIES (
  'avro.schema.url'='http://schema_registry_url/subjects/foo-value/versions/latest/schema');

baumac pushed a commit to baumac/schema-registry that referenced this issue Oct 23, 2024