[SchemaRegistry] fix avro type leaks in serializer #21004
/azp run python - schemaregistry - tests
Azure Pipelines successfully started running 1 pipeline(s).
        if self._auto_register_schemas
        else self._schema_registry_client.get_schema_id
    )
    self._user_input_schema_cache = {}
On the cache: do we still need this cache now that we use lru_cache?
As discussed offline (OOB), it stores schema -> normalized schema pairs.
I think we could either remove this cache first or wrap the logic in an lru_cache-decorated method.
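A minimal sketch of the second option, wrapping the schema -> normalized schema step in an lru_cache-decorated method instead of keeping a hand-rolled dict. The class name, the `_normalize_schema` method, and the JSON round-trip used as "normalization" are all hypothetical stand-ins, not the serializer's actual code:

```python
import json
from functools import lru_cache


class SchemaCacheSketch:
    """Hypothetical stand-in for the serializer; not the library's class."""

    def __init__(self):
        # Counts real normalizations so cache hits are visible.
        self.parse_calls = 0

    @lru_cache(maxsize=128)
    def _normalize_schema(self, schema_str):
        # Stand-in for "parse, then normalize": a JSON round-trip with
        # sorted keys and compact separators.
        self.parse_calls += 1
        parsed = json.loads(schema_str)
        return json.dumps(parsed, sort_keys=True, separators=(",", ":"))


serializer = SchemaCacheSketch()
raw_schema = '{"type": "record", "name": "User", "fields": []}'
first = serializer._normalize_schema(raw_schema)
second = serializer._normalize_schema(raw_schema)  # served from the cache
```

One thing to weigh with this approach: lru_cache on a method includes `self` in the cache key and holds a reference to the instance until the entry is evicted.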
...maregistry-avroserializer/azure/schemaregistry/serializer/avroserializer/_avro_serializer.py
    :ivar message: The error message.
    :vartype message: str
    :ivar error: The error condition, if available.
    :vartype error: str
Why have an "error" ivar here? AzureError already has an inner_exception ivar.
    def __init__(self, message, **kwargs):
        self.message = message
        self.error = kwargs.get("error")
        super(SchemaParseException, self).__init__(self.message, **kwargs)
I think we could remove the __init__ method completely if there's no reason for the error ivar to differ from inner_exception.
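A rough sketch of what dropping __init__ could look like. To keep it runnable without azure-core installed, `AzureErrorStandIn` is a hypothetical stand-in for `azure.core.exceptions.AzureError`; it assumes the base class keeps `message` and stores the `error` keyword as `inner_exception`, as the comment above implies. `parse_or_raise` is likewise only an illustrative helper:

```python
import json


class AzureErrorStandIn(Exception):
    """Hypothetical stand-in for azure.core.exceptions.AzureError.

    Assumed behavior: keeps `message` and stores the `error` keyword
    argument as `inner_exception`.
    """

    def __init__(self, message, *args, **kwargs):
        self.inner_exception = kwargs.get("error")
        self.message = str(message)
        super().__init__(self.message, *args)


class SchemaParseException(AzureErrorStandIn):
    """Raised when a schema cannot be parsed. No __init__ override needed."""


def parse_or_raise(schema_str):
    # Illustrative helper: surface the original failure via `error=`.
    try:
        return json.loads(schema_str)
    except ValueError as exc:
        raise SchemaParseException(
            "Cannot parse schema: {}".format(schema_str), error=exc
        )
```

Callers would then read `exc.message` and `exc.inner_exception` from the base class, with no separate `error` attribute to document.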
# --------------------------------------------------------------------------
from azure.core.exceptions import AzureError


class SchemaParseException(AzureError):
The name sounds very generic.
Do we want this error to be library-specific, or a general one that could be shared among different serializer libraries in the future? The latter means placing it under the schema registry client library, e.g. azure.schemaregistry.serializer.exceptions.SchemaParseException.
I'm leaning towards a common shared error, so there are fewer types.
                cached_schema.type
            )
        )
    data_bytes = self._avro_serializer.serialize(value, cached_schema)
To be safe here, we could abstract the underlying Avro serializer library behind wrappers that translate its errors:
- parse schema (_fast_avro_serializer.parse_schema, normal_avro_serializer.parse_schema):
    try:
        _fast_avro_serializer.parse_schema()
    except Exception:
        raise SchemaParseException()
- serialize:
    try:
        _fast_avro_serializer.serialize()
    except Exception:
        raise SerializationException()
- deserialize: same pattern, raising DeserializationException
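The three bullets above could be sketched as one thin wrapper. Everything here is hypothetical: the exception classes are plain stand-ins (the real ones would presumably derive from AzureError), `AvroBackendWrapper` is an invented name, and `FakeBackend` only mimics a backend that leaks arbitrary exception types:

```python
class SchemaParseException(Exception):
    """Stand-in; hypothetical for this sketch."""


class SerializationException(Exception):
    """Stand-in; hypothetical for this sketch."""


class DeserializationException(Exception):
    """Stand-in; hypothetical for this sketch."""


class AvroBackendWrapper:
    """Translates whatever the backend raises into library-level errors."""

    def __init__(self, backend):
        self._backend = backend

    def parse_schema(self, schema):
        try:
            return self._backend.parse_schema(schema)
        except Exception as exc:
            raise SchemaParseException("Cannot parse schema.") from exc

    def serialize(self, value, schema):
        try:
            return self._backend.serialize(value, schema)
        except Exception as exc:
            raise SerializationException("Cannot serialize value.") from exc

    def deserialize(self, data, schema):
        try:
            return self._backend.deserialize(data, schema)
        except Exception as exc:
            raise DeserializationException("Cannot deserialize data.") from exc


class FakeBackend:
    """Hypothetical backend that leaks KeyError, AttributeError, etc."""

    def parse_schema(self, schema):
        return {"name": schema["name"]}  # KeyError if "name" is missing

    def serialize(self, value, schema):
        return value.encode("utf-8")  # AttributeError for non-str values

    def deserialize(self, data, schema):
        return data.decode("utf-8")  # UnicodeDecodeError on invalid bytes


wrapper = AvroBackendWrapper(FakeBackend())
schema = wrapper.parse_schema({"name": "User"})
payload = wrapper.serialize("hello", schema)
```

Chaining with `from exc` keeps the backend's original exception reachable through `__cause__`, so nothing is lost for debugging while callers only need to catch the three library-level types.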
We could make a matrix for comparison:
                      fastavro            avro
  missing property:   raises KeyError     raises SchemaParseException
  wrong format:       ...
But I'm afraid it would take a lot of time to go through all the cases (especially corner cases, which are hard to think of), and it may not be worth the effort.
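If the matrix were ever built, a small harness could fill it in by feeding the same bad inputs to each parser and recording the exception type. This is only a sketch: `build_error_matrix` is an invented helper, and the two parsers are toy stand-ins, not fastavro or avro:

```python
def build_error_matrix(parsers, cases):
    """Map each case name to {parser name: exception type name or "ok"}."""
    matrix = {}
    for case_name, schema in cases.items():
        row = {}
        for parser_name, parse in parsers.items():
            try:
                parse(schema)
                row[parser_name] = "ok"
            except Exception as exc:
                row[parser_name] = type(exc).__name__
        matrix[case_name] = row
    return matrix


def toy_parser_a(schema):
    # Toy stand-in: leaks KeyError when a property is missing.
    return schema["name"]


def toy_parser_b(schema):
    # Toy stand-in: reports the same problem as ValueError.
    if "name" not in schema:
        raise ValueError("missing name")
    return schema["name"]


matrix = build_error_matrix(
    {"a": toy_parser_a, "b": toy_parser_b},
    {"missing property": {}, "valid": {"name": "User"}},
)
```

The hard part the comment points at remains: someone still has to come up with the list of cases.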
fixes: #20818