-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk metadata ARXIVNG-332 #155
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! A couple of minor things in the docstring, but otherwise good to go (pending passing tests)
|
||
Parameters | ||
---------- | ||
arxiv_id : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the docstring could use an update
def bulk_retrieve(self, document_ids: List[str]) -> Dict[str, DocMeta]: | ||
""" | ||
Retrieve metadata for an arXiv paper. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one too :-)
) from e | ||
|
||
if response.status_code not in \ | ||
[status.HTTP_200_OK, status.HTTP_206_PARTIAL_CONTENT]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking out loud: what would actually happen if we got a 206? Would we risk not being able to deserialize the content?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case it would be handled below, I suppose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also thinking out loud here: we should consider adding a parameter to cap the number of document_ids
per request to the bulk_docmeta endpoint, similar to what id being done with bulk indexing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
No description provided.