feat: add support for arxiv identifier in ArxivAPIWrapper() #9318
ArxivAPIWrapper
lgtm, cc @leo-gan
I've tried I don't think we need special treatment for multiple paper IDs because it works right now.
Hi, if the version number is specified in the query (e.g. 2212.00794v2), no results are returned. So I think the arxiv identifier needs to be handled separately.
OK. Then please add unit tests that work with IDs.
Hi @leo-gan, I've committed new unit tests. You can check that
Thank you!
LGTM
@@ -54,6 +58,14 @@ class ArxivAPIWrapper(BaseModel):
    load_all_available_meta: bool = False
    doc_content_chars_max: Optional[int] = 4000

    def is_arxiv_identifier(self, query: str) -> bool:
Could we add a few simple unit tests for this method?
Sure, I will add them soon.
@LMC117 Looks great! LGTM
thanks @LMC117!
Hi @LMC117, could you please resolve the merge conflicts? After that, ping me and I'll push this PR for review. Thanks!
@leo-gan Hi, I resolved the merging issue.
).results()
if self.is_arxiv_identifier(query):
    results = self.arxiv_search(
        id_list=query[: self.ARXIV_MAX_QUERY_LENGTH].split(),
I think we don't want to use self.ARXIV_MAX_QUERY_LENGTH here, right?
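To illustrate the concern, here is a small sketch of what slicing before splitting can do to a list of identifiers. The limit of 20 is a hypothetical value chosen only to make the effect visible; it is not the wrapper's actual setting.

```python
# Hypothetical small limit, for illustration only (not the real constant).
MAX_QUERY_LENGTH = 20

query = "2212.00794v2 1605.08386v1"

# Slicing first can cut the last identifier in the middle.
truncated_ids = query[:MAX_QUERY_LENGTH].split()

# Splitting the full string keeps every identifier intact.
full_ids = query.split()

print(truncated_ids)
print(full_ids)
```

With the slice applied, the second identifier is no longer a valid arxiv ID, so an id-based lookup would receive a malformed entry.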
run() and load() functions in arxiv.py, using regex to recognize whether the query is in the form of an arxiv identifier (see https://info.arxiv.org/help/find/index.html). If so, it will directly search for the paper corresponding to the arxiv identifier. I also modified and added tests in test_arxiv.py.
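For readers following along, a minimal sketch of the kind of regex check described above might look like this. It is a standalone function rather than the method in the PR, and the pattern is an assumption: it covers only new-style identifiers (YYMM.NNNN or YYMM.NNNNN) with an optional version suffix, not old-style ones.

```python
import re

# Assumed pattern: two-digit year, month 01-12, a dot, 4 or 5 digits,
# and an optional version suffix such as "v2".
ARXIV_ID_PATTERN = re.compile(r"\d{2}(0[1-9]|1[0-2])\.\d{4,5}(v\d+)?")


def is_arxiv_identifier(query: str) -> bool:
    """Return True if every whitespace-separated item looks like an arxiv ID."""
    items = query.split()
    # An empty query is not an identifier; otherwise every item must match fully.
    return bool(items) and all(
        ARXIV_ID_PATTERN.fullmatch(item) is not None for item in items
    )
```

A query for which this returns True could be passed as an id list instead of a free-text search, which is how the versioned case (e.g. 2212.00794v2) mentioned in the discussion would be handled.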