Daily Papers API #2554
Here's an example:
DailyPaper(
paper=Paper(
paper_id="2409.11340",
authors=[
PaperAuthor(
author_id="66ea3b25353c1b9b84254825",
user=User(
username="Shitao",
fullname="Xiao",
avatar_url="/avatars/c0675d05a52192ee14e9ab1633353956.svg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
name="Shitao Xiao",
status="claimed_verified",
status_changed_at=datetime.datetime(
2024, 9, 18, 7, 1, 29, 215000, tzinfo=datetime.timezone.utc
),
hidden=False,
),
PaperAuthor(
author_id="66ea3b25353c1b9b84254826",
user=None,
name="Yueze Wang",
status="",
status_changed_at=None,
hidden=False,
),
PaperAuthor(
author_id="66ea3b25353c1b9b84254827",
user=User(
username="JUNJIE99",
fullname="JUNJIE ZHOU",
avatar_url="/avatars/42f09356a1282896573ccb44830cd327.svg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
name="Junjie Zhou",
status="claimed_verified",
status_changed_at=datetime.datetime(
2024, 9, 18, 7, 1, 31, 41000, tzinfo=datetime.timezone.utc
),
hidden=False,
),
PaperAuthor(
author_id="66ea3b25353c1b9b84254828",
user=User(
username="avery00",
fullname="huaying Yuan",
avatar_url="/avatars/2537cee66afecc2d999e05b01c78d319.svg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
name="Huaying Yuan",
status="admin_assigned",
status_changed_at=datetime.datetime(
2024, 9, 18, 7, 12, 24, 40000, tzinfo=datetime.timezone.utc
),
hidden=False,
),
PaperAuthor(
author_id="66ea3b25353c1b9b84254829",
user=None,
name="Xingrun Xing",
status="",
status_changed_at=None,
hidden=False,
),
PaperAuthor(
author_id="66ea3b25353c1b9b8425482a",
user=User(
username="Ruiran",
fullname="Ruiran Yan",
avatar_url="/avatars/26aef5944759c2e4366a71eb8c7fc50a.svg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
name="Ruiran Yan",
status="admin_assigned",
status_changed_at=datetime.datetime(
2024, 9, 18, 7, 12, 36, 909000, tzinfo=datetime.timezone.utc
),
hidden=False,
),
PaperAuthor(
author_id="66ea3b25353c1b9b8425482b",
user=User(
username="stingw",
fullname="Shu-Ting Wang",
avatar_url="/avatars/3486af06cc2c1562e09b04bb03360912.svg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
name="Shuting Wang",
status="admin_assigned",
status_changed_at=datetime.datetime(
2024, 9, 18, 7, 12, 43, 24000, tzinfo=datetime.timezone.utc
),
hidden=False,
),
PaperAuthor(
author_id="66ea3b25353c1b9b8425482c",
user=None,
name="Tiejun Huang",
status="",
status_changed_at=None,
hidden=False,
),
PaperAuthor(
author_id="66ea3b25353c1b9b8425482d",
user=User(
username="zl101",
fullname="zhengliu",
avatar_url="/avatars/ef13dc7ce243819bc0da9b04e778b432.svg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
name="Zheng Liu",
status="extracted_pending",
status_changed_at=datetime.datetime(
2024, 9, 18, 2, 30, 1, 852000, tzinfo=datetime.timezone.utc
),
hidden=False,
),
],
published_at=datetime.datetime(
2024, 9, 17, 16, 42, 46, tzinfo=datetime.timezone.utc
),
title="OmniGen: Unified Image Generation",
summary="In this work, we introduce OmniGen, a new diffusion model for unified image\ngeneration. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen\nno longer requires additional modules such as ControlNet or IP-Adapter to\nprocess diverse control conditions. OmniGenis characterized by the following\nfeatures: 1) Unification: OmniGen not only demonstrates text-to-image\ngeneration capabilities but also inherently supports other downstream tasks,\nsuch as image editing, subject-driven generation, and visual-conditional\ngeneration. Additionally, OmniGen can handle classical computer vision tasks by\ntransforming them into image generation tasks, such as edge detection and human\npose recognition. 2) Simplicity: The architecture of OmniGen is highly\nsimplified, eliminating the need for additional text encoders. Moreover, it is\nmore user-friendly compared to existing diffusion models, enabling complex\ntasks to be accomplished through instructions without the need for extra\npreprocessing steps (e.g., human pose estimation), thereby significantly\nsimplifying the workflow of image generation. 3) Knowledge Transfer: Through\nlearning in a unified format, OmniGen effectively transfers knowledge across\ndifferent tasks, manages unseen tasks and domains, and exhibits novel\ncapabilities. We also explore the model's reasoning capabilities and potential\napplications of chain-of-thought mechanism. This work represents the first\nattempt at a general-purpose image generation model, and there remain several\nunresolved issues. We will open-source the related resources at\nhttps://github.com/VectorSpaceLab/OmniGen to foster advancements in this field.",
upvotes=38,
discussion_id="66ea3b29353c1b9b842549ac",
),
published_at=datetime.datetime(
2024, 9, 18, 1, 0, 6, 728000, tzinfo=datetime.timezone.utc
),
title="OmniGen: Unified Image Generation",
thumbnail="https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2409.11340.png",
comments=3,
submitted_by=User(
username="",
fullname="AK",
avatar_url="https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
)

Although not returned by the API, we could add a link to the arXiv page and a PDF link. Also, the API doesn't appear to allow retrieval of the paper's discussion.
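The arXiv links mentioned here could be derived client-side from `paper_id`. This is a minimal sketch, assuming `paper_id` is always an arXiv identifier and that the standard `arxiv.org/abs/` and `arxiv.org/pdf/` URL scheme applies. The helper name `arxiv_links` is hypothetical and not part of this PR.

```python
def arxiv_links(paper_id: str) -> dict:
    """Build arXiv abstract and PDF URLs from a paper ID.

    Hypothetical helper: assumes paper_id is an arXiv identifier
    such as "2409.11340".
    """
    paper_id = paper_id.strip()
    if not paper_id:
        raise ValueError("paper_id must be a non-empty string")
    return {
        "abs": f"https://arxiv.org/abs/{paper_id}",
        "pdf": f"https://arxiv.org/pdf/{paper_id}",
    }


links = arxiv_links("2409.11340")
print(links["abs"])
print(links["pdf"])
```

Keeping this outside the dataclasses avoids adding fields that the server does not return.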
Example:
[
PaperSearchInfo(
paper_id="2409.07146",
title="Gated Slot Attention for Efficient Linear-Time Sequence Modeling",
thumbnail="https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2409.07146.png",
source="hf",
),
PaperSearchInfo(
paper_id="2409.03752",
title="Attention Heads of Large Language Models: A Survey",
thumbnail="https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2409.03752.png",
source="hf",
),
...
]

Example:
DailyPaper(
paper=Paper(
paper_id="2409.11074",
authors=[
PaperAuthor(
author_id="66ead57361228b02f8144cdf",
user=None,
name="Adrian Cosma",
status="",
status_changed_at=None,
hidden=False,
),
PaperAuthor(
author_id="66ead57361228b02f8144ce0",
user=None,
name="Ana-Maria Bucur",
status="",
status_changed_at=None,
hidden=False,
),
PaperAuthor(
author_id="66ead57361228b02f8144ce1",
user=None,
name="Emilian Radoi",
status="",
status_changed_at=None,
hidden=False,
),
],
published_at=datetime.datetime(
2024, 9, 17, 11, 3, 46, tzinfo=datetime.timezone.utc
),
title="RoMath: A Mathematical Reasoning Benchmark in Romanian",
summary="Mathematics has long been conveyed through natural language, primarily for\nhuman understanding. With the rise of mechanized mathematics and proof\nassistants, there is a growing need to understand informal mathematical text,\nyet most existing benchmarks focus solely on English, overlooking other\nlanguages. This paper introduces RoMath, a Romanian mathematical reasoning\nbenchmark suite comprising three datasets: RoMath-Baccalaureate,\nRoMath-Competitions and RoMath-Synthetic, which cover a range of mathematical\ndomains and difficulty levels, aiming to improve non-English language models\nand promote multilingual AI development. By focusing on Romanian, a\nlow-resource language with unique linguistic features, RoMath addresses the\nlimitations of Anglo-centric models and emphasizes the need for dedicated\nresources beyond simple automatic translation. We benchmark several open-weight\nlanguage models, highlighting the importance of creating resources for\nunderrepresented languages. We make the code and dataset available.",
upvotes=1,
discussion_id="66ead57461228b02f8144d31",
),
published_at=datetime.datetime(
2024, 9, 19, 17, 17, 31, 279000, tzinfo=datetime.timezone.utc
),
title="RoMath: A Mathematical Reasoning Benchmark in Romanian",
thumbnail="",
comments=0,
submitted_by=User(
username="IAMJB",
fullname="JB D.",
avatar_url="/avatars/1208629f14f010dbc2cd94f3c30f9baf.svg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
)

Example:
DailyPaper(
paper=Paper(
paper_id="2409.11074",
authors=[
PaperAuthor(
author_id="66ead57361228b02f8144cdf",
user=None,
name="Adrian Cosma",
status="",
status_changed_at=None,
hidden=False,
),
PaperAuthor(
author_id="66ead57361228b02f8144ce0",
user=None,
name="Ana-Maria Bucur",
status="",
status_changed_at=None,
hidden=False,
),
PaperAuthor(
author_id="66ead57361228b02f8144ce1",
user=None,
name="Emilian Radoi",
status="",
status_changed_at=None,
hidden=False,
),
],
published_at=datetime.datetime(
2024, 9, 17, 11, 3, 46, tzinfo=datetime.timezone.utc
),
title="RoMath: A Mathematical Reasoning Benchmark in Romanian",
summary="Mathematics has long been conveyed through natural language, primarily for\nhuman understanding. With the rise of mechanized mathematics and proof\nassistants, there is a growing need to understand informal mathematical text,\nyet most existing benchmarks focus solely on English, overlooking other\nlanguages. This paper introduces RoMath, a Romanian mathematical reasoning\nbenchmark suite comprising three datasets: RoMath-Baccalaureate,\nRoMath-Competitions and RoMath-Synthetic, which cover a range of mathematical\ndomains and difficulty levels, aiming to improve non-English language models\nand promote multilingual AI development. By focusing on Romanian, a\nlow-resource language with unique linguistic features, RoMath addresses the\nlimitations of Anglo-centric models and emphasizes the need for dedicated\nresources beyond simple automatic translation. We benchmark several open-weight\nlanguage models, highlighting the importance of creating resources for\nunderrepresented languages. We make the code and dataset available.",
upvotes=1,
discussion_id="66ead57461228b02f8144d31",
),
published_at=datetime.datetime(
2024, 9, 19, 17, 17, 31, 279000, tzinfo=datetime.timezone.utc
),
title="RoMath: A Mathematical Reasoning Benchmark in Romanian",
thumbnail="https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2409.11074.png",
comments=0,
submitted_by=User(
username="IAMJB",
fullname="JB D.",
avatar_url="/avatars/1208629f14f010dbc2cd94f3c30f9baf.svg",
details=None,
is_following=None,
is_pro=False,
num_models=None,
num_datasets=None,
num_spaces=None,
num_discussions=None,
num_papers=None,
num_upvotes=None,
num_likes=None,
num_following=None,
num_followers=None,
orgs=[],
),
)
Hi @hlky, thanks a lot for working on this 🤗 I left a couple of comments to keep the design of the API minimal and consistent.
Thanks for the review. I've made the required changes.
Hi @hlky! thanks for this iteration! Apart from the comments below, I think we are almost there :)
Thanks again. I've made the requested changes.
Thanks @hlky for this iteration! Sorry for this back and forth review 😄
Looks like the CI endpoint has no papers so the tests are failing.
@hlky I pushed a fix where |
Hi there! Sorry been late on the feedback / review. I've checked the API and discussions above and I think we should settle on supporting only `/api/papers/search`, which supports only the `"q"` parameter, and drop support for `/api/daily_papers`. If we want to be able to search by date in the future, we will update the backend. The server-side API is not consistent (yet) so let's start small client-side and expand once the API has evolved. Sorry if that changes (again) the spec of this PR 🙈 Please see the details below.
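As a rough illustration of the reduced scope, the request for `/api/papers/search` with its single `"q"` parameter could be built as below. This is a sketch only: the base URL `https://huggingface.co` and the helper name `build_paper_search_url` are assumptions, not taken from the PR.

```python
from urllib.parse import urlencode

# Assumption: the Hub API is served from this base URL.
HUB_ENDPOINT = "https://huggingface.co"


def build_paper_search_url(query: str, endpoint: str = HUB_ENDPOINT) -> str:
    """Return the URL for /api/papers/search with the only supported parameter, "q".

    Hypothetical helper for illustration; it builds the URL but sends no request.
    """
    params = urlencode({"q": query})  # percent-encodes the query, spaces become "+"
    return f"{endpoint.rstrip('/')}/api/papers/search?{params}"


print(build_paper_search_url("gated slot attention"))
# https://huggingface.co/api/papers/search?q=gated+slot+attention
```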
list_papers = api.list_papers
paper_info = api.paper_info
These two must be added at the root of the `huggingface_hub` package. To do so, you need to add them to this list and run `make style`, which will make sure alphabetical order is respected and add a type checking annotation. You can then commit the changes.
Co-authored-by: Celina Hanouti <[email protected]>
Co-authored-by: Lucain <[email protected]>
Thanks for the review. I've made the requested changes.
Looks good! Thanks for adding this 🤗
Fixes #2553

This PR introduces `list_papers` using the Daily Papers API, `search_papers` using the `papers/search` endpoint and `get_paper` using the `papers/{paper_id}` endpoint.

We add the `DailyPaper` dataclass, containing `Paper` and associated metadata; the `Paper` dataclass, containing metadata about the paper itself; and the `PaperAuthor` dataclass, containing metadata about the paper's author. `PaperAuthor`'s `user` and `DailyPaper`'s `submitted_by` use the existing `User` dataclass, although these contain fewer fields than `User` itself so could have their own dataclasses.

We add `list_papers` to `HfApi`, which accepts `date` as `str`. `YYYY-MM-DD` is the expected format; this could also accept a datetime as a parameter. The endpoint itself also accepts a full datetime in the format `%Y-%m-%dT%H:%M:%S.%fZ`. Invalid dates will return `HTTP 400`.

We add the `PaperSearchInfo` dataclass, containing minimal metadata, returned by `search_papers`.

We add `search_papers` to `HfApi`, which accepts `query` as `str`; this can be a text query or an arXiv paper ID.

We add `get_paper` to `HfApi`, which accepts either `paper_id` as `str` or a `PaperSearchInfo` object with `paper_search`. Due to slight differences between the data returned from `papers/{paper_id}` and the Daily Papers endpoint, we add a static method `from_get_paper` to `DailyPaper`. Some fields are unavailable from `papers/{paper_id}`, namely `thumbnail` and `numComments`; when providing a `PaperSearchInfo` we copy `thumbnail` into the `DailyPaper` object.

We add tests `test_papers_by_date`, `test_search_papers`, `test_get_paper_by_id`, `test_get_paper_by_paper_search_info` under `DailyPaperApiTest`.
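Since invalid dates come back as `HTTP 400`, the `date` argument could be checked before the request is sent. The sketch below assumes the two formats named in the description (`YYYY-MM-DD` and `%Y-%m-%dT%H:%M:%S.%fZ`). The helper name `normalize_date` is hypothetical and does not appear in the PR.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"
FULL_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def normalize_date(value) -> str:
    """Return `value` as a YYYY-MM-DD string, or raise ValueError.

    Hypothetical helper: accepts a datetime, a YYYY-MM-DD string,
    or a full datetime string in FULL_FORMAT.
    """
    if isinstance(value, datetime):
        return value.strftime(DATE_FORMAT)
    for fmt in (DATE_FORMAT, FULL_FORMAT):
        try:
            return datetime.strptime(value, fmt).strftime(DATE_FORMAT)
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"Invalid date: {value!r}")


print(normalize_date("2024-09-18"))
print(normalize_date("2024-09-18T01:00:06.728000Z"))
print(normalize_date(datetime(2024, 9, 18, 1, 0, 6)))
```

Rejecting a malformed date locally gives the caller a `ValueError` with the offending value instead of a generic HTTP error.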