Display machine learning demos from Replicate.com #257

Merged
merged 21 commits into from
Mar 3, 2022

zeke
Contributor

@zeke zeke commented Feb 2, 2022

This pull request introduces a new arXiv Labs integration that displays links to interactive machine learning demos from @replicate.

Implementation notes

  • I used the existing PapersWithCode integration as a reference implementation.
  • There's a new "Demos" tab between "Code & Data" and "Related Papers". (I believe this was a suggestion from the arXiv team during early design discussions?)
  • Favors well-supported browser APIs (doesn't use any jQuery)
  • The generated HTML is sanitized to prevent XSS
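
As an illustration of the sanitization point above, here is a minimal sketch of escaping untrusted API fields before building HTML. It is not the code from this pull request: the function names and the `url` and `title` fields are assumptions made for the example.

```javascript
// Hypothetical helper: escape the five characters that are significant in
// HTML so untrusted values render as text rather than markup.
function escapeHtml(value) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(value).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// Hypothetical helper: accept only https links, since escaping alone does
// not stop a `javascript:` URL from being placed in an href.
function isSafeUrl(value) {
  try {
    return new URL(value).protocol === "https:";
  } catch (err) {
    return false; // not a parseable absolute URL
  }
}

// Hypothetical renderer: `url` and `title` are assumed field names.
function renderDemoLink(demo) {
  if (!isSafeUrl(demo.url)) return "";
  return `<a href="${escapeHtml(demo.url)}">${escapeHtml(demo.title)}</a>`;
}
```

Escaping at the point where the string is assembled keeps the rule simple: every value that came from the network passes through `escapeHtml` before it touches markup.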

Screenshots

Demo exists:

Screen Shot 2022-02-03 at 9 45 02 PM


No demo exists:
Screen Shot 2022-02-03 at 9 45 45 PM


Integration disabled:
Screen Shot 2022-02-03 at 9 45 57 PM

Running the development environment

docker build . -t arxiv/browse
docker run -it --publish 8000:8000 arxiv/browse

Viewing the integration in development

The arxiv/browse dev environment includes a small subset of articles, but they all appear to be fairly old, and none of the papers for which Replicate models exist are present. To work around this, I made it possible to add a custom query param to the URL that overrides the arXiv paper ID.
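
A rough sketch of how such an override could be resolved in the browser is shown below. The parameter name `override_paper_id` is taken from the beta URL that appears later in this thread; the function name and the assumption that abstract pages live at `/abs/<paper id>` are illustrative only.

```javascript
// Hypothetical helper: work out which arXiv paper ID to look up for a page.
function resolvePaperId(pageUrl) {
  const url = new URL(pageUrl);

  // The query param wins when it is present and non-empty.
  const override = url.searchParams.get("override_paper_id");
  if (override) return override;

  // Assumption: abstract pages live at /abs/<paper id>, optionally followed
  // by a version suffix such as "v2".
  const match = url.pathname.match(/^\/abs\/(.+?)(v\d+)?\/?$/);
  return match ? match[1] : null;
}
```

For example, `resolvePaperId("https://beta.arxiv.org/abs/2007.14268?override_paper_id=2103.17249")` yields the overriding ID, while the same URL without the query string falls back to the ID in the path.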

To see how demos are rendered, try these example URLs:

To see what the "no models exist" state looks like, visit any page (without setting the query param):

@CLAassistant

CLAassistant commented Feb 2, 2022

CLA assistant check
All committers have signed the CLA.

@bfirsh

bfirsh commented Feb 4, 2022

Taken a quick look at this and looks good to me. 👌

@zeke zeke marked this pull request as ready for review February 4, 2022 22:26
@zeke
Contributor Author

zeke commented Feb 4, 2022

Hey @mhl10 and @SBBCornell, this is ready for a first look!

@zeke zeke changed the title Display machine learning demos from Replicate.com [WIP] Display machine learning demos from Replicate.com Feb 4, 2022
@mhl10
Contributor

mhl10 commented Feb 8, 2022

Hi @zeke, we can include papers with known demos (e.g. 2103.17249, 2101.04061, 2108.10257 from above) in the test data if that would be helpful.

Previously we discussed the possibility of limiting display to papers within specific subject areas (like cs.AI or cs.ML). I think this is something to consider since there will be almost no demos outside the CS domain. Another option is to limit display to papers from the last 5-10 years. Let me know what you think.

@zeke
Contributor Author

zeke commented Feb 8, 2022

Thanks for the feedback @mhl10!

we can include papers with known demos (e.g. 2103.17249, 2101.04061, 2108.10257 from above) in the test data

That would be helpful. Is this something I can do on my own?

limiting display to papers within specific subject areas (like cs.AI or cs.ML)

Yep, that makes sense. Spelunking through the code history I noticed that the Papers With Code integration was originally scoped to certain subject areas, but was subsequently yanked out in #199. I can use the code removed there as a basis for limiting the Replicate integration to certain contexts. Does that sound like a good approach?

@mhl10
Contributor

mhl10 commented Feb 8, 2022

Thanks for the feedback @mhl10!

we can include papers with known demos (e.g. 2103.17249, 2101.04061, 2108.10257 from above) in the test data

That would be helpful. Is this something I can do on my own?

Hi @zeke, it's no problem -- I've pushed the test files to the develop branch.

limiting display to papers within specific subject areas (like cs.AI or cs.ML)

Yep, that makes sense. Spelunking through the code history I noticed that the Papers With Code integration was originally scoped to certain subject areas, but was subsequently yanked out in #199. I can use the code removed there as a basis for limiting the Replicate integration to certain contexts. Does that sound like a good approach?

Yes, that approach should be fine. Thanks!

@zeke
Contributor Author

zeke commented Feb 9, 2022

I've pushed the test files to the develop branch.

eba2d4a 👀 Nice! Thank you.

@zeke
Contributor Author

zeke commented Feb 10, 2022

I've pushed up a small change fbeb4d5 that selectively activates the Replicate integration based on the arXiv category of the article you're viewing.

To help determine which categories are appropriate, I wrote a script to collect the arXiv categories for each model we have on Replicate so far, and that list included cs.CL, cs.CV, cs.GR, cs.LG, cs.NE, cs.SD, eess.AS, eess.IV, and stat.ML. Curiously, no papers from the expected cs.AI subcategory!

Given that we have implementations from a broad range of arXiv subcategories, and that range will undoubtedly grow over time, I opted to scope the integration to the three top-level categories for which we currently have models: cs, eess, and stat. Let me know if that makes sense to you.
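
The scoping rule described above can be sketched as a check on the top-level archive of a category. This is only an illustration: the names `ENABLED_ARCHIVES` and `isReplicateEnabledFor` are hypothetical, and how the page exposes its category is not shown in this thread.

```javascript
// The three top-level categories named in the comment above.
const ENABLED_ARCHIVES = new Set(["cs", "eess", "stat"]);

// Hypothetical helper: decide whether the integration applies to a category.
function isReplicateEnabledFor(category) {
  // "cs.CV" -> "cs"; a bare archive such as "cs" is left unchanged.
  const archive = String(category).split(".")[0];
  return ENABLED_ARCHIVES.has(archive);
}
```

Matching on the archive prefix rather than on a list of subcategories means a new subcategory such as cs.AI is covered without a code change, which fits the expectation that the range of categories will grow.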

One subtle thing to note here is that the change I've made in fbeb4d5 determines whether the Replicate integration is enabled at page load, but doesn't actually have an effect on whether the [Demos] tab is displayed, as that's part of the HTML that's included on all pages. I can definitely update this to dynamically hide or show the Demos tab when appropriate, but wanted to check in first and see if that approach makes sense.

@mhl10
Contributor

mhl10 commented Feb 14, 2022

One subtle thing to note here is that the change I've made in fbeb4d5 determines whether the Replicate integration is enabled at page load, but doesn't actually have an effect on whether the [Demos] tab is displayed, as that's part of the HTML that's included on all pages. I can definitely update this to dynamically hide or show the Demos tab when appropriate, but wanted to check in first and see if that approach makes sense.

@zeke dynamic hiding/showing the Demos tab makes sense to me! Let's do it--thanks for suggesting. Once that's ready, I'll stage your changes on one of our dev servers for everyone's review, and can also discuss deployment plans.

This change makes the server-rendered "Demos" tab invisible by default, and updates the toggle-labs script to conditionally make the tab visible on specific category pages.
@zeke
Contributor Author

zeke commented Feb 14, 2022

dynamic hiding/showing the Demos tab makes sense to me! Let's do it--thanks for suggesting.

Cool. I pushed up commit 8dc1713 to conditionally display the demos tab. This was a little tricky to get right because of the code that auto-clicks the last active tab if a cookie is present, but I think I've got it working correctly now.
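
One plausible way to reason about the interaction between a hidden tab and a remembered tab is sketched below. It is not necessarily what commit 8dc1713 does: the function name, its arguments, and the choice to fall back to the first visible tab are all assumptions.

```javascript
// Hypothetical helper: pick the tab to activate on page load.
//   tabs          - ordered tab names present in the page HTML
//   demosEnabled  - whether the Demos tab should be visible for this category
//   rememberedTab - tab name stored in the cookie, or null if none
function chooseActiveTab(tabs, demosEnabled, rememberedTab) {
  // The Demos tab only counts as visible when the integration is enabled.
  const visible = tabs.filter((name) => name !== "Demos" || demosEnabled);

  // Restore the remembered tab only if it is actually visible here;
  // otherwise auto-clicking it would open a tab the reader cannot see.
  if (rememberedTab && visible.includes(rememberedTab)) return rememberedTab;

  // Assumed fallback: the first visible tab.
  return visible.length > 0 ? visible[0] : null;
}
```

Keeping the decision in a pure function like this makes the tricky case (cookie says "Demos", page is outside cs, eess, and stat) easy to check without a browser.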

This screenshot shows how the demos tab is displayed (or not) depending on the category of page you're viewing:

Screen Shot 2022-02-14 at 3 18 22 PM

Once that's ready, I'll stage your changes on one of our dev servers for everyone's review, and can also discuss deployment plans.

Nice! Pretty sure this is ready to go from the Replicate side, operationally speaking. The new replicate.com API endpoint we set up for this integration is fronted by a Cloudflare edge network, and it's set to cache URLs for 30 minutes. We should be able to withstand the traffic from this.

We've also set up a process on our side where every Replicate model that includes an arXiv paper URL/ID now requires a Replicate admin to review the model and verify that the model is indeed an implementation of the referenced paper before it will be surfaced by the public API.

Looking forward to moving this to the next stage of review. :)

@zeke zeke mentioned this pull request Feb 15, 2022

@helendwang helendwang left a comment

Looks good on my local machine! Looking forward to reviewing on dev.

@mhl10
Contributor

mhl10 commented Feb 17, 2022

Looking good on our beta instance:

https://beta.arxiv.org/abs/2007.14268?override_paper_id=2103.17249

My only caveat is that this machine is using an older snapshot of our data, from July 2020. Are there any demos associated with papers before then?

...instead of pointing to labs.arxiv.org. This will give us a way to more clearly explain what Replicate is, how the integration with arXiv works, etc.
@zeke
Contributor Author

zeke commented Feb 18, 2022

@bfirsh and I reviewed this today and pushed up a few minor changes.

Apologies for sneaking those in after your review @mhl10 and @helendwang.

@mhl10
Contributor

mhl10 commented Feb 21, 2022

@bfirsh and I reviewed this today and pushed up a few minor changes.

Apologies for sneaking those in after your review @mhl10 and @helendwang.

No worries -- latest changes are up on beta.arxiv.org.

@zeke
Contributor Author

zeke commented Feb 22, 2022

Nice! For reference, here are some papers that exist in the beta.arxiv.org test data and have Replicate demos:

@mhl10 mhl10 merged commit 7f44d5f into arXiv:develop Mar 3, 2022
@zeke zeke deleted the replicate-demos branch March 3, 2022 19:49

5 participants