Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render large dataframes as images icephys tutorials #1469

Merged
merged 10 commits into from
May 5, 2022

Conversation

oruebel
Copy link
Contributor

@oruebel oruebel commented May 3, 2022

Motivation

Fix #1467

Rendering large dataframes in readthedocs is challenging due to the limited horizontal space and lack of horizontal scrolling. The display of some of the tables in the icephys tutorials was sub-optimal because of this, in particular when tables were transposed for display.

This PR uses the dataframe_image library to save large tables in the icephys tutorial as PNG images which are then displayed in the docs as regular images.

Advantages

  • Tables are resized to fit the page, rather than spill over horizontally over the page boundary
  • Tables rendered this way also appear in the LaTeX/PDF version of the docs
  • Users can click on the tables to view the full version
  • Table rendered this way are still automatically updated with tutorials changes as they are generated dynamically as part of the docs

Disadvantages

  • This adds dataframe_image as a dependency in requirements-doc.txt
  • dataframe_image relies on chrome so it needs to be installed as well for this to work
  • Need to be careful for very large dataframes to avoid creating very large images, which can slow down generation and display of the docs
  • Tables are rendered as PNG rather than as text in HTML so searching or selecting text from tables displayed this way is not possible

Conclusion Using dataframe_image for the purpose of the docs is useful when displaying a very large dataframe in HTML is a challenge. For dataframes that fit the ReadTheDocs layout, rendering the tables directly in HTML with pandas as usual is fine.

How to test the behavior?

View the tutorial on readthedocs

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@oruebel oruebel requested review from rly and weiglszonja May 3, 2022 06:06
@oruebel
Copy link
Contributor Author

oruebel commented May 3, 2022

@rly the gallery and readthedocs test pipelines currently fail for this PR. The problem seems to be a missing package OSError("Chrome executable not able to be found on your machine"). I didn't realize that dataframe_image requires chromeuntil after I made the PR. Can we install chrome or chromium-chromedriver for the gallery/docs or is this a deal breaker? See also:

@rly
Copy link
Contributor

rly commented May 4, 2022

Have you tried:

If you do not have Chrome installed or cannot get it to work properly, you can alternatively use matplotlib to convert the DataFrames to images. Select this option by setting the table_conversion parameter to 'matplotlib'.

source

If that doesn't work well...

Readthedocs does NOT support custom docker images so it will be difficult to install chrome: readthedocs/readthedocs.org#6162

However, we can hack it by calling commands to install chrome or chromium-chromedriver within conf.py.

@oruebel
Copy link
Contributor Author

oruebel commented May 4, 2022

Have you tried:

Thanks for the suggestion. I did not see this. I've updated the calls for dataframe_image to use matplotlib. It looks fine on my labtop, let's see if this works in the gallery tests and readthedocs now 🤞

@oruebel
Copy link
Contributor Author

oruebel commented May 4, 2022

I had to add lxml as a dependency to make dataframe_image work with matplotlib on readthedocs. The docs build is now working. However, there appears to be a font issue when using matplotlib which causes column headers to spill over the column. The figure on the left is how the table renders on my labtop and on the right how it renders on ReadTheDocs. I'll do some more investigation to see if I can fix that issue:

Screen Shot 2022-05-04 at 1 53 12 PM

undary.

The gallery tests are currently failing because one of the image files that is being rendered cannot be found. My guess is that maybe the gallery tests are running with a different home so the relative paths may not work. I'll see if I can fix that.

@codecov
Copy link

codecov bot commented May 4, 2022

Codecov Report

Merging #1469 (5e5f1c8) into dev (89f085a) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##              dev    #1469   +/-   ##
=======================================
  Coverage   77.47%   77.47%           
=======================================
  Files          37       37           
  Lines        2735     2735           
  Branches      455      455           
=======================================
  Hits         2119     2119           
  Misses        535      535           
  Partials       81       81           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89f085a...5e5f1c8. Read the comment docs.

@oruebel
Copy link
Contributor Author

oruebel commented May 4, 2022

@rly It looks like this all working now. Please have a look and see if this looks better now.

The following comment is mostly for future reference. There are a couple of caveats to consider with this approach:

  1. This requires dataframe_image and lxml as additional docs dependencies. lxml is required because on ReadTheDocs, parsing the table HTML with BeautifulSoup as part of the dataframe_image export with matplotlib fails otherwise.
  2. Gallery test pipelines appear to run at a different path than the normal docs builds. Because of this, the image files cannot be created in the docs folder as it does not exist in the same location relative to the tutorial script. A simple fix is to call os.makedirs(df_basedir, exist_ok=True) so that the files can be created in the gallery tests. This is not ideal, since: a) it is technically the wrong path for the docs build and b) gallery tests will place the images in a different path then the normal docs built. However, I don't think this critical, since for the gallery tests we only care that the gallery code runs and when we build the docs the files are created in the right location. That being said, this is something to be aware of and if we can determine the proper location for the docs in the gallery tests then that may be a better solution.
  3. To render the tables on ReadTheDocs we have to use dataframe_image with the matplotlib renderer. This has the disadvantage that styling and fonts are not considered properly. I.e., setting table styles with dataframe.style.set_properties appear to be ignored. Unfortunately, on ReadTheDocs column labels are rendered too large so that the layout is not ideal (i.e,. column labels are overlapping). It still looks reasonable, but it is a limitation to be aware of.

@rly
Copy link
Contributor

rly commented May 5, 2022

This looks good to me. Another option to getting around the path issue is to export images to the current directory and use matplotlib or another library to render the image in the docs rather than displaying the image using Sphinx. It's hacky either way. I'd say the current method is good enough.

@oruebel oruebel merged commit 70c9bdb into dev May 5, 2022
@oruebel oruebel deleted the enh/display_dataframe_in_icephys_docs branch May 5, 2022 02:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Documentation]: Icephys tutorial output is confusing
2 participants