Support for editing Plain text files (like Python, MyST and R Markdown-based) as notebooks #1240

Open
allefeld opened this issue Jul 6, 2020 · 32 comments
Labels: feature-request (Request for new features or functionality); notebook-serialization (Applies to conversion of ipynb file to JSON & vice versa)

Comments

@allefeld

allefeld commented Jul 6, 2020

Feature: Notebook Editor, Interactive Window, Python Editor cells

Description

The notebook editor provides an option, 'Convert and save to a Python script', which creates a representation of the notebook as a standard .py script using the 'percent format'. This not only provides yet another way to work interactively with Python and rich output; the format is also a solution to the long-standing problem that the JSON format of .ipynb files, which includes embedded binary data (graphics), does not play well with source control systems.

Personally I prefer to work with the standard notebook editor interface, but because of the latter problem with source control I like to have a percent-formatted version in parallel.

My feature request: Have an option, either global, per-workspace, or per-notebook, that whenever an .ipynb file is saved, a percent-formatted version of it is created/updated in the background, too. This way the notebook proper can be used for interactive work, but the commits on the percent-formatted version can serve as a readable record of what was done, so if it is necessary to revert changes, it is clear which commit is the right one.
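To make the request concrete, here is a minimal sketch of the conversion step that would run on each save. It is an illustration only, not what the VS Code export or Jupytext actually does: it uses just the standard library, assumes an nbformat 4 notebook, and the function name `notebook_to_percent` is made up for this example.

```python
import json


def notebook_to_percent(ipynb_text: str) -> str:
    """Render notebook JSON (nbformat 4) as a percent-format Python script."""
    notebook = json.loads(ipynb_text)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows "source" to be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source)
        else:
            # Markdown and raw cells become comment blocks.
            kind = cell.get("cell_type", "raw")
            commented = "\n".join(
                "# " + line if line else "#" for line in source.split("\n")
            )
            chunks.append(f"# %% [{kind}]\n" + commented)
    # Outputs are never read, so binary data stays out of the script.
    return "\n\n".join(chunks) + "\n"
```

Because outputs are skipped entirely, the resulting .py file only changes when code or markdown changes, which is what makes its commits readable.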

Microsoft Data Science for VS Code Engineering Team: @rchiodo, @IanMatthewHuff, @DavidKutu, @DonJayamanne, @greazer, @joyceerhl

@joyceerhl
Contributor

Thanks for the feature request! We'll discuss it at our upcoming triage.

@amittleider

Definitely agree with this. Now that we have the real jupyter feel directly inside of VSCode, we don't even need to open a browser anymore. The problem is that when VSCode reads percent scripts, the look and feel of the interactive mode is different (and worse) than the new functionality.

We use percent scripts in the repo and open them using Jupyter with Jupytext. Jupytext links the percent script with the ipynb file, just as @allefeld is explaining. It'd be perfect to have this functionality right inside the editor.

@kaaloo

kaaloo commented Sep 16, 2020

We were just discussing this issue at work. Would love to see a solution directly in vscode!

@rchiodo changed the title from "automagically create percent-formatted version of notebook file for source control" to "Keep python and notebook automatically in sync" on Oct 9, 2020
@echaya

echaya commented Oct 11, 2020

Looking forward to this feature. Once released, I'll persuade the whole team to move from JLab to VSCode :D

@DonJayamanne transferred this issue from microsoft/vscode-python on Nov 13, 2020
@ssiegel95

I would love to have this too. In fact, even having a scriptable means of converting from a "percent formatted" (PF) .py file back to a .ipynb would go a long way towards streamlining many of my team's workflows where we like to keep the PF versions for our automated tests and joint development via git but then export to .ipynb for our forward facing documentation pages.
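As a rough idea of what such a scriptable PF-to-.ipynb step could look like, here is a standard-library sketch. It is an assumption-laden simplification: it only recognizes bare `# %%`, `# %% [markdown]` and `# %% [raw]` markers (no cell titles or metadata), and `percent_to_notebook` is a hypothetical name, not an existing tool.

```python
import json
import re

# Only bare markers are recognized; titles and cell metadata are out of scope here.
CELL_MARKER = re.compile(r"^# %%(?: \[(markdown|raw)\])?\s*$")


def percent_to_notebook(script: str) -> dict:
    """Parse a percent-format script into an nbformat 4 notebook dictionary."""
    cells = []
    current = None  # (cell_type, list of lines) for the cell being collected
    for line in script.splitlines():
        match = CELL_MARKER.match(line)
        if match:
            current = (match.group(1) or "code", [])
            cells.append(current)
        elif current is None:
            # Code before the first marker becomes an initial code cell.
            if line.strip():
                current = ("code", [line])
                cells.append(current)
        else:
            current[1].append(line)

    notebook_cells = []
    for cell_type, lines in cells:
        # Drop trailing blank lines that only separated cells in the script.
        while lines and not lines[-1].strip():
            lines.pop()
        if cell_type != "code":
            # Undo the "# " comment prefix used for non-code cells.
            lines = [re.sub(r"^# ?", "", text) for text in lines]
        cell = {"cell_type": cell_type, "metadata": {}, "source": "\n".join(lines)}
        if cell_type == "code":
            cell["execution_count"] = None
            cell["outputs"] = []
        notebook_cells.append(cell)
    return {"cells": notebook_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
```

Writing the result with `json.dump(percent_to_notebook(text), fh, indent=1)` gives a notebook with no outputs, which could then be executed before publishing it to documentation pages.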

Thanks for the amazing developer tools that you guys put out there, by the way!

@Aonnghus

Any news on this issue? That would really be a great addition to VS Code!

@rchiodo
Contributor

rchiodo commented Apr 26, 2021

Sorry but this is not currently on our backlog. We're working on publishing our backlog so that people can see our plans.

It's not that difficult though. You'd just have to watch save requests for the notebook documents (well in insiders anyway) and then run nbconvert every time a save occurred.

If anybody wants to submit a PR we gladly accept them.

@venaturum

@rchiodo

It's not that difficult though. You'd just have to watch save requests for the notebook documents (well in insiders anyway) and then run nbconvert every time a save occurred.

Hi Rich, how can we watch save requests? I've tried several "run on save" extensions and none of them seem to work on ipynb files for some reason.

@rchiodo
Contributor

rchiodo commented Jun 7, 2021

I believe you could listen to this event:

https://github.com/microsoft/vscode/blob/27b2434631bc9e253c30f3d55b837fea41a7c170/src/vs/vscode.proposed.d.ts#L1209

Then after the event fires, run nbconvert on the notebook.
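The event itself belongs to the VS Code extension API, so it is not shown here. The following is only a sketch of the second half (running nbconvert once a save has been observed), assuming the `jupyter nbconvert --to script` export is what is wanted. The helper names `nbconvert_command` and `export_on_save` are hypothetical.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook: Path) -> list[str]:
    """Build the command line that exports a notebook as a script."""
    return ["jupyter", "nbconvert", "--to", "script", str(notebook)]


def export_on_save(notebook: Path, run=subprocess.run) -> None:
    """Call this from whatever hook fires after a document has been saved."""
    if notebook.suffix != ".ipynb":
        return  # ignore saves of anything that is not a notebook
    # check=True raises if nbconvert exits with an error, so failures are visible.
    run(nbconvert_command(notebook), check=True)
```

The `run` parameter exists only so the command can be inspected or replaced in tests; in normal use the default `subprocess.run` is enough.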

@DonJayamanne
Contributor

@venaturum
I've created an extension that does this (open a Python file and execute as a notebook) today https://github.com/notebookPowerTools/vscode-jupytext
Currently only usable in VS Code Insiders.

@allefeld
Author

allefeld commented Jun 7, 2021

@DonJayamanne, that looks great, thanks!!

Slightly off-topic: I've been looking forward to native notebooks and all that is possible with them, but they still seem to be some way off from landing in stable VSCode. On the other hand, I usually don't like to use unstable software for daily work. Is Insiders "unstable"?

@DonJayamanne
Contributor

Is Insiders "unstable"?

I wouldn't call it unstable. It's the latest build of VS Code and the latest build of our extensions. These get updated daily. We have CI pipelines (tests) to ensure we don't ship breaking changes.
However, once in a while things do break, and we try to get them resolved ASAP (within the same day).

@lgonzalezsa

@venaturum
I've created an extension that does this (open a Python file and execute as a notebook) today https://github.com/notebookPowerTools/vscode-jupytext
Currently only usable in VS Code Insiders.

I was about to comment on my enjoyable experience with Jupytext as an alternative or, as I should now say, as a stopgap until we have a solution in VS Code. Nice!

@venaturum

@venaturum
I've created an extension that does this (open a Python file and execute as a notebook) today https://github.com/notebookPowerTools/vscode-jupytext
Currently only usable in VS Code Insiders.

Thanks @DonJayamanne , I've actually been wrestling with Jupytext for the last couple of days trying to achieve the setup my team is after. Maybe you can advise me as to whether it's possible. What we're aiming for is

  1. A repository with a folder called scripts which contain py files in percent format
  2. When we clone this repo we want to run a command from a terminal which creates corresponding ipynb files in a folder called notebooks
  3. We then want to run a command which pairs the files in scripts folder to files in notebooks folder
  4. Keep notebooks/py files synced through Insiders/Jupytext extension and git hooks

I have set formats = "notebooks///ipynb,scripts///py:percent" in jupytext.toml and tried executing all sorts of jupytext commands on the command line to achieve the above but haven't succeeded.
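For step 2 of the list above, the folder mapping could be scripted along these lines. This is a sketch under assumptions: `paired_notebook` and `conversion_commands` are hypothetical helpers, the commands rely only on Jupytext's `--output` option, and nothing here establishes the pairing itself (steps 3 and 4), which is the part I have not got working.

```python
from pathlib import Path


def paired_notebook(
    script: Path, scripts_dir: str = "scripts", notebooks_dir: str = "notebooks"
) -> Path:
    """Map scripts/<sub>/<name>.py to notebooks/<sub>/<name>.ipynb."""
    relative = script.relative_to(scripts_dir)
    return Path(notebooks_dir) / relative.with_suffix(".ipynb")


def conversion_commands(scripts: list[Path]) -> list[list[str]]:
    """One jupytext command per script, writing into the mirrored notebooks tree."""
    commands = []
    for script in scripts:
        target = paired_notebook(script)
        # as_posix() keeps forward slashes so the commands look the same on every OS.
        commands.append(
            ["jupytext", "--output", target.as_posix(), script.as_posix()]
        )
    return commands
```

From the repository root, `conversion_commands(sorted(Path("scripts").rglob("*.py")))` would list the commands to run after a fresh clone. Whether the target subfolders must already exist is something I have not checked.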

@DonJayamanne
Contributor

If you have questions related to the Jupytext extension for VS Code, please file them against that repo.
The Jupyter extension doesn't support Jupytext natively (concepts such as keeping the files in sync and the like are currently out of scope and not yet supported).

@DonJayamanne
Contributor

Duplicate of #1237
Taking this, as it falls under notebook serialization; it is also similar to the Jupytext notebook viewer (R Markdown and MyST files).

@DonJayamanne assigned DonJayamanne and unassigned rebornix and amunger on Sep 4, 2022
@DonJayamanne added the notebook-serialization label (Applies to conversion of ipynb file to JSON & vice versa) and removed the interactive-window label (Impacts interactive window) on Sep 4, 2022
@marcglobality

Hi @DonJayamanne, do we have any updates on this? Is it solved, just not documented? Thx

@ruslaniv

Please add the ability to sync notebooks with .py scripts.

Also I do not understand these two phrases used together especially since a lot of people expressed interest in this:

Sorry but this is not currently on our backlog

and

It's not that difficult though

@starball5

Related on Stack Overflow: How to config automatic sync Jupyter notebook .ipynb and .py files in VSCode e.g. by using Jupytext

@DonJayamanne changed the title from "Keep python and notebook automatically in sync" to "Support for editing Plain text files (like Python, MyST and R Markdown-based) as notebooks" on Dec 1, 2023
@DonJayamanne added this to the Backlog milestone on Dec 4, 2023
@td-anne

td-anne commented Jan 30, 2024

Just a note: there are (at least) two possible workflows one might want to support here:

  1. Only a .py file ever exists, but the user can/must open it through the notebook interface. Outputs are discarded when the editor window is closed. Changes are saved to the .py file.
  2. Both .py and .ipynb files exist. The .py file is opened as a normal python file, and the .ipynb file is opened as a notebook. When either one is saved, the command jupytext --sync is run to synchronize the two. This command does not modify the cell outputs, which live in the .ipynb file, but ensures that the code/markdown in the .py file always matches that in the .ipynb file. Only the .py file is checked in to version control; the outputs in the .ipynb are lost only when moving to a fresh clone.

Both modes of working have their merits, but the code required to support them is very different. Mode #2 requires nothing more than reliably running a sync every time either file is saved (though a few extra syncs wouldn't hurt anyone, for example on load). Mode #1 requires support from within the notebook machinery. I note that the extension by @DonJayamanne supports (only) #1. Various schemes have been tried to run the sync on save, but for some reason it appears difficult to arrange for a command to be run after a notebook is saved. This could in principle be worked around by creating a watcher daemon that would simply run a sync any time either file was modified; as long as VSCode could be persuaded to keep this running it would automate the task.
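A minimal sketch of such a watcher daemon follows, assuming simple polling of modification times is acceptable and that `jupytext --sync` is on the PATH. The function names are made up for illustration, and edits made while a sync is running are not detected by this version.

```python
import subprocess
import time
from pathlib import Path


def snapshot(paths):
    """Map each existing path to its modification time in nanoseconds."""
    result = {}
    for path in paths:
        try:
            result[path] = path.stat().st_mtime_ns
        except FileNotFoundError:
            pass  # a missing file is simply not tracked yet
    return result


def changed(previous, current):
    """Paths that are new or whose mtime differs between two snapshots."""
    return [path for path, mtime in current.items() if previous.get(path) != mtime]


def watch(paths, interval=1.0, run=subprocess.run):
    """Poll forever and run `jupytext --sync` on every file that was modified."""
    previous = snapshot(paths)
    while True:
        time.sleep(interval)
        for path in changed(previous, snapshot(paths)):
            run(["jupytext", "--sync", str(path)], check=False)
        # Re-read after syncing so that writes made by the sync itself
        # do not trigger another round on the next poll.
        previous = snapshot(paths)
```

Starting it with `watch([Path("analysis.py"), Path("analysis.ipynb")])` covers one pair. Keeping the process alive alongside VSCode remains the open question.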

@jabbera

jabbera commented Feb 2, 2024

I'd love #1 as described by @td-anne. I want to use the standard notebook interface with Jupytext files. I don't care about saving outputs!

@marctorsoc

marctorsoc commented Feb 6, 2024

I would like #2 as described above. I do want to keep the outputs for the future. Right now I solve it by running

jupytext --to py:percent file.ipynb

after running the notebook, but it'd be great if this were automatic and always in sync, as happens with Jupyter in the browser.

One advantage of this is that when editing the notebook and saving (assuming it automatically syncs to the .py file), one would be able to git diff very quickly, without having to sync with Jupytext first. This is useful in many cases, e.g. after doing some data inspection in the notebook, creating many new cells to debug, and later wanting to revert the changes.

@mwouts

mwouts commented Feb 6, 2024

Thanks for considering this!

Quick question re #1, I see that the interactive script mode (also on TDS, search for interactive scripts in that page), which has been around for a while, still works in VS Code. That means that I can execute a percent script step by step in VS Code. Isn't that close already to what you want? How different would a Jupyter mode be?

Personally I would be using mostly #2 i.e. keep the outputs on disk too.

@jabbera

jabbera commented Feb 6, 2024

How different would a Jupyter mode be?

Different enough that my users don't want to use it. In Jupyter Notebook mode, outputs appear directly below the cell that produced them. Users don't need another window open. It's also in line with the behavior of JupyterLab and its Jupytext extension.

@td-anne

td-anne commented Feb 7, 2024

Thanks for considering this!

Quick question re #1, I see that the interactive script mode (also on TDS, search for interactive scripts in that page), which has been around for a while, still works in VS Code. That means that I can execute a percent script step by step in VS Code. Isn't that close already to what you want? How different would a Jupyter mode be?

Personally I would be using mostly #2 i.e. keep the outputs on disk too.

I would be sticking to #2 as well. In fact I do manage a clumsy, data-loss-prone version of it by manually running jupytext --sync when I remember to. For large or complex notebooks I end up forgoing VSCode's editing power and running things in actual JupyterLab, where this Just Works.

But the in-line rendered markdown, mathematics, and plots are a tremendous selling point for scientific users. Not to mention the rich display of certain outputs (sympy equations, pandas dataframes, generated markdown). With the notebook view, you can execute a notebook and have a literate-programming view of your results, in order, in place, associated with the code that generated them.

The interactive percent mode does have its place, and it can be less confusing than a notebook when you're running cells substantially out of order. But it is the notebook view that took over data science.

@AlexeyDmitriev

AlexeyDmitriev commented Apr 19, 2024

For us, what we need is #1. As @jabbera said, the current state is different enough to be inconvenient to use.

#2 is also acceptable (we'd just need to gitignore the .ipynb files), but it looks more complicated in both implementation and usage.

@VolkerH

VolkerH commented Apr 24, 2024

For both options, .py files are explicitly mentioned (.py percent format). I'm not sure whether you implicitly also meant to support the other jupytext supported formats such as .md (markdown, myst markdown).

The interactive percent mode does have its place, and it can be less confusing than a notebook when you're running cells substantially out of order. But it is the notebook view that took over data science.

I agree. In the context of a jupyterbook or Sphinx project, there will typically be a step that renders the output to something like GitHub Pages, so the user sees the familiar notebook output (with some nice extra formatting and cross-referencing) but you keep the output diff noise out of the repo. (Just describing our use case for this feature, not trying to explain jupyterbook.)

@allefeld
Author

Personally, I've moved from Jupyter notebooks to Quarto documents. It's slightly less interactive, but Quarto documents are plain text files to begin with, and Quarto supports many additional Markdown features and many output formats (through Pandoc).

@lgonzalezsa

Personally, I've moved from Jupyter notebooks to Quarto documents. It's slightly less interactive, but Quarto documents are plain text files to begin with, and Quarto supports many additional Markdown features and many output formats (through Pandoc).

I did not move completely but now I am keeping my computations in Jupyter Notebooks and use the Quarto feature to embed what I need from the Jupyter Notebook and expose it into my qmd report. My next step is to try to move completely but I still have peers that are Notebook centric.

@kesshijordan

Sorry but this is not currently on our backlog. We're working on publishing our backlog so that people can see our plans.

It's not that difficult though. You'd just have to watch save requests for the notebook documents (well in insiders anyway) and then run nbconvert every time a save occurred.

If anybody wants to submit a PR we gladly accept them.

I would like to add an additional potential use case around supporting a workflow to securely use Copilot with VSCode in a notebook IDE when outputs may contain sensitive data.

When using VSCode as a notebook IDE with the Copilot extension, there is a concern that, because the nbformat file stores cell outputs, sensitive data in those outputs could be exposed. In sectors like healthcare we need guardrails to prevent that from happening if we want to use these tools. I asked about this in a support ticket and was advised:

I appreciate you bringing this our attention. I shared your questions to our Copilot engineering team. We went ahead and explored this further. We have concluded with and suggest that it is best to use content exclusions as we cannot guarantee that Copilot will not use data Jupyter notebook cells into its suggestions. For more information, read Configuring content exclusions for GitHub Copilot in the GitHub Docs.

I think one potential workflow is to sync a .py to an .ipynb in a repo, with the .ipynb subject to Copilot content exclusions and listed in the .gitignore. That would allow Copilot to be used on the .py percent format, while ensuring that when the .ipynb file is saved, the data from the outputs stored in the JSON structure is protected. I added the jupytext sync command to the save keyboard binding as a task, which has lowered the friction, but it does introduce other off-target effects given that keyboard bindings are universal (though the workarounds proposed here look promising). I expect this feature request would greatly lower the friction to set up and use this kind of workflow.

I also want to acknowledge: thanks for considering this feature request and for all that the contributors/developers do in the community!

TL;DR: I think VSCode support of this feature request would facilitate a secure/lower friction way to use Copilot with VSCode as a notebook IDE when outputs may contain sensitive data

@johndolan29

With Databricks growing in popularity among organisations with Data Science teams, having something that can seamlessly integrate with Databricks Connect and have .py files displayed in notebook format would be a game changer! Having a proper IDE like VS Code is something Databricks is missing.

@wyatt-wong

Is there any update on this?
