Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dev tool to build themes #379

Closed
wants to merge 13 commits into from

Conversation

s-weigand
Copy link
Contributor

As discussed in #376 I added a small commandline tool to more conveniently build themes.

How it works:

  1. It inits a new git repo in a temp folder, if not present and adds [email protected]:spatialaudio/nbsphinx.git as remote. This prevents messing with the actual git
  2. On the first run it just creates a dev_utils/requirements_themes.txt containing all requirements to build all themes. This is generated from the diff in doc/requirements.txt from the different theme branches and master remote.
  3. When run w/o any flags/options it then builds all themes by:
    • Copying doc/, README.rst, CONTRIBUTING.rst, to a temp folder once.
    • Adding the diff of the remote master and theme brache doc/conf.py to the one in the temp folder (in contrast to merge, this prevents possible merge conflicts) .
    • Build the theme docs to a subfolder of dev_util/_build with with the branch name.

Other features:

  • Subset of themes building
  • Listing available names for subset members
  • Force recreate dev_utils/requirements_themes.txt
  • Force doc rebuild (sphinx-build -a)

Open Questions:

  1. Would you rather have the theme docs being build in doc/_build, than dev_util/_build ?
  2. Since the tool uses GitPython, atm it throws an ImportError and tells the user to run pip install GitPython. In my projects I prefer to have a requirements_dev.txt with all dev dependencies in it (doc/requirements.txt can be referenced so no duplication). Would that be an option for you as well? This would also open the option to automatically test if new releases of dependencies break the code, using pyup.

@s-weigand
Copy link
Contributor Author

Also pip supports -e . in requirement files, but I don't know since which version.

@mgeier
Copy link
Member

mgeier commented Jan 14, 2020

Thanks, this is great!

I'm wondering why master should appear anywhere in the script.
I wouldn't expect the master branch to have any significance.

I would expect this to work on whatever branch (it even works with uncommitted changes, which is great!), regardless where the master branch is.

It worked very nicely when I tried it yesterday, but I had the feeling that it might break when the master branch evolves. I've just merged two PRs, and now it indeed stopped working.

CONTRIBUTING.rst Outdated Show resolved Hide resolved
CONTRIBUTING.rst Outdated Show resolved Hide resolved
doc/index.rst Outdated Show resolved Hide resolved
self.update_theme_conf(branch_ref)
dest_dir = os.path.join(TEMP_BUILD_ROOT_DIR, branch_name)
build_args = [TEMP_DOC_DIR, dest_dir] + self.build_args
# the theme "guzzle" and "press" need
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the problem with those themes?

Is this something that can be solved upstream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guzzle did trigger a rebuild instead of just using the cache and press did throw an exception.
I didn't look into why it does behave like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why guzzle wants to re-build is because it wants to add itself to the extensions variable.

I've removed this in the theme branch, now it should work without re-build.

This is now disabling special table formatting (and probably some other stuff) and building sitemap.xml.
I've created a PR to fix this: guzzle/guzzle_sphinx_theme#46.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created schettino72/sphinx_press_theme#36 for the press problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just removed the press-theme branch. Now no more special cases should be necessary.

@s-weigand
Copy link
Contributor Author

The tool assumes that the theme branches only differ from master, in the theme relevant lines of doc/conf.py and extracts the changes compared to master. So if the theme branches and master diverge (the theme branches need a rebase), this can cause the building to break.

I'm wondering why master should appear anywhere in the script.
I wouldn't expect the master branch to have any significance.

The significance of master is, that it is assumed as base branch for the theme branches.

I would expect this to work on whatever branch (it even works with uncommitted changes, which is great!), regardless where the master branch is.

Note that the diffing only concerns the branches as they are on github, not the local copy. So as long as there are no changes to doc/conf.py on github all works fine, if there are, you need to run your git_rebase_theme_branches.sh and push the changes, or the tool might break (maybe that should be automated?).
Since you take care of the state of branches on github, I think this is more reliable than leaving it to users and their local copies with possible merge conflicts and so on.

It worked very nicely when I tried it yesterday, but I had the feeling that it might break when the master branch evolves. I've just merged two PRs, and now it indeed stopped working.

Due to the changes in master at the end of the generated conf.py, there now is a

    'sphinx_copybutton',

which causes the syntax error.

@mgeier
Copy link
Member

mgeier commented Jan 15, 2020

The tool assumes that the theme branches only differ from master

I don't think that's good.

you need to run your git_rebase_theme_branches.sh and push the changes [...] (maybe that should be automated?)

I don't really want to run this after each commit to master.
On the one hand, this is unnecessary load for RTD, on the other hand I don't really want the theme branches to reflect the master version. I would rather have them close to the "stable" release.

Typically, I only run this script after each release.
Other than that, I only run it when there is a specific reason, like e.g. checking whether the "copy button" extension works on all themes.
But with your new tool, that won't even be necessary anymore!

I think a better assumption (if necessary) would be that each theme branch only contains one diverging commit.
This way, you could probably use cherry-pick to get the theme diffs?
Or you could use the Git commit range I'm using in the docs for the "usage" link: rtd-theme^...rtd-theme (which also uses that assumption).

from, theme branch differs to master, to theme branch has only one commit differing to its base
@s-weigand
Copy link
Contributor Author

Typically, I only run this script after each release.

Ok, my assumption was that you run it each time conf.py changes.

The 'theme branch has only one commit' assumption, sounds solid to me.

IMHO an assumption is needed to get the diff, since actions like merging to the current conf.py might result in conflics that need manual reloving, which I don't see how to implement w/o loosing all the convenince the tool is for.

@mgeier
Copy link
Member

mgeier commented Jan 16, 2020

Thanks for the update!

This works really well, and it is surprisingly fast!

However, I have the feeling that it is much more complicated than necessary. The script has 358 lines, which seems a lot to me. That's like 20% of src/nbsphinx.py, isn't it?

I didn't completely absorb all the details (yet), but there are already a few things that I'm wondering about:

  • is it really necessary to fetch the repo from Github each time?

    • it's quite fast, so that's not the problem, but what if you don't have an internet connection? I think this whole thing should be able to work without internet connection.
  • is it really necessary to mention specific files in the code and create (somewhat) manual diffs for them?

    • it seems to me that Git should be able to do everything, I don't understand why there is so much "manual" work involved.
  • is it really necessary to manually copy and remove stuff, when Git can do this automatically with git worktree?

    • I guess an important part is to get the local uncommitted (and added but not-yet-committed) changes to the copy of the repo, but couldn't this be done with git stash?

I would think that something like this is simpler:

  1. check for upstream remote, raise an error if not available (this can be added and updated manually by the user, it doesn't have to be part of the script). The remote name could be a command line argument (defaulting to upstream or origin or whatever)
  2. if temporary copy exists, run git worktree remove (we could try to re-use the existing copy, but I guess we could avoid a lot of complexity this way)
  3. run git worktree to create temporary copy
  4. use git stash to transfer uncommitted changes to the temporary copy (and commit them there in a temporary commit (which will be the base commit for the following rebasing)
  5. for all theme branches:
    1. rebase/cherry-pick the relevant commit
    2. run sphinx

Wouldn't this be simpler?
Or am I missing something?

I'm not sure if Sphinx would re-execute everything if the worktree is removed and re-created ... this could become a problem ...

CONTRIBUTING.rst Outdated Show resolved Hide resolved
CONTRIBUTING.rst Outdated Show resolved Hide resolved
CONTRIBUTING.rst Outdated Show resolved Hide resolved
CONTRIBUTING.rst Outdated Show resolved Hide resolved
@mgeier
Copy link
Member

mgeier commented Jan 16, 2020

I played around a bit with git worktree and it turns out that Sphinx re-builds everything when deleting and re-creating a worktree.

But now I understand git worktree a bit better and I know that it's not necessary to remove the worktree in the first place.

I think this should work:

  1. check for upstream remote, raise an error if not available
  2. if worktree doesn't exist, create it: git worktree add my-worktree my-working-branch (my-working-branch must exist already)
  3. get current commit hash (and remember it): git rev-parse HEAD
  4. run git stash create (and remember the output)
  5. cd into my-worktree
  6. reset (--hard) my-working-branch to appropriate commit
  7. git merge --ff-only <stash-hash>
  8. remember current hash as base commit
  9. for each theme branch:
    1. git cherry-pick theme commit
    2. run sphinx
    3. reset (--hard) to base commit

@s-weigand
Copy link
Contributor Author

Since this is my first time using GitPython I didn't know how to use the diff function properly, which is why I used ndiff. I refactored the code so all the diffing is done by GitPython.

I also made fetching optional if the repo exists in the temp folder, so it also works offline.

As for git worktree, it looks like it isn't supported by GitPython.

Concerning the size of the tool:

If you remove all docstrings, it is only 236 lines, with 55 lines only for the cli.

Concerning mentioning specific files in the code :

  • To create one big dev_utils/requirements_themes.txt this is definitely needed. And IMHO installing all dependencies in your venv once, is cleaner than i.e. calling pip each theme build to ensure that all dependencies are installed.
  • Since all theme relevant changes are in doc/conf.py, I personally also don't see a problem having this in the tools source.

Concerning manual copying:

IMHO, this is a more straightforward approach, than stashing, unstashing, resetting hard and possibly messing up git on keyboard interrupt. If I understood it right, the git worktree approach would require to work with the git config in .git, while my approach works with dev_utils/tmp/.git which eliminates the possibility to break .git since it isn't changed at any point. Even if something gets messed up, in my approach, you could just delete dev_utils/tmp and be done without any consequences.

@mgeier
Copy link
Member

mgeier commented Jan 20, 2020

Thanks for the update. It's already a bit simpler, but I think we can go much further!

I also made fetching optional if the repo exists in the temp folder, so it also works offline.

OK, that's good. But we could even make this "out of scope" for the script.

Isn't it the typical case that a collaborator has an "upstream" remote already set up?

I don't think we need a script to make the necessary setup, I think it would be better to simply add the required Git commands to the developer docs.

As for git worktree, it looks like it isn't supported by GitPython.

Are you sure?

I tried this:

repo.git.branch('my-branch')
repo.git.worktree('add', 'my-worktree', 'my-branch')

... and it seems to work.

Concerning the size of the tool:

If you remove all docstrings, it is only 236 lines, with 55 lines only for the cli.

That's still a lot.

I still didn't read the whole thing, but I guess it would take hours to fully understand it.

I would prefer something that only takes a few minutes to understand (assuming good knowledge of Git).

Concerning mentioning specific files in the code :

To create one big dev_utils/requirements_themes.txt this is definitely needed.

But probably that feature isn't needed at all?

It sure is a fun exercise to create a script for that, but I'm not sure if I want to maintain such a script.

What about just adding the list of packages to the docs and being done with it?

Sure, then we have to change the docs whenever we add a theme branch, but that's probably less maintenance burden.

And IMHO installing all dependencies in your venv once, is cleaner than i.e. calling pip each theme build to ensure that all dependencies are installed.

I agree.
But that doesn't mean we need a script to create the list of dependencies.

What about removing this feature for now and if you really think it would be great to have it, adding it in a separate PR at a later time?

In general, I think it's a good idea to start small and add non-essential features later.

Since all theme relevant changes are in doc/conf.py, I personally also don't see a problem having this in the tools source.

I don't really see a "problem" either.
But it's a code smell.

All necessary information should be in the theme commits, including which files have to be modified.

I haven't yet read all the code (even less understood it), so I don't really know if it's good to have this hard-coded or not.
At this point, I can only say that I have the feeling that it would be better not to mention concrete file names ... but it's quite likely that I'm wrong!

Concerning manual copying:

IMHO, this is a more straightforward approach, than stashing, unstashing,

I'm not talking about stashing and unstashing.
That would be brittle indeed.

But git stash create doesn't modify the repo in any way that's immediately visible.

I chose this exact command for exactly this reason.

resetting hard and possibly messing up git on keyboard interrupt.

I don't believe that any of the commands I mentioned above can be messed up. The only thing that can be messed up is the working branch. Only one branch is ever created (my-working-branch) and that's all that will ever be visible under "normal" inspection (git status, git log, ...).

But even that can probably be avoided by always working with a "detached head".

This command would not need a named branch:

repo.git.worktree('add', 'my-other-worktree', '--detach')

(I don't know if that's the proper way to add the --detach flag)

If I understood it right, the git worktree approach would require to work with the git config in .git, while my approach works with dev_utils/tmp/.git which eliminates the possibility to break .git since it isn't changed at any point. Even if something gets messed up, in my approach, you could just delete dev_utils/tmp and be done without any consequences.

True, the possibility of messing up .git exists.
But if you use only operations that don't change the repo, it shouldn't be a problem.

And if the worktree is broken, you can remove it with git worktree remove.

@s-weigand
Copy link
Contributor Author

Guess I didn't fully understand the whole git worktree workflow (kind of a git noob here).

Isn't it the typical case that a collaborator has an "upstream" remote already set up?

I personally only set up "upstream" for a forked repo, when I contribute a second time and need to update the main branch, the PR is based on.
So IMHO assuming that all users have set it up, is a less stable assumption, than assuming that each theme branch is only one commit, with all necessary changes, ahead of of its base branch.
Especially since you have no control over the users setup, besides a contributing guide in the docs, but you have full control about how the branches on upstream behave.

repo.git.branch('my-branch')
repo.git.worktree('add', 'my-worktree', 'my-branch')

Good to know, when I searched GitPython API ref, I mostly found things concerning exceptions. So I'm not sure if this is an intended feature or a side effect of how they implemented the git functionality?

But probably that feature isn't needed at all?

Since you need the theme dependency in doc/requirements.txt so the theme branch will build on RTD anyway, IMHO autogenerating dev_utils/requirements_themes.txt is maintenance free, if pip doesn't drastically change the way requirement files are written.

But it's a code smell.

IMHO it would be a code smell if it was a general purpose tool, with none configureable parameters, but this tool is already very specific. Only considering branches that end in -theme would be a code smell too, wouldn't it?

I should read up on how to use git worktree first, to understand all its pros and cons. But in the end, I think that some assumptions are needed for both approaches.

P.S.: I opened an issue for the broken link from the readthedocs-docs, which breaks circleci readthedocs/readthedocs.org#6541

@mgeier
Copy link
Member

mgeier commented Jan 21, 2020

Guess I didn't fully understand the whole git worktree workflow

TBH, I've never used git worktree either.

I've just played around a bit in the context of this PR and it is really nice!

You should manually try the commands I mentioned in #379 (comment), this should clear up things.

And as an improvement, you can also try it in "detached" mode, without ever creating a new branch.

Isn't it the typical case that a collaborator has an "upstream" remote already set up?

I personally only set up "upstream" for a forked repo, when I contribute a second time and need to update the main branch, the PR is based on.

OK, that sounds reasonable.
I normally clone the main repo and then add my fork, but I think both approaches are perfectly fine.

Either way, at some point you will have an "upstream" remote, but the name might not be the same for everyone.
I would typically have origin as "upstream" remote, others might use the name upstream or something completely different.
What name are you using?

So IMHO assuming that all users have set it up, is a less stable assumption, than assuming that each theme branch is only one commit, with all necessary changes, ahead of of its base branch.

I agree.
But those two assumptions are completely unrelated, aren't they?

Especially since you have no control over the users setup, besides a contributing guide in the docs, but you have full control about how the branches on upstream behave.

That's right.

I still think we can get a meaningful behavior with very few assumptions.
That's my suggestion:

  • provide a command line option --upstream with a default value of upstream
    • this could probably be called --theme-remote or something, because it doesn't necessarily have to be the "official" repo. You can also use your own repo with your own *-theme branches. I still think that upstream would be a good default.
  • if the "upstream" remote doesn't exist, raise an error and exit
  • if there are no *-theme branches, raise an error and exit

This way, if I would run the script right now, I would get an error because upstream doesn't exist in my case.
To solve this, I could either create an upstream remote or I could use --theme-remote=origin.

It would be the responsibility of the user (at least for now) to update the remote whenever needed.

repo.git.branch('my-branch')
repo.git.worktree('add', 'my-worktree', 'my-branch')

Good to know, when I searched GitPython API ref, I mostly found things concerning exceptions. So I'm not sure if this is an intended feature or a side effect of how they implemented the git functionality?

I think that's the wrong question.
They seem to have implemented the thing in a way that they don't have to explicitly mention every "intended feature".

I can use:

import git
g = git.Git()
g.nonsense('whatever')

... and this will simply be converted to a call to the git program.
And it will fail with an error:

...
  cmdline: git nonsense whatever
  stderr: 'git: 'nonsense' is not a git command. See 'git --help'.'

So it looks like you can use anything you want as long as git knows it, GitPython doesn't have to know it at all.

But GitPython definitely knows something about worktrees, because the word appears in their code.

But probably that feature isn't needed at all?

Since you need the theme dependency in doc/requirements.txt so the theme branch will build on RTD anyway, IMHO autogenerating dev_utils/requirements_themes.txt is maintenance free, if pip doesn't drastically change the way requirement files are written.

Nothing is maintenance free.

The mere existence of the code makes it harder to understand the whole script.

We can talk about adding such a feature later, but I wouldn't add it in the beginning.

But it's a code smell.

IMHO it would be a code smell if it was a general purpose tool, with none configureable parameters, but this tool is already very specific. Only considering branches that end in -theme would be a code smell too, wouldn't it?

Yes, sure.

A code smell doesn't necessarily mean we should change it, just that we should think about it.

I think the *-theme assumption makes the code simpler, while using explicit file names makes it more complicated!

Removing hard-coded file names is a good thing, as long as it doesn't make anything else worse.

Removing the *-theme assumption would also be generally a good thing, but I think it would make the code more complicated and harder to understand and we wouldn't really gain anything.

I should read up on how to use git worktree first, to understand all its pros and cons. But in the end, I think that some assumptions are needed for both approaches.

Sure, some assumptions are needed if we want to keep it simple.
I would just like to remove the unneeded and possibly harmful assumptions.

P.S.: I opened an issue for the broken link from the readthedocs-docs, which breaks circleci readthedocs/readthedocs.org#6541

Cool. I've also seen the test failure and have created #383, which changes the broken link to https://docs.readthedocs.io/en/latest/config-file/.

@mgeier
Copy link
Member

mgeier commented Jan 21, 2020

I had another idea how we could reduce the number of assumptions regarding remotes.

What about adding this condition?

  • when no remote name is explicitly given, iterate over all available remotes and check if any is pointing to the official spatialaudio/nbsphinx URL. If one is found, use that (ignoring the actual name of the remote). If none is found, raise an error and exit.

This would not require a decision on a remote name like origin or upstream and it should work in both of our cases out-of-the-box.

This way we wouldn't even need the --theme-remote (or --upstream) flag (but we could still add it later if it turns out to be useful).

@s-weigand
Copy link
Contributor Author

I did play around a bit with git worktree and your proposed workflow.

  • check for upstream remote, raise an error if not available
  • if worktree doesn't exist, create it: git worktree add my-worktree my-working-branch (my-working-branch must exist already)

If I didn't miss something, you can't have the same branch checked out multiple times, which means that you would need to create a temporary branch.

$ git checkout my-working-branch
fatal: 'my-working-branch' is already checked out at 'my-worktree'

If the user changes the branch, the branch of the worktree would need to be changed (or a temp one newly created) too, which adds an extra checks.

  • get current commit hash (and remember it): git rev-parse HEAD

I had to first run git add . so git stash create would actually create a stash.

  • run git stash create (and remember the output)

  • cd into my-worktree

  • reset (--hard) my-working-branch to appropriate commit

  • git merge --ff-only <stash-hash>

  • remember current hash as base commit

  • for each theme branch:

    1. git cherry-pick theme commit
    2. run sphinx
    3. reset (--hard) to base commit

The big problem I see is, how merge conflicts are handled when cherrypicking.
Which is why I only extracted the diff in the first place.

To ensure that the theme builds and doesn't hang at a conflict, we could use:

$ git cherry-pick <theme commit hash> --strategy=recursive -X theirs

But that could potentially overwrite important changes. Is there something like an append merge strategy?

To handle the 'one commit assumption' we could use git cherry or?

$ git cherry -v upstream/master upstream/rtd-theme

I don't think that this is the proper usecase for git worktree, IMHO just copying the whole repo in a temp folder would be more elegant.

And sure we could add a --theme-remote (or --upstream) option and check for the spatialaudio/nbsphinx url.

@mgeier
Copy link
Member

mgeier commented Jan 22, 2020

If I didn't miss something, you can't have the same branch checked out multiple times, which means that you would need to create a temporary branch.

$ git checkout my-working-branch
fatal: 'my-working-branch' is already checked out at 'my-worktree'

I didn't know that you can't check out the same branch on different workdirs, that's good to know!

But I was assuming anyway that the script uses one branch (my-working-branch in my example) exclusively.
I was thinking about using a very long and complicated branch name that a user wouldn't dare to touch.

If the user changes the branch, the branch of the worktree would need to be changed (or a temp one newly created) too, which adds an extra checks.

Of course, the user would not be allowed to use this branch in any way.

But you are right, that's an ugly situation.

That's why I was suggesting to use "detached head" instead.

If the script only uses "detached head" in its worktree, there are no limits to what the user can do in the main worktree.

I don't think a temporary branch has ever to be created (but of course I didn't fully implement anything, so I don't really know for sure).

I had to first run git add . so git stash create would actually create a stash.

This would of course be bad, because it changes the state of the repo without the user's consent.

It worked for me for untracked changes of existing files.
Are you talking about newly created files?

According to the manpage, there is the --include-untracked flag, but it doesn't seem to work with git stash create.

Probably we have to use git stash push --include-untracked followed by git stash pop (and somehow get the stash-hash in between).

But this didn't work either when I tried it just now.

I don't know what we can do here.
Probably there is a different way than git stash to copy the untracked files to the worktree?

Alternatively, we could just ignore the problem. At some point, we'll have to add new files to the index anyway, so I think this might not even be a big deal?

The big problem I see is, how merge conflicts are handled when cherrypicking.
Which is why I only extracted the diff in the first place.

To ensure that the theme builds and doesn't hang at a conflict,

I think we should raise an error at a conflict.
Whatever conflict can be resolved manually (either by changing the current working directory or by changing the remote branch).

I guess it would be useful to still continue with the remaining branches, but in the end the script should fail and print an error message containing the names of the broken branches.

we could use:

$ git cherry-pick <theme commit hash> --strategy=recursive -X theirs

But that could potentially overwrite important changes. Is there something like an append merge strategy?

I don't know.
But I don't think there is anything the script needs to do, it should just give up.

To handle the 'one commit assumption' we could use git cherry or?

$ git cherry -v upstream/master upstream/rtd-theme

I didn't know the git cherry command and I don't fully understand its manpage.

But we already know the relevant commit by name (it's the name of the theme branch), so it doesn't sound like we need to "find" any commits.

What would git cherry do for us?

I don't think that this is the proper usecase for git worktree, IMHO just copying the whole repo in a temp folder would be more elegant.

I think elegance is subjective.
If you think doing multiple brittle steps manually is more elegant than doing the same thing in one Git command (given that we use Git anyway), I have a different understanding of elegance.

But I don't insist at all on using git worktree.
I just want the thing to be less complex.
I thought that using git worktree would be one way to achieve that. I might be wrong.

And sure we could add a --theme-remote (or --upstream) option and check for the spatialaudio/nbsphinx url.

OK, but probably we should postpone this feature in the interest of simplicity.

@s-weigand
Copy link
Contributor Author

Are you talking about newly created files?

Yap, just a quick and dirty touch foo.txt for testing.

What would git cherry do for us?

git cherry [-v] [<upstream> [<head> [<limit>]]]
git-cherry - Find commits yet to be applied to upstream

So as I understand it, git-cherry looks up, which additional commits are in <head> that would need to be applied to <upstream>. In our case this is only one commit atm, but that assumption might not be nesessary with git-cherry.

$ git cherry -v upstream/master upstream/rtd-theme
+ 9081d20775cbe4a08c78f0e26d9fa87fd983b566 DOC: Switch to readthedocs theme

I think elegance is subjective.

That is definitely true, especially since we have a great example of it right here 😛.

At the end of the day, the job of this tool is to allow contributers, to quickly build all example themes and check if they look fine.

And IMHO:
copy needed doc files -> apply changes for the theme -> build theme docs
Is pretty straight forward and simpler to understand, than spending hours of reading the docs of git and testing to see what is going on (maybe this is just because I'm a noob with git).
Thus doing a bunch of git actions for what could just be a copy, feels overengeneered to me (subjectiv again). Personally, I also don't like touching the users .git/require them to have set a remote other than their forks origin.

What do you think about copying the whole repo and using the ignore arg to i.e. not copy .git, which for me is 5 times the size of the repo itself? This might be even more straight forward (no need to look out for README.rst and CONTRIBUTING.rst).

I think we should raise an error at a conflict.

Since the conflict (most likely) arises from 'which line was changed' and not from an 'actual problem' (i.e. building the docs would break), I don't think this should raise an error.

As a user, my first thought on such an error would be:
"I just want to work on this, how can I get it to work? Ah, when I just append my changes to conf.py, instead of adding it to the proper block, there won't be a conflict. Great, now I can keep working on the actual problem I'm trying to solve.".
But maybe thats just me again.

@mgeier
Copy link
Member

mgeier commented Jan 22, 2020

So as I understand it, git-cherry looks up, which additional commits are in <head> that would need to be applied to <upstream>. In our case this is only one commit atm, but that assumption might not be nesessary with git-cherry.

OK, I see.

I'm fine with the one-commit-assumption, since it seems to reduce complexity quite a bit.

If we want to give up this assumption, we can do it at a later point.

copy needed doc files -> apply changes for the theme -> build theme docs
Is pretty straight forward and simpler to understand,

Yes, the line above is indeed simple.
But if you need hundreds of lines of codes to implement it, it isn't simple anymore.

than spending hours of reading the docs of git and testing to see what is going on (maybe this is just because I'm a noob with git).

That's a good reason to learn more about Git, then!
I've already learned a ton of new stuff about Git just while discussing this PR, which is great!

I think the steps that I've suggested in #379 (comment) are also quite simple.

And I have the feeling that they can be implemented with less code. But I didn't do it, so I don't really know.

Thus doing a bunch of git actions for what could just be a copy, feels overengeneered to me (subjectiv again).

Well it only depends on how much the code can be simplified by using more powerful Git commands.
If it cannot be simplified significantly, it's probably not worth it.

If it's really just a copy, I'm fine with it, but your current proposal does much more than a simple copy, right?

Personally, I also don't like touching the users .git

I think that's a valid concern. I guess whether that's worth it also depends on how much simplification it brings. And how much risk there is of messing it up.

[...] require them to have set a remote other than their forks origin.

I don't see a problem with that.

On the contrary, I think it's better to explicitly require them to have a certain remote instead of silently downloading it behind their backs.

And it is only the matter of copying one command from the docs (or from the error message) and running it.

What do you think about copying the whole repo and using the ignore arg to i.e. not copy .git, which for me is 5 times the size of the repo itself? This might be even more straight forward (no need to look out for README.rst and CONTRIBUTING.rst).

If the resulting code is simple enough, I'm totally OK with it.

But to me it sounds quite complicated to copy the repo (without the .git directory), creating a new (unrelated) git repository from the just copied data and then applying some plain diffs (because the connection to the original repo is lost).

This is exactly what git worktree does, just more reliable, simpler and faster.

I think we should raise an error at a conflict.

Since the conflict (most likely) arises from 'which line was changed' and not from an 'actual problem' (i.e. building the docs would break), I don't think this should raise an error.

I would rather raise an error than silently providing a probably broken result.
But that's a detail, we can revisit this once we get there.

As a user, my first thought on such an error would be:
"I just want to work on this, how can I get it to work? Ah, when I just append my changes to conf.py, instead of adding it to the proper block, there won't be a conflict. Great, now I can keep working on the actual problem I'm trying to solve.".

If the theme branch doesn't apply cleanly that's an error that should be fixed. And I'd prefer if the error surfaces earlier rather than later.

But in reality, this will happen very rarely.
It will only happen if you change the first line of the dependencies (if you add new dependencies to the top instead the bottom) or if you make changes close to the html_theme settings in conf.py.

@mgeier
Copy link
Member

mgeier commented Jan 22, 2020

I thought that instead of always talking hypothetically, I'd quickly try if the "git worktree" approach even works.
To be able to play around with it, I created #387.

@mgeier
Copy link
Member

mgeier commented Jan 23, 2020

I wanted to compare my experiment from #387 with your script, but it doesn't seem to work anymore!
Does it still work for you?

I've made some changes to the theme branches in the meantime, but I don't think anything too wild.

The error is in get_theme_requirements(), but even if I comment that out, the error comes again when creating the diff for the first theme.

@s-weigand
Copy link
Contributor Author

Hmmmm it works fine for me. Maybe try python dev_utils/theme_builder.py --fetch or delete dev_utils/tmp?
Could you also post the traceback?

@mgeier
Copy link
Member

mgeier commented Jan 23, 2020

Sure.
I've removed tmp and using --fetch does fail with the same error.

$ python3 dev_utils/theme_builder.py 
Building new 'dev_utils/requirements_themes.txt'.
Make sure that all requirements are installed by running:
'pip install -r dev_utils/requirements_themes.txt' from the repo root.
Traceback (most recent call last):
  File "dev_utils/theme_builder.py", line 369, in <module>
    cli()
  File "dev_utils/theme_builder.py", line 359, in cli
    theme_builder.get_theme_requirements()
  File "dev_utils/theme_builder.py", line 170, in get_theme_requirements
    self.get_diff_string(theme_branche_ref, "doc/requirements.txt")
  File "dev_utils/theme_builder.py", line 147, in get_diff_string
    output_indicator_new="",
  File "/home/mg/.local/lib/python3.7/site-packages/git/diff.py", line 152, in diff
    index = diff_method(self.repo, proc)
  File "/home/mg/.local/lib/python3.7/site-packages/git/diff.py", line 428, in _index_from_patch_format
    handle_process_output(proc, text.append, None, finalize_process, decode_streams=False)
  File "/home/mg/.local/lib/python3.7/site-packages/git/cmd.py", line 124, in handle_process_output
    return finalizer(process)
  File "/home/mg/.local/lib/python3.7/site-packages/git/util.py", line 332, in finalize_process
    proc.wait(**kwargs)
  File "/home/mg/.local/lib/python3.7/site-packages/git/cmd.py", line 414, in wait
    raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(129)
  cmdline: git diff-tree --output-indicator-new= --unified=0 aad12aa16c379300ba2cc99a9efcb432a1aaf2e6 00ac96a45f4174cde5424f3e2172240ec0f7fab7 -r --abbrev=40 --full-index -M -p --no-color -- doc/requirements.txt

My Git checkout is completely clean and I have no local changes.

If I comment out the requirements thingy, I get a very similar error:

$ python3 dev_utils/theme_builder.py 



################################################################################
#                         BUILDING BRANCH: AGOGO-THEME                         #
################################################################################
Traceback (most recent call last):
  File "dev_utils/theme_builder.py", line 369, in <module>
    cli()
  File "dev_utils/theme_builder.py", line 365, in cli
    theme_builder.build_themes(args.themes)
  File "dev_utils/theme_builder.py", line 305, in build_themes
    self.build_theme(branch_ref)
  File "dev_utils/theme_builder.py", line 226, in build_theme
    self.update_theme_conf(branch_ref)
  File "dev_utils/theme_builder.py", line 200, in update_theme_conf
    conf_diff = self.get_diff_string(branch_ref, "doc/conf.py")
  File "dev_utils/theme_builder.py", line 147, in get_diff_string
    output_indicator_new="",
  File "/home/mg/.local/lib/python3.7/site-packages/git/diff.py", line 152, in diff
    index = diff_method(self.repo, proc)
  File "/home/mg/.local/lib/python3.7/site-packages/git/diff.py", line 428, in _index_from_patch_format
    handle_process_output(proc, text.append, None, finalize_process, decode_streams=False)
  File "/home/mg/.local/lib/python3.7/site-packages/git/cmd.py", line 124, in handle_process_output
    return finalizer(process)
  File "/home/mg/.local/lib/python3.7/site-packages/git/util.py", line 332, in finalize_process
    proc.wait(**kwargs)
  File "/home/mg/.local/lib/python3.7/site-packages/git/cmd.py", line 414, in wait
    raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(129)
  cmdline: git diff-tree --output-indicator-new= --unified=0 aad12aa16c379300ba2cc99a9efcb432a1aaf2e6 00ac96a45f4174cde5424f3e2172240ec0f7fab7 -r --abbrev=40 --full-index -M -p --no-color -- doc/conf.py

@mgeier
Copy link
Member

mgeier commented Jan 23, 2020

Did you update your upstream remote?
As I said, I've made some changes today.

@s-weigand
Copy link
Contributor Author

Very strange.
And my code creates it's own repo and sets the upstream if you delete tmp.
I use GitKraken, which fetches remotes each minute, so that should be fine.

Does the git command itself work?

$ git diff-tree --output-indicator-new= --unified=0 aad12aa16c379300ba2cc99a9efcb432a1aaf2e6 00ac96a45f4174cde5424f3e2172240ec0f 7fab7 -r --abbrev=40 --full-index -M -p --no-color -- doc/conf.py
diff --git a/doc/conf.py b/doc/conf.py
index 33fad3fdb94c4fe5561e985108e886af8710c58e..e2565fd6b4c19613e8be1580c6f4533e1c89277f 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -104,0 +105,6 @@ html_title = project + ' version ' + release
html_theme = 'agogo'
html_theme_options = {
    'navigation_with_keys': True,
    #'rightsidebar': False,
    #'sidebarwidth': 300,
}

@s-weigand
Copy link
Contributor Author

Ok I got the reason (tested on ubuntu 18.4).
The flag I used --output-indicator-new is new in git 2.22, so older versions of git fail, due to the unknown flag.

@mgeier
Copy link
Member

mgeier commented Jan 24, 2020

I'm using Git 2.25.0.

The problem seems to be that something is missing after --output-indicator-new=.

My error message is:

$ git diff-tree --output-indicator-new= --unified=0 aad12aa16c379300ba2cc99a9efcb432a1aaf2e6 00ac96a45f4174cde5424f3e2172240ec0f7fab7 -r --abbrev=40 --full-index -M -p --no-color -- doc/requirements.txt
error: output-indicator-new expects a character, got ''

@s-weigand
Copy link
Contributor Author

The intention was to strip the '+' char for added content, so it can just be appended.

Anyway looking at the availability for git>=2.22, I think this flag shouldn't be used at all.

I'm using Git 2.24.1

$ git version
git version 2.24.1.windows.2

@mgeier
Copy link
Member

mgeier commented Feb 15, 2020

Thanks for the fix!

I just compared this PR with #387 with regards to what happens if only a small change is made (or no change at all) and the script is executed again.

Both detect the change to the "environment" correctly and only re-parse the modified source files (avoiding notebook execution if not necessary).

However, this PR is much faster because it also avoids re-building the HTML files unnecessarily. It only rebuilds HTML files where the source files have changed.
I don't know why, but #387 always re-builds all HTML files, which is kinda quick, but still much slower than not doing it.

Any idea why this could be the case?

@s-weigand
Copy link
Contributor Author

After a quick test, looks like this is due to the change mtime of the files.
In my approach I just copy the doc folder, which preserves the mtime.
The worktree approach changes the mtime for all stashed and cherrypicked filechanges, to the current time.
So for me it looks like sphinx uses mtime to see if a file has changed, which while not perfect should be much faster than i.e. file hashes.

@mgeier
Copy link
Member

mgeier commented Feb 18, 2020

Thanks for checking this out!

Sphinx definitely uses mtime for some files (as far as I've seen at least templates, source files and dependencies, probably more?).
But which files are checked before creating the HTML files (if no sources have been changed)?

The only files that are typically updated are doc/requirements.txt and doc/conf.py.
I've tried using touch doc/conf.py and touch doc/requirements.txt, but this doesn't seem to cause a re-build of the HTML files!

So it must be something else?

If I use touch, no re-build happens, but if I manually call git reset --hard and git cherry-pick <branchname>, the HTML files are re-built.

@mgeier
Copy link
Member

mgeier commented Feb 18, 2020

Never mind, I found the problem!

There was still one option that depended on the Git hash: html_title

I just fixed this (e7cfb70) and now everything is fine. There is no unnecessary re-building of HTML files anymore!

@mgeier
Copy link
Member

mgeier commented Feb 18, 2020

I still owe you a few answers (from #379 (comment)):

  • Would you rather have the theme docs being build in doc/_build, than dev_util/_build ?

I think it would be best to rename dev_utils to something specific to this one tool, for example theme_comparison?
If we get more tools, they can also have their own directories.

Then we don't need a _build directory, but could directly have the theme-specific build directories in there, e.g. theme_comparison/alabaster.

  • Since the tool uses GitPython, atm it throws an ImportError and tells the user to run pip install GitPython. In my projects I prefer to have a requirements_dev.txt with all dev dependencies in it (doc/requirements.txt can be referenced so no duplication). Would that be an option for you as well?

Sure. We could e.g. have theme_comparison/requirements.txt.

And next to that, we could have the auto-generated theme_comparison/requirements_themes.txt

@mgeier
Copy link
Member

mgeier commented Apr 18, 2020

@s-weigand Are you planning to continue working on this?

In the meantime, I have added a few more features to #387.

@s-weigand
Copy link
Contributor Author

Sorry for the delayed answer, when I got more time, I will give it another try.

@mgeier
Copy link
Member

mgeier commented May 12, 2021

I have merged #387, which obsoletes this PR, but which wouldn't have been possible without the work in this PR.

Thanks @s-weigand!

@mgeier mgeier closed this May 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants