Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable multiple revisions for project repos (first attempt 2024) #4659

Draft
wants to merge 75 commits into
base: master
Choose a base branch
from

Conversation

marcodelapierre
Copy link
Contributor

@marcodelapierre marcodelapierre commented Jan 15, 2024

This PR adds support for handling local copies of multiple revisions of the same pipeline.

Key points:

  • Under NXF_ASSETS, each pipeline is now pulled as <org>/<repo>[:<revision>] ; :<revision> is only appended if the corresponding flag was used on CLI, -r/--revision
  • The key implementation change is adding the revision attribute to the AssetManager class, as both pipeline name and revision are now required to fully identify a pipeline
  • Support has been added and tested for all relevant CLI commands: run, pull, clone, drop, list, view, config, info, inspect, kuberun
  • Unit tests for AssetManagerTest have been updated

Caveats:

  • Jgit does not implement git worktree, so the original idea within Allow the concurrent run of multiple pipeline revisions #2870 could not be applied
  • depth = 1 (shallow clones) was investigated to reduce disk usage, but could not be implemented as it would not allow to checkout branches/tags/commits
  • In the current implementation, pulling without --revision and with --revision <default branch> create two duplicate pulls; this is not optimal, but with very limited known negative impact; [update] this only happens if the default branch is not declared in the manifest and differs from master
  • As a by-product, if a repo has a default branch different from master, pulling and running it is now possible without specifying the branch name explicitly

Closes #2870 .
Also indirectly addresses #3593

Copy link

netlify bot commented Jan 15, 2024

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 61bd4bf
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/666ff7b4650e42000878ae7a

@marcodelapierre marcodelapierre self-assigned this Jan 15, 2024
@marcodelapierre marcodelapierre changed the title Multiple revisions 1st: AssetManager, Cmd Clone Pull Run Enable multiple revisions for SCMs Jan 15, 2024
@marcodelapierre marcodelapierre marked this pull request as draft January 15, 2024 12:39
@marcodelapierre marcodelapierre changed the title Enable multiple revisions for SCMs Enable multiple revisions for project repos Jan 15, 2024
@bentsherman
Copy link
Member

You could save disk space by cloning with depth = 0 when a revision is specified

@marcodelapierre
Copy link
Contributor Author

marcodelapierre commented Jan 16, 2024

You could save disk space by cloning with depth = 0 when a revision is specified

Thanks @bentsherman , great idea, will do, for nextflow pull but not for clone. Why not for any case, including no revision specified? - I just have to check that a subsequent nextflow pull still works (for the purpose of updating the local copy).

@bentsherman
Copy link
Member

yes I suppose we could just clone with depth = 0 in all cases, since no revision just defaults to master. the clone, pull, and run commands all provide a -deep option for this, so users can change it if they want

@marcodelapierre
Copy link
Contributor Author

Ah! unfortunately we cannot implement the deep functionality, as with a depth of 1 we cannot checkout to other branches/tags/commits.
This would be supported : git clone -b <branch/tag> --deep <deep> <repo> , but only works for branches and tags, not commits. Git still does not support a way to clone to a specific commit straight away.

Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
… operation

Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
…f "master"

Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
@marcodelapierre marcodelapierre marked this pull request as ready for review January 18, 2024 08:11
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
@marcodelapierre
Copy link
Contributor Author

Good progress today.

Notes on revision dereferencing (working)
- key attributes are localPath and commitId
- do it at instantiation
- if cmdpull/cmdrun, use updateLocalBareRepo() to update the attributes
- also use updateLocalBareRepo() if revision cannot be found locally

        Let's simplify and minimise usage of the local bare
        0. get local gitconfig
        1. dereference revisions (branches!/tags?) to commits
        2. use commits in localpath
        3. use commits in CLI Cmds
        4. use commits for download()
        5. use local bare path just to dereference locally without need for remote repo, and for local gitconfig
        6. also use local bare for download() of revisions/commits

@marcodelapierre
Copy link
Contributor Author

marcodelapierre commented May 27, 2024

Outstanding bits right now:

  1. In the current implementation, the AssetManager has been changed with an extra revision argument.

    • I know this is not ideal because it changes the class API e.g. in relation to Seqera Platform.
    • however, if we get rid of the extra argument, we will be forcibly in a situation where, at the time an Asset Manager is instantiated, we don't know the corresponding localPath of the pipeline, because the latter is defined by the commit ID, and hence the requested revision. This will still require some change of how Platform interacts with the AssetManager class.
  2. Updating of bare repo ideally would be specific to the requested revision, however Jgit by default sets it up to update ALL branches&tags

    • the rationale here is that we need a mechanism to update the local bare repo with respect to the remote. On one hand, this should be controlled by some user action, so as to maintain a revision <> commit correspondence locally (for increased reproducibility). On the other hand, ideally the update is as transparent as possible: I would avoid an extra option in the CLI. Hence, updating every time there is a pull or run -latest could be an option.
    • need to test something like this: git.fetch().setRefSpecs("+refs/heads/<branch>:refs/heads/<branch>")
    • what if a commit ID is requested though? which branch should be updated?
  3. The wrong revision/commit is printed at run time, as in "Launching ..."

    • also, no notice when remote branch is updated (probably related)
    • how is RevisionInfo used in the run algorithm?
    • shall we revise what information we print? I think the 2 key pieces of information are the user-requested revision and the corresponding commit ID. We are storing repos by commit ID, but this implies that the same pulled repo can belong to multiple branches/tags.
  4. Brittle algorithm at the moment:

    • instantiation: decode revision to commit ID (so as to define localPath)
    • if pull/run: update local bare -> can result in mismatch of revision vs commit
  5. Update mechanics of list/info commands (yet to be done)

Point 3. might actually have been already there with the original implementation - have to check. Issues arise when multiple branches/tags all have the same commits in their history. It is amplified in my tests, as I am using dummy pipeline repos.

Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
@marcodelapierre
Copy link
Contributor Author

On point 2. above, wondering whether we should leverage user-provided information more, when it comes to printing out information on repository in use. I.e.:

  • commit ID as de-referenced from user request
  • (if not null) revision as requested by the user
  • (eventually, also infer branch)

Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
@marcodelapierre
Copy link
Contributor Author

As dicussed in meeting, on point 1., let me re-implement with the original class signature for AssetManager - no revision attribute. Commit ID and hence local path will be set when the revision is set.

Note on available code so far:

  1. current branch for this PR is add/mult_revisions - impl with local bare repo, and commit id in local path
  2. branch add/mult_revisions_bkp15apr - first impl, uses revision in local path
  3. branch add/mult_revisions_bkp29may - backup of 1.

Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
@marcodelapierre marcodelapierre changed the title Enable multiple revisions for project repos Enable multiple revisions for project repos (first attempt 2024) Jun 24, 2024
@marcodelapierre
Copy link
Contributor Author

marcodelapierre commented Jun 24, 2024

Going forward, let's manage these through 2 separate new PRs:

  1. Enable multiple git revisions (Jan-Apr 2024) : temporary, using revision name in repo localPath #5087 : Jan-April implementation, using revision name in repo localPath
  2. Enable multiple git revisions (Jun 2024) : final, using commit ID in repo localPath #5089 : June implementation, using commit ID in repo localPath

The first one is for the record and for reference, unlikely to go forward.

The second one is that to be considered for revision and merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants