Automatic workspace discovery #442
Heads up reviewers: I just noticed that I forgot to add a validation that produces an error with a human-readable message when multiple changesets with the same branch are produced in the same repository. I'll add that, but it shouldn't stop anyone from reviewing this.
Looks good to me 🌟
defer rz.mu.Unlock()

rz.references -= 1
if rz.references == 0 && rz.fetcher.deleteZips {
Wouldn't this mean that with a single worker thread there would be no caching at all? And could there be a case where the workers are utilized like the following:
[repo A:/path1]
[repo B]
[repo C]
[repo D]
=>
[repo A:/path2]
[repo B]
[repo C]
[repo D]
where the worker would close the zip for repo A at path1, and then need to refetch for path2?
Nice catch. You're right. I'll address it in a follow-up PR.
There are two things that I need to do:
I'll address those in follow-up PRs to make review easier.
This is a follow-up to #442 and ensures that changeset specs are not getting silently lost by validating that multiple changeset specs in the same repository have different branches. I decided to make this a separate step _after_ the execution of the steps so that users can leverage the cache. That allows them to change the campaign spec and then rerun the command after they get this error, vs. the execution being aborted after running into this error (if we'd do the check inside executor).
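A minimal sketch of what such a post-execution check could look like. Only the name `duplicateBranchesErr` appears in this PR's commit messages; the `ChangesetSpec` fields, the shape of the error type, and the `validateUniqueBranches` function are hypothetical and only illustrate the idea of grouping specs by repository and branch after all steps have run.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ChangesetSpec is a hypothetical, pared-down stand-in for the real type.
type ChangesetSpec struct {
	Repo   string
	Branch string
}

// duplicateBranchesErr lists, per repository, the branches that were
// produced by more than one changeset spec. Its shape is an assumption.
type duplicateBranchesErr struct {
	duplicates map[string][]string // repository -> duplicated branches
}

func (e *duplicateBranchesErr) Error() string {
	repos := make([]string, 0, len(e.duplicates))
	for repo := range e.duplicates {
		repos = append(repos, repo)
	}
	sort.Strings(repos)

	var b strings.Builder
	b.WriteString("multiple changeset specs in the same repository use the same branch:\n")
	for _, repo := range repos {
		fmt.Fprintf(&b, "  %s: %s\n", repo, strings.Join(e.duplicates[repo], ", "))
	}
	return b.String()
}

// validateUniqueBranches runs after all specs have been created, so cached
// step results stay usable when the user fixes the spec and reruns.
func validateUniqueBranches(specs []ChangesetSpec) error {
	type key struct{ repo, branch string }
	seen := map[key]int{}
	for _, s := range specs {
		seen[key{s.Repo, s.Branch}]++
	}

	dups := map[string][]string{}
	for k, n := range seen {
		if n > 1 {
			dups[k.repo] = append(dups[k.repo], k.branch)
		}
	}
	if len(dups) == 0 {
		return nil
	}
	for repo := range dups {
		sort.Strings(dups[repo])
	}
	return &duplicateBranchesErr{duplicates: dups}
}

func main() {
	specs := []ChangesetSpec{
		{Repo: "github.com/example/monorepo", Branch: "update-deps"},
		{Repo: "github.com/example/monorepo", Branch: "update-deps"},
	}
	if err := validateUniqueBranches(specs); err != nil {
		fmt.Print(err)
	}
}
```

Because the check is a plain function over the finished specs, it can run as its own step after execution, which matches the motivation given above.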
This is a follow-up to fix the issue discovered by @eseliger here: #442 (comment) Short version: the previous implementation would only avoid deleting an archive if there were *currently active tasks* holding references to it. If tasks that need the same archive would execute sequentially, though, the archive would be downloaded, deleted, downloaded again. This here is a fix for the issue by first marking all repository archives for later use and only once all marks have been turned into references and those references have been closed is the archive deleted.
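The mark-then-reference scheme described above can be sketched roughly as follows. This is not the code from the follow-up PR: the `repoZip` type here is a simplified stand-in, and the `Mark`, `Ensure`, and `Close` method names, the `marks` counter, and the `fetches` counter are assumptions made for illustration. The download and the deletion are replaced by flag updates.

```go
package main

import (
	"fmt"
	"sync"
)

// repoZip is a simplified, hypothetical stand-in for an archive handle.
type repoZip struct {
	mu         sync.Mutex
	marks      int  // tasks that will need the archive later
	references int  // tasks currently using the archive
	fetched    bool // whether the archive is on disk
	fetches    int  // number of downloads, tracked only for the demo
	deleteZips bool
}

// Mark records up front that a task will need this archive.
func (rz *repoZip) Mark() {
	rz.mu.Lock()
	defer rz.mu.Unlock()
	rz.marks++
}

// Ensure turns one mark into a reference and downloads the archive if needed.
func (rz *repoZip) Ensure() {
	rz.mu.Lock()
	defer rz.mu.Unlock()
	if rz.marks > 0 {
		rz.marks--
	}
	rz.references++
	if !rz.fetched {
		rz.fetched = true // placeholder for the actual download
		rz.fetches++
	}
}

// Close releases a reference. The archive is deleted only when no marks
// and no references remain.
func (rz *repoZip) Close() {
	rz.mu.Lock()
	defer rz.mu.Unlock()
	rz.references--
	if rz.references == 0 && rz.marks == 0 && rz.deleteZips {
		rz.fetched = false // placeholder for deleting the file
	}
}

func main() {
	rz := &repoZip{deleteZips: true}

	// Two tasks need the same archive and run one after the other.
	rz.Mark()
	rz.Mark()

	rz.Ensure()
	rz.Close() // one mark is still outstanding, so the archive is kept

	rz.Ensure()
	rz.Close() // last reference closed, archive is deleted

	fmt.Println("downloads:", rz.fetches, "still on disk:", rz.fetched)
}
```

With only the `references` counter, the first `Close` would already delete the archive and the second task would download it again, which is the sequential case raised in the review.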
* Check for branch duplicates after creating changeset specs
  This is a follow-up to #442 and ensures that changeset specs are not getting silently lost by validating that multiple changeset specs in the same repository have different branches. I decided to make this a separate step _after_ the execution of the steps so that users can leverage the cache. That allows them to change the campaign spec and then rerun the command after they get this error, vs. the execution being aborted after running into this error (if we'd do the check inside executor).
* Fix naming in duplicateBranchesErr
* Implement dynamic workspace discovery
* update schema and fix template helpers
* Add changelog entry
* Rename file
* Change naming
* Use strings.ReplaceAll
What?
This adds automatic workspace discovery to src-cli. It allows users to execute the campaign spec `steps` in the project folders of a repository, turning those folders into workspaces.
How?
Users define workspaces like so:
That means: in every repository whose name starts with `github.com/sourcegraph/sourc`, projects have a `go.mod` at their root, and those folders should be used as `workspaces` for the execution of the campaign spec `steps`.
src-cli uses Sourcegraph search under the hood to find the locations of the `rootAtLocationOf` file, which means it doesn't need to download the repository first and search the file system.
`workspaces` can also contain multiple definitions, matching different repositories (but a repository cannot be matched by multiple definitions):
Since multiple workspaces per repository means that multiple changesets will be produced in a single repository, the `changesetTemplate.branch` needs to use templating to avoid name clashes.
For that, users can access the template variable `steps.path` and use helper functions to generate a unique branch name per changeset. Example:
(The `join_if` and `replace` helpers are new. `join_if` joins the given list of strings, ignoring blank strings. `replace` is an alias for `strings.ReplaceAll`.)
Users can, of course, also use other ways to generate a unique branch name per changeset. With `outputs`, for example:
Or, in combination:
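As a rough illustration of how the `join_if` and `replace` helpers described above behave, here is a sketch using Go's `text/template`. Several things are assumptions: the argument order of `join_if` (separator first), treating "blank" as the empty string, the `.Steps.Path` field standing in for the `steps.path` template variable, and the branch prefix `workspace-demo`. It is not the campaign spec syntax itself.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// joinIf joins the non-empty elements with sep. The signature is an
// assumption; the PR only says it joins strings and ignores blank ones.
func joinIf(sep string, elems ...string) string {
	kept := make([]string, 0, len(elems))
	for _, e := range elems {
		if e != "" {
			kept = append(kept, e)
		}
	}
	return strings.Join(kept, sep)
}

// stepsContext and templateData model the `steps.path` variable (hypothetical types).
type stepsContext struct{ Path string }
type templateData struct{ Steps stepsContext }

// renderBranch renders a branch template for the workspace at the given path.
func renderBranch(tmpl, path string) (string, error) {
	t, err := template.New("branch").Funcs(template.FuncMap{
		"join_if": joinIf,
		"replace": strings.ReplaceAll, // alias, as stated in the PR description
	}).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, templateData{Steps: stepsContext{Path: path}}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	const tmpl = `{{ join_if "-" "workspace-demo" (replace .Steps.Path "/" "-") }}`
	// One workspace in a subdirectory, one at the repository root (empty path).
	for _, path := range []string{"cmd/server", ""} {
		branch, err := renderBranch(tmpl, path)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %q\n", path, branch)
	}
}
```

Dropping empty elements matters for the root workspace: its path is empty, so the branch name does not end in a dangling separator.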
Details & Edge Cases
* If a repository is matched by `on` and not matched by a `workspaces.in:` glob, the `steps` will be executed in its root folder.
* If a repository is matched by `on` and matches a `workspaces.in:` glob, but there are no workspaces in it that contain the file in `rootAtLocationOf`, then the `steps` won't be executed in the repository.
Dependency
This requires the addition of `workspaces` to the campaign spec schema, which means it requires changes to the Sourcegraph server.
The PR is here: https://github.com/sourcegraph/sourcegraph/pull/17757
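The rules listed under "Details & Edge Cases", together with the constraint that a repository cannot be matched by multiple definitions, can be sketched as follows. The `workspaceDef` type, the function names, and the use of `path.Match` for globbing are assumptions for illustration; the directories in `found` stand in for the results of the Sourcegraph search for the `rootAtLocationOf` file.

```go
package main

import (
	"fmt"
	"path"
)

// workspaceDef is a hypothetical mirror of one `workspaces` entry.
type workspaceDef struct {
	RootAtLocationOf string // file that marks a workspace root, e.g. go.mod
	In               string // glob matched against the repository name
}

// definitionFor returns the one definition whose glob matches repo, nil if
// none matches, and an error if more than one definition matches.
func definitionFor(defs []workspaceDef, repo string) (*workspaceDef, error) {
	var match *workspaceDef
	for i := range defs {
		ok, err := path.Match(defs[i].In, repo)
		if err != nil {
			return nil, err
		}
		if !ok {
			continue
		}
		if match != nil {
			return nil, fmt.Errorf("repository %s is matched by multiple workspaces definitions", repo)
		}
		match = &defs[i]
	}
	return match, nil
}

// executionDirs decides where the steps run. found holds the directories in
// which the rootAtLocationOf file was located (empty if there are none).
func executionDirs(def *workspaceDef, found []string) []string {
	if def == nil {
		return []string{""} // no definition matches: run in the repository root
	}
	return found // may be empty: the steps are not executed in this repository
}

func main() {
	defs := []workspaceDef{{RootAtLocationOf: "go.mod", In: "github.com/example/*"}}

	def, err := definitionFor(defs, "github.com/example/monorepo")
	if err != nil {
		panic(err)
	}
	fmt.Println(executionDirs(def, []string{"cmd/a", "cmd/b"}))
	fmt.Println(len(executionDirs(def, nil)))
}
```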
What's not included
src-cli still downloads a complete archive of every matched repository, even if the steps are only executed in subdirectories. Downloading only the archives of the workspace directories is something we should implement in the near future to better support large monorepos.
Full campaign spec to try this at home
There you go: