Skip to content

Conversation

@pshao25
Copy link
Member

@pshao25 pshao25 commented Mar 13, 2023

Current behavior: we do npm install for exactly same packages in all the cadl projects. Same npm install would do n times.

After this change: Same emitter and its dependencies would install only once at the root folder. Reduce time cost from O(n) to O(1).

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:
java - template
java - template - tests
js - template
net - template
net - template - tests
python - template
python - template - tests
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

@pshao25 pshao25 requested a review from m-nash March 13, 2023 05:47
@weshaggard
Copy link
Member

@mikeharder is this a case where we should consider npm ci over npm install?

@pshao25
Copy link
Member Author

pshao25 commented Mar 14, 2023

@mikeharder is this a case where we should consider npm ci over npm install?

We shouldn't have any package-lock.json there. Could you put more details why we should consider npm ci?

@mikeharder
Copy link
Member

@pshao25: In general, any scenario where we are installing a JavaScript-based tool, we should be installing from a lockfile to ensure we are pinning the versions of all transitive dependencies. Otherwise, our tool can be broken at any time by a dependency publishing a new minor or patch version with a breaking change.

@pshao25
Copy link
Member Author

pshao25 commented Mar 14, 2023

@pshao25: In general, any scenario where we are installing a JavaScript-based tool, we should be installing from a lockfile to ensure we are pinning the versions of all transitive dependencies. Otherwise, our tool can be broken at any time by a dependency publishing a new minor or patch version with a breaking change.

In development scenarios, I totally agree.

But this script is for user scenarios. We published our emitter packages and this script could help customer install that package (download from public registry). In this case, we don't even pack the lockfile into that package. So IMO we should still use npm install.

@mikeharder
Copy link
Member

mikeharder commented Mar 14, 2023

@pshao25: In general, any scenario where we are installing a JavaScript-based tool, we should be installing from a lockfile to ensure we are pinning the versions of all transitive dependencies. Otherwise, our tool can be broken at any time by a dependency publishing a new minor or patch version with a breaking change.

In development scenarios, I totally agree.

But this script is for user scenarios. We published our emitter packages and this script could help customer install that package (download from public registry). In this case, we don't even pack the lockfile into that package. So IMO we should still use npm install.

I think we should take a deeper look at the customer scenario as well. It's important to understand the difference between JS libraries and JS tools.

Libraries are intended to be used in applications, where the application developer can (and should) use their own lockfile to pin all transitive dependencies. Libraries should float their dependencies to the widest range possible, for maximum interoperability with other libraries and to reduce the number of dependency versions that need to be installed.

Tools are intended to be run directly from the command line, and the user will not have their own lockfile. Tools should do as much as they can to ensure they are not broken by dependency updates, since tools want to guarantee that if users install a particular version of the tool, it will always work exactly the same regardless of what new dependency versions. This means tools either need to ship their own lockfile or bundle all their dependencies. The easiest way is to include an npm-shrinkwrap.json file in the package itself.

https://docs.npmjs.com/cli/v9/configuring-npm/npm-shrinkwrap-json

@pshao25
Copy link
Member Author

pshao25 commented Mar 14, 2023

@mikeharder Ah I got your point now. I believe our customers use it as a JS tool. @m-nash Is there any possibility that our customers would use emitter as JS package?

There is a quote from the link that discourage us from using shrinkwrap:

It's strongly discouraged for library authors to publish this file, since that would prevent end users from having control over transitive dependency updates.

@m-nash there are 2 solutions:

  1. We include an npm-shrinkwrap.json file in the package itself as Mike suggested.
  2. We could add a emitter-package-lock.json right near emitter-package.json, so that when we install emitter from sdk repo, we could lock the dependency version.

I think other language repos don't have above files. They might need to do the same thing.

@m-nash
Copy link
Member

m-nash commented Mar 14, 2023

I will defer to @mikeharder expertise here if you think our best approach is for all emitters to include a npm-shrinkwrap.json and @pshao25 you don't see any blocker reasons to not do that lets go with option 1.

@mikeharder
Copy link
Member

It's strongly discouraged for library authors to publish this file, since that would prevent end users from having control over transitive dependency updates.

If you want to publish something as both a library and a tool, you should publish two packages. For example, the tool package could be named foo, and the library package something like foo-lib. The tool package would depend on the library package and include an npm-shrinkwrap.json, and the library package would not include npm-shrinkwrap.json.

@mikeharder
Copy link
Member

@pshao25: I think this conversation has diverged from the original purpose of this PR, which I believe is to install the emitter only once. Do all packages in the repo use the same emitter version, and do we think this will be true going forward? Do we ever see a world where different packages use different versions of the emitter? Might this be necessary as the emitter evolves and we may not want to regenerate all packages at once?

If we are sure all packages will use the same emitter, should we just install it globally using npm install -g?

@lmazuel
Copy link
Member

lmazuel commented Mar 14, 2023

FWIW, I'm in totally in sync with @mikeharder in spirit here, our emitters are applications, not library. While I'm not JS expert, if option 1 is the right eng approach to pin dependencies of our emitters here then let's do that.

@lmazuel
Copy link
Member

lmazuel commented Mar 14, 2023

@pshao25 Pan Shao FTE: I think this conversation has diverged from the original purpose of this PR, which I believe is to install the emitter only once. Do all packages in the repo use the same emitter version, and do we think this will be true going forward? Do we ever see a world where different packages use different versions of the emitter? Might this be necessary as the emitter evolves and we may not want to regenerate all packages at once?

If we are sure all packages will use the same emitter, should we just install it globally using npm install -g?

I think it should be an unique version for the repo. It doesn't mean you're forced to regenerate each time there is a codegen release (it's interesting debate, but not here :)), but if you regenerate, you need to follow the current recommendation hardcoded in the repo. It's been the way SDK Automation has been doing it by the way before even it was named like that (when it was SwaggerToSdk in a previous life)

@pshao25
Copy link
Member Author

pshao25 commented Mar 15, 2023

@mikeharder I created a new issue to track npm-shrinkwrap.json change. For the PR itself,

Do all packages in the repo use the same emitter version, and do we think this will be true going forward?

Yes.

Do we ever see a world where different packages use different versions of the emitter?

No.

Might this be necessary as the emitter evolves and we may not want to regenerate all packages at once?

No, AFAIK.

If we are sure all packages will use the same emitter, should we just install it globally using npm install -g?

My concern is if we install emitter globally, the dependecies (typespec compiler and others) would also be updated when we run this script in our dev machine. I don't think it is a good practice, because developers might want to install specific version of typespec compiler. I really don't want to contaminate the global place, and really don't want the place where I'm fetching the emitter to be contaminated (I mean it is possible we could install globally at any other place, right?)

@mikeharder
Copy link
Member

@pshao25

My concern is if we install emitter globally, the dependecies (typespec compiler and others) would also be updated when we run this script in our dev machine.

Is this script intended for use both in pipelines and on dev machines? If so, I agree this adds a lot of complications. Installing things globally in pipelines is usually fine, as long as it works for the duration of the pipeline. But we do want to be more careful on dev machines.

I think it's fairly unusual for scripts under tools/eng/common to be used in both pipelines and dev machines.

@weshaggard: Are you aware of any other scripts under tools/eng/common used on dev machines?

@weshaggard
Copy link
Member

@weshaggard: Are you aware of any other scripts under tools/eng/common used on dev machines?

Yes there are other scripts under eng/common (e.g. prepare-release.ps1) and if we need these scripts to be in all the repos I think it should be under eng/common for either local or pipelines.

@weshaggard
Copy link
Member

For my earlier question about npm ci I agree while we don't have a lock file we cannot use that and I agree we should try to use shrinkwrap instead to help get these tools installed.

@pshao25
Copy link
Member Author

pshao25 commented Mar 16, 2023

Is this script intended for use both in pipelines and on dev machines?

Yes.

@mikeharder @weshaggard @benbp @lmazuel Thank you all for the suggestion. I will track the npm-shrinkwrap.json in another issue and notify other languages.

If there is no further concern, I would have this merged after "cadl rename" finished.

@m-nash
Copy link
Member

m-nash commented Mar 16, 2023

Definitely should not globally install it would mess with all other development / testing constantly.

@benbp
Copy link
Member

benbp commented Mar 23, 2023

Following up here with a response to your email @pshao25 regarding parallel npm installs powershell locking. I think it would be better to investigate ways to solve this via npm and the project structure rather than adding more script logic on top of npm. For example, could workspaces or even manual npm link commands work? Are the child directory package.json files different and if not, could we instead only keep a single package.json in the relevant parent directory (this way npm install would bubble up)?

@pshao25
Copy link
Member Author

pshao25 commented Mar 24, 2023

For example, could workspaces or even manual npm link commands work?

No. We will still face the same problem when two projects start to install packages in parellel. For each project (which are independent and don't know each other), it calls "TypeSpec-Project-Generate.ps1" to 1) install npm packages and 2) generate SDK by this package. The problem here is: the package is not installed in advance, it is installed when we call "TypeSpec-Project-Generate.ps1". Therefore, no matter we are mono repo or not, as long as we install the pacakge in the same place, we are facing the process race issue.

Are the child directory package.json files different

No, they are the same.

and if not, could we instead only keep a single package.json in the relevant parent directory (this way npm install would bubble up)?

Yes, that is exactly what we are doing in this PR. We try to install the package in the parent directory and hope it is only installed once. Then it comes to the problem we are facing "two processes might install together". And we want to hear the suggestion from you if it is OK to apply a file lock implemented in this PR.

@pshao25
Copy link
Member Author

pshao25 commented Mar 31, 2023

@m-nash @weshaggard @benbp I removed the lock and add an $env:SKIPINSTALL to control the installation. The extra steps we need to take is: each pipeline should add this variable and a step to install the package itself. Please let me know if this solution is OK to go.

$installedVersion = GetNpmPackageVersion $installedPath
Write-Host "Installed emitter version: $installedVersion"

$replacementVersion = GetEmitterDependencyVersion $replacementPackageJson
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass in the emitterName here instead of having the function called again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which function do you think I'm calling twice? Here I want to get two versions of the emitter package. One is the already installed one under root folder, the other is the going-to-be installed one in emitter-package.json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$emitterName = &$GetEmitterNameFn You are calling this on line 50 and then again inside of the GetEmitterDeependecyVersion function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see you setting emitterName anywhere. Also you will need to pass it as a parameter here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It set globally in Line 122 too.

@@ -36,17 +108,9 @@ function NpmInstallForProject([string]$workingDirectory) {
Remove-Item -Path "package-lock.json" -Force
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also want to remove the places we are deleting the files here? I don't think we want to delete those unless we actually do an npm install.

Copy link
Member Author

@pshao25 pshao25 Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do that in case user installs emitter under the project folder. If we don't do that, node will not load the emitter installed under root. It will load the emitter installed under the project folder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If files like package.json are present under this folder do we think that folks have it committed to the repo? If so then deleting it will cause a change in the files. I think we might actually want to warn people instead of blindly deleting these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This place $workingDirectory is in the gitignore. It is actually {root}\sdk\anomalydetector\Azure.AI.AnomalyDetector\TempCadlFiles\AnomalyDetector for example.

The scenario I'm thinking is: before this change, there will be package.json and npm_modules in the $workingDirectory. Then users apply this change, and they should load packages from root folder. So I want to remove the package.json and npm_modules existing before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is under the temp working directory why would there ever be a package.json and node_modules files under that directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think about these steps:

  1. Before this change, people run TypeSpec-Project-Generate.ps1. Then in $workingDirectory there are package.json and npm_modules.
  2. We get this change merged. package.json and npm_modules are still there in $workingDirectory.
  3. People run TypeSpec-Project-Generate.ps1 again, which will call typespec compile from npm_modules in $workingDirectory instead of from npm_modules in root folder. And that is a critical problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the problem we are worried about we should just have folks remove the entire temp working directory (i.e. git clean -xdf). I don't think we should try and special case this one upgrade scenario.

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:







You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:







You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow


$hasPackageFile = Test-Path "package.json"

$npmInstallCommand = "npm install --prefix $root $emitterName@$replacementVersion --no-package-lock --omit=dev"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we start installing as a specific package like this we are making the assumption that there are no other packages in the emitter.package.json file. @raych1 @lmazuel is that assumption valid? If it is perhaps we should avoid any confusion by instead not having package.json file and simply track the package name and version instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the JS has an additional dependency currently, see https://github.com/Azure/azure-sdk-for-js/blob/main/eng/emitter-package.json. @MaryGao do you know if that extra dependency is needed or will it be included as a dependency of the emitter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no preference between these two. I did this change because @lmazuel mentioned above emitter is a tool not a library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make an explicit decision to make this change as it has larger implications on other things like other dependencies. I don't want to change to this unless we have thought it through and switched to that strategy everywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weshaggard , it's not aligned for all language emitters. I would suggest to use package.json for installation rather than install emitter individually. we could drop package.json to sdk folder to avoid conflicts with package.json in root folder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree we should stick with a package.json file and I believe we can put it under the tempTypeSpec directory it just needs to live under the directory structure of wherever we put the TypeSpec project.

Copy link
Member

@raych1 raych1 Apr 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weshaggard , the purpose of @pshao25 is to avoid multiple times installation for emitter. Put package.json under tempTypeSpec directory doesn't work and it should be put under upper parent folder like sdk folder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fair we do need to put it in some shared location.

Copy link
Member

@m-nash m-nash May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weshaggard the design was that we didn't have any additional dependencies in the emitter-package.json. This simulates exactly what a customer might do if they start with a language emitter. They will simply npm install our emitter and that should have ALL necessary dependencies defined in order to run. Since they are required to be defined in the emitters package.json there is no need to duplicate them in the emitter-package.json file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will accommodate the devDependency pinning and emitter-package-lock.json work we're doing. We would prefer to npm ci using emitter-package.json and emitter-package-lock.json rather than explicit version installation.

if ($LASTEXITCODE) { exit $LASTEXITCODE }

if ($hasPackageFile) {
git restore package.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we stick with the npm install $emitterName@$replacementVersion they I assume you don't need to worry about the package.json any longer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, npm install will still modify/create package.json. This is how npm install works:

  1. Look for a package.json from current folder up to outer folders until it finds one. Then it installs packages in that folder. In build agent, there are some folders up to repo root folder that contains package.json. So I add --prefix $root to make sure the installation will only happen in repo root folder.
  2. When it installs packages, it modifies the package.json to add that package if there is a package.json, otherwise it creates a new package.json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case then that is all the more reason for use to continue to use the npm install with the emitter.package.json file as we don't want package.json files to be created or modifled based on that install command.

@m-nash
Copy link
Member

m-nash commented May 30, 2023

@m-nash @weshaggard @benbp I removed the lock and add an $env:SKIPINSTALL to control the installation. The extra steps we need to take is: each pipeline should add this variable and a step to install the package itself. Please let me know if this solution is OK to go.

Can you detail out the dev experience when running things locally?

It sounds like we would need to do the following.

  • copy the emitter-package.json to the root of the lang repo
  • run npm install on this package.json
  • Set this env variable SKIPINSTALL
  • reload the env variables for all tools / shells that might use that env variable but cache these
  • run our sync / generate commands

This seems overly cumbersome for a local dev experience. Or are we assuming that locally you will never get this optimization to take effect?

Instead of an env variable have we considered an optional command line argument?

@weshaggard
Copy link
Member

@m-nash @weshaggard @benbp I removed the lock and add an $env:SKIPINSTALL to control the installation. The extra steps we need to take is: each pipeline should add this variable and a step to install the package itself. Please let me know if this solution is OK to go.

Can you detail out the dev experience when running things locally?

It sounds like we would need to do the following.

  • copy the emitter-package.json to the root of the lang repo
  • run npm install on this package.json
  • Set this env variable SKIPINSTALL
  • reload the env variables for all tools / shells that might use that env variable but cache these
  • run our sync / generate commands

This seems overly cumbersome for a local dev experience. Or are we assuming that locally you will never get this optimization to take effect?

Instead of an env variable have we considered an optional command line argument?

I don't think we are depending on any environment variables any longer. @pshao25 has switched to a model of simply looking to see if it is already installed and if so skip install otherwise install. I believe that should work just fine for the local dev scenarios as well.

@github-actions
Copy link

Hi @pshao25. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@github-actions github-actions bot added the no-recent-activity There has been no recent activity on this issue. label Nov 10, 2023
@github-actions
Copy link

Hi @pshao25. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing /reopen if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the no-recent-activity label; otherwise, this is likely to be closed again with the next cleanup pass.

@github-actions github-actions bot closed this Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-recent-activity There has been no recent activity on this issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.