Initial Deployment Issues #488

Closed
raffertyuy opened this issue Oct 11, 2023 · 27 comments
Labels: deployment (Issues related to deploying Chat-Copilot)

@raffertyuy

This may not be a bug, but I'm having trouble deploying the app from scratch. (Note: I was able to deploy successfully a month ago, so I can confirm my AAD configurations are working).

Help?

Attempt 1: Error using ./deploy-azure.ps1
This is my command

./deploy-azure.ps1 -Subscription {VALUE} -DeploymentName sk-chatcopilot-20231010codebase -AIService AzureOpenAI -AIApiKey {VALUE} -AIEndpoint "https://resource.openai.azure.com/" -BackendClientId {VALUE} -FrontendClientId {VALUE} -TenantId common -ResourceGroup razcopilot-rg -Region eastus -WebAppServiceSku S1

This is the error message that I'm getting

{"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/7308e0b7-489d-4f8b-80b7-832b0662d47d/resourceGroups/razcopilot-rg/providers/Microsoft.Resources/deployments/sk-chatcopilot-20231010codebase","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"BadRequest","target":"/subscriptions/7308e0b7-489d-4f8b-80b7-832b0662d47d/resourceGroups/razcopilot-rg/providers/Microsoft.Resources/deployments/sk-chatcopilot-20231010codebase","message":"{\r\n  \"Code\": \"BadRequest\",\r\n  \"Message\": \"Encountered an error (InternalServerError) from host runtime.\",\r\n  \"Target\": null,\r\n  \"Details\": [\r\n    {\r\n      \"Message\": \"Encountered an error (InternalServerError) from host runtime.\"\r\n    },\r\n    {\r\n      \"Code\": \"BadRequest\"\r\n    },\r\n    {\r\n      \"ErrorEntity\": {\r\n        \"Code\": \"BadRequest\",\r\n        \"Message\": \"Encountered an error (InternalServerError) from host runtime.\"\r\n      }\r\n    }\r\n  ],\r\n  \"Innererror\": null\r\n}"}]}}
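
The payload above embeds a second JSON document, escaped as a string, inside `details[].message`, which makes the underlying cause hard to read. A minimal Python sketch for unwrapping it, assuming the error text has been saved as a string; the function name `flatten_arm_error` is illustrative and not part of the deployment scripts:

```python
import json


def flatten_arm_error(raw: str) -> list[str]:
    """Collect "code: message" lines from an ARM error payload, outermost first.

    Assumption: nested errors appear as an "error" object, a list under
    "details"/"Details", an "ErrorEntity" object, or a JSON document
    embedded in a message string (as in the payload above).
    """
    found: list[str] = []

    def visit(node) -> None:
        if isinstance(node, list):
            for item in node:
                visit(item)
            return
        if not isinstance(node, dict):
            return
        # ARM uses lower-case keys; the embedded document uses Pascal case.
        code = node.get("code") or node.get("Code")
        message = node.get("message") or node.get("Message")
        if isinstance(message, str):
            try:
                embedded = json.loads(message)
            except ValueError:
                embedded = None
            if isinstance(embedded, (dict, list)):
                visit(embedded)  # the message was itself JSON: descend into it
            else:
                line = f"{code}: {message}" if code else message
                if line not in found:
                    found.append(line)
        for key in ("error", "details", "Details", "ErrorEntity"):
            visit(node.get(key))

    visit(json.loads(raw))
    return found
```

Printing the returned lines one per row shows the outer `DeploymentFailed` message followed by the inner host-runtime message, without the escaped `\r\n` noise.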

Attempt 2: Using Deploy to Azure from this page

API is running
image

I believe the Static Web App (SWA) frontend is now hosted at the same endpoint as the API, but nothing loads...
image

I also tried the SWA resource, which is in the resource group, but nothing is deployed to it
image
image

@threegitty350

I have also encountered deployment issues. They began several weeks ago: the app can neither be updated nor deployed from scratch to Azure. I have also hit the issue where the deployment could not find the Bing Search resource. After commenting out that part of the deployment, manually deploying the resource, and configuring the key, the deployment goes through, but the app fails to start and shows the same screen as OP. Eventually it decided to start (not sure how, but it did after the weekend), but when I click on Sign in with Microsoft it returns the error "AADSTS900144: The request body must contain the following parameter: 'client_id'."

Any thoughts? I could not get a good deployment since Thursday the 21st I believe.

@TaoChenOSU TaoChenOSU self-assigned this Oct 11, 2023
@TaoChenOSU TaoChenOSU added the deployment Issues related to deploying Chat-Copilot label Oct 11, 2023
@TaoChenOSU
Collaborator

Hello @raffertyuy,

Thank you for opening the issue!

Our main branch is evolving quickly, and it is not guaranteed to be stable. Please use the deploy-to-azure button to deploy a stable version of Copilot Chat. Please note that if you deploy to an existing resource group, you may still see resources that are no longer needed as deploying doesn't wipe out existing resources.

@TaoChenOSU
Collaborator

Hi @threegitty350,

Could you please open a new issue you encountered with the Bing Search resource?

@glahaye
Collaborator

glahaye commented Oct 11, 2023

@raffertyuy Notwithstanding what Tao wrote above, the deployment script shouldn't fail. I'll take a look into this.

@glahaye glahaye self-assigned this Oct 11, 2023
@glahaye
Collaborator

glahaye commented Oct 12, 2023

OK. There are a few things to unpack here...

First, I'll assume you have the required permissions to deploy all the resource types needed for a Chat Copilot deployment.

One way to get more information on attempted deployments is to add the -DebugDeployment switch at the end of the deploy-azure script. This will cause the deployment of the underlying ARM template to display hopefully more useful information.

Now onto the deployments themselves... As Tao mentioned, the code has changed a lot recently and some of the binaries used to support the deployment were obsolete. I have just updated them and you should be good to go now.

As a reminder, there is no more Static Web App resource as of release 0.7 (last week). Instead, the static files are now hosted by the backend by default (though you could host them elsewhere if you wanted). So make sure you point to your Web App Service to see whether your deployment works.

The number of resources deployed has grown a lot recently, and I see that race conditions are possible and would cause the ARM template deployment to fail (especially for Application Insights-related resources). I will have to look into this.

Also, the reason you didn't get a frontend using the Deploy to Azure button is that a second step needed to be done manually after clicking the "Deploy to Azure" button (using the deploy-webapp script, which no longer exists). Simplifying this is actually one of the reasons the default hosting of the frontend files is now done from the Web App.

So, hopefully, you should be good now although the ARM template can still use some streamlining and protection against deployment race conditions.

Also, the safest way to avoid problems should any of the binaries lag again in the deployment resources would be to invoke the following scripts in this order:

.\deploy-azure.ps1
.\package-webapi.ps1
.\deploy-webapi.ps1
.\package-memorypipeline.ps1
.\deploy-memorypipeline.ps1
.\package-plugins.ps1
.\deploy-plugins.ps1

An overall script to do all this is coming next week along with the streamlining of the ARM template.
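
Until that overall script exists, the run order above can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: `pwsh` is on the PATH, the scripts are in the current directory, and the arguments each script needs (not listed in this thread apart from the deploy-azure command at the top) are supplied by the caller. The names `deploy_all` and `run_with_pwsh` are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

# Order taken from the list above: each package step precedes its deploy step.
SCRIPT_ORDER = [
    "./deploy-azure.ps1",
    "./package-webapi.ps1",
    "./deploy-webapi.ps1",
    "./package-memorypipeline.ps1",
    "./deploy-memorypipeline.ps1",
    "./package-plugins.ps1",
    "./deploy-plugins.ps1",
]


def run_with_pwsh(script: str, args: Sequence[str]) -> int:
    """Run one script through PowerShell and return its exit code.

    Assumption: `pwsh` is installed and on the PATH.
    """
    return subprocess.run(["pwsh", "-File", script, *args]).returncode


def deploy_all(
    args_by_script: dict[str, Sequence[str]],
    run: Callable[[str, Sequence[str]], int] = run_with_pwsh,
) -> list[str]:
    """Run every script in order and stop at the first non-zero exit code.

    Returns the list of scripts that completed successfully.
    """
    completed: list[str] = []
    for script in SCRIPT_ORDER:
        if run(script, args_by_script.get(script, ())) != 0:
            raise RuntimeError(f"{script} failed after completing: {completed}")
        completed.append(script)
    return completed
```

Stopping at the first failure matters here because each deploy step consumes the package produced by the step before it.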

@douglasware

You should not be pulling such problematic changes into the main branch when they don't work. Moving fast is not a polite justification. Branches are free to create. Thank you for all you're doing, but please do better. <3

@raffertyuy
Author

Thanks for the tip on script run-order. This will come in handy with my initial objectives.

Anyway, I followed @TaoChenOSU's advice and deployed through the deploy-to-azure button first. I am getting intermittent errors though...

My first deployment resulted in these errors while deploying 3 Log Analytics workspaces
image

After deleting/purging and trying again, I got a different application insights deployment error
image

Help? :)

@douglasware

douglasware commented Oct 13, 2023

@raffertyuy I think the webapi config is not getting set up right which just causes the web app to start and immediately fail. It doesn't emit any useful telemetry (that I could see) to indicate what the specific problem is. The most recent version I can get to work at all (when deployed to Azure) is c9e585d from September 19 which is the last commit before they added the new memory stuff. Running locally is fine.

@TaoChenOSU
Collaborator

Hello @raffertyuy,

Thank you for uploading the screenshot! I just deployed using the deploy-to-azure button and it worked. As @glahaye mentioned, there might be a race condition in the deployment template, I know it's not ideal, but I believe the best bet to unblock you is to try redeploying while we investigate.

@TaoChenOSU
Collaborator

Also if you are still seeing issues, could you please post the options you set in the deployment template (with the secrets hidden)?
image

@douglasware

Here are my settings. The click deployment failed the first time and worked the second time. However, as with every commit I have tried after the first memory commit, the web app crashes on startup. The last build I have seen work in Azure with my own eyes is the one from Sep 19.

image

image

@douglasware

BTW... if you figure out what this issue is, I beg of you, fix your logging and error handling to make it possible to tell instead of just falling over dead with no output. :)

@TaoChenOSU
Collaborator

Please refer to this post on how to view the logs: #423.

Could you please try deploying with memoryStore set to AzureCognitiveSearch?

@TaoChenOSU
Collaborator

TaoChenOSU commented Oct 13, 2023

I believe I have found the issue. We don't support volatile anymore. Will issue a fix soon.
image

@raffertyuy
Author

Thanks for investigating. My screenshots above were from deploying to Azure Cognitive Search as the memory store. I will try again soon and give a screenshot of my inputs... maybe after your fix :)

@TaoChenOSU
Collaborator

Task to remove Volatile and Postgres tracked by: #510

@glahaye
Collaborator

glahaye commented Oct 17, 2023

@raffertyuy The blocking aspect of this issue should be fixed now. It was due to some obsolete default settings. Please give it another try and let me know how it went. I'm not closing this issue until your deployment works!

I eliminated some sources of resource deployment race conditions but believe there is still one I need to address. I hope to have a PR for that today. In the meantime, you might see transient failures when deploying, which can be mitigated by making a second deployment attempt.
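
The "second deployment attempt" mitigation can be automated with a small retry loop. The Python sketch below is only an illustration: `deploy` stands for whatever callable starts the deployment and reports success, and the default of two attempts with a 60-second pause is an assumption, not a value recommended in this thread.

```python
import time
from typing import Callable


def deploy_with_retry(
    deploy: Callable[[], bool],
    attempts: int = 2,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `deploy` until it reports success.

    Returns the number of the attempt that succeeded (1-based) and raises
    RuntimeError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        if deploy():
            return attempt
        if attempt < attempts:
            # Pause before retrying so resources still being created can settle.
            sleep(delay_seconds)
    raise RuntimeError(f"deployment still failing after {attempts} attempts")
```

Each retry should target the same resource group without deleting anything first, so that resources created by the failed attempt are reused.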

@TaoChenOSU is also currently working on making the web search plugin optional (and I believe off by default), which will eliminate another common source of deployment problems.

@douglasware Admittedly, the past couple of weeks have shown us that we need to better deal with change velocity and how it can affect stability. We are actually looking at implementing a branching scheme more sophisticated than just putting everything in main.

As for the logging, did you check in App Insights to get visibility into the problem? Suggestions as to where / how to log are welcome since we might be tunnel-visioned in debugging a certain way which doesn't necessarily correspond to how folks would like to do so.

@glahaye
Collaborator

glahaye commented Oct 18, 2023

The web search plugin is now optional (and off by default) so that shouldn't be an issue anymore either.

@threegitty350

Deploys for me just fine now. Thank you for your amazing work! :)

@douglasware

Thanks for the great work! I'll have a go later today

@douglasware

As for the logging, did you check in App Insights to get visibility into the problem? Suggestions as to where / how to log are welcome since we might be tunnel-visioned in debugging a certain way which doesn't necessarily correspond to how folks would like to do so.

My theory that day was that it was crashing setting the middleware up and that was why turning up the app insights logging was not giving me anything to go on, i.e. it was crashing too soon to talk to app insights. I was able to use the filesystem logs and system events but the output wasn't much to go on.

@byte-rose

@douglasware Did you get an Azure deployment working? I noticed that once deployed, a few configuration settings in the Azure App Service need to be changed. Some of the environment variables are still set for local deployments: the server listens on localhost for incoming requests, when it should use the App Service URL and the corresponding port. I have also run into an error where there isn't a chat directory in the file structure, yet that path is called when you try to start a conversation.
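
One way to spot settings that still point at a local machine is to scan the exported app settings for localhost-style values. The Python sketch below assumes the settings are already available as a name-to-value dictionary; the function name and the setting names in the usage note are made up for illustration and are not Chat Copilot's actual configuration keys.

```python
from urllib.parse import urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def find_local_settings(settings: dict[str, str]) -> dict[str, str]:
    """Return the settings whose value is a URL or host:port on the local machine."""
    suspicious: dict[str, str] = {}
    for name, value in settings.items():
        # urlsplit only extracts a hostname when the value has a scheme or
        # starts with "//", so bare "host:port" values get a "//" prefix.
        candidate = value if "//" in value else f"//{value}"
        try:
            host = urlsplit(candidate).hostname
        except ValueError:
            continue  # not parseable as a URL, so not a local endpoint
        if host in LOCAL_HOSTS:
            suspicious[name] = value
    return suspicious
```

For example, `find_local_settings({"EXAMPLE_FRONTEND_URL": "http://localhost:3000"})` flags that entry, while a value such as `https://example.azurewebsites.net` is left alone. The input dictionary could be built from the JSON printed by `az webapp config appsettings list`.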

@byte-rose

Other than that, the plugins work fine, the health test is fine, and auth is okay. I might have missed something, so if you got your instance working, please let me know how.

@raffertyuy
Author

raffertyuy commented Oct 19, 2023

Thank you @glahaye and @TaoChenOSU for the latest updates. It is working for me now.

There are 2 minor issues, but easily resolved:

  1. Race condition errors. Workaround: redeploy (without deleting anything).
  2. Deployment did not automatically update my Entra ID App Registration with the new redirect URI. I believe this was done automatically before. Anyway, easy manual step after deployment.

Feel free to close this issue if you think these should be tracked separately.

@glahaye
Collaborator

glahaye commented Oct 19, 2023

@raffertyuy I have a potential fix for the race condition which I am testing now.

As for the App Registration, you can turn it on by using the -EnsureUriInAppRegistration flag with the deploy-webapi script. Your experience suggests this should be the other way around: done by default with the possibility to skip. I will make that change.

@glahaye
Collaborator

glahaye commented Oct 20, 2023

@raffertyuy -EnsureUriInAppRegistration is no longer needed (and in fact no longer exists).

By default now, just invoking the deploy-webapi script takes care of all that's needed.

Also, I'll have a PR for the race condition shortly.

@glahaye
Collaborator

glahaye commented Oct 23, 2023

Closing this issue.

Opened #539 to track deployment race conditions specifically.

@glahaye glahaye closed this as completed Oct 23, 2023
@github-project-automation github-project-automation bot moved this from In progress to Done in Apps & Services Semantic Kernel Oct 23, 2023
6 participants