
Cold start of Azure Function with large npm module trees is slow in Consumption plan #298

Closed
arafato opened this issue Apr 25, 2016 · 127 comments

@arafato

arafato commented Apr 25, 2016

Hi--,

I've noticed that the initial call of an Azure Function (i.e. a couple of minutes after the last call) takes a significant amount of time (up to 15 secs). Is there any way to pre-warm a function other than initially calling it? Will the startup performance be improved in the near future? I'm using it on the Dynamic plan. Thanks!

Best regards
--Oliver.

@davidebbo
Contributor

Currently, there is a short idle timeout on Function apps, so if they don't get requests for 5 minutes the next request causes a cold start which takes longer. We are looking into increasing this timeout, which will make the situation much less common.

@fabiocav fabiocav modified the milestones: Current, Next Jul 19, 2016
@FlavorScape

FlavorScape commented Sep 5, 2016

I've seen delays of up to 2 minutes, or even timeouts, with an Azure Function app running Node.js on a dynamic hosting plan. This is terrible! Running Functions app version 0.5.

@arafato
Author

arafato commented Sep 5, 2016

Since upgrading to 0.5, delays have increased from 15 secs up to 50 secs on my side.

@arafato
Author

arafato commented Sep 5, 2016

I'm currently mitigating the cold-start issue with an additional pre-warming Azure Function that periodically calls the other Azure Function, like so:

let http = require('http');

module.exports = function (context, myTimer) {
    http.get('http://your.azure-function.net', (res) => {
        context.log(`Successfully called endpoint. Got response: ${res.statusCode}`);
        context.done();
    }).on('error', (e) => {
        context.log(`Error: ${e.message}`);
        context.done();
    });
};

This pre-warming function is timer-triggered, running every 4 minutes with the CRON expression 0 */4 * * * *
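For anyone replicating this, the timer side is wired up through the function's function.json. A minimal sketch of the binding for the 4-minute schedule above (the "myTimer" name matches the parameter in the code, following the Functions v1 Node.js model):

```json
{
  "disabled": false,
  "bindings": [
    {
      "name": "myTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 */4 * * * *"
    }
  ]
}
```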

@vance

vance commented Sep 5, 2016

I tried this. I have a timer calling every 4 minutes. I still get delays of up to 5 MINUTES. Here's the log of it: fine, fine, fine, TERRIBLE...

(screenshot: function log, 05 Sep 2016, 12:12:35 AM)

@vance

vance commented Sep 5, 2016

So, my classic plan app is running Function App v0.4, and it seems the debugging messages may work better? I get errors showing up that don't appear in the dynamic app (the dynamic app just hangs for long periods). Could faults in one function affect the performance of the whole app? Sometimes I even get "BAD GATEWAY".

@davidebbo
Contributor

This is strange. To make sure that this is indeed related to 0.5 and not some other factors, can you try going back to 0.4 (instructions here)?

@peterjuras

For me personally, it is not only about the cold start/idle issue, but as I wrote in #529 I think it would be good to have some transparency of the scenarios that are targeted by Azure functions.

If I understand things correctly, the startup time of a function is not only occurring on a cold start of a single container, but also affects new scaled instances if the dynamic plan wants to scale out.

This means that if I have an API HTTP triggered function, and one instance of that function is already running (let's say an execution takes 100ms), it could wait until the execution finishes and then cater the new request, leading to a maximum execution time of 200ms (which is kind of fine).

But instead, the dynamic plan could decide to scale out, and now we have the cold start penalty again. This means that instead of a maximum execution time of 200ms, this request will go above and beyond 5 seconds.

Could you explain how you are handling these cases?

@davidebbo
Contributor

@peterjuras let's keep this issue focused on the potential cold start regression which is causing cold starts of over 1 minute for some users. We can use a separate thread for a more general discussion of how the system works.

For users running into these extreme cold starts, some questions that would help the investigation:

  1. Does 0.4 vs 0.5 make a difference?
  2. What functions are in the Function App? Is it just the one JS function, or are there others? In particular, do you have any F# functions (there may be an issue there)?
  3. Can you share the name of a Function App where you saw this, and the UTC time of one specific slow cold start?

@vance

vance commented Sep 8, 2016

So I did a long-running test again where I ping every 3 minutes; the response times are all under 1000 ms unless one of the scheduled jobs fails to fire. Then, after the six-minute lapse, the response time goes to a whopping 150,000 to 280,000 ms. I will try reverting to 0.4 and see what happens.

@davidebbo
Contributor

@vance so you mean you are measuring http functions, but the app happens to also have a scheduled function? What's the function's schedule? Where does 'six minute lapse' come from?

@vance

vance commented Sep 9, 2016

I mean I have a timer triggered function doing a HTTP request to my other function. It is set to 3 minutes. However, sometimes it does not fire, and six minutes elapse. The next time it runs, the execution time is extreme. Sometimes it's so slow, it creates a ripple effect where it goes from 280K to 140K to 30K or so, before it finally returns to under 1k MS for the next 10-20 calls.

@davidebbo
Contributor

That's very odd behavior. If you can share your Function App name (either directly or indirectly), and the UTC time when you saw this, we can hopefully analyse the situation.

@vance

vance commented Sep 9, 2016

Sure, here's a dummy app in the same region. fscape-dynamic3. Any function endpoint shows the extreme time in the logs, I think.

The screenshot is dated

05 Sep 2016 19:12:35 GMT

so it should be around this time.

@vance

vance commented Sep 9, 2016

So I reverted to v0.4 with similar results. Also, I don't know if it's related, but sometimes the "pinged" services even time out, or even issue a CORS error, while other requests work. I tried just having a local script run a query every 10 minutes. See that result in the second image. No server changes happened at this time.

(first image: response-time log)

(second image: response-time log)

@dennis-yemelyanov

dennis-yemelyanov commented Sep 10, 2016

I'm seeing similar symptoms. Although this might be an unrelated issue, in my case most of the time seems to be spent loading the azure-storage library. Any ideas if this is normal or how this can be fixed?

context.log('requiring azure');
const azure = require('azure-storage');
context.log('required azure'); 

2016-09-10T21:24:46.626 requiring azure
2016-09-10T21:27:04.641 required azure

@vance

vance commented Sep 10, 2016

Actually, this might explain a lot of the delay. I think it may be multiplied if you hit a bunch of functions simultaneously when coming out of sleep. Maybe it pegs the I/O for the VM. I do know that this dependency seems rather large...

I have no idea how this virtualization is engineered, but I wonder if there's some way to snapshot these static resources as globals in a shared memory location that's immutable. Just a thought, don't know if that's even possible.

@dennis-yemelyanov

Yeah, keeping it in memory might not be possible in some scenarios (at least not within the host process, which might restart, etc.). I think the main question, though, is why does even a fresh initialization take so long? azure-storage indeed has tons of dependencies (I was shocked when I first saw it; way too many for a library that effectively is just a simple REST API wrapper). But when I run the same code locally (in a new Node process) it executes instantly without any issues, which makes me think it could be an Azure runtime issue...

@vance

vance commented Sep 12, 2016

Wonder if it's the network or storage used. Maybe when these instances come back from idle, the data must get loaded over the network since they are not bound to any particular hardware (or VM)? I did not see these long startup times when hitting Classic Service Plan app (with AlwaysOn turned off).

@arafato
Author

arafato commented Sep 13, 2016

Good point @vance. Deleting an "average" node_modules folder on a dynamic plan takes well over 5 minutes in many cases. IOPS seems to be very low...

@vance

vance commented Sep 27, 2016

Any way to have the container manager or whatever app pool manager cache the resources on the container disk, so they at least come back to life without going over the network? ... at least for the first instance.

@securityvoid

Okay, I'm having some issues with the Webpack technique and I'm wondering what other people are doing.

I've mentioned this before, but I created a build script. It goes through each function's index.js and extracts every require. These then go into a single, central JavaScript file like @1N50MN14 mentioned earlier.

e.g.
global.deps = {
    package1: require('package1'),
    package2: require('package2'),
}

The final file is WebPacked and uglified, each index.js is set to include this file, and the references for each library are updated to reference the appropriate require.

e.g.
var myvar = global.deps['package1'];

The problem I'm running into is that the final JavaScript file, while only 700kb, is large enough that if I add it to a VERY simple test function, the cold start takes longer than 5 minutes. Since I'm on a consumption plan, you can't set the timeout above 5 minutes.

For example, the following will take over 5 minutes to run:

'use strict';
require('D:\\home\\site\\repository\\azure.deps.js');

module.exports = function(context){
    context.res =  {
        status: 200,
        headers: {
            "Content-Type" : "application/json"
        },
        body : { "status" : "alive"}
    }
    context.done();
}

Now, in the above example I don't even need the require, so it can be taken out. However, just as an example, if I drop the require it of course runs really quickly. I'm sure that if I ever got it cold started, the warm function would not have any issue with the 5 minutes. However, with the 5-minute timeout it just dies before it can ever get warm.

What I don't understand is I don't think my dependency list is too terribly long:

  "dependencies": {
    "crypto": "0.0.3",
    "dotenv": "^4.0.0",
    "fs": "0.0.1-security",
    "https": "^1.0.0",
    "jsonwebtoken": "^7.3.0",
    "jwk-to-pem": "^1.2.6",
    "lodash": "^4.17.4",
    "path": "^0.12.7",
    "q": "^1.4.1",
    "xml2js": "^0.4.17",
    "xmlbuilder": "^8.2.2"
  }

As a result, it seems like there should be some way to get this to work; and in fact, if anyone is getting anything to run, I feel like I should be able to get my code to run.

@1N50MN14

1N50MN14 commented Mar 1, 2017

@securityvoid 700k of uglified code sounds like a lot; are you sure you're doing it right? path, fs, crypto, and https all sound like native Node modules, which you don't need to include in your bundle. Also, did you make sure your node_modules folder is either in .gitignore or completely deleted? This has been the reason for cold start being too slow. To be on the safe side, double-check in the portal that node_modules does not exist in your bundle directory (or other directories where you don't need it) and that your git integration is properly in sync.

My entry file (named index.src.js) looks something like this:

global.deps = {
  Promise: require('bluebird'),
  jwt: require('jsonwebtoken'),
  uuid: require('uuid')
}

Then I have a shell script (that would be a .bat on Windows, I guess?) that I use to bundle and minify, which looks like this:

rm index.bundle.min.js
rm index.bundle.js
npm install
browserify --node index.src.js > index.bundle.js
uglify -s index.bundle.js -o index.bundle.min.js 
rm -rf ./node_modules

And in my function code I use the deps like so:

require('../SDK/index.bundle.min')

var Promise = global.deps.Promise;
var uuid = global.deps.uuid;

module.exports = function(context)
{
  //code
}

Hope some of this helps ;)

@securityvoid

@1N50MN14 I'll go back and double-check all the dependencies to be sure, but I think I have it all set up right. I can tell you that I'm gitignoring node_modules and that the code runs fine locally. As a result it should be right.

Since I'm doing this as a deploy script my code actually has the "normal" requires prior to the script running that does all the updates.

Only the optimized code ever makes it into the wwwroot (no node_modules).

@securityvoid

securityvoid commented Mar 1, 2017

@1N50MN14 I went back and did a print-out of the JSON Stats from Webpack to see everything that was being packed up. It looks like the ~700k is correct. A good 3/4 of which are associated with the jsonwebtoken package. That package does do a lot, so it sort of makes sense.

@davidebbo Are there plans on how to handle this? I can effectively undo all the Webpacking, but then I'm still back to the slow response times.

@ricklove

ricklove commented Mar 1, 2017

@securityvoid

With my technique, I have around a 10 second cold start (which is equivalent to the average C# cold start from my tests). Some of my files are 3MB which is huge, but I don't run uglify and it doesn't seem to matter. (It seems that the cold-start issue with Azure Functions has to do with latency not bandwidth. It seems the number of file dependencies is the problem and large files don't slow it down much.)

Some observations:

  • I don't manually do anything to prune dependencies.
  • I leave everything in the original node_modules, place everything in its final position and then run webpack. This pulls in the node_modules dependencies so that it is no longer needed at all. (In my deployment folder, I have no node_modules and I have no project.json.)
  • I do all this on my machine before I deploy.
  • It takes around a minute for my build tool to run webpack on everything.
  • When I deploy, it takes a few minutes for everything to update from the time I git push.
  • Cold start time is around 10 seconds
    • I have run my cold-start measurement tests from a separate Azure function and concluded that JavaScript, C#, and F# Functions all have equivalent (within 20%) cold start times (assuming a webpack pre-built solution like this).
  • There are probably some points of improvement, but I have obtained sufficient performance for production use.

Folder structure:

  • root
    • node_modules
    • project.json
    • src (Typescript)
    • lib (Generated from src)
    • .deployments (points to deployments)
    • deployments
      • no project.json
      • [function-folder]
        • function.json
        • index.js (requires build.js)
        • build.source.js (points to entry point in lib folder)
        • build.js
          • built using webpack
          • contains all dependencies from lib and node_modules
          • largest is around 3MB (I don't use uglify)

Process:

  • Run tsc to transpile from the src folder into a lib folder in my project.
    • The code in this folder still points to the original node_modules for external require.
  • Generate a function-folder for each endpoint (this can be done manually).
    • This includes:
      • function.json
      • index.js
      • build.source.js
  • Run webpack against each build.source.js which produces a build.js
    • This now has all the code from the lib folder and all dependencies from node_modules.
  • Git push the project with the deployments folder to the production branch
    • no node_modules folder is included
    • lib folder is not needed (but I include it because my CI tests run against it)

@securityvoid

@ricklove Thanks for sharing. If you have 3MB files, the 700k definitely isn't why my stuff is so slow. I'm going to try without uglifying and see if there is a difference.

BTW, are you using the standard consumption plan?

@ricklove

ricklove commented Mar 1, 2017

@securityvoid Yes, I'm using the consumption plan.

For your case, I suspect there is something in your deployments folder that has many files that is getting loaded during cold start.

Also, I just extended my cold-start timing tests so I can directly compare a JavaScript run with no dependencies against a 3MB function that uses azure-storage.

My test runs every 12 minutes alternating which function it calls (from an external function app). So it's a good test from the perspective of a client - measuring the actual turn around time from initial call to client receipt of the response (the monitor page of the function itself only shows ~2 seconds, when this test shows ~10 seconds).

I'll give a report tomorrow.

@securityvoid

@ricklove I would think that if it were something generic in my deployments folder, the function would not behave differently whether I had that require or not.

This times out every time:

'use strict';
require('D:\\home\\site\\repository\\azure.deps.js');

module.exports = function(context){
    context.res =  {
        status: 200,
        headers: {
            "Content-Type" : "application/json"
        },
        body : { "status" : "alive"}
    }
    context.done();
}

This runs every time:

'use strict';

module.exports = function(context){
    context.res =  {
        status: 200,
        headers: {
            "Content-Type" : "application/json"
        },
        body : { "status" : "alive"}
    }
    context.done();
}

It doesn't make any sense for me to have issues with a 700k file when you have a 3MB file, but it seems like the file itself is somehow the issue. I suppose the alternative is that the file is just the last little bit that pushes it over the 5-minute mark.

In any case, removing uglify did not really help.

@securityvoid

@ricklove Arg! I think I just noticed my own typo.

If you notice in the file path of the require is "repository". That is the non-webpacked version of the file. The webpacked version is in the wwwroot.

Since the file existed, it didn't error out, and in fact would work... given enough time.

I'm redeploying now to confirm that solves things.

@phillipharding

@davidebbo @1N50MN14 I've made the Browserify technique work without the use of global...

The shared/library module is as follows...

/**
 * ./library/shared.js -->
 * export your dependencies here
 */
module.exports = {
	request: require('request'),
	requestPromise: require('request-promise'),
	common: require("./common"),
};

And the Function code file is...

/**
 * ./functionname/index.js -->
 */
var shared = require("../library/shared.min");

function functionHandler(context, req) {
	context.log('FuncImpersonate01 HTTP trigger function received a request.');
	shared.common.logRequestHeaders(context, req);

	context.res = {
		status: 200,
		headers: { "Content-Type": "application/json" },
		body: {
			Success: true,
		},
	};
	context.done();
};
module.exports = functionHandler;

The Build/Browserify commands are...

browserify --node ./library/shared.js --standalone shared -o ./library/shared.bundle.js
uglifyjs ./library/shared.bundle.js -o ./library/shared.min.js

@davidebbo
Contributor

@phillipharding thanks for sharing!

As an aside, note that we've added support to automatically run funcpack on Kudu git deployments. To enable it, simply set an Azure App Setting: SCM_USE_FUNCPACK=1. In theory this should just work, but it's still new, so feedback is welcome (https://github.com/Azure/azure-functions-pack is a good place for it).

@1N50MN14

@phillipharding Awesome, thanks for sharing this! It seems that browserify's --standalone did the trick!

@bakatz

bakatz commented Jan 8, 2018

I've been running into this issue for some time now, and it's very frustrating, as I'm seeing 60-second startup times on a regular basis. I'll give azure-functions-pack a shot at some point; it seems like a decent workaround for the short term, but is there anything on the roadmap to address the root cause (i.e. the underlying file system being prohibitively slow)?

Specifically, I'm using the Slack Events API, and if Slack determines that a request to my API takes more than 5 seconds to complete (which happens regularly whenever whatever orchestrates Azure Functions decides to spin down our API), it'll automatically be retried over and over. This behavior, in combination with Functions' filesystem issues, means I can't really work with Slack's API without a hack that ignores "retry" messages based on a certain request header value, which isn't ideal for various reasons.

Thanks

@adamdriscoll

I'm having the exact same issue with the Shopify Webhook API. It will retry if the request doesn't finish in 5 seconds. I usually get about 3-4 requests before one returns in time for it to stop trying.

@murraybauer

Microsoft needs to fix this ASAP. Node.js runs fine on AWS Lambda. It's disappointing when you see articles like this: seven reasons why so many Node.js devs are excited about Microsoft Azure.

@tony-gutierrez

Forget the webpack flag; a huge number of modules won't compress properly with it. So there's basically still no solution, and it's a pretty useless product if you want serverless Node.

@bakatz

bakatz commented Feb 8, 2018

@tony-gutierrez agree - this is something I would have expected Azure Functions to launch with, not something that still isn't fixed a full year after the original issue was filed. 🤔

@tony-gutierrez

tony-gutierrez commented Feb 8, 2018

We started with Functions, trying to use the consumption plan. Not even close to working. Then we ran on the App Service plan for a while, and at least it worked. But then we realized we basically had none of the benefits of a serverless host, plus all the horrid things Function apps incur (like no access to console.log, super slow storage, not being easy to develop locally, etc.). We did a quick conversion to a normal Node app and deployed it to a VM and an App Service. The perf increase is pretty major on both.

@bhosale-ajay

We used webpack to bundle the npm module dependencies into a single file to reduce the cold-start time; see this Medium article.

@davidebbo
Contributor

See also the newly announced Run-From-Zip feature, which also helps with cold start.
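For anyone trying it, Run-From-Zip is driven by a single app setting on the Function App; a minimal sketch (the package URL here is a placeholder for wherever you uploaded the zip):

```
WEBSITE_RUN_FROM_ZIP = https://<your-storage-account>.blob.core.windows.net/packages/functionapp.zip
```

The app then runs directly from the mounted zip package instead of thousands of individual files on the slow shared filesystem, which is why it helps cold start.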

@davidebbo
Contributor

davidebbo commented Jun 13, 2018

Closing this issue as Run-From-Zip is working well enough for most people (and some others are using funcpack). It doesn't mean that there can't be more optimizations, but we can always track specific work separately, as this issue is too open-ended.

@ghost ghost locked as resolved and limited conversation to collaborators Jan 2, 2020