
Rework Caching Guide #1436

Merged
merged 8 commits into from
Jul 24, 2017

Collaborator

@skipjack commented Jul 17, 2017

Trying to combine the current caching guide with @timse's post and make it a bit easier to read with clear, fully fleshed-out examples. I'll be submitting a PR to @TheDutchCoder's examples repo soon as well for the small test project I created in relation to this.

Resolves #652 (hopefully -- and hopefully a bunch of the related open issues on the main repo)
Part of #1258 (will update that now)

This will still need much review and potentially another section or two to
cover all the bases.
Collaborator Author

@skipjack commented Jul 17, 2017

@TheDutchCoder this should be ready for an initial review.

@timse I used your post in combination with the content that @okonet and @jouni-kantola already had here. Using the HashedModuleIdsPlugin allowed me to avoid (I think) certain steps in the post you wrote. I still need to test using import(), externals and a few other things, so another section or two may be warranted. It seems you have a lot of experience on the subject, so if you could help review that would be great.

@okonet @jouni-kantola I can't assign you, but if either of you can review as well that would be much appreciated.

Everyone should note that I also have a PR open at TheDutchCoder/webpack-guides-code-examples#17 that directly reflects the final version of the files/code in the guide. This should be a good testing ground for us to figure out exactly what does and doesn't mess up hashes. I want to keep things fairly straightforward so the guide doesn't go on forever, so maybe leaving some rare edge cases out is ok, but we should definitely try to cover all common use cases.

Collaborator Author

@skipjack commented Jul 17, 2017

I think once we get this to a good state, we should locate all related issues on the main repo (here I think it's only #652), see if they are fully or almost fully answered by the guide updates, and close them with a note saying that people should read through the guide and, if any issues arise, create a targeted issue describing what needs to change.

As per @jouni-kantola's point in #652, it would be great if we could get to one source of truth for this documentation. Then from there, hopefully the @webpack/core-team can continue to improve on the hashing discrepancies to simplify the whole process in further webpack releases.

Collaborator

@TheDutchCoder left a comment

Just some minor comments from my end.

Other than that this is looking 💯

Note: for some reason it shows some comments as "outdated", but they're not (as far as I can tell), just a heads up!

@@ -22,7 +22,7 @@ related:

T> This examples in this guide stem from [getting started](/guides/getting-started), [output management](/guides/output-management) and [code splitting](/guides/code-splitting).
Collaborator

S/This examples/The examples

@@ -22,7 +22,7 @@ related:

T> This examples in this guide stem from [getting started](/guides/getting-started), [output management](/guides/output-management) and [code splitting](/guides/code-splitting).

- So we're using webpack to bundle our modular application, deploying our `/dist` directory to the server, and clients, typically browsers, are hitting that server to grab the site and its assets. The last step can be time consuming, which is why browsers use a technique called [caching](). This allows sites to load faster with less unnecessary network traffic, however it can also cause headaches when you need new code to be picked up.
+ So we're using webpack to bundle our modular application, deploying our `/dist` directory to the server, and clients, typically browsers, are hitting that server to grab the site and its assets. The last step can be time consuming, which is why browsers use a technique called [caching](https://en.wikipedia.org/wiki/Cache_(computing)). This allows sites to load faster with less unnecessary network traffic, however it can also cause headaches when you need new code to be picked up.
Collaborator

Would rephrase this "...directory to the server, and clients, typically browsers, are hitting that server..." so it flows a little better.

Collaborator Author

👍

@@ -241,13 +241,13 @@ runtime.1400d5af64fc1b7b3a45.js 5.85 kB 2 [emitted] runtime
+ 1 hidden module
```

- ... we can see that all three have. This is because the [`module.id`]() of each is incremental by default, so...
+ ... we can see that all three have. This is because the [`module.id`](/api/module-variables#module-id-commonjs-) of each is incremental by default, so...

- The `main` bundle changed because of it's new content.
Collaborator

S/it's/its

@@ -241,13 +241,13 @@ runtime.1400d5af64fc1b7b3a45.js 5.85 kB 2 [emitted] runtime
+ 1 hidden module
```

- The `vendor` bundle changed because it's `module.id` was changed.
Collaborator

S/it's/its
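
A toy model may help show why the `vendor` hash moves even though its code did not change. The sketch below is not webpack's actual implementation; `assignIncrementalIds`, `assignHashedIds`, and the module paths are hypothetical names chosen for illustration. It assumes ids are handed out in resolution order, as discussed in this review, and contrasts that with ids derived from the module path alone.

```javascript
const crypto = require('crypto');

// Toy model (assumption): ids are handed out in the order modules are resolved.
function assignIncrementalIds(modulesInResolveOrder) {
  const ids = {};
  modulesInResolveOrder.forEach((modulePath, index) => {
    ids[modulePath] = index;
  });
  return ids;
}

// Toy model (assumption): ids are derived from the module path alone,
// so they do not depend on what else is part of the build.
function assignHashedIds(modulePaths) {
  const ids = {};
  modulePaths.forEach((modulePath) => {
    ids[modulePath] = crypto
      .createHash('sha256')
      .update(modulePath)
      .digest('hex')
      .slice(0, 4);
  });
  return ids;
}

const lodash = './node_modules/lodash/lodash.js';
const before = ['./src/index.js', lodash];
// In the second build a new local module is resolved before lodash.
const after = ['./src/index.js', './src/print.js', lodash];

// Incremental ids: lodash moves from 1 to 2, so the vendor bundle's content changes.
console.log(assignIncrementalIds(before)[lodash], assignIncrementalIds(after)[lodash]);

// Path-derived ids: lodash keeps the same id in both builds.
console.log(assignHashedIds(before)[lodash] === assignHashedIds(after)[lodash]);
```

The second approach is the idea behind the plugins discussed in this thread: an id that depends only on the module itself stays stable when unrelated modules are added.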


- The `main` bundle changed because of it's new content.
- The `vendor` bundle changed because it's `module.id` was changed.
- And, the `runtime` bundle changed because it now contains a reference to a new module.

- The first and last are expected -- it's the `vendor` hash we want to fix. Luckily, there are two plugins that can help us out with this dilemma. First, the [`NamedModulesPlugin`]() which the path to the module rather than a numerical ID. While this plugin is useful during development for easier to read output, it does take a bit longer to run. The second option is the `HashedModuleIdsPlugin`, which is what we'll use as these examples are more targeted toward production builds:
+ The first and last are expected -- it's the `vendor` hash we want to fix. Luckily, there are two plugins that can help us out with this dilemma. First, the [`NamedModulesPlugin`](/plugins/named-modules-plugin) which the path to the module rather than a numerical ID. While this plugin is useful during development for easier to read output, it does take a bit longer to run. The second option is the `HashedModuleIdsPlugin`, which is what we'll use as these examples are more targeted toward production builds:
Collaborator

"...which the path..." seems like a word is missing here.

Collaborator Author

👍


Collaborator

Link to HashedModuleIdsPlugin?
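
For context, wiring up the `HashedModuleIdsPlugin` mentioned in the guide text might look like the sketch below. The entry path is an assumption, and the rest of the configuration (output, other plugins) is omitted for brevity.

```javascript
// webpack.config.js (sketch; the entry path is an assumption)
const webpack = require('webpack');

module.exports = {
  entry: './src/index.js',
  plugins: [
    // Derive module ids from each module's path rather than from the
    // order in which modules are resolved.
    new webpack.HashedModuleIdsPlugin()
  ]
};
```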


## The problem
A simple way to ensure the browser picks up changed files is by using `output.filename` [substitutions](/configuration/output#output-filename). The `[hash]` substitution can be used to include a build-specific hash in the filename, however it's even better to use the `[chunkhash]` subsitution which include a bundle-specific hash in the filename.
Collaborator

S/include/includes

```js
// webpack.config.js
var ChunkManifestPlugin = require("chunk-manifest-webpack-plugin");
// Lodash, currently included via a script, is required for this line to work
Collaborator

Remove this comment (I think)


Inlining the chunk manifest and webpack runtime (to prevent extra HTTP requests), depends on your server setup. There is a nice [walkthrough for Rails-based projects](https://brigade.engineering/setting-up-webpack-with-rails-c62aea149679). For server-side rendering in Node.js you can use [webpack-isomorphic-tools](https://github.com/halt-hammerzeit/webpack-isomorphic-tools).
// Lodash, currently included via a script, is required for this line to work
Collaborator

Same here.

Collaborator Author

👍

* https://github.com/webpack/webpack/issues/1315
* https://github.com/webpack/webpack.js.org/issues/652
* https://presentations.survivejs.com/advanced-webpack/
Caching gets messy. Plain and simple. However the walk-through above should give you a running start to deploying consistent, cache-able assets. See the _Further Reading_ section below to learn more.
Collaborator

S/cache-able/cachable

Collaborator Author

Yeah, I guess it's ok to append "able" to words even if it's not an actual variation of that word. Using the `-` prevents spell checkers from throwing an error, but I guess it does read better without it.

---

To enable long-term caching of static resources produced by webpack:
T> This examples in this guide stem from [getting started](/guides/getting-started), [output management](/guides/output-management) and [code splitting](/guides/code-splitting).
Contributor

This and examples don't fit together grammatically

1. Use `[chunkhash]` to add a content-dependent cache-buster to each file.
2. Extract the webpack manifest into a separate file.
3. Ensure that the entry point chunk containing the bootstrapping code doesn’t change hash over time for the same set of dependencies.
So we're using webpack to bundle our modular application, deploying our `/dist` directory to the server, and clients, typically browsers, are hitting that server to grab the site and its assets. The last step can be time consuming, which is why browsers use a technique called [caching](https://en.wikipedia.org/wiki/Cache_(computing)). This allows sites to load faster with less unnecessary network traffic, however it can also cause headaches when you need new code to be picked up.
Contributor

I'm not sure this guide needs this intro. It is good for a blog post but I'd expect this guide to get directly to the point. Probably replace it with a link to my post and I could cross-link to this guide in the beginning to guide the traffic here?

Collaborator Author

Hmmm I see your point but I think it is helpful to give a little context before jumping into things. I did use the link so as not to explain caching in too much detail -- also updated per the last review to make it flow a little better. Maybe we can still tighten it up a bit more? @TheDutchCoder what do you think?


For an even more optimized setup:
This guide focuses on the configuration changes needed to ensure that your `output` files are cached when appropriate, but re-requested when changed.
Contributor

"This guide focuses on the configuration needed to ensure files produced by webpack compilation can remain cached unless their contents have changed."?

Collaborator Author

👍


What if we could produce the same filename, if the contents of the file did not change between builds? For example, it would be unnecessary to re-download a vendor file, when no dependencies have been updated, only application code.
This is because webpack includes certain boilerplate, specifically the runtime and manifest, in the entry chunk.
Contributor

In the previous sentence you're saying the file contents should have been changed but I think you meant to say "should not change"?

Collaborator Author

👍

}
};
```
## Extracting Boilerplate
Contributor

Extracting webpack boilerplate?

Collaborator Author

Idk I think I'd rather leave this title as is for now. Partly because the mix of lowercase and uppercase (I know this is defined in our style guide/media repo) feels a bit odd but also I think it's pretty clear based on context.

Contributor

Personally I'd prefer Extract webpack runtime. I interpret boilerplate like something that isn't variable. Also, later in the PR the asset is named runtime.

Collaborator Author

How about Extract Runtime & Libraries? I think that pretty much covers the whole section. Could also be Extract Runtime, Manifest, and Libraries but the runtime and manifest are being extracted into the same bundle so I think it's ok to just use Runtime.
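
For reference, the runtime extraction that this section title refers to can be sketched as follows. This assumes a webpack release that still ships `CommonsChunkPlugin` (it was removed in webpack 4). The chunk name `runtime` follows the asset name mentioned in this thread; the entry and output paths are assumptions.

```javascript
// webpack.config.js (sketch; entry and output paths are assumptions)
const path = require('path');
const webpack = require('webpack');

module.exports = {
  entry: './src/index.js',
  plugins: [
    // 'runtime' does not match any entry name, so the plugin moves webpack's
    // runtime and manifest into a bundle of their own.
    new webpack.optimize.CommonsChunkPlugin({
      name: 'runtime'
    })
  ],
  output: {
    filename: '[name].[chunkhash].js',
    path: path.resolve(__dirname, 'dist')
  }
};
```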

path.join(__dirname, "build", "stats.json"),
JSON.stringify(stats.toJson()));
});
Another thing we may want to extract is our core third-party libraries, in our case `lodash`, as they are less likely to change than our source code. This step will allow clients to request even less from the server to stay up to date. Let's do this using a combination of a new `entry` point along with another `CommonsChunkPlugin` instance:
Contributor

Nitpicking but thing and libraries don't match to me. What about

It's a good practice to extract third-party libraries, such as lodash or react, to a separate vendor chunk as they are less likely to change than our source code.

Contributor

I'd replace Let's do this with This can be done by

Collaborator Author

Makes sense on both counts -- will update.
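
The vendor extraction discussed in this exchange, a new `entry` point combined with a second `CommonsChunkPlugin` instance, might be configured roughly as below. This is a sketch assuming a webpack release that still ships `CommonsChunkPlugin`; the entry name `main` and the output paths are assumptions, while `vendor`, `runtime`, and `lodash` come from the guide text under review.

```javascript
// webpack.config.js (sketch; 'main' and the paths are assumptions)
const path = require('path');
const webpack = require('webpack');

module.exports = {
  entry: {
    main: './src/index.js',
    // Separate entry point listing the third-party libraries to split out.
    vendor: ['lodash']
  },
  plugins: [
    // Per the guide's warning, the 'vendor' instance is listed before
    // the 'runtime' instance.
    new webpack.optimize.CommonsChunkPlugin({
      name: 'vendor'
    }),
    new webpack.optimize.CommonsChunkPlugin({
      name: 'runtime'
    })
  ],
  output: {
    filename: '[name].[chunkhash].js',
    path: path.resolve(__dirname, 'dist')
  }
};
```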

```

Alternatively, just use one of these plugins to export JSON files:
W> Note that order matters here. The `'vendor'` instance must be included prior to the `'runtime'` instance.
Contributor

instance? Did you mean chunk?

Collaborator Author

I meant CommonsChunkPlugin instance but maybe I should specify that.

})
]
};
... we can see that all three have. This is because the [`module.id`](/api/module-variables#module-id-commonjs-) of each is incremental by default, so...
Contributor

is incremental by default => is not consistent by default?

Collaborator Author

@skipjack Jul 20, 2017

No, I did mean "incremental" in the sense that module ids are simply incremented in the order they are resolved (e.g. 1, 2, 3). However, I do see how this could be a bit confusing... I'll rephrase.

};
... we can see that all three have. This is because the [`module.id`](/api/module-variables#module-id-commonjs-) of each is incremental by default, so...

- The `main` bundle changed because of it's new content.
Contributor

Since this list is a part of the sentence, I'd make The lowercase and remove . in the end of each but last line

Collaborator Author

I see what you mean but the way I updated the sentence above makes this change unnecessary (I think).

- The `vendor` bundle changed because it's `module.id` was changed.
- And, the `runtime` bundle changed because it now contains a reference to a new module.

The first and last are expected -- it's the `vendor` hash we want to fix. Luckily, there are two plugins that can help us out with this dilemma. First, the [`NamedModulesPlugin`](/plugins/named-modules-plugin) which the path to the module rather than a numerical ID. While this plugin is useful during development for easier to read output, it does take a bit longer to run. The second option is the [`HashedModuleIdsPlugin`](/plugins/hashed-module-ids-plugin), which is what we'll use as these examples are more targeted toward production builds:
Contributor

I'd remove dilemma

Contributor

which the path to the module is missing the verb

Contributor

ID => identifier?

Contributor

for easier to read output => for simpler output or for the output that is easier to read?

@skipjack
Collaborator Author

@okonet @TheDutchCoder awesome feedback, thank you both! Just pushed the updates...

What did you think of the article in general? My feeling is that it's a solid base that's now in line with our guides, but we may have to add a few more of the oddities that can break caching, like those discussed in the various other guides we're linking to.

The issue is being thrown because of a recent change to the
uglifyjs-webpack-plugin. We'll probably need to put a more stable
fix in place (if not, that table will continue to be broken). It looks
like the cause may be multiple escaped `|` characters used within
a single cell.
Contributor

@jouni-kantola left a comment

Good job! Easy to follow. I added a couple of suggestions and questions.

}
};
```

[0] ./src/index.js 63 bytes {1} [built]
[1] ./src/vendor.js 63 bytes {0} [built]
```
As we learned in [code splitting](/guides/code-splitting), the [`CommonsChunkPlugin`](/plugins/commons-chunk-plugin) can be used to split modules out into separate bundles. A lesser-known feature of the `CommonsChunkPlugin` is extracting webpack's boilerplate and manifest which can change with every build. By specifying a name not mentioned in the `entry` configuration, the plugin will automatically extract what we want into a separate bundle:
Contributor

Isn't the runtime added to the last specified commons chunk? Maybe that is only when specified as array.

Collaborator Author

It's extracted to a separate chunk when you use a name that doesn't already exist (i.e. no entry chunk exists by that name). I played around with the array syntax as well but ran into some issues trying to do it all with one instance.

      title: 'Caching'
    }),
+   new webpack.optimize.CommonsChunkPlugin({
+     name: 'vendor'
Contributor

Don't we need minChunks: Infinity as property, or is it default?

Collaborator Author

See the example at TheDutchCoder/webpack-guides-code-examples#17... I don't think it's necessary but I also didn't test chunks too much yet (though I do have a bit more code to push that uses import() to generate a child chunk).

  entry: './src/index.js',
  plugins: [
    new CleanWebpackPlugin(['dist']),
    new HtmlWebpackPlugin({
Contributor

Maybe we should leave out HtmlWebpackPlugin from the caching docs?

Collaborator Author

I actually think the HtmlWebpackPlugin is key for this doc. Without it, users have to track bundle names themselves using the manifest, and I think that complicates the guide significantly. Plus, its usage is already covered in output-management, which comes before this guide.

@jouni-kantola
Contributor

One thing I'd like to point out is that this PR mostly describes how to set up a proper config, not what to be aware of during development, i.e. adding extra entries. What I've always found hardest is to keep hashes intact during development and refactoring. What seem like small changes can have a bigger impact than expected (https://gist.github.com/jouni-kantola/1c1e2bfaebf30de50d1b6a71b869da13).

Member

@timse commented Jul 21, 2017

Heya :) not sure about HashedModuleIdsPlugin; it might be working some magic that makes things I used unnecessary. I personally like the NamedModulesPlugin/NamedChunksPlugin better for a couple of reasons that don't necessarily count as a general guideline :)

That being said, I would use the named chunks plugin, but probably with a saner fallback function than specified in the guide I wrote :)

@skipjack
Collaborator Author

> One thing I'd like to point out is that this PR mostly describes how to set up a proper config, not what to be aware of during development, i.e. adding extra entries. What I've always found hardest is to keep hashes intact during development and refactoring. What seem like small changes can have a bigger impact than expected.

@jouni-kantola I see your point, but I mean ideally we want to provide the configuration changes needed so that unexpected hash changes don't occur, right? Maybe some discussion in the conclusion of what to watch out for would help though?

> Heya :) not sure about HashedModuleIdsPlugin; it might be working some magic that makes things I used unnecessary. I personally like the NamedModulesPlugin / NamedChunksPlugin better for a couple of reasons that don't necessarily count as a general guideline :)

@timse I used the HashedModuleIdsPlugin because I found it simpler when testing (TheDutchCoder/webpack-guides-code-examples#17) and it seemed I didn't need to include any of the Named*** plugins. That said, I did see what you said in your post about the Named*** plugins yielding slightly smaller bundles.

> That being said, I would use the named chunks plugin, but probably with a saner fallback function than specified in the guide I wrote :)

I will take a look at that plugin tomorrow. I'm going to merge this for now, but I definitely would like to keep the conversation going and do some follow-up PRs. Maybe we can all test the example I put together in TheDutchCoder/webpack-guides-code-examples#17 and see if we can poke any holes in it? I don't think we have to cover extreme edge cases, but if there are some common scenarios that break it we should definitely address them in the guide.
