-
Notifications
You must be signed in to change notification settings - Fork 239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RRFC] Avoid distributing Readme.md when is not needed #744
Comments
What would we display on npmjs.org for a package? |
The NPM can go to https://github.com/chalk/chalk#readme and fetch the readme directly when the repository URL is a host that NPM trust, like Github for example. But this is just an initial idea, I want to raise this discussion so we could see if there is a good solution for this problem, or see if this is not a problem at all. In my perspective, the developers don't need to have the If the only reason of |
Without some sort of metadata that allows a package to be traced exactly back to what generated it, I don't know how the website would be able to fetch the right version. Other websites that use the registry (like jsdelivr, yarnpkg, etc) all use the README info within the package to work, and would too need to fetch this information on the fly and cache it themselves. (Of course, that level of tracing wound be nice, and some tooling adds a gitHead prop, but that's not universal and may not be accurate when produced automatically depending on the situation.) I personally would like to see some sort of info that says that the bandwidth used by README files is significant enough to warrant the change; it would seem to me that READMEs are small fish compared to the traffic generated by large packages or packages with large dependency graphs. |
Honestly long run I would prefer we move to the readme being updatable separate from the tarball upload (basically what @H4ad said in the follow up comment). It would be a larger long term change for the registry, so not sure if it is a topic for an RFC or just something to ask from npm folks. I believe we had talked about that idea before as well but I don't immediately see an issue/discussion to that point. |
The readme in the tarball is not a significant source of extra bandwidth to be honest. The readme in the packument is. That's a separate issue and we have internal discussions about how to stop needing to send that / store it. |
I did not mean that it being in the tarball was a bandwidth issue. I meant more that we should decouple the two things. Readme in the tarball or not, we should enable the readme on the website to be updated without publishing a new version. That would then enable projects to decide if bundling the readme made sense for their project without npm doing any change. |
@jakebailey If you know a way to get the top downloaded packages, I can try to get this statistic. But by looking just for two packages that we know are heavily downloaded (chalk and commander), it can save ~7,7TB/month, so I think the number is considerable, but the magnitude of significance can be different from person to person, for me TB is something to worry, for NPM could not.
I disagree in parts, for |
If the readme is 12.7k compressed that suggests a way-too-large readme that could be split into separate files, rather than “readmes shouldn’t be in the tarball”. |
Honestly I like having docs in readme's for small enough scoped libraries. So I am not a huge fan of that proposed solution as a reason not to try and make things better. Although I do agree that is the right approach for now. If we can get movement on the registry api then I think it is reasonable to ask for an endpoint just for posting a new readme as a metata bit attached to the package not from the tarball itself. This solves more than one problem, it allows for readme updates without a publish (a long time ask) and also enables reducing the tarball size if packages do have large docs written in the readme. |
The solution to solve most of the module size issues is to have two published bundles:
With a command, npm could “augment” the given module with the missing files (in case somebody needs them locally). Users that needs offline access could also turn it on by default. (A popular argument is that github releases provides that source bundle. This is unfortunately not correct, because a published module cannot be deleted, while a github repo can). |
Yeah, we want distributions! This would help so much with other issues as well. That would be a great solution to this problem. |
Maybe we can integrate with #519 and add a new type of distribution that is only used to address the |
That sounds like a great idea! |
That’s basically what I’ve been asking npm for for the better part of a decade :-) i would happily remove tests and a bunch of other dev-time files from the “runtime” package if i could set an npm config value and always have the “full’ package locally installed (for 450+ packages). |
same here @ljharb |
Motivation ("The Why")
I found two packages that ship Readme.md,
chalk
andcommander.js
.If we stop shipping Readme.md, we can save
631GB
and1.3TB
per week of bandwidth for these two packages.If we consider all the packages in the NPM ecosystem, I think the bandwidth that can be saved is huge.
Example
I created two PRs to propose a lighter version instead of shipping the heaviest Readme.md:
How
But I think instead of trying to fix each package, maybe we can change the behavior of NPM itself.
Current Behaviour
Today, we always include README.md.
Desired Behaviour
Not have to the
README.md
on the bundle.Possible solutions
We can remove
Readme.md
when the repository URL is defined and ifReadme.md
is not included in the files.To present the content of the
NPM
page, when it can be fetched from the repository URL.Only when the user explicitly defines
Readme.md
infiles
or when the repository URL is not defined then we will keepReadme.md
.All this behavior must happen in the
prepack
step because we should not modify the package after it is generated, as there are many packages that send not only the.tgz
but also the hash to validate the integrity, so there is no way forNPM
modify package contents after publish.The text was updated successfully, but these errors were encountered: