Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: symlinks in node_modules #6993

Closed
wants to merge 3 commits into from
Closed

Conversation

aleclarson
Copy link
Contributor

@aleclarson aleclarson commented Sep 18, 2018

  • add follow method to HasteFS class
  • add links property to InternalHasteMap type
  • add LinkData map type
  • add LinkMetaData array type

Summary

This PR is required by facebook/metro#257, which also adds support for symlinks in node_modules directories.

Only symlinks matching node_modules/* or node_modules/@*/* are stored in the links property. For each symlink found, a [target, mtime] tuple is created. The target value is undefined until hasteFS.follow uses fs.realpathSync to resolve the symlink. In this way, symlinks are lazily resolved and then cached.

The user will need Watchman installed for this commit to take effect. Support for the Node crawler should be added in a future commit.

Test plan

None yet. Not familiar with Jest's test suite, and I assume you'll want some changes, or deny this altogether in favor of a more informed direction.

I've tested this manually with arbitrary symlinks and PNPM installations.

@@ -40,6 +43,11 @@ export default class HasteFS {
return this._files.has(file);
}

follow(file: Path): Path {
const link = this._links[file];
return link ? link[0] || (link[0] = fs.realpathSync(file)) : file;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use realpath-native

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Split the side-effect in two as well.

@SimenB
Copy link
Member

SimenB commented Sep 19, 2018

Needs a changelog entry 🙂

@SimenB SimenB requested a review from rafeca September 19, 2018 12:04
Copy link
Contributor

@mjesun mjesun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks like solid work for me. Aside from the comments I left; we'd like the file watcher exposed to trigger both when the symlink is re-assigned, and when the destination file is changed. I assume the latter only happens in case the destination file is also tracked by watchman. Not optimal TBH, but I think we can live with it.

]);
} else {
// See ../constants.js
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, make it back.

@@ -40,6 +43,11 @@ export default class HasteFS {
return this._files.has(file);
}

follow(file: Path): Path {
const link = this._links[file];
return link ? link[0] || (link[0] = fs.realpathSync(file)) : file;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Split the side-effect in two as well.

@mjesun
Copy link
Contributor

mjesun commented Sep 19, 2018

@rafeca Can we try internally the change to watchman's query?

@aleclarson
Copy link
Contributor Author

I assume the latter only happens in case the destination file is also tracked by watchman. Not optimal TBH, but I think we can live with it.

Yeah, we could resolve symlinks immediately and watch the real paths (skipping any that resolve to a node_modules descendant).

@aleclarson
Copy link
Contributor Author

The requested changes have been addressed.

@mjesun
Copy link
Contributor

mjesun commented Sep 19, 2018

Yeah, we could resolve symlinks immediately and watch the real paths (skipping any that resolve to a node_modules descendant).

We should probably track both: the symlink itself, and the file that points to. Every time the symlink changes, the file it points to is stopped from watching, and we start watching the new one. Changes to any of those should be reported back as changes by the listener.

Copy link
Contributor

@mjesun mjesun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had a bit of a discussion internally, and we think the symlink should not be exposed, or, at least, be fully transparent (i.e., you shouldn't need to call .follow at all).

@aleclarson
Copy link
Contributor Author

We've had a bit of a discussion internally, and we think the symlink should not be exposed, or, at least, be fully transparent (i.e., you shouldn't need to call .follow at all).

Hmm, I wonder if there would be a considerable performance cost in doing so?

Perhaps, support for transparent symlink resolution can be added when/if we add comprehensive symlink support (that is, symlinks outside node_modules).

For now, it's not a problem, since we only call follow here and here in Metro's resolution logic.

@rubennorte
Copy link
Contributor

Hmm, I wonder if there would be a considerable performance cost in doing so?

I think symlink support should probably be an option.

Perhaps, support for transparent symlink resolution can be added when/if we add comprehensive symlink support (that is, symlinks outside node_modules).

Is there a particular reason to support them only inside node_modules? Other than it's what you need to solve your use case I mean.

@aleclarson
Copy link
Contributor Author

aleclarson commented Sep 19, 2018

I think symlink support should probably be an option.

Okay, but there should be three options: "on", "node_modules", and "off" (if there's a considerable perf difference between "on" and "node_modules"). It should default to "node_modules", because I'm guessing this PR alone will have an insignificant impact on perf for projects not utilizing symlinks.

Is there a particular reason to support them only inside node_modules? Other than it's what you need to solve your use case I mean.

I'd be fine with comprehensive support, if it's not too expensive. But I don't use symlinks for anything except node_modules directories. Does that answer your question?

@SimenB
Copy link
Member

SimenB commented Sep 20, 2018

Kind related (maybe?) #5356

@aleclarson
Copy link
Contributor Author

aleclarson commented Sep 20, 2018

Preserving symlinks == the symlink is treated as the real path.
Expanding symlinks == the symlink is discarded in favor of the real path.

This PR tracks where symlinks exist and resolves them on-demand (without ever discarding symlink paths). To resolve a symlink, its real path must be watched separately and explicitly. Otherwise, the HasteFS instance won't believe the real path exists (which should probably be reflected in the follow method).

Anyway, it's important to not discard the symlink path once resolved, because this information is crucial when resolving dependencies in Metro. Although, it might make sense for Metro to track that information itself.

The ideal behavior is for Jest to resolve symlinks eagerly, watch both the symlink and its real path, and treat symlinks as if they're hard links. To support this, you'll need a map of symlink->real and a second map of real->symlinks (for propagating changes).

The follow method should be kept, so users can test if a path is a symlink and/or resolve it. This is useful when deduping resolved dependencies, for example. Also, keeping the follow method allows this PR to be accepted without needing to break backwards-compatibility once transparent symlinks are implemented.

For "broken" symlinks, should follow return the target path, return the symlink path, return null, or throw an error? The current behavior is to throw, I think.

@SimenB
Copy link
Member

SimenB commented Sep 20, 2018

Don't disregard perf hit with symlinks without testing it (especially for windows). We've had reports of performance degradations that according to the report came after adding some realpath calls, e.g. #6783

@aleclarson
Copy link
Contributor Author

aleclarson commented Sep 20, 2018

@SimenB In that case, I can, because realpath is never called if you don't use symlinks in node_modules.

Unless you think the Watchman query will hurt perf.

@SimenB SimenB added this to the Jest 24 milestone Oct 16, 2018
@SimenB
Copy link
Member

SimenB commented Oct 19, 2018

@rubennorte @mjesun is this ready to land?

@SimenB
Copy link
Member

SimenB commented Oct 20, 2018

@aleclarson mind rebasing and fixing the failing tests?

- add `follow` method to `HasteFS` class
- add `links` property to `InternalHasteMap` type
- add `LinkData` map type
- add `LinkMetaData` array type

The user will need Watchman installed for this commit to take effect. Support for the Node crawler should be added in a future commit.

Only symlinks matching `node_modules/*` or `node_modules/@*/*` are stored in the `links` property. For each symlink found, a `[target, mtime]` tuple is created. The `target` value is undefined until `hasteFS.follow` uses `fs.realpathSync` to resolve the symlink. In this way, symlinks are lazily resolved and then cached.
@SimenB
Copy link
Member

SimenB commented Oct 22, 2018

Seems like the hastemap tests fail on CI?

@aleclarson
Copy link
Contributor Author

aleclarson commented Nov 15, 2018

I could find bandwidth to write/fix the tests if a FB employee finds time to stub out tests for the expected behavior (and merge it into this PR).

@jeanlauliac
Copy link
Contributor

Could you add a test case to packages/jest-haste-map/src/crawlers/__tests__/watchman.test.js and fix the existing cases? As now the args assertion has changed.

@jeanlauliac jeanlauliac self-requested a review November 30, 2018 13:28
@jeanlauliac
Copy link
Contributor

jeanlauliac commented Dec 17, 2018

So, to recap on this issue. I'm uneasy going with the "follow" function strategy because it's not a very systematic approach, it puts the responsibility on the consumers to do the right thing; where I believe we could do something that is more transparent, ie. that wouldn't expose the concept of symlink to the library's consumers. The reason we don't need to expose the concept of symlink is because the library is essentially read-only. We can expose files in such a way that symlinks are exposed the same way as hard links would be, consumers don't need to know (or do they?). By implementing things in that way, it seems like things would just work out of box not only for Metro, but also for the Jest test runner.

On the other hand that means Metro, for example, would have to transform the same file twice (ex. the real one, versus the symlink). But, since they live in different folders, they might have different transform options, so it might make sense anyway. I'm not entirely whether that would be prohibitive in terms of

How I'd propose to have symlinks implemented:

  • We don't add a new API, instead making it transparent: the functions getModuleName, getDependencies, getSha1, exists, getAllFiles, etc. should all act just as if symlinks were real files, with their own file paths.
  • Internally, we can do this fairly simply by adding these symlinks just like the other files in the _files structure.
  • Importantly, this means that if you have a symlink pointing to a folder containing any Haste module, these Haste names would cause duplication to happen, which seems correct to me.
  • I don't think we should support symlinks pointing to files outside the roots as we wouldn't be able to watch them easily. I don't think we should try to support cycles either, etc. as these would cause an infinitely large list of files (or, we could support a maximum depth).
  • In addition to adding symlink to _files, we track all the reverse-links caused by symlinks that are within the roots (ex. _links). When we get notified a file changed, we traverse the reverse-links to update the data for any symlink referencing the file (including transitive reverse-links). We normally also need to check any parent folder of that file to see if they are themselves referenced by any symlink. Finally we need to generate "change" events for the entire list of files/symlinks involved.
  • Finally, when a symlink itself changes (reported by watchman), we need both to update the hypothetic _links data structure, but we also need to follow the new linked path, and update the symlink's own data as well as all its transitive reverse-link, just like the above point. Then we need to generate "change" events for all of these traversed files. Here again we make it transparent, just exposing these as if they were separate files.

The fact this last two implementation details are quite non-trivial leads me to believe that integrating symlink support within jest-haste-map transparently is the right thing to do, instead of letting consumers manage their own reverse-links data structure and having to do proper cache invalidation. We'd have to code this logic twice or more in Metro, Jest, etc. Making transparent may make consumers' caches potentially bigger, as they cannot deduplicate, but it also makes it much simpler to manage, and robust.

About testing, I don't have necessarily a guidance on which stubs/cases should be added, I trust you to add the tests that you judge necessary, and to adjust existing test cases (but they shouldn't normally be affected since we're adding support for a new feature). We'll probably want cover the initial crawling of the codebase, files getting changed, symlinks getting changed, and finally files or symlink being added/removed after the watcher already started.

Implementing things that way would remove the need for facebook/metro#257, so it's not necessarily more work or more complex overall, but it makes it more self-contained.

Does that sound like a plan?

@jeanlauliac
Copy link
Contributor

About performance concerns: I don't think there would be that much of a performance hit, as we can read symlinks the first time we find them. Then, once the process has started, we already know where symlinks are pointing at, and we only need to read them again when watchman reports a change. Note that with the logic I describe above, at no point in time would we call realpath. Indeed, instead, we want to do our own traversal of the file system, not rely on the POSIX resolution mechanism.

@aleclarson
Copy link
Contributor Author

I won't be implementing transparent symlinks, because I don't need them. 😄

I still believe you could merge this PR for temporary relief, and then tackle transparent symlinks at your own pace. But if you prefer jumping right to transparent symlinks, I can't stop you. 😉

@jeanlauliac
Copy link
Contributor

I won't implement transparent symlinks myself unfortunately as I don't need them at the moment either. I suppose we can merge a partial solution, though we have to keep in mind it'll likely be permanent. But I'll let the Jest maintainers make the call on that one as I don't work much on Jest these days.

@jeanlauliac jeanlauliac removed their request for review December 17, 2018 17:46
@SimenB
Copy link
Member

SimenB commented Dec 17, 2018

@rickhanlonii
Copy link
Member

This is great! I agree with @jeanlauliac - I think we should wait until we have transparent symlinks

@aleclarson
Copy link
Contributor Author

aleclarson commented Dec 21, 2018

Is there a way we could involve the Metro community in this decision?

I strongly believe that symlinks in node_modules is the main use case. I've never thought of making a symlink for anything else. Of course, this is only anecdotal.

edit: Unless you guys decide to implement transparent symlinks immediately, this PR should be strongly considered. It's ready to solve the main use case (apart from any rebasing and testing).

@rickhanlonii
Copy link
Member

@aleclarson @jeanlauliac and @mjesun work on Metro full time

@rickhanlonii
Copy link
Member

@mjesun what's the risk of merging the partial solution?

@aleclarson
Copy link
Contributor Author

aleclarson commented Dec 24, 2018

@jeanlauliac and @mjesun work on Metro full time

Okay, but they haven't said whether they'll be implementing transparent symlinks without community involvement.

I'm working on an update to this PR that will make it easier to implement transparent symlinks in the future. 😊

@aleclarson
Copy link
Contributor Author

Closed in favor of #7549

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Please note this issue tracker is not a help forum. We recommend using StackOverflow or our discord channel for questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants