-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
readline: add support for FileHandle
#43102
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is just a convenience function and the same end result can already be achieved by using existing APIs, could you please point out which of the points mentioned in https://github.com/nodejs/node/blob/HEAD/doc/contributing/components-in-core.md applies here?
added: REPLACEME | ||
--> | ||
|
||
* `input` {FileHandle} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the justification behind choosing FileHandle and not a file path or an fd? Is it because most of userland code uses FileHandles nowadays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using FileHandle
is more or less the only reliable way of checking if a file exists to then proceed to read it in an async context, for this reason I think we should simplify its use when we can. fd are just numbers, I think it could be dangerous to add support for those, file paths are just strings. We could consider URL objects, but I'm not convinced that would make a very interesting API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to read it in an async context
Hmm, would be nice to have a test case that tries to read lines from the file after a part of the stream has already been consumed if that's what you mean by the async context here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By async context, I mean that you at some point need to know that the file exists, then at another point you need to read it and/or write to it. e.g.
async function* getLinesOrCreateFile(path) {
let fh
try {
fh = await open(path, 'r');
} catch (err) {
if (err?.code === 'ENOENT') {
await writeFile(path, getDefaultData());
yield* getDefaultData();
return;
}
throw err;
}
try {
yield* readLines(fh);
} finally {
await fh.close();
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so something like a cache. This could still be done by accepting string filepaths, fds and url objects with some modifications like opening the file later on in the last try block and turning the open() call into access().
fd are just numbers, I think it could be dangerous to add support for those, file paths are just strings. We could consider URL objects, but I'm not convinced that would make a very interesting API.
I'm not sure how using these are dangerous though. Fd based APIs are not marked as deprecated or legacy and there is no mention of any sort of danger in the fs docs.
I didn't understand what's wrong with file paths being "just strings".
It seems that there are multiple ways in which users could prefer to use this functionality. I feel that adding support for anything other than just readable streams would serve as an impetus for PRs adding support for other things like fds, string paths and even url objects (you might not find the api interesting but someone else might :P). This would unnecessarily expand the API surface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could still be done by accepting string filepaths, fds and url objects with some modifications like opening the file later on in the last try block and turning the open() call into access().
Well no, the access
call to a path could report there's a file that exists, then another program could delete it, which would create an error later when trying to read the file. See discussion in #39960 for more context.
I'm not sure how using these are dangerous though. Fd based APIs are not marked as deprecated or legacy and there is no mention of any sort of danger in the fs docs.
I didn't understand what's wrong with file paths being "just strings".
What I mean is readLines('this is a line\nthis is another line')
could be something that someone tries, I think it's more useful if it throws a useful error. Anyway, no strong feeling on my side for this, it's not what I'm looking for anyway.
It seems that there are multiple ways in which users could prefer to use this functionality. I feel that adding support for anything other than just readable streams would serve as an impetus for PRs adding support for other things like fds, string paths and even url objects (you might not find the api interesting but someone else might :P). This would unnecessarily expand the API surface.
That's a great point, that's why I suggest instead we make it a method of the FileHandle
class.
if (input?.input) { | ||
if (input instanceof FileHandle) { | ||
input = input.createReadStream(); | ||
crlfDelay = Infinity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that this is the only option that needs to be set with a value different from the default while reading lines from a file?
Also, why isn't Infinity the default value of crlfDelay in all cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that this is the only option that needs to be set with a value different from the default while reading lines from a file?
That's the documented one, we can always update it later if we find out some are missing.
Also, why isn't Infinity the default value of crlfDelay in all cases?
I don't have the answer to that question, I don't think adding an unnecessary breaking change is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think adding an unnecessary breaking change is worth it
I thought doing that would fix the weird problems you were mentioning in #42590 (comment) instead of introducing unnecessary breakage. (would be interesting to know what the weird problem is though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unnecessary breakage is to change the default. See the docs for more context:
Lines 998 to 1003 in 6afd3fc
* `crlfDelay` {number} If the delay between `\r` and `\n` exceeds | |
`crlfDelay` milliseconds, both `\r` and `\n` will be treated as separate | |
end-of-line input. `crlfDelay` will be coerced to a number no less than | |
`100`. It can be set to `Infinity`, in which case `\r` followed by `\n` | |
will always be considered a single newline (which may be reasonable for | |
[reading files][] with `\r\n` line delimiter). **Default:** `100`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is \r\n
supposed to be considered as a single line even on posix? I thought line endings were \r
on posix and \r\n
on windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\r
and \n
are two different characters, \n
is the one that marks the end of the line on non-Windows systems. readline
will treat both as a line separator, unless there are next to one another.
node/lib/internal/readline/interface.js
Lines 1261 to 1275 in d36b60e
case 'return': // Carriage return, i.e. \r | |
this[kSawReturnAt] = DateNow(); | |
this[kLine](); | |
break; | |
case 'enter': | |
// When key interval > crlfDelay | |
if ( | |
this[kSawReturnAt] === 0 || | |
DateNow() - this[kSawReturnAt] > this.crlfDelay | |
) { | |
this[kLine](); | |
} | |
this[kSawReturnAt] = 0; | |
break; |
|
||
* `input` {FileHandle} | ||
|
||
Convenient alias to create a `readlinePromises.Interface` and stream over the file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convenient alias to create a `readlinePromises.Interface` and stream over the file. | |
Convenient alias to create a `readlinePromises.Interface` and stream over the | |
file. |
lint fix
4: Developer experience is significantly improved if the component is in core. |
"significant" sounds very subjective here. Personally, I wouldn't call this a significant developer experience improvement because it just allows you to write this const rl = readline.createInterface({ input: fh.createReadStream(), crlfDelay: Infinity });
for await (const line of rl) {
...
} like this for await (const line of readline.createInterface(fh)) {
...
} which is just a 1 line reduction and there isn't a lot of complaints from users regarding writing this line either at least in this repo. However, it would definitely be nice to hear if other people consider this to be a significant developer experience improvement. Perhaps the TSC because the doc was written in response to nodejs/TSC#1041 which is in the tsc repo?
The functionality is already provided and it is documented in https://nodejs.org/api/readline.html#example-read-file-stream-line-by-line. For people who don't read Node.js docs, it's also there on stackoverflow - https://stackoverflow.com/a/32599033. |
I agree that it is indeed subjective. However, I think my frustration with the current API is real (maybe a one line difference doesn't look that much to you, but it is for me), and I think it's fair if I'm being counted as a Node.js user.
I think my proposal is reducing friction, and what it's trying to achieve is to make the "right" implementation the "easy" implementation. Also, I don't think there's any upside in refusing to reduce user friction. |
I think we need support from multiple users to call this significant.
I'm not sure I fully understand what the friction is that you are talking about but there's no mention of such a thing in the doc currently. So to consider that, we should have consensus with the TSC on accepting changes that "reduce friction". Moreover, there is a point in the doc that's against the inclusion of this API node/doc/contributing/components-in-core.md Lines 29 to 30 in 26846a0
|
I disagree, I don't see how this has anything to do with the TSC, imo this should follow the usual review process. Anyway, I'm going to pursue #42590 instead, we both seem to agree it's better than this proposal. |
Nope, I don't think that's a better proposal. I feel the same way about both because at its core, it's the same thing - a utility API composed of fs and readline stuff to enable reading lines from a file using a lesser line of code than using the underlying APIs directly. |
Sorry for misrepresenting your position, I guess I should have say "We both agree that this PR is no better than the one I opened earlier".
Yes that's correct, and I think it's a good thing, for the same reason e.g. I added |
Alternative to #42590.
/cc @RaisinTen