Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readline: add support for FileHandle #43102

Closed
wants to merge 2 commits into from
Closed

Conversation

aduh95
Copy link
Contributor

@aduh95 aduh95 commented May 14, 2022

Alternative to #42590.

/cc @RaisinTen

@nodejs-github-bot nodejs-github-bot added needs-ci PRs that need a full CI run. readline Issues and PRs related to the built-in readline module. labels May 14, 2022
Copy link
Contributor

@RaisinTen RaisinTen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is just a convenience function and the same end result can already be achieved by using existing APIs, could you please point out which of the points mentioned in https://github.com/nodejs/node/blob/HEAD/doc/contributing/components-in-core.md applies here?

added: REPLACEME
-->

* `input` {FileHandle}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the justification behind choosing FileHandle and not a file path or an fd? Is it because most of userland code uses FileHandles nowadays?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using FileHandle is more or less the only reliable way of checking if a file exists to then proceed to read it in an async context, for this reason I think we should simplify its use when we can. fd are just numbers, I think it could be dangerous to add support for those, file paths are just strings. We could consider URL objects, but I'm not convinced that would make a very interesting API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to read it in an async context

Hmm, would be nice to have a test case that tries to read lines from the file after a part of the stream has already been consumed if that's what you mean by the async context here.

Copy link
Contributor Author

@aduh95 aduh95 May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By async context, I mean that you at some point need to know that the file exists, then at another point you need to read it and/or write to it. e.g.

async function* getLinesOrCreateFile(path) {
  let fh
  try {
    fh = await open(path, 'r');
  } catch (err) {
    if (err?.code === 'ENOENT') {
      await writeFile(path, getDefaultData());
      yield* getDefaultData();
      return;
    }
    throw err;
  }
  try {
    yield* readLines(fh);
  } finally {
    await fh.close();
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so something like a cache. This could still be done by accepting string filepaths, fds and url objects with some modifications like opening the file later on in the last try block and turning the open() call into access().

fd are just numbers, I think it could be dangerous to add support for those, file paths are just strings. We could consider URL objects, but I'm not convinced that would make a very interesting API.

I'm not sure how using these are dangerous though. Fd based APIs are not marked as deprecated or legacy and there is no mention of any sort of danger in the fs docs.

I didn't understand what's wrong with file paths being "just strings".

It seems that there are multiple ways in which users could prefer to use this functionality. I feel that adding support for anything other than just readable streams would serve as an impetus for PRs adding support for other things like fds, string paths and even url objects (you might not find the api interesting but someone else might :P). This would unnecessarily expand the API surface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could still be done by accepting string filepaths, fds and url objects with some modifications like opening the file later on in the last try block and turning the open() call into access().

Well no, the access call to a path could report there's a file that exists, then another program could delete it, which would create an error later when trying to read the file. See discussion in #39960 for more context.

I'm not sure how using these are dangerous though. Fd based APIs are not marked as deprecated or legacy and there is no mention of any sort of danger in the fs docs.

#37874 (comment)

I didn't understand what's wrong with file paths being "just strings".

What I mean is readLines('this is a line\nthis is another line') could be something that someone tries, I think it's more useful if it throws a useful error. Anyway, no strong feeling on my side for this, it's not what I'm looking for anyway.

It seems that there are multiple ways in which users could prefer to use this functionality. I feel that adding support for anything other than just readable streams would serve as an impetus for PRs adding support for other things like fds, string paths and even url objects (you might not find the api interesting but someone else might :P). This would unnecessarily expand the API surface.

That's a great point, that's why I suggest instead we make it a method of the FileHandle class.

if (input?.input) {
if (input instanceof FileHandle) {
input = input.createReadStream();
crlfDelay = Infinity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that this is the only option that needs to be set with a value different from the default while reading lines from a file?
Also, why isn't Infinity the default value of crlfDelay in all cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that this is the only option that needs to be set with a value different from the default while reading lines from a file?

That's the documented one, we can always update it later if we find out some are missing.

Also, why isn't Infinity the default value of crlfDelay in all cases?

I don't have the answer to that question, I don't think adding an unnecessary breaking change is worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think adding an unnecessary breaking change is worth it

I thought doing that would fix the weird problems you were mentioning in #42590 (comment) instead of introducing unnecessary breakage. (would be interesting to know what the weird problem is though)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unnecessary breakage is to change the default. See the docs for more context:

node/doc/api/readline.md

Lines 998 to 1003 in 6afd3fc

* `crlfDelay` {number} If the delay between `\r` and `\n` exceeds
`crlfDelay` milliseconds, both `\r` and `\n` will be treated as separate
end-of-line input. `crlfDelay` will be coerced to a number no less than
`100`. It can be set to `Infinity`, in which case `\r` followed by `\n`
will always be considered a single newline (which may be reasonable for
[reading files][] with `\r\n` line delimiter). **Default:** `100`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is \r\n supposed to be considered as a single line even on posix? I thought line endings were \r on posix and \r\n on windows.

Copy link
Contributor Author

@aduh95 aduh95 May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\r and \n are two different characters, \n is the one that marks the end of the line on non-Windows systems. readline will treat both as a line separator, unless there are next to one another.

case 'return': // Carriage return, i.e. \r
this[kSawReturnAt] = DateNow();
this[kLine]();
break;
case 'enter':
// When key interval > crlfDelay
if (
this[kSawReturnAt] === 0 ||
DateNow() - this[kSawReturnAt] > this.crlfDelay
) {
this[kLine]();
}
this[kSawReturnAt] = 0;
break;


* `input` {FileHandle}

Convenient alias to create a `readlinePromises.Interface` and stream over the file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Convenient alias to create a `readlinePromises.Interface` and stream over the file.
Convenient alias to create a `readlinePromises.Interface` and stream over the
file.

lint fix

@aduh95
Copy link
Contributor Author

aduh95 commented May 18, 2022

Since this is just a convenience function and the same end result can already be achieved by using existing APIs, could you please point out which of the points mentioned in https://github.com/nodejs/node/blob/HEAD/doc/contributing/components-in-core.md applies here?

4: Developer experience is significantly improved if the component is in core.
5: The component provides functionality that can be expected to solve at least one common use case Node.js users face.

@RaisinTen
Copy link
Contributor

RaisinTen commented May 19, 2022

4: Developer experience is significantly improved if the component is in core.

"significant" sounds very subjective here. Personally, I wouldn't call this a significant developer experience improvement because it just allows you to write this

const rl = readline.createInterface({ input: fh.createReadStream(), crlfDelay: Infinity });
for await (const line of rl) {
  ...
}

like this

for await (const line of readline.createInterface(fh)) {
  ...
}

which is just a 1 line reduction and there isn't a lot of complaints from users regarding writing this line either at least in this repo. However, it would definitely be nice to hear if other people consider this to be a significant developer experience improvement. Perhaps the TSC because the doc was written in response to nodejs/TSC#1041 which is in the tsc repo?

5: The component provides functionality that can be expected to solve at least one common use case Node.js users face.

The functionality is already provided and it is documented in https://nodejs.org/api/readline.html#example-read-file-stream-line-by-line. For people who don't read Node.js docs, it's also there on stackoverflow - https://stackoverflow.com/a/32599033.

@aduh95
Copy link
Contributor Author

aduh95 commented May 19, 2022

"significant" sounds very subjective here
which is just a 1 line reduction
there isn't a lot of complaints from users

I agree that it is indeed subjective. However, I think my frustration with the current API is real (maybe a one line difference doesn't look that much to you, but it is for me), and I think it's fair if I'm being counted as a Node.js user.

The functionality is already provided and it is documented

I think my proposal is reducing friction, and what it's trying to achieve is to make the "right" implementation the "easy" implementation. Also, I don't think there's any upside in refusing to reduce user friction.

@RaisinTen
Copy link
Contributor

I agree that it is indeed subjective. However, I think my frustration with the current API is real (maybe a one line difference doesn't look that much to you, but it is for me), and I think it's fair if I'm being counted as a Node.js user.

I think we need support from multiple users to call this significant.

I think my proposal is reducing friction, and what it's trying to achieve is to make the "right" implementation the "easy" implementation. Also, I don't think there's any upside in refusing to reduce user friction.

I'm not sure I fully understand what the friction is that you are talking about but there's no mention of such a thing in the doc currently. So to consider that, we should have consensus with the TSC on accepting changes that "reduce friction".

Moreover, there is a point in the doc that's against the inclusion of this API

3. There is already similar functionality in core and adding the component will
provide a second API to do the same thing.
.

@aduh95
Copy link
Contributor Author

aduh95 commented May 20, 2022

we should have consensus with the TSC on accepting changes that "reduce friction"

I disagree, I don't see how this has anything to do with the TSC, imo this should follow the usual review process.

Anyway, I'm going to pursue #42590 instead, we both seem to agree it's better than this proposal.

@aduh95 aduh95 closed this May 20, 2022
@aduh95 aduh95 deleted the readline-fh branch May 20, 2022 14:25
@RaisinTen
Copy link
Contributor

Anyway, I'm going to pursue #42590 instead, we both seem to agree it's better than this proposal.

Nope, I don't think that's a better proposal. I feel the same way about both because at its core, it's the same thing - a utility API composed of fs and readline stuff to enable reading lines from a file using a lesser line of code than using the underlying APIs directly.

@aduh95
Copy link
Contributor Author

aduh95 commented May 20, 2022

Nope, I don't think that's a better proposal.

Sorry for misrepresenting your position, I guess I should have say "We both agree that this PR is no better than the one I opened earlier".

a utility API composed of fs and readline stuff to enable reading lines from a file using a lesser line of code than using the underlying APIs directly.

Yes that's correct, and I think it's a good thing, for the same reason e.g. I added FileHandle.prototype.createReadStream even though it was already possible before using more lines of code.
The other thing that's worth mentioning is that such utility would nudge user away from anti-patterns (such as await access or forgetting crlfDelay: Infinity).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-ci PRs that need a full CI run. readline Issues and PRs related to the built-in readline module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants