Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: add indexOf() method #561

Closed
wants to merge 1 commit into from

Conversation

trevnorris
Copy link
Contributor

Add Buffer#indexOf(). Support strings, numbers and other Buffers. This
is more support than String#indexOf() gives, but the increased
versatility should be helpful.

Special thanks to Sam Rijs for first proposing this
change.

R=@bnoordhuis

This is a re-hash of #160. Done this way so future support can include regexp's and arrays.

@trevnorris trevnorris added buffer Issues and PRs related to the buffer subsystem. enhancement labels Jan 22, 2015
@trevnorris
Copy link
Contributor Author

Ref to original PR that brought this about: #160

Buffer.prototype.indexOf = function indexOf(val, byteOffset) {
if (byteOffset > 0xffffffff)
byteOffset = 0xffffffff;
if (byteOffset < 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe else if, does that have any optimisation benefit here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will do.

return args.GetReturnValue().Set(-1);

char data[2];
data[0] = val;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We silently truncate values >255?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I supposed to throw?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already silently truncate values >255 when initializing a buffer:

new Buffer([256]) // <Buffer 00>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly curious if we'd want to treat >0xff values as "search for byte sequence," though I suppose that could start to get into endianness issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(alternatively, yes, I would consider throwing for the sake of proper input validation.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not going to throw. I was also considering the search for byte sequence. e.g. b.indexOf(0xbada55), and think it is a good solution. Though users will need to know that byte sequences over 0x20000000000000 loose precision. Also, I'm not worried about endianness. The input data is endiannes independent, and it's up to the user to know what the data looks like that they're searching for.

@bnoordhuis What would be the fastest way to convert a number to a byte sequence that can be searched for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like this?

uint32_t needle = args[1]->Uint32Value();
void* ptr = memmem(haystack, haystack_len, &needle, sizeof(needle));

void IndexOfNumber(const FunctionCallbackInfo<Value>& args) {
ASSERT(args[0]->IsObject());
ASSERT(args[1]->IsNumber());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing ASSERT(args[2]->IsNumber()); ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Guess I could put it in. The JS simply does coercion for sanity, but the Uint32Value() will still generally do the correct thing. Guess I'll throw it in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok.

@mscdex
Copy link
Contributor

mscdex commented Jan 23, 2015

How would regexp functionality be implemented/supported? The built-in C regex library, v8's regex library, pcre, or?

@trevnorris
Copy link
Contributor Author

@mscdex I'm working on that now. Can hack around it by having a pre-allocated external array class and reassign it on the fly, then convert the regex into 1 byte characters (for utf8 safety). So V8 would still do the heavy lifting but all the resources would still live outside the heap.

@rvagg
Copy link
Member

rvagg commented Jan 23, 2015

see also #161 for lastIndexOf() (listing here for reference, not necessarily recommendation)

@rvagg rvagg added the semver-minor PRs that contain new features and should be released in the next minor version. label Jan 23, 2015

if (str.length() == 0) {
return args.GetReturnValue().Set(
MIN(static_cast<uint32_t>(obj_length), offset));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: let's switch to std::min() and std::max() in a follow-up PR.

@feross
Copy link
Contributor

feross commented Jan 26, 2015

@trevnorris Great work!

While considering how to add support in buffer, I realized I don't like how Buffer.prototype.indexOf behaves with a negative byteOffet value.

If byteOffset < 0, we currently search the whole buffer (same as passing 0). This is how String.prototype.indexOf behaves.

I think it makes more sense to treat a byteOffset < 0 as the offset from the end of the buffer. This is how Array.prototype.indexOf and TypedArray.prototype.indexOf do it. In general, I'd prefer to copy the way TypedArray works unless we have a good reason to do otherwise. Here's the spec.


assert.equal(b.indexOf('a'), 0);
assert.equal(b.indexOf('a', 1), -1);
assert.equal(b.indexOf('a', -1), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we changed to Array.prototype.indexOf/TypedArray.prototype.indexOf semantics, this would change to:

assert.equal(b.indexOf('a', -1), -1);

And I'd add a few more tests for good measure:

assert.equal(b.indexOf('a', -4), -1);
assert.equal(b.indexOf('a', -5), 0);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.

This was referenced Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.