Iterators #1141
At this moment loop variables can contain integers only. This change paves the way for loop variables with different types in the future.
With this change instructions like OP_PUSH_M and OP_POP_M work independently of the type of the value.
This changes the way "for-in" loops work by introducing the concept of iterators. This kind of loop now accepts an iterator that returns a sequence of values. Besides integer ranges (N..M) and integer enumerations (X,Y,Z), it now accepts other kinds of iterables, like arrays. This allows conditions like the following one:
for any section in pe.sections: ( section.name == "foo" )
Until now the same expression was written as:
for any i in (0..pe.number_of_sections - 1): ( pe.sections[i].name == "foo" )
The new syntax is more legible and opens the door to more powerful features in the future. Backward compatibility is maintained.
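To make the iterator idea concrete, here is a minimal C sketch, not the code from this PR. It assumes a hypothetical `sketch_iter` struct whose `next` callback yields one item per call, so the same loop driver can consume either an integer range or an array. Every name in it (`sketch_iter`, `make_range`, `make_array`, `for_any_equals`) is invented for illustration and is not part of libyara.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator type (not libyara's): "next" stores one item in
   *out and returns true, or returns false once the sequence is exhausted. */
typedef struct sketch_iter sketch_iter;

struct sketch_iter {
  bool (*next)(sketch_iter *self, int64_t *out);
  int64_t current;      /* range: next value to emit */
  int64_t last;         /* range: inclusive upper bound */
  bool done;            /* range: set once "last" has been emitted */
  const int64_t *items; /* array: backing storage */
  size_t count;         /* array: number of items */
  size_t index;         /* array: position of the next item */
};

/* Emits first, first + 1, ..., last, like the (N..M) syntax. */
static bool range_next(sketch_iter *self, int64_t *out) {
  if (self->done) return false;
  *out = self->current;
  if (self->current == self->last)
    self->done = true; /* avoids incrementing past the upper bound */
  else
    self->current++;
  return true;
}

/* Emits items[0], items[1], ..., items[count - 1]. */
static bool array_next(sketch_iter *self, int64_t *out) {
  if (self->index >= self->count) return false;
  *out = self->items[self->index++];
  return true;
}

static sketch_iter make_range(int64_t first, int64_t last) {
  sketch_iter it = {0};
  it.next = range_next;
  it.current = first;
  it.last = last;
  it.done = first > last; /* an empty range yields nothing */
  return it;
}

static sketch_iter make_array(const int64_t *items, size_t count) {
  sketch_iter it = {0};
  it.next = array_next;
  it.items = items;
  it.count = count;
  return it;
}

/* Rough analogue of "for any x in <iterable> : ( x == wanted )".
   The driver does not care which kind of iterable it was given. */
static bool for_any_equals(sketch_iter *it, int64_t wanted) {
  int64_t item;
  assert(it != NULL && it->next != NULL);
  while (it->next(it, &item)) {
    if (item == wanted) return true;
  }
  return false;
}
```

The point of the sketch is that `for_any_equals` only talks to `next`, so supporting a new kind of iterable means adding another `next` implementation rather than changing the loop itself.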
This is fascinating work, thank you for doing this!
I think this looks good, just some minor questions which probably show my ignorance. :)
integer_set
: '(' integer_enumeration ')' { $$ = INTEGER_SET_ENUMERATION; }
| range { $$ = INTEGER_SET_RANGE; }
: '(' integer_enumeration ')'
I assume you are intentionally not using string enumerations here? In theory they could be treated as an iterator, right?
That's correct. I'm leaving that for another commit in order to keep this one as small as possible.
That makes sense. Thanks for clarifying!
libyara/include/yara/types.h
Outdated
// of the iterator, because it's legitimate for an iterator to return UNDEFINED
// items in the middle of the iteration.
//
// The "next" function should return ERROR_SUCCESS if every went fine or an
// The "next" function should return ERROR_SUCCESS if everything went fine or an
if (obj != NULL)
  stack->items[stack->sp++].o = obj;
else
  stack->items[stack->sp++].i = UNDEFINED;
Rather than push UNDEFINED can't we just skip this item? I'm thinking of a scenario where someone has done something like this in a module:
set_integer(1, pe->object, "foo[%i]", 0);
set_integer(2, pe->object, "foo[%i]", 100);
In this case wouldn't we be pushing a lot of UNDEFINED values before we get to the next assigned value in the array? Wouldn't it be nicer to only return objects if they are there and skip over the unassigned array values?
Actually that was my first idea, but I found out that it wouldn't work as expected.
Suppose you have an array of integers with 3 items where the first two are undefined, like [undefined, undefined, 10]. If we simply skipped the first two items, the array would be indistinguishable from an array with a single item: the loop would execute only once, and an expression like this one would be true:
for all i in array : ( i == 10)
However this expression shouldn't be true because the equality is not true for all items in the array as there are two of them that are not defined.
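The difference between the two strategies can be sketched in C as below. This only illustrates the argument above and is not libyara code: the sentinel `SKETCH_UNDEFINED` and both function names are hypothetical, and an undefined item is assumed to simply fail the equality test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for an undefined array item. */
#define SKETCH_UNDEFINED INT64_MIN

/* "for all i in array : ( i == wanted )" when undefined items are handed
   to the loop body: an undefined item cannot satisfy the equality, so the
   whole expression is false. */
static bool for_all_equals_keep_undefined(
    const int64_t *items, size_t count, int64_t wanted) {
  for (size_t k = 0; k < count; k++) {
    if (items[k] == SKETCH_UNDEFINED || items[k] != wanted) return false;
  }
  return true;
}

/* The same loop when undefined items are silently skipped: the body only
   ever sees the defined items, so [undefined, undefined, 10] behaves
   exactly like [10]. */
static bool for_all_equals_skip_undefined(
    const int64_t *items, size_t count, int64_t wanted) {
  for (size_t k = 0; k < count; k++) {
    if (items[k] == SKETCH_UNDEFINED) continue;
    if (items[k] != wanted) return false;
  }
  return true;
}
```

For the array [undefined, undefined, 10] and `wanted` equal to 10, the first function returns false and the second returns true, which is the discrepancy described above.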
That makes sense. The ways I see around this involve some rather drastic, and possibly backwards-compatibility-breaking, changes. In particular I'm thinking we could do something like set_integer(1, pe->object, "foo") and have it act more like an append function, where you don't get to specify the index. This would at least ensure the values that are set are contiguous, though it doesn't address the fact that you can still set undefined. As I think about it more, I think your approach is the best: it sucks that we can iterate over an arbitrary number of UNDEFINED values, but it is the best option.
Thanks for the clarification here!
Until now all loops were allocating 5 (LOOP_LOCAL_VARS) local variables in the scratch memory, even if they didn't use all of them. Now each loop will allocate only the variables that it uses.
Example:
for <any|all|number> key,value in some_dictionary : (
  // Some condition that uses "key" and "value". The identifiers "key" and "value" can be changed to anything you want.
)