Iterators #1141

Merged
merged 8 commits into from
Nov 12, 2019

@plusvic (Member) commented Oct 10, 2019

This changes how "for-in" loops work by introducing the concept of iterators. A "for-in" loop now accepts an iterator that returns a sequence of values. Besides integer ranges (N..M) and integer enumerations (X,Y,Z), these loops now accept other kinds of iterables, such as arrays. This allows conditions like the following:

for any section in pe.sections: ( section.name == "foo" )

Until now the same expression was written as:

for any i in (0..pe.number_of_sections - 1): ( pe.sections[i].name == "foo" )

The new syntax is more legible and opens the door to more powerful features in the future. Backward compatibility is maintained.
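As a rough illustration of the idea, here is a minimal sketch in C of how a range and an array can both be consumed through one "next" interface, so that a single "for any" evaluator serves both loop forms. The names iter_t, range_iter, array_iter and for_any_equals are hypothetical and are not taken from libyara, and the sketch compares integers rather than section names to stay short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator: next() stores one value in *out and returns true,
   or returns false once the sequence is exhausted. */
typedef struct iter iter_t;
struct iter {
  bool (*next)(iter_t* self, int64_t* out);
  int64_t pos;          /* next range value, or next array index */
  int64_t last;         /* last range value (inclusive); ranges only */
  const int64_t* items; /* backing storage; arrays only */
  size_t count;         /* number of array items; arrays only */
};

static bool range_next(iter_t* self, int64_t* out) {
  if (self->pos > self->last) return false;
  *out = self->pos++;
  return true;
}

static bool array_next(iter_t* self, int64_t* out) {
  if ((size_t) self->pos >= self->count) return false;
  *out = self->items[self->pos++];
  return true;
}

/* Iterator over the integer range first..last, like (N..M). */
iter_t range_iter(int64_t first, int64_t last) {
  iter_t it = {range_next, first, last, NULL, 0};
  return it;
}

/* Iterator over the items of an array. */
iter_t array_iter(const int64_t* items, size_t count) {
  assert(items != NULL || count == 0);
  iter_t it = {array_next, 0, 0, items, count};
  return it;
}

/* Models "for any x in <iterator> : ( x == wanted )". The evaluator does
   not need to know whether the values come from a range or an array. */
bool for_any_equals(iter_t* it, int64_t wanted) {
  int64_t x;
  while (it->next(it, &x))
    if (x == wanted) return true;
  return false;
}
```

With this shape, the old index-based form corresponds to passing range_iter(0, n - 1) and indexing inside the condition, while the new form passes array_iter(items, n) and uses each item directly.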

At the moment, loop variables can contain only integers. This change paves the way for loop variables of other types in the future.
With this change, instructions like OP_PUSH_M and OP_POP_M work independently of the type of the value.
@plusvic plusvic requested a review from wxsBSD October 10, 2019 08:35
@wxsBSD (Collaborator) left a comment

This is fascinating work, thank you for doing this!

I think this looks good, just some minor questions which probably show my ignorance. :)

integer_set
: '(' integer_enumeration ')' { $$ = INTEGER_SET_ENUMERATION; }
| range { $$ = INTEGER_SET_RANGE; }
: '(' integer_enumeration ')'
@wxsBSD (Collaborator):

I assume you are intentionally not using string enumerations here? In theory they could be treated as an iterator, right?

@plusvic (Member, Author):

That's correct. I'm leaving that for another commit in order to keep this one as small as possible.

@wxsBSD (Collaborator):

That makes sense. Thanks for clarifying!

// of the iterator, because it's legitimate for an iterator to return UNDEFINED
// items in the middle of the iteration.
//
// The "next" function should return ERROR_SUCCESS if every went fine or an
@wxsBSD (Collaborator):

// The "next" function should return ERROR_SUCCESS if everything went fine or an

if (obj != NULL)
  stack->items[stack->sp++].o = obj;
else
  stack->items[stack->sp++].i = UNDEFINED;
@wxsBSD (Collaborator):

Rather than pushing UNDEFINED, can't we just skip this item? I'm thinking of a scenario where someone has done something like this in a module:

set_integer(1, pe->object, "foo[%i]", 0);
set_integer(2, pe->object, "foo[%i]", 100);

In this case wouldn't we be pushing a lot of UNDEFINED values before we get to the next assigned value in the array? Wouldn't it be nicer to only return objects if they are there and skip over the unassigned array values?
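To make the scale of the concern concrete, here is a small C model of a sparse array in which only indexes 0 and 100 are assigned. It assumes, as the question does, that iteration walks every index up to the highest assigned one and yields an undefined item for each gap. The types and helpers (slot_t, set_slot, items_visited, undefined_visited) are hypothetical and do not reflect libyara's internal array representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array slot: either holds an integer or is undefined. */
typedef struct {
  bool defined;
  int64_t value;
} slot_t;

/* Assign arr[index] = value. Slots that are never assigned stay undefined. */
void set_slot(slot_t* arr, size_t capacity, size_t index, int64_t value) {
  assert(index < capacity);
  arr[index].defined = true;
  arr[index].value = value;
}

/* Number of items visited by an iterator that walks every index up to the
   highest assigned one (assumption: gaps are yielded as undefined items). */
size_t items_visited(const slot_t* arr, size_t capacity) {
  size_t visited = 0;
  for (size_t i = 0; i < capacity; i++)
    if (arr[i].defined) visited = i + 1;
  return visited;
}

/* How many of the visited items are undefined. */
size_t undefined_visited(const slot_t* arr, size_t capacity) {
  size_t n = items_visited(arr, capacity);
  size_t undefined = 0;
  for (size_t i = 0; i < n; i++)
    if (!arr[i].defined) undefined++;
  return undefined;
}
```

Under that assumption, assigning only indexes 0 and 100 makes the loop body run 101 times, 99 of them on undefined items.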

@plusvic (Member, Author):

Actually that was my first idea, but I found out that it wouldn't work as expected.

Suppose you have an array of integers with 3 items where the first two are undefined, like [undefined, undefined, 10]. If we simply skipped the first two items, the array would be indistinguishable from one with a single item: the loop would execute only once, and an expression like this one would be true:

for all i in array : ( i == 10)

However, this expression shouldn't be true, because the equality does not hold for all items in the array: two of them are undefined.
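The difference between the two strategies can be sketched in C as follows. This is only a model of the argument above, assuming that a comparison against an undefined item does not count as true; item_t and both functions are hypothetical names, not libyara code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array item: either holds an integer or is undefined. */
typedef struct {
  bool defined;
  int64_t value;
} item_t;

/* Models "for all i in items : ( i == wanted )" when undefined items are
   handed to the loop body. Assumption: comparing an undefined item does
   not yield true, so a single undefined item makes "for all" fail. */
bool for_all_equals(const item_t* items, size_t count, int64_t wanted) {
  assert(items != NULL || count == 0);
  for (size_t k = 0; k < count; k++)
    if (!items[k].defined || items[k].value != wanted) return false;
  return true;
}

/* The rejected alternative: silently skip undefined items, so only the
   defined ones are ever compared. */
bool for_all_equals_skipping(const item_t* items, size_t count,
                             int64_t wanted) {
  assert(items != NULL || count == 0);
  for (size_t k = 0; k < count; k++) {
    if (!items[k].defined) continue;
    if (items[k].value != wanted) return false;
  }
  return true;
}
```

For [undefined, undefined, 10] and wanted == 10, for_all_equals returns false while for_all_equals_skipping returns true, which is the result the explanation above argues against.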

@wxsBSD (Collaborator):

That makes sense. The ways I see around this involve some rather drastic, and possibly backwards-compatibility-breaking, changes. In particular, I'm thinking we could do something like set_integer(1, pe->object, "foo") and have it act more like an append function, where you don't get to specify the index. This would at least ensure that the values that are set are contiguous, though it doesn't address the fact that you can still set undefined. As I think about it more, I think your approach is the best: it sucks that we can iterate over an arbitrary number of UNDEFINED values, but it is the best option.

Thanks for the clarification here!

Until now, all loops allocated 5 (LOOP_LOCAL_VARS) local variables in the scratch memory, even if they didn't use all of them. Now each loop allocates only the variables that it uses.
Example:

   for <any|all|number> key,value in some_dictionary : (
      // Some condition that uses "key" and "value". The identifiers "key" and "value" can be changed to anything you want.
   )
@plusvic plusvic mentioned this pull request Oct 15, 2019
@plusvic plusvic merged commit 2c0690c into master Nov 12, 2019
plusvic added a commit that referenced this pull request Nov 12, 2019
plusvic added a commit that referenced this pull request Nov 13, 2019
@plusvic plusvic deleted the iterators branch November 13, 2019 11:08
tarterp pushed a commit to mandiant/yara that referenced this pull request Mar 31, 2022
2 participants