Perl Script (possible) Improvement #1

stevan · 2012-02-17T21:30:39Z

You may get some speed/memory improvements by turning the while statement in Perl into a statement modifier.

push @rows, [split(/\t/)] while <>;

With this change, Perl should not have to create a new lexical scope for the while block, which should save on memory and speed. Note that I haven't benchmarked this (my internet connection is not good enough at the moment to pull your full repo, TSVs and all) and it is entirely possible that it won't be significant enough to matter either.
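Since the suggestion above is unbenchmarked, here is a minimal sketch of how the two forms could be compared using the core Benchmark module. The sample data, the in-memory filehandle, and the sub names read_block and read_modifier are assumptions made for illustration; they are not taken from the repo, and real numbers would need the actual TSV on disk.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 1,000 tab-separated lines held in memory,
# standing in for the TSV that gen_data.rb generates.
my $data = join '', map { join("\t", $_, "a$_", "b$_") . "\n" } 1 .. 1_000;

# Block form: while (...) { ... }
sub read_block {
    local $_;    # the implicit readline assignment targets the global $_
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my @rows;
    while (<$fh>) {
        push @rows, [ split /\t/ ];
    }
    return \@rows;
}

# Statement-modifier form: ... while ...;
sub read_modifier {
    local $_;
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my @rows;
    push @rows, [ split /\t/ ] while <$fh>;
    return \@rows;
}

# Run each form for at least one CPU second and print a comparison table.
cmpthese(-1, { block => \&read_block, modifier => \&read_modifier });
```

Any difference is likely to be small and may vary between perl versions, so it is worth repeating the run a few times before drawing conclusions.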

lorca · 2012-02-17T21:34:34Z

Thanks, I'll try it out later today and let you know. Good point, I've removed the TSV file since it can be generated by gen_data.rb.

-steve

lorca · 2012-02-18T21:57:06Z

Hey, slight difference after repeated tests. I don't understand it, but it's there.

https://github.com/lorca/flat_file_benchmark/blob/master/pics/benchmark.png

stevan · 2012-02-19T00:11:35Z

Yes, that makes sense. I didn't expect anything huge. I suspect you might be able to get some more performance if you use the map operator too.

my @rows = map { [ split /\t/ ] } <>;

lorca · 2012-02-19T01:35:53Z

doesn't make sense to me but glad it makes sense to you :)

it looks like they are both the same but my perl version has to do
something extra.

cool, i'll try out your new version, and let you know.

-steve

stevan · 2012-02-19T14:41:08Z

Actually it is fairly simple, but requires a bit of understanding of what the perl interpreter is doing under the hood.

When you use the block form of a statement in Perl, like this ...

while (<>) { 
    push @row => $_ 
}

the interpreter has to create a lexical pad, which is the C-level data structure that the perl interpreter uses to store any variables created or used within the block. However, when you do a statement modifier version, like this …

push @row => $_ while <>;

the interpreter doesn't need to create a new lexical pad, which means less memory allocation and less CPU time. You can see this directly by dumping the optree that the perl interpreter creates.

$ perl -MO=Concise -e 'my @row; while (<>) { push @row => $_ }'
i  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
3     <0> padav[@row:1,4] vM/LVINTRO ->4
4     <;> nextstate(main 3 -e:1) v:{ ->5
h     <2> leaveloop vK/2 ->i
5        <{> enterloop(next->b last->h redo->6) v ->c
-        <1> null vK/1 ->h
g           <|> and(other->6) vK/1 ->h
f              <1> defined sK/1 ->g
-                 <1> null sK/2 ->f
-                    <1> ex-rv2sv sKRM*/1 ->d
c                       <$> gvsv(*_) s ->d
e                    <1> readline[t2] sKS/1 ->f
d                       <$> gv(*ARGV) s ->e
-              <@> lineseq vK ->-
6                 <;> nextstate(main 2 -e:1) v ->7
a                 <@> push[t3] vK/2 ->b
7                    <0> pushmark s ->8
8                    <0> padav[@row:1,4] lRM ->9
-                    <1> ex-rv2sv sK/1 ->a
9                       <$> gvsv(*_) s ->a
b                 <0> unstack v ->c
-e syntax OK

$ perl -MO=Concise -e 'my @row; push @row => $_ while <>'
h  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
3     <0> padav[@row:1,2] vM/LVINTRO ->4
4     <;> nextstate(main 2 -e:1) v:{ ->5
g     <@> leave vK* ->h
5        <0> enter v ->6
-        <1> null vKP/1 ->g
a           <|> and(other->b) vK/1 ->g
9              <1> defined sK/1 ->a
-                 <1> null sK/2 ->9
-                    <1> ex-rv2sv sKRM*/1 ->7
6                       <$> gvsv(*_) s ->7
8                    <1> readline[t3] sKS/1 ->9
7                       <$> gv(*ARGV) s ->8
-              <@> lineseq vK ->-
e                 <@> push[t2] vK/2 ->f
b                    <0> pushmark s ->c
c                    <0> padav[@row:1,2] lRM ->d
-                    <1> ex-rv2sv sK/1 ->e
d                       <$> gvsv(*_) s ->e
f                 <0> unstack v ->6
-e syntax OK

You can see that the optrees are almost identical here with the exception of the leaveloop/enterloop on lines 6 & 7 of the block version. Basically, these two opcodes are what are making the difference here.

So the third version I suggested, with the map operator will likely be even faster. Here is what that looks like when you dump the optree.

$ perl -MO=Concise -e 'my @row = map { $_ } <>'
d  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 2 -e:1) v:{ ->3
c     <2> aassign[t4] vKS/COMMON ->d
-        <1> ex-list lK ->a
3           <0> pushmark s ->4
8           <|> mapwhile(other->9)[t3] lK ->a
7              <@> mapstart lK* ->8
4                 <0> pushmark s ->5
-                 <1> null lK/1 ->5
-                    <1> null lK/1 ->8
-                       <@> scope lK ->8
-                          <0> ex-nextstate v ->9
-                          <1> ex-rv2sv sK/1 ->-
9                             <$> gvsv(*_) s ->-
6                 <1> readline[t2] lKM/1 ->7
5                    <$> gv(*ARGV) s ->6
-        <1> ex-list lK ->c
a           <0> pushmark s ->b
b           <0> padav[@row:2,3] lRM*/LVINTRO ->c
-e syntax OK

As you can see, it is smaller. Also, if you look at lines 7 and 8, the mapwhile/mapstart opcodes, you can see that Perl has specialized opcodes for this particular idiom.

This is where the Perl idea of TIMTOWTDI (There Is More Than One Way To Do It) can come in really handy. You can make tradeoffs and/or reap benefits based on your needs simply by doing it another way.

stevan · 2012-02-19T14:56:30Z

Hmmm, you know, it is also possible that the map version might be slower because it has to pull in the entire input first before it can pass it into the map operation, whereas the while versions will read a line at a time. Of course, this could also mean it is faster too, since it is only calling readline once. I will clone the repo and test this out.
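The context difference described above can be seen in a small sketch. The three-line in-memory TSV and the variable names are assumptions made for illustration; the point is only that map puts the readline in list context, so every line is read before any row is built, while a scalar-context read (which is what while uses) takes one line per call.

```perl
use strict;
use warnings;

# Hypothetical three-line TSV held in memory.
my $data = "a\t1\nb\t2\nc\t3\n";

# List context (what map imposes on <$fh>): every line is read up front.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @rows = map { [ split /\t/ ] } <$fh>;
my $drained = eof($fh) ? 1 : 0;    # 1: the handle is already exhausted
close $fh;

# Scalar context (what while imposes): one line per readline call.
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my $first     = <$fh>;
my $more_left = eof($fh) ? 0 : 1;  # 1: the remaining lines are still unread
close $fh;

printf "map built %d rows; handle drained: %d\n", scalar @rows, $drained;
printf "scalar read got one line; more left: %d\n", $more_left;
```

With a large input, the list-context version has to hold every raw line in memory on top of the rows built from them, which is the cost the while forms avoid.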

lorca · 2012-02-19T23:20:39Z

Yeah, I just ran it. It used a lot of memory, started swapping, then the kernel OOM killer killed it. Right, that explanation makes sense. Interesting.

stevan · 2012-02-20T00:14:54Z

Yeah, okay that makes sense too. Oh well, at least the statement modifier one is quicker. I am sure we could golf this down and squeak out even more performance, but it would be at the expense of readability which is not worth it.
