unreasonable multiple Whetstone results #15

Open
GoogleCodeExporter opened this issue Apr 9, 2015 · 15 comments
Comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. ./Run arithmetic
2. ./Run -c 4 arithmetic
3. ./Run -c 8 arithmetic
4. ./Run -c 12 arithmetic
5. ./Run -c 16 arithmetic

What is the expected output? What do you see instead?

The system's CPU has only 4 cores, so every run from -c 4 through -c 16 should
score slightly less than 4x the single-process result.
Instead, the results are:
4: 3380.9       
8: 6756.2   
12: 10130.1 
16: 13504.2

What version of the product are you using? On what operating system?
unixbench 5.1.3
3.2.0-4-amd64 #1 SMP Debian 3.2.63-2 x86_64 GNU/Linux
one Intel XEON E3-1225

Please provide any additional information below.
I also got the same kind of result on an Apple MacBook Pro with a Core i7 3615 (4C8T):

1: 959.8    
4: 3902.9   
8: 7139.1
12: 10673.6
16: 14252.6
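
A quick sanity check on the figures above, as a rough sketch: dividing each aggregate score by its copy count shows whether the per-copy score drops once copies outnumber cores. The dictionary names and the helper `score_per_copy` are made up for illustration; the numbers are copied from this report.

```python
# Scores copied from the report above, keyed by number of parallel copies.
linux_xeon = {4: 3380.9, 8: 6756.2, 12: 10130.1, 16: 13504.2}  # 4 cores
macbook_i7 = {1: 959.8, 4: 3902.9, 8: 7139.1, 12: 10673.6, 16: 14252.6}  # 4C8T


def score_per_copy(results):
    """Divide each aggregate score by its copy count."""
    return {copies: score / copies for copies, score in results.items()}


for name, results in (("Linux Xeon", linux_xeon), ("MacBook i7", macbook_i7)):
    for copies, per_copy in score_per_copy(results).items():
        print(f"{name}: {copies:2d} copies -> {per_copy:7.1f} per copy")
```

On a 4-core machine the per-copy score would be expected to fall roughly in proportion once more than 4 copies run; here it stays nearly flat, which is the anomaly being reported.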

Original issue reported on code.google.com by [email protected] on 5 Dec 2014 at 2:38

@GoogleCodeExporter

Sorry, the title should be "unreasonable multiple-core Whetstone results".

Original comment by [email protected] on 5 Dec 2014 at 2:40

@GoogleCodeExporter

I think this comes down to how the time is measured. I just raised the same
issue, and changing the flags passed to the compiler fixed it for me.
Could you try the same?

Instead of -DUNIX, use -DGTODay so that the benchmark uses wall-clock time
instead of process CPU time.

Thanks.
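
To illustrate why the choice of timer matters, here is an idealized model, not the code in whets.c. It assumes each copy gets an equal share of the cores, and it ignores hyper-threading and scheduler overhead. The function `reported_total` and the rate of 1000.0 are invented for the example.

```python
def reported_total(copies, cores, rate_per_core, timer):
    """Aggregate score under an idealized model.

    timer="cpu":  each copy divides its work by the CPU time it consumed.
    timer="wall": each copy divides its work by elapsed real time.
    """
    # Fraction of a core each copy actually gets once copies exceed cores.
    share = min(1.0, cores / copies)
    work_per_wall_second = rate_per_core * share
    if timer == "cpu":
        # Only `share` CPU-seconds are consumed per wall second, so the
        # per-copy rate does not drop: work / cpu_time == rate_per_core.
        per_copy = work_per_wall_second / share
    elif timer == "wall":
        per_copy = work_per_wall_second
    else:
        raise ValueError(f"unknown timer: {timer!r}")
    return copies * per_copy


for copies in (1, 4, 8, 16):
    cpu = reported_total(copies, cores=4, rate_per_core=1000.0, timer="cpu")
    wall = reported_total(copies, cores=4, rate_per_core=1000.0, timer="wall")
    print(f"{copies:2d} copies: cpu-timed {cpu:8.1f}, wall-timed {wall:8.1f}")
```

In this model the CPU-timed total keeps growing with the copy count, while the wall-timed total levels off at the number of cores.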

Original comment by [email protected] on 30 Mar 2015 at 1:11

@GoogleCodeExporter

Tested on OS X, 4C8T (not idle). It looks like it still uses process CPU time.
1: 1004.1
2: 1951.8
4: 3887.8
8: 6794.8

Original comment by [email protected] on 30 Mar 2015 at 2:41

@GoogleCodeExporter

I tried it just now and it works fine for me.

-DUNIX:

Benchmark Run: Mon Mar 30 2015 21:48:04 - 21:48:19
4 CPUs in system; running 1 parallel copy of tests

Double-Precision Whetstone                     4203.4 MWIPS (9.7 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0       4203.4    764.3
                                                                   ========
System Benchmarks Index Score (Partial Only)                          764.3

------------------------------------------------------------------------
Benchmark Run: Mon Mar 30 2015 21:48:19 - 21:48:36
4 CPUs in system; running 4 parallel copies of tests

Double-Precision Whetstone                    13725.4 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      13725.4   2495.5
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2495.5

------------------------------------------------------------------------
Benchmark Run: Mon Mar 30 2015 21:48:36 - 21:49:05
4 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    27433.0 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      27433.0   4987.8
                                                                   ========
System Benchmarks Index Score (Partial Only)                         4987.8


-DGTODay:

Benchmark Run: Mon Mar 30 2015 21:51:08 - 21:51:24
4 CPUs in system; running 1 parallel copy of tests

Double-Precision Whetstone                     4293.7 MWIPS (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0       4293.7    780.7
                                                                   ========
System Benchmarks Index Score (Partial Only)                          780.7

------------------------------------------------------------------------
Benchmark Run: Mon Mar 30 2015 21:51:24 - 21:51:40
4 CPUs in system; running 4 parallel copies of tests

Double-Precision Whetstone                    13406.6 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      13406.6   2437.6
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2437.6

------------------------------------------------------------------------
Benchmark Run: Mon Mar 30 2015 21:51:40 - 21:51:55
4 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    14167.5 MWIPS (9.3 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      14167.5   2575.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2575.9
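
The INDEX column in these logs appears to equal RESULT / BASELINE * 10. That relation is inferred from the numbers shown here, not taken from the Run source. The sketch below recomputes the index for each run and shows how each timer scales relative to a single copy. The `runs` table and the `index` helper are illustrative names.

```python
BASELINE = 55.0  # Whetstone baseline shown in the logs above

# MWIPS results copied from the logs, keyed by timer flag and copy count.
runs = {
    "-DUNIX": {1: 4203.4, 4: 13725.4, 8: 27433.0},
    "-DGTODay": {1: 4293.7, 4: 13406.6, 8: 14167.5},
}


def index(result, baseline=BASELINE):
    """Index as it appears in the logs: result / baseline * 10, one decimal."""
    return round(result / baseline * 10, 1)


for timer, results in runs.items():
    for copies, mwips in results.items():
        speedup = mwips / results[1]
        print(f"{timer:9s} {copies} copies: index {index(mwips):7.1f}, "
              f"{speedup:.2f}x single copy")
```

With -DUNIX the 8-copy score is about 6.5x the single-copy score on a 2C4T laptop, whereas with -DGTODay it is about 3.3x, which is below the 4-thread ceiling.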

Original comment by [email protected] on 30 Mar 2015 at 4:23

@GoogleCodeExporter

Since you mentioned 8 threads, could you try 16 copies with both timing
options to see what the numbers look like?

Original comment by [email protected] on 30 Mar 2015 at 4:23

@GoogleCodeExporter

I forgot to mention one thing: my laptop is 2C4T, so 4 threads should be the
ceiling for any throughput scaling.

Original comment by [email protected] on 30 Mar 2015 at 4:25

@GoogleCodeExporter

This works for me. I tested on Arch Linux with:

Intel(R) Xeon(R) CPU W3530 (4 cores, HT)
ARM Cortex A8 (one core)
ARM Cortex A7 (dual core)

I don't have a Mac to test on; maybe you could try -DMAC? The options are
listed in src/whets.c under "Timer options".

Thanks.

Original comment by [email protected] on 30 Mar 2015 at 4:30

@GoogleCodeExporter

Mac OS should also support gettimeofday(). Could you try raising the number of
copies beyond the number of CPU threads and run the same test?

Original comment by [email protected] on 30 Mar 2015 at 5:01

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

I tried -DMAC and -DMAC_TMgr.

-DMAC_TMgr won't even compile ("Timer.h" not found); maybe it's meant for
older PPC Macs.

Result of -DMAC on 4C8T OS X (the problem does not seem to be solved, and a
new problem was introduced).

Original comment by [email protected] on 31 Mar 2015 at 4:50

Attachments:

@GoogleCodeExporter

I also tested -DGTODay on
Linux ... 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) i686
GNU/Linux
(with the 'volatile unsigned long iter;' patch already applied)
and got this error:
1 x Arithoh  1 2 3Can't take log of -2.93709e+07 at ./Run line 935.

Is the timing done by the Perl script 'Run'?
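
The error message suggests that a negative score reached a logarithm. A logarithm-based average, such as a geometric mean, cannot handle a negative input. The sketch below is a hypothetical Python analogue, not the code at line 935 of Run, and the thread does not confirm how Run combines scores.

```python
import math


def geometric_mean(scores):
    """Geometric mean via logs; rejects non-positive scores explicitly."""
    if not scores:
        raise ValueError("no scores given")
    if any(s <= 0 for s in scores):
        raise ValueError(f"cannot take log of non-positive score in {scores}")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


try:
    geometric_mean([-2.93709e07])
except ValueError as err:
    print("rejected:", err)
```

A negative score usually means the measurement itself went wrong (for example an elapsed time or count that came out negative), so the timing code is the place to look rather than the averaging step.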

Original comment by [email protected] on 31 Mar 2015 at 4:55

@GoogleCodeExporter

Could you clarify the following? I assume the issue initially reported was
with Whetstone. Did you try -DGTODay with Whetstone? What were the results with
multiple copies (> 8 copies)?

arith.c uses a different method for measuring time. Judging from your previous
results, arithmetic is working properly.

Original comment by [email protected] on 31 Mar 2015 at 5:46

@GoogleCodeExporter

Hmm, it looks like it's solved.

The score with more than 4 copies is still much higher than I expected, though.

Thanks!

Original comment by [email protected] on 31 Mar 2015 at 9:45

Attachments:

@GoogleCodeExporter

-DGTODay also works well on Linux.

Original comment by [email protected] on 1 Apr 2015 at 4:42
