Ubuntu 12.04, gcc-4.7.3, 32-bit, with fftw 3.3.3 (built with --enable-neon), on a 1.2GHz ARM Cortex A9 (Tegra 3)

Built with:

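    gcc-4.7 -O3 -DHAVE_FFTW -march=armv7-a -mtune=cortex-a9 -mfloat-abi=hard -mfpu=neon -ffast-math test_pffft.c pffft.c -o test_pffft_arm fftpack.c -lm -I/usr/local/include/ -L/usr/local/lib/ -lfftw3f

| input len | real FFTPack | real FFTW | real PFFFT | cplx FFTPack | cplx FFTW | cplx PFFFT |
|----------:|-------------:|----------:|-----------:|-------------:|----------:|-----------:|
|        64 |          549 |       452 |        731 |          512 |       602 |        640 |
|        96 |          421 |       272 |        702 |          496 |       571 |        602 |
|       128 |          498 |       512 |        815 |          597 |       618 |        652 |
|       160 |          521 |       536 |        815 |          586 |       669 |        625 |
|       192 |          539 |       571 |        883 |          485 |       597 |        626 |
|       256 |          640 |       539 |        975 |          569 |       611 |        671 |
|       384 |          499 |       610 |        879 |          499 |       602 |        637 |
|       480 |          518 |       507 |        877 |          496 |       661 |        616 |
|       512 |          524 |       591 |       1002 |          549 |       678 |        668 |
|       640 |          542 |       612 |        955 |          568 |       663 |        645 |
|       768 |          557 |       613 |        981 |          491 |       663 |        598 |
|       800 |          514 |       353 |        882 |          514 |       360 |        574 |
|      1024 |          640 |       640 |       1067 |          492 |       683 |        602 |
|      2048 |          587 |       640 |        908 |          486 |       640 |        552 |
|      2400 |          479 |       368 |        777 |          422 |       376 |        518 |
|      4096 |          511 |       614 |        853 |          426 |       640 |        534 |
|      8192 |          415 |       584 |        708 |          386 |       622 |        516 |
|      9216 |          419 |       571 |        687 |          364 |       586 |        506 |
|     16384 |          426 |       577 |        716 |          398 |       606 |        530 |
|     32768 |          417 |       572 |        673 |          399 |       572 |        468 |
|    262144 |          219 |       380 |        293 |          255 |       431 |        343 |
|   1048576 |          202 |       274 |        237 |          265 |       282 |        355 |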

Same platform as above, but this time pffft and fftpack are built with clang 3.2:

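    clang -O3 -DHAVE_FFTW -march=armv7-a -mtune=cortex-a9 -mfloat-abi=hard -mfpu=neon -ffast-math test_pffft.c pffft.c -o test_pffft_arm fftpack.c -lm -I/usr/local/include/ -L/usr/local/lib/ -lfftw3f

| input len | real FFTPack | real FFTW | real PFFFT | cplx FFTPack | cplx FFTW | cplx PFFFT |
|----------:|-------------:|----------:|-----------:|-------------:|----------:|-----------:|
|        64 |          427 |       452 |        853 |          427 |       602 |       1024 |
|        96 |          351 |       276 |        843 |          337 |       571 |        963 |
|       128 |          373 |       512 |        996 |          390 |       618 |       1054 |
|       160 |          426 |       536 |        987 |          375 |       669 |        914 |
|       192 |          404 |       571 |       1079 |          388 |       588 |       1079 |
|       256 |          465 |       539 |       1205 |          445 |       602 |       1170 |
|       384 |          366 |       610 |       1099 |          343 |       594 |       1099 |
|       480 |          356 |       507 |       1140 |          335 |       651 |        931 |
|       512 |          411 |       591 |       1213 |          384 |       649 |       1124 |
|       640 |          398 |       612 |       1193 |          373 |       654 |        901 |
|       768 |          409 |       613 |       1227 |          383 |       663 |       1044 |
|       800 |          411 |       348 |       1073 |          353 |       358 |        809 |
|      1024 |          427 |       640 |       1280 |          413 |       692 |       1004 |
|      2048 |          414 |       626 |       1126 |          371 |       640 |        853 |
|      2400 |          399 |       373 |        898 |          319 |       368 |        653 |
|      4096 |          404 |       602 |       1059 |          357 |       633 |        778 |
|      8192 |          332 |       584 |        792 |          308 |       616 |        716 |
|      9216 |          322 |       561 |        783 |          299 |       586 |        687 |
|     16384 |          344 |       568 |        778 |          314 |       617 |        745 |
|     32768 |          342 |       564 |        737 |          314 |       552 |        629 |
|    262144 |          201 |       383 |        313 |          227 |       435 |        413 |
|   1048576 |          187 |       262 |        251 |          228 |       281 |        409 |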

So it looks like, on this ARM platform, gcc 4.7 generates the faster scalar floating-point code (FFTPack is faster with gcc at every size, for both real and complex transforms), while clang 3.2 does better with the NEON intrinsics (PFFFT is faster with clang at every size, by up to about 1.7x for complex transforms).
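
As a rough way to quantify that comparison, the sketch below computes the clang/gcc throughput ratio for three transform sizes copied from the two tables above. The function name `clang_over_gcc` and the dictionary layout are illustrative only and are not part of pffft; the sketch assumes that a higher figure means a faster transform, as the comparison above implies.

```python
# Figures copied from the two tables above for three transform sizes.
# Column order: real FFTPack, real PFFFT, cplx FFTPack, cplx PFFFT.
# (The FFTW columns are left out because only pffft and fftpack were rebuilt.)
COLUMNS = ("real FFTPack", "real PFFFT", "cplx FFTPack", "cplx PFFFT")

GCC_47 = {
    64: (549, 731, 512, 640),
    1024: (640, 1067, 492, 602),
    1048576: (202, 237, 265, 355),
}

CLANG_32 = {
    64: (427, 853, 427, 1024),
    1024: (427, 1280, 413, 1004),
    1048576: (187, 251, 228, 409),
}


def clang_over_gcc(n, column):
    """Return the clang 3.2 figure divided by the gcc 4.7 figure.

    A value above 1.0 means the clang build was faster for that
    transform size and column; below 1.0 means gcc was faster.
    """
    i = COLUMNS.index(column)
    return CLANG_32[n][i] / GCC_47[n][i]


if __name__ == "__main__":
    print("input len".rjust(9), *(c.rjust(13) for c in COLUMNS))
    for n in sorted(GCC_47):
        ratios = (clang_over_gcc(n, c) for c in COLUMNS)
        print(str(n).rjust(9), *(f"{r:13.2f}" for r in ratios))
```

Ratios below 1.0 in the FFTPack columns and above 1.0 in the PFFFT columns correspond to the pattern described above. Extending the two dictionaries with the remaining rows gives the full picture.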