ChangeLog
commit 7184fc796279cfa70e4ba62519ac2938054584e6
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 14 10:41:20 2021 -0400
Update configure.ac for 3.3.10
commit 62e605c845d3c366849f48c977dfe0e320c23ff0
Author: Matteo Frigo <[email protected]>
Date: Mon May 31 09:55:57 2021 -0400
Use -mavx2 instead of -march=core-avx2
It appears that -march=core-avx2 was necessary on some old version of
gcc-4.8, but setting -march is problematic for users who want to set
-march themselves.
It appears that whatever gcc bug caused us to use -march=core-avx2 in
2015 is long gone. I have checked gcc-4.8.5 on centos-7 and
oraclelinux-7 (both 64-bit) and on 32-bit debian jessie and in all
cases -mavx2 is sufficient.
Therefore, given that the original cause for using -march is gone and
that using -march causes problems, I am removing the automatic
selection of -march=core-avx2 and I am using -mavx2 instead. Users
for whom this is a problem can still override the default via
./configure --enable-avx2 AVX2_CFLAGS=...
commit e9c510bf92a4fa848982310d2ecc8c5701aa3f6a
Author: Matteo Frigo <[email protected]>
Date: Thu Feb 25 17:52:44 2021 -0500
Update NEWS.
Bugfix 06855286 is noteworthy.
commit 06855286332683aa79a541eddc07ee535cfa4f95
Author: Matteo Frigo <[email protected]>
Date: Thu Feb 25 10:15:04 2021 -0500
Fix alignment checks for n2*v codelets.
n2{f,b}v codelets assume that all loads and stores are aligned.
Thus, not only the pointers must be aligned, but all strides must
be aligned as well (in the ALIGNEDA sense).
For some reason, two-way SIMD, including double-precision sse2,
defined SIMD_VSTRIDE_OKA(ivs) to be true always. However, this is
incorrect because the codelet only works if ivs is even. This change
implements the correct condition in all places where it matters.
Thanks Kuangdai Leng for the bug report.
This bug is deeply puzzling. First, I don't understand how this
bug survived for years despite uncountable randomized tests. Second,
I am pretty sure I fixed exactly this issue years ago and I don't
understand how it came back.
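The condition described in this entry can be sketched in C as follows. This is only an illustration under stated assumptions: a two-way SIMD unit with two doubles (16 bytes) per vector, and strides counted in doubles. The names VL, VEC_BYTES, pointer_is_aligned, stride_is_aligned and aligned_access_ok are hypothetical and are not FFTW's macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for a two-way SIMD unit such as double-precision
   SSE2: each vector holds 2 doubles, i.e. 16 bytes. */
enum { VL = 2, VEC_BYTES = VL * sizeof(double) };

/* True if the pointer sits on a vector boundary. */
static int pointer_is_aligned(const double *p)
{
    return ((uintptr_t)p % VEC_BYTES) == 0;
}

/* True if a stride, counted in doubles, keeps every later access on a
   vector boundary. For VL == 2 this means the stride must be even. */
static int stride_is_aligned(ptrdiff_t stride)
{
    return (stride % VL) == 0;
}

/* Both conditions are needed: an aligned base pointer with an odd stride
   still produces misaligned accesses on every other step. */
static int aligned_access_ok(const double *base, ptrdiff_t stride)
{
    return pointer_is_aligned(base) && stride_is_aligned(stride);
}
```

A check that always accepts the stride corresponds to dropping the second test, which is the kind of mistake the entry describes.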
commit d738232235af6c96bc4a28b07fc867c000696245
Author: Rui Oliveira <[email protected]>
Date: Wed Feb 24 15:58:23 2021 +0000
Fix SIMD detection for MSVC (#232)
commit 17b4252e3da62490d3e206569ebfec552d09afc1
Author: Vincent Jacques <[email protected]>
Date: Sun Dec 27 20:16:40 2020 +0100
Doc: remove duplicated "the" (#226)
commit 34082eb5d6ed7dc9436915df69f376c06fc39762
Author: Matteo Frigo <[email protected]>
Date: Thu Dec 10 06:54:31 2020 -0500
Updated NEWS for fftw-3.3.9
commit a67409c50acd19180cc9b8f61c67354d392a9e66
Author: Steven G. Johnson <[email protected]>
Date: Tue Oct 20 12:20:45 2020 -0400
undo inadvertent change in cdb950a60c8a5b723f25ec24639a824e021b780d
commit 16164a7bf3257376f22ddeda0842805d1c456b01
Author: Galen Lynch <[email protected]>
Date: Thu Aug 20 13:22:23 2020 -0400
First attempt at adding planner_nthreads (#205)
* First attempt at adding planner_nthreads
While you can currently set the maximum number of threads used by the planner,
there is no way to check that number. This simply adds a new function,
`planner_nthreads`, that provides this information, as was suggested by
@stevengj.
* Add basic testing of planner_nthreads
In an attempt to verify that planner_nthreads works as intended, I have added an
assertion in `fftw-bench.c` that `planner_nthreads` returns the nthreads just
passed as an argument to `plan_with_nthreads`.
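The round-trip check described above might look roughly like the sketch below. It does not call FFTW: demo_plan_with_nthreads and demo_planner_nthreads are hypothetical stand-ins that only model a setter and getter pair, so the example stays self-contained.

```c
#include <assert.h>

/* Stand-in for the planner's thread setting; not FFTW's actual state. */
static int demo_nthreads = 1;

/* Hypothetical setter, modeled on plan_with_nthreads. */
static void demo_plan_with_nthreads(int nthreads)
{
    demo_nthreads = nthreads;
}

/* Hypothetical getter, modeled on planner_nthreads. */
static int demo_planner_nthreads(void)
{
    return demo_nthreads;
}

/* The round-trip check: the getter must report the value just set. */
static void demo_set_and_verify(int nthreads)
{
    demo_plan_with_nthreads(nthreads);
    assert(demo_planner_nthreads() == nthreads);
}
```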
commit 8266ecd946bebf3214c3b13880c8c2110f0600a4
Author: Robin Lu <[email protected]>
Date: Thu Aug 13 22:45:16 2020 +0800
Misspell: imlementation to implementation (#207)
commit 10e2040af822a08ed49d2f6a1db45a7a3ad50582
Author: Steven G. Johnson <[email protected]>
Date: Wed Aug 5 14:44:41 2020 -0400
-no-gcc seems to have not been required with icc for years now, and is starting to cause problems with system headers on other platforms (fixes #184)
commit 78e01190450398d44c7ba7ac69d3e720b6af1f16
Author: Steven G. Johnson <[email protected]>
Date: Wed Jul 1 08:24:32 2020 -0400
tag upper bound is 0x7fff, not 0xffff (fixes #206)
commit fe55435ee880569b85446ff9ba6640a7db53a025
Author: Holy Wu <[email protected]>
Date: Wed Jul 1 06:04:27 2020 +0800
Fix x86 compilation in MSVC (#198)
commit d832ca0e2f286650278563c6887c4f235785e606
Author: Matteo Frigo <[email protected]>
Date: Sat Jun 13 13:10:44 2020 -0400
Fix incorrect math in 128-bit generic SIMD
It looks like simd-generic128.h:VTW1 was copied&pasted from
the equivalent SSE code, but the consumer of that structure
(BYTW1) was assuming a different format.
commit ecba4071a7ece5c2fa6d93a21468a52dbb0522f5
Author: Matteo Frigo <[email protected]>
Date: Sat Jun 6 07:25:34 2020 -0400
Update MIT contact address
commit 516116615362334c2051a16ec1658a888573ab63
Merge: 3a103f42 83707fdf
Author: Matteo Frigo <[email protected]>
Date: Fri May 1 16:57:58 2020 -0400
Merge pull request #201 from muxator/patch-1
README.md: link to README
commit 83707fdf9118b10fd88f67b206e5a2d2b69764c1
Author: muxator <[email protected]>
Date: Fri May 1 20:21:53 2020 +0200
README.md: link to README
README.md refers the user to the text-only README, which contains, among other
things, build instructions. Since the two files have very similar names, this
may confuse first-time users.
With this commit, a relative hyperlink pointing to README is added in README.md,
in order to make it explicit what file is being referenced.
commit 3a103f4258d2a2090aa9038503aae7ca56c50bea
Author: Matteo Frigo <[email protected]>
Date: Tue Apr 14 20:35:45 2020 -0400
Add explicit cast for C++ compatibility
commit ef15637f3241f8d2cc8fcefc2bb9e536de98d692
Author: Matteo Frigo <[email protected]>
Date: Fri Apr 3 12:56:06 2020 -0400
Fix Makefile rules under -j N
For some reason, given this Makefile:
    a: b c
    b c:
            COMMANDS
invoking "make -j 2 c" runs COMMANDS in parallel twice, which fails if
COMMANDS modify some global state.
Hack around this quirk.
commit db8b77525089448206b453279360b752f476d1cf
Author: Matteo Frigo <[email protected]>
Date: Fri Apr 3 12:38:18 2020 -0400
Style Police
commit 2a0f32012aaf3714ea4c4eccad5b4f053e85fb39
Author: Matteo Frigo <[email protected]>
Date: Fri Apr 3 12:22:55 2020 -0400
Replace deprecated Sort.list with List.sort
commit c69d6c3c7512a814efe229d15e51dda0c1590ee6
Author: Matteo Frigo <[email protected]>
Date: Mon Jan 20 10:41:38 2020 -0500
Report mflops as %.8g (was %.5g)
I got tired of seeing things like 1.15e5, prefer 115000.
The drawback is that now bench reports things like 6839.6357 mflops,
where the 8-digit precision implies much more accuracy than we
actually have.
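The effect of the precision change can be reproduced with a small C sketch. The helper names format_rate and rate_formats_as are hypothetical and are not taken from the bench program; only the standard %g conversion is assumed.

```c
#include <stdio.h>
#include <string.h>

/* Formats a rate with the given number of significant digits using %g.
   Returns the number of characters written, as snprintf does. */
static int format_rate(char *buf, size_t size, double mflops, int digits)
{
    return snprintf(buf, size, "%.*g", digits, mflops);
}

/* Convenience check: does the formatted rate match the expected text? */
static int rate_formats_as(double mflops, int digits, const char *expected)
{
    char buf[64];

    format_rate(buf, sizeof buf, mflops, digits);
    return strcmp(buf, expected) == 0;
}
```

With %g, a value switches to exponent notation once its decimal exponent reaches the requested precision, which is why six-digit rates need more than 5 significant digits to print in plain form.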
commit cdb950a60c8a5b723f25ec24639a824e021b780d
Author: Steven G. Johnson <[email protected]>
Date: Fri Jan 17 11:11:48 2020 -0500
fix compilation w/o threads
commit 8588d74f24c7d98dcc66466bb364210330639f69
Author: Steven G. Johnson <[email protected]>
Date: Tue Sep 17 18:44:32 2019 -0400
soversion bump for API change
commit 4a74466adfc93ca8748c4661de47cd9fa5c6bb46
Author: Steven G. Johnson <[email protected]>
Date: Tue Sep 17 17:53:50 2019 -0400
enable avx512 on clang (#177)
commit 9126411654e6ac9c94202ceeb5a5af849ecac12c
Author: Steven G. Johnson <[email protected]>
Date: Tue Sep 17 14:44:29 2019 -0400
update 3.3.9 NEWS
commit ec98679b6535a7a75a4348c697b8b7daaae53b64
Author: Steven G. Johnson <[email protected]>
Date: Tue Sep 17 14:41:30 2019 -0400
indentation
commit ca3c9c7e56c608bf05f42a6e92a46cb01c0ffe46
Author: Steven G. Johnson <[email protected]>
Date: Tue Sep 17 14:40:23 2019 -0400
document fftw_threads_set_callback
commit 4093e4b58528459afdb6ba9e373d5d7a2a3a22b0
Author: Steven G. Johnson <[email protected]>
Date: Tue Sep 17 14:23:53 2019 -0400
add test for fftw_threads_set_callback
commit 8a9236c494f680e550c536eeb6aa7102ecfad2ca
Author: Steven G. Johnson <[email protected]>
Date: Mon Sep 16 15:48:40 2019 -0400
export threads_set_callback function properly, support in openmp as well as threads backend
commit ffd28fd35b4ea6b29194ca2ef73994398fa9ee46
Author: Steven G. Johnson <[email protected]>
Date: Wed Sep 4 21:14:52 2019 -0400
silence more generated code in .gitignore
commit c5c85fe618a93bf6e41489bb65b6bf4e27224ff1
Author: Matteo Frigo <[email protected]>
Date: Sat Aug 10 08:41:09 2019 -0400
README: note dependency on the Ocaml Num library.
genfft needs the Ocaml Num library, which has been part of Ocaml for
more than twenty years but was suddenly removed in ocaml-4.06:
* #1178: remove the Num library for arbitrary-precision arithmetic.
It now lives as a separate project https://github.com/ocaml/num
with an OPAM package called "num".
(Xavier Leroy)
For now, note in README that Num is necessary, until I come up with a
better long-term plan. Debian/ubuntu seem to have ocaml-4.05. Fedora
30 has ocaml-4.07, but installing the ocaml-num-devel package seems to
be good enough. I'll worry about this problem once more distributions
upgrade ocaml.
commit f1ad19cdac52f9f8d27c1e2fea55a16215408fd3
Author: Steven G. Johnson <[email protected]>
Date: Fri Jul 26 16:01:18 2019 -0400
add internal fftw_threads_set_callback, closes #175
commit d8bafdad0b460543fed9ff964f0de1c73cde249b
Author: Matteo Frigo <[email protected]>
Date: Wed Jul 3 17:39:58 2019 -0400
fftw3.texi: use @finalout
Avoid the \overfullrule black boxes when TeX cannot wrap long lines of
code.
commit 9fcdd122574259f263ba5afa939e1a6d28581bb3
Author: Matteo Frigo <[email protected]>
Date: Tue May 28 15:08:56 2019 -0400
make bigcheck: validate wisdom with multiple threads too
commit 0062a1b23d035b44954fbfa8e05460a4bcddbe59
Author: Matteo Frigo <[email protected]>
Date: Tue May 28 15:01:22 2019 -0400
make bigcheck: validate that problems can be solved in -owisdom-only mode
commit ccdf1cdfed8f2cf863dd4a1e4ac7be8ef4df152f
Author: Matteo Frigo <[email protected]>
Date: Thu May 16 15:58:54 2019 -0400
configure.ac: update version to fftw-3.3.9
commit b43eba5ef5f1f6402b617b78e11b8e08d57988a6
Author: Matteo Frigo <[email protected]>
Date: Thu May 16 15:55:25 2019 -0400
update NEWS
commit ebde7c4e4607afb6bbba7e6609fae56ff0fda01b
Author: Matteo Frigo <[email protected]>
Date: Thu May 16 15:49:28 2019 -0400
Fix alignment requirements for avx512
The avx512 alignment requirement was set to 64 bytes, but this is
wrong. Alignment requirements are a property of the platform (e.g.,
x86) and not of the instruction set (e.g., AVX). Among other things,
this broke wisdom with avx512.
Good thing that avx512 was marked as experimental. It's still
experimental, because I have no hardware to test it.
Thanks Tom Epperly for the bug report and debugging assistance.
commit 77a2efdded4dceef1e6045313ae06e27265807ec
Author: Matteo Frigo <[email protected]>
Date: Fri May 3 10:10:20 2019 -0400
CMakeLists: install fftw3.pc, not fftw.pc
This resolves the discrepancy between "make" and "cmake".
commit 4754dfd3f3149275c64896b78fbf3e8a36b7cc6c
Merge: c6bebdb8 80a8d61e
Author: Matteo Frigo <[email protected]>
Date: Sun Dec 2 16:52:16 2018 -0500
Merge pull request #158 from katterjohn/typo-fix
Fix the incorrect comment about Util.interval
commit 80a8d61e8e25d75c58cfa3c02b7137dc5e930b97
Author: Kris Katterjohn <[email protected]>
Date: Sun Dec 2 14:16:41 2018 -0600
Fix an incorrect comment
commit c6bebdb84fafdfb1de8462f69276bd6a305f0154
Author: José Mª Escartín <[email protected]>
Date: Mon Oct 8 22:27:26 2018 +0200
Fix typo in Fortran quadruple-precision doc. (#155)
The use of quadruple-precision with Fortran does not require
the inclusion of file fftw3l.f03, but it requires the inclusion
of file fftw3.f03.
commit d59abdaaeb31af8a7d7f2d2ffa92931efd344e6c
Author: Mario Emmenlauer <[email protected]>
Date: Fri Jun 15 03:31:40 2018 +0200
CMakeLists.txt: disable /bigobj on MSVC Intel compiler because it's not required and not supported (#139)
commit 700745cdbb34e964e1abda86183809fd8dd95796
Author: Matteo Frigo <[email protected]>
Date: Thu May 24 08:00:45 2018 -0400
Bump FFTW_MINOR_VERSION for fftw-3.3.8
commit 902d0982522cdf6f0acd60f01f59203824e8e6f3
Author: Matteo Frigo <[email protected]>
Date: Thu May 24 07:43:02 2018 -0400
update NEWS
commit 41b0d9eff394891ba3327b9062811d48677bb411
Author: Matteo Frigo <[email protected]>
Date: Thu May 24 07:35:36 2018 -0400
CFLAGS: don't use -ffast-math
-ffast-math is a relic from 1999 when it was kind of necessary for
full use of FMA on powerpc. Nowadays it is just a liability. For
example, 'gcc-8 -ffast-math' ignores the distinction between +0 and
-0, thus breaking the avx and avx2 implementations in fftw-3.3.7.
commit 19eeeca592f63413698f23dd02b9961f22581803
Author: Matteo Frigo <[email protected]>
Date: Thu May 24 07:29:00 2018 -0400
Fixes for gcc-8
It looks like 'gcc-8 -ffast-math' does not honor the distinction between
+0.0 and -0.0 in floating-point constants. I suppose that technically
-ffast-math has the right to do so.
For good measure, this patch encodes such constants as their explicit
binary representation. A separate patch will disable -ffast-math.
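One way to spell such a constant by its bit pattern is sketched below. It assumes IEEE 754 binary64 doubles, and the helper names (double_from_bits, bits_from_double, negative_zero, flip_sign) are hypothetical; the actual patch may encode its constants differently.

```c
#include <stdint.h>
#include <string.h>

/* Only the sign bit set: the binary64 pattern of -0.0. */
#define SIGN_BIT UINT64_C(0x8000000000000000)

/* Reinterprets a 64-bit pattern as a double without relying on how the
   compiler folds floating-point literals. */
static double double_from_bits(uint64_t bits)
{
    double d;

    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Inverse of double_from_bits. */
static uint64_t bits_from_double(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* -0.0 built from its bit pattern instead of the literal "-0.0". */
static double negative_zero(void)
{
    return double_from_bits(SIGN_BIT);
}

/* Flips the sign of x by XORing with the sign-bit mask. A mask that has
   silently become +0.0 would turn this into a no-op. */
static double flip_sign(double x)
{
    return double_from_bits(bits_from_double(x) ^ SIGN_BIT);
}
```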
commit bf478afbf2367df0f38c77f31d1f912aeeb82585
Author: Miklos Espak <[email protected]>
Date: Thu Apr 26 18:31:57 2018 +0100
Define include directory for installed targets (#141)
commit ab888adf510338c03ea8ac49b4aab91fb57f1479
Author: Steven G. Johnson <[email protected]>
Date: Sat Apr 14 11:40:39 2018 -0400
don't need both identifier and name fields
commit 2b999c600c58c78b8acb78c3352b02d9df6f6e60
Author: Steven G. Johnson <[email protected]>
Date: Fri Apr 13 08:43:35 2018 -0400
JSON doesn't like trailing commas
commit 92eee8bbc4252c871aa870d2dce88eb98d0c7d18
Author: Steven G. Johnson <[email protected]>
Date: Fri Apr 13 08:38:50 2018 -0400
list both C and OCaml (as explained in codemeta/codemeta#181)
commit 35e5609f17e212bf1c40da9b2ebe66784ad37052
Author: Steven G. Johnson <[email protected]>
Date: Thu Apr 12 12:01:15 2018 -0400
add codemeta file
commit eba07c46b5d2f7824d293ab59aa5c29a25034963
Author: Matteo Frigo <[email protected]>
Date: Mon Feb 19 09:30:29 2018 -0500
Call _mm256_zeroupper() when leaving avx512 code
Carsten Steger says:
simd-avx512.h defines VLEAVE as nothing in FFTW 3.3.7. However, the
current Intel® 64 and IA-32 Architectures Optimization Reference Manual,
chapter 15.18, recommends the following:
- When you have to mix group B instructions with Intel SSE instructions,
or you suspect that such a mixture might occur, use the VZEROUPPER
instruction whenever a transition is expected.
- Add VZEROUPPER after group B instructions were executed and before any
function call that might lead to Intel SSE instruction execution.
- Add VZEROUPPER at the end of any function that uses group B instructions.
- Add VZEROUPPER before thread creation if not already in a clean state
so that the thread does not inherit Dirty Upper State.
(Group B are instruction types that modify bits 128-511 of vector
registers 0-15.)
Therefore, I believe it would be prudent to define VLEAVE as
_mm256_zeroupper in simd-avx512.h (see the attached patch).
At https://software.intel.com/en-us/forums/intel-isa-extensions/topic/704023
Mark Charney says:
To be clear, we very much still recommend using VZEROUPPER on
Skylake. Even though it does not have the same penalties as earlier
designs in that family for mixing AVX and SSE code, we definitely
recommend using VZEROUPPER on Skylake.
Yes it would obviously be better if there were one solution. For
code that has to run on both families, the "common code" solution
is to use the Xeon guidelines.
If Mark Charney recommends VZEROUPPER, that's good enough for me.
commit b267008613d082975b108252ed596ba0916ffa31
Author: Matteo Frigo <[email protected]>
Date: Wed Nov 22 12:54:18 2017 -0500
fftw3-mpi.f03 should be regenerated when Makefile changes
commit 708b202fd593cf1002cf97dce0863e2a438e3720
Merge: 2e0cfdda 8ba34c40
Author: Matteo Frigo <[email protected]>
Date: Mon Nov 20 09:37:17 2017 -0500
Merge pull request #113 from xantares/mingw
CMake enhancements
commit 2e0cfddacacccc8a1e6e679c5e3fa81fb0219bda
Author: Matteo Frigo <[email protected]>
Date: Mon Nov 20 07:07:30 2017 -0500
Attempt to strengthen language in README.md
commit 8ba34c40fef38f661c9c413781990a7c021ba22b
Author: Michel Zou <[email protected]>
Date: Thu Nov 9 22:33:51 2017 +0100
Preliminary Fortran support
commit bd753a7679ecca2799640e7c8ced6f1f784f1b51
Author: Michel Zou <[email protected]>
Date: Mon Nov 6 23:00:29 2017 +0100
CMake MinGW fixes
Mostly fixes the SSE2 macro in config.h, otherwise minor detection fixes
commit da5372a175bcb09578359960869c76da74c9fda3
Author: Matteo Frigo <[email protected]>
Date: Tue Oct 31 20:21:17 2017 -0400
EXTRA_DIST += README-perfcnt.md
commit 1b64d9269254e9d0a0f0b088e5eceb0db92d531f
Merge: b5ccc557 2be183c3
Author: Matteo Frigo <[email protected]>
Date: Tue Oct 31 20:19:13 2017 -0400
Merge pull request #112 from alexeicolin/PR--armv7-pmccntr-counter-and-docs
Pr armv7 pmccntr counter and docs
commit 2be183c3a44d58aaa11909ba8882310fb44d598c
Author: Alexei Colin <[email protected]>
Date: Tue Oct 31 23:34:38 2017 +0000
perf counters: name ARMv8 PMCCNTR_EL0 explicitly
For consistency with the rest.
commit 504ece7f8ffc60c2a03b28d977e9825230052d48
Author: Alexei Colin <[email protected]>
Date: Tue Oct 31 23:28:48 2017 +0000
perf counters: add PMCCNTR for ARMv7 and add docs
The existing armv7 counter (CNTVCT) does need enabling from kernel mode (so
updated the configure help), and the enable bit is different from the PMU
enable bit (described in the new docs).
Tested on XU4: printed the returned counter values and they look reasonable.
commit b5ccc557fd2e57bfc955f0db9b5182e92f9cb55c
Author: Matteo Frigo <[email protected]>
Date: Sun Oct 29 08:13:04 2017 -0400
fftw-mpi.h should include <fftw3.h>, not "fftw3.h"
commit 9e3f8da20e65f1e34e677768e550086b06d77f16
Author: Matteo Frigo <[email protected]>
Date: Sun Oct 29 08:09:35 2017 -0400
NEWS: warn that cmake support is experimental and not well tested
commit 9616fb9ff1c2694f5cfa2c4a59efa96094ae6812
Author: Matteo Frigo <[email protected]>
Date: Sun Oct 29 07:48:43 2017 -0400
Update NEWS for upcoming fftw-3.3.7
commit 62edb203fc09c8c8ac2c2d5ac3299ea8d4dc7838
Author: Matteo Frigo <[email protected]>
Date: Tue Oct 10 18:58:37 2017 -0400
Ditch --enable-debug-malloc and --enable-debug-alignment
We wrote DEBUG_MALLOC in 1997 to debug memory leaks. Nowadays
DEBUG_MALLOC is just confusing. Better tools are available, and
DEBUG_MALLOC is not thread-safe and it does not respect SIMD
alignment. It confused at least one user.
In the gcc-2.SOMETHING days, gcc would allocate doubles on the stack
at 4-byte boundary (vs. 8) reducing performance by a factor of 3.
That's when we introduced --enable-debug-alignment, which is totally
obsolete by now.
commit 6ed4297e85e5ef24a18ce428b18e020d8e48413a
Author: Matteo Frigo <[email protected]>
Date: Fri Sep 29 19:27:43 2017 -0400
Use armv7a cycle counter unconditionally if HAVE_ARMV7A_CNTVCT
It looks like __ARM_ARCH_7A__ is not always defined. If the
user says HAVE_ARMV7A_CNTVCT, trust the user.
commit 2dd77382319ceb99c32b38418716783eec8adad4
Merge: 04590cb1 e09ab8ca
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 22:42:38 2017 -0400
Merge pull request #110 from junghans/cmake
Minor cmake fixes
commit e09ab8cac98c0f206968bbd962a6f76cf26e7437
Merge: 890dac59 76427f30
Author: Christoph Junghans <[email protected]>
Date: Thu Sep 21 16:13:43 2017 -0600
Merge commit 'refs/pull/109/head' of github.com:FFTW/fftw3 into cmake
commit 04590cb11baa11bbfdebe101fa90186bbf48423c
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 18:00:58 2017 -0400
simd-vsx.h: don't use vpermxor
It seems like gcc-6 generates incorrect code when using vpermxor
(tested with qemu emulator, so there is a chance that gcc is right and
qemu is wrong). Disable the use of vpermxor and do the simple thing
(one multiplication + one permutation).
commit 76427f30080e2cab3ca5047193ce8ffe6110f047
Author: Michel Zou <[email protected]>
Date: Thu Sep 21 23:44:15 2017 +0200
No need to list includes
commit e47e9a81c41454e5e128cd68505b38152ad60500
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 17:13:14 2017 -0400
Remove AC_FUNC_{MALLOC,REALLOC,MMAP}
They don't do what I thought. E.g., AC_FUNC_MALLOC checks that
malloc(0) returns a non-NULL pointer, and defines malloc to be rpl_malloc otherwise.
We don't support rpl_malloc() and we don't care about malloc(0).
commit 5aebc02ff30af12d2dc3be6c762e821a38f56595
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 10:09:02 2017 -0400
Dead-Code Police
commit d97394a17250d71d6a722ae64dcc3123130cf08f
Author: Matteo Frigo <[email protected]>
Date: Thu Sep 21 09:54:36 2017 -0400
Fixup fftw3-mpi.h
fftw3-mpi.h must include "fftw3.h", not "api/fftw3.h", because both
fftw3-mpi.h and fftw3.h will ultimately be installed in /usr/include.
Thus, as a special exception, mpi/Makefile.am must specify the include
path -I $(top_srcdir)/api.
commit 890dac59aca4c153e7e22add0a8de00766227670
Merge: 4ebda892 106582aa
Author: Christoph Junghans <[email protected]>
Date: Wed Sep 20 14:44:04 2017 -0600
Merge commit 'refs/pull/109/head' of github.com:FFTW/fftw3 into cmake
commit 4ebda89297b6b38632c3d91bd5a673a1bee4ffff
Author: Christoph Junghans <[email protected]>
Date: Wed Sep 20 14:05:13 2017 -0600
autotools: fix install of FFTW3ConfigVersion.cmake
commit e9a66d5f748037f9cb9c0f5b8d824d73c0425042
Author: Christoph Junghans <[email protected]>
Date: Wed Sep 20 13:29:29 2017 -0600
cmake: use GNUInstallDirs
commit 4fbb72ad294e2070d64a83b24f89a601d4f624c6
Author: Matteo Frigo <[email protected]>
Date: Wed Sep 20 13:11:55 2017 -0400
Generate codlist.c only when MAINTAINER_MODE
The user is not supposed to regenerate .c files. In addition, the
generation rule is subtly nonportable (it depends on whether or not
'#' can be escaped in Makefiles, an issue that does not appear
settled.)
commit f243f8ce48be61952527d43da222096296fdd2f9
Author: Matteo Frigo <[email protected]>
Date: Wed Sep 20 11:54:13 2017 -0400
Generate {dft,rdft}/simd/{sse,sse2,avx,...}/*.c only when MAINTAINER_MODE
Users are not supposed to generate them. Apart from that, the
generation rule uses '$*' in an explicit make rule, which is
technically a GNU extension. (Works with {open,free}bsd, but breaks
Solaris.)
commit 106582aa8f97257f53730cbac81f98e8659b084c
Author: Michel Zou <[email protected]>
Date: Wed Sep 20 15:46:51 2017 +0200
Fix includes, export target
commit 1a24e67165ba56447f814bcdc12b9d6e083f1670
Author: Matteo Frigo <[email protected]>
Date: Wed Sep 20 07:24:58 2017 -0400
Restore the ability to build out of tree.
Before 1f3704b9, we had "-I $(top_srcdir)/foo -I $(top_srcdir)/bar".
After 1f3704b9, we had no -I specification at all, but automake wants
an explicit -I $(top_srcdir) in order to build out of tree.
commit 919b795940d1e86a948a4430193dbd0853f47272
Merge: 6076339a f7a64365
Author: Matteo Frigo <[email protected]>
Date: Wed Sep 20 06:41:50 2017 -0400
Merge pull request #107 from xantares/config-mode
Config mode
commit f7a6436509d324297783eb77df54010320b062f8
Author: Michel Zou <[email protected]>
Date: Wed Sep 20 11:46:05 2017 +0200
Build bench according to BUILD_TESTS
commit 82cec28b7e14280ad11878978e23a3680bb0e983
Author: Michel Zou <[email protected]>
Date: Wed Sep 20 11:41:20 2017 +0200
Use cmake config mode
Installs FFTW3Config.cmake instead of a FindFFTW3.cmake
Also configures the pkgconfig file from cmake
commit 6076339a342b12b0d0cfd9f6d967bfa9fbf6b1b2
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 23:38:27 2017 -0400
Fix performance regression with gcc-3.3
commit f4c37657cb32b2552c5e86f0540c0308d4f451ef
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 23:24:08 2017 -0400
get rid of the sse2-nonportable.c hack
It was necessary to support some broken compiler 15 years ago.
Remove it and see if anybody complains.
commit 362ae5c7b8a9df76b5ec0de4433131db33bae0ae
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 21:44:13 2017 -0400
configure.ac Police
Remove some obsolete AC_CHECK_HEADERS, add new checks suggested by
autoscan.
commit a56b5b4b149e56fce43778172a56f77d30352833
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 21:43:45 2017 -0400
Include Police
fftw-wisdom.c was including <fftw3.h> instead of "api/fftw3.h"
commit 1f3704b9eff4b7e80ef7d775fb13f5bb8de0a5f1
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 21:12:22 2017 -0400
Do not set include path ("-I") in Makefile.am
.[ch] files should specify their own paths explicitly. Setting paths
in the Makefile was always a bad idea, but it is totally untenable if
we are supporting cmake.
commit 6e0ae04bad14a7dd9b4928f22d7a01e887dfdc03
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 19:31:55 2017 -0400
Fix OpenBSD build
Using $< in a non-suffix rule context is a GNUmake idiom and OpenBSD
doesn't like it.
commit 31a53789197f90d6bf349dd230ab86023e5fb83c
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 19:24:34 2017 -0400
EXTRA_DIST += FindFFTW3.cmake.in
commit ae1a764ce88166e8e1f05a25888f105ec8f1939d
Merge: 5fdca1d9 97b273d8
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 17:13:58 2017 -0400
Merge pull request #69 from junghans/cmake
Build and install cmake module
commit 5fdca1d9b0a0b2e6491c98f63873dcf600355e09
Merge: b521e530 66506470
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:57:59 2017 -0400
Merge pull request #92 from tklauser/armv7a-cycle-counter
Fix ARMV7-A cycle counter detection
commit b521e5305a7317c1c0f1d454beb6580eaf4de1db
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:51:03 2017 -0400
cmake: don't check for dlfcn.h
We don't use it
commit fc852fcdfa80fab30eac2284249686853efa2e4b
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:43:02 2017 -0400
Remove ancient paranoia
In the '90s we used to run autoconf three times, just in case
(because it really didn't work the first time). "Three" was modeled
after the "sync; sync; sync; reboot" incantation of the '80s.
Hopefully we are past this by now.
commit 34738e7f669882c6abc12c2744c8acc347c91719
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:32:39 2017 -0400
Flip boolean in a way that makes more sense to me
commit a2bfd859d9ad08490d02252d8a80c5994dd82747
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 15:28:56 2017 -0400
Various CMakeLists.txt fixes
* AVX2 codelets require -mfma
* --enable-avx2 automatically enables the 128-bit avx2 codelets in
  *dft/simd/avx2-128
* bump FFTW_VERSION to 3.3.7, SOVERSION to 3.5.7
* build bench always, irrespective of Threads_FOUND
commit 93ac6e1075e73c0275a9e0006fe9161c3b6fae38
Merge: a71f3dd3 d3a8d13f
Author: Matteo Frigo <[email protected]>
Date: Tue Sep 19 14:31:03 2017 -0400
Merge pull request #103 from xantares/cmake
Add user cmake support
Still needs work, but let's move forward and move this contribution into the official repository
commit d3a8d13f74361a7ffc4c48c229181a86b35e9a7d
Author: Michel Zou <[email protected]>
Date: Tue Jul 18 12:16:43 2017 +0200
Add user cmake infrastructure
commit a71f3dd355f802dc362a52674a977ff81daadf9d
Author: Matteo Frigo <[email protected]>
Date: Wed Jul 5 06:33:40 2017 -0400
Disable ISA_EXTENSION_PREFERS_FMA for now
I still don't understand whether or not avx2 should use FMA codelets.
Ryzen is faster with the non-FMA version. Haswell prefers the FMA
version.
However, I suspect that Haswell prefers FMA because of a quirk of the
micro-architecture. Haswell has two floating-point "ports". You can
issue an addition only through one "port", but you can issue two FMA
in parallel on both ports, so FMA appears to be faster. Skylake
apparently restores balance (but I haven't tried yet). Suspend
judgment for now until I gather more data.
commit f82b8c94596868897987b71a648eaa664590602a
Author: Matteo Frigo <[email protected]>
Date: Tue Jul 4 20:06:57 2017 -0400
Rationalize HAVE_FMA
Distinguish ARCH_PREFERS_FMA, for architectures that "naturally"
prefer FMA (e.g., powerpc), from ISA_EXTENSION_PREFERS_FMA, for
instruction-set extensions that favor FMA where the base architecture
does not (e.g., avx2 on x86).
Previously, --enable-avx2 would use FMA code for scalar and avx
codelets, which is wrong.
This change improves performance by a few percent on Ryzen (where FMA
doesn't really do anything), and is a wash on Haswell.
commit 0869f4e51b8b0aeb7da1b21b2683c30cd4e10a5e
Author: Steven G. Johnson <[email protected]>
Date: Tue May 9 09:14:37 2017 -0400
document that howmany ≥ 0 (closes #95)
commit 665064700b26c01c0836e4c12a5ee0eab3923858
Author: Tobias Klauser <[email protected]>
Date: Wed Mar 29 16:15:45 2017 +0200
Fix ARMV7-A cycle counter detection
Check for the correct pre-processor define HAVE_ARMV7A_CNTVCT from
config.h (instead of ARMV7A_HAS_CNTVCT) to fix the detection of the
cycle counter for ARMv7-A in the configure script (and actually use it
in the built library).
Without this fix, even the following ./configure call:
./configure --enable-neon --enable-single --enable-armv7a-cntvct \
--host=arm-linux-gnueabihf --disable-fortran \
CC="arm-linux-gnueabihf-gcc -march=armv7-a"
will emit the warning:
checking whether a cycle counter is available... no
***************************************************************
WARNING: No cycle counter found. FFTW will use ESTIMATE mode
for all plans. See the manual for more information.
***************************************************************
With this fix applied, ./configure will correctly detect the cycle
counter register:
...
checking whether a cycle counter is available... yes
...
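The failure mode is easy to reproduce in isolation. In the sketch below, the #define stands in for what config.h would provide; the two functions are hypothetical and exist only to show that testing a macro name nobody defines silently compiles the feature out.

```c
/* Pretend config.h: this is the name the configure script defines. */
#define HAVE_ARMV7A_CNTVCT 1

/* Tests a name that is never defined, so the preprocessor quietly takes
   the fallback branch and no diagnostic is produced. */
static int counter_enabled_mismatched(void)
{
#if defined(ARMV7A_HAS_CNTVCT)
    return 1;
#else
    return 0;
#endif
}

/* Tests the name that config.h actually provides. */
static int counter_enabled_matching(void)
{
#if defined(HAVE_ARMV7A_CNTVCT)
    return 1;
#else
    return 0;
#endif
}
```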
commit cc5fc8ce7ffd77f467740554f649aab4d3f71344
Merge: 102f2fd0 950b1539
Author: Matteo Frigo <[email protected]>
Date: Tue Mar 14 07:21:45 2017 -0400
Merge pull request #91 from fornwall/android-clock-gettime
Avoid trying to use CLOCK_SGI_CYCLE on Android
commit 950b153910f7f0dde9cc20cddeee5dc9048d25b7
Author: Fredrik Fornwall <[email protected]>
Date: Mon Mar 13 23:41:35 2017 +0100
Avoid trying to use CLOCK_SGI_CYCLE on Android
The Android headers define CLOCK_SGI_CYCLE but the call fails at
runtime as it's not implemented. Combined with getticks() not
checking the return value of clock_gettime(), this causes bogus
values to be returned from getticks().
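A defensive reader that checks the return value could look like the sketch below. It is not FFTW's getticks(): the function name read_ticks, the nanosecond unit, and the use of CLOCK_MONOTONIC in the usage note are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Reads the given clock as a nanosecond count.
   Returns 0 on success and -1 if clock_gettime() fails, in which case
   *ticks is left untouched rather than filled from an unset timespec. */
static int read_ticks(clockid_t clock_id, uint64_t *ticks)
{
    struct timespec ts;

    if (clock_gettime(clock_id, &ts) != 0)
        return -1;
    *ticks = (uint64_t)ts.tv_sec * UINT64_C(1000000000)
           + (uint64_t)ts.tv_nsec;
    return 0;
}
```

A caller would try read_ticks(CLOCK_MONOTONIC, &t) and fall back to another timer when it returns -1, instead of trusting a clock id merely because the headers define it.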
commit 102f2fd0249dca301d195b4df1b94e7b339b8c60
Author: Matteo Frigo <[email protected]>
Date: Wed Feb 22 14:59:30 2017 -0500
Compute mflops() in 64 bit precision
Old code was overflowing for N>2^32
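The kind of overflow involved can be illustrated with a short sketch. The pre-fix code is not shown in this log, so the narrow version below is only a guess at the failure mode. The 5 N log2 N operation count is the conventional estimate for a complex transform, and all function names are hypothetical.

```c
#include <stdint.h>

/* Narrow version: the size is squeezed into 32 bits before use, so any
   n >= 2^32 wraps around (unsigned conversion is modulo 2^32). */
static double flop_count_narrow(uint64_t n, unsigned log2_n)
{
    uint32_t n32 = (uint32_t)n;

    return 5.0 * (double)n32 * (double)log2_n;
}

/* Wide version: the size stays in 64 bits until converted to double. */
static double flop_count_wide(uint64_t n, unsigned log2_n)
{
    return 5.0 * (double)n * (double)log2_n;
}

/* Rate in mflops: operation count divided by elapsed microseconds. */
static double mflops_from(double flop_count, double seconds)
{
    return flop_count / (seconds * 1.0e6);
}
```

For n = 2^33 the narrow version sees a size of zero, while the wide version yields the full count.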
commit 2b63fc2eaae645a5c2ef4a97c384beb2adefd58d
Author: Matteo Frigo <[email protected]>
Date: Fri Jan 27 16:06:27 2017 -0500
Update NEWS for 3.3.6-pl2
commit d2ca54234956ad8be82ba050305ccf979fd631a7
Author: Matteo Frigo <[email protected]>
Date: Fri Jan 27 16:01:42 2017 -0500
Get ready for fftw-3.3.6-pl2
commit 83092f8efbf872aefe7cfc6ee8fa43412f8e167a
Author: Matteo Frigo <[email protected]>
Date: Fri Jan 27 15:52:18 2017 -0500
Fix scripts that generate the MPI F03 interface
It turns out that the scripts were using fftw3.h from /usr/include,
not ../api, and were failing silently if fftw3.h was not installed.
This bug led to a fftw-3.3.6-pl1 release with incomplete mpi/f03 header
files.
commit ab402b00f9a003daa10863b9bcdbe0810b26f541
Author: Steven G. Johnson <[email protected]>
Date: Wed Jan 25 13:03:15 2017 -0500
mention mkdist.sh and summarize the build process in README.md (closes #85)