-
Notifications
You must be signed in to change notification settings - Fork 0
/
OPEN_ISSUES
826 lines (660 loc) · 38.3 KB
/
OPEN_ISSUES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
Score-P OPEN ISSUES
===================
Effective: mon year
This file lists known limitations and unimplemented features of
various Score-P components.
--------------------------------------------------------------------------------
* Platform support
- Score-P is intended to run on ppc64, x86_64, or aarch64
architectures.
- Score-P has been tested on the following platforms:
+ HPE/Cray XC and EX systems with PrgEnvs cray, intel, nvidia,
gnu, amd, aocc, cray-amd (see 'Shared libraries on Cray systems'
below). PrgEnv-cray with Cray PE prior 21.05 fails to
compiler-instrument functions.
+ various Linux (Intel, AMD, IBM, ARM) clusters with GNU, Intel,
NVIDIA, IBM, and AMD compilers. Note that some compilers are
based on Clang. Note that Fujitsu compilers haven't been tested
recently.
The provided configure options (see INSTALL) may provide a good
basis for building and testing the toolset on other systems.
- The following platforms have not been tested recently:
+ Intel Xeon Phi (KNL), only native mode supported
+ IBM Blue Gene
+ Cray XT, XE, XK
However the supplied build system might still work on these
systems.
- Each toolset installation can only support one MPI implementation
(because MPI is only source-code but not binary compatible). If
your systems support more than one MPI implementation (e.g. Linux
clusters often do), separate builds for each MPI implementation
have to be installed. The same applies for SHMEM.
- The same is true if your system features more than one compiler
supporting automatic function instrumentation. Additionally if
the GNU compiler plug-in based instrumentation is used, then
different major version of the GNU compiler suite are also
incompatible. I.e., for each major version a separate Score-P
installation must be provided.
- To build Score-P, a C and C++ compiler as well as C and, optionally,
Fortran compiler wrappers (e.g., MPI or SHMEM) are used. Compilers
and compiler wrappers must be compatible.
- Note (dated): PGI changed its default C++ compiler from 'pgCC'
(which was entirely removed in 16.1) to 'pgc++'. For building
Score-P with '--with-nocross-compiler-suite=pgi', 'CXX=pgc++'
is used. If your MPI and SHMEM compiler wrappers still use the
old 'pgCC', please add 'CXX=pgCC' to your configure line to
force Score-P to use 'pgCC' as well. This will prevent link
failures.
- Shared libraries on Cray systems: building Score-P's shared libraries
works as normal through the '--enable-shared' configure option. But
note that if the default Cray link mode is static (which is the case
for CCE prior to 9.0.0) or if CRAYPE_LINK_TYPE=static is provided
via the environment, you need to pass the '-dynamic' flag
(if available) to the Cray compiler wrapper when building your
application to link to Score-P's shared libraries.
On some Cray systems, where Score-P uses externally installed
dependencies, these dependencies (i.e., the shared libraries
libotf2.so and libcube4w.so) fail to load. As a workaround, add
the corresponding pathes to LD_LIBRARY_PATH in your job script.
- Experimental platforms:
The following platforms are considered experimental and don't get
much attention. They may break without notification. Help to
maintain these is always welcomed.
+ MinGW
- Address-to-line lookup might not work as link.h and
dl_iterate_phdr might not be available.
+ macOS
- Building shared/dynamic libraries may not work.
- Address-to-line lookup (e.g., for clang-based compiler
instrumentation) does not work as link.h and dl_iterate_phdr
is not available.
--------------------------------------------------------------------------------
* Automatic compiler instrumentation via "scorep" based on (often
undocumented) compiler switches
- GNU : tested with GCC 4.9 and higher
- NVHPC : tested with version 20.7 and higher
- PGI : tested with version 10.1 and higher
Note that PGI 13.8 is currently not supported as it is not
recognized as an PGI compiler anymore. PGI 16 supports
only 64-bit platforms.
- IBM : only works for xlc/xlC version 7.0 and xlf version 9.1 and
higher and corresponding bgxl compilers on Blue Gene systems
- Intel : only works with Intel icc/icpc/ifort version 11 and higher
compilers or the Clang-based ixc/icpx compilers from
Intel oneAPI toolkit. Note that the Fortran ifx compiler
does not support compiler instrumentation.
- Cray : tested with version 8.1.8 and later, uses the GNU interface
- Fujitsu: tested with different versions of the language
environment K-1.2.0 on K computer and compiler version
1.2.1 on FX-10 systems. Uses the GNU interface
- Clang : tested with Clang 4.0 and higher. Note that several
compiler vendors now use Clang for their C and C++
compilers while the Fortran compiler stays
vendor-specific.
- The Intel compiler provides the function name and the source code
location in one string separated by a colon. Thus, if the path
name of the source code file contains colons, Score-P will split
the source file name and the function name improperly.
Additionally, the provided function name makes it impossible to
distinguish overloaded functions in C++. Thus, functions which
differ only in the argument list will be mapped to the same
function definition by Score-P.
- When using any of the intrinsic headers (e.g., xmmintrin.h and friends)
from GCC version up to 6.3, it is known that the result of the compiler
instrumentation will fail at link time because of undefined reference.
A work around is to pass
-finstrument-functions-exclude-file-list=intrin.h
as flag to GCC on the command line. This issue does not apply if the
GNU compiler plug-in based instrumentation is used.
- Measurement filtering can only be applied to functions
instrumented by the IBM, GNU, Intel, NVHPC, or PGI compilers as
well as functions instrumented by PDT, Kokkos, and user functions.
Filtering of MPI, SHMEM, and OpenMP runtime functions is always
ineffective.
- The GNU Fortran compiler versions 4.6.0 and 4.6.1 have a bug which
leads to an internal compiler error when using automatic function
instrumentation. It is therefore recommended to either use an
older/newer version of the compiler or to work around this issue
by using manual instrumentation or automatic source-code
instrumentation based on PDToolkit.
- The GCC plug-in is supported from GCC 4.9 onwards. Though due to
constant changes in the plug-in API of the GNU compiler
infrastructure, it is unlikely that the Score-P instrumentation
plug-in builds with versions newer than GCC snapshot 11-20210307.
- The pgCC compiler versions 13.9 and higher preinclude omp.h for
OpenMP codes. This results in multiple defined symbols if the
source file is preprocessed before compilation. Since version 14.1
an option is available to avoid preinclusion, which we can use for
preprocessed source files. For the pgCC versions 13.9 until 14.0,
preprocessing is not possible for C++ codes.
- The GCC compiler uses DWARF v4 as the default debug information
format since version 4.8. But when used with an older binutils
version this results in unreliably region meta data. If file name
information are missing in the resulting region meta data, then
try recompiling with the -gdwarf-3 or -gdwarf-2 flag.
- The Fujitsu compiler instrumentation does not handle inlined
functions properly. They are recorded as recursive calls of the
enclosing function due to a bug in the compiler
instrumentation. We suggest to use PDT instrumentation for codes
that rely on inlining.
- The LLVM plug-in is supported from LLVM 13.0 and onwards. Due to
regular changes in the plug-in API, bleeding edge releases of
LLVM might not be able to compile the plug-in. In our testing,
the plug-in did compile with LLVM 13.0 to LLVM 18.1.
- When no support for LLVM demangle is detected during configure,
compile-time filtering will only work for mangled names.
- Compiling HIP code with the LLVM plug-in might cause error messages
of the linker `lld` because plug-in parameters are passed to the
linker. This issue is fixed in LLVM 17.0 and newer, as well as in
ROCm 6.1 and newer.
--------------------------------------------------------------------------------
* MPI support
- Most functions added with MPI-3.0 and MPI-3.1 only have Enter
and Exit event records, i.e., Score-P can measure the time spend
in these routines, but no further analysis is possible.
- C++ bindings for MPI are not supported. These were deprecated as
of MPI-2.2 and removed with MPI-3.0. When using C++ bindings for
MPI, Score-P will most likely indicate the failed library wrapping
with the following warning: "If you are using MPICH1, please
ignore this warning. If not, it seems that your interprocess
communication library (e.g., MPI) hasn't been initialized. Score-P
cannot generate output."
If your MPI implements the C++ bindings in a separate library on
top of the C bindings, the following workaround might work for
you:
1) Instrument your application with 'scorep --keep-files -v' It
will make the instrumenter not remove the intermediate files
and output the commands it executes.
2) Copy the link command (the last command from the scorep
output).
3) Insert a link flag (e.g. -lmpicxx) for your C++ bindings
library right before '-lscorep_adapter_mpi_events' and execute
the modified link command.
- When using derived data types in non-blocking communications, and
no support for MPI_Type_dup() was detected, please ensure that the
MPI_Datatype handle is still valid at the time the request
finishes.
- Currently, Score-P cannot handle MPI_Finalize() calls that occur
after the end of main(), e.g., via a destructor of a static C++
object. Please call MPI_Finalize() before the end of main(). The
issue will be resolved in a future version of Score-P.
- The IBM Platform MPI (not mpixlc!) compiler wrapper (the formerly
HP-MPI) does not append its libraries at the very end of the
original link command. Thus, instrumenting applications with
Score-P fails at link time due to unresolved symbols in the
Score-P libraries.
- Multi-threaded MPI communication is limited/experimental in Score-P.
It is not that it is not possible to measure communication
from other threads. But it is nearly impossible to do reasonable analysis
when only ranks but not the real communication partners (threads)
are known. There are efforts in the MPI forum to address this
issue (endpoints).
The current version of Score-P is designed to stop the instrumented
application from crashing under multi-threaded MPI communication
but there are some limitations.
When thread-local storage is not detected during the configure
step `MPI_THREAD_MULTIPLE` is not supported.
All recorded MPI communication events will still be reported on
the master thread.
MPI implementations optimize small messages and create non-unique
requests, e.g., empty requests in OpenMPI, and can not be
distinguished by Score-P. A workaround is implemented. However,
the ordering and, therefore, association to the MPI calls is lost.
This effect was also present in previous versions of Score-P but went
unnoticed.
- Open MPI 4.0.0 is known broken to build with Score-P, because it
accidentally removed deprecated-only MPI functions, without
removing the prototypes in `mpi.h`.
- Score-P does not yet support the `USE mpi_f08` Fortran language
binding. On detection of the use of this binding, Score-P aborts
during MPI initialization.
- Score-P leaks memory in the MPI requests management as the Score-P
memory management does not allow freeing individual allocations.
We try to reduce this leakage. Gradually increasing the number of
requests will increase this effect. However, in practice,
it does not seem to have any impact on the measurement.
- The MPI sessions model is not supported and `MPI_Session_init`
aborts the measurement.
--------------------------------------------------------------------------------
* OpenMP support
Applies to OPARI2 and OMPT instrumentation:
- Device directives, introduced with OpenMP 4 and later, are not supported.
Applies to OPARI2 instrumentation only:
- Function instrumentation using the Intel compiler version 11.1 for
codes using OpenMP tasking is erroneous. You might try PDT
instrumentation instead.
- When compiling with the PGI compiler version 10.1, local variables
that are defined after a OpenMP for construct share the same memory
address. This breaks the OPARI2 instrumentation for task tracking
and may lead to segmentation faults in the measurement system.
Our recommendation is to use a newer compiler version, According
to our tests, late compiler versions have fixed this issues.
We tested with PGI compiler version 11.7.
- Currently, the instrumenter allows to switch off OPARI2
instrumentation for OpenMP programs with the --noopenmp
option. However parallel regions still need to be instrumented to
ensure thread-safe execution of the measurement system. Currently
combined constructs, such as parallel for/do, are still
instrumented fully, i.e. the for/do appears in the measurements.
- Due to a bug in the Cray compilers OpenMP instrumentation is
broken if an OpenMP parallel pragma is used in combination with
an if clause.
- OPARI2-instrumented Fortran OpenMP codes that use compiler options
to change the default name-mangling (XL compilers: -qextname, GNU:
-fno-underscoring' and '-fsecond-underscore') might crash if
OpenMP 3.0 ancestry API functions are not available (check with
'scorep-info config-summary | grep ancestry'). In this case OPARI2
instruments a thread private variable that, due to non-standard
name-mangling, does not match the one used in the Score-P
libraries. On AIX, a workaround is to manually rename it at link
time (-brename:pomp_tpd_,pomp_tpd).
- With PGI C++ v13.10 compiler, preprocessing of OpenMP codes using
OPARI2 is not possible any longer as the compiler itself adds a
'--preinclude omp.h' option to the call of pgcpp1. This leads to
'invalid redeclaration' errors. As a workaround, use the
'--nopreprocess' instrumenter option.
- The OPARI2 instrumentation and the preprocessing for OPARI2
instrumentation prepend some headers to source files which
include stdint.h in C/C++ files. The behavior of this header changes
whether macros the __STDC_CONSTANT_MACROS or __STDC_LIMIT_MACROS are
defined. If these macros are defined in a header file or source file,
the instrumentation will prepend the include directive before the macro
definition. Thus, macros like UINT32_C are left undefined. As workaround,
pass macros like __STDC_CONSTANT_MACROS or __STDC_LIMIT_MACROS on the
command line. See also the Open Issues of OPARI2.
- OPARI2-instrumented OpenMP code using user-defined reductions
(UDR), introduced with OpenMP 4, might crash when the UDR includes
execution of instrumented routines, which Score-P currently
reports as events on invalid threads or mis-matching enter/exit
events.
- Score-P 5.0 introduced program-begin/-end events. In profiling
mode, these manifest as enter/exit of an additional region which
acts as the new root node for the program. Artificial regions like
THREADS and TASKS are now recorded as child nodes of this new
root node.
When recording OpenMP tasks, the tasks are recorded as children of
the TASKS node as well as children in the program's
calltree. Metrics, time in particular, are double accounted.
As a consequence, the new root node shows negative time values
when expanded (the absolute value equals the inclusive TASKS
value). The values shown for TASKS and the program's calltree are
correct though. This situation will be improved in a future
release of Score-P and/or Cube.
- In C++, OpenMP clauses like private, firstprivate, and lastprivate
call constructors, copy-constructors, and destructors for list
elements of class type. Given OPARI2 instrumentation, if these
functions trigger any event, Score-P will/might fail. For
constructors and destructors we might see `TPD == 0` errors, for
destructors it will be `Assertion 'parent != ((void *)0)' failed`.
Preventing all events from these functions by runtime/compile-time
filtering is a workaround. Future OpenMP instrumentation based on
OMPT instead of OPARI2 does not show this problems.
- Like other compilers, the NVIDIA HPC SDK compilers add outlined
functions to implement OpenMP directives. These functions get
potentially instrumented but need to be ignored by Score-P to
prevent 'TPD==0' errors. Unfortunately, NVIDIA's naming convention
for these functions isn't documented and changed between versions
of the SDK. However, since version 21.2 of the SDK, the outlined
functions appear to follow this naming scheme:
__nv_*_F[0-9]*L[0-9]*_[0-9]*
Outlined functions following this scheme are automatically
filtered by Score-P.
For SDKs predating 21.2, the outlined functions seem to be of the
form <FUNCTION>_<SOMETHING>L<LINENO>. This isn't handled
automatically by Score-P, but could be filtered with this very
generic EXCLUDE rule:
*_*L[0-9]*
- With CMake, OpenMP might get enabled at link time not via a
compiler flag like `-fopenmp`, but by explicit linking of the
runtime, e.g., `-lgomp`. In this case, Score-P would not
automatically add its OpenMP support libraries and the link would
fail. As a workaround, please provide `--thread=omp` to the
`scorep-wrapper` via
`make SCOREP_WRAPPER_INSTRUMENTER_FLAGS="--thread=omp"`.
- For Fortran, PDT instrumentation inserts a specification statement
into the executable section (e.g., omp parallel do), which is
illegal. Subsequent OPARI2 processing fixes this issue. With OMPT
instead of OPARI2 the error persists, thus, the combination PDT
and OMPT instrumentation is invalid. Use compiler instrumentation
or user instrumentation instead of PDT instrumentation for all
languages.
- With CMake and NVHPC, the preparations for OPARI2 instrumentation
might create empty source files due to the `-MD` flag added by
CMake during compilation. The compilation will abort with the
message `undefined reference to main`. For NVHPC 23.7 and newer,
please use OMPT via `--thread=omp:ompt` instead.
Applies to OpenMP target offloading when using Score-P compiler
instrumentation:
- When using an LLVM based compiler in an application using OpenMP
target directives the scorep compile command may fail with a
linker error and the following error message:
Undefined reference to '__cyg_profile_func_enter'
Undefined reference to '__cyg_profile_func_exit'
The error is caused by the `-finstrument-functions-after-inlining`
flag. As a work around, you may try to add the following function
declarations to each of your compile units:
#pragma omp begin declare target device_type(nohost)
void __cyg_profile_func_enter(void *a, void *b)
__attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *a, void *b)
__attribute__((no_instrument_function));
void __cyg_profile_func_enter(void *a, void *b) { }
void __cyg_profile_func_exit(void *a, void *b) { }
#pragma omp end declare target
On a compiler using LLVM 16.0 or newer, you may try to use these flags:
scorep --nocompiler [llvm-based compiler] [args] \
-Xarch_host -finstrument-functions-after-inlining
Applies to OMPT instrumentation only:
- Until device directives are supported by Score-P, device events
are ignored, if possible.
- Runtimes have been reported creating helper threads for certain target
directives. These helper threads may dispatch events in a non-conforming
way. In addition, non-conforming tasking behavior has been observed.
In these cases, Score-P can't ignore the device events, but aborts.
As a workaround, please turn off these helper threads by setting the
environment variable `LIBOMP_USE_HIDDEN_HELPER_TASK` to `0`. Known
nonconforming runtimes include LLVM/Clang (12.0.0 to 18.1.1) and
oneAPI (2024.0 and lower).
- When measuring an application using `#pragma omp sections` with
Intel oneAPI compilers, enabling OpenMP via `-fiopenmp` or `-qopenmp`
may result in an error reporting an `inconsistent profile`. This happens
because the runtime reports the end of the sections construct incorrectly.
For C / C++ only, you may switch to `-fopenmp` for the measurement. For
Fortran applications, please use OPARI2 via `--thread=omp:opari2`.
- When measuring an application using `#pragma omp sections` with
NVHPC 23.7, running the application will abort reporting an
`inconsistent profile`. This happens because the runtime does not report
the end of the sections construct. If you want to measure applications
using OpenMP sections with NVHPC, please use OPARI2 via `--thread=omp:opari2`
instead.
- When measuring an application using `#pragma omp ordered` with NVHPC,
running the application may result in an error message like
"Bug: 'task->workshare_regions_current == 0'" or crash with a segmentation
fault. As this is a runtime issue and cannot be worked around, please use
OPARI2 via `--thread=omp:opari2` to instrument applications using
`#pragma omp ordered`.
--------------------------------------------------------------------------------
* CUDA support
- CUDA device activities get lost for CUDA 5.5 (CUPTI 4). The last
activity in a CUPTI activity buffer gets lost, when the buffer is
full. The issue can be avoided by providing buffers, which are
large enough to store all CUDA device activities until the buffer
is flushed. The user can specify the CUPTI activity buffer (chunk)
size with the environment variable SCOREP_CUDA_BUFFER_CHUNK. In
CUDA 6.0 (CUPTI 5) this issue is fixed.
- Support for CUDA 4.2 and prior versions is deprecated. Score-P may
work in combination with these CUDA versions but it is not tested.
Support for CUDA 4.2 and prior versions will become officially
unsupported in a future release of Score-P.
- On Cray systems you need to 'module load cudatoolkit', otherwise
the CUDA related configure checks will fail, even if
--with-libcuda and --with-libcudart are given.
- The nvcc compiler preinclude its CUDA runtime header. This results
in multiple defined symbols if the source file is preprocessed
before compilation. Thus this combination is not allowed by the
instrumenter.
- When building against a stubs-only CUDA installation (i.e., without
`libcuda.so` from the NVIDIA driver package), then `make check` will
likely fail. In this case, ensure that the library path is provided
to configure via the `--with-libcuda-lib=<cuda-sdk>/lib64/stubs` flag and
includes a `stubs`-directory component.
- Score-P's NVTX implementation does not support RangeStart/Stop APIs.
They are supposed to cross locations and can overlap, which does not fit
into Score-P's event model.
- NVTX does consider different domains as disjoint range stacks, thus
this does not fit into Score-P's event model as well. Score-P expect
that regions from disjoint domains are properly nested.
--------------------------------------------------------------------------------
* SHMEM support
- When using the OpenSHMEM reference implementation and building
Score-P as a shared library, ensure that the GASNet library is
build with -fPIC on platforms which need this flag for shared
library code. For example, run make with the MANUAL_LIBCFLAGS and
MANUAL_CFLAGS variables set:
make MANUAL_LIBCFLAGS="-fPIC -DPIC" MANUAL_CFLAGS="-fPIC -DPIC"
on GNU/Linux platforms.
Also, if you encounter segmentation faults when running the
instrumented application with Score-P attached, but the error
disappears magically on subsequent runs, then the OpenSHMEM
reference implementation may not consider some of your global or
static variables as symmetric objects. Please allocate these
objects with shmalloc() to ensure that they are in the symmetric
heap.
- Since version 1.0g the OpenSHMEM library is compiled as a statically
linked archive, rather than as a shared object. If you want to build
Score-P as a shared library, ensure that the OpenSHMEM archieve is
build with -fPIC.
- Currently tested SHMEM implementations:
+ OpenSHMEM reference implementation 1.0h
+ Open MPI implementation up to 3.1.3
+ SGI SHMEM
+ Cray SHMEM
- When using the Cray SHMEM versions 7.2.2 or 7.2.3, then Score-P fails
to detect a number of SHMEM functions. Please use a newer version of
the module to circumvent this problem.
--------------------------------------------------------------------------------
* POSIX threads support
- Pthread support is currently not available on systems that use
linkers other than the GNU linker, e.g., AIX systems.
- Please note that on systems where Pthreads or runtimes based on
Pthreads don't need extra flags (like e.g., -pthread) to be
compiled and linked (e.g., Cray, Blue Gene/Q), Score-P cannot
instrument Pthreads automatically. You need to enable Pthread
instrumentation manually via scorep's --thread=pthread option.
--------------------------------------------------------------------------------
* Sampling support
- Sampling is mostly tested on x86-64 CPUs. The compiler needs to support
thread-local storage, either via the language extension __thread or the C11
keyword _Thread_local. This excludes PGI compilers prior to 'Version 2015'.
- Sampling uses libunwind as external library. Please use version
1.2 or later as earlier versions occasionally segfaulted.
- libunwind used in combination with Intel MPI, even when sampling
is not active, may mysteriously alter the application output
just by linking libunwind. Thus, sampling/libunwind support is
disabled when Intel MPI is detected.
- It is possible to activate sampling for programs that were
instrumented with compiler instrumentation. In this case the
events originating from the compiler instrumentation will be
suppressed. Please note that depending on the compiler
instrumentation used, the overhead can still be significant as
the compiler instrumentation might affect inlining. If this is
the case, consider rebuilding your application using Score-P's
--nocompiler switch.
- Thread management events and intermediate trace buffer flushes may result
in unexpected callpaths.
- CUDA API calls are represented twice in the calling context.
- If PAPI is used as the interrupt source for sampling, then it is not
possible to measure also PAPI metrics.
- The timer interrupt source only works for non-threaded applications.
- Flushing the trace buffer from inside a sample signal is not allowed and
thus aborts the measurement immediately.
- Untested feature combinations with sampling:
- OpenMP tasks
- Trace rewinding
- SCOREP_RECORDING_ON/SCOREP_RECORDING_OFF user instrumentation macros
- Selective recording (SCOREP_SELECTIVE_CONFIG_FILE)
--------------------------------------------------------------------------------
* User library wrapping
- Functions returning function pointers are not supported, e.g.:
int ( *fn_x_return_function_pointer( void ) )( double );
A typedef works around this limitation, e.g.:
typedef int ( function )( void );
function fn_x_return_function_pointer( double );
- Nested Function pointer arguments are not supported, e.g.:
int sion_generic_register_gather_and_execute_cb(
int aid,
int gather_execute_cb( const void *, long long*, int,
long long, void *, int, int, int, int,
int process_cb( const void *, long long *, int ) ) );
Typedefs work around this limitation, e.g.:
typedef int ( process_function )( const void *, long long *, int );
typedef int ( gather_execute_function )( const void *, long long*, int,
long long, void *, int, int, int, int,
process_function process_cb );
int sion_generic_register_gather_and_execute_cb(
int aid, gather_execute_function gather_execute_cb );
- Typedefs from classes need to be fully qualified when used, e.g., convert:
class C {
typedef int mytype;
void f(mytype a);
};
into:
class C {
typedef int mytype;
void f(C::mytype a);
};
- LLVM upto version 3.9 does not support the ABI tag attribute from GCC 5
and onwards. Thus creating library wrappers for symbols with ABI tags is
impossible. The issue is tracked in LLVM here:
https://llvm.org/bugs/show_bug.cgi?id=23529
- Building and using library wrappers for Intel Xeon Phi co-processors is
untested.
- Using a library wrapper for functions which Score-P also calls may only
work on systems supporting RTLD_NEXT (at least GNU).
--------------------------------------------------------------------------------
* I/O recording
- ISO C I/O support
- fsetpos: Seek events provides only the resulting offset
- ungetc not supported yet
- POSIX I/O support
- Redirecting stdout/stderr to a file via dup2 will not affect the ISO C I/O.
The recorded I/O of e.g. puts will target the stdout instead of the file.
- O_NONBLOCK and O_NDELAY flags are not handled in open operations because of
the following note (see manpage):
Note that this flag has no effect for regular files and block devices;
that is, I/O operations will (briefly) block when device activity
is required, regardless of whether O_NONBLOCK is set. Since O_NONBLOCK
semantics might eventually be implemented, applications should not
depend upon blocking behavior when specifying this flag for regular
files and block devices.
- The mode of an open operation will not be tracked.
e.g. open(..., mode_t mode);
- putc will not be recorded because on most system it is macro. Furthermore,
it is difficult to detect it as a macro on Cray systems.
--------------------------------------------------------------------------------
* Kokkos support
- Currently only one accelerator device is supported. To ensure that for this
device a Score-P location was created before first deep-copy, enable
recording of memory allocations in the corresponding accelerator adapter
of Score-P.
- Kokkos/OpenMP support should be considered experimental until OMPT
support is added to Score-P. This affects principally the Kokkos adapter.
- Kokkos/Pthreads should be considered unsupported. It will produce a
measurement, but probably not a meaningful one. This affects
instrumentation of Kokkos-based codes in general, with or without use of
the Kokkos tools interface.
- In order for non-GPU-based back ends to support Kokkos-based allocation
tracking and memory copy instrumentation, it is necessary to ensure that
the threads where memory is allocated or copied are instrumented. For best
chances of success there, we recommend using Score-P's '--thread=pthread'
option for the OpenMP back end and a static build of Kokkos. This affects
use of the SCOREP_KOKKOS_ENABLE=memcpy,malloc settings in the Kokkos
adapter. If instrumenting a different configuration (e.g. a shared Kokkos
library) is important for your use case, please contact
[email protected] for help.
--------------------------------------------------------------------------------
* HIP support
- Score-P is only able to use one compiler suite for one installation.
Because `hipcc` is Clang based, it also only accepts the Clang compiler
instrumentation. Thus Score-P must be build with a Clang-based compiler
suite.
If the compile step for device code errors out with:
lld: error: undefined symbol: __cyg_profile_func_enter
>>> referenced by <filename>:<linenumber>
then the heuristic to detect the usage of `hipcc` failed and must be
enabled explicity with `scorep --hip`.
- Device-to-device data transfers are only recorded on the API level, not
as a device activity.
--------------------------------------------------------------------------------
* OpenACC support
- Score-P may generate OpenACC related events if the compiler supports
the OpenACC Profiling (and Error) Callback Interface. It is a Score-P
requirement that the OpenACC runtime registers a tool by linking a
library that contains the function `acc_register_library`. Tool
registration via `ACC_PROFLIB` or `LD_PRELOAD` is not supported.
Currently, only NVIDIA compilers are known to support the interface
and to provide events to Score-P. With GNU compilers, a tool can be
compiled, but it won't be registered by linking `acc_register_library`
to the program.
--------------------------------------------------------------------------------
* Score-P misc.
- Score-P does not support MPMD style programs where the executables
are instrumented differently. I.e., it is best to compile/link all
sources with the same set of flags for the Score-P instrumenter, so
that the instrumenter itself cannot automatically decides what
paradigms the program uses. This applies to the --mpp and --thread
flags.
- There might be a performance impact when instrumenting code
without explicitly given optimization flags. The instrumenter adds
compiler flags to enable additional debugging
information. Depending on the compiler this may turn off
optimization unless optimization flags are explicitly specified.
- For autotools or CMake based build systems, please consult the usage
of `scorep-wrapper` to learn about the recommended way to apply
instrumentation.
- Literal file-filter rules like "INCLUDE bt.f" for Fortran files
that will be processed by OPARI2 (i.e., files containing OpenMP or
POMP user pragmas) do not work as expected as OPARI2 changes the
file name (here to bt.opari.f)
- Due to a bug in PDT 3.18 and earlier versions, PDT support is
disabled on Blue Gene systems for these versions.
- Rusage-based metrics are not supported on Blue Gene systems.
- Traces generated by applications compiled with the CCE Fortran
compiler on Cray X series systems are inconsistent as there is no
exit event for 'main'. I.e., such traces are not analyzable by
Scalasca.
- Running make check fails if SCOREP_EXPERIMENT_DIRECTORY is set to
'scorep'. One workaround is to run '(unset
SCOREP_EXPERIMENT_DIRECTORY; make check;)'.
- In some cases the PGI 11.7 compiler fails to link C++ programs
that use user-instrumentation macros due to an undefined reference
to '_Unwind_Resume'. Adding -lstdc++ to the original link line
solves this issue.
- In cases where Score-P was compiled with PAPI support, the
instrumenter option --static (only available for '--enable-shared
--enable-static' builds) explicitly adds the static PAPI library to
the link line which may cause 'undefined reference' errors. This
may happen for other libraries that were requested at configure time
as well.
- The Cube metric hierarchy remapping specification file shipped with
Score-P currently classifies all MPI request finalization calls (i.e.,
MPI_Test[all|any|some] and MPI_Wait[all|any|some]) as point-to-point,
regardless of whether they actually finalize a point-to-point request,
a file I/O request, or a non-blocking collective operation. This will
be addressed in a future release.
- When instrumenting memory API calls with the Cray CCE, Score-P uses
the '-h system_alloc' flag and therefore not the default tcmalloc
library (note that -h tcmalloc was deprecated in CCE-15).
- Tracking memory allocations in Fortran programs may only work, if the
compiler generate calls to malloc/free. This is only known for the
GNU compiler.
- Score-P uses an 'atexit' handler to finalize the measurement and produces
the output files. In case a run-time system also uses an 'atexit' handler
but does not allow previous registered 'atexit' handlers to run, then one
can try to re-install the 'atexit' handler again, so that Score-P's will
run before the malicious handler. If that still does not work one can
also trigger the finalization manually. Here are the C prototypes for
these two functions:
void SCOREP_RegisterExitHandler( void );
void SCOREP_FinalizeMeasurement( void );
One known malicious run-time system is NetDRMS.
- Instrumentation of files that include system headers via local
headers of the same name fails for compilers that don't support
the '-iquote' option (e.g., PGI) and when scorep's --pdt option is
used. In this case, renaming of the local header is the only
workaround.
- Recording arguments of the executable was disabled due to unreliable values
on some old HPC platforms now decommissioned. If there are program aborts
inside Score-P, please do make check and installcheck and exercise the
constructor checks:
$ make
$ make -C build-backend check-build
$ make -C build-backend check-serial TESTS_SERIAL="test_constructor_check_c test_constructor_check_cxx test_constructor_f"
$ ./build-backend/test_constructor_check_c 1 2 3
$ ./build-backend/test_constructor_check_cxx 1 2 3
$ ./build-backend/test_constructor_check_f 1 2 3
$ make install
$ make -C build-backend constructor-checks
$ ./installcheck/constructor_checks/bin/run_constructor_checks.sh
--------------------------------------------------------------------------------
Please report bugs, wishes, and suggestions to <[email protected]>.