\section{Introduction}
Science often has to rely on empirical results; computer science is no
exception.
Optimizing a given piece of code is both time-consuming and extremely hard. Not
only do developers have to struggle with complicated algorithms, but they also
need to adapt their code to the machine it will run on.
Knowing the characteristics of the system may not be as easy as it seems.
Indeed, the way to access this information depends on the hardware and on the
operating system, and might even require root privileges.
Portability also matters. Pieces of software such as
hwloc\footnote{\url{http://www.open-mpi.org/projects/hwloc/}} return excellent
results, but are not guaranteed to be portable. Moreover, in the field of
high-performance computing (HPC), a machine may become obsolete before
programmers even have the time to optimize applications for
it~\cite{ATLAS,Search_BLAS}.
All in all, writing several versions of the same program, running each of them
and picking the fastest one might be the best approach.
Running tests in order to automatically detect relevant parameters and
allow the code to adapt itself to the machine is also a way to improve
performance.
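
The empirical selection described above can be illustrated with a minimal
sketch. It is only an assumption-laden toy example, not the method used by
ATLAS or Servet: the two dot-product variants and the helper names
\texttt{best\_time}, \texttt{pick\_fastest} and \texttt{VARIANTS} are
hypothetical and were chosen purely for illustration.

```python
import time


def dot_indexed(x, y):
    """Variant 1: explicit index loop."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def dot_zip(x, y):
    """Variant 2: generator over paired elements."""
    return sum(a * b for a, b in zip(x, y))


def best_time(func, args, repeats=5):
    """Return the smallest wall-clock time over `repeats` runs of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def pick_fastest(variants, args, repeats=5):
    """Time every variant on the current machine; return (fastest name, timings)."""
    timings = {name: best_time(func, args, repeats)
               for name, func in variants.items()}
    fastest = min(timings, key=timings.get)
    return fastest, timings


# Hypothetical set of candidate implementations of the same operation.
VARIANTS = {"indexed": dot_indexed, "zip": dot_zip}

if __name__ == "__main__":
    x = [float(i) for i in range(100_000)]
    y = [1.0] * len(x)
    name, timings = pick_fastest(VARIANTS, (x, y))
    print("fastest variant on this machine:", name)
```

Taking the minimum over several runs, rather than the mean, is one common way
to reduce the influence of noise from other processes; the selected variant
may differ from one machine to another, which is precisely the point of the
approach.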
% ATLAS Page 21
This approach has been widely used and has proven successful in optimizing
certain operations. For example, by empirically detecting the size of the L1
cache, the pipeline depth and information about the floating-point units, ATLAS
is able to provide a DGEMM operation that can compete with the vendor-supplied
implementation; in fact, it is the best implementation on some
platforms~\cite{ATLAS}.
Jorge Gonz\'alez-Dom\'inguez, Guillermo L. Taboada, Basilio B. Fraguela, Mar\'ia
J. Mart\'in and Juan Touri\~{n}o developed a benchmark suite called Servet that
is able to provide relevant data about the caches, the memory access overhead
and communication costs. This suite is described in their article ``Servet: a
benchmark suite for autotuning on multicore clusters''.
%First, we will take a closer look at Servet's way of determining relevant
%information about the cache. Then we will see how it tries to gather data about
%the communication costs, which have an important impact on the performance but
%are still hard to determine.