-
Notifications
You must be signed in to change notification settings - Fork 16
/
aggregate.theory.txt
203 lines (168 loc) · 14.9 KB
/
aggregate.theory.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
AGGREGATE
SUMMARY ==> #Lowest-level data structures implementations
#Can be:
# - array: same|similar types
# - records: not same|similar types
/=+===============================+=\
/ : : \
)==: ARRAY :==(
\ :_______________________________: /
\=+===============================+=/
ARRAY ==> #Aggregates items ("elements") of the same|similar type
#Keys can be:
# - numbers/indices ("single-value")
# - 0-based, 1-based or n-based
# - strings ("associative")
#Items type can be composite ("container") or not
#Items size ("width|stride|increment|step") can be:
# - always the same: faster access
# - or not: more compact
#Position of an item since beginning of array is "offset"
DOPE VECTOR ==> #Metadata stored alongside an array
#E.g. origin address, items type, length, dimensions, stride
/=+===============================+=\
/ : : \
)==: MATRIX :==(
\ :_______________________________: /
\=+===============================+=/
MATRIX ==> #Array that has multiple dimensions|ranks
#Opposite is "vector"
#Can be implemented:
# - row-major: row > column. Better for sequential access
# - column-major: column > row. Better for array programming
# - "iliffe vector": array of pointers towards other array
SPACE STORAGE ==> #Often matrices have the following properties that allow for structures to save space:
# - band matrix: when all [non-]empty elements form diagonals
# - symetric matrix: when one side of diagonal is same as other one
/=+===============================+=\
/ : : \
)==: ARRAY SIZE :==(
\ :_______________________________: /
\=+===============================+=/
ARRAY SIZE ==> #Array size can be:
# - fixed: "table"/"bounded"
# - dynamic: "collection"/"dynamic array|table"/"array list"
#Array memory can be allocated:
# - compile-time
# - runtime: "variable-length"/"dynamically allocated"
#Zero-filling:
# - filling allocated memory with \0
# - slower but safer from programming errors and security exploits
SENTINEL VALUE ==> #Value at the end to indicate termination of a dynamic array.
#E.g. null-terminated strings
#Empty array will still contain a value (the sentinel value)
#Compared to dope vector:
# - pros:
# - no maximum size limit
# - better space complexity
# - time complexity: does not need update on insert|delete
# - more stateless (does not need to store array size during iteration)
# - cons:
# - length() time complexity of O(n) instead of O(1)
# - must reserve a special value
GEOMETRIC EXPANSION ==> #Possible dynamic array allocation.
#Increasing capacity on demand, usually using a scaling factor, often:
# - 2, allowing bitwise/bytewsie operations
# - 1.5, with less wasted space
#Most wasted space (O(n)) but simple and limited number of copies
GAP BUFFER ==> #Possible dynamic array allocation.
#Split array into two parts, and only allow adding|deleting between them
#Elements can be moved from one part to another
#Efficient when modification usually happen around the same place, e.g. in a text editor
HASHED ARRAY TREE (HAT) ==> #Possible dynamic array allocation.
#Using an iliffe vector ("directory") pointing to arrays ("leaves")
#Can allocate memory by:
# - geometric expansion on directory
# - geometric expansion on all leaves
# - combinations:
# - only expanding directory avoid any copy, but makes iteration slower
# - expanding both but delaying allocating memory for new leaves until needed
# - waste only O(√n) space
#Can deallocate memory by:
# - dealllocate leaf when empty
# - restructure whole array to smaller directory and leaves when smaller size
# - e.g. when scaling factor 2, when 1/8 full, can half both directory and leaves,
# so it becomes 1/2 full
FREE LIST ==> #Possible dynamic array allocation.
#Using unrolled linked lists, i.e. linked list to arrays.
MEMORY POOL ==> #Possible dynamic array allocation.
#Allocate big chunk of memory and allocate it to objects on demand.
BUDDY MEMORY ALLOCATION ==> #Memory pool allocation.
#For a pool of size n, divide it into N parts (usually 2).
#Each part is also divided into N subparts, recursively.
#The level is subdivision is the "order" m
#Chunks of n/Nᵐ memory are allocated and read by striding through the array:
# - usually a data structure like a search tree is used for each order
# to quickly check whether memory chunks are available
#Must set a minimum chunk size:
# - smaller size increase bookkeeping time
# - bigger size increase fragmentation
/=+===============================+=\
/ : : \
)==: SPARSE ARRAY :==(
\ :_______________________________: /
\=+===============================+=/
SPARSE ARRAY ==> #Array that has many empty elements.
#"Sparsity"/"density"/"load factor" is percentage of [non-]empty elements.
#Can save space using any of the following methods.
DICTIONARY OF KEYS (DOK) ==> #Method to save space with sparse arrays.
#Associative array with indices as keys.
#Slower on ordered iteration as it is not ordered
COORDINATE LIST (COO) ==> #Method to save space with sparse arrays.
#Association list of (INDEX, VAL) pairs
#Slower for writes, as it requires ordering
ARRAY BIT MAPPING ==> #COO where:
# - using AoS, i.e. INDEX is stored in an first array and VAL in a second array
# - INDEX array uses a bit array
# - where 1 bit = 1 possible INDEX
# - max size is static
# - e.g. 256 ASCII values
#Very space efficient
#Only makes sense when number of INDEXes is bounded and density is high
LIST OF LIST (LIL) ==> #COO using an "iliffe vector" (one level of pointers per dimension)
#For matrix only.
COMPRESSED SPARSE ROW (CSR) ==> #Method to save space with sparse arrays.
#Also called "compressed row storage" (CRS) / "Yale format"
#For matrix only.
#Three arrays, for all non-empty elements:
# - array A: values of elements
# - array IA: number of elements per non-empty row. Must append 0.
# - array JA: column of elements
#Takes least space (compresses row information), but is slowest on read|write
#"New format":
# - concatenating IA and JA
#"Compressed column storage" (CCS) / "Harwell-Boeing":
# - column-major instead of row-major
/=+===============================+=\
/ : : \
)==: RECORD :==(
\ :_______________________________: /
\=+===============================+=/
RECORD ==> #Aggregates items ("fields"/"members"/"elements"/"column") of different types
#Also called "tuple", "struct[ure]" or "row"
#Items size can be fixed or dynamic.
#Keys:
# - fields that serve as identifier
# - "primary key" are unique keys
#According to size, called:
# - "singleton", "pair", "triplet", "quad", "quint", etc.
# - "monuple", "couple", "triple", "quadruple", "quintuple", etc.
RECORD + ARRAY ==> #Can either:
# - array of records|structures (AoS)
# - most used
# - better when usually accessing and iterating over all fields at a time
# - record|structures of arrays (SoA) / "interleaving"
# - better when usually accessing and iterating over one field at a time
# - including SIMD operations
# - can save space with:
# - better low-level byte alignment
# - indices taking less space than string keys
# - each array is a "parallel array"
# - record of AoS:
# - better when usually accessing and iterating over few fields at a time