Architecture/Code_design/Types/aggregate.theory.txt


             
   AGGREGATE  
             


SUMMARY ==>                       #Lowest-level data structures implementations
                                  #Can be:
                                  #  - array: same|similar types
                                  #  - records: not same|similar types


                                             /=+===============================+=\
                                            /  :                               :  \
                                            )==:             ARRAY             :==(
                                            \  :_______________________________:  /
                                             \=+===============================+=/


ARRAY ==>                         #Aggregates items ("elements") of the same|similar type
                                  #Keys can be:
                                  #  - numbers/indices ("single-value")
                                  #     - 0-based, 1-based or n-based
                                  #  - strings ("associative")
                                  #Items type can be composite ("container") or not
                                  #Items size ("width|stride|increment|step") can be:
                                  #  - always the same: faster access
                                  #  - or not: more compact
                                  #Position of an item since beginning of array is "offset"

DOPE VECTOR ==>                   #Metadata stored alongside an array
                                  #E.g. origin address, items type, length, dimensions, stride


                                             /=+===============================+=\
                                            /  :                               :  \
                                            )==:            MATRIX             :==(
                                            \  :_______________________________:  /
                                             \=+===============================+=/


MATRIX ==>                        #Array that has multiple dimensions|ranks
                                  #Opposite is "vector"
                                  #Can be implemented:
                                  #  - row-major: row > column. Better for sequential access
                                  #  - column-major: column > row. Better for array programming
                                  #  - "iliffe vector": array of pointers towards other array

SPACE STORAGE ==>                 #Often matrices have the following properties that allow for structures to save space:
                                  #  - band matrix: when all [non-]empty elements form diagonals
                                  #  - symetric matrix: when one side of diagonal is same as other one


                                             /=+===============================+=\
                                            /  :                               :  \
                                            )==:          ARRAY SIZE           :==(
                                            \  :_______________________________:  /
                                             \=+===============================+=/


ARRAY SIZE ==>                    #Array size can be:
                                  #  - fixed: "table"/"bounded"
                                  #  - dynamic: "collection"/"dynamic array|table"/"array list"
                                  #Array memory can be allocated:
                                  #  - compile-time
                                  #  - runtime: "variable-length"/"dynamically allocated"
                                  #Zero-filling:
                                  #  - filling allocated memory with \0
                                  #  - slower but safer from programming errors and security exploits
                                  
SENTINEL VALUE ==>                #Value at the end to indicate termination of a dynamic array.
                                  #E.g. null-terminated strings
                                  #Empty array will still contain a value (the sentinel value)
                                  #Compared to dope vector:
                                  #  - pros:
                                  #     - no maximum size limit
                                  #     - better space complexity
                                  #     - time complexity: does not need update on insert|delete
                                  #     - more stateless (does not need to store array size during iteration)
                                  #  - cons:
                                  #     - length() time complexity of O(n) instead of O(1)
                                  #     - must reserve a special value
                                  
GEOMETRIC EXPANSION ==>           #Possible dynamic array allocation.
                                  #Increasing capacity on demand, usually using a scaling factor, often:
                                  #  - 2, allowing bitwise/bytewsie operations
                                  #  - 1.5, with less wasted space
                                  #Most wasted space (O(n)) but simple and limited number of copies
                                  
GAP BUFFER ==>                    #Possible dynamic array allocation.
                                  #Split array into two parts, and only allow adding|deleting between them
                                  #Elements can be moved from one part to another
                                  #Efficient when modification usually happen around the same place, e.g. in a text editor

HASHED ARRAY TREE (HAT) ==>       #Possible dynamic array allocation.
                                  #Using an iliffe vector ("directory") pointing to arrays ("leaves")
                                  #Can allocate memory by:
                                  #  - geometric expansion on directory
                                  #  - geometric expansion on all leaves
                                  #  - combinations:
                                  #     - only expanding directory avoid any copy, but makes iteration slower
                                  #     - expanding both but delaying allocating memory for new leaves until needed
                                  #        - waste only O(√n) space
                                  #Can deallocate memory by:
                                  #  - dealllocate leaf when empty
                                  #  - restructure whole array to smaller directory and leaves when smaller size
                                  #     - e.g. when scaling factor 2, when 1/8 full, can half both directory and leaves,
                                  #       so it becomes 1/2 full

FREE LIST ==>                     #Possible dynamic array allocation.
                                  #Using unrolled linked lists, i.e. linked list to arrays.

MEMORY POOL ==>                   #Possible dynamic array allocation.
                                  #Allocate big chunk of memory and allocate it to objects on demand.

BUDDY MEMORY ALLOCATION ==>       #Memory pool allocation.
                                  #For a pool of size n, divide it into N parts (usually 2).
                                  #Each part is also divided into N subparts, recursively.
                                  #The level is subdivision is the "order" m
                                  #Chunks of n/Nᵐ memory are allocated and read by striding through the array:
                                  #  - usually a data structure like a search tree is used for each order
                                  #    to quickly check whether memory chunks are available
                                  #Must set a minimum chunk size:
                                  #  - smaller size increase bookkeeping time
                                  #  - bigger size increase fragmentation


                                             /=+===============================+=\
                                            /  :                               :  \
                                            )==:         SPARSE ARRAY          :==(
                                            \  :_______________________________:  /
                                             \=+===============================+=/


SPARSE ARRAY ==>                  #Array that has many empty elements.
                                  #"Sparsity"/"density"/"load factor" is percentage of [non-]empty elements.
                                  #Can save space using any of the following methods.

DICTIONARY OF KEYS (DOK) ==>      #Method to save space with sparse arrays.
                                  #Associative array with indices as keys.
                                  #Slower on ordered iteration as it is not ordered

COORDINATE LIST (COO) ==>         #Method to save space with sparse arrays.
                                  #Association list of (INDEX, VAL) pairs
                                  #Slower for writes, as it requires ordering

ARRAY BIT MAPPING ==>             #COO where:
                                  #  - using AoS, i.e. INDEX is stored in an first array and VAL in a second array
                                  #  - INDEX array uses a bit array
                                  #     - where 1 bit = 1 possible INDEX
                                  #  - max size is static
                                  #     - e.g. 256 ASCII values
                                  #Very space efficient
                                  #Only makes sense when number of INDEXes is bounded and density is high

LIST OF LIST (LIL) ==>            #COO using an "iliffe vector" (one level of pointers per dimension)
                                  #For matrix only.

COMPRESSED SPARSE ROW (CSR) ==>   #Method to save space with sparse arrays.
                                  #Also called "compressed row storage" (CRS) / "Yale format"
                                  #For matrix only.
                                  #Three arrays, for all non-empty elements:
                                  #  - array A: values of elements
                                  #  - array IA: number of elements per non-empty row. Must append 0.
                                  #  - array JA: column of elements
                                  #Takes least space (compresses row information), but is slowest on read|write
                                  #"New format":
                                  #  - concatenating IA and JA
                                  #"Compressed column storage" (CCS) / "Harwell-Boeing":
                                  #  - column-major instead of row-major


                                             /=+===============================+=\
                                            /  :                               :  \
                                            )==:            RECORD             :==(
                                            \  :_______________________________:  /
                                             \=+===============================+=/


RECORD ==>                        #Aggregates items ("fields"/"members"/"elements"/"column") of different types
                                  #Also called "tuple", "struct[ure]" or "row"
                                  #Items size can be fixed or dynamic.
                                  #Keys:
                                  #  - fields that serve as identifier
                                  #  - "primary key" are unique keys
                                  #According to size, called:
                                  #  - "singleton", "pair", "triplet", "quad", "quint", etc.
                                  #  - "monuple", "couple", "triple", "quadruple", "quintuple", etc.
                                  
RECORD + ARRAY ==>                #Can either:
                                  #  - array of records|structures (AoS)
                                  #     - most used
                                  #     - better when usually accessing and iterating over all fields at a time
                                  #  - record|structures of arrays (SoA) / "interleaving"
                                  #     - better when usually accessing and iterating over one field at a time
                                  #        - including SIMD operations
                                  #        - can save space with:
                                  #           - better low-level byte alignment
                                  #           - indices taking less space than string keys
                                  #     - each array is a "parallel array"
                                  #  - record of AoS:
                                  #     - better when usually accessing and iterating over few fields at a time