<!-- TITLE -->
# Positional Encoding Benchmark for Time Series Classification

[![arXiv](https://img.shields.io/badge/arXiv-2502.12370-b31b1b.svg)](https://arxiv.org/abs/2502.12370)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-3100/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.4.1-ee4c2c.svg)](https://pytorch.org/)

This repository provides a comprehensive evaluation framework for positional encoding methods in transformer-based time series models, along with implementations and benchmarking results.

Our work is available on arXiv: [Positional Encoding in Transformer-Based Time Series Models: A Survey](https://arxiv.org/abs/2502.12370)

We present a systematic analysis of positional encoding methods evaluated on two transformer architectures:

1. Time Series Transformer (TST)
2. Time Series Transformer with Patch Embedding

### Positional Encoding Methods

We implement and evaluate ten positional encoding methods:

| Method | Type | Injection | Learnable | Parameters | Memory | Complexity |
|--------|------|-----------|-----------|------------|--------|------------|
| Sinusoidal PE | Abs | Add | F | 0 | O(Ld) | O(Ld) |
| Learnable PE | Abs | Add | L | Ld | O(Ld) | O(Ld) |
| RPE | Rel | Att | F | (2L−1)dl | O(L²d) | O(L²d) |
| tAPE | Abs | Add | F | 0 | O(Ld) | O(Ld) |
| RoPE | Hyb | Att | F | 0 | O(Ld) | O(L²d) |
| eRPE | Rel | Att | L | 2L−1 | O(L²+L) | O(L²) |
| TUPE | Hyb | Att | L | 2dl | O(Ld+d²) | O(Ld+d²) |
| ConvSPE | Rel | Att | L | 3Kdh+dl | O(LKR) | O(LKR) |
| T-PE | Hyb | Comb | M | 2d²l/h+(2L+2l)d | O(L²d) | O(L²d) |
| ALiBi | Rel | Att | F | 0 | O(L²h) | O(L²h) |

**Legend:**
- Type: Abs = Absolute, Rel = Relative, Hyb = Hybrid
- Injection: Add = Additive, Att = Attention, Comb = Combined
- Learnable: F = Fixed, L = Learnable, M = Mixed
- Symbols: L = sequence length, d = embedding dimension, h = attention heads, K = kernel size, l = layers
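
To make the injection techniques above concrete, here is a minimal PyTorch sketch (illustrative only, not the repository's implementation) of the two basic mechanisms: a fixed sinusoidal table added to the input embeddings ("Add"), and a learnable per-offset bias added to the attention logits ("Att", in the spirit of eRPE/ALiBi). Class names and the `max_len` argument are our own choices.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPE(nn.Module):
    """Fixed sinusoidal table added to the input embeddings ("Add" injection, 0 parameters)."""

    def __init__(self, d_model: int, max_len: int = 1024):
        super().__init__()
        # assumes an even d_model, as in the standard formulation
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)                                       # fixed, not trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> positions added element-wise
        return x + self.pe[: x.size(1)]


class RelativeBias(nn.Module):
    """Learnable scalar bias per relative offset i-j, added to attention logits ("Att" injection)."""

    def __init__(self, max_len: int = 1024):
        super().__init__()
        self.max_len = max_len
        self.bias = nn.Parameter(torch.zeros(2 * max_len - 1))               # 2L-1 parameters

    def forward(self, seq_len: int) -> torch.Tensor:
        idx = torch.arange(seq_len, device=self.bias.device)
        offsets = idx[:, None] - idx[None, :]                                # values in [-(L-1), L-1]
        return self.bias[offsets + self.max_len - 1]                         # (seq_len, seq_len)


# Where each piece would plug into a transformer block:
#   x = SinusoidalPE(d_model)(x)                        # "Add": before the encoder
#   scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
#   scores = scores + RelativeBias(max_len)(L)          # "Att": inside self-attention
```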

## Dataset Characteristics

| Dataset | Train Size | Test Size | Length | Classes | Channels | Type |
|---------|------------|-----------|--------|---------|----------|------|
| Sleep | 478,785 | 90,315 | 178 | 5 | 1 | EEG |
| ElectricDevices | 8,926 | 7,711 | 96 | 7 | 1 | Device |
| FaceDetection | 5,890 | 3,524 | 62 | 2 | 144 | EEG |
| MelbournePedestrian | 1,194 | 2,439 | 24 | 10 | 1 | Traffic |
| SharePriceIncrease | 965 | 965 | 60 | 2 | 1 | Financial |
| LSST | 2,459 | 2,466 | 36 | 14 | 6 | Other |
| RacketSports | 151 | 152 | 30 | 4 | 6 | HAR |
| SelfRegulationSCP1 | 268 | 293 | 896 | 2 | 6 | EEG |
| UniMiB-SHAR | 4,601 | 1,524 | 151 | 9 | 3 | HAR |
| RoomOccupancy | 8,103 | 2,026 | 30 | 4 | 18 | Sensor |
| EMGGestures | 1,800 | 450 | 30 | 8 | 9 | EMG |

## Dependencies
- Python 3.10
- PyTorch 2.4.1

### Key Findings

#### 📊 Sequence Length Impact
- **Long sequences** (>100 steps): 5-6% improvement with advanced methods
- **Medium sequences** (50-100 steps): 3-4% improvement
- **Short sequences** (<50 steps): 2-3% improvement

#### ⚙️ Architecture Performance
- **TST**: More pronounced performance gaps between methods
- **Patch Embedding**: More balanced performance among top methods

#### 🏆 Average Rankings (lower is better)
- **SPE**: 1.727 (batch norm), 2.090 (patch embed)
- **TUPE**: 1.909 (batch norm), 2.272 (patch embed)
- **T-PE**: 2.636 (batch norm), 2.363 (patch embed)

Additional observations:
- TUPE maintains competitive accuracy
- Relative encoding methods show improved local pattern recognition

### Computational Efficiency Analysis

Training time on the Melbourne Pedestrian dataset (100 epochs); ratios are relative to the sinusoidal PE baseline:

| Method | Time (s) | Ratio | Accuracy |
|--------|----------|-------|----------|
| Sinusoidal PE | 48.2 | 1.00 | 66.8% |
| Learnable PE | 60.1 | 1.25 | 70.2% |
| RPE | 128.4 | 2.66 | 72.4% |
| tAPE | 54.0 | 1.12 | 68.2% |
| RoPE | 67.8 | 1.41 | 69.0% |
| eRPE | 142.8 | 2.96 | 73.3% |
| TUPE | 118.3 | 2.45 | 74.5% |
| ConvSPE | 101.6 | 2.11 | **75.3%** |
| T-PE | 134.7 | 2.79 | 74.2% |
| ALiBi | 93.8 | 1.94 | 67.2% |

**ConvSPE leads the accuracy-efficiency frontier**, achieving the highest accuracy (75.3%) with moderate computational overhead (2.11×).
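
A minimal sketch of how the time and ratio columns could be produced is below (hypothetical harness, not the repository's benchmarking code; `train_jobs` is an assumed mapping from method name to a training function):

```python
import time


def timed_training(train_fn) -> float:
    """Return wall-clock seconds for one complete training run (e.g., 100 epochs)."""
    start = time.perf_counter()
    train_fn()
    return time.perf_counter() - start


# times = {name: timed_training(job) for name, job in train_jobs.items()}
# ratios = {name: t / times["Sinusoidal PE"] for name, t in times.items()}
```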

### Method Selection Guidelines

#### Sequence Length-Based Recommendations
- **Short sequences (L ≤ 50)**: Learnable PE or tAPE (the small gains from advanced methods do not justify their computational overhead)
- **Medium sequences (50 < L ≤ 100)**: SPE or eRPE (3-4% accuracy improvements)
- **Long sequences (L > 100)**: TUPE for complex patterns, SPE for regular data, ConvSPE when linear complexity is required

#### Domain-Specific Guidelines
- **Biomedical signals**: TUPE > SPE > T-PE (best at handling physiological complexity)
- **Environmental sensors**: SPE > eRPE (regular sampling patterns)
- **High-dimensional data (d > 5)**: Advanced methods consistently outperform simple approaches

#### Computational Resource Framework
- **Limited resources**: Sinusoidal PE, tAPE (O(Ld) complexity)
- **Balanced scenarios**: SPE, TUPE (good accuracy-efficiency trade-off)
- **Performance-critical**: TUPE, SPE regardless of computational cost

#### Architecture-Specific Considerations
- **Time Series Transformers**: Prioritize content-position separation (TUPE) and relative positioning (eRPE, SPE)
- **Patch Embedding Transformers**: Multi-scale approaches (T-PE, ConvSPE) handle hierarchical processing more effectively
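
As a compact restatement of these guidelines, the snippet below (illustrative only, not part of the repository; the function name is ours) maps sequence length and a resource budget to a starting-point recommendation:

```python
def recommend_pe(seq_len: int, limited_resources: bool = False) -> str:
    """Suggest a positional encoding method following the guidelines above."""
    if limited_resources:
        return "tAPE"            # or sinusoidal PE: O(Ld) cost, no extra parameters
    if seq_len <= 50:
        return "Learnable PE"    # advanced methods rarely pay off on short sequences
    if seq_len <= 100:
        return "SPE"             # or eRPE: roughly 3-4% accuracy gains
    return "TUPE"                # complex long sequences; SPE for regular data, ConvSPE for linear cost


print(recommend_pe(24))    # "Learnable PE"  (e.g., MelbournePedestrian, length 24)
print(recommend_pe(178))   # "TUPE"          (e.g., Sleep, length 178)
```
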
<!-- CONTRIBUTING -->
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.