Commit 724a8ed (1 parent: 12b718b)

Update readme

README.md: 69 additions, 39 deletions

<!-- TITLE -->
# Positional Encoding Benchmark for Time Series Classification

[![arXiv](https://img.shields.io/badge/arXiv-2502.12370-b31b1b.svg)](https://arxiv.org/abs/2502.12370)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-3100/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.4.1-ee4c2c.svg)](https://pytorch.org/)

This repository provides a comprehensive evaluation framework for positional encoding methods in transformer-based time series models, along with implementations and benchmarking results.

Our work is available on arXiv: [Positional Encoding in Transformer-Based Time Series Models: A Survey](https://arxiv.org/abs/2502.12370)

We present a systematic analysis of positional encoding methods evaluated on two transformer architectures:

1. Time Series Transformer (TST)
2. Time Series Transformer with Patch Embedding

### Positional Encoding Methods

We implement and evaluate ten positional encoding methods:

| Method | Type | Injection | Learnable | Parameters | Memory | Complexity |
|--------|------|-----------|-----------|------------|--------|------------|
| Sinusoidal PE | Abs | Add | F | 0 | O(Ld) | O(Ld) |
| Learnable PE | Abs | Add | L | Ld | O(Ld) | O(Ld) |
| RPE | Rel | Att | F | (2L−1)dl | O(L²d) | O(L²d) |
| tAPE | Abs | Add | F | 0 | O(Ld) | O(Ld) |
| RoPE | Hyb | Att | F | 0 | O(Ld) | O(L²d) |
| eRPE | Rel | Att | L | 2L−1 | O(L²+L) | O(L²) |
| TUPE | Hyb | Att | L | 2dl | O(Ld+d²) | O(Ld+d²) |
| ConvSPE | Rel | Att | L | 3Kdh+dl | O(LKR) | O(LKR) |
| T-PE | Hyb | Comb | M | 2d²l/h+(2L+2l)d | O(L²d) | O(L²d) |
| ALiBi | Rel | Att | F | 0 | O(L²h) | O(L²h) |

**Legend:**
- Abs = Absolute, Rel = Relative, Hyb = Hybrid
- Add = Additive, Att = Attention, Comb = Combined
- F = Fixed, L = Learnable, M = Mixed
- L: sequence length, d: embedding dimension, h: attention heads, K: kernel size, l: layers, R: number of realizations (ConvSPE)

A minimal code sketch of two of these methods follows the dataset table below.

## Dataset Characteristics

| Dataset | Train Size | Test Size | Length | Classes | Channels | Type |
|---------|------------|-----------|--------|---------|----------|------|
| Sleep | 478,785 | 90,315 | 178 | 5 | 1 | EEG |
| ElectricDevices | 8,926 | 7,711 | 96 | 7 | 1 | Device |
| FaceDetection | 5,890 | 3,524 | 62 | 2 | 144 | EEG |
| MelbournePedestrian | 1,194 | 2,439 | 24 | 10 | 1 | Traffic |
| SharePriceIncrease | 965 | 965 | 60 | 2 | 1 | Financial |
| LSST | 2,459 | 2,466 | 36 | 14 | 6 | Other |
| RacketSports | 151 | 152 | 30 | 4 | 6 | HAR |
| SelfRegulationSCP1 | 268 | 293 | 896 | 2 | 6 | EEG |
| UniMiB-SHAR | 4,601 | 1,524 | 151 | 9 | 3 | HAR |
| RoomOccupancy | 8,103 | 2,026 | 30 | 4 | 18 | Sensor |
| EMGGestures | 1,800 | 450 | 30 | 8 | 9 | EMG |
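
To make the injection techniques concrete, here is a minimal sketch of the two zero-parameter methods from the encoding table: sinusoidal PE, which is added to the token embeddings (Injection = Add), and ALiBi, which biases the attention logits (Injection = Att). This is an illustrative plain-PyTorch sketch, not this repository's implementation; it assumes the symmetric (non-causal) ALiBi variant and a power-of-two head count.

```python
import math
import torch

def sinusoidal_pe(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed absolute encoding (Type=Abs, Injection=Add); added to embeddings."""
    pos = torch.arange(seq_len).unsqueeze(1)                      # (L, 1)
    freq = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * freq)
    pe[:, 1::2] = torch.cos(pos * freq)
    return pe                                                     # (L, d)

def alibi_bias(seq_len: int, n_heads: int) -> torch.Tensor:
    """Fixed relative bias (Type=Rel, Injection=Att); added to attention logits.
    Symmetric (non-causal) variant; slopes assume a power-of-two head count."""
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    rel = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]  # (L, L)
    return -slopes[:, None, None] * rel.abs()                     # (h, L, L)

x = torch.randn(32, 128, 64)       # (batch, L, d) embedded series
x = x + sinusoidal_pe(128, 64)     # additive injection into the embeddings
bias = alibi_bias(128, 8)          # added to QK^T / sqrt(d_h) before softmax
```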

## Dependencies
- Python 3.10
- PyTorch 2.4.1
## Results

Our experimental evaluation encompasses ten distinct positional encoding methods.

### Key Findings

#### 📊 Sequence Length Impact
- **Long sequences** (>100 steps): 5-6% improvement with advanced methods
- **Medium sequences** (50-100 steps): 3-4% improvement
- **Short sequences** (<50 steps): 2-3% improvement

#### ⚙️ Architecture Performance
- **TST**: More distinct performance gaps between methods
- **Patch Embedding**: More balanced performance among top methods

#### 🏆 Average Rankings
- **SPE**: 1.727 (batch norm), 2.090 (patch embed)
- **TUPE**: 1.909 (batch norm), 2.272 (patch embed)
- **T-PE**: 2.636 (batch norm), 2.363 (patch embed)

#### Additional Observations
- TUPE maintains competitive accuracy
- Relative encoding methods show improved local pattern recognition


### Computational Efficiency Analysis

Training time measurements on the Melbourne Pedestrian dataset (100 epochs):

| Method | Time (s) | Ratio (vs. Sin. PE) | Accuracy |
|--------|----------|---------------------|----------|
| Sin. PE | 48.2 | 1.00 | 66.8% |
| Learn. PE | 60.1 | 1.25 | 70.2% |
| RPE | 128.4 | 2.66 | 72.4% |
| tAPE | 54.0 | 1.12 | 68.2% |
| RoPE | 67.8 | 1.41 | 69.0% |
| eRPE | 142.8 | 2.96 | 73.3% |
| TUPE | 118.3 | 2.45 | 74.5% |
| ConvSPE | 101.6 | 2.11 | **75.3%** |
| T-PE | 134.7 | 2.79 | 74.2% |
| ALiBi | 93.8 | 1.94 | 67.2% |

**ConvSPE leads the efficiency frontier**, achieving the highest accuracy (75.3%) at a moderate 2.11× overhead relative to the sinusoidal baseline.
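
The Ratio column can be reproduced with a plain wall-clock harness like the sketch below; the model and data loader are placeholders, not this repository's benchmark code.

```python
import time
import torch

def train_seconds(model: torch.nn.Module, loader, epochs: int = 100) -> float:
    """Rough wall-clock training time, mirroring the Time (s) column above."""
    opt = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    start = time.perf_counter()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    return time.perf_counter() - start

# Ratio = method time / sinusoidal baseline time, e.g. (hypothetical models):
# ratio = train_seconds(convspe_model, loader) / train_seconds(sinpe_model, loader)
```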

### Method Selection Guidelines

#### Sequence Length-Based Recommendations
- **Short sequences (L ≤ 50)**: Learnable PE or tAPE (minimal gains don't justify computational overhead)
- **Medium sequences (50 < L ≤ 100)**: SPE or eRPE (3-4% accuracy improvements)
- **Long sequences (L > 100)**: TUPE for complex patterns, SPE for regular data, ConvSPE for linear complexity

These length-based rules are distilled into the sketch after this section.

#### Domain-Specific Guidelines
- **Biomedical signals**: TUPE > SPE > T-PE (physiological complexity handling)
- **Environmental sensors**: SPE > eRPE (regular sampling patterns)
- **High-dimensional data (d > 5)**: Advanced methods consistently outperform simple approaches

#### Computational Resource Framework
- **Limited resources**: Sinusoidal PE, tAPE (O(Ld) complexity)
- **Balanced scenarios**: SPE, TUPE (optimal accuracy-efficiency trade-off)
- **Performance-critical**: TUPE, SPE regardless of computational cost

#### Architecture-Specific Considerations
- **Time Series Transformers**: Prioritize content-position separation methods (TUPE) and relative positioning (eRPE, SPE)
- **Patch Embedding Transformers**: Multi-scale approaches (T-PE, ConvSPE) handle hierarchical processing more effectively
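
As promised above, the sequence-length recommendations fold into a small helper; the function below is an illustrative heuristic distilled from the guidelines, not part of the benchmark's API.

```python
def recommend_encodings(seq_len: int) -> list[str]:
    """Illustrative heuristic distilled from the sequence-length guidelines."""
    if seq_len <= 50:
        return ["Learnable PE", "tAPE"]    # advanced methods' overhead not justified
    if seq_len <= 100:
        return ["SPE", "eRPE"]             # 3-4% accuracy improvements
    return ["TUPE", "SPE", "ConvSPE"]      # complex patterns / regular data / linear cost

print(recommend_encodings(178))  # Sleep (length 178) -> ['TUPE', 'SPE', 'ConvSPE']
```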
<!-- CONTRIBUTING -->
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
