<!-- TITLE -->
# Positional Encoding Benchmark for Time Series Classification

[![arXiv](https://img.shields.io/badge/arXiv-2502.12370-b31b1b.svg)](https://arxiv.org/abs/2502.12370)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/downloads/release/python-3100/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.4.1-ee4c2c.svg)](https://pytorch.org/)

This repository provides a comprehensive evaluation framework for positional encoding methods in transformer-based time series models, along with implementations and benchmarking results.

Our work is available on arXiv: [Positional Encoding in Transformer-Based Time Series Models: A Survey](https://arxiv.org/abs/2502.12370)

We present a systematic analysis of positional encoding methods evaluated on two transformer architectures:

1. Time Series Transformer (TST)
2. Time Series Transformer with Patch Embedding

### Positional Encoding Methods

We implement and evaluate ten positional encoding methods:

| Method | Type | Injection | Learnable | Parameters | Memory | Complexity |
|--------|------|-----------|-----------|------------|--------|------------|
| Sinusoidal PE | Abs | Add | F | 0 | O(Ld) | O(Ld) |
| Learnable PE | Abs | Add | L | Ld | O(Ld) | O(Ld) |
| RPE | Rel | Att | F | (2L−1)dl | O(L²d) | O(L²d) |
| tAPE | Abs | Add | F | 0 | O(Ld) | O(Ld) |
| RoPE | Hyb | Att | F | 0 | O(Ld) | O(L²d) |
| eRPE | Rel | Att | L | 2L−1 | O(L²+L) | O(L²) |
| TUPE | Hyb | Att | L | 2dl | O(Ld+d²) | O(Ld+d²) |
| ConvSPE | Rel | Att | L | 3Kdh+dl | O(LKR) | O(LKR) |
| T-PE | Hyb | Comb | M | 2d²l/h+(2L+2l)d | O(L²d) | O(L²d) |
| ALiBi | Rel | Att | F | 0 | O(L²h) | O(L²h) |

**Legend:**
- Type: Abs = Absolute, Rel = Relative, Hyb = Hybrid
- Injection: Add = Additive, Att = Attention, Comb = Combined
- Learnable: F = Fixed, L = Learnable, M = Mixed
- Symbols: L = sequence length, d = embedding dimension, h = attention heads, K = kernel size, l = layers
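
To make the injection techniques above concrete, here is a minimal PyTorch sketch (illustrative only, not the repository's implementation) of the two basic mechanisms: a fixed sinusoidal table added to the input embeddings ("Add"), and a learnable per-offset bias added to the attention logits ("Att", in the spirit of eRPE/ALiBi). Class names and the `max_len` argument are our own choices.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPE(nn.Module):
    """Fixed sinusoidal table added to the input embeddings ("Add" injection, 0 parameters)."""

    def __init__(self, d_model: int, max_len: int = 1024):
        super().__init__()
        # assumes an even d_model, as in the standard formulation
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)                                       # fixed, not trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> positions added element-wise
        return x + self.pe[: x.size(1)]


class RelativeBias(nn.Module):
    """Learnable scalar bias per relative offset i-j, added to attention logits ("Att" injection)."""

    def __init__(self, max_len: int = 1024):
        super().__init__()
        self.max_len = max_len
        self.bias = nn.Parameter(torch.zeros(2 * max_len - 1))               # 2L-1 parameters

    def forward(self, seq_len: int) -> torch.Tensor:
        idx = torch.arange(seq_len, device=self.bias.device)
        offsets = idx[:, None] - idx[None, :]                                # values in [-(L-1), L-1]
        return self.bias[offsets + self.max_len - 1]                         # (seq_len, seq_len)


# Where each piece would plug into a transformer block:
#   x = SinusoidalPE(d_model)(x)                        # "Add": before the encoder
#   scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
#   scores = scores + RelativeBias(max_len)(L)          # "Att": inside self-attention
```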

## Dataset Characteristics

| Dataset | Train Size | Test Size | Length | Classes | Channels | Type |
|---------|------------|-----------|--------|---------|----------|------|
| Sleep | 478,785 | 90,315 | 178 | 5 | 1 | EEG |
| ElectricDevices | 8,926 | 7,711 | 96 | 7 | 1 | Device |
| FaceDetection | 5,890 | 3,524 | 62 | 2 | 144 | EEG |
| MelbournePedestrian | 1,194 | 2,439 | 24 | 10 | 1 | Traffic |
| SharePriceIncrease | 965 | 965 | 60 | 2 | 1 | Financial |
| LSST | 2,459 | 2,466 | 36 | 14 | 6 | Other |
| RacketSports | 151 | 152 | 30 | 4 | 6 | HAR |
| SelfRegulationSCP1 | 268 | 293 | 896 | 2 | 6 | EEG |
| UniMiB-SHAR | 4,601 | 1,524 | 151 | 9 | 3 | HAR |
| RoomOccupancy | 8,103 | 2,026 | 30 | 4 | 18 | Sensor |
| EMGGestures | 1,800 | 450 | 30 | 8 | 9 | EMG |

## Dependencies
- Python 3.10
- PyTorch 2.4.1

### Key Findings

#### 📊 Sequence Length Impact
- **Long sequences** (>100 steps): 5-6% improvement with advanced methods
- **Medium sequences** (50-100 steps): 3-4% improvement
- **Short sequences** (<50 steps): 2-3% improvement

#### ⚙️ Architecture Performance
- **TST**: More pronounced performance gaps between methods
- **Patch Embedding**: More balanced performance among top methods

#### 🏆 Average Rankings (lower is better)
- **SPE**: 1.727 (batch norm), 2.090 (patch embed)
- **TUPE**: 1.909 (batch norm), 2.272 (patch embed)
- **T-PE**: 2.636 (batch norm), 2.363 (patch embed)

Additional observations:
- TUPE maintains competitive accuracy
- Relative encoding methods show improved local pattern recognition

### Computational Efficiency Analysis

Training time on the Melbourne Pedestrian dataset (100 epochs); ratios are relative to the sinusoidal PE baseline:

| Method | Time (s) | Ratio | Accuracy |
|--------|----------|-------|----------|
| Sinusoidal PE | 48.2 | 1.00 | 66.8% |
| Learnable PE | 60.1 | 1.25 | 70.2% |
| RPE | 128.4 | 2.66 | 72.4% |
| tAPE | 54.0 | 1.12 | 68.2% |
| RoPE | 67.8 | 1.41 | 69.0% |
| eRPE | 142.8 | 2.96 | 73.3% |
| TUPE | 118.3 | 2.45 | 74.5% |
| ConvSPE | 101.6 | 2.11 | **75.3%** |
| T-PE | 134.7 | 2.79 | 74.2% |
| ALiBi | 93.8 | 1.94 | 67.2% |

**ConvSPE leads the accuracy-efficiency frontier**, achieving the highest accuracy (75.3%) with moderate computational overhead (2.11×).
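
A minimal sketch of how the time and ratio columns could be produced is below (hypothetical harness, not the repository's benchmarking code; `train_jobs` is an assumed mapping from method name to a training function):

```python
import time


def timed_training(train_fn) -> float:
    """Return wall-clock seconds for one complete training run (e.g., 100 epochs)."""
    start = time.perf_counter()
    train_fn()
    return time.perf_counter() - start


# times = {name: timed_training(job) for name, job in train_jobs.items()}
# ratios = {name: t / times["Sinusoidal PE"] for name, t in times.items()}
```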

### Method Selection Guidelines

#### Sequence Length-Based Recommendations
- **Short sequences (L ≤ 50)**: Learnable PE or tAPE (the small gains from advanced methods do not justify their computational overhead)
- **Medium sequences (50 < L ≤ 100)**: SPE or eRPE (3-4% accuracy improvements)
- **Long sequences (L > 100)**: TUPE for complex patterns, SPE for regular data, ConvSPE when linear complexity is required

#### Domain-Specific Guidelines
- **Biomedical signals**: TUPE > SPE > T-PE (best at handling physiological complexity)
- **Environmental sensors**: SPE > eRPE (regular sampling patterns)
- **High-dimensional data (d > 5)**: Advanced methods consistently outperform simple approaches

#### Computational Resource Framework
- **Limited resources**: Sinusoidal PE, tAPE (O(Ld) complexity)
- **Balanced scenarios**: SPE, TUPE (good accuracy-efficiency trade-off)
- **Performance-critical**: TUPE, SPE regardless of computational cost

#### Architecture-Specific Considerations
- **Time Series Transformers**: Prioritize content-position separation (TUPE) and relative positioning (eRPE, SPE)
- **Patch Embedding Transformers**: Multi-scale approaches (T-PE, ConvSPE) handle hierarchical processing more effectively
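
As a compact restatement of these guidelines, the snippet below (illustrative only, not part of the repository; the function name is ours) maps sequence length and a resource budget to a starting-point recommendation:

```python
def recommend_pe(seq_len: int, limited_resources: bool = False) -> str:
    """Suggest a positional encoding method following the guidelines above."""
    if limited_resources:
        return "tAPE"            # or sinusoidal PE: O(Ld) cost, no extra parameters
    if seq_len <= 50:
        return "Learnable PE"    # advanced methods rarely pay off on short sequences
    if seq_len <= 100:
        return "SPE"             # or eRPE: roughly 3-4% accuracy gains
    return "TUPE"                # complex long sequences; SPE for regular data, ConvSPE for linear cost


print(recommend_pe(24))    # "Learnable PE"  (e.g., MelbournePedestrian, length 24)
print(recommend_pe(178))   # "TUPE"          (e.g., Sleep, length 178)
```
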
<!-- CONTRIBUTING -->
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.