|
| 1 | +# Snowplow Python Tracker - CLAUDE.md |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +The Snowplow Python Tracker is a public Python library for sending analytics events to Snowplow collectors. It enables developers to integrate Snowplow analytics into Python applications, games, and web servers. The library provides a robust event tracking system with support for various event types, custom contexts, and reliable event delivery through configurable emitters. |
| 6 | + |
| 7 | +**Key Technologies:** |
| 8 | +- Python 3.8+ (supported versions: 3.8-3.13) |
| 9 | +- requests library for HTTP communication |
| 10 | +- typing_extensions for enhanced type hints |
| 11 | +- Event-driven architecture with schema validation |
| 12 | +- Asynchronous and synchronous event emission |
| 13 | + |
| 14 | +## Development Commands |
| 15 | + |
| 16 | +```bash |
| 17 | +# Install dependencies |
| 18 | +pip install -r requirements-test.txt |
| 19 | + |
| 20 | +# Run tests |
| 21 | +./run-tests.sh |
| 22 | + |
| 23 | +# Run specific test module |
| 24 | +python -m pytest snowplow_tracker/test/unit/test_tracker.py |
| 25 | + |
| 26 | +# Run integration tests |
| 27 | +python -m pytest snowplow_tracker/test/integration/ |
| 28 | + |
| 29 | +# Install package in development mode |
| 30 | +pip install -e . |
| 31 | + |
| 32 | +# Build Docker image for testing |
| 33 | +docker build -t snowplow-python-tracker . |
| 34 | +docker run snowplow-python-tracker |
| 35 | +``` |
| 36 | + |
| 37 | +## Architecture |
| 38 | + |
| 39 | +The tracker follows a layered architecture with clear separation of concerns: |
| 40 | + |
| 41 | +``` |
| 42 | +snowplow_tracker/ |
| 43 | +├── Core Components |
| 44 | +│ ├── tracker.py # Main Tracker class orchestrating events |
| 45 | +│ ├── snowplow.py # High-level API for tracker management |
| 46 | +│ └── subject.py # User/device context management |
| 47 | +├── Event Layer (events/) |
| 48 | +│ ├── event.py # Base Event class |
| 49 | +│ ├── page_view.py # PageView event |
| 50 | +│ ├── structured_event.py # Structured events |
| 51 | +│ └── self_describing.py # Custom schema events |
| 52 | +├── Emission Layer |
| 53 | +│ ├── emitters.py # Sync/Async event transmission |
| 54 | +│ ├── event_store.py # Event buffering and persistence |
| 55 | +│ └── payload.py # Event payload construction |
| 56 | +├── Configuration |
| 57 | +│ ├── tracker_configuration.py |
| 58 | +│ └── emitter_configuration.py |
| 59 | +└── Validation |
| 60 | + ├── contracts.py # Runtime validation |
| 61 | + └── typing.py # Type definitions |
| 62 | +``` |
| 63 | + |
| 64 | +## Core Architectural Principles |
| 65 | + |
| 66 | +1. **Schema-First Design**: All events conform to Iglu schemas for consistency |
| 67 | +2. **Separation of Concerns**: Event creation, validation, and emission are separate |
| 68 | +3. **Configuration Objects**: Use dedicated configuration classes, not raw dictionaries |
| 69 | +4. **Type Safety**: Extensive use of type hints and Protocol classes |
| 70 | +5. **Fail-Safe Delivery**: Events are buffered and retried on failure |
| 71 | +6. **Immutability**: Event objects are largely immutable after creation |
| 72 | + |
| 73 | +## Layer Organization & Responsibilities |
| 74 | + |
| 75 | +### Application Layer (snowplow.py) |
| 76 | +- Singleton pattern for global tracker management |
| 77 | +- Factory methods for tracker creation |
| 78 | +- Namespace-based tracker registry |
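
The namespace-based registry described above can be sketched in plain Python. This is a minimal illustration, not the library's code: `TrackerRegistry` and its methods are hypothetical names, and the trackers are stored as opaque objects.

```python
from typing import Dict, Optional


class TrackerRegistry:
    """Hypothetical stand-in for a namespace-keyed tracker registry."""

    _trackers: Dict[str, object] = {}

    @classmethod
    def add(cls, namespace: str, tracker: object) -> None:
        # Refuse duplicates so one namespace always maps to one tracker.
        if namespace in cls._trackers:
            raise ValueError(f"tracker '{namespace}' already registered")
        cls._trackers[namespace] = tracker

    @classmethod
    def get(cls, namespace: str = "default") -> Optional[object]:
        # Unknown namespaces return None instead of raising.
        return cls._trackers.get(namespace)

    @classmethod
    def remove(cls, namespace: str) -> None:
        cls._trackers.pop(namespace, None)

    @classmethod
    def reset(cls) -> None:
        # Useful between tests, since the registry is process-wide state.
        cls._trackers = {}
```

Because the registry is class-level state, tests that create trackers would likely need to call something like `reset()` in teardown to avoid leaking namespaces between cases.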
| 79 | + |
| 80 | +### Domain Layer (tracker.py, events/) |
| 81 | +- Event creation and validation |
| 82 | +- Subject (user/device) context management |
| 83 | +- Event enrichment with standard fields |
| 84 | + |
| 85 | +### Infrastructure Layer (emitters.py, event_store.py) |
| 86 | +- HTTP communication with collectors |
| 87 | +- Event buffering and retry logic |
| 88 | +- Async/sync emission strategies |
| 89 | + |
| 90 | +### Cross-Cutting (contracts.py, typing.py) |
| 91 | +- Runtime validation with togglable contracts |
| 92 | +- Shared type definitions and protocols |
| 93 | + |
| 94 | +## Critical Import Patterns |
| 95 | + |
| 96 | +```python |
| 97 | +# ✅ Import from package root for public API |
| 98 | +from snowplow_tracker import Snowplow, Tracker, Subject |
| 99 | +from snowplow_tracker import EmitterConfiguration, TrackerConfiguration |
| 100 | + |
| 101 | +# ✅ Import specific event classes |
| 102 | +from snowplow_tracker.events import PageView, StructuredEvent |
| 103 | + |
| 104 | +# ❌ Don't import from internal modules |
| 105 | +from snowplow_tracker.emitters import Requester # Internal class |
| 106 | + |
| 107 | +# ✅ Use typing module for type hints |
| 108 | +from snowplow_tracker.typing import PayloadDict, Method |
| 109 | +``` |
| 110 | + |
| 111 | +## Essential Library Patterns |
| 112 | + |
| 113 | +### Tracker Initialization Pattern |
| 114 | +```python |
| 115 | +# ✅ Use Snowplow factory with configuration objects |
| 116 | +tracker = Snowplow.create_tracker( |
| 117 | + namespace="my_app", |
| 118 | + endpoint="https://collector.example.com", |
| 119 | + tracker_config=TrackerConfiguration(encode_base64=True), |
| 120 | + emitter_config=EmitterConfiguration(batch_size=10) |
| 121 | +) |
| 122 | + |
| 123 | +# ❌ Don't instantiate Tracker directly without Snowplow |
| 124 | +tracker = Tracker("namespace", emitter) # Missing registration |
| 125 | +``` |
| 126 | + |
| 127 | +### Event Creation Pattern |
| 128 | +```python |
| 129 | +# ✅ Use event classes with named parameters |
| 130 | +page_view = PageView( |
| 131 | + page_url="https://example.com", |
| 132 | + page_title="Homepage" |
| 133 | +) |
| 134 | + |
| 135 | +# ✅ Add contexts to events |
| 136 | +page_view.context = [SelfDescribingJson(schema, data)] |
| 137 | + |
| 138 | +# ❌ Don't modify event payload directly |
| 139 | +page_view.payload.add("custom", "value") # Breaks schema validation |
| 140 | +``` |
| 141 | + |
| 142 | +### Subject Management Pattern |
| 143 | +```python |
| 144 | +# ✅ Set subject at tracker or event level |
| 145 | +subject = Subject() |
| 146 | +subject.set_user_id("user123") |
| 147 | +tracker = Snowplow.create_tracker(..., subject=subject) |
| 148 | + |
| 149 | +# ✅ Override subject per event |
| 150 | +event = PageView(..., event_subject=Subject()) |
| 151 | + |
| 152 | +# ❌ Don't modify subject after tracker creation |
| 153 | +tracker.subject.set_user_id("new_id") # Not thread-safe |
| 154 | +``` |
| 155 | + |
| 156 | +### Emitter Configuration Pattern |
| 157 | +```python |
| 158 | +# ✅ Configure retry and buffering behavior |
| 159 | +config = EmitterConfiguration( |
| 160 | + batch_size=50, |
| 161 | + buffer_capacity=10000, |
| 162 | + custom_retry_codes={429: True, 500: True} |
| 163 | +) |
| 164 | + |
| 165 | +# ❌ Don't use magic numbers |
| 166 | +emitter = Emitter(endpoint, 443, "post", 100) # Use config object |
| 167 | +``` |
| 168 | + |
| 169 | +## Model Organization Pattern |
| 170 | + |
| 171 | +### Event Hierarchy |
| 172 | +``` |
| 173 | +Event (base class) |
| 174 | +├── PageView # Web page views |
| 175 | +├── PagePing # Page engagement tracking |
| 176 | +├── ScreenView # Mobile screen views |
| 177 | +├── StructuredEvent # Category/action/label/property/value events |
| 178 | +└── SelfDescribing # Custom schema events |
| 179 | +``` |
| 180 | + |
| 181 | +### Data Structures |
| 182 | +```python |
| 183 | +# SelfDescribingJson for custom contexts |
| 184 | +context = SelfDescribingJson( |
| 185 | + "iglu:com.example/context/jsonschema/1-0-0", |
| 186 | + {"key": "value"} |
| 187 | +) |
| 188 | + |
| 189 | +# Payload for event data assembly |
| 190 | +payload = Payload() |
| 191 | +payload.add("e", "pv") # Event type |
| 192 | +payload.add_dict({"aid": "app_id"}) |
| 193 | +``` |
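
For reference, a self-describing JSON is a schema/data envelope. The sketch below builds that envelope with plain dicts and rejects schema strings that lack the `vendor/name/format/version` parts. The function name `self_describing` and the regular expression are assumptions made for illustration; the pattern is a simplified approximation of the Iglu URI shape, not the official one.

```python
import re
from typing import Any, Dict

# Assumed, simplified Iglu URI shape: iglu:vendor/name/format/MODEL-REVISION-ADDITION
IGLU_URI = re.compile(
    r"^iglu:[a-zA-Z0-9_.-]+/[a-zA-Z0-9_-]+/[a-zA-Z0-9_-]+/\d+-\d+-\d+$"
)


def self_describing(schema: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Build a schema/data envelope, rejecting malformed schema URIs."""
    if not IGLU_URI.match(schema):
        raise ValueError(f"not an Iglu schema URI: {schema!r}")
    return {"schema": schema, "data": data}


context = self_describing(
    "iglu:com.example/context/jsonschema/1-0-0",
    {"key": "value"},
)
```

A check like this catches strings such as `iglu:com.snowplow/event/1-0-0`, which omit the format segment, before they reach a collector.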
| 194 | + |
| 195 | +## Common Pitfalls & Solutions |
| 196 | + |
| 197 | +### Contract Validation |
| 198 | +```python |
| 199 | +# ❌ Passing invalid parameters silently fails in production |
| 200 | +tracker.track_page_view("") # Empty URL |
| 201 | + |
| 202 | +# ✅ Enable contracts during development |
| 203 | +from snowplow_tracker import enable_contracts |
| 204 | +enable_contracts() |
| 205 | +``` |
| 206 | + |
| 207 | +### Event Buffering |
| 208 | +```python |
| 209 | +# ❌ Not flushing events before shutdown |
| 210 | +tracker.track(event) |
| 211 | +sys.exit() # Events lost! |
| 212 | + |
| 213 | +# ✅ Always flush before exit |
| 214 | +tracker.track(event) |
| 215 | +tracker.flush() |
| 216 | +``` |
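
One way to make the flush hard to forget is to register it with `atexit`. The sketch below uses a hypothetical `BufferingTracker` stand-in so it runs without a collector; with the real tracker, the registered callable would presumably be `tracker.flush`.

```python
import atexit
from typing import List


class BufferingTracker:
    """Hypothetical stand-in: buffers events until flush() is called."""

    def __init__(self) -> None:
        self.buffer: List[dict] = []
        self.sent: List[dict] = []

    def track(self, event: dict) -> None:
        self.buffer.append(event)

    def flush(self) -> None:
        # Move everything buffered to "sent"; a real emitter would do I/O here.
        self.sent.extend(self.buffer)
        self.buffer.clear()


tracker = BufferingTracker()
# Registered callbacks run on normal interpreter exit, including sys.exit().
atexit.register(tracker.flush)
tracker.track({"e": "pv"})
```

Note that `atexit` handlers do not run after `os._exit()` or when the process is killed by an unhandled signal, so an explicit `flush()` in shutdown code is still worth keeping.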
| 217 | + |
| 218 | +### Thread Safety |
| 219 | +```python |
| 220 | +# ❌ Sharing emitter across threads |
| 221 | +emitter = Emitter(endpoint) |
| 222 | +# Multiple threads using same emitter |
| 223 | + |
| 224 | +# ✅ Use AsyncEmitter for concurrent scenarios |
| 225 | +emitter = AsyncEmitter(endpoint, thread_count=2) |
| 226 | +``` |
| 227 | + |
| 228 | +### Schema Validation |
| 229 | +```python |
| 230 | +# ❌ Hardcoding schema strings |
| 231 | +schema = "iglu:com.snowplow/event/1-0-0" |
| 232 | + |
| 233 | +# ✅ Use constants for schemas |
| 234 | +from snowplow_tracker.constants import CONTEXT_SCHEMA |
| 235 | +``` |
| 236 | + |
| 237 | +## File Structure Template |
| 238 | + |
| 239 | +``` |
| 240 | +project/ |
| 241 | +├── tracker_app.py # Application entry point |
| 242 | +├── config/ |
| 243 | +│ └── tracker_config.py # Tracker configuration |
| 244 | +├── events/ |
| 245 | +│ ├── __init__.py |
| 246 | +│ └── custom_events.py # Custom event definitions |
| 247 | +├── contexts/ |
| 248 | +│ └── custom_contexts.py # Custom context schemas |
| 249 | +└── tests/ |
| 250 | + ├── unit/ |
| 251 | + │ └── test_events.py |
| 252 | + └── integration/ |
| 253 | + └── test_emission.py |
| 254 | +``` |
| 255 | + |
| 256 | +## Testing Patterns |
| 257 | + |
| 258 | +### Unit Testing |
| 259 | +```python |
| 260 | +# ✅ Mock emitters for unit tests |
| 261 | +@mock.patch('snowplow_tracker.emitters.Emitter') |
| 262 | +def test_track_event(mock_emitter): |
| 263 | + tracker = Tracker("test", mock_emitter) |
| 264 | + tracker.track(PageView(...)) |
| 265 | + mock_emitter.input.assert_called_once() |
| 266 | +``` |
| 267 | + |
| 268 | +### Contract Testing |
| 269 | +```python |
| 270 | +# ✅ Use ContractsDisabled context manager |
| 271 | +with ContractsDisabled(): |
| 272 | + # Test invalid inputs without raising |
| 273 | + tracker.track_page_view(None) |
| 274 | +``` |
| 275 | + |
| 276 | +### Integration Testing |
| 277 | +```python |
| 278 | +# ✅ Test against mock collector |
| 279 | +def test_event_delivery(): |
| 280 | + with requests_mock.Mocker() as m: |
| 281 | + m.post("https://collector.test/com.snowplow/tp2") |
| 282 | + # Track and verify delivery |
| 283 | +``` |
| 284 | + |
| 285 | +## Configuration Best Practices |
| 286 | + |
| 287 | +### Environment-Based Configuration |
| 288 | +```python |
| 289 | +# ✅ Use environment variables |
| 290 | +import os |
| 291 | +endpoint = os.getenv("SNOWPLOW_COLLECTOR_URL") |
| 292 | +namespace = os.getenv("SNOWPLOW_NAMESPACE", "default") |
| 293 | +``` |
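
`os.getenv` returns `None` when a variable is unset, so the snippet above can hand a missing endpoint to the tracker. A minimal sketch of failing fast instead, assuming the same two variable names; `CollectorSettings` and `load_settings` are hypothetical helpers, not part of the library.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CollectorSettings:
    endpoint: str
    namespace: str


def load_settings(env: Mapping[str, str] = os.environ) -> CollectorSettings:
    """Read settings from the environment, failing fast on a missing URL."""
    endpoint = env.get("SNOWPLOW_COLLECTOR_URL", "").strip()
    if not endpoint:
        raise RuntimeError("SNOWPLOW_COLLECTOR_URL is not set")
    return CollectorSettings(
        endpoint=endpoint,
        namespace=env.get("SNOWPLOW_NAMESPACE", "default"),
    )
```

Accepting the environment as a mapping keeps the loader testable: a plain dict can be passed in instead of patching `os.environ`.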
| 294 | + |
| 295 | +### Retry Configuration |
| 296 | +```python |
| 297 | +# ✅ Configure intelligent retry behavior |
| 298 | +EmitterConfiguration( |
| 299 | + max_retry_delay_seconds=120, |
| 300 | + custom_retry_codes={ |
| 301 | + 429: True, # Retry rate limits |
| 302 | + 500: True, # Retry server errors |
| 303 | + 400: False # Don't retry bad requests |
| 304 | + } |
| 305 | +) |
| 306 | +``` |
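
To make the semantics of a retry-code map concrete, here is a sketch of how such a map could be consulted. It is not the emitter's implementation: the default no-retry set, the doubling backoff, and both function names are assumptions chosen for illustration.

```python
from typing import Dict, Optional

# Assumption: codes treated as non-retryable when no override is given.
DEFAULT_NO_RETRY = frozenset({400, 401, 403, 410, 422})


def should_retry(status: int, custom: Optional[Dict[int, bool]] = None) -> bool:
    """Decide whether a failed request is worth retrying."""
    if 200 <= status < 300:
        return False  # Success, nothing to retry.
    if custom and status in custom:
        return custom[status]  # Explicit override wins over the defaults.
    return status not in DEFAULT_NO_RETRY


def backoff_delay(attempt: int, max_delay_seconds: float = 120.0) -> float:
    """Exponential backoff (1s, 2s, 4s, ...) capped at max_delay_seconds."""
    return min(2.0 ** attempt, max_delay_seconds)
```

Under these assumptions, `{400: False}` is redundant with the defaults but documents intent, while `{429: True}` only matters if the defaults would otherwise skip that code.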
| 307 | + |
| 308 | +## Quick Reference |
| 309 | + |
| 310 | +### Import Checklist |
| 311 | +- [ ] Import from `snowplow_tracker` package root |
| 312 | +- [ ] Use `EmitterConfiguration` and `TrackerConfiguration` |
| 313 | +- [ ] Import specific event classes from `snowplow_tracker.events` |
| 314 | +- [ ] Use type hints from `snowplow_tracker.typing` |
| 315 | + |
| 316 | +### Event Tracking Checklist |
| 317 | +- [ ] Create tracker with `Snowplow.create_tracker()` |
| 318 | +- [ ] Configure emitter with appropriate batch size |
| 319 | +- [ ] Set subject context if tracking users |
| 320 | +- [ ] Use appropriate event class for the use case |
| 321 | +- [ ] Add custom contexts as `SelfDescribingJson` |
| 322 | +- [ ] Call `flush()` before application shutdown |
| 323 | +- [ ] Handle failures with callbacks |
| 324 | + |
| 325 | +### Common Event Types |
| 326 | +- `PageView`: Web page views |
| 327 | +- `ScreenView`: Mobile app screens |
| 328 | +- `StructuredEvent`: Generic events with category, action, label, property, and value |
| 329 | +- `SelfDescribing`: Custom schema events |
| 330 | +- `PagePing`: Engagement tracking |
| 331 | + |
| 332 | +## Contributing to CLAUDE.md |
| 333 | + |
| 334 | +When adding or updating content in this document, please follow these guidelines: |
| 335 | + |
| 336 | +### File Size Limit |
| 337 | +- **CLAUDE.md must not exceed 40KB** (currently ~19KB) |
| 338 | +- Check file size after updates: `wc -c CLAUDE.md` |
| 339 | +- Remove outdated content if approaching the limit |
| 340 | + |
| 341 | +### Code Examples |
| 342 | +- Keep all code examples **4 lines or fewer** |
| 343 | +- Focus on the essential pattern, not complete implementations |
| 344 | +- Use `# ❌` and `# ✅` to clearly show wrong vs right approaches |
| 345 | + |
| 346 | +### Content Organization |
| 347 | +- Add new patterns to existing sections when possible |
| 348 | +- Create new sections sparingly to maintain structure |
| 349 | +- Update the architectural principles section for major changes |
| 350 | +- Ensure examples follow current codebase conventions |
| 351 | + |
| 352 | +### Quality Standards |
| 353 | +- Test any new patterns in actual code before documenting |
| 354 | +- Verify imports and syntax are correct for the codebase |
| 355 | +- Keep language concise and actionable |
| 356 | +- Focus on "what" and "how", minimize "why" explanations |
| 357 | + |
| 358 | +### Multiple CLAUDE.md Files |
| 359 | +- **Directory-specific CLAUDE.md files** can be created for specialized modules |
| 360 | +- Follow the same structure and guidelines as this root CLAUDE.md |
| 361 | +- Keep them focused on directory-specific patterns and conventions |
| 362 | +- Maximum 20KB per directory-specific CLAUDE.md file |
| 363 | + |
| 364 | +### Instructions for LLMs |
| 365 | +When editing files in this repository, **always check for CLAUDE.md guidance**: |
| 366 | + |
| 367 | +1. **Look for CLAUDE.md in the same directory** as the file being edited |
| 368 | +2. **If not found, check parent directories** recursively up to project root |
| 369 | +3. **Follow the patterns and conventions** described in the applicable CLAUDE.md |
| 370 | +4. **Prioritize directory-specific guidance** over root-level guidance when conflicts exist |
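
The lookup order in steps 1 and 2 can be sketched as a walk up the directory tree. `find_guidance` is a hypothetical helper written for illustration; it returns the nearest `CLAUDE.md`, which also gives directory-specific files priority over the root one.

```python
from pathlib import Path
from typing import Optional


def find_guidance(edited_file: Path, project_root: Path) -> Optional[Path]:
    """Return the nearest CLAUDE.md at or above edited_file's directory."""
    root = project_root.resolve()
    current = edited_file.resolve().parent
    while True:
        candidate = current / "CLAUDE.md"
        if candidate.is_file():
            return candidate
        # Stop at the project root, or at the filesystem root as a safety net.
        if current == root or current == current.parent:
            return None
        current = current.parent
```

For a file under `snowplow_tracker/events/`, this would check that directory first, then `snowplow_tracker/`, then the project root.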