Commit d9c261c

Merge pull request #378 from snowplow/feat/ai-claude
claude mds instrumentation
2 parents 12b7391 + e8f2629 commit d9c261c

3 files changed (+1019 -0 lines changed)

CLAUDE.md

Lines changed: 370 additions & 0 deletions

# Snowplow Python Tracker - CLAUDE.md

## Project Overview

The Snowplow Python Tracker is a public Python library for sending analytics events to Snowplow collectors. It lets developers integrate Snowplow analytics into Python applications, games, and web servers. The library provides event tracking with support for multiple event types, custom contexts, and reliable delivery through configurable emitters.

**Key Technologies:**
- Python 3.8+ (supported versions: 3.8-3.13)
- `requests` library for HTTP communication
- `typing_extensions` for enhanced type hints
- Event-driven architecture with schema validation
- Asynchronous and synchronous event emission

## Development Commands

```bash
# Install dependencies
pip install -r requirements-test.txt

# Run tests
./run-tests.sh

# Run specific test module
python -m pytest snowplow_tracker/test/unit/test_tracker.py

# Run integration tests
python -m pytest snowplow_tracker/test/integration/

# Install package in development mode
pip install -e .

# Build Docker image for testing
docker build -t snowplow-python-tracker .
docker run snowplow-python-tracker
```

## Architecture

The tracker follows a layered architecture with clear separation of concerns:

```
snowplow_tracker/
├── Core Components
│   ├── tracker.py              # Main Tracker class orchestrating events
│   ├── snowplow.py             # High-level API for tracker management
│   └── subject.py              # User/device context management
├── Event Layer (events/)
│   ├── event.py                # Base Event class
│   ├── page_view.py            # PageView event
│   ├── structured_event.py     # Structured events
│   └── self_describing.py      # Custom schema events
├── Emission Layer
│   ├── emitters.py             # Sync/Async event transmission
│   ├── event_store.py          # Event buffering and persistence
│   └── payload.py              # Event payload construction
├── Configuration
│   ├── tracker_configuration.py
│   └── emitter_configuration.py
└── Validation
    ├── contracts.py            # Runtime validation
    └── typing.py               # Type definitions
```
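
The layering above can be sketched in plain Python to show how an event flows from the event layer through the tracker to a buffering emitter. This is a hypothetical, stdlib-only model, not the library's code: `SketchEvent`, `SketchEmitter`, `SketchTracker` and the `send` callable are illustrative names, and the rule "send once `batch_size` payloads are buffered" is an assumption chosen to mirror the `batch_size` option of `EmitterConfiguration`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

PayloadDict = Dict[str, str]


@dataclass
class SketchEvent:
    """Event layer (hypothetical): knows only its own protocol fields."""
    event_type: str
    fields: Dict[str, str] = field(default_factory=dict)

    def build_payload(self) -> PayloadDict:
        payload = {"e": self.event_type}
        payload.update(self.fields)
        return payload


@dataclass
class SketchEmitter:
    """Emission layer (hypothetical): buffers payloads, hands full batches to `send`."""
    send: Callable[[List[PayloadDict]], None]
    batch_size: int = 10
    buffer: List[PayloadDict] = field(default_factory=list)

    def input(self, payload: PayloadDict) -> None:
        self.buffer.append(payload)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send(batch)


class SketchTracker:
    """Domain layer (hypothetical): enriches each payload, then passes it on."""

    def __init__(self, namespace: str, emitter: SketchEmitter) -> None:
        self.namespace = namespace
        self.emitter = emitter

    def track(self, event: SketchEvent) -> None:
        payload = event.build_payload()
        payload["tna"] = self.namespace  # tracker namespace field
        self.emitter.input(payload)


# Usage: the "network" is just a list that records each sent batch.
sent: List[List[PayloadDict]] = []
tracker = SketchTracker("my_app", SketchEmitter(send=sent.append, batch_size=2))
tracker.track(SketchEvent("pv", {"url": "https://example.com"}))
tracker.track(SketchEvent("pv", {"url": "https://example.com/about"}))
tracker.track(SketchEvent("se", {"se_ca": "shop", "se_ac": "add"}))
tracker.emitter.flush()
```

Keeping the event unaware of batching and the emitter unaware of event types is what lets each layer change independently.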

## Core Architectural Principles

1. **Schema-First Design**: All events conform to Iglu schemas for consistency
2. **Separation of Concerns**: Event creation, validation, and emission are separate
3. **Configuration Objects**: Use dedicated configuration classes, not raw dictionaries
4. **Type Safety**: Extensive use of type hints and Protocol classes
5. **Fail-Safe Delivery**: Events are buffered and retried on failure
6. **Immutability**: Event objects are largely immutable after creation

## Layer Organization & Responsibilities

### Application Layer (snowplow.py)
- Singleton pattern for global tracker management
- Factory methods for tracker creation
- Namespace-based tracker registry

### Domain Layer (tracker.py, events/)
- Event creation and validation
- Subject (user/device) context management
- Event enrichment with standard fields

### Infrastructure Layer (emitters.py, event_store.py)
- HTTP communication with collectors
- Event buffering and retry logic
- Async/sync emission strategies

### Cross-Cutting (contracts.py, typing.py)
- Runtime validation with togglable contracts
- Shared type definitions and protocols
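
The namespace-based registry in the Application Layer can be pictured as one class-level dictionary keyed by namespace. The sketch below is a hypothetical stand-in: `TrackerRegistry`, `RegisteredTracker` and their methods are illustrative names, not the `Snowplow` class's actual API, and it assumes that registering a second tracker under an existing namespace should be an error.

```python
from typing import Dict


class RegisteredTracker:
    """Placeholder for a tracker instance (hypothetical)."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace


class TrackerRegistry:
    """Process-global registry: a single class-level dict keyed by namespace."""

    _trackers: Dict[str, RegisteredTracker] = {}

    @classmethod
    def create_tracker(cls, namespace: str) -> RegisteredTracker:
        if namespace in cls._trackers:
            raise ValueError(f"Tracker with namespace {namespace!r} already exists")
        tracker = RegisteredTracker(namespace)
        cls._trackers[namespace] = tracker
        return tracker

    @classmethod
    def get_tracker(cls, namespace: str) -> RegisteredTracker:
        return cls._trackers[namespace]

    @classmethod
    def remove_tracker(cls, namespace: str) -> None:
        cls._trackers.pop(namespace, None)

    @classmethod
    def reset(cls) -> None:
        cls._trackers.clear()


web_tracker = TrackerRegistry.create_tracker("web")
```

Because the dictionary lives on the class, any module can look a tracker up by namespace without the instance being passed around.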

## Critical Import Patterns

```python
# ✅ Import from package root for public API
from snowplow_tracker import Snowplow, Tracker, Subject
from snowplow_tracker import EmitterConfiguration, TrackerConfiguration

# ✅ Import specific event classes
from snowplow_tracker.events import PageView, StructuredEvent

# ❌ Don't import from internal modules
from snowplow_tracker.emitters import Requester  # Internal class

# ✅ Use the library's own typing module for its type aliases
from snowplow_tracker.typing import PayloadDict, Method
```

## Essential Library Patterns

### Tracker Initialization Pattern
```python
# ✅ Use Snowplow factory with configuration objects
tracker = Snowplow.create_tracker(
    namespace="my_app",
    endpoint="https://collector.example.com",
    tracker_config=TrackerConfiguration(encode_base64=True),
    emitter_config=EmitterConfiguration(batch_size=10),
)

# ❌ Don't instantiate Tracker directly without Snowplow
tracker = Tracker("namespace", emitter)  # Missing registration
```

### Event Creation Pattern
```python
# ✅ Use event classes with named parameters
page_view = PageView(
    page_url="https://example.com",
    page_title="Homepage",
)

# ✅ Add contexts to events
page_view.context = [SelfDescribingJson(schema, data)]

# ❌ Don't modify event payload directly
page_view.payload.add("custom", "value")  # Breaks schema validation
```
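
Attached contexts travel in the event payload wrapped in one outer self-describing JSON. A minimal stdlib sketch, assuming the Snowplow tracker-protocol convention that contexts go in the `cx` field (URL-safe base64) when base64 encoding is on and in `co` (plain JSON) otherwise, and assuming the wrapper schema `iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1`. `self_describing` and `build_contexts_field` are hypothetical helpers, not library functions.

```python
import base64
import json
from typing import Any, Dict, List, Tuple

# Assumed wrapper schema for the list of attached contexts.
CONTEXTS_SCHEMA = "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1"


def self_describing(schema: str, data: Any) -> Dict[str, Any]:
    """Shape of a self-describing JSON: a schema URI plus the data it describes."""
    return {"schema": schema, "data": data}


def build_contexts_field(contexts: List[Dict[str, Any]], encode_base64: bool) -> Tuple[str, str]:
    """Return the (field name, value) pair a payload would carry for `contexts`."""
    envelope = json.dumps(self_describing(CONTEXTS_SCHEMA, contexts), separators=(",", ":"))
    if encode_base64:
        encoded = base64.urlsafe_b64encode(envelope.encode("utf-8")).decode("ascii")
        return "cx", encoded
    return "co", envelope


ctx = self_describing("iglu:com.example/context/jsonschema/1-0-0", {"key": "value"})
name, value = build_contexts_field([ctx], encode_base64=True)
```

The `encode_base64` flag here plays the role that `TrackerConfiguration(encode_base64=True)` plays in the tracker setup.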

### Subject Management Pattern
```python
# ✅ Set subject at tracker or event level
subject = Subject()
subject.set_user_id("user123")
tracker = Snowplow.create_tracker(..., subject=subject)

# ✅ Override subject per event
event = PageView(..., event_subject=Subject())

# ❌ Don't modify subject after tracker creation
tracker.subject.set_user_id("new_id")  # Not thread-safe
```
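
One way to picture a per-event subject overriding the tracker-level subject is a field-by-field merge in which the event's values win. That precedence is an assumption here, written as a hypothetical stdlib sketch (`SketchSubject` and `combine` are illustrative names; `uid` and `lang` are tracker-protocol field names). It also shows why producing a new subject instead of mutating a shared one sidesteps the thread-safety problem above.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class SketchSubject:
    """Immutable bag of subject fields keyed by protocol names such as `uid`."""
    fields: Dict[str, str] = field(default_factory=dict)

    def with_field(self, name: str, value: str) -> "SketchSubject":
        # Return a new subject rather than mutating one other threads may share.
        return replace(self, fields={**self.fields, name: value})


def combine(tracker_subject: SketchSubject, event_subject: Optional[SketchSubject]) -> Dict[str, str]:
    """Event-level fields override tracker-level ones (assumed precedence)."""
    merged = dict(tracker_subject.fields)
    if event_subject is not None:
        merged.update(event_subject.fields)
    return merged


base = SketchSubject().with_field("uid", "user123").with_field("lang", "en")
override = SketchSubject().with_field("uid", "guest")
```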

### Emitter Configuration Pattern
```python
# ✅ Configure retry and buffering behavior
config = EmitterConfiguration(
    batch_size=50,
    buffer_capacity=10000,
    custom_retry_codes={429: True, 500: True},
)

# ❌ Don't use magic numbers
emitter = Emitter(endpoint, 443, "post", 100)  # Use config object
```

## Model Organization Pattern

### Event Hierarchy
```
Event (base class)
├── PageView          # Web page views
├── PagePing          # Page engagement tracking
├── ScreenView        # Mobile screen views
├── StructuredEvent   # Category/action/label/property/value events
└── SelfDescribing    # Custom schema events
```
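
A hypothetical stdlib mirror of this hierarchy, showing how each subclass could contribute its own tracker-protocol fields to a shared base that builds the payload. The field names (`e=pv` with `url` and `page` for page views; `e=se` with `se_ca`, `se_ac`, `se_la`, `se_pr`, `se_va` for structured events) follow the Snowplow tracker protocol as we understand it; the classes themselves are illustrative, not the library's.

```python
from typing import Dict, Optional


class EventSketch:
    """Base class: owns the event-type code and collects subclass fields."""
    event_type = ""

    def fields(self) -> Dict[str, Optional[str]]:
        return {}

    def build_payload(self) -> Dict[str, str]:
        payload = {"e": self.event_type}
        # Drop unset optional fields so they never reach the collector.
        payload.update({k: v for k, v in self.fields().items() if v is not None})
        return payload


class PageViewSketch(EventSketch):
    event_type = "pv"

    def __init__(self, page_url: str, page_title: Optional[str] = None) -> None:
        self.page_url = page_url
        self.page_title = page_title

    def fields(self) -> Dict[str, Optional[str]]:
        return {"url": self.page_url, "page": self.page_title}


class StructuredEventSketch(EventSketch):
    event_type = "se"

    def __init__(self, category: str, action: str, label: Optional[str] = None,
                 property_: Optional[str] = None, value: Optional[float] = None) -> None:
        self.category, self.action = category, action
        self.label, self.property_, self.value = label, property_, value

    def fields(self) -> Dict[str, Optional[str]]:
        return {
            "se_ca": self.category,
            "se_ac": self.action,
            "se_la": self.label,
            "se_pr": self.property_,
            "se_va": None if self.value is None else str(self.value),
        }
```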

### Data Structures
```python
# SelfDescribingJson for custom contexts
context = SelfDescribingJson(
    "iglu:com.example/context/jsonschema/1-0-0",
    {"key": "value"},
)

# Payload for event data assembly
payload = Payload()
payload.add("e", "pv")  # Event type
payload.add_dict({"aid": "app_id"})
```
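
The `Payload` calls above can be modelled as a thin wrapper over a dict of name-value pairs. A hypothetical sketch (`PayloadSketch` is an illustrative name); it assumes, as we believe the real class does, that `None` and empty-string values are skipped rather than stored, which keeps unset optional fields out of the request.

```python
from typing import Any, Dict


class PayloadSketch:
    """Name-value pairs destined for the collector (hypothetical mirror of Payload)."""

    def __init__(self) -> None:
        self.nv_pairs: Dict[str, Any] = {}

    def add(self, name: str, value: Any) -> None:
        # Skip unset values so optional fields stay out of the request.
        if value is not None and value != "":
            self.nv_pairs[name] = value

    def add_dict(self, dict_: Dict[str, Any]) -> None:
        for name, value in dict_.items():
            self.add(name, value)

    def get(self) -> Dict[str, Any]:
        return dict(self.nv_pairs)


payload = PayloadSketch()
payload.add("e", "pv")  # event type: page view
payload.add_dict({"aid": "app_id", "page": None, "refr": ""})
```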

## Common Pitfalls & Solutions

### Contract Validation
```python
# ❌ Passing invalid parameters silently fails in production
tracker.track_page_view("")  # Empty URL

# ✅ Enable contracts during development
from snowplow_tracker import enable_contracts
enable_contracts()
```

### Event Buffering
```python
import sys

# ❌ Not flushing events before shutdown
tracker.track(event)
sys.exit()  # Events lost!

# ✅ Always flush before exit
tracker.track(event)
tracker.flush()
```
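
If an application has several exit paths, one option is to register the flush with Python's standard `atexit` module, which runs on normal interpreter shutdown (including `sys.exit()`, but not `os._exit()` or a hard kill). The sketch uses a hypothetical `BufferingTracker` stand-in so it runs without the library; with the real tracker the equivalent would be `atexit.register(tracker.flush)`, which is a suggestion rather than something this guide prescribes.

```python
import atexit
from typing import List


class BufferingTracker:
    """Stand-in tracker (hypothetical): events sit in memory until flush() delivers them."""

    def __init__(self) -> None:
        self.pending: List[str] = []
        self.delivered: List[str] = []

    def track(self, event: str) -> None:
        self.pending.append(event)

    def flush(self) -> None:
        self.delivered.extend(self.pending)
        self.pending.clear()


tracker = BufferingTracker()
# Runs at normal interpreter exit, so buffered events are not silently dropped.
atexit.register(tracker.flush)
tracker.track("page_view")
```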

### Thread Safety
```python
# ❌ Sharing emitter across threads
emitter = Emitter(endpoint)
# Multiple threads using same emitter

# ✅ Use AsyncEmitter for concurrent scenarios
emitter = AsyncEmitter(endpoint, thread_count=2)
```
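
The idea behind an asynchronous emitter can be sketched with a thread-safe `queue.Queue` drained by a fixed pool of worker threads, so callers on any thread only enqueue. This is a hypothetical model of the concept (`QueueEmitterSketch` and its `send` callable are illustrative names), not `AsyncEmitter`'s implementation.

```python
import queue
import threading
from typing import Callable, List, Optional


class QueueEmitterSketch:
    """Callers enqueue payloads; `thread_count` workers send them in the background."""

    def __init__(self, send: Callable[[str], None], thread_count: int = 2) -> None:
        self.send = send
        self.queue: "queue.Queue[Optional[str]]" = queue.Queue()
        self.workers = [threading.Thread(target=self._work, daemon=True)
                        for _ in range(thread_count)]
        for worker in self.workers:
            worker.start()

    def input(self, payload: str) -> None:
        self.queue.put(payload)  # Queue does its own locking

    def _work(self) -> None:
        while True:
            payload = self.queue.get()
            try:
                if payload is None:  # sentinel: stop this worker
                    return
                self.send(payload)
            finally:
                self.queue.task_done()

    def close(self) -> None:
        """Wait until queued payloads are sent, then stop every worker."""
        self.queue.join()
        for _ in self.workers:
            self.queue.put(None)
        for worker in self.workers:
            worker.join()


sent: List[str] = []
lock = threading.Lock()


def record(payload: str) -> None:
    with lock:  # the send callable itself must be thread-safe
        sent.append(payload)


emitter = QueueEmitterSketch(record, thread_count=2)
producers = [threading.Thread(target=lambda i=i: emitter.input(f"event-{i}")) for i in range(10)]
for p in producers:
    p.start()
for p in producers:
    p.join()
emitter.close()
```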

### Schema Validation
```python
# ❌ Hardcoding schema strings
schema = "iglu:com.snowplow/event/1-0-0"

# ✅ Use constants for schemas
from snowplow_tracker.constants import CONTEXT_SCHEMA
```
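
A quick way to catch malformed schema strings, such as the hardcoded example above (which has no format segment), is to check them against the Iglu URI shape `iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>`. A hypothetical regex-based sketch; the character classes are a simplifying assumption, not Iglu's formal grammar.

```python
import re
from typing import NamedTuple, Optional

# vendor/name/format/SchemaVer; the character classes are a simplifying assumption.
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[a-zA-Z0-9_.-]+)/(?P<name>[a-zA-Z0-9_-]+)/"
    r"(?P<format>[a-zA-Z0-9_-]+)/(?P<version>[1-9][0-9]*-[0-9]+-[0-9]+)$"
)


class SchemaKey(NamedTuple):
    vendor: str
    name: str
    format: str
    version: str


def parse_iglu_uri(uri: str) -> Optional[SchemaKey]:
    """Return the URI's parts, or None if it does not look like an Iglu URI."""
    match = IGLU_URI.match(uri)
    if match is None:
        return None
    return SchemaKey(**match.groupdict())
```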

## File Structure Template

```
project/
├── tracker_app.py          # Application entry point
├── config/
│   └── tracker_config.py   # Tracker configuration
├── events/
│   ├── __init__.py
│   └── custom_events.py    # Custom event definitions
├── contexts/
│   └── custom_contexts.py  # Custom context schemas
└── tests/
    ├── unit/
    │   └── test_events.py
    └── integration/
        └── test_emission.py
```

## Testing Patterns

### Unit Testing
```python
from unittest import mock

from snowplow_tracker import Tracker
from snowplow_tracker.events import PageView

# ✅ Mock emitters for unit tests
@mock.patch("snowplow_tracker.emitters.Emitter")
def test_track_event(mock_emitter):
    tracker = Tracker("test", mock_emitter)
    tracker.track(PageView(page_url="https://example.com"))
    mock_emitter.input.assert_called_once()
```

### Contract Testing
```python
# ✅ Use ContractsDisabled context manager
with ContractsDisabled():
    # Test invalid inputs without raising
    tracker.track_page_view(None)
```

### Integration Testing
```python
import requests_mock

# ✅ Test against mock collector
def test_event_delivery():
    with requests_mock.Mocker() as m:
        m.post("https://collector.test/com.snowplowanalytics.snowplow/tp2")
        # Track and verify delivery
```

## Configuration Best Practices

### Environment-Based Configuration
```python
# ✅ Use environment variables
import os
endpoint = os.getenv("SNOWPLOW_COLLECTOR_URL")
namespace = os.getenv("SNOWPLOW_NAMESPACE", "default")
```
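
Building on the environment-variable approach, a small loader can fail fast when the collector URL is missing instead of passing `None` on to the tracker. `CollectorSettings`, `load_settings` and the choice to raise `RuntimeError` are assumptions for illustration; the variable names match the ones above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CollectorSettings:
    endpoint: str
    namespace: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> CollectorSettings:
    """Read tracker settings from the environment, failing fast on a missing endpoint."""
    source = os.environ if env is None else env
    endpoint = source.get("SNOWPLOW_COLLECTOR_URL")
    if not endpoint:
        raise RuntimeError("SNOWPLOW_COLLECTOR_URL is not set")
    return CollectorSettings(endpoint=endpoint, namespace=source.get("SNOWPLOW_NAMESPACE", "default"))
```

Passing a plain dict as `env` keeps the loader testable without touching the real process environment.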

### Retry Configuration
```python
# ✅ Configure retry behavior per HTTP status code
EmitterConfiguration(
    max_retry_delay_seconds=120,
    custom_retry_codes={
        429: True,   # Retry rate limits
        500: True,   # Retry server errors
        400: False,  # Don't retry bad requests
    },
)
```
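
How `custom_retry_codes` might interact with a default policy, and how `max_retry_delay_seconds` might cap backoff, can be sketched as two small functions. These are assumptions, not the library's code: the default retries any non-2xx status except 400, 401, 403, 410 and 422 (the set we understand Snowplow trackers to treat as non-retryable), and delays grow exponentially from a hypothetical 100 ms base up to the cap.

```python
from typing import Dict, Optional

# Assumed default: these failures are treated as permanent and never retried.
NON_RETRYABLE = frozenset({400, 401, 403, 410, 422})


def should_retry(status: int, custom_retry_codes: Optional[Dict[int, bool]] = None) -> bool:
    """Custom per-status overrides win; otherwise retry every non-2xx except NON_RETRYABLE."""
    if 200 <= status < 300:
        return False
    if custom_retry_codes and status in custom_retry_codes:
        return custom_retry_codes[status]
    return status not in NON_RETRYABLE


def retry_delay(attempt: int, max_retry_delay_seconds: float = 60.0, base_seconds: float = 0.1) -> float:
    """Exponential backoff (base_seconds * 2**attempt), capped at max_retry_delay_seconds."""
    return min(base_seconds * (2 ** attempt), max_retry_delay_seconds)
```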

## Quick Reference

### Import Checklist
- [ ] Import from `snowplow_tracker` package root
- [ ] Use `EmitterConfiguration` and `TrackerConfiguration`
- [ ] Import specific event classes from `snowplow_tracker.events`
- [ ] Use type hints from `snowplow_tracker.typing`

### Event Tracking Checklist
- [ ] Create tracker with `Snowplow.create_tracker()`
- [ ] Configure emitter with appropriate batch size
- [ ] Set subject context if tracking users
- [ ] Use appropriate event class for the use case
- [ ] Add custom contexts as `SelfDescribingJson`
- [ ] Call `flush()` before application shutdown
- [ ] Handle failures with callbacks (e.g. `on_failure` in `EmitterConfiguration`)

### Common Event Types
- `PageView`: Web page views
- `ScreenView`: Mobile app screens
- `StructuredEvent`: Generic events with 5 parameters (category, action, label, property, value)
- `SelfDescribing`: Custom schema events
- `PagePing`: Engagement tracking

## Contributing to CLAUDE.md

When adding or updating content in this document, please follow these guidelines:

### File Size Limit
- **CLAUDE.md must not exceed 40KB** (currently ~19KB)
- Check file size after updates: `wc -c CLAUDE.md`
- Remove outdated content if approaching the limit
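
The `wc -c` check can also be scripted, for example as a CI step. A minimal Python sketch, assuming 40KB means 40 × 1024 bytes (the guide does not say which convention it uses); `check_claude_md_size` is a hypothetical helper, and the 20KB limit for directory-specific files would be passed through `limit_bytes` the same way.

```python
from pathlib import Path

ROOT_LIMIT_BYTES = 40 * 1024  # assumed: 40KB read as 40 KiB


def check_claude_md_size(path: Path, limit_bytes: int = ROOT_LIMIT_BYTES) -> int:
    """Return the file size in bytes (what `wc -c` reports), raising if it exceeds the limit."""
    size = path.stat().st_size
    if size > limit_bytes:
        raise ValueError(f"{path} is {size} bytes; limit is {limit_bytes}")
    return size
```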

### Code Examples
- Keep all code examples **4 lines or fewer**
- Focus on the essential pattern, not complete implementations
- Use `# ❌` and `# ✅` to clearly show wrong vs right approaches

### Content Organization
- Add new patterns to existing sections when possible
- Create new sections sparingly to maintain structure
- Update the architectural principles section for major changes
- Ensure examples follow current codebase conventions

### Quality Standards
- Test any new patterns in actual code before documenting
- Verify imports and syntax are correct for the codebase
- Keep language concise and actionable
- Focus on "what" and "how", minimize "why" explanations

### Multiple CLAUDE.md Files
- **Directory-specific CLAUDE.md files** can be created for specialized modules
- Follow the same structure and guidelines as this root CLAUDE.md
- Keep them focused on directory-specific patterns and conventions
- Maximum 20KB per directory-specific CLAUDE.md file

### Instructions for LLMs
When editing files in this repository, **always check for CLAUDE.md guidance**:

1. **Look for CLAUDE.md in the same directory** as the file being edited
2. **If not found, check parent directories** recursively up to project root
3. **Follow the patterns and conventions** described in the applicable CLAUDE.md
4. **Prioritize directory-specific guidance** over root-level guidance when conflicts exist
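
The lookup in steps 1 and 2 amounts to walking from the edited file's directory up to the project root. A hypothetical `pathlib` sketch (`applicable_claude_mds` is an illustrative name) that collects every CLAUDE.md on the way, nearest first, so that per step 4 the most specific guidance comes first; a caller that only wants the single nearest file can take the first entry.

```python
from pathlib import Path
from typing import List


def applicable_claude_mds(edited_file: Path, project_root: Path) -> List[Path]:
    """CLAUDE.md files from the edited file's directory up to project_root, nearest first."""
    root = project_root.resolve()
    directory = edited_file.resolve().parent
    found: List[Path] = []
    while True:
        candidate = directory / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
        if directory == root or directory == directory.parent:
            break  # reached the project root (or the filesystem root)
        directory = directory.parent
    return found
```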
