A robust, production-ready sensor data simulation system for testing, development, and training!
This sensor log generator creates hyper-realistic environmental sensor data with configurable anomalies, making it perfect for:
- Testing data ingestion pipelines without expensive hardware
- Developing anomaly detection algorithms with controlled, reproducible scenarios
- Load testing time-series databases with concurrent sensor streams
- Training machine learning models on realistic sensor patterns
- Demonstrating monitoring systems with real-time dashboards
- Validating alerting systems with predictable anomalies
The simulator generates authentic sensor readings (temperature, humidity, pressure, voltage) with manufacturer-specific behaviors, firmware quirks, and location-based patterns, just like real IoT deployments!
Unlike basic data generators that produce random numbers, this simulator models real-world sensor complexity:
- Hardware variations: Different manufacturers have unique failure characteristics (SensorTech fails 20% more often!)
- Firmware behaviors: Beta versions are 50% more likely to produce anomalies
- Environmental patterns: Temperature, humidity, and pressure follow realistic daily cycles
- Realistic anomalies: Five different anomaly types (spikes, drifts, dropouts, noise, patterns) that mirror actual sensor failures
- Resilient operation: Automatic retry logic, corruption recovery, and graceful degradation
- Production-ready: HTTP monitoring API, health checks, and metrics endpoints for observability
Perfect for teams who need to test their systems against realistic sensor behavior without expensive hardware!
- Multi-metric readings: Temperature, humidity, pressure, and voltage with realistic patterns
- Manufacturer behaviors: Different brands exhibit unique failure rates and characteristics
- Firmware quirks: Beta versions produce more anomalies than stable releases
- Location intelligence: City-based simulation with real coordinates and timezones
Test your detection systems with five anomaly patterns:
- Spike anomalies: Sudden value jumps (sensor malfunctions)
- Trend anomalies: Gradual drift from baseline (sensor degradation)
- Pattern anomalies: Disrupted normal cycles (timing issues)
- Missing data: Simulated dropouts (network/power failures)
- Noise anomalies: Increased variance (environmental interference)
- Dynamic configuration: Hot-reload config changes without restart
- Resilient database: Automatic retries and corruption recovery
- HTTP monitoring: Real-time metrics and health check endpoints
- Semantic versioning: Multi-platform Docker images with proper versioning
- Checkpoint system: Resume from last state after restarts
- Container testing tools: Validate SQLite behavior across Docker boundaries
- Flexible identity: Support for legacy and enhanced sensor metadata
Validate your detection algorithms with controlled anomalies:
```yaml
# chaos-test.yaml
anomalies:
  enabled: true
  probability: 0.20  # 20% anomaly rate for stress testing
  types:
    spike:
      enabled: true
      weight: 0.5  # Test threshold-based detection
    trend:
      enabled: true
      weight: 0.3
      duration_seconds: 600  # Test drift detection
```
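To sanity-check a detector against this profile, a toy threshold check is enough to start; a minimal sketch (the table and column names follow the database schema documented later in this README):

```python
# Toy spike detector (a sketch): flags readings more than 3 standard
# deviations from the mean. Assumes the simulator has been writing to
# the default database path.
import sqlite3
import statistics

conn = sqlite3.connect("file:data/sensor_data.db?mode=ro", uri=True)
temps = [row[0] for row in conn.execute("SELECT temperature FROM sensor_data")]
conn.close()

mean, stdev = statistics.mean(temps), statistics.stdev(temps)
spikes = [t for t in temps if abs(t - mean) > 3 * stdev]
print(f"{len(spikes)} of {len(temps)} readings beyond 3 sigma")
```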
Stress test with multiple concurrent sensors:
```yaml
# load-test.yaml
replicas:
  count: 100  # 100 concurrent sensors
  prefix: "LOAD"
simulation:
  readings_per_second: 10  # 1,000 total readings/second
```
Generate labeled training data with known anomaly patterns:
```yaml
# ml-training.yaml
anomalies:
  enabled: true
  probability: 0.05  # Realistic 5% anomaly rate
sensor:
  manufacturer: "SensorTech"        # Higher failure rate
  firmware_version: "3.0.0-beta"    # More anomalies
```
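Because anomalies are labeled in the database, extracting a supervised training set is a single query; a minimal sketch, assuming the default database path and the schema shown later:

```python
# Sketch: pull labeled (features, anomaly_flag) pairs for model training.
import sqlite3

conn = sqlite3.connect("file:data/sensor_data.db?mode=ro", uri=True)
X, y = [], []
for temp, hum, pres, volt, flag in conn.execute(
    "SELECT temperature, humidity, pressure, voltage, anomaly_flag FROM sensor_data"
):
    X.append((temp, hum, pres, volt))
    y.append(int(flag))
conn.close()
print(f"{sum(y)} anomalous samples out of {len(y)} total")
```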
When starting the simulator, you'll see the database mode in the startup logs:
```text
INFO - Using sensor ID: TEST001
INFO - Sensor Location: New York
INFO - Database mode: WAL (change with SENSOR_WAL env var)
```
- Docker (recommended) or Python 3.11+ with uv
- SQLite3 (for data analysis)
```bash
# Quick start with defaults
docker run -v $(pwd)/data:/app/data sensor-simulator:latest

# Custom sensor identity
docker run -v $(pwd)/data:/app/data \
  -e SENSOR_ID=SENSOR_001 \
  -e SENSOR_LOCATION="San Francisco" \
  sensor-simulator:latest

# With monitoring dashboard
docker run -v $(pwd)/data:/app/data \
  -e MONITORING_ENABLED=true \
  -p 8080:8080 \
  sensor-simulator:latest
```
```yaml
# docker-compose.yml
services:
  sensor:
    image: sensor-simulator:latest
    volumes:
      - ./data:/app/data
      - ./config:/app/config
    environment:
      - CONFIG_FILE=/app/config/config.yaml
      - MONITORING_ENABLED=true
    ports:
      - "8080:8080"
```
```bash
# Start the simulator
docker-compose up -d

# View logs
docker-compose logs -f

# Stop the simulator
docker-compose down
```
```bash
# Install uv package manager
pip install uv

# Install dependencies
uv sync

# Run with defaults
uv run main.py

# Run with custom config
uv run main.py --config config/config.yaml --identity config/identity.json

# Generate identity template
uv run main.py --generate-identity

# Run tests
uv run pytest tests/
```
```bash
# Clone the repository
git clone <repository-url>
cd sensor-log-generator

# IMPORTANT: Setup development environment (includes pre-commit hooks)
uv run scripts/setup.py --dev

# Run all checks before committing
uv run scripts/check.py

# Auto-fix linting issues
uv run scripts/check.py --fix

# Build Docker image
docker build -t sensor-simulator .

# Build multi-platform (AMD64 + ARM64)
uv run build.py

# Test the container
uv run scripts/testing/test_container.sh
```
```bash
# Setup (first time only)
uv run scripts/setup.py --dev       # CRITICAL: Sets up environment and pre-commit hooks

# Before committing
uv run scripts/check.py             # Runs lint, typecheck, and fast tests
uv run scripts/check.py --fix       # Auto-fix linting issues

# Manual pre-commit run
uv run pre-commit run --all-files   # Runs all pre-commit hooks on all files

# Skip hooks in emergency (NOT RECOMMENDED)
git commit --no-verify -m "emergency fix"

# Note: After setup, checks run automatically on commit!
# - Fast checks run on every commit
# - Full test suite runs on push
```
Configure the simulator through environment variables:
| Variable | Description | Default |
|---|---|---|
| `SENSOR_ID` | Unique sensor identifier | Auto-generated |
| `SENSOR_LOCATION` | Sensor location/city | Random city |
| `SENSOR_INTERVAL` | Reading interval (seconds) | 5 |
| `ANOMALY_PROBABILITY` | Chance of anomalies (0-1) | 0.05 |
| `LOG_LEVEL` | Logging verbosity (DEBUG/INFO/WARNING/ERROR) | INFO |
| `PRESERVE_EXISTING_DB` | Keep existing database on startup | false |
| `MONITORING_ENABLED` | Enable web monitoring dashboard | false |
| `MONITORING_PORT` | Dashboard port number | 8080 |
| `CONFIG_FILE` | Path to configuration YAML | config.yaml |
| `IDENTITY_FILE` | Path to identity JSON | identity.json |
| `SQLITE_TMPDIR` | SQLite temp directory (containers) | /tmp |
The simulator uses two configuration files:
- config.yaml: Main configuration with simulation parameters
- identity.json: Sensor-specific metadata and location
The system supports both legacy and enhanced identity formats:
Legacy Format (backward compatible):
```json
{
  "id": "SENSOR_NY_001",
  "location": "New York",
  "latitude": 40.7128,
  "longitude": -74.0060,
  "timezone": "America/New_York",
  "manufacturer": "SensorTech",
  "model": "TempSensor Pro",
  "firmware_version": "1.2.0"
}
```
Enhanced Format (recommended):
```json
{
  "sensor_id": "SENSOR_CO_DEN_001",
  "location": {
    "city": "Denver",
    "state": "CO",
    "coordinates": {
      "latitude": 39.7337,
      "longitude": -104.9906
    },
    "timezone": "America/Denver"
  },
  "device_info": {
    "manufacturer": "DataLogger",
    "model": "AirData-Plus",
    "firmware_version": "3.15.21",
    "serial_number": "DL-578463"
  },
  "deployment": {
    "deployment_type": "mobile_unit",
    "installation_date": "2025-03-24",
    "height_meters": 8.3
  }
}
```
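Downstream tooling that consumes identity.json has to cope with both shapes; a minimal normalization sketch (the helper and the flattened field names are hypothetical, keyed off the examples above):

```python
# Sketch: normalize legacy and enhanced identity files to one shape.
import json

def load_identity(path="config/identity.json"):
    with open(path) as f:
        raw = json.load(f)
    if "sensor_id" in raw:  # enhanced format
        loc = raw["location"]
        return {
            "id": raw["sensor_id"],
            "city": loc["city"],
            "latitude": loc["coordinates"]["latitude"],
            "longitude": loc["coordinates"]["longitude"],
            "firmware_version": raw["device_info"]["firmware_version"],
        }
    return {  # legacy format
        "id": raw["id"],
        "city": raw["location"],
        "latitude": raw["latitude"],
        "longitude": raw["longitude"],
        "firmware_version": raw["firmware_version"],
    }
```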
```yaml
# Sensor configuration
sensor:
  type: "environmental"
  location: "San Francisco"  # Required
  manufacturer: "SensorTech"
  model: "EnvMonitor-3000"
  firmware_version: "1.4"

# Simulation settings
simulation:
  readings_per_second: 1
  run_time_seconds: 3600  # 1 hour

# Normal operating ranges
normal_parameters:
  temperature:
    mean: 22.0
    std_dev: 2.0
    min: 15.0
    max: 30.0
  humidity:
    mean: 65.0
    std_dev: 5.0
    min: 30.0
    max: 90.0
  pressure:
    mean: 1013.0
    std_dev: 10.0
    min: 990.0
    max: 1040.0

# Anomaly configuration
anomalies:
  enabled: true
  probability: 0.05  # 5% chance
  types:
    spike:
      enabled: true
      weight: 0.4
    trend:
      enabled: true
      weight: 0.2
      duration_seconds: 300
    pattern:
      enabled: true
      weight: 0.1
      duration_seconds: 600
    missing_data:
      enabled: true
      weight: 0.1
      duration_seconds: 30
    noise:
      enabled: true
      weight: 0.2
      duration_seconds: 180

# Database settings
database:
  path: "data/sensor_data.db"
  # Note: Use SENSOR_WAL=true env var to enable WAL mode for better concurrent access

# Logging configuration
logging:
  level: "INFO"
  file: "logs/sensor_simulator.log"
  console_output: true

# Monitoring API (optional)
monitoring:
  enabled: false
  port: 8080
  host: "0.0.0.0"
```
The simulator uses a simplified SQLite database with WAL (Write-Ahead Logging) mode:
- 90,000+ writes/second capability
- Concurrent access - readers never block writers
- Zero threading - no deadlocks or race conditions
- Automatic batching - optimal performance out of the box
- Works everywhere - Linux, Mac, Windows, Docker
The database "just works" - no tuning or configuration needed!
Different manufacturers and firmware versions affect anomaly rates:
Manufacturers:
- SensorTech: 20% more anomalies (budget hardware)
- DataLogger: 10% more anomalies
- DataSense: 20% fewer anomalies (premium)
- MonitorPro: 10% fewer anomalies
Firmware Versions:
- 1.x versions: 30% fewer anomalies (stable)
- 2.x versions: 15% fewer anomalies
- 3.x-beta/alpha: 50% more anomalies (unstable)
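How these factors combine into an effective anomaly rate is not spelled out here; a minimal sketch, assuming the multipliers stack multiplicatively on the configured base probability (names and stacking rule are illustrative):

```python
# Hypothetical sketch: combine manufacturer and firmware multipliers
# into an effective anomaly probability. Multiplier values mirror the
# rates listed above; the multiplicative stacking is an assumption.
MANUFACTURER_FACTOR = {
    "SensorTech": 1.20,  # budget hardware: 20% more anomalies
    "DataLogger": 1.10,
    "DataSense": 0.80,   # premium: 20% fewer anomalies
    "MonitorPro": 0.90,
}

def firmware_factor(version: str) -> float:
    if "beta" in version or "alpha" in version:
        return 1.50  # unstable pre-releases
    if version.startswith("1."):
        return 0.70  # stable 1.x
    if version.startswith("2."):
        return 0.85
    return 1.00

def effective_probability(base: float, manufacturer: str, version: str) -> float:
    return base * MANUFACTURER_FACTOR.get(manufacturer, 1.0) * firmware_factor(version)

# e.g. 0.05 * 1.20 * 1.50 = 0.09 for SensorTech on 3.0.0-beta
print(effective_probability(0.05, "SensorTech", "3.0.0-beta"))
```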
To prevent corruption, ALWAYS use read-only mode when reading the database while the sensor is running:
```python
# CORRECT - Safe read-only access
import sqlite3
conn = sqlite3.connect("file:data/sensor_data.db?mode=ro", uri=True)

# WRONG - Can cause corruption!
conn = sqlite3.connect("data/sensor_data.db")  # DON'T DO THIS!
```

```bash
# CORRECT - Command line read-only access
sqlite3 "file:data/sensor_data.db?mode=ro" "SELECT COUNT(*) FROM sensor_readings;"

# WRONG - Can cause corruption!
sqlite3 data/sensor_data.db  # DON'T DO THIS!
```
We provide several safe reading scripts:
```bash
# Interactive reader with statistics and monitoring
python read_sensor_data.py

# Example reader showing best practices
./reader_example.py

# Test concurrent reading performance
uv run scripts/testing/test_readers.py -r 20 -t 30  # 20 readers for 30 seconds
```
- Single-threaded design - No deadlocks or race conditions
- 90,000+ writes/second capability
- WAL mode by default - Excellent read/write separation
- Automatic batching - Commits every 10 seconds or 50 records
- Zero maintenance - Just works out of the box
The database uses WAL (Write-Ahead Logging) mode for optimal performance:
```bash
# WAL mode is always used (default)
docker run -v $(pwd)/data:/app/data sensor-simulator:latest
```
Benefits of WAL mode:
- Concurrent readers don't block writers
- Writers don't block readers
- Better performance for continuous operations
- Automatic checkpointing every 300 seconds
- Works on all platforms (Linux, Mac, Windows)
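To confirm WAL is actually active, you can ask SQLite directly from a read-only connection; a minimal sketch:

```python
# Check the journal mode without disturbing the writer (read-only URI).
import sqlite3

conn = sqlite3.connect("file:data/sensor_data.db?mode=ro", uri=True)
mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
print(f"journal_mode = {mode}")  # expected: "wal"
conn.close()
```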
The SQLite database stores comprehensive sensor data:
```sql
CREATE TABLE sensor_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp DATETIME,
    sensor_id TEXT,

    -- Sensor readings
    temperature REAL,
    humidity REAL,
    pressure REAL,
    voltage REAL,

    -- Anomaly tracking
    status_code INTEGER,
    anomaly_flag BOOLEAN,
    anomaly_type TEXT,

    -- Device metadata
    firmware_version TEXT,
    model TEXT,
    manufacturer TEXT,
    serial_number TEXT,

    -- Location data
    location TEXT,
    latitude REAL,
    longitude REAL,
    timezone TEXT,

    -- Deployment info
    deployment_type TEXT,
    installation_date TEXT,
    height_meters REAL
);
```
When JSON logging is enabled:
```json
{
  "timestamp": "2025-01-18T10:30:45.123456",
  "sensor_id": "SENSOR_SF_001",
  "readings": {
    "temperature": 22.5,
    "humidity": 65.3,
    "pressure": 1013.25,
    "voltage": 12.1
  },
  "location": {
    "name": "San Francisco",
    "latitude": 37.7749,
    "longitude": -122.4194,
    "timezone": "America/Los_Angeles"
  },
  "anomaly": {
    "detected": false,
    "type": null
  },
  "device": {
    "manufacturer": "SensorTech",
    "model": "EnvMonitor-3000",
    "firmware": "1.4"
  }
}
```
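These records are easy to consume line by line; a minimal sketch, assuming one JSON object per line in the log file (the exact layout of your log output may differ):

```python
# Sketch: scan JSON log records and report detected anomalies.
import json

with open("logs/sensor_simulator.log") as f:
    for line in f:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log lines
        if record.get("anomaly", {}).get("detected"):
            print(record["sensor_id"], record["anomaly"]["type"])
```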
## Reading from the Database Safely
The simulator writes continuously to the SQLite database. To read without conflicts, use read-only mode:
### Bash (Command Line)
```bash
# Safe one-liner monitoring script
while true; do echo "[$(date '+%H:%M:%S')] $(sqlite3 "file:data/sensor_data.db?mode=ro" "SELECT COUNT(*) FROM sensor_readings;" 2>/dev/null || echo "busy")"; sleep 2; done

# Single query with read-only mode
sqlite3 "file:data/sensor_data.db?mode=ro" "SELECT COUNT(*) FROM sensor_readings;"

# Query with timeout for busy handling
sqlite3 data/sensor_data.db "PRAGMA busy_timeout=2000; SELECT COUNT(*) FROM sensor_readings;"
```
### Python

```python
import sqlite3
import time
from contextlib import contextmanager

@contextmanager
def get_readonly_connection(db_path="data/sensor_data.db", timeout=30.0):
    """Get a safe read-only connection to the database."""
    conn = None
    try:
        # Open in read-only mode
        conn = sqlite3.connect(
            f"file:{db_path}?mode=ro",
            uri=True,
            timeout=timeout
        )
        # Set to query-only for extra safety
        conn.execute("PRAGMA query_only=1;")
        yield conn
    finally:
        if conn:
            conn.close()

# Example usage
def monitor_database():
    while True:
        try:
            with get_readonly_connection() as conn:
                cursor = conn.cursor()
                count = cursor.execute("SELECT COUNT(*) FROM sensor_readings").fetchone()[0]
                print(f"[{time.strftime('%H:%M:%S')}] Readings: {count}")
        except sqlite3.OperationalError as e:
            print(f"[{time.strftime('%H:%M:%S')}] Database busy: {e}")
        time.sleep(2)

# Simple query
with get_readonly_connection() as conn:
    readings = conn.execute("SELECT * FROM sensor_readings ORDER BY timestamp DESC LIMIT 10").fetchall()
```
### JavaScript (Node.js)

```javascript
const sqlite3 = require('sqlite3').verbose();

// Safe read-only connection
function getReadOnlyConnection(dbPath = 'data/sensor_data.db') {
  return new sqlite3.Database(dbPath, sqlite3.OPEN_READONLY, (err) => {
    if (err) {
      console.error('Database connection failed:', err.message);
    }
  });
}

// Monitor function
function monitorDatabase() {
  const db = getReadOnlyConnection();
  setInterval(() => {
    db.get("SELECT COUNT(*) as count FROM sensor_readings", (err, row) => {
      if (err) {
        console.log(`[${new Date().toLocaleTimeString()}] Database busy`);
      } else {
        console.log(`[${new Date().toLocaleTimeString()}] Readings: ${row.count}`);
      }
    });
  }, 2000);
}

// Synchronous API with better-sqlite3 (recommended)
const Database = require('better-sqlite3');

function safeQuery() {
  try {
    const db = new Database('data/sensor_data.db', {
      readonly: true,
      fileMustExist: true
    });
    const count = db.prepare('SELECT COUNT(*) as count FROM sensor_readings').get();
    console.log(`Readings: ${count.count}`);
    db.close();
  } catch (error) {
    console.error('Query failed:', error);
  }
}
```
- Always use read-only mode when reading from the database
- Set appropriate timeouts to handle busy states (5-30 seconds recommended)
- Implement retry logic for transient errors (see the sketch after this list)
- Don't keep connections open longer than necessary
- Use PRAGMA query_only=1 for extra safety in critical applications
- Note: Database commits every 10 seconds, so readers have consistent windows
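The retry advice above can be packaged into a small helper; a minimal sketch with illustrative back-off values:

```python
# Retry transient read errors ("database is locked", or "file is not a
# database" during a checkpoint) with exponential backoff.
import sqlite3
import time

def query_with_retry(sql, db_path="data/sensor_data.db", attempts=5, delay=0.5):
    for attempt in range(attempts):
        try:
            conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=5.0)
            try:
                return conn.execute(sql).fetchall()
            finally:
                conn.close()
        except sqlite3.DatabaseError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (2 ** attempt))  # 0.5s, 1s, 2s, 4s, ...

rows = query_with_retry("SELECT COUNT(*) FROM sensor_readings;")
print(rows[0][0])
```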
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
| "database is locked" | Write in progress | Add timeout and retry |
| "file is not a database" | Checkpoint occurring | Retry after brief delay |
| "no such table" | Database not initialized | Wait for simulator to start |
Query the SQLite database for insights:
```sql
-- Anomaly rate by manufacturer
SELECT
    manufacturer,
    COUNT(*) AS total,
    SUM(anomaly_flag) AS anomalies,
    ROUND(100.0 * SUM(anomaly_flag) / COUNT(*), 2) AS anomaly_rate
FROM sensor_data
GROUP BY manufacturer;

-- Temperature patterns by hour
SELECT
    strftime('%H', timestamp) AS hour,
    AVG(temperature) AS avg_temp,
    MIN(temperature) AS min_temp,
    MAX(temperature) AS max_temp
FROM sensor_data
WHERE anomaly_flag = 0
GROUP BY hour
ORDER BY hour;

-- Anomaly types distribution
SELECT
    anomaly_type,
    COUNT(*) AS count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) AS percentage
FROM sensor_data
WHERE anomaly_flag = 1
GROUP BY anomaly_type;
```
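The same queries can be run from Python over a read-only connection; a sketch using the anomaly-rate query above:

```python
# Run the anomaly-rate query without disturbing the writer.
import sqlite3

QUERY = """
SELECT manufacturer,
       COUNT(*) AS total,
       SUM(anomaly_flag) AS anomalies,
       ROUND(100.0 * SUM(anomaly_flag) / COUNT(*), 2) AS anomaly_rate
FROM sensor_data
GROUP BY manufacturer;
"""

conn = sqlite3.connect("file:data/sensor_data.db?mode=ro", uri=True)
for manufacturer, total, anomalies, rate in conn.execute(QUERY):
    print(f"{manufacturer}: {anomalies}/{total} readings anomalous ({rate}%)")
conn.close()
```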
Enable the HTTP monitoring API for real-time insights:
| Endpoint | Description | Response |
|---|---|---|
| `/healthz` | Health check | `{"status": "healthy"}` |
| `/statusz` | System status | Configuration and runtime stats |
| `/metricz` | Metrics data | Reading counts, anomaly statistics |
| `/samplez` | Recent samples | Last 10 sensor readings |
| `/db_stats` | Database info | Record count, file size |
```yaml
# In config.yaml
monitoring:
  enabled: true
  port: 8080
  host: "0.0.0.0"
```

```bash
# Run with monitoring
docker run -v $(pwd)/data:/app/data \
  -e MONITORING_ENABLED=true \
  -p 8080:8080 \
  sensor-simulator

# Check health
curl http://localhost:8080/healthz

# View metrics
curl http://localhost:8080/metricz
```
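For scripted checks, a small poller against /healthz works; a minimal sketch using only the standard library (assumes the API is exposed on localhost:8080):

```python
# Poll the health endpoint every 10 seconds and report status.
import json
import time
import urllib.request

def fetch(path, base="http://localhost:8080"):
    with urllib.request.urlopen(f"{base}{path}", timeout=5) as resp:
        return json.loads(resp.read())

while True:
    try:
        health = fetch("/healthz")
        print(f"status={health.get('status')}")
    except OSError as exc:
        print(f"monitoring API unreachable: {exc}")
    time.sleep(10)
```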
```text
sensor-log-generator/
├── src/                   # Core modules
│   ├── simulator.py       # Main simulation engine
│   ├── database.py        # SQLite operations with retry logic
│   ├── anomaly.py         # Anomaly generation system
│   ├── location.py        # City database and GPS coords
│   ├── monitor.py         # HTTP monitoring server
│   ├── config.py          # Configuration management
│   └── enums.py           # Valid manufacturers/models
├── tests/                 # Test suite
│   ├── test_simulator.py  # Core functionality tests
│   ├── test_database.py   # Database operation tests
│   ├── test_anomaly.py    # Anomaly generation tests
│   └── test_config.py     # Configuration tests
├── scripts/               # Utility scripts
│   ├── testing/           # Testing and stress testing tools
│   ├── readers/           # Database reader utilities
│   └── monitoring/        # Monitoring and health check scripts
├── config/                # Configuration files
│   ├── config.yaml        # Main configuration
│   └── identity.json      # Sensor identity
├── data/                  # Data output
│   └── sensor_data.db     # SQLite database
└── logs/                  # Application logs
```
- Resilient Database Operations: Automatic retries with exponential backoff
- Hot Configuration Reload: Change settings without restart
- Checkpoint System: Resume from last state after interruption
- Concurrent Safety: WAL mode for multiple writers
- Memory Efficient: Streaming design, no large buffers
- Container Optimized: Built for Docker and Kubernetes
Model gradual sensor failure over time:
```yaml
# Start with low anomaly rate
anomalies:
  probability: 0.01  # 1% initially

# Use dynamic reload to increase over time:
# Hour 1: probability: 0.05
# Hour 2: probability: 0.10
# Hour 3: probability: 0.25
```
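One way to drive this scenario is a small script that rewrites the config file on a schedule and lets hot-reload pick up the change; a sketch (the schedule, path, and minimal template are illustrative, and a real config would keep its other sections intact):

```python
# Sketch: raise the anomaly probability over time via config rewrites.
import time

SCHEDULE = [(3600, 0.05), (7200, 0.10), (10800, 0.25)]  # (seconds, probability)

TEMPLATE = """anomalies:
  enabled: true
  probability: {p}
"""

start = time.time()
for elapsed_target, probability in SCHEDULE:
    time.sleep(max(0, elapsed_target - (time.time() - start)))
    with open("config/config.yaml", "w") as f:
        f.write(TEMPLATE.format(p=probability))
    print(f"anomaly probability raised to {probability}")
```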
Simulate a distributed sensor network:
```yaml
random_location:
  enabled: true
  number_of_cities: 20
  gps_variation: 500  # 500m radius

replicas:
  count: 20
  prefix: "GLOBAL"
```
Generate high-volume data:
```bash
# 100 sensors, 10 readings/second each
docker-compose up --scale sensor=100
```

Or use configuration:

```yaml
replicas:
  count: 100
simulation:
  readings_per_second: 10
```
```bash
# Run all tests
uv run pytest tests/

# Run with coverage
uv run pytest tests/ --cov=src

# Run specific test
uv run pytest tests/test_simulator.py
```
Test SQLite database access patterns across Docker container boundaries:
```bash
# Test with 10 concurrent reader containers
uv run scripts/testing/test_readers_containerized.py -c 10 -d 60

# Faster read intervals for stress testing
uv run scripts/testing/test_readers_containerized.py -c 20 -i 0.1

# 1 writer + 5 readers for 60 seconds
uv run scripts/testing/test_containers_rw.py -r 5 -w 30 -d 60

# High-throughput test
uv run scripts/testing/test_containers_rw.py -r 10 -w 100 -d 120
```
These tools validate:
- WAL mode compatibility across container boundaries
- Concurrent read/write performance
- File locking behavior in containerized environments
- Mount permission safety (read-only vs read-write)
Example output from a 60-second test with 10 readers and 50 writes/sec:
```text
Final Test Results
┌────────────────┬────────┐
│ Test Duration  │ 60s    │
│ Total Writes   │ 2,694  │
│ Write Errors   │ 0      │
│ Avg Write Time │ 1.92ms │
│ Total Reads    │ 1,163  │
│ Read Errors    │ 23     │
│ Avg Read Time  │ 6.73ms │
│ Success Rate   │ 99.4%  │
└────────────────┴────────┘
```
The scripts automatically build lightweight Docker containers and provide real-time monitoring dashboards. See TESTING_CONTAINERS.md for detailed documentation.
```bash
# Build local image
docker build -t sensor-simulator .

# Build multi-platform
./build.py

# Test the build
uv run scripts/testing/test_container.sh
```
The codebase is designed for extension:
- New Sensor Types: Add to src/simulator.py
- Anomaly Patterns: Extend src/anomaly.py
- Output Formats: Modify src/database.py
- API Endpoints: Update src/monitor.py
- Python 3.11+ with type hints
- Google-style docstrings
- Ruff for linting
- Black for formatting
Database Locked Error
```yaml
# Multiple writers competing
# Solution: Increase timeout or use single writer
database:
  busy_timeout: 60000  # 60 seconds
```
Disk I/O Error in Containers
```bash
# Set custom temp directory
docker run -v $(pwd)/data:/app/data \
  -e SQLITE_TMPDIR=/app/data/tmp \
  sensor-simulator

# Ensure permissions
chmod -R 755 data/
```
High Memory Usage
```yaml
# Reduce cache size
database:
  cache_size: 10000  # Smaller cache
simulation:
  batch_size: 100  # Smaller batches
```
No Data Generated
```bash
# Check logs for errors
tail -f logs/sensor_simulator.log

# Verify database exists
ls -la data/sensor_data.db

# Test with debug mode
docker run -e LOG_LEVEL=DEBUG sensor-simulator
```
For High-Volume Data
- Use WAL mode (enabled by default)
- Increase cache_size for better performance
- Use batch inserts
- Consider PostgreSQL for > 1000 readings/sec
For Long-Running Simulations
- Enable checkpoint system
- Use log rotation
- Monitor disk space
- Set database size limits
- Documentation: CLAUDE.md for AI assistants
- Examples: See config.example.yaml for all options
- Tests: Browse tests/ for usage examples
- Support: Open an issue on GitHub
MIT License - See LICENSE file for details.
Built with ❤️ for reliable sensor data simulation and testing!