add a mock S3 server #116
ysaito1001
left a comment
Great work, love to see what we can build on top of this!
/// * `key` - The object key
/// * `range` - Optional byte range to retrieve
I assume these are a breakdown of GetObjectRequest (and the same applies to other API docs for this trait)?
// and VecDeque<Bytes> operations are atomic at the individual element level
unsafe impl Sync for VecByteStream {}

impl VecByteStream {
Should these methods be pub(crate) if the struct itself is pub(crate)?
/// root/
/// ├── objects/
/// │   ├── my-file.txt              # Object data
/// │   └── my-file.txt.metadata     # Object metadata (JSON)
/// ├── uploads/
/// │   ├── upload-123/
/// │   │   ├── metadata.json        # Upload metadata
/// │   │   ├── part-1.dat           # Part data
/// │   │   └── part-1.metadata      # Part metadata
/// │   └── ...
// Helper method to list all objects in a directory
// Helper method to list all objects in a directory recursively
Somewhat duplicated comments?
landonxjames
left a comment
Looks awesome! Excited to see what error injection looks like. I think having that capability could have helped catch bugs like this one in CRT S3 awslabs/aws-c-s3#543. Wonder if along those lines C bindings should be on our radar so CRT can use it as well?
Ya I've wondered this as well, though I'm curious if shipping a binary server that can be spun up might be easier.
Interesting idea, I like it. I think the difficult part would be modeling the fault injection in a way that could be passed in to the binary? I guess it could have a human-writable serialized format that you could pass in on startup like: That format would get complicated pretty quickly though.
Summary
This PR implements a comprehensive S3 Mock Server for testing and benchmarking the AWS S3 Transfer Manager. The mock server provides S3-compatible API endpoints with support for both in-memory and filesystem storage backends, enabling realistic testing scenarios for multipart uploads, concurrent operations, and data integrity verification.
This has been an idea we discussed for a while and never had time to pursue. I leveraged Q CLI to bootstrap and implement much of the functionality, intervening and writing some of the more difficult/nuanced bits as necessary. I wouldn't say it's exactly how I'd write it, but it is close enough of a POC to talk about at this point.
How to review this PR:
Key Features Implemented
Core S3 API Operations
• Object Operations: GetObject, PutObject, HeadObject, DeleteObject, ListObjectsV2
• Multipart Upload Operations: CreateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload
• Range Request Support: Partial content retrieval with proper HTTP 206 responses
• Error Handling: S3-compliant error responses (NoSuchKey, InvalidPart, etc.)
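For reviewers who want a mental model of the range handling, here is a minimal sketch of resolving a single `Range` header against an object of known size and producing the `Content-Range` value for a 206 response. The names `ResolvedRange` and `resolve_range` are hypothetical and not taken from this PR; the sketch only covers single ranges and is not a description of how `s3s.rs` does it.

```rust
/// Inclusive byte range resolved against an object of known length.
/// (Hypothetical type for illustration; not part of this PR.)
#[derive(Debug, PartialEq, Eq)]
pub struct ResolvedRange {
    pub start: u64,
    pub end: u64, // inclusive
}

impl ResolvedRange {
    /// Value for the `Content-Range` header of a 206 response.
    pub fn content_range(&self, total: u64) -> String {
        format!("bytes {}-{}/{}", self.start, self.end, total)
    }

    /// Number of bytes covered by the range.
    pub fn byte_len(&self) -> u64 {
        self.end - self.start + 1
    }
}

/// Resolve a single-range header value such as `bytes=0-99`, `bytes=100-`
/// or `bytes=-50`. Returns `None` for malformed, multi-range, or
/// unsatisfiable requests (a caller would typically answer 416 or ignore
/// the header in those cases).
pub fn resolve_range(header: &str, total: u64) -> Option<ResolvedRange> {
    let spec = header.trim().strip_prefix("bytes=")?;
    if spec.contains(',') || total == 0 {
        return None;
    }
    let (first, last) = spec.split_once('-')?;
    let last_index = total - 1;
    let (start, end) = match (first.is_empty(), last.is_empty()) {
        (true, true) => return None,
        // Suffix form: the final `suffix_len` bytes of the object.
        (true, false) => {
            let suffix_len = last.parse::<u64>().ok()?;
            if suffix_len == 0 {
                return None;
            }
            (total.saturating_sub(suffix_len), last_index)
        }
        // Open-ended form: from `start` to the end of the object.
        (false, true) => (first.parse::<u64>().ok()?, last_index),
        // Closed form: clamp the end to the last byte of the object.
        (false, false) => {
            let start = first.parse::<u64>().ok()?;
            let end = last.parse::<u64>().ok()?;
            if start > end {
                return None;
            }
            (start, end.min(last_index))
        }
    };
    if start > last_index {
        return None;
    }
    Some(ResolvedRange { start, end })
}
```

With a 1000-byte object, `resolve_range("bytes=-50", 1000)` resolves to bytes 950 through 999, and a start offset past the end of the object yields `None`.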
Storage Backend Architecture
• Dual Storage Options: In-memory (ephemeral) and filesystem (persistent) backends
• Clean Abstraction: StorageBackend trait separates API logic from data storage
• Concurrent Safe: Thread-safe operations with proper locking mechanisms
• Streaming Support: Efficient handling of large objects without memory bloat
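To illustrate the separation between API logic and storage, here is a deliberately simplified sketch of what a backend abstraction of this shape could look like. It is an assumption for discussion only: the trait below is synchronous, works on whole `Vec<u8>` buffers, and its method names are hypothetical, whereas the `StorageBackend` trait in `storage.rs` may well be async, streaming, and shaped differently.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical, simplified stand-in for a storage abstraction.
/// The real trait in this PR may differ in names and signatures.
pub trait StorageBackend: Send + Sync {
    fn put_object(&self, key: &str, data: Vec<u8>);
    fn get_object(&self, key: &str) -> Option<Vec<u8>>;
    fn delete_object(&self, key: &str) -> bool;
    fn list_objects(&self, prefix: &str) -> Vec<String>;
}

/// Ephemeral backend: objects live in a map guarded by a read/write lock,
/// so concurrent readers do not block each other.
#[derive(Default)]
pub struct InMemoryStorage {
    objects: RwLock<HashMap<String, Vec<u8>>>,
}

impl StorageBackend for InMemoryStorage {
    fn put_object(&self, key: &str, data: Vec<u8>) {
        self.objects.write().unwrap().insert(key.to_string(), data);
    }

    fn get_object(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.read().unwrap().get(key).cloned()
    }

    fn delete_object(&self, key: &str) -> bool {
        self.objects.write().unwrap().remove(key).is_some()
    }

    fn list_objects(&self, prefix: &str) -> Vec<String> {
        let guard = self.objects.read().unwrap();
        let mut keys: Vec<String> = guard
            .keys()
            .filter(|k| k.starts_with(prefix))
            .cloned()
            .collect();
        keys.sort(); // listing order is lexicographic by key
        keys
    }
}
```

API handlers would then hold a `Box<dyn StorageBackend>` (or an `Arc`) and never care whether the data lives in memory or on disk. Note that this sketch clones whole objects on read, which is exactly the memory cost a streaming backend avoids.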
S3 Checksum Compliance (WIP)
• Algorithm Support: CRC64NVME (default), CRC32, CRC32C, SHA1, SHA256, MD5
• Multipart Restrictions: Enforces S3's algorithm/type combinations (CRC64NVME full-object only, etc.)
• Part Validation: Consecutive part number validation starting from 1
• Checksum Storage: Persistent checksum metadata with proper serialization
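The two validation rules above can be sketched roughly as follows. The enum and function names are hypothetical and not taken from `types.rs`. The combination table reflects my reading of the S3 documentation for multipart uploads (CRC64NVME full-object only, CRC32/CRC32C either type, SHA-1/SHA-256 composite only); MD5 is left out because I am not certain how this PR classifies it.

```rust
/// Hypothetical enums for illustration; the PR's own types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChecksumAlgorithm {
    Crc64Nvme,
    Crc32,
    Crc32c,
    Sha1,
    Sha256,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChecksumType {
    FullObject,
    Composite,
}

/// Whether an algorithm/type pair is accepted for a multipart upload,
/// per the assumed table described in the lead-in.
pub fn is_valid_multipart_combination(alg: ChecksumAlgorithm, ty: ChecksumType) -> bool {
    match (alg, ty) {
        (ChecksumAlgorithm::Crc64Nvme, ChecksumType::FullObject) => true,
        (ChecksumAlgorithm::Crc64Nvme, ChecksumType::Composite) => false,
        (ChecksumAlgorithm::Crc32 | ChecksumAlgorithm::Crc32c, _) => true,
        (ChecksumAlgorithm::Sha1 | ChecksumAlgorithm::Sha256, ChecksumType::Composite) => true,
        (ChecksumAlgorithm::Sha1 | ChecksumAlgorithm::Sha256, ChecksumType::FullObject) => false,
    }
}

/// Check that part numbers are exactly 1, 2, 3, ... in order,
/// matching the "consecutive, starting from 1" rule described above.
pub fn validate_part_numbers(part_numbers: &[u32]) -> Result<(), String> {
    if part_numbers.is_empty() {
        return Err("no parts supplied".to_string());
    }
    for (index, &actual) in part_numbers.iter().enumerate() {
        let expected = index as u32 + 1;
        if actual != expected {
            return Err(format!("expected part number {expected}, found {actual}"));
        }
    }
    Ok(())
}
```

A handler would presumably run the combination check at CreateMultipartUpload time and the part-number check at CompleteMultipartUpload time, mapping failures to the appropriate S3 error code.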
Advanced Multipart Features
• Checksum Type Support: FULL_OBJECT vs COMPOSITE checksum types
• Upload Metadata Persistence: Stores algorithm and type choices from CreateMultipartUpload
• Part-Level Checksums: Individual part integrity verification during upload
• ETag Generation: Proper multipart ETag calculation (MD5 of concatenated part MD5s + part count)
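As a sketch of the ETag rule in the last bullet: the multipart ETag is the hex MD5 of the concatenated binary part digests, followed by `-` and the part count. To keep the example free of external crates, the MD5 function is passed in as a parameter; that injection and the name `multipart_etag` are assumptions for illustration, and in practice one would call an MD5 implementation from a crate directly. The sketch returns the value without the surrounding double quotes that appear in the `ETag` header.

```rust
/// Build a multipart-style ETag from the binary MD5 digest of each part.
///
/// `md5` is any function that returns the 16-byte MD5 digest of its input.
/// It is injected here only so the sketch has no external dependencies.
pub fn multipart_etag<F>(part_md5s: &[[u8; 16]], md5: F) -> String
where
    F: Fn(&[u8]) -> [u8; 16],
{
    // Concatenate the raw (not hex-encoded) digests in part order.
    let mut concatenated = Vec::with_capacity(part_md5s.len() * 16);
    for digest in part_md5s {
        concatenated.extend_from_slice(digest);
    }

    // Digest the concatenation, hex-encode it, and append the part count.
    let combined = md5(&concatenated);
    let hex: String = combined.iter().map(|byte| format!("{byte:02x}")).collect();
    format!("{}-{}", hex, part_md5s.len())
}
```

Injecting the digest function also makes the formatting logic easy to unit test with a stand-in digest, independent of any particular MD5 crate.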
Architecture Overview
Files Overview
Core Implementation
• s3-mock-server/src/lib.rs - Main library interface
• s3-mock-server/src/s3s.rs - S3 API implementation using s3s library
• s3-mock-server/src/server.rs - Server builder and management
• s3-mock-server/src/error.rs - Error types and S3 error mapping
Storage Layer
• s3-mock-server/src/storage.rs - StorageBackend trait and interfaces
• s3-mock-server/src/storage/models.rs - Data models with checksum support
• s3-mock-server/src/storage/in_memory.rs - In-memory storage implementation
• s3-mock-server/src/storage/filesystem.rs - Filesystem storage implementation
Supporting Infrastructure
• s3-mock-server/src/types.rs - Checksum and integrity types
• s3-mock-server/src/streaming.rs - Data streaming utilities
Usage Example
Next Steps (Future Work)
Priority 1: CompleteMultipartUpload Checksum Validation
• Implement full object checksum validation
• Support composite checksum calculation from part checksums
• Add BadDigest error responses for checksum mismatches
Priority 2: Network Simulation
• Add latency, jitter, and bandwidth limiting
• Implement error injection capabilities
• Support test-specific behaviors
Priority 3: Benchmarking Utilities
• Performance measurement tools
• Throughput and latency metrics
• Comparative analysis features
Impact
This implementation seeks to provide:
The S3 Mock Server bridges the gap between simple unit test mocks and full integration testing.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.