add a mock S3 server#116

Merged
aajtodd merged 36 commits into main from test-server on Sep 18, 2025

@aajtodd
Contributor

@aajtodd aajtodd commented Aug 22, 2025

Summary

This PR implements a comprehensive S3 Mock Server for testing and benchmarking the AWS S3 Transfer Manager. The mock server provides S3-compatible API endpoints with support for both in-memory and filesystem storage backends, enabling realistic testing scenarios for multipart uploads, concurrent operations, and data integrity verification.

This has been an idea we discussed for a while and never had time to pursue. I leveraged Q CLI to help bootstrap and help implement much of the functionality, intervening and writing some of the more difficult/nuanced bits as necessary. I wouldn't say it's exactly how I'd write it but it is close enough of a POC to talk about at this point.

How to review this PR:

  • I'd look at the overall customer/test usage on how we can/would leverage it. Is this API correct?
  • We have both in-memory and filesystem storage, do we want this? My thought with in-memory was certain benchmarking scenarios may do better with in-memory storage but having filesystem for durable/repeatable test scenarios would also be useful
  • The implementation may have bugs or not match the S3 API exactly. I haven't vetted it so we'll want to be careful if we come across integration issues or responses that we don't expect and confirm our implementation is correct before assuming e.g. the transfer manager has an issue
  • Cursory overview of the internals and how things fit together and whether we think we'll be able to layer on some of the future ideas proposed like network simulation or head of line blocking tests, etc.

Key Features Implemented

Core S3 API Operations

Object Operations: GetObject, PutObject, HeadObject, DeleteObject, ListObjectsV2
Multipart Upload Operations: CreateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload
Range Request Support: Partial content retrieval with proper HTTP 206 responses
Error Handling: S3-compliant error responses (NoSuchKey, InvalidPart, etc.)
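
As an illustration of the range request support listed above, here is a minimal sketch of resolving a single `Range` header into the byte span and `Content-Range` value that accompany an HTTP 206 response. It is not taken from the PR: `Resolved` and `resolve_range` are hypothetical names, only single ranges are handled, and the mock server's actual behavior may differ.

```rust
/// Outcome of resolving a `Range` header against an object of known size.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum Resolved {
    /// Serve bytes `start..=end` with status 206 and this `Content-Range` value.
    Partial { start: u64, end: u64, content_range: String },
    /// The range cannot be satisfied (HTTP 416, S3 error code `InvalidRange`).
    Unsatisfiable,
}

/// Resolve a single-range header such as `bytes=0-99`, `bytes=100-` or `bytes=-50`.
/// Returns `None` for a malformed or multi-range header; a server would
/// typically ignore such a header and return the whole object.
fn resolve_range(header: &str, size: u64) -> Option<Resolved> {
    let spec = header.strip_prefix("bytes=")?;
    let (first, last) = spec.split_once('-')?;
    let (start, end) = match (first.is_empty(), last.is_empty()) {
        // `bytes=-N`: the final N bytes of the object.
        (true, false) => {
            let n: u64 = last.parse().ok()?;
            if n == 0 || size == 0 {
                return Some(Resolved::Unsatisfiable);
            }
            (size.saturating_sub(n), size - 1)
        }
        // `bytes=N-`: from offset N to the end of the object.
        (false, true) => (first.parse().ok()?, size.saturating_sub(1)),
        // `bytes=N-M`: inclusive on both ends, clamped to the object size.
        (false, false) => {
            let s: u64 = first.parse().ok()?;
            let e: u64 = last.parse().ok()?;
            if e < s {
                return None;
            }
            (s, e.min(size.saturating_sub(1)))
        }
        // `bytes=-` carries no information.
        (true, true) => return None,
    };
    if start >= size {
        return Some(Resolved::Unsatisfiable);
    }
    let content_range = format!("bytes {start}-{end}/{size}");
    Some(Resolved::Partial { start, end, content_range })
}
```

For a 1000-byte object, `resolve_range("bytes=-50", 1000)` resolves to bytes 950 through 999, and a start offset at or past the object size resolves to `Unsatisfiable`.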

Storage Backend Architecture

Dual Storage Options: In-memory (ephemeral) and filesystem (persistent) backends
Clean Abstraction: StorageBackend trait separates API logic from data storage
Concurrent Safe: Thread-safe operations with proper locking mechanisms
Streaming Support: Efficient handling of large objects without memory bloat
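
To make the storage abstraction above concrete, here is a rough sketch of an API layer that is generic over a backend trait, similar in shape to the `Inner<S: StorageBackend>` layering in the architecture overview. Everything in it is an assumption for illustration: `SketchBackend`, `InMemorySketch`, `ApiLayer` and their methods are hypothetical, synchronous, and much simpler than whatever the real `StorageBackend` trait looks like.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical, simplified stand-in for a storage trait.
/// The method set and signatures are assumptions, not the PR's API.
trait SketchBackend: Send + Sync {
    fn put(&self, key: &str, data: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn delete(&self, key: &str) -> bool;
}

/// Ephemeral backend: a map behind a read-write lock, so concurrent
/// readers do not block each other and writers get exclusive access.
#[derive(Default)]
struct InMemorySketch {
    objects: RwLock<HashMap<String, Vec<u8>>>,
}

impl SketchBackend for InMemorySketch {
    fn put(&self, key: &str, data: Vec<u8>) {
        self.objects.write().unwrap().insert(key.to_string(), data);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.read().unwrap().get(key).cloned()
    }

    fn delete(&self, key: &str) -> bool {
        self.objects.write().unwrap().remove(key).is_some()
    }
}

/// API layer generic over the backend: it knows about S3 semantics
/// (here only an error code) and nothing about where the bytes live.
struct ApiLayer<S: SketchBackend> {
    storage: S,
}

impl<S: SketchBackend> ApiLayer<S> {
    /// Returns the object size, or an S3-style error code when the key is absent.
    fn head_object(&self, key: &str) -> Result<usize, &'static str> {
        self.storage.get(key).map(|d| d.len()).ok_or("NoSuchKey")
    }
}
```

A filesystem-backed type could implement the same trait, and `ApiLayer` would not need to change.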

S3 Checksum Compliance (WIP)

Algorithm Support: CRC64NVME (default), CRC32, CRC32C, SHA1, SHA256, MD5
Multipart Restrictions: Enforces S3's algorithm/type combinations (CRC64NVME full-object only, etc.)
Part Validation: Consecutive part number validation starting from 1
Checksum Storage: Persistent checksum metadata with proper serialization
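
The two validation rules above could look roughly like the sketch below. The names are hypothetical. The algorithm/type table reflects S3's publicly documented multipart rules as I understand them (CRC64NVME full-object only, CRC32 and CRC32C either type, SHA1 and SHA256 composite only), so treat it as an assumption about what this PR enforces. MD5 is left out because the description does not state its rule.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Algorithm {
    Crc64Nvme,
    Crc32,
    Crc32c,
    Sha1,
    Sha256,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ChecksumType {
    FullObject,
    Composite,
}

/// Whether a multipart upload may pair this algorithm with this checksum type.
/// Assumed table; verify against the S3 documentation before relying on it.
fn combination_allowed(alg: Algorithm, ty: ChecksumType) -> bool {
    match (alg, ty) {
        (Algorithm::Crc64Nvme, ChecksumType::FullObject) => true,
        (Algorithm::Crc64Nvme, ChecksumType::Composite) => false,
        (Algorithm::Crc32 | Algorithm::Crc32c, _) => true,
        (Algorithm::Sha1 | Algorithm::Sha256, ChecksumType::Composite) => true,
        (Algorithm::Sha1 | Algorithm::Sha256, ChecksumType::FullObject) => false,
    }
}

/// Checks that part numbers are exactly 1, 2, ..., n in that order,
/// which is the rule stated in the list above.
fn validate_part_numbers(parts: &[u32]) -> Result<(), String> {
    if parts.is_empty() {
        return Err("no parts supplied".to_string());
    }
    for (i, &n) in parts.iter().enumerate() {
        let expected = i as u32 + 1;
        if n != expected {
            return Err(format!("expected part {expected}, found {n}"));
        }
    }
    Ok(())
}
```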

Advanced Multipart Features

Checksum Type Support: FULL_OBJECT vs COMPOSITE checksum types
Upload Metadata Persistence: Stores algorithm and type choices from CreateMultipartUpload
Part-Level Checksums: Individual part integrity verification during upload
ETag Generation: Proper multipart ETag calculation (MD5 of concatenated part MD5s + part count)
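
The ETag rule above (MD5 of the concatenated part MD5s, plus the part count) can be sketched as follows. `multipart_etag` is a hypothetical helper, not the PR's code. To keep the sketch free of external crates, the MD5 function is passed in as a parameter; a real implementation would supply an actual MD5 routine there.

```rust
/// Build a multipart-style ETag from the raw 16-byte MD5 digest of each part.
/// `md5` is injected as an assumption of this sketch so that no crate is needed.
/// The result is returned without the surrounding double quotes that the
/// `ETag` header value normally carries.
fn multipart_etag<F>(part_digests: &[[u8; 16]], md5: F) -> String
where
    F: Fn(&[u8]) -> [u8; 16],
{
    // Concatenate the binary (not hex-encoded) digests in part order.
    let concatenated: Vec<u8> = part_digests.iter().flatten().copied().collect();
    // Hash the concatenation once more and hex-encode the result.
    let digest = md5(&concatenated);
    let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
    // Append the part count after a dash.
    format!("{hex}-{}", part_digests.len())
}
```

For an upload with three parts the result has the form `<32 hex characters>-3`, which is why a multipart ETag is not the MD5 of the whole object.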

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                     S3MockServer                            │
│                 (Public API, Builder Pattern)               │
└───────────────────────────────┬─────────────────────────────┘
                                │
┌───────────────────────────────▼─────────────────────────────┐
│                 Inner<S: StorageBackend>                    │
│                 (implements s3s::S3 trait)                  │
└───────────────────────────────┬─────────────────────────────┘
                                │
┌───────────────────────────────▼─────────────────────────────┐
│                   StorageBackend                            │
│              (Data + Metadata Storage)                      │
├─────────────────────────┬─────┴─────────────────────────────┤
│    InMemoryStorage      │       FilesystemStorage           │
└─────────────────────────┴───────────────────────────────────┘

Files Overview

Core Implementation

• s3-mock-server/src/lib.rs - Main library interface
• s3-mock-server/src/s3s.rs - S3 API implementation using s3s library
• s3-mock-server/src/server.rs - Server builder and management
• s3-mock-server/src/error.rs - Error types and S3 error mapping

Storage Layer

• s3-mock-server/src/storage.rs - StorageBackend trait and interfaces
• s3-mock-server/src/storage/models.rs - Data models with checksum support
• s3-mock-server/src/storage/in_memory.rs - In-memory storage implementation
• s3-mock-server/src/storage/filesystem.rs - Filesystem storage implementation

Supporting Infrastructure

• s3-mock-server/src/types.rs - Checksum and integrity types
• s3-mock-server/src/streaming.rs - Data streaming utilities

Usage Example

use s3_mock_server::S3MockServer;

// Create server with in-memory storage
let server = S3MockServer::builder()
    .with_in_memory_store()
    .with_port(9000)
    .build()?;

// Start server and get AWS client
let handle = server.start().await?;
let s3_client = handle.client();

// Use with Transfer Manager
let tm_config = aws_sdk_s3_transfer_manager::Config::builder()
    .client(s3_client)
    .build();

Next Steps (Future Work)

Priority 1: CompleteMultipartUpload Checksum Validation

• Implement full object checksum validation
• Support composite checksum calculation from part checksums
• Add BadDigest error responses for checksum mismatches

Priority 2: Network Simulation

• Add latency, jitter, and bandwidth limiting
• Implement error injection capabilities
• Support test-specific behaviors

Priority 3: Benchmarking Utilities

• Performance measurement tools
• Throughput and latency metrics
• Comparative analysis features

Impact

This implementation seeks to provide:

  1. Realistic Testing Environment: Full S3 API compatibility for comprehensive testing
  2. Concurrent Operation Testing: Ability to test multi-threaded scenarios that are difficult with mocks
  3. Data Integrity Verification: Complete checksum support for validating Transfer Manager behavior
  4. Performance Benchmarking: Controlled environment for measuring Transfer Manager performance
  5. Development Velocity: Faster iteration cycles without requiring real S3 infrastructure

The S3 Mock Server bridges the gap between simple unit test mocks and full integration testing.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Contributor

@ysaito1001 ysaito1001 left a comment

Great work, love to see what we can build on top of this!

Comment on lines +207 to +208
/// * `key` - The object key
/// * `range` - Optional byte range to retrieve
Contributor

I assume these are a breakdown of GetObjectRequest (and the same applies to other API docs for this trait)?

// and VecDeque<Bytes> operations are atomic at the individual element level
unsafe impl Sync for VecByteStream {}

impl VecByteStream {
Contributor

Are methods pub(crate) if the struct itself is pub(crate)?

Comment on lines +99 to +108
/// root/
/// ├── objects/
/// │ ├── my-file.txt # Object data
/// │ └── my-file.txt.metadata # Object metadata (JSON)
/// ├── uploads/
/// │ ├── upload-123/
/// │ │ ├── metadata.json # Upload metadata
/// │ │ ├── part-1.dat # Part data
/// │ │ └── part-1.metadata # Part metadata
/// │ └── ...
Contributor

Nice visualization!

Comment on lines +214 to +215
// Helper method to list all objects in a directory
// Helper method to list all objects in a directory recursively
Contributor

Somewhat duplicated comments?

Contributor

@landonxjames landonxjames left a comment

Looks awesome! Excited to see what error injection looks like. I think having that capability could have helped catch bugs like this one in CRT S3 awslabs/aws-c-s3#543. Wonder if along those lines C bindings should be on our radar so CRT can use it as well?

@aajtodd
Contributor Author

aajtodd commented Sep 10, 2025

Wonder if along those lines C bindings should be on our radar so CRT can use it as well?

Ya I've wondered this as well, though I'm curious if shipping a binary server that can be spun up might be easier.

@landonxjames
Contributor

I'm curious if shipping a binary server that can be spun up might be easier.

Interesting idea, I like it. I think the difficult part would be modeling the fault injection in a way that could be passed in to the binary? I guess it could have a human writable serialized format that you could pass in on startup like:

s3-server --port 1234 --errors error-model.json

That format would get complicated pretty quickly though.

@aajtodd aajtodd marked this pull request as ready for review September 18, 2025 19:08
@aajtodd aajtodd requested a review from a team as a code owner September 18, 2025 19:08
@aajtodd aajtodd merged commit 14435dc into main Sep 18, 2025
16 checks passed
@aajtodd aajtodd deleted the test-server branch September 18, 2025 20:07