The model should be capable of testing different factors.
It should support both long and short positions.
The backtest should report the following key metrics:
- Sharpe Ratio
- Sortino Ratio
- Annualized Return
- Cumulative Return
- Max Drawdown
- Number of trades over the period
Whenever the Sharpe Ratio exceeds 2.5, we perform further analysis:
- Equity curve of the strategy (compared with the BTC price chart)
- 3D visualization to check for potential overfitting (see the sketch below)
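When the parameter grid is two-dimensional, a surface plot of Sharpe ratios makes isolated spikes (a common overfitting symptom) easy to spot. A minimal sketch using matplotlib; the `results` DataFrame and its `param_1`, `param_2`, and `sharpe_ratio` columns are assumptions, not the project's actual schema:

```python
# Hypothetical sketch: surface plot of Sharpe ratio over a 2D parameter grid.
# Column names (param_1, param_2, sharpe_ratio) are assumptions.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_sharpe_surface(results: pd.DataFrame) -> None:
    # Pivot the flat results table into a grid: rows = param_1, cols = param_2
    grid = results.pivot(index="param_1", columns="param_2", values="sharpe_ratio")
    X, Y = np.meshgrid(grid.columns.values, grid.index.values)

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(X, Y, grid.values, cmap="viridis")
    ax.set_xlabel("param_2")
    ax.set_ylabel("param_1")
    ax.set_zlabel("Sharpe Ratio")
    plt.show()
```

A smooth plateau of good Sharpe ratios suggests a robust parameter region; a single sharp peak suggests the parameters are fit to noise.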
```mermaid
graph TD
    B --> E
    H --> I
    H --> K
    M --> I
    J --> P
    subgraph Data Process
        A[Download Data] --> B[Load and Process Data]
    end
    subgraph Factor Initialization
        E[Initialize Factor Engine] --> F[Define and Register Factor]
        F --> G[Calculate Factor Values]
        G --> H[Initialize Factor-Based Strategy]
    end
    subgraph Backtest
        I[Run Backtest with Parameters] --> J[Evaluate Performance Metrics]
    end
    subgraph Optimization
        K[Define Parameter Search Space] --> L[Run Parameter Optimization]
        L --> M[Find Optimal Parameters]
    end
    subgraph Save Results
        P[Save Results to CSV and TXT Files] --> Q[Visualize Optimization Results]
    end
```
```mermaid
graph LR
    A[Data Module]
    B[Factor Engine]
    C[Backtest Engine]
    D[Performance Evaluation]
    subgraph Evaluation
        D1[Performance Metrics] --> D
        D2[Visualization] --> D
    end
    subgraph Backtest System
        C1[Strategy Base Class] --> C
        C2[Trading Strategy] --> C
    end
    subgraph Factor System
        B1[Factor Base Class] --> B
        B2[Custom Factors] --> B
    end
    subgraph Data Pipeline
        A1[Exchange Data] --> A
        A2[External Data] --> A
    end
```
```mermaid
classDiagram
    class DataDownloader {
        download_data()
        process_data()
        fetch_and_process_data()
    }
    class DataLoader {
        load_data()
        merge_data()
        preprocess_data()
    }
    DataDownloader --> DataLoader
```
The DataDownloader class is designed to download and process cryptocurrency market data from various exchanges. Here's a comprehensive guide on how to use it.
```python
from data.data_downloader import DataDownloader

# Initialize DataDownloader
downloader = DataDownloader(
    symbol="BTCUSDT",          # Trading pair
    interval="1d",             # Time interval
    start_date="2023-01-01",   # Start date
    end_date="2023-12-31",     # End date
    data_folder="./dataset",   # Data storage location
    data_type="spot",          # Market type
    exchange="binance"         # Exchange name
)

# Download and process data
data = downloader.fetch_and_process_data()
```
Parameters Explanation
- Required Parameters
  - symbol: Trading pair symbol (e.g., "BTCUSDT", "ETHUSDT")
  - interval: Kline interval (e.g., "1m", "5m", "1h", "1d")
  - start_date: Start date in "YYYY-MM-DD" format
  - end_date: End date in "YYYY-MM-DD" format
  - data_folder: Directory path for data storage
- Optional Parameters
  - data_type: Market type (default: "spot")
    - "spot": Spot market data
    - "futures": Futures market data
  - exchange: Exchange name (default: "binance")
File Storage Structure

```
dataset/
└── binance/
    └── BTCUSDT/
        └── spot/
            └── 1d/
                └── BTCUSDT_1d_2023-01-01_to_2023-12-31.h5
```
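The DataLoader side of the pipeline reads these files back and prepares them for the factor engine. A hypothetical usage sketch; the module path, constructor arguments, and method signatures are assumptions based only on the class diagram above:

```python
# Hypothetical sketch based on the class diagram; the import path and all
# signatures are assumptions, not the actual API.
from data.data_loader import DataLoader

loader = DataLoader(data_folder="./dataset")

# Load the HDF5 file written by DataDownloader
price_data = loader.load_data(
    symbol="BTCUSDT", interval="1d",
    start_date="2023-01-01", end_date="2023-12-31",
)

# Merge external data (e.g., on-chain metrics) and clean the result
merged = loader.merge_data(price_data)
data = loader.preprocess_data(merged)
```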
```mermaid
classDiagram
    class FactorEngine {
        register_factor()
        calculate_factors()
        list_factors()
    }
    class BaseFactor {
        __init__()
        calculate()
    }
    class CustomFactor {
        __init__()
        calculate()
    }
    BaseFactor <|-- CustomFactor
    FactorEngine --> BaseFactor
```
Example
The provided code defines a USDT Issuance Factor as a class named USDTIssuance2Factor. This factor is part of a trading strategy framework and is designed to generate trading signals based on the daily issuance of USDT.
The USDTIssuance2Factor analyzes USDT issuance changes and determines whether to buy, sell, or hold a position based on predefined thresholds:
- Long Signal (1): When the issuance exceeds the upper threshold.
- Short Signal (-1): When the issuance falls below the lower threshold.
- Close Signal (0): When the issuance is between the two thresholds.
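A minimal sketch of such a factor; the BaseFactor interface (inferred from the class diagram), the import path, and the `usdt_issuance` column name are assumptions:

```python
# Minimal sketch of the factor described above; the BaseFactor interface and
# the 'usdt_issuance' column name are assumptions based on the class diagram.
import numpy as np
import pandas as pd

from factor.base_factor import BaseFactor  # assumed module path

class USDTIssuance2Factor(BaseFactor):
    def __init__(self, upper_threshold: float, lower_threshold: float):
        self.upper_threshold = upper_threshold
        self.lower_threshold = lower_threshold

    def calculate(self, data: pd.DataFrame) -> np.ndarray:
        issuance = data["usdt_issuance"].values
        signals = np.zeros(len(issuance))             # 0: close / no position
        signals[issuance > self.upper_threshold] = 1  # long signal
        signals[issuance < self.lower_threshold] = -1 # short signal
        return signals
```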
```mermaid
classDiagram
    class BacktestEngine {
        execute_trade()
    }
    class BaseStrategy {
        generate_signals()
    }
    class PerformanceEvaluator {
        calculate_metrics()
    }
    BaseStrategy --> BacktestEngine
    BacktestEngine --> PerformanceEvaluator
```
Trading Strategy
- Signal-Based Trading:
  - The strategy depends on signals (buy, sell, or hold) generated by a strategy class.
  - Signals are numeric values:
    - 1: Long signal
    - -1: Short signal
    - 0: Close position (no action)
- Trade Execution (see the sketch after the slippage rules below):
  - No Position:
    - Signal 1: Open long position
    - Signal -1: Open short position
    - Signal 0: No action
  - Long Position:
    - Signal 1: No action
    - Signal -1: Switch to short (sell 2x position)
    - Signal 0: Close position
  - Short Position:
    - Signal 1: Switch to long (buy 2x position)
    - Signal -1: No action
    - Signal 0: Close position
- Added a slippage parameter, initialized to 0.001. The current logic is as follows:
  - Open Long Position: Buy at price * (1 + slippage)
  - Open Short Position: Sell at price * (1 - slippage)
  - Close Long Position: Sell at price * (1 - slippage)
  - Close Short Position: Buy at price * (1 + slippage)
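A condensed sketch of the execution and slippage rules above; the function and variable names are illustrative, not the engine's actual internals:

```python
# Illustrative sketch of the execution rules above; names are hypothetical.
SLIPPAGE = 0.001

def execute_trade(position: int, signal: int, price: float) -> tuple[int, float]:
    """Return the new position and the effective fill price."""
    if signal == position or signal not in (-1, 0, 1):
        return position, price                # no action
    if signal == 1:
        # Open long, or switch short -> long (a switch buys 2x the size)
        return 1, price * (1 + SLIPPAGE)      # buys pay slippage upward
    if signal == -1:
        # Open short, or switch long -> short (a switch sells 2x the size)
        return -1, price * (1 - SLIPPAGE)     # sells receive slippage downward
    # signal == 0: close whichever position is open
    if position == 1:
        return 0, price * (1 - SLIPPAGE)      # sell to close long
    return 0, price * (1 + SLIPPAGE)          # buy to close short
```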
Performance Metrics Explanation and Formulas
- Total Return
  - Description: Measures the overall return of the portfolio relative to the initial investment.
  - Formula:
$$ \text{Total Return} = \frac{\text{Final Portfolio Value} - \text{Initial Investment}}{\text{Initial Investment}} $$
- Annualized Return
  - Description: Adjusts the total return to an annualized rate, taking into account the duration of the investment.
  - Formula:
$$ \text{Annualized Return} = \left(1 + \text{Total Return}\right)^{\frac{1}{\text{Years}}} - 1 $$
$$ \text{Years} = \frac{\text{End Date} - \text{Start Date}}{365.25} $$
- Sharpe Ratio
  - Description: Measures the strategy's risk-adjusted return by comparing the excess return (return above the risk-free rate) to its volatility.
  - Formula:
$$ \text{Sharpe Ratio} = \frac{\text{Mean(Excess Daily Returns)}}{\text{Standard Deviation of Daily Returns}} \times \sqrt{365} $$
$$ \text{Excess Daily Returns} = \text{Daily Returns} - \frac{\text{Risk-Free Rate (0.0)}}{365} $$
- Sortino Ratio
  - Description: Similar to the Sharpe Ratio, but focuses only on downside risk, which is more relevant for risk-averse investors.
  - Formula:
$$ \text{Sortino Ratio} = \frac{\text{Mean(Excess Daily Returns)}}{\text{Downside Deviation}} \times \sqrt{365} $$
$$ \text{Excess Daily Returns} = \text{Daily Returns} - \frac{\text{Risk-Free Rate (0.0)}}{365} $$
$$ \text{Downside Deviation} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\text{Negative Returns}_i)^2} $$
  - Only negative returns (below the target return of 0.0) are considered.
- Maximum Drawdown
  - Description: Measures the largest peak-to-trough decline in portfolio value, representing the worst potential loss.
  - Formula:
$$ \text{Max Drawdown} = \min(\text{Cumulative Portfolio Returns} - \text{Running Maximum}) $$
$$ \text{Running Maximum}_t = \max(\text{Cumulative Portfolio Returns}_{1..t}) $$
- Number of Trades
  - Description: Counts the total number of trades executed during the backtest.
  - Formula:
$$ \text{Number of Trades} = \text{Count(Position Changes)} $$
  - Each change in position (buy or sell) is counted as a trade.
- Cumulative Returns
  - Description: Tracks the portfolio's return over time relative to the initial investment.
  - Formula:
$$ \text{Cumulative Return at Time } t = \frac{\text{Portfolio Value at } t}{\text{Initial Investment}} $$
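A compact sketch applying these formulas to a daily portfolio value series with NumPy; the trade count is omitted since it requires the position series:

```python
# Sketch of the metric formulas above, assuming a daily portfolio value series.
import numpy as np

def performance_metrics(values: np.ndarray, risk_free_rate: float = 0.0) -> dict:
    daily_returns = np.diff(values) / values[:-1]
    excess = daily_returns - risk_free_rate / 365

    total_return = values[-1] / values[0] - 1
    years = len(values) / 365.25                 # daily bars, so length ~ days
    annualized = (1 + total_return) ** (1 / years) - 1

    sharpe = excess.mean() / daily_returns.std() * np.sqrt(365)

    # Downside deviation: only returns below the 0.0 target contribute
    negative = np.minimum(daily_returns, 0.0)
    downside_dev = np.sqrt(np.mean(negative ** 2))
    sortino = excess.mean() / downside_dev * np.sqrt(365)

    cumulative = values / values[0]
    running_max = np.maximum.accumulate(cumulative)
    max_drawdown = np.min(cumulative - running_max)

    return {
        "total_return": total_return,
        "annualized_return": annualized,
        "sharpe_ratio": sharpe,
        "sortino_ratio": sortino,
        "max_drawdown": max_drawdown,
    }
```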
The system has undergone significant performance optimization, achieving a 23x speed improvement. Here are the key optimization strategies:
- Before: Frequent Pandas DataFrame operations
- After: Using NumPy arrays for calculations
- Why it's faster:
- Lower-level implementation
- No index overhead
- More efficient memory access patterns
- Direct CPU array operations
- Before: Loop-based calculations and multiple if-else statements
- After: Vectorized operations using `np.where` and array operations
- Why it's faster:
- Leverages CPU's SIMD (Single Instruction Multiple Data) capabilities
- Reduces branch prediction failures
- Allows parallel processing at CPU level
- Minimizes Python interpreter overhead
- Before: Frequent DataFrame updates and copies
- After: Batch operations on NumPy arrays
- Why it's faster:
- Reduced memory allocations
- Fewer data copies
- Better cache utilization
- Single DataFrame update at the end
```python
import numpy as np

# Before Optimization: row-by-row loop over the DataFrame (slow)
holdings_col = portfolio.columns.get_loc('holdings')
for i in range(1, len(data)):
    if portfolio.iloc[i - 1, holdings_col] == 0:
        if signals.iloc[i]['signal'] == 1:
            portfolio.iloc[i, holdings_col] = 1
        elif signals.iloc[i]['signal'] == -1:
            portfolio.iloc[i, holdings_col] = -1

# After Optimization: a single vectorized pass over NumPy arrays
signal_array = signals['signal'].values
holdings_array = np.where(signal_array == 1, 1,
                          np.where(signal_array == -1, -1, 0))
portfolio['holdings'] = holdings_array
```
- Trading logic execution: 23x faster
- Memory usage: Significantly reduced
- Code maintainability: Improved through consistent vectorization patterns
- Scalability: Better handling of large datasets
- Use NumPy arrays for numerical computations whenever possible
- Vectorize operations instead of using loops
- Minimize DataFrame operations and perform them in batch
- Keep data in contiguous memory blocks
- Reduce object creation and copying
- Use appropriate data structures for the task
These optimizations demonstrate how proper vectorization and data structure selection can dramatically improve performance in Python data processing applications.
The FactorEngine has been optimized with a state management system:
- State Reset Mechanism
  - Added a `reset()` method to clear cached factor values and signals
  - Allows reuse of FactorEngine instances across multiple tests
  - Maintains factor definitions while clearing computed results (see the sketch below)
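A sketch of what such a reset can look like; the attribute names are assumptions:

```python
# Sketch of the reset mechanism; attribute names are assumptions.
class FactorEngine:
    def __init__(self):
        self.factors = {}        # registered factor definitions (kept on reset)
        self.factor_values = {}  # cached computed values (cleared on reset)
        self.signals = None      # cached signals (cleared on reset)

    def reset(self) -> None:
        """Clear cached results while keeping factor registrations."""
        self.factor_values.clear()
        self.signals = None
```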
- Memory Management
  - Efficient reuse of engine instances reduces memory allocation
  - Prevents memory leaks during large-scale optimization
  - Minimizes garbage collection overhead
- Parallel Processing
  - Automatic CPU core detection and limitation
  - Optimized batch size based on CPU cores

```python
n_jobs = min(psutil.cpu_count(), 32)  # Limit max processes
batch_size = max(10, n_jobs * 10)     # Dynamic batch sizing
```
- Resource Management
  - Pre-allocated FactorEngine pool for each process
  - Cyclic engine reuse pattern to minimize resource consumption

```python
factor_engines = [FactorEngine() for _ in range(n_jobs)]
# ...
engine = factor_engines[i % len(factor_engines)]  # Cyclic usage
```
- Batch Processing
  - Efficient parameter combination testing in batches
  - Reduced inter-process communication overhead
  - Optimized progress tracking with batch updates (combined with pooling in the sketch below)
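Putting the pooling, batching, and progress-tracking ideas together, a single-process sketch; `run_backtest` and `param_combos` are hypothetical placeholders, and in the real system each batch would be dispatched to worker processes:

```python
# Single-process sketch of the pooling + batching pattern described above;
# run_backtest and param_combos are hypothetical placeholders.
import time
import psutil

n_jobs = min(psutil.cpu_count(), 32)   # limit max worker processes
batch_size = max(10, n_jobs * 10)      # dynamic batch sizing

factor_engines = [FactorEngine() for _ in range(n_jobs)]  # pre-allocated pool
results, best_sharpe = {}, float("-inf")
start_time = time.time()

for i, combo in enumerate(param_combos):
    engine = factor_engines[i % len(factor_engines)]  # cyclic engine reuse
    engine.reset()                                    # clear cached state
    metrics = run_backtest(engine, combo)             # hypothetical objective
    results[combo] = metrics
    best_sharpe = max(best_sharpe, metrics["sharpe_ratio"])
    if (i + 1) % batch_size == 0:                     # batch-level progress update
        elapsed = time.time() - start_time
        print(f"{i + 1}/{len(param_combos)} done, "
              f"best Sharpe {best_sharpe:.2f}, {elapsed:.0f}s elapsed")
```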
- Performance Monitoring
  - Real-time tracking of the best Sharpe ratio
  - Elapsed time monitoring per combination
  - Batch-level progress updates
- Memory Efficiency
  - Reduced memory allocation frequency
  - Better memory usage patterns
  - Minimized object creation/destruction cycles
- Processing Speed
  - Optimized parallel execution
  - Efficient resource utilization
  - Reduced system call overhead
- Scalability
  - Handles large parameter spaces efficiently
  - Automatic resource allocation
  - Balanced CPU utilization
- Reliability
  - Robust error handling
  - Process isolation
  - State management consistency
- Vectorized Operations
  - Pre-allocated numpy arrays for signals
  - Batch factor value processing
  - Reduced DataFrame operations

```python
signals = np.zeros(len(data))
signals[ma_1 > ma_2] = 1
signals[ma_1 < ma_2] = -1
```
- Efficient Moving Average Calculation
  - Used numpy's convolution for MA computation
  - Optimized padding for missing values

```python
ma = np.convolve(values, np.ones(ma_period) / ma_period, mode='valid')
ma = np.pad(ma, (ma_period - 1, 0), mode='edge')
```
- Vectorized Signal Generation
  - Single-pass signal calculation
  - Eliminated loops and conditionals
  - Optimized threshold comparisons

```python
signals[values > self.upper_threshold] = 1
signals[values < self.lower_threshold] = -1
```
- Memory Management
  - Pre-allocated result arrays
  - Cached factor values in numpy arrays
  - Reduced DataFrame conversions

```python
factor_arrays = np.empty((self._data_length, len(self.factors)))
```
- Computation Optimization
  - Single-pass factor calculation
  - Efficient state management
  - Optimized reset mechanism

```python
self.factor_values[name] = factor_arrays[:, i]
```
- Timestamp Processing
  - Direct integer conversion
  - Avoided string operations
  - Used compact data types

```python
# Convert nanosecond epoch timestamps to seconds stored as compact int32
data['timestamp_start'] = (data['timestamp_start'].astype(np.int64) // 1000000000).astype(np.int32)
```
- Data Selection
  - Used views instead of copies
  - Pre-defined required columns
  - Optimized memory usage

```python
# Shallow copy: a new frame whose column data is shared with the original
data = data[required_columns].copy(deep=False)
```
- Memory Efficiency
  - Reduced memory allocations
  - Used numpy arrays instead of DataFrames where possible
  - Implemented efficient data type conversions
  - Minimized data copying
- Computational Speed
  - Vectorized operations throughout
  - Eliminated loops and conditionals
  - Reduced DataFrame operations
  - Optimized numerical computations
- Data Processing
  - Efficient timestamp handling
  - Optimized data selection
  - Reduced type conversions
  - Minimized string operations
- Resource Management
  - Efficient memory usage
  - Optimized state management
  - Reduced object creation
  - Better garbage collection
The optimizer showed memory leaks during large-scale parameter optimization, particularly when processing multiple trading pairs and timeframes.
First attempt: continuous memory monitoring.

```python
# Added aggressive memory management
import gc
import psutil

memory_threshold = 85.0  # Threshold in percent (memory_percent() returns 0-100)
memory_usage = psutil.Process().memory_percent()
if memory_usage > memory_threshold:
    gc.collect()
```

Result: Significant performance degradation due to frequent memory checks and garbage collection
Second attempt: restructuring the results container.

```python
# Changed data structure
results = []  # Instead of results = {}
results.append((combo, result))
results_dict = dict(results)
```

Result: Added unnecessary conversion overhead without memory benefits
Final approach: keep the structures simple and clean up periodically.

```python
# Keep original dictionary structure
results = {}

# Periodic cleanup only
if i % (self.batch_size * 5) == 0:
    gc.collect()

# Factor engine pooling with minimal management
factor_engines = [FactorEngine() for _ in range(n_jobs)]

# Clean up batch results immediately
del processed_results
del valid_results
```
- Less is More
  - Minimal memory management performs better
  - Avoid frequent garbage collection
  - Keep data structures simple
- Resource Management
  - Pool and reuse resources where possible
  - Clean up resources immediately after use
  - Use periodic rather than continuous cleanup
- Performance Impact
  - Frequent garbage collection significantly slows processing
  - Data structure conversions add unnecessary overhead
  - Simple periodic cleanup provides the best balance
- Memory Management
  - Use periodic cleanup instead of continuous monitoring
  - Keep original data structures when possible
  - Clean up batch results immediately
- Resource Handling
  - Pool resources for reuse
  - Implement cleanup in finally blocks
  - Reset pooled resources between uses
- Code Structure
  - Maintain simple, direct code paths
  - Avoid unnecessary data transformations
  - Focus on essential cleanup points
- Simplified Signal Processing
  - Removed nested `np.where` conditional checks
  - Replaced with direct array operations
  - Achieved a 4x performance improvement
- Before Optimization:

```python
holdings_array = np.where(signal_array == 1, 1,
                          np.where(signal_array == -1, -1, 0))
```

- After Optimization:

```python
# Direct boolean-mask assignment instead of nested np.where
holdings_array[signal_array == 1] = 1
holdings_array[signal_array == -1] = -1
holdings_array[signal_array == 0] = 0
```
- Performance Improvement Reasons:
- Eliminated conditional check overhead
- Avoided temporary array creation
- More efficient memory access patterns
- Reduced CPU instruction count
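A quick way to sanity-check this comparison on your own hardware; a micro-benchmark sketch using `timeit`:

```python
# Micro-benchmark sketch comparing nested np.where against boolean masks.
import timeit
import numpy as np

signal_array = np.random.choice([-1, 0, 1], size=1_000_000)

def with_where():
    return np.where(signal_array == 1, 1,
                    np.where(signal_array == -1, -1, 0))

def with_masks():
    holdings = np.empty_like(signal_array)
    holdings[signal_array == 1] = 1
    holdings[signal_array == -1] = -1
    holdings[signal_array == 0] = 0
    return holdings

print("np.where:     ", timeit.timeit(with_where, number=100))
print("boolean masks:", timeit.timeit(with_masks, number=100))
```

Results vary with array size, signal distribution, and NumPy version, so it is worth measuring on data shaped like your own.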