
Conversation


@sli-tao sli-tao commented Oct 7, 2025

Taoshi Pull Request

Description

[Provide a brief description of the changes introduced by this pull request.]

Related Issues (JIRA)

[Reference any related issues or tasks that this pull request addresses or closes.]

Checklist

  • I have tested my changes on testnet.
  • I have updated any necessary documentation.
  • I have added unit tests for my changes (if applicable).
  • If there are breaking changes for validators, I have (or will) notify the community in Discord of the release.

Reviewer Instructions

[Provide any specific instructions or areas you would like the reviewer to focus on.]

Definition of Done

  • Code has been reviewed.
  • All checks and tests pass.
  • Documentation is up to date.
  • Approved by at least one reviewer.

Checklist (for the reviewer)

  • Code follows project conventions.
  • Code is well-documented.
  • Changes are necessary and align with the project's goals.
  • No breaking changes introduced.

Optional: Deploy Notes

[Any instructions or notes related to deployment, if applicable.]

/cc @mention_reviewer


github-actions bot commented Oct 7, 2025

🤖 Claude AI Code Review

Last reviewed on: 14:23:47


Summary

This PR updates forex slippage calculations and significantly rebalances scoring weights, shifting emphasis from diversified metrics (50% PnL + 50% risk-adjusted ratios) to PnL-dominant scoring (90% PnL + 10% risk-adjusted ratios). It also introduces time-based forex slippage differentiation and fixes a bug in PnL aggregation across subcategories.


✅ Strengths

  1. Bug Fix Identified: The addition of pnl_gain and pnl_loss aggregation in asset_segmentation.py (lines 113-114) appears to fix a legitimate bug where these fields weren't being summed during checkpoint aggregation.

  2. Test Coverage: Good test coverage added for the aggregation fix (test_aggregate_pnl_single_subcategory) with explicit verification of aggregated values.

  3. Parameterization: Making weighting_distribution() and average() methods accept configurable decay parameters improves flexibility and testability.

  4. Time-Zone Awareness: Using ZoneInfo for timezone handling in slippage calculations is the modern, correct approach.
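The aggregation fix in point 1 can be illustrated with a minimal sketch. The field names `pnl_gain` and `pnl_loss` come from the review; everything else (the `Checkpoint` dataclass, the `aggregate` helper, the sample values) is invented for illustration — the real logic lives in asset_segmentation.py:

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    # Field names taken from the review; other checkpoint fields omitted.
    pnl_gain: float   # sum of positive PnL over the checkpoint window
    pnl_loss: float   # sum of negative PnL (stored as a negative value)

def aggregate(checkpoints: list[Checkpoint]) -> Checkpoint:
    agg = Checkpoint(pnl_gain=0.0, pnl_loss=0.0)
    for cp in checkpoints:
        # Before the fix, these two fields were silently dropped
        # during subcategory aggregation.
        agg.pnl_gain += cp.pnl_gain
        agg.pnl_loss += cp.pnl_loss
    return agg
```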


⚠️ Concerns

CRITICAL: Scoring Weight Changes

Impact Analysis Required - vali_config.py lines 241-247:

# Changed from balanced approach to PnL-dominant
SCORING_PNL_WEIGHT = 0.9  # Was 0.5
# All other metrics reduced from 0.1 to 0.02

This is a fundamental change to the incentive structure:

  • Reduces emphasis on risk-adjusted returns by 80%
  • Makes system vulnerable to high-risk, high-variance strategies
  • Miners can now ignore Sharpe, Sortino, Omega, Calmar ratios almost entirely
  • Could encourage reckless trading behavior since risk metrics now matter 5x less
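A toy comparison makes the incentive shift concrete. All metric scores below are invented, and "risk" stands in for the combined Sharpe/Sortino/Omega/Calmar component; the weights approximate the before/after split described above:

```python
OLD_WEIGHTS = {"pnl": 0.5, "risk": 0.5}  # former balanced split
NEW_WEIGHTS = {"pnl": 0.9, "risk": 0.1}  # new PnL-dominant split

def composite(scores: dict, weights: dict) -> float:
    # Weighted sum of normalized metric scores in [0, 1]
    return sum(weights[k] * scores[k] for k in weights)

risky  = {"pnl": 0.95, "risk": 0.30}  # large returns, poor risk ratios
steady = {"pnl": 0.70, "risk": 0.90}  # modest returns, strong ratios

# Under the old weights the steady miner ranks higher (0.80 vs 0.625);
# under the new weights the risky miner overtakes it (0.885 vs 0.72).
```

The ranking flip is the concern in a nutshell: the same two miners trade places purely because of the weight change.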

Questions:

  1. Has this been validated through backtesting with historical miner data?
  2. What's the risk of encouraging overleveraged positions?
  3. How does this align with the stated goal of "long-term value of strategies" mentioned in the docs?

CRITICAL: Incomplete PR Template

The PR description has empty sections:

  • No description of changes provided
  • No related JIRA issues
  • No reviewer instructions
  • All checkboxes unchecked

This makes it difficult to understand the business context and rationale.

MAJOR: Forex Slippage Model Change

price_slippage_model.py lines 116-126:

if 17 <= hour < 18:  # Daily 5-6 pm EST
    return 0.001     # 10 bps
else:
    return 0.0005    # 5 bps

Issues:

  1. Magic Numbers: No explanation for why 10 bps vs 5 bps
  2. Timezone Edge Cases: What happens during DST transitions?
  3. Simplified Model: The previous model used actual spread and volatility data. This flat-rate approach may be less accurate
  4. G1 vs Other Distinction Removed: The previous V2 logic had different rates for G1 (10 bps) vs others (15 bps). Now all forex pairs are treated identically except for time of day
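A DST-safe version of the time check would convert the timestamp to New York local time rather than assuming a fixed EST offset. This is a sketch, not the PR's actual code — the function name and signature are hypothetical:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# DST-aware zone; a fixed UTC-5 "EST" offset would be wrong half the year
NEW_YORK = ZoneInfo("America/New_York")

def forex_slippage_rate(timestamp_ms: int) -> float:
    """Return the flat slippage rate based on the New York local hour."""
    local = datetime.fromtimestamp(
        timestamp_ms / 1000, tz=timezone.utc
    ).astimezone(NEW_YORK)
    if 17 <= local.hour < 18:  # 5-6 PM local time: daily rollover window
        return 0.001           # 10 bps
    return 0.0005              # 5 bps
```

Because the comparison runs on the localized hour, the 5–6 PM window tracks wall-clock time through DST transitions automatically.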

MODERATE: Aggressive PnL Weighting

vali_config.py line 173:

WEIGHTED_AVERAGE_DECAY_MIN_PNL = 0.045  # 70% weight in first 30 days

Combined with the documentation change showing:

  • First 10 days: 40% of total score
  • First 30 days: 70% of total score
  • First 70 days: 87% of total score

This creates extreme recency bias:

  • A bad week can destroy months of good performance
  • Discourages long-term strategic thinking
  • May cause miners to game the system with short-term tactics
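Assuming a plain exponential decay at the stated per-day rate (0.045 — an assumption; the actual `weighting_distribution()` may normalize differently), the concentration of weight in the recent window can be checked directly:

```python
import math

DECAY = 0.045          # per-day rate from WEIGHTED_AVERAGE_DECAY_MIN_PNL
LOOKBACK_DAYS = 365    # assumed normalization horizon

# weight of day d (0 = today), exponentially decayed
weights = [math.exp(-DECAY * d) for d in range(LOOKBACK_DAYS)]
total = sum(weights)

def share(first_n_days: int) -> float:
    """Fraction of total weight carried by the most recent n days."""
    return sum(weights[:first_n_days]) / total

# Under these assumptions the most recent 30 days carry roughly three
# quarters of the total weight -- the same order of magnitude as the
# documented 70%, confirming the recency-bias concern.
```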

MINOR: Test Precision

test_metrics.py line 685:

self.assertAlmostEqual(weighted_score, 70.0, delta=1.0, ...)

A delta of 1.0 (±1.43% tolerance) seems loose for validating a specific weighting distribution. Consider tightening to delta=0.5.


💡 Suggestions

1. Provide Business Context

Add to PR description:

  • Why the dramatic shift to PnL-focused scoring?
  • What problem does this solve?
  • What simulations/analyses support this change?

2. Staged Rollout

Consider a more gradual transition:

# Phase 1: Moderate shift
SCORING_PNL_WEIGHT = 0.65  # Instead of jumping to 0.9
SCORING_SHARPE_WEIGHT = 0.07  # Instead of dropping to 0.02

3. Add Constants for Forex Slippage

class ValiConfig:
    FOREX_SLIPPAGE_HIGH_VOLATILITY_HOUR_START = 17
    FOREX_SLIPPAGE_HIGH_VOLATILITY_HOUR_END = 18
    FOREX_SLIPPAGE_HIGH_BPS = 0.001
    FOREX_SLIPPAGE_LOW_BPS = 0.0005

4. Document Slippage Rationale

Add a comment explaining why 5-6 PM EST has higher slippage:

# Higher slippage during 5-6 PM EST because this window is the FX
# daily rollover (New York close): banks settle and roll positions,
# liquidity thins between the New York close and the Sydney open,
# and spreads widen accordingly

5. Add Validation for Weight Sum

# In vali_config.py or a test
assert abs(sum([
    SCORING_OMEGA_WEIGHT,
    SCORING_SHARPE_WEIGHT,
    SCORING_SORTINO_WEIGHT,
    SCORING_STATISTICAL_CONFIDENCE_WEIGHT,
    SCORING_CALMAR_WEIGHT,
    SCORING_RETURN_WEIGHT,
    SCORING_PNL_WEIGHT
]) - 1.0) < 0.001, "Scoring weights must sum to 1.0"

6. Add Integration Test

Test that the new weighting actually produces the documented distribution:

def test_pnl_weight_distribution_matches_docs():
    """Verify that 10 days = 40%, 30 days = 70%, 70 days = 87%"""
    # Create strategic test data and verify weights

7. Version Documentation

Add a changelog entry explaining V2 slippage model changes and the rationale.


🔒 Security Notes

  1. No Direct Security Vulnerabilities: The code changes don't introduce obvious security flaws.

  2. Economic Attack Surface: The extreme PnL weighting could enable economic attacks:

    • Miners might exploit short-term volatility
    • Wash trading detection becomes more critical
    • Front-running concerns if PnL is measured too frequently
  3. Timezone Dependency: The ZoneInfo import requires the system to have timezone data installed. Consider adding error handling:

    from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

    try:
        eastern_tz = ZoneInfo("America/New_York")
    except ZoneInfoNotFoundError:
        # Fail fast with a clear message; on slim images, install `tzdata`
        raise RuntimeError("IANA timezone database not available")

🎯 Action Items

Before Merging:

  1. MUST: Fill out PR template completely with business justification
  2. MUST: Provide backtesting results showing impact of weight changes
  3. MUST: Add constants for magic numbers in forex slippage
  4. ⚠️ SHOULD: Consider staged rollout of weight changes
  5. ⚠️ SHOULD: Add validation that scoring weights sum to 1.0
  6. ⚠️ SHOULD: Document slippage model V2 rationale in code comments
  7. 💡 CONSIDER: Tighten test precision tolerance

Risk Assessment: ⚠️ MEDIUM-HIGH - The scoring weight changes fundamentally alter system incentives and could have unintended consequences. Recommend thorough validation before deployment.

@sli-tao sli-tao force-pushed the sli--slippage branch 4 times, most recently from 08bd24b to ffaa6de on October 9, 2025 00:01
@sli-tao sli-tao changed the title from "Update forex slippage values" to "Update forex slippage values, increase pnl weight" on Oct 9, 2025
@sli-tao sli-tao merged commit 3ffcb63 into main Oct 9, 2025
4 checks passed
ward-taoshi pushed a commit that referenced this pull request Oct 9, 2025
* Update forex slippage values

* increase pnl weight in weight recent pnl more

---------

Co-authored-by: Derek Awender <[email protected]>
jbonilla-tao added a commit that referenced this pull request Nov 1, 2025
* WIP asset class scoring

* segmentation fix

* Add claude pr review (#611)

* Lower challengeperiod slashing (#610)

* lower challengeperiod slash rate

* reinstate miner deposit

* Update time

* Update forex slippage values, increase pnl weight (#612)

* Update forex slippage values

* increase pnl weight in weight recent pnl more

---------

Co-authored-by: Derek Awender <[email protected]>

* Remove subcategory logic

* Block trade pairs on signal receive

* Enforce min risk adjusted threshold

* Revert "Enforce min risk adjusted threshold"

This reverts commit 66f4d678268ee11f4125395cd87c8fe438090c00.

* debt ledger v1

* Add risk adjusted performance penalty to penalty ledger

* emissions ledger

* Add EmissionsLedgerManager with efficient delta updates

Implements continuous emissions tracking with delta updates to avoid
reprocessing historical data. Manager runs in background and efficiently
updates all hotkeys by only processing new 12-hour chunks.

Features:
- Delta updates: 11x faster, only processes new chunks
- State persistence: Saves/loads compressed ledgers to disk
- Background loop: Configurable update interval (default 6 hours)
- Auto hotkey management: Tracks new miners, removes inactive ones
- Default network changed to local subtensor

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add fallback for archive nodes without runtime API

Fixes issue where archive nodes don't have NeuronInfoRuntimeApi.get_neurons_lite.
Now falls back to direct substrate storage queries when metagraph API unavailable.

Fallback methods:
- _get_uid_for_hotkey_from_chain: Query Keys storage
- _get_registration_block_from_chain: Query BlockAtRegistration
- _get_all_hotkeys_from_chain: Iterate through all UIDs

These work on any node regardless of runtime API version.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add debug logging for hotkey search fallback

* Add subnet existence check and detailed debug logging

* Change emissions tracking to start from Sept 1, 2025 and default to finney

Changes:
- Default network changed from "local" to "finney" for both ledger and manager
- Added DEFAULT_START_DATE_MS constant (Sept 1, 2025 00:00:00 UTC)
- Updated build_emissions_ledger_for_hotkey to use default start date instead of registration block
- This avoids querying full historical data and focuses on recent emissions

Benefits:
- Faster initial build (no need to query from registration)
- Works immediately with mainnet without local node setup
- Still allows custom start_time_ms if needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add archive node endpoint support for historical queries

Public finney endpoints are lite nodes that don't keep historical state.
Added archive_endpoints parameter to EmissionsLedger and EmissionsLedgerManager
to support querying old blocks via archive nodes.

Changes:
- Added archive_endpoints parameter to EmissionsLedger.__init__()
- Added archive_endpoints parameter to EmissionsLedgerManager.__init__()
- Added --archive-endpoint CLI argument (can be specified multiple times)
- Subtensor initialized with archive_endpoints for fallback

Usage:
  # Using archive node
  python3 vali_objects/vali_dataclasses/emissions_ledger.py \
    --archive-endpoint wss://archive.chain.opentensor.ai:443

  # Or with local archive node
  python3 vali_objects/vali_dataclasses/emissions_ledger.py \
    --network local --archive-endpoint ws://127.0.0.1:9944

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Fix emissions query to use UID and add caching optimization

Fixed the "Storage function requires 1 parameters, 2 given" error by:
1. Querying emissions by UID instead of hotkey (Emission storage is indexed by netuid+uid)
2. Added _get_uid_for_hotkey_at_block to find UID at historical blocks
3. Implemented UID caching - once found, reuse the UID for subsequent blocks
4. query_emissions_at_block now returns (emissions, uid) tuple for caching

Performance optimization:
- First query searches all UIDs to find hotkey
- Subsequent queries try cached UID first (fast path)
- Only searches all UIDs if cached UID doesn't match (rare)
- Dramatically reduces queries for miners that stay on same UID

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Fix Emission storage query - takes netuid only, returns vector

* Fix emissions ledger conversion and add execution timing

Key fixes:
1. Corrected RAO to TAO conversion - Emission storage is per tempo (360 blocks), not per block
   - Now dividing by 360 to get per-block rate before multiplying by blocks_elapsed
   - This fixes the issue where emissions were ~360x too high

2. Added execution time tracking per hotkey
   - Logs elapsed time when build_emissions_ledger_for_hotkey completes

3. Fixed archive endpoint configuration
   - Properly configures subtensor to use archive endpoints for historical queries
   - Resolves "State discarded" errors when querying old blocks

With these fixes, emissions values should be accurate and the ledger works correctly
with both remote archive nodes (wss://archive.chain.opentensor.ai:443) and local
archive nodes (ws://127.0.0.1:9944).

* Add matplotlib plotting for emissions ledger visualization

Added plot_emissions() method to generate visual analysis of emissions data:
- Top subplot: Bar chart of emissions per 12-hour chunk
- Bottom subplot: Cumulative emissions over time with area fill
- Statistics box showing total, average, max, and active chunks
- Final value annotation on cumulative chart

Usage:
  # Display plot interactively
  python -m vali_objects.vali_dataclasses.emissions_ledger --hotkey <hotkey> --plot

  # Save plot to file
  python -m vali_objects.vali_dataclasses.emissions_ledger --hotkey <hotkey> --save-plot emissions.png

Matplotlib is already in requirements.txt, so no additional dependencies needed.

* Fix: Remove matplotlib imports from module level to avoid import errors

* Fix: Re-add matplotlib imports to plot_emissions function

* Critical fix: Properly connect to archive node for historical queries

The previous implementation was using archive_endpoints as a fallback only,
which meant historical queries were still hitting lite nodes and getting
"State discarded" errors. This caused all historical chunks to show 0 emissions,
with only the most recent block returning data.

Fix: Configure subtensor to use the archive endpoint as the PRIMARY connection
by setting config.subtensor.chain_endpoint and clearing the network parameter.

This resolves the bug where emissions appeared to only occur in the last chunk.
Historical emissions data will now be properly queried from archive nodes.

* Add TAO value tracking to emissions ledger

- Updated EmissionsCheckpoint to store both alpha and TAO emissions
- Added query_alpha_to_tao_rate() method to get conversion rates from blockchain pool reserves
- Modified build_emissions_ledger_for_hotkey() to calculate TAO values using average conversion rates at chunk boundaries
- Updated plotting to display dual y-axes showing both alpha and TAO emissions
- Enhanced statistics to show both token amounts

The TAO value is approximated by querying SubnetTAO and SubnetAlphaIn storage at the start and end of each 12-hour chunk, then averaging the conversion rate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* checkpoint

* cp

* Add rate limiting and fix hotkey extraction in emissions ledger

- Add configurable rate limiting (default: 1 req/sec for official endpoints)
- Fix query_map key format: key is UID (int), not [netuid, uid] tuple
- Fix hotkey extraction: convert ScaleObj bytes to SS58 address using scalecodec
- Add rate limiting before all substrate queries to respect endpoint limits
- Import scalecodec for SS58 encoding
- Remove try-except to fail fast on conversion errors

This fixes the "0 active hotkeys" issue and prevents rate limiting errors
when using official Bittensor archive endpoints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* cp

* cp

* allow tao price fetching. fix live price fetcher. update daemon logic to be consistent with PTN code

* USD value in emissions ledger

* valid init times

* Improve block timestamp extraction with multiple approaches and debug logging

- Added 3 different extraction approaches to handle varying block data structures
- Approach 1: extrinsic.value['call'].value structure (original)
- Approach 2: Direct dict access for extrinsic['call']['call_args']
- Approach 3: Look for 'method' instead of 'call' key
- Added detailed debug logging when extraction fails to diagnose structure
- Logs extrinsic type, available keys, and structure details

This should help identify the correct block data structure being returned
from the archive endpoint and fix the timestamp extraction failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* timestamp issue

* Add missing risk_adjusted_performance_penalty to PenaltyCheckpoint

Fixed bug where risk_adjusted_performance penalty was calculated but not
stored in the checkpoint object.

Changes:
- Added risk_adjusted_performance_penalty attribute to PenaltyCheckpoint.__init__
- Updated checkpoint creation to store the risk_adjusted_performance penalty value
- Now all 4 configured penalties are properly tracked in checkpoints

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Implement DebtLedger dataclass combining emissions, penalties, and performance

Created comprehensive DebtLedger structure to unify data from three sources:

DebtCheckpoint dataclass includes:
- Emissions data: alpha/TAO/USD earned and cumulative totals
- Performance metrics: PnL, fees, drawdown, portfolio return
- Penalty multipliers: all 4 penalty types + cumulative
- Derived fields: net PnL, total fees, weighted score

DebtLedger class provides:
- Chronological checkpoint management
- Query methods for latest/historical data
- Serialization via to_dict() with nested structure
- Summary printing for debugging/analysis

Structure is UI-ready with organized nested dictionaries grouping
emissions, performance, penalties, and derived metrics.

DebtLedgerManager marked as TODO for future implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* vali

* debt ledger definition.

* Fix loop iteration bug in daily_portfolio_returns.py

Changed line 394 from incrementing to decrementing current_ms timestamp.
The loop was moving forward in time instead of backward, causing some
hotkeys to fail inserting rows during continuous updates.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Fix date bounds calculation for multiple miners in daily_portfolio_returns

When processing all miners, the previous logic would set final_end_ms for every
position, causing the last-processed miner to determine the end date for all miners.
This resulted in incorrect date ranges when eliminated miners were processed after
active miners.

The fix:
- Check elimination status once per hotkey, not per position
- If ANY miner is active, process all miners through today
- If ALL miners are eliminated, use latest elimination date + 1 day
- Properly set elim_time for logging

This ensures that when running for all miners, data is processed through the
current date as long as at least one miner is still active.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* end time fix

* api intgration v0

* cp

* c#p

* cleanup

* Fix production errors and enhance monitoring

- Fix daemon coordination: DebtLedgerManager now orchestrates penalty, emissions, and debt ledger updates to prevent race conditions
- Fix Flask route conflict: Rename duplicate get_emissions_ledger to get_debt_ledger
- Fix EmissionsLedger error: Handle both PerfLedger return types (direct vs dict)
- Enhance Slack alerts: Add VM name, validator hotkey, and git branch to REST/WS down notifications for faster incident response

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add penalty ledger REST endpoint and improve logging

- Add /penalty-ledger/<minerid> REST endpoint for querying penalty data
- Add informative logging to penalty ledger manager showing:
  - Number of hotkeys being processed
  - Delta update vs full rebuild mode
  - Number of new checkpoints added per run
  - Total ledgers maintained
- Logging is concise and useful for monitoring without being verbose

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add detailed logging to REST server initialization

- Add step-by-step logging in start_rest_server() function
- Add 8-step initialization logging in PTNRestServer.__init__()
- Track each major operation: API keys, config, Flask app, contract loading,
  metrics setup, route registration, error handlers, and refresh thread
- Helps identify exactly where initialization fails during startup
- All steps show completion with ✓ markers for easy visual tracking

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* logging

* Fix PerfCheckpoint attribute access in debt ledger building

- Changed from non-existent gain_loss_by_type array to direct attributes
- Use pnl_gain, pnl_loss, spread_fee_loss, carry_fee_loss directly
- Use mdd instead of max_drawdown
- Use mpv instead of prev_portfolio_ret
- Fixes AttributeError causing daemon crashes every 12 hours

* Add clarifying comments for PerfCheckpoint attribute mapping

- Document that performance data fields are sourced from PerfCheckpoint
- Explain attribute name differences (gain->portfolio_return, mdd->max_drawdown, mpv->max_portfolio_value)
- Add note that pnl_loss is a negative value

* Consolidate ledger Slack alerts with VM/git/hotkey context

- Add send_ledger_failure_alert() and send_ledger_recovery_alert() to SlackNotifier
- Include VM hostname, git branch, and validator hotkey in all ledger alerts
- Update debt_ledger, emissions_ledger, and penalty_ledger to use new methods
- Consistent alert format across all three ledger managers
- Fixes missing context in Slack notifications for ledger failures

* Temporary: aggregate risk adjusted performance with asset subcategories

* Validate asset class selection before risk adjusted performance penalty

* Fix Slack alerts and duplicate checkpoint errors in ledger managers

1. Fix missing validator hotkey in Slack alerts:
   - Thread validator_hotkey parameter through DebtLedgerManager,
     EmissionsLedgerManager, and PenaltyLedgerManager initialization
   - Pass hotkey to SlackNotifier in all three ledger managers
   - Update validator.py to pass validator hotkey when creating DebtLedgerManager
   - Resolves "Validator Hotkey: Unknown" issue in Slack notifications

2. Fix duplicate checkpoint timestamp error in debt ledger:
   - Add per-hotkey checkpoint validation before adding new checkpoints
   - Previous logic used global minimum timestamp causing attempts to
     re-add existing checkpoints for some hotkeys
   - Now checks each hotkey's last checkpoint individually in delta update mode
   - Resolves "checkpoint at X, but previous checkpoint at X" errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Enhance health monitor with port connectivity checks and heartbeat logging

- Add socket-based port connectivity checks for REST/WebSocket servers
- Log heartbeat every minute to prove health monitor is running
- Detect processes that are alive but not serving on ports
- Add detailed diagnostics showing both process and port state
- Add traceback printing for health check exceptions

This prevents silent failures where servers crash/hang without detection.

* enahnce init logs

* Add comprehensive initialization monitoring with error detection and Slack alerts

This commit addresses the validator hanging issue by adding multiple layers of
monitoring and alerting to identify exactly where initialization hangs occur:

1. **Detailed Step Logging**: Each of the 10 initialization steps now logs:
   - Start time with step description
   - Process/thread PIDs after spawning
   - Completion time with elapsed duration
   - Success verification (process.is_alive() checks)

2. **Error Handling with Slack Alerts**: Each step wrapped in try-catch blocks:
   - Captures and logs full exception tracebacks
   - Sends immediate Slack alerts on any initialization failure
   - Includes step number, description, error details, and elapsed time
   - Helps identify specific component causing failures

3. **Watchdog Thread for Hang Detection**: Background monitor that:
   - Tracks current initialization step and elapsed time
   - Sends Slack alert if any step exceeds 60 seconds
   - Identifies hung initialization steps in real-time
   - Provides context on which component is blocking

4. **Success Notification**: Sends Slack confirmation when all steps complete

Previous issue: Validator hung at "Running validator on uid: 71" for 26+ hours
with no subsequent logs. HealthMonitor showed processes alive but REST/WS
servers never started. These enhancements will pinpoint the exact subprocess
initialization that causes the hang.

Related to prior commit: "Enhance health monitor with port connectivity checks"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Suppress WebSocket handshake errors from health monitor checks

The health monitor uses socket.connect_ex() to check if the WebSocket port
is open, creating raw TCP connections without completing the WebSocket
handshake. This causes the websockets library to log EOFError exceptions
when it tries to parse the handshake and receives nothing.

Changes:
- Added logging.Filter to suppress health-check-related EOFErrors
- Filters out "opening handshake failed" messages
- Filters out "connection closed while reading HTTP request line" messages
- Filters out "stream ends after 0 bytes" errors
- Applied filter to all websockets loggers (server, protocol, base)

These errors are harmless noise from the health monitoring system and do
not indicate actual connection problems. Real WebSocket client connections
work normally.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add validator context to Slack recovery alerts

Updated send_recovery_alert() to include:
- Validator hotkey (last 8 chars)
- VM hostname
- Git branch

This brings the recovery alerts in line with other alert methods
(send_websocket_down_alert, send_rest_down_alert, send_ledger_recovery_alert)
which already include this context for better operational visibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add standalone debt ledger API testing and visualization script

Created comprehensive testing script for the debt ledger REST API endpoint
that fetches, validates, and visualizes data for cross-checking with TaoStats.

Features:
- Fetches debt ledger data from REST API with authentication
- Validates data structure and value ranges
- Prints detailed summary statistics (min/max/mean/std for all metrics)
- Creates 4-panel visualization:
  * Debt percentage over time with min/max annotations
  * Share percentage over time
  * Total debt evolution
  * Recent entries table (last 10 checkpoints)
- Supports saving plots to file or interactive display
- Optionally saves raw JSON response for debugging
- Configurable host/port for testing against localhost or production

Usage examples:
  python test_debt_ledger_api.py --hotkey 5DUi8ZCaNabsR6bnHfs471y52cUN1h9DcugjRbEBo341aKhY
  python test_debt_ledger_api.py --hotkey 5DUi8... --save-plot output.png --save-json data.json
  python test_debt_ledger_api.py --hotkey 5FRW... --host localhost --port 48888

Dependencies: matplotlib, pandas, numpy

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* remove test file

* Add TAO and ALPHA balance snapshots to emissions and debt ledgers

- Add tao_balance_snapshot and alpha_balance_snapshot fields to EmissionsCheckpoint
- Implement _query_tao_balance_at_block() and _query_alpha_balance_at_block() methods
- Query balances at checkpoint end block with fail-fast error handling
- Pass through balance snapshots from emissions to debt ledgers
- Update serialization/deserialization to handle new fields
- Add +2 subtensor queries per hotkey per 12-hour checkpoint for validation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Fix ALPHA balance queries and add auto-restart for API servers

## ALPHA Balance Query Fixes
- Fix: Use correct storage function `get_stake(coldkey, hotkey, netuid, block)` instead of non-existent `SubnetAccount`
- Query coldkey ownership via `Owner` storage function
- Implement hotkey->coldkey cache to reduce redundant queries (~45% reduction)
- Persist cache to disk for efficiency across restarts
- Update signature to use block_number instead of block_hash (required by get_stake API)
- Add proper error handling with fail-fast semantics
- Note: Balance.tao represents ALPHA when netuid != 0 (confusing SDK naming)

## API Server Auto-Restart
- Add automatic restart mechanism for WebSocket and REST servers
- Implement restart throttling (max 3 restarts per 5 minutes)
- Add 60-second startup grace period to prevent false alerts on boot
- Store process references as instance variables for restart capability
- Add thread-safe restart lock to prevent race conditions
- New Slack alerts: restart attempt, recovery (after restart), critical (throttle exceeded)

## Debt Ledger Visualization
- Update local_debt_ledger_api.py to plot cumulative ALPHA emissions vs balance delta
- Add validation section showing difference percentage
- Enable visual verification that emissions match wallet balance changes

Performance: Saves ~2,300 queries per full ledger rebuild with caching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Implement debt-based scoring algorithm for miner weight calculation

This implements the debt-based scoring system specified in the TODO comment.
The algorithm pays miners based on their previous month's performance (PnL
scaled by penalties), distributing emissions to cover remaining debt over
the days left in the current month.

Key Features:
- Activates starting November 2025 (returns zeros before then)
- Calculates "needed payout" from previous month: net_pnl * total_penalty
- Calculates "actual payout" given in current month: sum of chunk_emissions_alpha
- Computes "remaining payout" and distributes over remaining days
- Normalizes weights to sum to 1.0

Implementation Details:
- Follows existing scoring.py patterns for weight normalization
- Handles edge cases (empty ledgers, single miner, negative performance)
- Uses efficient O(n) checkpoint filtering by timestamp
- Comprehensive test coverage (6 tests, all passing)
- Includes example demonstrating usage

Files Added/Modified:
- vali_objects/scoring/debt_based_scoring.py: Core algorithm implementation
- tests/vali_tests/test_debt_based_scoring.py: Unit tests
- examples/debt_based_scoring_example.py: Usage example

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Remove unused target_cp_duration_ms parameter from debt-based scoring

The parameter was not used anywhere in the implementation and can be
derived from the debt ledger checkpoints if ever needed.

Changes:
- Removed target_cp_duration_ms from compute_results() signature
- Removed unused setUp() method from tests
- All tests still passing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Update debt-based scoring to use real-time emission rate queries

Major improvements to align with production requirements:

1. Target Day 25 Payout:
   - Changed from "end of month" to "day 25" target
   - Calculates days_until_target instead of days_remaining

2. Real-Time Emission Projection:
   - Query current TAO emission rate from subtensor.metagraph
   - Query current ALPHA-to-TAO conversion rate
   - Project total ALPHA available from now until day 25
   - Warn if sum(remaining_payouts) > projected_emissions

3. Simplified Weight Calculation:
   - Weights are directly proportional to remaining_payout
   - Removed per-day division (unnecessary since all miners share same time window)
   - Normalization ensures weights sum to 1.0

4. New Method: _estimate_alpha_emissions_until_target()
   - Queries metagraph.emission for real-time TAO rate
   - Converts TAO -> ALPHA using current conversion rate
   - Uses 7200 blocks/day estimate (12 sec/block)
   - Provides detailed logging in verbose mode

5. Updated API:
   - Added required parameters: subtensor, netuid, emissions_ledger_manager
   - These enable real-time queries for emission projections

6. Comprehensive Test Coverage:
   - All 7 tests passing
   - New test for emission projection calculation
   - Uses mocks for subtensor and emissions_ledger_manager

Implementation Details:
- Uses most recent data (current block for conversion rates)
- Emission rate: sum(metagraph.emission) gives total TAO/block
- Conversion: TAO_emissions / alpha_to_tao_rate = ALPHA_emissions
- Warning logged if insufficient emissions to meet payouts by day 25

Example Warning Output:
"⚠️  INSUFFICIENT EMISSIONS: Projected ALPHA available until day 25
(450,000) is less than total remaining payout needed (500,000).
Shortage: 10.0%. Miners will receive proportional payouts."

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Fix critical bug in emission projection: metagraph.emission is in TAO not RAO

CRITICAL BUG FIX:
- metagraph.emission values are already in TAO units, not RAO
- Was incorrectly dividing by 1e9, causing projections to be off by 1 billion
- This would have made projections completely incorrect in production

Changes:
- vali_objects/scoring/debt_based_scoring.py: Removed incorrect RAO conversion
- validate_emission_math.py: Fixed emission display logic
- Added comprehensive validation suite

Validation Results (ALL PASSING):
✅ Blocks/day assumption: 7,200 (12 sec/block) - accurate
✅ Projection math: Verified to machine precision
✅ Live emission query: 296.02 TAO/block for subnet 8

Real Data Confirmed:
- Subnet 8 emitting ~2.13M TAO/day, ~64M TAO/month
- 90.9% concentrated in top 10% of neurons
- Projection method now mathematically correct

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Fix emission calculation: metagraph.emission is TAO per tempo, not per block

CRITICAL FIX:
- metagraph.emission values are in TAO (not RAO) per tempo (360 blocks)
- Was incorrectly treating as TAO per block, causing 360x overestimation
- Previous calculation: 296 TAO/block → 2.13M TAO/day (impossible!)
- Corrected calculation: 0.822 TAO/block → 5,920 TAO/day (realistic!)

Changes:
- debt_based_scoring.py: Divide by 360 to convert per-tempo to per-block
- test_debt_based_scoring.py: Update mock to use TAO per tempo (360 per miner)
- validate_emission_math.py: Apply same correction to validation

Validation Results (After Fix):
✅ Subnet 8: 0.822 TAO/block = 5,920 TAO/day
✅ Within total network emission (~7,200 TAO/day)
✅ At $500/TAO ≈ $3M USD/day (not $1B!)

All 7 unit tests passing
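
The corrected projection math can be sketched as below. The constants (360 blocks per tempo, 7,200 blocks per day) and the conversion formula are taken from the commit messages; the function name and inputs are hypothetical:

```python
TEMPO_BLOCKS = 360       # metagraph.emission is TAO per tempo (360 blocks), not per block
BLOCKS_PER_DAY = 7200    # 12-second blocks

def project_alpha_until_target(emission_per_tempo_tao, alpha_to_tao_rate, days_until_target):
    """Corrected sketch: per-tempo TAO -> per-block TAO -> ALPHA over the window."""
    tao_per_block = sum(emission_per_tempo_tao) / TEMPO_BLOCKS
    tao_total = tao_per_block * BLOCKS_PER_DAY * days_until_target
    # Conversion from the commit: TAO_emissions / alpha_to_tao_rate = ALPHA_emissions
    return tao_total / alpha_to_tao_rate
```

With the old per-block interpretation, the same inputs would come out 360x too large.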

* Add aggressive payout strategy for debt-based scoring

Implements front-loaded emission distribution that creates urgency early
in the month while tapering off as we approach the day 25 deadline.

Strategy:
- Days 1-21: Use 4-day buffer (aggressive, creates urgency)
- Days 22-25: Use actual remaining days (tapers off)
- Always respect hard deadline at day 25

Benefits:
- Early month: Projects only 4 days of emissions → shows warnings sooner
- Creates urgency to pay miners throughout the month
- Reduces "last minute rush" behavior near deadline
- Smoother payout distribution

Example Impact (with 50,000 ALPHA remaining payout):
- Day 1: Projects 47,360 ALPHA (4 days) → INSUFFICIENT warning
- Day 10: Projects 47,360 ALPHA (4 days) → INSUFFICIENT warning
- Day 23: Projects 35,520 ALPHA (3 days) → INSUFFICIENT warning
- Day 25: Projects 11,840 ALPHA (1 day) → INSUFFICIENT warning
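
The buffering rule can be sketched as follows. The inclusive day counting here is inferred from the example figures above (day 23 projects 3 days, day 25 projects 1); the function name is hypothetical:

```python
AGGRESSIVE_PAYOUT_BUFFER_DAYS = 4

def projection_days(day_of_month, target_day=25):
    """Days of emissions to project: 4-day buffer early on, tapering near the deadline."""
    # Inclusive of the current day, matching the example impact table
    days_until_target = max(target_day - day_of_month + 1, 0)
    return min(days_until_target, AGGRESSIVE_PAYOUT_BUFFER_DAYS)
```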

Changes:
- debt_based_scoring.py: Add AGGRESSIVE_PAYOUT_BUFFER_DAYS = 4
- debt_based_scoring.py: Apply min(actual_days, buffer) logic
- debt_based_scoring.py: Update logging and docstrings
- test_debt_based_scoring.py: Add test_aggressive_payout_strategy
- demo_aggressive_payout.py: Demonstration script

All 8 unit tests passing

* cp

* Add challenge period status tracking and UTC-aligned refresh to penalty ledgers

- Add challenge_period_status field to PenaltyCheckpoint (MAINCOMP/CHALLENGE/PROBATION/PLAGIARISM)
- Integrate challengeperiod_manager for real-time status on latest checkpoints
- Implement 12-hour UTC-aligned refresh (00:00 and 12:00 UTC) for accurate checkpoint data
- Run penalty ledger in dedicated daemon process separate from debt ledger
- Optimize IPC performance: fetch entire active_miners dict once upfront instead of O(n) individual calls
- Improve timing precision with 60-second recalculation loops to hit UTC boundaries within ~1 second

Historical checkpoints use "unknown" status to avoid retroactive data corruption.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* cp

* Enforce minimum weights in debt-based scoring based on challenge period status

Implement tiered minimum "dust" weights for all miners based on their current
challenge period status to ensure fair baseline compensation:

Changes:
- Add _apply_minimum_weights() method to enforce status-based minimums
- CHALLENGE/PLAGIARISM/UNKNOWN: 1x dust (1.2e-05)
- PROBATION: 2x dust (2.4e-05)
- MAINCOMP: 3x dust (3.6e-05)
- Apply max(debt_weight, minimum_weight) before normalization
- Filter checkpoints to only count MAINCOMP/PROBATION for earnings
- Update algorithm documentation to reflect new flow
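
The tiered floor described above reduces to a lookup plus a max(). A minimal sketch with hypothetical names (the multipliers and dust value come from the commit message; a later commit changes UNKNOWN to 0x):

```python
DUST = 1.2e-05

STATUS_DUST_MULTIPLIER = {
    "CHALLENGE": 1, "PLAGIARISM": 1, "UNKNOWN": 1,
    "PROBATION": 2,
    "MAINCOMP": 3,
}

def apply_minimum_weight(debt_weight, status):
    """Enforce the status-based dust floor before normalization (sketch)."""
    floor = STATUS_DUST_MULTIPLIER.get(status, 1) * DUST
    return max(debt_weight, floor)
```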

This ensures all miners receive baseline emissions proportional to their status
while still rewarding performance-based debt payouts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Reuse Scoring.normalize_scores instead of redefining in DebtBasedScoring

Remove duplicate _normalize_scores method from DebtBasedScoring class and use
the existing normalize_scores method from the Scoring class to avoid code
duplication and maintain consistency.

Changes:
- Import Scoring class
- Replace DebtBasedScoring._normalize_scores() with Scoring.normalize_scores()
- Remove _normalize_scores method from DebtBasedScoring

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Update debt-based scoring: Dec 2025 activation and burn address logic

Implement updated debt-based scoring requirements:

1. Activation Date:
   - Changed from November 2025 to December 2025
   - Nominal payouts begin Dec 1, 2025

2. Pre-Activation Behavior:
   - Apply minimum dust weights only (no debt-based earnings)
   - Excess weight goes to burn address (uid 229)
   - Clarified that "zero weights" actually means dust weights

3. Burn Address Logic (uid 229):
   - If sum of weights < 1.0: assign (1.0 - sum) to burn address
   - If sum of weights >= 1.0: normalize to 1.0, burn address gets 0
   - Weights always sum to exactly 1.0
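
The two-branch rule above can be sketched directly (hypothetical helper, operating on a hotkey-to-weight dict rather than the repo's actual data structures):

```python
def normalize_with_burn(weights, burn_hotkey):
    """Burn-address normalization sketch: weights always end up summing to 1.0."""
    total = sum(weights.values())
    if total < 1.0:
        # Assign the shortfall to the burn address
        out = dict(weights)
        out[burn_hotkey] = out.get(burn_hotkey, 0.0) + (1.0 - total)
        return out
    # total >= 1.0: normalize to 1.0; burn address gets nothing extra
    out = {hk: w / total for hk, w in weights.items()}
    out.setdefault(burn_hotkey, 0.0)
    return out
```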

Changes:
- Add BURN_UID constant (229)
- Add metagraph parameter to compute_results()
- Add _get_burn_address_hotkey() to resolve uid 229 hotkey
- Add _normalize_with_burn_address() for special normalization
- Add _apply_pre_activation_weights() for pre-Dec 2025 period
- Update all docstrings and algorithm documentation

This ensures proper weight distribution with burn address handling for
periods when miner weights don't sum to 1.0.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add is_testnet parameter for network-specific burn address UIDs

Add is_testnet parameter to DebtBasedScoring to support different burn
address UIDs for mainnet (uid 229) vs testnet (uid 5).

Changes:
- Add BURN_UID_MAINNET (229) and BURN_UID_TESTNET (5) constants
- Add get_burn_uid(is_testnet) helper method
- Add is_testnet parameter to compute_results()
- Thread is_testnet through all methods that need burn UID:
  - _normalize_with_burn_address()
  - _get_burn_address_hotkey()
  - _apply_pre_activation_weights()
- Update all docstrings to document is_testnet parameter
- Update algorithm documentation to mention both UIDs

This allows validators to pass self.is_mainnet to use the correct burn
address UID based on whether they're running on mainnet (netuid 8) or
testnet (netuid 116).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Replace Scoring.compute_results_checkpoint with DebtBasedScoring in weight setter

Update SubtensorWeightSetter to use debt-based scoring instead of legacy
checkpoint-based scoring. This enables the new debt payout system with
burn address support and challenge period status-based minimum weights.

Changes:
1. Add DebtBasedScoring import
2. Add new dependencies to __init__:
   - debt_ledger_manager: Access to debt ledgers
   - subtensor: For querying metagraph/emissions
   - netuid: Network identifier
   - emissions_ledger_manager: For ALPHA-to-TAO rate
   - is_mainnet: Network type for burn UID selection

3. Refactor _compute_miner_weights():
   - Use DebtBasedScoring.compute_results() instead of Scoring.compute_results_checkpoint()
   - Filter debt ledgers instead of perf ledgers
   - Pass is_testnet parameter (not is_mainnet) for burn address selection
   - Remove challenge period special handling (now built into DebtBasedScoring)

4. Simplify compute_weights_default():
   - Compute weights for ALL miners in single call
   - DebtBasedScoring automatically applies:
     * Debt-based weights for MAINCOMP/PROBATION
     * Minimum dust weights for CHALLENGE/PLAGIARISM/UNKNOWN
     * Burn address assignment for excess weight
   - Remove separate challengeperiod_weights computation
   - Remove Scoring.score_testing_miners() call (obsolete)

Benefits:
- Unified weight computation for all miner types
- Automatic burn address handling
- Challenge period status reflected in weights
- Debt-based payout system enabled
- Simplified weight computation logic

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* 30 day lookup for emissions

* Add 5-minute retry delay when debt_ledger_manager is unavailable

Fix busy loop issue where weight setter was retrying every second when
debt_ledger_manager is not available during startup.

Changes:
- Add check for debt_ledger_manager availability in run_update_loop()
- Sleep for 5 minutes (300 seconds) when debt_ledger_manager is None
- Log warning message before sleeping to inform about retry delay
- Prevents log spam and reduces CPU usage during startup

This ensures the weight setter process waits gracefully for debt ledger
manager to become available instead of busy looping with failed attempts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add first-boot check to wait for challengeperiod_manager data

Fix issue where penalty ledgers built on first boot could have all UNKNOWN
status if challengeperiod_manager.active_miners not yet populated.

Changes:
- Check if active_miners is empty on first boot (no existing penalty ledgers)
- Wait 5 minutes for challengeperiod_manager to populate before building
- Log warning message to explain the delay
- Prevents creating penalty checkpoints with incorrect UNKNOWN status

This ensures debt-based scoring has accurate challenge period status from
the start instead of waiting 12 hours for next refresh.

* Revert "Add first-boot check to wait for challengeperiod_manager data"

This reverts commit f5f421222489d4a6cb25f7835fd8da35d5137047.

* Add per-miner progress logging to penalty ledger building

Add progress tracking that logs one line per miner iteration showing:
- Current progress [X/Y miners]
- Miner hotkey
- New checkpoints added this iteration
- Total checkpoints in ledger

This provides visibility during long first-boot builds where all historical
checkpoints are processed (potentially hundreds per miner).

Example output:
[1/100] 5DZu...: +247 new checkpoints (total: 247)
[2/100] 5FHn...: +189 new checkpoints (total: 189)
[3/100] 5CiP...: +0 new checkpoints (total: 342)

* Add greppable logging prefix and optimize progress logs

Changes:
- Add [PENALTY_LEDGER] prefix to all info/warning logs for easy grepping
- Only log progress for first-time-seen hotkeys (new penalty ledgers)
- Track is_first_time_seen flag to avoid logging on delta updates

This reduces log spam during normal 12-hour delta updates while still
providing visibility during initial first-boot builds.

Example logs:
[PENALTY_LEDGER] Building penalty ledgers for 100 hotkeys (delta update)
[PENALTY_LEDGER] [1/100] 5DZu...: +247 new checkpoints (total: 247)
[PENALTY_LEDGER] Built penalty ledgers: 5 hotkeys processed, 847 new checkpoints added

* Prevent race conditions in debt ledger building

Build candidate ledgers first, then atomically swap into production to prevent
race conditions where ledgers momentarily disappear during the build process.

Changes:
- Build into candidate_ledgers dict instead of directly into self.debt_ledgers
- For delta updates: copy existing ledgers into candidates first
- For full rebuilds: start with empty candidates
- After successful build, atomically swap candidates into self.debt_ledgers
- Handle IPC-managed dicts by clearing and updating keys individually
- Save to disk only AFTER atomic swap completes
- Remove incremental saves during build loop

This ensures:
- Existing ledgers remain accessible during the entire build process
- No partial/corrupted state visible to readers
- Atomic transition from old to new ledger state
- Build failures don't corrupt existing ledgers

* Update debt-based scoring tests for recent changes

Updated tests to reflect recent implementation changes:

1. Challenge period status integration
   - All DebtCheckpoint creations now include challenge_period_status
   - Added test for only counting MAINCOMP/PROBATION earning periods

2. Updated activation date
   - Changed from November to December 2025
   - Updated test dates and comments accordingly

3. Pre-activation behavior
   - Changed from all-zero weights to dust weights + burn address
   - Updated test_before_activation_date to verify dust weights by status

4. Minimum weights enforcement
   - New test_minimum_weights_by_status verifies status-based dust weights:
     - CHALLENGE/PLAGIARISM/UNKNOWN: 1x dust
     - PROBATION: 2x dust
     - MAINCOMP: 3x dust

5. Burn address logic
   - New test_burn_address_mainnet verifies uid 229 on mainnet
   - New test_burn_address_testnet verifies uid 5 on testnet
   - Tests verify burn address receives excess when sum < 1.0
   - Tests verify total weights sum to 1.0

6. New parameters
   - All compute_results calls include metagraph parameter
   - All compute_results calls include is_testnet parameter
   - Mock metagraph includes hotkeys list for burn address testing

All tests pass syntax check.

* Add first-boot optimization for penalty ledger manager

When penalty ledgers don't exist on first boot, immediately build them
instead of waiting up to 12 hours for the next UTC-aligned window.

Changes:
- Check if penalty_ledgers is empty after initial delay
- If empty, perform immediate full rebuild
- Log warning and completion time
- Resume normal UTC-aligned schedule after initial build

This fixes critical UX issue where validators couldn't set weights for
up to 12 hours after first boot.

Also added comprehensive iterative payout test that simulates 25 days
of weight setting and verifies:
- Proportional distribution maintained
- Weights decrease over time
- Early aggressive payout strategy
- Weights approach minimum by day 25

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* remove logs

* Fix race condition in debt ledger atomic swap

The previous implementation used clear() which left the IPC dict completely
empty between clear and re-adding keys. Other processes reading during this
window would see zero debt ledgers, breaking weight setting.

Fixed by:
1. Delete obsolete keys first (old keys not in new candidates)
2. Then add/update all new keys

This ensures the dict is never empty - it always contains at least the
keys being kept during the swap operation.
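
The delete-then-update ordering can be sketched as below (hypothetical function; a real IPC-managed dict requires per-key operations exactly like this, which is why clear() was unsafe):

```python
def atomic_swap(ipc_dict, candidates):
    """Swap candidate ledgers into a shared dict without ever leaving it empty.

    Deleting obsolete keys first, then updating, means concurrent readers
    always see at least the surviving keys (unlike clear() + re-add).
    """
    for key in list(ipc_dict.keys()):
        if key not in candidates:
            del ipc_dict[key]        # step 1: remove obsolete entries only
    for key, value in candidates.items():
        ipc_dict[key] = value        # step 2: add/update survivors and newcomers
```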

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Fix emissions ledger to only do delta updates (never full rebuilds)

- Removed self.emissions_ledgers.clear() from build_delta_update()
- Start from last_computed_chunk_end_ms instead of lookback period
- Changed log from "Rebuilding ALL" to "Delta update from X to Y"
- Preserves all historical emissions data (architectural requirement)
- Emissions ledgers should NEVER be cleared, only appended to

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Change debt ledgers to ALWAYS do full rebuilds (never delta updates)

Architectural requirement:
- Emissions ledgers: ONLY delta updates (append-only historical record)
- Penalty ledgers: Full rebuilds (derived from performance, can change retroactively)
- Debt ledgers: Full rebuilds (combines emissions + penalties + perf)

Changes:
- Changed build_debt_ledgers() call from delta_update=True to delta_update=False
- Updated daemon docstring to reflect full rebuild behavior
- Updated log messages to say "Full Rebuild Mode" instead of "Delta Update Mode"
- Added comment explaining why debt ledgers must always rebuild

Debt ledgers are derived from three sources (emissions, penalties, performance).
Since penalties can change retroactively when performance data updates,
debt ledgers must be fully rebuilt to reflect the latest state of all inputs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* pass dlm to weight setter

* Pass required debt-based scoring parameters to SubtensorWeightSetter

Fix for warning: "debt_ledger_manager not available for scoring"

SubtensorWeightSetter requires these parameters for debt-based scoring:
- subtensor: Get from metagraph_updater.get_subtensor()
- netuid: Get from config.netuid
- emissions_ledger_manager: Get from debt_ledger_manager.emissions_ledger_manager
- is_mainnet: Get from self.is_mainnet

These are needed by DebtBasedScoring.compute_results() to calculate TAO emissions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* rpc metagraph

* Improve IPC substrate reserve refresh with fail-fast and thread-safe synchronization

  Refactored substrate reserve handling in IPC metagraph to improve reliability,
  efficiency, and thread-safety:

  1. Fail-fast error handling: Removed try-except wrapper in refresh_substrate_reserves()
     to propagate exceptions to slack alert mechanism for immediate notification

  2. Fixed CPU waste: Replaced 5-second polling loop in weight processor with blocking
     queue.get(timeout=30) - eliminates 60 unnecessary wake-ups per weight request

  3. Added zero-alpha validation: Added alpha_reserve_rao == 0 check consistent with
     emissions_ledger.py::query_alpha_to_tao_rate to prevent division-by-zero errors

  4. Thread-safe float synchronization: Changed tao_reserve_rao and alpha_reserve_rao
     from plain Namespace attributes to manager.Value('d', 0.0) for atomic read/write
     operations with internal locking

  5. Updated all access points to use .value accessor for manager.Value() objects

  Files modified:
  - shared_objects/sn8_multiprocessing.py - Use manager.Value() for reserves
  - shared_objects/metagraph_updater.py - Fail-fast refresh, blocking queue, .value writes
  - vali_objects/scoring/debt_based_scoring.py - Safe .value reads with null checks

* debug log

* fix ipc passing

* Simplify reserve refresh to use metagraph.pool and fix emissions constant

  - Replace custom substrate queries with metagraph.pool.tao_in/alpha_in
  - Convert pool token values to RAO (multiply by 1e9) for IPC storage
  - Maintain individual tao_reserve_rao/alpha_reserve_rao fields interface
  - Fix emissions ledger error: use TARGET_CHECKPOINT_DURATION_MS constant
  - Verified identical values: 0 RAO difference between methods

  Benefits:
  - Simpler code using standard Bittensor API
  - Fewer potential failure points
  - Same data accuracy (verified with test_reserve_comparison.py)

* cp

* Add comprehensive test coverage for dynamic dust weight system

  Implement 7 new unit tests validating:
  - Backward compatibility (disabled by default)
  - Within-bucket scaling based on 30-day PnL
  - Cross-bucket hierarchy preservation
  - Edge cases (zero/negative PnL handling)
  - 30-day lookback window behavior
  - Penalty integration with PnL calculation

  All 21 tests passing. Feature ready for testnet deployment.

* Fix critical debt-based scoring bug: sum all checkpoints for monthly payouts

  Critical bug fix:
  - MAJOR: Fixed scoring algorithm that only used last checkpoint instead of summing
    all checkpoints in previous month, severely underpaying miners
  - Changed from last_checkpoint.net_pnl to sum(cp.net_pnl for cp in checkpoints)
  - Affects both main calculation and _calculate_penalty_adjusted_pnl() helper
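
The fixed aggregation is a one-liner; a minimal sketch (checkpoint class simplified to the one field that matters here):

```python
class DebtCheckpoint:
    def __init__(self, net_pnl):
        self.net_pnl = net_pnl  # per-checkpoint (12-hour) value, NOT cumulative

def monthly_net_pnl(checkpoints):
    """Fixed behavior: sum per-checkpoint PnL across the month, not just the last one."""
    return sum(cp.net_pnl for cp in checkpoints)
```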

  Documentation fixes:
  - Fixed misleading docs stating pnl_gain/pnl_loss were cumulative
  - Clarified these fields are per-checkpoint values, not cumulative
  - Updated docstrings and inline comments in debt_ledger.py

  Configuration fixes:
  - Changed UNKNOWN status from 1x dust to 0x dust (should get 0 weight)
  - Burn address is handled separately by UID, not by status

  Test fixes:
  - Fixed 3 dynamic dust tests that broke after scoring bug fix
  - Moved test checkpoints to avoid interference between dynamic dust lookback
    and previous month payout calculations
  - All 21 tests now passing

  Impact:
  This bug would have paid miners only for their last 12-hour period instead of
  the full month's performance. This fix ensures correct proportional payouts
  based on complete monthly performance data.

* Fix critical USD unit mismatch in debt-based scoring

  PROBLEM:
  Debt-based scoring was comparing incompatible units:
  - "needed payout" calculated from net_pnl (in USD)
  - "actual payout" summed from chunk_emissions_alpha (in ALPHA tokens)
  This caused incorrect remaining payout calculations and weight distributions.

  SOLUTION:
  Convert all payout tracking to USD for consistent accounting:

  1. debt_based_scoring.py:
     - Changed actual_payout to use chunk_emissions_usd instead of chunk_emissions_alpha
     - Renamed all variables to include _usd suffix for clarity (needed_payout_usd, actual_payout_usd, remaining_payout_usd)
     - Added _convert_alpha_to_usd() helper with strict validation (raises exception if TAO/USD price unavailable)
     - Updated module docstring to reflect USD-based accounting throughout

  2. metagraph_updater.py:
     - Added live_price_fetcher parameter to __init__
     - Implemented refresh_tao_usd_price() method:
       * Queries TAO/USD price via live_price_fetcher.get_close_at_date(TradePair.TAOUSD)
       * Stores price in metagraph.tao_to_usd_rate (IPC-shared)
       * Non-blocking: logs warnings and continues with stale price on failure
     - Integrated price refresh into update_metagraph() cycle

  3. validator.py:
     - Pass live_price_fetcher to MetagraphUpdater constructor

  4. test_debt_based_scoring.py:
     - Updated mocks to include tao_to_usd_rate = 500.0
     - Fixed test_iterative_payouts_approach_target_by_day_25 to set both chunk_emissions_alpha and chunk_emissions_usd

  BEHAVIOR:
  - Strict validation in scoring: raises exception if TAO/USD price unavailable
  - Graceful degradation in updates: price refresh failures don't block metagraph updates
  - All weight calculations now use consistent USD-based accounting

  TESTING:
  All 21 debt-based scoring tests passing

* Remove use_dynamic_dust flag - dynamic dust now always enabled

  RATIONALE:
  Dynamic dust weights have been tested and validated. Simplify the codebase
  by removing the feature flag and making dynamic dust the default behavior.

  CHANGES:

  1. debt_based_scoring.py:
     - Removed use_dynamic_dust parameter from compute_results()
     - Removed use_dynamic_dust parameter from _apply_minimum_weights()
     - Removed use_dynamic_dust parameter from _apply_pre_activation_weights()
     - Simplified _apply_minimum_weights() to always calculate dynamic dust weights
     - Updated all docstrings to reflect dynamic dust is always enabled

  2. test_debt_based_scoring.py:
     - Removed all use_dynamic_dust=True parameters from test calls
     - Renamed test_dynamic_dust_disabled_by_default → test_dynamic_dust_enabled_by_default
     - Updated test descriptions

  BEHAVIOR:
  Before: Dynamic dust was opt-in via use_dynamic_dust=True parameter
  After: Dynamic dust is always active - miners automatically receive performance-
         scaled weights within their bucket (floor to floor+1 DUST range based on
         30-day penalty-adjusted PnL)

  TESTING:
  All 21 debt-based scoring tests passing

  This simplifies the API and removes a conditional code path, making the
  system more predictable and easier to maintain.

* PROBLEM:
  Delta update mode was using the MINIMUM timestamp across all debt ledgers as the
  resume point. This meant newly registered miners (with only 1-2 checkpoints) would
  become the reference, causing all other miners to lose their historical checkpoints.

  Example:
  - Established miner: 500 checkpoints, last at timestamp T+500
  - New miner: 2 checkpoints, last at timestamp T+2
  - OLD logic: min(T+500, T+2) = T+2 ← would skip T+3 to T+500 for everyone!
  - Result: 498 checkpoints of history lost for established miner

  SOLUTION:
  Changed delta update to use the ledger with the MOST checkpoints (longest history)
  as the reference point. This preserves full history for all miners.

  CHANGES:

  1. debt_ledger.py build_debt_ledgers():
     - Changed from finding minimum timestamp to finding maximum checkpoint count
     - Reference ledger = ledger with most checkpoints
     - Use that ledger's last timestamp as resume point

  2. Added sanity check assertion:
     - Validates reference ledger (most checkpoints) has the maximum timestamp
     - Prevents silent history truncation if reference ledger is stale
     - Raises clear error if data corruption detected

  3. Improved logging:
     - Now shows which ledger was chosen and why
     - Example: "Delta update mode: resuming from 2025-01-15 (reference ledger
       with 500 checkpoints)"

  BEHAVIOR:
  Before: Newly registered miners would truncate history for everyone
  After: Longest-running miner sets the baseline, preserving full history

  This prevents catastrophic data loss in production where validators continuously
  receive new miner registrations.

* Refactor coldkey storage to live in EmissionsLedger instead of separate dict

  Move coldkey from EmissionsLedgerManager.hotkey_to_coldkey dict into
  EmissionsLedger.coldkey attribute for single source of truth and
  automatic persistence.

  Changes:
  - Add required coldkey parameter to EmissionsLedger.__init__()
  - Update EmissionsLedger.to_dict() to persist coldkey
  - Remove hotkey_to_coldkey dict from EmissionsLedgerManager
  - Add _get_coldkey_for_hotkey() method with ledger-first lookup
  - Query substrate only when coldkey not cached in ledger
  - Update ledger with queried coldkey for future persistence
  - Remove hotkey_to_coldkey from save_to_disk/load_data_from_disk

  This is a clean break with no backward compatibility - existing cached
  data will need to query substrate once to populate coldkey in ledgers.

  The code compiles successfully and the refactoring is complete!

* fix

* 1. Overlap Detection Feature ✓

  Implementation:
  - Added detect_and_delete_overlapping_positions() method to validator_sync_base.py:804
  - Added _find_overlapping_positions_via_merge() helper method using interval merging algorithm (O(n log n))
  - Integrated into sync_positions() method at validator_sync_base.py:99-104

  Key Features:
  - Analyzes positions per-hotkey, per-trade-pair
  - Uses efficient interval merging to detect all overlapping time ranges
  - Handles both open positions (using current_time as end) and closed positions
  - Deletes all positions involved in any overlap
  - Comprehensive logging and statistics
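
The interval-merging idea can be sketched like this (hypothetical standalone function; positions reduced to (open_ms, close_ms) pairs, with None meaning still open):

```python
def find_overlapping(positions, current_time):
    """Overlap detection via interval merging, O(n log n) from the sort.

    positions: list of (open_ms, close_ms_or_None); open positions use
    current_time as their end. Returns indices of all positions in any overlap.
    """
    intervals = sorted(
        (open_ms, close_ms if close_ms is not None else current_time, i)
        for i, (open_ms, close_ms) in enumerate(positions)
    )
    overlapping = set()
    prev_start, prev_end, prev_i = intervals[0]
    for start, end, i in intervals[1:]:
        if start < prev_end:  # strict: touching boundaries do NOT overlap
            overlapping.update((prev_i, i))
        if end > prev_end:    # extend the merged interval, chaining overlaps
            prev_start, prev_end, prev_i = start, end, i
    return overlapping
```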

  Unit Tests: Added 10 comprehensive tests in test_auto_sync.py:3027-3274:
  - No overlaps (single position, non-overlapping positions)
  - Two positions with clear overlap
  - Multiple overlapping positions (chain detection)
  - Open positions with current_time
  - Boundary conditions (touching but not overlapping)
  - Different trade pairs handled separately
  - Full integration test with disk deletion
  - Multiple trade pairs overlap detection
  - Unsorted input handling
  - Empty positions list edge case

  2. Removed dTAO Registration Blocks Feature ✓

  Changes in subtensor_weight_setter.py:
  - Removed target_dtao_block_zero_incentive_start and target_dtao_block_zero_incentive_end variables
  - Removed block_reg_failures set
  - Removed call to handle_block_reg_failures()
  - Removed log message: "Miners with registration blocks outside of permissible dTAO blocks"
  - Deleted entire handle_block_reg_failures() method (lines 178-193)

  The log message will no longer appear in your validator output.

* meta inc and fix testnet owner uid

* cleanup overlap detection

* cp

* Fix critical bug: eliminated miners receiving emissions

  PROBLEM:
  ChallengePeriodManager.remove_eliminated() was failing to remove eliminated
  miners from active_miners dict. When called without arguments, it converted
  None to [], causing _remove_eliminated_from_memory() to use the wrong code
  path (checking eliminations list instead of querying elimination_manager).

  This caused eliminated miners to:
  - Remain in active_miners dict
  - Be included in weight calculations via get_hotkeys_by_bucket()
  - Receive dust weights from DebtBasedScoring
  - Continue earning emissions despite being eliminated

  SOLUTION:
  - Remove the `eliminations = []` assignment in remove_eliminated()
  - Pass eliminations parameter directly to _remove_eliminated_from_memory()
  - Let the inner function properly handle None by querying elimination_manager

  TESTING:
  - Migrated elimination tests from mocks to production code paths
  - Added required metagraph attributes for DebtBasedScoring (emission, reserves, tao_to_usd_rate)
  - Updated test timestamps to post-activation (Jan 2026)
  - All 7 elimination tests now pass with production code

  Files changed:
  - vali_objects/utils/challengeperiod_manager.py (bug fix)
  - tests/vali_tests/test_elimination_weight_calculation.py (production code migration)
  - tests/vali_tests/test_elimination_integration.py (production code migration)
  - tests/vali_tests/mock_utils.py (added metagraph attributes)

* fix test

---------

Co-authored-by: Derek Awender <[email protected]>
Co-authored-by: sli-tao <[email protected]>
Co-authored-by: ward-taoshi <[email protected]>
Co-authored-by: Claude <[email protected]>