Rebuild TiDE model data/model training pipeline#821
Conversation
⚠️ Warning: Rate limit exceeded

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 38 minutes and 47 seconds.

⌛ How to resolve this issue? After the wait time has elapsed, a review can be triggered. We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source, and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (10)
📝 Walkthrough

This PR refactors the TiDE model training pipeline by reorganizing module structure (tide_model/tide_data → model/data) and replacing batch-list-based training with a dataset-centric `TrainingDataset` interface.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Pull request overview
Rebuilds the TiDE model data + training pipeline (now under models/tide/) and updates the surrounding Prefect/ECS deployment tooling, while addressing production schema-validation issues and improving GPU training performance.
Changes:
- Refactors TiDE preprocessing into a staged `Pipeline` and introduces a `TrainingDataset` array-based interface to feed training/prediction.
- Updates TiDE training loop to pre-load tensors onto the compute device once per run, adds validation-dataset support, and tunes trainer defaults.
- Adds/updates infrastructure + tooling for ECS GPU training (task definition, Prefect work pool template generator, Mask/GitHub Actions updates) and fixes S3 consolidation edge cases.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| `uv.lock` | Adds prometheus-client to locked deps. |
| `tools/tests/test_build_work_pool_template.py` | Adds test coverage for ECS work pool template generation. |
| `tools/src/tools/build_work_pool_template.py` | New utility to generate/patch a Prefect ECS GPU work pool base job template. |
| `prefect.yaml` | Removes the remote TiDE deployment entry (local deployment remains). |
| `models/tide/tests/test_tasks.py` | Expands tests for S3 reads, ticker filtering, and consolidation null-handling. |
| `models/tide/tests/test_run.py` | Removes tests for deleted `tide.run`. |
| `models/tide/tests/test_model.py` | Updates tests for renamed modules + dataset-based training/validation APIs. |
| `models/tide/tests/test_deploy.py` | Updates deploy tests to use Schedule + image/build options. |
| `models/tide/tests/test_data.py` | Updates tests for staged pipeline + `get_dataset` output and new dimensions. |
| `models/tide/src/tide/workflow.py` | Increases model training task timeout to 4 hours. |
| `models/tide/src/tide/trainer.py` | Switches to dataset-based training + new defaults (batch size, LR, patience, output length). |
| `models/tide/src/tide/tasks.py` | Adds schema validation + improves filtering/join consolidation logic + column normalization. |
| `models/tide/src/tide/run.py` | Removes old CLI entrypoint wrapper. |
| `models/tide/src/tide/model.py` | Migrates to dataset-based batching, adds device logging, GPU preloading, and validation-dataset early stopping. |
| `models/tide/src/tide/deploy.py` | Updates deployment registration to supply image + schedule + job variables and enforce image env var. |
| `models/tide/src/tide/data.py` | Implements staged preprocessing pipeline + TrainingDataset and replaces batch list generation. |
| `models/tide/Dockerfile` | Adds venv bin directory to PATH in the runtime image. |
| `maskfile.md` | Updates image build workflow (build-and-push), adds Prefect work pool template flow, updates trainer commands. |
| `infrastructure/training.py` | Adds ECS GPU task definition for the TiDE trainer. |
| `infrastructure/__main__.py` | Exports ECS network IDs + TiDE trainer task definition ARN for tooling. |
| `applications/portfolio_manager/pyproject.toml` | Adds prometheus-client dependency (metrics support). |
| `applications/ensemble_manager/src/ensemble_manager/server.py` | Updates TiDE usage to dataset-based prediction path and single-call predict flow. |
| `applications/ensemble_manager/pyproject.toml` | Adds prometheus-client dependency (metrics support). |
| `applications/data_manager/tests/test_storage.rs` | Updates tests for increased DuckDB config length limit. |
| `applications/data_manager/src/storage.rs` | Raises DuckDB config value max length to 4096 (ECS session token support). |
| `.github/workflows/launch_infrastructure.yaml` | Switches CI to build-and-push and uses service update for deployments. |
| `.claude/skills/autotrain/SKILL.md` | Updates documentation references to renamed TiDE files. |
Comments suppressed due to low confidence (2)
models/tide/src/tide/model.py:321
`Model.train` computes `total_batches = (num_samples + batch_size - 1) // batch_size` but does not validate `batch_size > 0`. A zero or negative batch size would raise a `ZeroDivisionError` (or behave unexpectedly). Add an explicit check for a positive batch size (and mirror it in `validate_model`).

models/tide/src/tide/model.py:396
Calling `Tensor.realize(*get_parameters(self))` on every optimization step forces parameter realization/synchronization each batch and can significantly reduce the benefit of the new GPU preloading (and slow training). If this is not strictly required for correctness with tinygrad, prefer removing it or realizing less frequently (e.g., per log interval / epoch).
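The "realize less frequently" suggestion amounts to gating the realization call on a step counter. A plain-Python sketch of that cadence (tinygrad is not assumed available here; `realize_params` is a hypothetical stand-in for `Tensor.realize(*get_parameters(self))`):

```python
def train_steps(num_steps: int, log_interval: int) -> list[int]:
    """Run a toy training loop, realizing parameters only every
    `log_interval` steps instead of on every optimization step."""
    realized_at = []

    def realize_params(step: int) -> None:
        # Stand-in for forcing parameter realization/synchronization.
        realized_at.append(step)

    for step in range(num_steps):
        # ... forward pass, loss, optimizer.step() would go here ...
        if step % log_interval == 0:  # realize per log interval, not per batch
            realize_params(step)
    return realized_at
```

With `num_steps=10, log_interval=4` this realizes at steps 0, 4, and 8 rather than all ten, which is the kind of reduction in device synchronization the comment is after.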
Greptile Summary

This PR rebuilds the TiDE model training pipeline: renames source files, introduces named pipeline stages and a `TrainingDataset` interface.

Confidence Score: 5/5

Safe to merge; both remaining findings are P2 defensive-coding suggestions with no impact on current production behavior. Production bug fixes (ticker filtering, null sector/industry) are correct and well-tested. The GPU pre-loading pattern is sound. Both inline comments are P2 style/hardening suggestions that do not block merge: models/tide/src/tide/model.py (Tensor.training guard), models/tide/src/tide/trainer.py (error string match).
| Filename | Overview |
|---|---|
| models/tide/src/tide/model.py | Rebuilt training loop: GPU dataset pre-loading, dataset-based API replacing batch lists, added validation dataset early-stopping support; Tensor.training state may not be restored if GPU pre-loading raises |
| models/tide/src/tide/data.py | Named pipeline stages (ValidateColumns, EngineerFeatures, CleanData, ScaleAndEncode), new TrainingDataset dataclass, CleanData correctly uses is_infinite() for stats and pl.all_horizontal() for single-pass filtering |
| models/tide/src/tide/tasks.py | Fixed ticker filtering to exclude all lowercase-suffix instruments via regex [a-z]; consolidate_data now filters null sector/industry rows post-join; added per-file column type normalization on S3 reads |
| models/tide/src/tide/trainer.py | Updated hyperparameters (batch_size 512, patience 3, epochs 20), creates separate train/validation datasets, brittle error-message string match for validation-set fallback |
| infrastructure/training.py | Adds tide_trainer_task_definition ECS task with GPU resource requirement, awsvpc networking, and CloudWatch log configuration |
| applications/ensemble_manager/src/ensemble_manager/server.py | Migrated from per-batch prediction loop to single-batch dataset-based prediction; updated imports from renamed data/model modules |
| tools/src/tools/build_work_pool_template.py | New utility to populate Prefect ECS work pool template with GPU resource requirements, network config, and CloudWatch logging for the TiDE trainer |
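The ticker-filtering fix summarized in the table above (exclude any instrument whose symbol contains a lowercase character) can be illustrated with a plain-Python equivalent of the Polars `str.contains("[a-z]")` predicate. The real `filter_equity_bars` in `tasks.py` operates on a dataframe; this list-based sketch only mirrors the rule:

```python
import re

def keep_common_shares(tickers: list[str]) -> list[str]:
    """Drop warrants, rights, and preferred shares, which carry a
    lowercase suffix (e.g. "DLNGpB"), keeping only plain symbols."""
    return [t for t in tickers if not re.search(r"[a-z]", t)]
```

For example, `keep_common_shares(["AAPL", "DLNGpB", "BRK.A"])` keeps "AAPL" and "BRK.A" but drops the preferred-share symbol, whereas the old filter (only matching `p`) would have missed suffixes like "w" or "r".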
Prompt To Fix All With AI
This is a comment left during a code review.
Path: models/tide/src/tide/model.py
Line: 337-344
Comment:
**`Tensor.training` not restored if GPU pre-loading raises**
`Tensor.training = True` is set on line 315 before the `try/finally` block. If any of the five `Tensor(...)` calls on lines 338–342 raises (e.g., OOM), the `finally` clause never runs, leaving `Tensor.training = True` permanently. Any subsequent inference call on the same model instance would then run with dropout active.
Move the tensor creation inside the `try` block, or expand the `try` to start at line 314:
```suggestion
logger.info("Training device", device=Device.DEFAULT)
try:
# Pre-load dataset onto compute device once to eliminate per-step transfers
gpu_past_continuous = Tensor(dataset.past_continuous)
gpu_past_categorical = Tensor(dataset.past_categorical)
gpu_future_categorical = Tensor(dataset.future_categorical)
gpu_static_categorical = Tensor(dataset.static_categorical)
gpu_targets = Tensor(dataset.targets)
```
How can I resolve this? If you propose a fix, please make it concise.
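The exception-safe shape of this fix is the standard try/finally pattern: flip the global flag, then guarantee restoration no matter what raises. A self-contained sketch (a `TrainingFlag` class stands in for tinygrad's global `Tensor.training`, and `load` stands in for the GPU tensor pre-loading that might OOM):

```python
class TrainingFlag:
    """Stand-in for tinygrad's global Tensor.training flag."""
    training = False

def train(load) -> None:
    TrainingFlag.training = True
    try:
        load()  # tensor creation inside try: an OOM here still reaches finally
        # ... training loop would go here ...
    finally:
        TrainingFlag.training = False  # always restored, even on raise
```

If `load()` raises, the exception still propagates, but the flag is back to `False`, so later inference on the same process does not silently run with dropout active.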
---
This is a comment left during a code review.
Path: models/tide/src/tide/trainer.py
Line: 69-71
Comment:
**Fragile error-message string match**
`if "Total days available" not in str(e)` couples this catch to the exact wording of the error message in `data.py`. If that string changes, the guard silently swallows unrelated `ValueError`s instead of re-raising them, hiding real failures.
Consider raising a distinct exception subclass in `get_dataset` (e.g., `InsufficientDataError`) so the catch here can be type-based rather than text-based.
How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "Address PR #821 review feedback" | Re-trigger Greptile
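A minimal sketch of the type-based alternative suggested in the comment above, using hypothetical names (`InsufficientDataError`, a simplified `get_dataset` signature) rather than the project's real ones:

```python
class InsufficientDataError(ValueError):
    """Raised when there is too little history to build validation windows."""

def get_dataset(total_days: int, required_days: int) -> str:
    if total_days < required_days:
        raise InsufficientDataError(
            f"Total days available ({total_days}) < required ({required_days})"
        )
    return "dataset"  # placeholder for the real dataset object

def make_validation_dataset(total_days: int, required_days: int):
    try:
        return get_dataset(total_days, required_days)
    except InsufficientDataError:
        return None  # only the expected shortage is swallowed
    # any other ValueError (bad split, bad config) propagates unchanged
```

Because the catch is on the subclass, rewording the error message can never cause unrelated `ValueError`s to be swallowed.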
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
models/tide/src/tide/data.py (1)
411-413: ⚠️ Potential issue | 🟠 Major

`predict` is reusing historical rows as future covariates. For `data_type="predict"`, the last `output_length` historical rows become `future_categorical`, so features like `day_of_month`, `month`, and `year` describe the past rather than the forecast horizon. That also makes prediction require `input_length + output_length` history instead of just the encoder window. Build the decoder categorical features from dates after the latest timestamp, then pair them with only the last `input_length` historical rows.

Also applies to: 472-483
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@models/tide/src/tide/data.py` around lines 411 - 413, For data_type == "predict" (the branch setting self.batch_data via _get_prediction_data), stop using the last output_length historical rows as future_categorical; instead generate decoder categorical/temporal features from timestamps after the most recent timestamp (i.e., the actual forecast horizon) and pair those decoder features with only the last input_length encoder rows (not input_length + output_length history). Update _get_prediction_data (and the analogous logic at the other block around the 472-483 region) to build future_categorical/decoder features from generated future dates, and ensure batch_data uses the encoder window of length input_length plus decoder features for output_length.
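One way to build decoder features from dates strictly after the latest observed timestamp, as the comment suggests. This is a hypothetical sketch using calendar days; the real pipeline would likely need a trading/business-day calendar, and the function name is illustrative:

```python
from datetime import date, timedelta

def future_calendar_features(latest: date, output_length: int) -> list[dict]:
    """Generate day_of_month/month/year features for the forecast horizon,
    i.e. for dates *after* `latest`, instead of reusing the last
    output_length historical rows."""
    features = []
    for step in range(1, output_length + 1):
        d = latest + timedelta(days=step)  # strictly after the last timestamp
        features.append({"day_of_month": d.day, "month": d.month, "year": d.year})
    return features
```

Pairing these horizon features with only the last `input_length` encoder rows removes the need for `input_length + output_length` history at prediction time.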
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@applications/data_manager/src/storage.rs`:
- Around line 133-134: The magic number 4096 used in storage.rs should be
extracted into a shared constant (e.g., DUCKDB_CONFIG_MAX_LEN: usize = 4096) so
the length limit is a single cross-file contract; add the constant in a common
place (top of storage.rs or a shared constants module) with appropriate
visibility (pub or pub(crate)) and replace the literal usage in the value.len()
> 4096 check and any tests that assert against 4096 to reference
DUCKDB_CONFIG_MAX_LEN instead; update imports/usages accordingly so all
production code and tests use the constant.
In `@applications/ensemble_manager/src/ensemble_manager/server.py`:
- Line 421: The metric variable prediction_batch_count is being set to
len(dataset) (sample count) which mismatches its name; update the metric to
either be renamed to prediction_sample_count everywhere it's defined/used
(search for prediction_batch_count) or adjust its help/description to indicate
it tracks sample count, and update any telemetry/metrics registration and usages
(e.g., in server.py where prediction_batch_count.set(len(dataset)) is called) to
use the new name/description so the metric name and value semantics match.
In `@maskfile.md`:
- Around line 791-809: The script currently calls prefect directly in the line
prefect deployment run "${deployment}" --param "lookback_days=${lookback_days}",
which fails when Prefect is only available via the uv environment; change that
invocation to run Prefect through uv (e.g., use uv run prefect deployment run
...), preserving the deployment and lookback_days variables (deployment,
lookback_days, model_name) so the exact call becomes an uv-run of prefect with
the same arguments and params; ensure quotes and variable expansions remain
intact and test that mask model train tide triggers the deployment via uv.
In `@models/tide/src/tide/model.py`:
- Around line 420-423: The early-stopping branch uses
validate_model(validation_dataset) and lets a NaN stopping_loss consume
patience, so change the logic in the block that sets stopping_loss (and the
identical block later) to treat an empty validation_dataset the same as None:
first check if validation_dataset is None or empty (e.g.,
len(validation_dataset)==0 or validation_dataset.is_empty()), and if so set
stopping_loss = epoch_loss; otherwise call validate_model(...), then if the
returned stopping_loss is NaN (use math.isnan or np.isnan) fall back to
stopping_loss = epoch_loss before comparing against best_loss and
early_stopping_min_delta; apply this change for both occurrences around the
validate_model calls.
- Around line 420-423: The code uses validate_model to compute stopping_loss for
early stopping but still selects/checkpoints on epoch_loss, causing restored
weights to mismatch; update the checkpoint selection and saving logic so that
when validation_dataset is provided you use stopping_loss (the value returned by
validate_model) as the metric for saving best checkpoints and for restoring
weights, i.e., set a unified checkpoint_metric variable to stopping_loss (else
epoch_loss), use that metric in the checkpoint saving/compare code (the same
branch that currently references epoch_loss), and update the
best-loss/best-epoch tracking and restore logic to reference that unified metric
so early stopping and checkpoint restore use the same criterion.
- Around line 530-535: validate_model is calling quantile_loss without the
huber_delta used during training, so validation uses a different objective;
update the quantile_loss call in validate_model (where predictions and
targets_reshaped are computed) to pass huber_delta=self.huber_delta (same
argument used in train()), ensuring the validation loss matches the training
loss function.
In `@models/tide/src/tide/trainer.py`:
- Around line 61-73: In the validation dataset creation in trainer.py, narrow
the except block around tide_data.get_dataset to only swallow the specific "not
enough data for validation windows" error: catch the ValueError as e, inspect e
(e.g., check for a distinctive substring like "not enough data" or "too small
for windowing") and only set validation_dataset = None and log the warning in
that case; for any other ValueError re-raise it so invalid split/configuration
errors from tide_data.get_dataset surface instead. Use the existing symbols
tide_data.get_dataset, validation_dataset, and the surrounding logger.warning
call to locate where to implement this check.
In `@models/tide/tests/test_tasks.py`:
- Around line 261-302: The test currently lets DLNGpB be dropped by the join
because _SAMPLE_CATEGORIES lacks a category for it, so update the test in
test_prepare_training_data_succeeds_when_raw_data_contains_preferred_tickers to
add a category row for "DLNGpB" to _SAMPLE_CATEGORIES (so both tickers are
present through the join), then capture and inspect the parquet uploaded by
prepare_training_data (mock_s3_client.put_object or the object returned from
model_artifacts_bucket upload) to assert that the resulting parquet contains
only the preferred ticker(s) you expect (e.g., only "AAPL" if filter_equity_bars
should remove DLNGpB); reference prepare_training_data and the mocked
boto3.client (mock_s3_client.get_object/put_object) to locate where to read the
uploaded bytes and validate the parquet contents.
In `@tools/tests/test_build_work_pool_template.py`:
- Around line 91-105: Add an assertion to verify that the AWS region is
propagated into the container log config: in the test function
test_build_work_pool_template_configures_gpu_and_logging (or a new test like
test_build_work_pool_template_uses_aws_region_env_var), set or monkeypatch the
AWS_REGION env var and assert that
result["job_configuration"]["task_definition"]["containerDefinitions"][0]["logConfiguration"]["options"]["awslogs-region"]
equals the expected region (e.g., "eu-west-1"); this ensures
build_work_pool_template reads the environment variable and populates
awslogs-region in the logConfiguration.
---
Outside diff comments:
In `@models/tide/src/tide/data.py`:
- Around line 411-413: For data_type == "predict" (the branch setting
self.batch_data via _get_prediction_data), stop using the last output_length
historical rows as future_categorical; instead generate decoder
categorical/temporal features from timestamps after the most recent timestamp
(i.e., the actual forecast horizon) and pair those decoder features with only
the last input_length encoder rows (not input_length + output_length history).
Update _get_prediction_data (and the analogous logic at the other block around
the 472-483 region) to build future_categorical/decoder features from generated
future dates, and ensure batch_data uses the encoder window of length
input_length plus decoder features for output_length.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9ac4c5ef-efc1-4146-8a4c-6d7c60b2a66f
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (26)
- .claude/skills/autotrain/SKILL.md
- .github/workflows/launch_infrastructure.yaml
- applications/data_manager/src/storage.rs
- applications/data_manager/tests/test_storage.rs
- applications/ensemble_manager/pyproject.toml
- applications/ensemble_manager/src/ensemble_manager/server.py
- applications/portfolio_manager/pyproject.toml
- infrastructure/__main__.py
- infrastructure/training.py
- maskfile.md
- models/tide/Dockerfile
- models/tide/src/tide/data.py
- models/tide/src/tide/deploy.py
- models/tide/src/tide/model.py
- models/tide/src/tide/run.py
- models/tide/src/tide/tasks.py
- models/tide/src/tide/trainer.py
- models/tide/src/tide/workflow.py
- models/tide/tests/test_data.py
- models/tide/tests/test_deploy.py
- models/tide/tests/test_model.py
- models/tide/tests/test_run.py
- models/tide/tests/test_tasks.py
- prefect.yaml
- tools/src/tools/build_work_pool_template.py
- tools/tests/test_build_work_pool_template.py
💤 Files with no reviewable changes (3)
- prefect.yaml
- models/tide/tests/test_run.py
- models/tide/src/tide/run.py
- server.py: set prediction_batch_count to 1 (all predictions run in a single in-memory batch after refactor; previously used len(dataset), which counted symbols, not batches)
- tasks.py: guard sector/industry filter with a column presence check so consolidate_data does not fail when those columns are absent from the consolidated frame
- build_work_pool_template.py: fix CLI usage string to reflect actual invocation via `python -m tools.build_work_pool_template`
- storage.rs: extract magic 4096 to DUCKDB_CONFIG_VALUE_MAX_LENGTH const; update test to use the constant
- maskfile.md: prefix prefect deployment run with `uv run` so Prefect executes inside the uv environment on a clean workspace
- model.py: add math import; fall back stopping_loss to epoch_loss when validate_model returns NaN; use stopping_loss (not epoch_loss) as checkpoint metric when validation is available to avoid saving overfit weights; pass huber_delta to quantile_loss in validate_model to match training loop
- trainer.py: narrow ValueError catch to only swallow "Total days available" errors; re-raise config/programming errors immediately
- test_tasks.py: add DLNGpB category row so filter_equity_bars test exercises the actual filter path; assert only AAPL appears in the uploaded parquet output from prepare_training_data
- test_build_work_pool_template.py: assert awslogs-region in GPU/logging test to match what build_work_pool_template writes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Overview
Changes
- Renamed `tide_data.py` → `data.py` and `tide_model.py` → `model.py` for consistency
- Rebuilt the `Data` preprocessing pipeline with named stages (`ValidateColumns`, `EngineerFeatures`, `CleanData`, `ScaleAndEncode`) and a `Pipeline` orchestrator
- Fixed `CleanData` to collect stats on the original dataframe before filtering, and to use `pl.all_horizontal()` for a single-pass filter; corrected the `is_infinite()` vs `~is_finite()` distinction
- Fixed `filter_equity_bars` in `tasks.py` to exclude all lowercase-suffix tickers (warrants, rights, preferred shares) using `str.contains("[a-z]")` instead of only filtering `p`
- Fixed `consolidate_data` in `tasks.py` to filter rows with null `sector` or `industry` after the join, resolving a pandera `SchemaError` on non-nullable columns
- Added device logging (`Device.DEFAULT`) before the training loop in `model.py`; uses `.tolist()` for tinygrad compatibility
- Updated `trainer.py` default configuration: `early_stopping_patience` 10 → 3, `batch_size` 256 → 512, `learning_rate` 0.0005 → 0.001
- Removed `run.py` and `test_run.py`; replaced with `deploy.py` and `trainer.py`
- `tasks.py`: S3 read/write and data consolidation functions, fully tested
- Added `tools/src/tools/build_work_pool_template.py` for ECS work pool configuration generation
- Updated `maskfile.md` model commands and training infrastructure section
- Added `infrastructure/training.py` with ECS GPU cluster and TiDE trainer task definition

Context

This branch rebuilds the TiDE model training pipeline after migrating the model code into `models/tide/`. Key fixes address pandera schema-validation failures observed in production (lowercase ticker suffixes and null sector/industry values post-join). Training-loop performance is improved by pre-loading dataset tensors onto the GPU once per training run rather than per batch step. Early-stopping and batch-size defaults are tuned to reduce wall-clock training time on the `g4dn.xlarge` ECS instance.

Summary by CodeRabbit
Release Notes
New Features
Improvements
Bug Fixes
Documentation