Skip to content

Conversation

@apoorvalal
Copy link
Member

No description provided.

@apoorvalal apoorvalal requested a review from Copilot September 4, 2025 04:35
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a streaming OLS implementation that leverages DuckDB's Arrow IPC support for memory-efficient regression on large datasets. The implementation uses sufficient statistics to process data in chunks rather than loading everything into memory.

  • Adds streaming regression functionality with DuckDB Arrow interop
  • Implements sufficient statistics-based OLS and Ridge regression
  • Provides comprehensive test coverage for the streaming implementation

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

File Description
duckreg/stream.py Core streaming regression implementation with Arrow IPC support
tests/test_stream.py Test suite covering streaming regression functionality
duckreg/init.py Exposes StreamingRegression in public API

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.


def solve_ols(self) -> np.ndarray:
"""Compute OLS coefficients."""
if self.n < self.k:
Copy link

Copilot AI Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition self.n < self.k checks if there are fewer observations than features, but self.k could be None if no data has been processed yet. This will raise a TypeError when comparing int with NoneType.

Suggested change
if self.n < self.k:
if self.k is None or self.n < self.k:

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +75
feature_cols: list[str] = None,
target_col: str = None,
Copy link

Copilot AI Sep 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using mutable default arguments (None) should be replaced with proper Optional type annotations and default to None explicitly. The type hints should be Optional[list[str]] and Optional[str].

Suggested change
feature_cols: list[str] = None,
target_col: str = None,
feature_cols: Optional[list[str]] = None,
target_col: Optional[str] = None,

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant