@dreadatour dreadatour commented Sep 8, 2025

Fix tests:

  • PyTorch does not support ffmpeg 8, so pin ffmpeg to version 7 in the macOS tests (the Linux/Windows tests use an older ffmpeg by default)
  • Fix docstrings for the mkdocs build, fix some typos in docstrings, and fix imports to match the code style

Note: this PR is easier to review with whitespace changes hidden.

Summary by Sourcery

Fix CI failures by pinning ffmpeg to v7 on macOS and update docstrings and imports for mkdocs build and code style compliance

Bug Fixes:

  • Pin ffmpeg version to 7 in macOS CI to restore PyTorch compatibility
  • Adjust docstring formatting to resolve mkdocs build errors

Enhancements:

  • Standardize docstring parameter syntax across multiple modules
  • Consolidate and reorder import statements to match code style guidelines

CI:

  • Install ffmpeg@7 and set DYLD_FALLBACK_LIBRARY_PATH in GitHub Actions for macOS runners

Documentation:

  • Correct typos and mistyped parameter names in docstrings

@dreadatour dreadatour self-assigned this Sep 8, 2025
@dreadatour dreadatour marked this pull request as draft September 8, 2025 01:55
sourcery-ai bot commented Sep 8, 2025

Reviewer's Guide

This PR resolves macOS test failures by pinning FFmpeg to version 7 and cleans up mkdocs compatibility by standardizing docstring formatting and fixing minor typos, alongside a small import refactor for consistent code style.

File-Level Changes

Change: Pin FFmpeg version in macOS CI to address compatibility issues
Details:
  • Install ffmpeg@7 instead of latest on macOS runner
  • Update DYLD_FALLBACK_LIBRARY_PATH to ffmpeg@7 lib path
Files:
  • .github/workflows/tests.yml

Change: Standardize and correct docstring formatting for mkdocs build
Details:
  • Remove extra spaces before parameter colons and normalize indentations
  • Add or refine type hints in parameter lists
  • Fix typos and align descriptions
Files:
  • src/datachain/lib/dc/datachain.py
  • src/datachain/lib/dc/storage.py
  • src/datachain/lib/dc/csv.py
  • src/datachain/lib/dc/hf.py
  • src/datachain/lib/clip.py
  • src/datachain/lib/dc/json.py
  • src/datachain/lib/dc/parquet.py
  • src/datachain/lib/dc/datasets.py
  • src/datachain/lib/dc/records.py

Change: Refactor imports for consistent code style
Details:
  • Replace separate os.path import with import os
  • Combine typing imports into single line
  • Group and reorder library imports
Files:
  • src/datachain/lib/dc/storage.py
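
The colon normalization listed under the docstring changes could be checked with a small script. The following is only a minimal sketch, assuming Google-style `Args:` sections; `find_spaced_colons` and `load_items` are hypothetical names used for illustration and are not part of datachain.

```python
import re
from typing import List, Optional

# Matches an argument entry such as "limit : ..." where whitespace precedes the colon.
_SPACE_BEFORE_COLON = re.compile(r"^\s*\w+\s+:")


def find_spaced_colons(docstring: str) -> List[str]:
    """Return the stripped docstring lines whose parameter name is followed by ' :'."""
    return [
        line.strip()
        for line in docstring.splitlines()
        if _SPACE_BEFORE_COLON.match(line)
    ]


def load_items(name: str, limit: Optional[int] = None) -> List[str]:
    """Hypothetical loader that only demonstrates the normalized layout.

    Args:
        name: Dataset name.
        limit: Maximum number of items to return.
    """
    items = [f"{name}-{i}" for i in range(3)]
    return items if limit is None else items[:limit]
```

Running `find_spaced_colons(load_items.__doc__)` on the normalized docstring above returns an empty list, while a line written as `name : Dataset name.` would be reported.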

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • On macOS, brew install ffmpeg@7 is keg-only by default, so you’ll need to either run brew link --force --overwrite ffmpeg@7 or add its bin path to $PATH so CI actually uses the v7 binaries.
  • For Windows and Ubuntu runners, explicitly pin the ffmpeg major version (e.g. choco install ffmpeg --version=7.x or using an apt pin) instead of relying on the default install to ensure you don’t accidentally pick up v8 when it’s released.
  • Consider adding a quick ffmpeg -version check after installation to fail fast if the wrong version ends up in the CI environment.
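
The fail-fast check suggested in the last point could look roughly like the sketch below. It assumes the first line of `ffmpeg -version` starts with `ffmpeg version <major>.` (optionally with an `n` prefix, as some builds print); the function names are hypothetical and this is not the check used in the PR.

```python
import re
import subprocess


def parse_ffmpeg_major(version_output: str) -> int:
    """Return the major version from the first line of `ffmpeg -version` output."""
    first_line = version_output.splitlines()[0] if version_output.strip() else ""
    match = re.match(r"ffmpeg version n?(\d+)", first_line)
    if match is None:
        # Assumption: builds with other version strings are treated as errors.
        raise ValueError(f"unrecognized ffmpeg version line: {first_line!r}")
    return int(match.group(1))


def check_ffmpeg_major(expected_major: int = 7) -> None:
    """Run `ffmpeg -version` and raise if the major version is not the expected one."""
    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=True
    )
    major = parse_ffmpeg_major(result.stdout)
    if major != expected_major:
        raise RuntimeError(
            f"expected ffmpeg {expected_major}.x, found major version {major}"
        )
```

Calling `check_ffmpeg_major(7)` right after the install step would stop the job early if a different major version ends up first on `PATH`.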

codecov bot commented Sep 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.84%. Comparing base (b48c0a9) to head (9555181).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1322   +/-   ##
=======================================
  Coverage   88.84%   88.84%           
=======================================
  Files         155      155           
  Lines       14238    14240    +2     
  Branches     2025     2025           
=======================================
+ Hits        12650    12652    +2     
  Misses       1124     1124           
  Partials      464      464           
| Flag | Coverage Δ |
|---|---|
| datachain | 88.78% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/datachain/lib/clip.py | 90.90% <ø> (ø) |
| src/datachain/lib/dc/csv.py | 92.30% <100.00%> (+0.20%) ⬆️ |
| src/datachain/lib/dc/datachain.py | 91.14% <100.00%> (ø) |
| src/datachain/lib/dc/datasets.py | 95.12% <ø> (ø) |
| src/datachain/lib/dc/hf.py | 89.47% <100.00%> (ø) |
| src/datachain/lib/dc/json.py | 95.83% <ø> (ø) |
| src/datachain/lib/dc/parquet.py | 100.00% <100.00%> (ø) |
| src/datachain/lib/dc/records.py | 100.00% <ø> (ø) |
| src/datachain/lib/dc/storage.py | 100.00% <100.00%> (ø) |

cloudflare-workers-and-pages bot commented Sep 8, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 9555181
Status: ✅  Deploy successful!
Preview URL: https://08860c21.datachain-documentation.pages.dev
Branch Preview URL: https://fix-ci-tests-ffmpeg.datachain-documentation.pages.dev

Comment on lines +118 to +119
brew install ffmpeg@7
echo 'DYLD_FALLBACK_LIBRARY_PATH=/opt/homebrew/opt/ffmpeg@7/lib' >> "$GITHUB_ENV"
Contributor Author

PyTorch does not support ffmpeg 8

Contributor

Let's add a link to meta-pytorch/torchcodec#839 here, please, so we remember to remove this pin once it is fixed.

Contributor Author

Added a comment. Thank you! 🙏

@dreadatour dreadatour changed the title from "WIP: Use ffmpeg version <8 in GitHub CI" to "Fix failing tests and mkdocs" Sep 8, 2025
@dreadatour dreadatour marked this pull request as ready for review September 8, 2025 03:38
@dreadatour dreadatour requested a review from a team September 8, 2025 03:38
@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • This PR mixes functional CI changes (ffmpeg pinning) with sweeping docstring/style updates, which makes it harder to review—consider splitting it into separate commits or PRs for clarity and maintainability.
  • In tests.yml the ffmpeg version is hardcoded in multiple places; centralizing it via a variable or matrix axis would simplify future updates and reduce duplication.
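
The centralization idea in the second point can be illustrated outside the workflow file. The sketch below is hypothetical: the real change would live in tests.yml as a variable or matrix axis, and `FFMPEG_MAJOR`, `ffmpeg_formula`, `ffmpeg_lib_path`, and `export_library_path` are names invented here. It derives the Homebrew formula name and the library path from one constant and appends the variable to the file that GitHub Actions exposes as `$GITHUB_ENV`, assuming the `/opt/homebrew` prefix seen in the PR.

```python
FFMPEG_MAJOR = 7  # single place to change once the pin can be lifted


def ffmpeg_formula(major: int = FFMPEG_MAJOR) -> str:
    """Return the versioned Homebrew formula name, e.g. 'ffmpeg@7'."""
    return f"ffmpeg@{major}"


def ffmpeg_lib_path(major: int = FFMPEG_MAJOR, prefix: str = "/opt/homebrew") -> str:
    """Return the library directory of the versioned formula under the given prefix."""
    return f"{prefix}/opt/{ffmpeg_formula(major)}/lib"


def export_library_path(env_file: str, major: int = FFMPEG_MAJOR) -> None:
    """Append DYLD_FALLBACK_LIBRARY_PATH=<lib path> to the given environment file."""
    with open(env_file, "a", encoding="utf-8") as fh:
        fh.write(f"DYLD_FALLBACK_LIBRARY_PATH={ffmpeg_lib_path(major)}\n")
```

In a workflow step this would be called as `export_library_path(os.environ["GITHUB_ENV"])`, so a later version bump touches only `FFMPEG_MAJOR`.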

Comment on lines +579 to +582
name: dataset name. This can be either a fully qualified name, including
the namespace and project, or just a regular dataset name. In the latter
case, the namespace and project will be taken from the settings
(if specified) or from the default values otherwise.
Contributor Author

I have asked AI to help me with this 👀

Comment on lines +37 to +38
limit: The maximum number of items to read from the HF dataset.
Applies `take(limit)` to `datasets.load_dataset`.
Contributor Author

Also here, AI was used to improve this docstring.

@dreadatour dreadatour merged commit 6eec7e7 into main Sep 8, 2025
38 checks passed
@dreadatour dreadatour deleted the fix-ci-tests-ffmpeg branch September 8, 2025 05:49