Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CI check when version changes -- remove checked in file that is created by doc example #12034

Merged
merged 4 commits into from
Aug 19, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 16, 2024

Which issue does this PR close?

Closes #11892

Rationale for this change

See #11892 -- basically if versions change the CI fails

The problem is that the output parquet file that is written as part of the doc example is checked in, and thus if the actual bytes written by the example change (e.g. when the version number changes) the local file is different too

What changes are included in this PR?

  1. Delete the example.parquet file
  2. Add it to .gitignore so it isn't added again
  3. Update example to avoid recreating it

Are these changes tested?

Yes by CI

Are there any user-facing changes?

No, this is a test fix only

@korowa
Copy link
Contributor

korowa commented Aug 18, 2024

Since this file is produced by https://github.com/apache/datafusion/blob/main/docs/source/library-user-guide/using-the-dataframe-api.md#write-dataframe-to-files, which is runned by this doctest as

cargo test --doc 'library_user_guide_dataframe_api'

won't it be better to replace it with tempdir rather than adding this file to gitignore? E.g.

use datafusion::prelude::*;
use datafusion::error::Result;
use datafusion::dataframe::DataFrameWriteOptions;

#[tokio::main]
async fn main() -> Result<()> {
    // Replace with actual target path
    let target_path = tempfile::tempdir()?.path().join("example.parquet");

    let ctx = SessionContext::new();
    // read example.csv file into a DataFrame
    let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
    // stream the contents of the DataFrame to the target file
    df.write_parquet(
        target_path.to_string_lossy().as_ref(),
        DataFrameWriteOptions::new(),
        None, // writer_options
    ).await;
    Ok(())
}

or to delete the file in the end of the example, like it's done in other doctests, but using tempdir has zero (or close to it) risk that this file will be committed.

@alamb alamb changed the title Remove checked in file that is created by doc example Fix CI check when version changes -- remove checked in file that is created by doc example Aug 19, 2024
@alamb
Copy link
Contributor Author

alamb commented Aug 19, 2024

won't it be better to replace it with tempdir rather than adding this file to gitignore? E.g.

Good idea -- thanks @korowa. Done in 14caf64

@alamb alamb added the documentation Improvements or additions to documentation label Aug 19, 2024
Copy link
Contributor

@korowa korowa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Thank you @alamb

@alamb
Copy link
Contributor Author

alamb commented Aug 19, 2024

Thank you for the review @korowa

@alamb alamb merged commit c2cbba2 into apache:main Aug 19, 2024
25 of 26 checks passed
@alamb alamb deleted the alamb/fix_exmple branch August 19, 2024 18:26
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate development-process Related to development process of DataFusion documentation Improvements or additions to documentation
Projects
None yet
2 participants