Skip to content

Conversation

@returnString
Copy link
Contributor

As discussed here, we were looking into how we might add code examples to the DataFusion readme whilst keeping them in sync with reality as we go through API revisions etc.

This PR pulls in a new dev dependency, doc-comment, which allows for detecting all the rust-tagged code blocks in a Markdown file and treating them as doctests, and wires this up for README.md.

My only concerns are:

  • because the end result is a full-blown doctest, you do need to make sure imports etc are present, which makes the samples more verbose than some people would perhaps like
  • again on the verbosity front: we have lots of async code which requires a #[tokio::main] async fn main() { ... } wrapper

Neither of these are inherently bad imo, but worth noting upfront.

As an example of a readme sample that passes as a doctest (borrowed from @alamb's latest documentation PR, #9710):

use datafusion::prelude::*;
use arrow::util::pretty::print_batches;
use arrow::record_batch::RecordBatch;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
  let mut ctx = ExecutionContext::new();
  // create the dataframe
  let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;

  let df = df.filter(col("a").lt_eq(col("b")))?
            .aggregate(&[col("a")], &[min(col("b"))])?
            .limit(100)?;

  let results: Vec<RecordBatch> = df.collect().await?;
  print_batches(&results)?;

  Ok(())
}

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @returnString -- this is great! It also has already found a bug (in the example on #9710) 👍

@github-actions
Copy link

@alamb
Copy link
Contributor

alamb commented Mar 19, 2021

The lint failure appears unrelated to this PR -- #9754 has a fix

@alamb
Copy link
Contributor

alamb commented Mar 19, 2021

So I tried this out locally by taking my commit from #9710

(arrow_dev) alamb@ip-10-0-0-124:~/Software/arrow2/rust$ git cherry-pick fa58e14
Auto-merging rust/datafusion/src/lib.rs
[readme_doctest 861031b62] ARROW-11969:  [Rust][DataFusion] Improve Examples in documentation
 Date: Mon Mar 15 11:04:32 2021 -0400
 4 files changed, 75 insertions(+), 4 deletions(-)

And then running the doc tests (which found several bugs!)

cargo test --doc -p datafusion
...
'
error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `Try`)
  --> src/lib.rs:296:3
   |
2  |   fn main() { #[allow(non_snake_case)] fn _doctest_main_src_lib_rs_285_0() {
   |  ______________________________________-
3  | | let mut ctx = ExecutionContext::new();
4  | |   // create the dataframe
5  | |   let df = ctx.read_csv("tests/example.csv", CsvReadOptions::new())?;
...  |
13 | |   print_batches(&results)?;
   | |   ^^^^^^^^^^^^^^^^^^^^^^^^ cannot use the `?` operator in a function that returns `()`
14 | | }; _doctest_main_src_lib_rs_285_0() }
   | |_- this function should return `Result` or `Option` to accept `?`
   |
   = help: the trait `Try` is not implemented for `()`
   = note: required by `from_error`

error: aborting due to 18 previous errors

Some errors have detailed explanations: E0277, E0412, E0425, E0433, E0728.
For more information about an error, try `rustc --explain E0277`.
Couldn't compile the test.

failures:
    src/lib.rs - readme_example_test (line 271)
    src/lib.rs - readme_example_test (line 285)
    src/lib.rs - readme_example_test (line 301)

I also verified that running cargo test --all ran the tests as well

I am convinced this will run as part of CI. Thanks for this @returnString . Good stuff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants