Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 3 additions & 6 deletions rust/benchmarks/src/bin/tpch.rs
Original file line number Diff line number Diff line change
Expand Up @@ -142,12 +142,9 @@ async fn benchmark(opt: BenchmarkOpt) -> Result<Vec<arrow::record_batch::RecordB
println!("Loading table '{}' into memory", table);
let start = Instant::now();

let memtable = MemTable::load(
table_provider.as_ref(),
opt.batch_size,
Some(opt.partitions),
)
.await?;
let memtable =
MemTable::load(table_provider, opt.batch_size, Some(opt.partitions))
.await?;
println!(
"Loaded table '{}' into memory in {} ms",
table,
Expand Down
2 changes: 1 addition & 1 deletion rust/datafusion/benches/sort_limit_query_sql.rs
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ fn create_context() -> Arc<Mutex<ExecutionContext>> {
let partitions = 16;

rt.block_on(async {
let mem_table = MemTable::load(&csv, 16 * 1024, Some(partitions))
let mem_table = MemTable::load(Box::new(csv), 16 * 1024, Some(partitions))
.await
.unwrap();

Expand Down
2 changes: 1 addition & 1 deletion rust/datafusion/src/datasource/memory.rs
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,7 @@ impl MemTable {

/// Create a mem table by reading from another data source
pub async fn load(
t: &dyn TableProvider,
t: Box<dyn TableProvider + Send + Sync>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t: Box<dyn TableProvider + Send + Sync>,
t: Arc<dyn TableProvider + Send + Sync>,

I suggest using Arc to follow the convention in ExecutionContext:
https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/context.rs#L565

I don't think there is any reason load needs exclusive ownership over the TableProvider

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb .

You make a good point and it is confusing to me why Box is used in some places and Arc in others (given the I thought ideal characteristics of Arc): e.g.

/// Execution context for registering data sources and executing queries
#[derive(Clone)]
pub struct ExecutionContextState {
    /// Data sources that are registered with the context
    pub datasources: HashMap<String, Arc<dyn TableProvider + Send + Sync>>,

vs

    pub fn register_table(
        &mut self,
        name: &str,
        provider: Box<dyn TableProvider + Send + Sync>,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take a shot at cleaning up all the APIs to take an Arc over the next few days. I think that would be the ideal outcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. I did replace everything with Arc locally for testing and tests pass at least.

batch_size: usize,
output_partitions: Option<usize>,
) -> Result<Self> {
Expand Down