[DataFrame] Parallel Load into dataframe #6983

alamb · 2023-07-16T12:51:23Z

Is your feature request related to a problem or challenge?

When loading data into a DataFusion DataFrame via SessionContext::read_parquet, only a single core is used, even when many cores are available.

This leads to slower performance, as reported by @mispp on #6908

Reproducer

Create data using

cd datafusion/benchmarks
./bench.sh data tpch10

Then load the data with the following program:

use std::{io::Error, time::Instant};
use datafusion::prelude::*;
use chrono;

const FILENAME: &str = "/Users/alamb/Software/arrow-datafusion/benchmarks/data/tpch_sf10/lineitem/part-0.parquet";

#[tokio::main]
async fn main() -> Result<(), Error> {
    env_logger::init();
    {
        let _ = _datafusion().await;
    }

    Ok(())
}

pub async fn _datafusion() {
    let _ctx = SessionContext::new();

    let _read_options = ParquetReadOptions {
        file_extension: ".parquet",
        table_partition_cols: vec![],
        parquet_pruning: None,
        skip_metadata: None,
    };
    let _df = _ctx.read_parquet(FILENAME, _read_options).await.unwrap();

    let start = Instant::now();
    println!("datafusion start -> {:?}", chrono::offset::Local::now());

    let _cached = _df.cache().await;
    let elapsed = Instant::now() - start;
    println!("datafusion end -> {:?} {elapsed:?}", chrono::offset::Local::now());
}

Cargo.toml

[package]
name = "perf_test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
env_logger = "0.10.0"

parquet = "40.0.0"
serde = "1.0.163"
serde_json = "1.0.96"
datafusion = "27.0.0"
tokio = { version = "1.0", features = ["macros", "rt-multi-thread"] }
chrono = "0.4.26"

Describe the solution you'd like

I would like DataFusion to read the Parquet file in parallel, using the target_partitions config parameter:

https://docs.rs/datafusion/latest/datafusion/config/struct.ExecutionOptions.html#structfield.target_partitions
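To make the request concrete, here is a std-only Rust sketch of what partition-parallel loading means (it does not use DataFusion): the input is split into target_partitions contiguous byte ranges, each range is processed on its own scoped thread, and the per-partition results are collected in partition order. load_range is a hypothetical stand-in for decoding one slice of a Parquet file, and the default partition count comes from std::thread::available_parallelism, mirroring the documented default of target_partitions (the number of CPU cores).

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical stand-in for decoding one byte range of a file:
/// here it just counts the bytes in the range.
fn load_range(data: &[u8], range: Range<usize>) -> usize {
    data[range].len()
}

/// Split `total` bytes into at most `n` contiguous ranges of
/// ceil(total / n) bytes each (the last range may be shorter).
fn split_ranges(total: usize, n: usize) -> Vec<Range<usize>> {
    let n = n.max(1);
    let chunk = ((total + n - 1) / n).max(1);
    (0..total)
        .step_by(chunk)
        .map(|start| start..(start + chunk).min(total))
        .collect()
}

/// Load every range on its own scoped thread and return the
/// per-partition results in partition order.
fn parallel_load(data: &[u8], target_partitions: usize) -> Vec<usize> {
    let ranges = split_ranges(data.len(), target_partitions);
    thread::scope(|s| {
        let handles: Vec<_> = ranges
            .into_iter()
            .map(|r| s.spawn(move || load_range(data, r)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Default to one partition per available core, like target_partitions.
    let target_partitions = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let data = vec![0u8; 1_000_003];
    let per_partition = parallel_load(&data, target_partitions);
    println!(
        "{} partitions, {} bytes",
        per_partition.len(),
        per_partition.iter().sum::<usize>()
    );
}
```

Keeping one result per partition, rather than concatenating them, matches the goal of ending up with one in-memory partition per core.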

Describe alternatives you've considered

No response

Additional context

No response

alamb · 2023-07-17T14:20:02Z

I made a POC in #6984 which demonstrates that the issue is indeed about using more cores to do the write. However, the way it does the repartitioning is probably not right; I think the better approach would be to set the target partitions when writing into the memory table.

Perhaps this could be done by creating a LogicalPlan::DmlStatement for write and then letting the existing insert machinery work rather than doing a custom "collect".

https://docs.rs/datafusion/latest/datafusion/logical_expr/logical_plan/struct.DmlStatement.html

Marking this as a good first issue: I think the approach will work well and can follow existing patterns, there is a reproducer, and it was asked for by a customer.

gobraves · 2023-07-17T17:31:49Z

@alamb Hello, I'm new to DataFusion, but can I give this issue a try?

alamb · 2023-07-17T17:54:46Z

Thank you @gobraves -- that would be great. Once you have looked around let me know if you have any questions

Basically, I would suggest first verifying that the equivalent SQL, run in datafusion-cli,

create table t;
INSERT INTO t SELECT * from `data.parquet`

is properly parallelized.

Then look at the plan that comes out:

EXPLAIN INSERT INTO t SELECT * from `data.parquet`

And try to update DataFrame::cache() to use the same approach.

gobraves · 2023-08-01T17:40:48Z

hi @alamb, I apologize for the delayed response. Based on your tips, I executed the following commands in the CLI and also ran the code you provided to reproduce the issue. I noticed that executing the commands in the CLI was almost 8 times faster than running the code mentioned above, which is consistent with my CPU core count.

Here are the commands I executed in the CLI:

create external table test stored as parquet location 'part-0.parquet';
create table t as select * from test;
explain create table t as select * from test;

In the logical_plan of the explain output, I observed CreateMemoryTable and TableScan. Consequently, I reviewed the code for CreateMemoryTable in datafusion-cli and the .cache() function, hoping to identify the differences. I noticed that target_partitions is indeed passed in both cases, but I'm unsure why it is not used in .cache(). From the commit in #6984, it seems that the problem is resolved by repartitioning, so the difference appears to be that one implementation uses Partitioning while the other does not. However, when browsing through the code myself, I couldn't find any relevant settings. If this is the case, could you please give me some hints about which part of the code this happens in?

I have one more question: Do we need to create a new DmlStatement to address this issue or improve the existing one?

Perhaps this could be done by creating a LogicalPlan::DmlStatement for write and then letting the existing insert machinery work rather than doing a custom "collect".

I'm not entirely clear about this statement, and I believe it might be because I haven't fully grasped the problem described above.

2010YOUY01 · 2023-08-02T00:26:02Z

@gobraves Thank you for trying! I also took a look at this issue (and found it pretty difficult to solve 😨); I hope the following info is helpful.
Here is an overview of the parallel Parquet scan:

let _df = _ctx.read_parquet(FILENAME, _read_options).await.unwrap();

let _cached = _df.cache().await;

After _ctx.read_parquet(...), a LogicalPlan with a TableScan is created and stored inside the DataFrame.
Inside _df.cache(), the LogicalPlan is first converted into a physical plan with a ParquetExec node, and then the physical optimizer tries to modify the ParquetExec node's file_groups to make the scan parallel.
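A simplified model of that last step (not DataFusion's actual code): repartitioning a single file group into target_partitions groups by splitting the file into contiguous byte ranges of ceil(size / n) bytes. This splitting rule is an assumption inferred from the plans posted in this thread, not from the optimizer source; for example, a 4-group plan for a file ending at byte 40,661,778 splits it at multiples of 10,165,445 bytes, which is exactly what ceiling division gives. FileRange and repartition_file are hypothetical names.

```rust
/// Hypothetical model of one entry in ParquetExec's file_groups:
/// a file path plus a half-open byte range [start, end).
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    path: String,
    start: u64,
    end: u64,
}

/// Split one file of `size` bytes into at most `target_partitions` groups.
/// Each group covers ceil(size / target_partitions) bytes; the last one
/// is truncated at the end of the file.
fn repartition_file(path: &str, size: u64, target_partitions: u64) -> Vec<Vec<FileRange>> {
    let n = target_partitions.max(1);
    // Ceiling division written out so it works on older toolchains.
    let chunk = ((size + n - 1) / n).max(1);
    let mut groups = Vec::new();
    let mut start = 0;
    while start < size {
        let end = (start + chunk).min(size);
        // Each group holds a single byte range of the same file.
        groups.push(vec![FileRange { path: path.to_string(), start, end }]);
        start = end;
    }
    groups
}

fn main() {
    // Sizes taken from the 4-group ParquetExec plan posted in this thread.
    for group in repartition_file("part-0.parquet", 40_661_778, 4) {
        let r = &group[0];
        println!("{}:{}..{}", r.path, r.start, r.end);
    }
}
```

Running it prints the ranges 0..10165445, 10165445..20330890, 20330890..30496335 and 30496335..40661778, matching that plan.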

My reproducer:

DataFusion CLI v28.0.0
❯ create external table test stored as parquet location '/Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet';
0 rows in set. Query took 0.064 seconds.
❯ create table t as select * from test;
0 rows in set. Query took 16.364 seconds.
❯ create table t as (select * from test where l_orderkey > 0);
0 rows in set. Query took 3.646 seconds.

❯ explain select * from test;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                         |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment]                                                                                                                                      |
| physical_plan | ParquetExec: file_groups={1 group: [[Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet]]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment] |
|               |                                                                                                                                                                                                                                                                                                                                                                              |
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.009 seconds.
❯ explain select * from test where l_orderkey > 0;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Filter: test.l_orderkey > Int64(0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|               |   TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], partial_filters=[test.l_orderkey > Int64(0)]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
|               |   FilterExec: l_orderkey@0 > 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|               |     ParquetExec: file_groups={12 groups: [[Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..13271296], [Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:13271296..26542592], [Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:26542592..39813888], [Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:39813888..53085184], [Users/yongting/Desktop/code/my_datafusion/arrow-datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:53085184..66356480], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], predicate=l_orderkey@0 > 0, pruning_predicate=l_orderkey_max@0 > 0 |
|               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The second one is parallelized. explain verbose select ... can be used to see the specific physical optimizer rule that repartitions the ParquetExec:
https://github.com/apache/arrow-datafusion/blob/a9561a0f06c25f370dc39df08d057db85c4e0c7a/datafusion/core/src/physical_optimizer/repartition.rs#L166
I think there might be a bug inside this function: if parquet_exec.get_repartitioned() gets called inside it, then the ParquetExec should be properly parallelized.

This reproducer should have the same root cause as the original one. In the original reproducer, adding a filter to _df also gets it parallelized:

    let _df = _ctx
        .read_parquet(FILENAME, _read_options)
        .await
        .unwrap()
        .filter(col("l_orderkey").gt(lit(0)))
        .unwrap();
    // With this filter, the scan can be parallelized

2010YOUY01 · 2023-08-02T00:38:00Z

I made a POC on #6984 which demonstrates the issue is indeed using more cores to do the write. However, the implementation of doing repartitioning is probably not right -- I think the better approach would be to set the target partitions when writing into memory table

This POC and adding a predicate both seem to work around the physical optimizer bug in the repartition rule by adding another execution node on top of the ParquetExec node 🤔

gobraves · 2023-08-02T16:35:29Z

@2010YOUY01 Thank you!
Based on your findings, I also retested the datafusion-cli and arrow-datafusion crate after updating them to version 28.0.0. Here are the results:

DataFusion CLI v28.0.0
❯ create external table test stored as parquet location 'part-0.parquet';
0 rows in set. Query took 0.023 seconds.
❯ create table t as select * from test;
0 rows in set. Query took 14.621 seconds.

DataFusion CLI v28.0.0
❯ create external table test stored as parquet location 'part-0.parquet';
0 rows in set. Query took 0.014 seconds.
❯ create table t as (select * from test where l_linenumber > 0);
0 rows in set. Query took 4.280 seconds.

use chrono;
use datafusion::common::DataFusionError;
use datafusion::prelude::*;
use object_store::local::LocalFileSystem;
use std::{sync::Arc, time::Instant};
use url::Url;

const FILENAME: &str =
    "/home/neo/project_learning/arrow-datafusion/benchmarks/data/tpch_sf10/lineitem/part-0.parquet";

#[tokio::main]
async fn main() -> Result<(), DataFusionError> {
    let _ctx = SessionContext::new();
    let config = _ctx.copied_config();
    for item in config.options().entries().iter() {
        let key = &item.key;
        let value = &item.value;
        println!("{key} {value:?}")
    }
    let local = Arc::new(LocalFileSystem::new());
    let local_url = Url::parse("file://local").unwrap();
    _ctx.runtime_env().register_object_store(&local_url, local);

    let _read_options = ParquetReadOptions {
        file_extension: ".parquet",
        table_partition_cols: vec![],
        parquet_pruning: None,
        skip_metadata: None,
    };
    let _df = _ctx
        .read_parquet(FILENAME, _read_options)
        .await
        .unwrap();

    let start = Instant::now();
    let _cached = _df.cache().await;
    let elapsed = Instant::now() - start;
    println!(
        "datafusion end -> {:?} {elapsed:?}",
        chrono::offset::Local::now()
    );
    Ok(())
}

without filter: 114.913562535s

If the code is modified to add a filter:

    let _df = _ctx
        .read_parquet(FILENAME, _read_options)
        .await
        .unwrap()
        .filter(col("l_linenumber").gt(lit(0)))
        .unwrap();

with filter: 15.583268924s

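+----------------+----------------+----------------+
|                | datafusion-cli | Rust program   |
+----------------+----------------+----------------+
| without filter | 14.621s        | 114.913562535s |
| with filter    | 4.280s         | 15.583268924s  |
+----------------+----------------+----------------+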

I will need to continue examining the code to understand the specific reason behind this performance difference.

alamb · 2023-08-03T15:27:10Z

I think something in the physical planning assumes that the result of the final plan should be in a single partition (or at least won't expand it by adding additional partitioning), because that is what makes the most sense when returning results to a client.

I believe this is controlled by ExecutionPlan::benefits_from_input_partitioning: https://github.com/apache/arrow-datafusion/blob/6a2d4a3a254c0495a398608d178496a191450750/datafusion/core/src/physical_plan/mod.rs#L169-L183

ProjectionExec returns false if its expressions are only columns (which is what these queries are doing).

https://github.com/apache/arrow-datafusion/blob/6a2d4a3a254c0495a398608d178496a191450750/datafusion/core/src/physical_plan/projection.rs#L285

So the filter case goes faster because FilterExec returns true for benefits_from_input_partitioning, but the projection doesn't.

I wonder if we could somehow add a flag to LogicalProjection / ProjectionExec that says "always benefits from repartitioning" and set that flag for the writes 🤔
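A toy model of the behaviour described above, following this explanation, with the proposed flag included as a hypothetical always_benefits field. Neither these types nor the flag exist in DataFusion, and the real benefits_from_input_partitioning returns one bool per child; here it is simplified to a single bool. The scan below a node is only split into target_partitions when that node reports that it benefits from input partitioning.

```rust
/// Toy expressions; the string payloads are only for readability.
#[allow(dead_code)]
enum Expr {
    Column(&'static str),
    /// Any computed expression, e.g. `l_orderkey + 1`.
    Computed(&'static str),
}

/// Toy plan nodes sitting directly above the Parquet scan.
#[allow(dead_code)]
enum Node {
    /// `always_benefits` is the hypothetical flag from the proposal.
    Projection { exprs: Vec<Expr>, always_benefits: bool },
    Filter { predicate: &'static str },
}

impl Node {
    /// Simplified benefits_from_input_partitioning for the node's only child.
    fn benefits_from_input_partitioning(&self) -> bool {
        match self {
            // Column-only projections report false, unless the flag is set.
            Node::Projection { exprs, always_benefits } => {
                *always_benefits || exprs.iter().any(|e| matches!(e, Expr::Computed(_)))
            }
            // Filters evaluate a predicate per row, so they report true.
            Node::Filter { .. } => true,
        }
    }
}

/// How many partitions the scan below `parent` would be split into.
fn scan_partitions(parent: &Node, target_partitions: usize) -> usize {
    if parent.benefits_from_input_partitioning() {
        target_partitions
    } else {
        1
    }
}

fn main() {
    let select_star = Node::Projection {
        exprs: vec![Expr::Column("l_orderkey"), Expr::Column("l_comment")],
        always_benefits: false,
    };
    let flagged_write = Node::Projection {
        exprs: vec![Expr::Column("l_orderkey"), Expr::Column("l_comment")],
        always_benefits: true,
    };
    let filtered = Node::Filter { predicate: "l_orderkey > 0" };
    let cases = [
        ("select *", &select_star),
        ("flagged write", &flagged_write),
        ("filter", &filtered),
    ];
    for (label, node) in cases {
        println!("{label}: {} partition(s)", scan_partitions(node, 8));
    }
}
```

In this model the flag only changes the decision for column-only projections, which is the case hit by the write path.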

Alternatively, I was thinking the ExecutionPlan that does the writing could say "I want the input partitioned" and the optimizer would do the right thing. But given that the DataFrame API doesn't use an ExecutionPlan for writing, it might not work.

Thank you both for pushing on this -- it is going to be awesome to get this working correctly

alamb · 2023-08-03T15:27:46Z

BTW @devinjdangelo has been looking at using ExecutionPlan for dataframes here: #7141

alamb · 2023-12-11T21:52:19Z

This might well be done; I think all that remains is for someone to test / verify that the reproducer now runs in parallel.

marvinlanhenke · 2023-12-27T16:35:26Z

@alamb
I ran the same reproducer as in #6983 (comment) and got the same results as before (CLI and DataFrame). Unfortunately, the issue does not seem to be resolved.

Edit:

... I did some debugging on this issue:

When running the query without a filter, the plan contains an OutputRequirementExec. Looking at its implementation of fn benefits_from_input_partitioning, we can see that it always returns vec![false]. This causes the EnforceDistribution optimizer to do nothing, so file_groups stays at 1.

When running the query with a filter, the plan contains a FilterExec, which doesn't override fn benefits_from_input_partitioning. It therefore relies on the trait's default implementation, which returns true, so the EnforceDistribution optimizer is allowed to do its job.

Possible Solution:
Simply remove the implementation on OutputRequirementExec and also rely on the default impl from the trait?
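A minimal toy model of this analysis and of the proposed fix. The trait and types below are hypothetical, not DataFusion's API (the real method returns one bool per child): a trait method whose default returns true, an override on the output-requirement node that always returns false, and a stand-in for the EnforceDistribution decision. OutputRequirementFixed models the proposed fix of deleting the override.

```rust
/// Toy version of the ExecutionPlan hook; the default returns true,
/// as described for the real trait's default implementation.
trait PlanNode {
    fn name(&self) -> &'static str;
    fn benefits_from_input_partitioning(&self) -> bool {
        true
    }
}

/// Current behaviour: the output-requirement node opts out.
struct OutputRequirement;
impl PlanNode for OutputRequirement {
    fn name(&self) -> &'static str {
        "OutputRequirementExec"
    }
    fn benefits_from_input_partitioning(&self) -> bool {
        false
    }
}

/// Proposed fix: no override, so the trait default (true) applies.
struct OutputRequirementFixed;
impl PlanNode for OutputRequirementFixed {
    fn name(&self) -> &'static str {
        "OutputRequirementExec (override removed)"
    }
}

/// FilterExec does not override the hook either.
struct Filter;
impl PlanNode for Filter {
    fn name(&self) -> &'static str {
        "FilterExec"
    }
}

/// Toy stand-in for the EnforceDistribution decision: how many
/// file groups the scan under `parent` ends up with.
fn file_groups_under(parent: &dyn PlanNode, target_partitions: usize) -> usize {
    if parent.benefits_from_input_partitioning() {
        target_partitions
    } else {
        1
    }
}

fn main() {
    let parents: [&dyn PlanNode; 3] = [&OutputRequirement, &Filter, &OutputRequirementFixed];
    for parent in parents {
        println!("{}: {} file group(s)", parent.name(), file_groups_under(parent, 12));
    }
}
```

The design point the model captures is that removing an override is enough to change the decision, because the trait default already asks for parallelism.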

alamb · 2023-12-28T19:39:37Z

Thank you for the follow up @marvinlanhenke

Possible Solution:
Simply remove the implementation on OutputRequirementExec and also rely on the default impl from the trait?

I think this is likely a great thing to try. @devinjdangelo perhaps you have some more input or ideas to try

pmcgleenon · 2024-02-19T18:34:33Z

I ran the reproducer #6983 (comment) and didn't see this issue.

generate benchmark data

cd benchmarks
./bench.sh data tpch10

run the CLI with the filter (3.2 seconds) and without it (3.5 seconds)

DataFusion CLI v36.0.0
❯ create external table test stored as parquet location '/Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet';
0 rows in set. Query took 0.115 seconds.

❯ create table t as select * from test;
0 rows in set. Query took 3.527 seconds.

DataFusion CLI v36.0.0
❯ create external table test stored as parquet location '/Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet';
0 rows in set. Query took 0.006 seconds.

❯ create table t as (select * from test where l_linenumber > 0);
0 rows in set. Query took 3.216 seconds.

ran the Rust program with the filter (3.1 seconds) and without it (3 seconds)

    let _df = _ctx
        .read_parquet(FILENAME, _read_options)
        .await
        .unwrap();
        // .filter(col("l_orderkey").gt(lit(0)))
        // .unwrap();

checked the physical plan: the ParquetExec now has multiple file_groups, so the scan runs in parallel even without a filter.

❯ explain select * from test;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment]                                                                                                                                                                                                                                                                                                                                                                                     |
| physical_plan | ParquetExec: file_groups={4 groups: [[Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:0..10165445], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:10165445..20330890], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:20330890..30496335], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:30496335..40661778]]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment] |
|               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.010 seconds.

❯ explain select * from test where l_orderkey > 0;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Filter: test.l_orderkey > Int64(0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
|               |   TableScan: test projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], partial_filters=[test.l_orderkey > Int64(0)]                                                                                                                                                                                                                                                                                                                                                                                                                                     |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|               |   FilterExec: l_orderkey@0 > 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
|               |     ParquetExec: file_groups={4 groups: [[Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:0..10165445], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:10165445..20330890], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:20330890..30496335], [Users/pmcgleen/work/arrow-datafusion/datafusion-cli/part-0.parquet:30496335..40661778]]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], predicate=l_orderkey@0 > 0, pruning_predicate=l_orderkey_max@0 > 0, required_guarantees=[] |
|               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.012 seconds.
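The four file_groups entries in the plan above are contiguous byte ranges of the single part-0.parquet file. The last range ends at 40661778, which is presumably the file size in bytes. The boundaries match splitting the file into 4 ranges (target_partitions, which defaults to the number of CPU cores) of ceil(size / 4) = 10165445 bytes each. The sketch below reproduces them. The helper name split_byte_ranges is hypothetical, and this only illustrates the arithmetic. It is not DataFusion's own repartitioning code.

```rust
/// Hypothetical helper: split a file of `file_size` bytes into at most `n`
/// contiguous byte ranges of ceil(file_size / n) bytes (the last may be shorter).
fn split_byte_ranges(file_size: u64, n: u64) -> Vec<(u64, u64)> {
    assert!(n > 0, "need at least one partition");
    // Round the chunk size up so that n chunks always cover the whole file.
    let chunk = (file_size + n - 1) / n;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < file_size {
        // Clamp the final range to the end of the file.
        let end = (start + chunk).min(file_size);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

fn main() {
    // Assumed file size: the end offset of the last range in the plan above.
    let file_size = 40_661_778;
    for (start, end) in split_byte_ranges(file_size, 4) {
        println!("part-0.parquet:{start}..{end}");
    }
}
```

Rounding the chunk size up, rather than down, guarantees that at most n ranges are produced and that no trailing bytes are left over.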

@alamb this looks OK to me (unless I've missed something). Does file_groups with 4 groups mean the file is loaded in parallel, one group on each of the 4 available CPUs?
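In DataFusion each file group becomes one output partition of ParquetExec, and separate partitions can be executed concurrently. This is what lets a single-file scan use several cores. The toy sketch below mimics that idea with one OS thread per byte range using std::thread::scope, summing the bytes of an in-memory buffer in place of decoding Parquet. This is a simplifying assumption for illustration only: DataFusion runs partitions as async streams on the tokio runtime, so the number of groups bounds, but does not guarantee, how many cores are busy. The function name parallel_scan is hypothetical.

```rust
use std::thread;

/// Hypothetical toy "scan": process each (start, end) byte range of `data`
/// on its own scoped thread, mirroring one task per file group.
/// The per-range work here is just summing the bytes.
fn parallel_scan(data: &[u8], ranges: &[(usize, usize)]) -> Vec<u64> {
    thread::scope(|s| {
        // Spawn one scoped thread per range; scoped threads may borrow `data`.
        let handles: Vec<_> = ranges
            .iter()
            .map(|&(start, end)| {
                s.spawn(move || data[start..end].iter().map(|&b| b as u64).sum::<u64>())
            })
            .collect();
        // Join in order so that result i corresponds to range i.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    // Synthetic 1000-byte "file" split into 4 equal ranges.
    let data: Vec<u8> = (0..=255u8).cycle().take(1_000).collect();
    let ranges = [(0, 250), (250, 500), (500, 750), (750, 1_000)];
    let partial = parallel_scan(&data, &ranges);
    let total: u64 = partial.iter().sum();
    println!("per-partition sums: {partial:?}, total: {total}");
}
```

The per-range results are combined only at the end, which loosely corresponds to how a plan merges its partitions after the parallel scan.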

alamb · 2024-02-20T07:06:17Z


I agree, thank you for checking @pmcgleenon. Let's close this issue; we can open new issues for future improvements if warranted.

alamb added the enhancement (New feature or request) label Jul 16, 2023

alamb mentioned this issue Jul 16, 2023: [DataFrame] Read files in parallel (4x faster) #6984 (Closed)

alamb assigned alamb and unassigned alamb Jul 16, 2023

alamb added the good first issue (Good for newcomers) label Jul 17, 2023

devinjdangelo mentioned this issue Jul 24, 2023: [DataFrame] Parallel Write out of dataframe #7079 (Closed)

alamb mentioned this issue Oct 6, 2023: [EPIC] Streaming partitioned writes #6569 (Open, 38 tasks)

marvinlanhenke mentioned this issue Dec 27, 2023: Parallel NDSON file reading #8502 (Closed)

alamb closed this as completed Feb 20, 2024
