Replies: 1 comment
-
I upgraded to 4.13.0 and the slowness was not present, so it seems to have emerged in 4.14.0.
-
I am opening a parquet file from S3 and doing some basic transformations. The file is a 1 GB parquet with 30 million records and about 15 columns, almost all of type float. These are the transformations I am doing.
I convert all decimal columns to strings using the astype() method and then apply the following transformation to each converted string column as well:
df[col_name]=df.func.where(df[col_name]=='-0','0',df[col_name])
There is also a date string column, which is transformed as follows:
df[col_name]=df[col_name].str.slice(0, 4) + '-' + df[col_name].str.slice(4, 6) + '-' + df[col_name].str.slice(6, 8)
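To make the intent of the two transformations explicit, here is a minimal pure-Python sketch of what each one does to a single value. It is not the vaex code itself: the function names `normalize_negative_zero` and `format_date` are made up for illustration, and the date column is assumed to hold strings in `YYYYMMDD` form, as the slice offsets above suggest.

```python
def normalize_negative_zero(value: str) -> str:
    """Replace the string '-0' with '0' and leave every other value unchanged."""
    return "0" if value == "-0" else value


def format_date(yyyymmdd: str) -> str:
    """Insert dashes into a YYYYMMDD string using the same slice offsets as above."""
    return yyyymmdd[0:4] + "-" + yyyymmdd[4:6] + "-" + yyyymmdd[6:8]


if __name__ == "__main__":
    print(normalize_negative_zero("-0"))
    print(normalize_negative_zero("1.5"))
    print(format_date("20230115"))
```

In vaex the same logic is applied to whole columns at once through `df.func.where` and `str.slice`, as shown in the lines above.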
With vaex-core 4.12.0 these transformations took 11 seconds, including opening the file from S3 with the open method. With 4.14.0 the same transformations take 150 seconds. The time to open the file from S3 is the same for both versions, and nothing else has changed in the code. My Python version is 3.8.10.
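For anyone who wants to compare versions on their own data, a small timing helper along these lines could be used. This is only a sketch: `time_call` is a hypothetical helper and not part of vaex, and the workload passed to it here is a stand-in. Because vaex evaluates expressions lazily, the function being timed should include whatever step actually materializes the result; otherwise only the expression setup is measured.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` runs of func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def example_workload():
    # Stand-in for the real pipeline: build some formatted date strings.
    return [s[0:4] + "-" + s[4:6] + "-" + s[6:8] for s in ["20230115"] * 10000]


if __name__ == "__main__":
    seconds = time_call(example_workload)
    print(f"best of 3 runs: {seconds:.4f} s")
```

Running the same script in two environments, one with vaex-core 4.12.0 and one with 4.14.0, and replacing `example_workload` with the real transformation steps would give directly comparable numbers.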
I am unable to provide the exact data but let me know if there are any questions about my process.