Description
I can only reproduce this with my data file, which is hosted on GitHub (link). It's not a large file (270 KB, 250 rows by 500 columns of tab-delimited integer data). Please download this file and save it as symp.dat (or alter the code below accordingly).
```julia
using DataFrames
using Statistics
using Query
using Base.Filesystem
using CSV

headers = ["sim$i" for i = 1:500]
dt = CSV.File("symp.dat", delim='\t', header=headers) |> DataFrame
dt.time = 1:250 ## add a time column for `melt` purposes

f(g) = g |> @map(mean(_))
@time f(dt)
```
Output:
```
julia> @time f(dt)
 31.975361 seconds (7.63 M allocations: 415.528 MiB, 0.84% gc time)
```
And then the REPL hangs for about 2 minutes before it finishes printing the 250-element query result.
```
julia> @time f(dt)
 31.190349 seconds (7.64 M allocations: 416.400 MiB, 0.94% gc time)
250-element query result
 0.249501
 0.265469
 0.0838323
 0.237525
 0.323353
 0.359281
 0.457086
 0.566866
 0.798403
 1.06587
... with 240 more elements
```
In summary:
- The first run takes about 35 seconds even when `@map` is inside a function (maybe this has something to do with the fact that `dt` is global?).
- The REPL hangs for about 2 minutes after the `@time` macro prints its information. It seems to me that it takes a long time to "collect" the results of the query and display them in the REPL.
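To help narrow down which stage is slow, the two points above can be timed separately. The following is only a minimal sketch: `small` is a hypothetical three-column stand-in for symp.dat (not the original data), and it assumes that the query returned by `@map` is lazy and that displaying it in the REPL iterates it again.

```julia
using DataFrames
using Statistics
using Query

# Hypothetical stand-in for symp.dat: a tiny 3-column table, not the original data.
small = DataFrame(sim1 = [1, 2, 3], sim2 = [4, 5, 6], sim3 = [7, 8, 9])

# Build the query. Assumption: this is lazy, so no row means are computed here.
q = small |> @map(mean(_))

# Stage 1: iterate the query and materialize the row means into a Vector.
@time means = collect(q)

# Stage 2: time only the text/plain rendering, which is what the REPL uses to
# print a returned value. Writing into an IOBuffer keeps the terminal quiet.
@time show(IOBuffer(), MIME("text/plain"), q)
```

Running each `@time` line twice should separate one-off compilation cost from the actual execution cost. Swapping `small` for `dt` would then show whether the time goes into iterating the 501-column rows or into rendering the result.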
Version info:
```
julia> versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)
```