How to speed up object conversion? #404
If you know that …

When I do this: `@time rows = pycall(cs.cursor[:fetchall], Array{Tuple})` I get this: …

What should I pass in as the type?

Actually it looks like …

Ok, so what I have is an iterator that correctly returns a …

What does the iterator return? A tuple of what size/type?

It depends on the query, but let's say for this particular query it is: …

You can do …

With the latest PyCall master, you should be able to do … One thing that has been on my to-do list for a while is to speed up type introspection by caching types in a hash table.

Thanks. That didn't change the time it takes, because presumably the call to …
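The type-caching idea mentioned above could be sketched in plain Julia along these lines. This is a hypothetical illustration, not PyCall's actual internals: `slow_lookup`, `cached_type`, and `TYPE_CACHE` are invented names, and the real introspection would inspect a Python object's type rather than a `Symbol`.

```julia
# Hypothetical sketch of memoizing expensive type introspection in a Dict.
# `slow_lookup` stands in for the per-object introspection work.
const TYPE_CACHE = Dict{Symbol,DataType}()

function slow_lookup(key::Symbol)
    # Pretend this is an expensive introspection step.
    key === :int ? Int : key === :float ? Float64 : String
end

# get! only calls slow_lookup on a cache miss; hits are a single Dict lookup.
cached_type(key::Symbol) = get!(() -> slow_lookup(key), TYPE_CACHE, key)
```

The first call for a given key pays the introspection cost and stores the result; every later call for that key is just a hash-table lookup.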
Ok, so I've managed to get huge speedups (90%) by trying several different things:

Overall, on average, for 100,000 rows and 40 columns I've seen a change from approximately 200 seconds to 17 seconds. Larger datasets have seen even bigger improvements, going from several hours of running time to under 10 minutes. Hope this helps others.
Why not just do …

That way, you eliminate all of the construction of temporary arrays, the transposition, etcetera.
Nice, that does improve performance a little and reduces memory usage. |
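The shape of the suggestion above can be illustrated in plain Julia, with no PyCall or database involved. The `rows` vector of tuples here is a stand-in for fetched query results; the point is to write each element straight into its final position instead of building a temporary and transposing it.

```julia
# Illustrative sketch: fill the output array directly in its final
# orientation, avoiding a temporary array and a transpose.
rows = [(i, 2i, 3i) for i in 1:5]   # stand-in for fetched rows
m, n = length(rows), 3

a = Matrix{Int}(undef, n, m)        # one column per row: no transpose needed
for i in 1:m, j in 1:n
    a[j, i] = rows[i][j]
end
```

Each element is written exactly once, and no intermediate arrays are allocated.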
I made a small change to your code and got it a little faster:

```julia
m = length(rows)
n = length(cs.description)
a = Array{PyObject}(n, m)
for i = 1:m
    row = get(rows, PyObject, i-1)
    for j = 1:n
        a[j, i] = get(row, PyObject, j-1)
    end
end
b = transpose(a)
```

This is faster because indexing a Julia array a column at a time is faster than indexing it a row at a time, and the …
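The column-major point above can be demonstrated with plain Julia arrays (illustrative; actual timings vary by machine). Julia stores matrices column-major, so making the row index the innermost loop walks memory contiguously:

```julia
# Julia arrays are column-major: with j (the row index) innermost,
# consecutive iterations touch adjacent memory locations.
function fill_colmajor!(a)
    n, m = size(a)
    for i in 1:m          # outer loop over columns
        for j in 1:n      # inner loop walks one column contiguously
            a[j, i] = j + i
        end
    end
    return a
end

a = Matrix{Float64}(undef, 1000, 1000)
fill_colmajor!(a)
```

Swapping the loop order (columns innermost) computes the same result but strides through memory, which is typically measurably slower on large arrays.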
Slightly faster and more concise with this:

```julia
m = length(rows)
n = length(cs.description)
a = Array{PyObject}(n, m)
for i = 1:m
    a[:, i] = get(rows, PyVector{PyObject}, i-1)
end
```
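The `a[:, i] = …` pattern above copies a whole fetched row into one contiguous column with a single slice assignment. A pure-Julia illustration of the same pattern (the `rows` vectors stand in for the `PyVector{PyObject}` results):

```julia
# Copy each row into a contiguous matrix column with one slice assignment.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # stand-in for fetched rows
n, m = 2, length(rows)

a = Matrix{Float64}(undef, n, m)
for i in 1:m
    a[:, i] = rows[i]    # one contiguous copy per row
end
```

One bulk copy per row replaces an element-by-element inner loop, which both shortens the code and keeps each write contiguous.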
I'm using PyCall (1.7.2) to run an SQL query against a database, and then getting the results into Julia. It appears to be very slow to convert the Python list of tuples to a Julia array of tuples, and it seems that all the slowness is in iterating through the list elements, i.e., the speed is O(n) in the number of items in the list.

Here's some example code. The SQL statement returns exactly 100,000 rows:

Using automatic type conversion: …

Getting a `PyObject` and then converting with `map`: …
As you can see, calling `fetchall()` takes 157 seconds, and then the `map` is very fast, whereas calling `pycall(fetchall, PyObject)` takes 7 seconds, and then the `map` is very slow.

So, wise PyCall devs, is there a way for me to combine the fastest parts of the two approaches? I'm not averse to going as low-level as necessary, as this level of database slowness is causing us a lot of grief.
PS: I've tried parallelising this with `pmap` and other low-level Julia parallel functions, but this involves copying the object, which has the same issue.