Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Cython instead of f2py #3

Merged
merged 2 commits into from
Sep 18, 2017
Merged

Use Cython instead of f2py #3

merged 2 commits into from
Sep 18, 2017

Conversation

certik
Copy link
Contributor

@certik certik commented Sep 14, 2017

The f2py wrappers seem to have some overhead. In my experience, one can get the best performance when using Cython, and making sure everything is compiled with all optimizations on. I have done that in this PR.

Now the Cython Fortran version is about 10% faster than Numba on my machine, here is how the executed notebook looks like for me now: http://nbviewer.jupyter.org/gist/certik/4dec819b75ef49488fb223918b148fae:

In [15]: numba_time.best / fortran_time2.best
Out[15]: 1.1155824066253082

I used gfortran 5.4.0 on Ubuntu 16.04.

I also converted the Fortran subroutine to modern Fortran:

subroutine abc_model(a, b, c, inflow, outflow)
real(dp), intent(in) :: a, b, c, inflow(:)
real(dp), intent(out) :: outflow(:)
real(dp) :: state_in, state_out
integer :: t
state_in = 0
do t = 1, size(inflow)
    state_out = (1 - c) * state_in + a * inflow(t)
    outflow(t) = (1 - a - b) * inflow(t) + c * state_in
    state_in = state_out
end do
end subroutine

That does not affect performance, but the function is much cleaner and shorter, more in line with the Python version.

This provides another speedup on my machine.
This does not seem to have any speed implications, but the code is simpler and
more in line with modern Fortran usage.
@kratzert
Copy link
Owner

As said in in the other pull request, I'm on travel till sunday and won't find much time until I'm home to try and merge this. But I already had a look at the nb. Thanks for the effort you put into this. I'm not so experienced with fortran so I really appreaciate your suggestions

@certik
Copy link
Contributor Author

certik commented Sep 14, 2017

@kratzert no problem, take your time. I did it, since benchmark comparisons should be fair, and numba is probably using the most optimized LLVM together with the most efficient wrappers, and so it wasn't fair to compare against not fully optimized Fortran build together with less efficient Python wrappers. This PR fixes it, so now the benchmark is fair, as far as I can tell.

@kratzert
Copy link
Owner

I share your opinion that all implementations should be implemented (and compiled) as best as possible for a speed comparision. I did it to my best knowledge but I wasn't aware of the different compile possibilities. Thanks for the correction And I think this PR even shows another advantage of numba (compile complexity) which seems to come with a minor speed trade off.

@certik
Copy link
Contributor Author

certik commented Sep 15, 2017

Regarding the compile complexity, for actual projects I just use cmake and have written tools that generate the Cython and C header files from Fortran code (https://github.com/certik/dftatom/blob/4c435a30feb8f075b7d62fd5788970e4d100678d/utils/fparser), similarly how f2py works, just without the speed overhead. I should probably make it a standalone library and integrate with the jupyter notebook, so that it's as painless as f2py.

@kratzert kratzert merged commit 9077833 into kratzert:master Sep 18, 2017
@certik certik deleted the cython branch September 18, 2017 21:12
@kratzert
Copy link
Owner

@certik I think many people would appreciate having your code as a maybe documented pardon to the f2py library since there is a serious speed gain. So maybe you should go for it :)

@certik
Copy link
Contributor Author

certik commented Sep 18, 2017

@kratzert Yes, I should do that, I put it on my TODO list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants