Batched NUFFT
This adds a batched NUFFT, which is substantially faster than a Python for loop over the batch dimension when applying the NUFFT to many small k-space trajectories. It also updates the documentation and adds a new page of performance tips. See PR #24 and Issue #24 for details and testing.
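
To illustrate why batching helps, here is a minimal sketch in plain NumPy. It is not this package's API and not a fast NUFFT: it uses a direct (slow) 1D non-uniform DFT as a stand-in, and the function names `nudft_single`, `nudft_loop`, and `nudft_batched` are hypothetical. The point is only that the batched version computes the same result as the per-item Python loop with a single vectorized call, which avoids per-iteration Python overhead when each trajectory is small.

```python
import numpy as np


def nudft_single(image, ktraj):
    """Direct 1D non-uniform DFT of one signal (stand-in for a real NUFFT)."""
    n = image.shape[-1]
    grid = np.arange(n) - n // 2                  # centered pixel coordinates
    phase = np.exp(-1j * np.outer(ktraj, grid))   # shape (num_k, n)
    return phase @ image                          # shape (num_k,)


def nudft_loop(images, ktrajs):
    """Baseline: Python for loop over the batch dimension."""
    return np.stack([nudft_single(im, k) for im, k in zip(images, ktrajs)])


def nudft_batched(images, ktrajs):
    """Batched version: one vectorized call covers every batch element."""
    n = images.shape[-1]
    grid = np.arange(n) - n // 2
    # Broadcast to shape (batch, num_k, n): one trajectory per batch element.
    phase = np.exp(-1j * ktrajs[:, :, None] * grid[None, None, :])
    return np.einsum("bkn,bn->bk", phase, images)


# Hypothetical sizes: 8 small signals, each with its own 12-point trajectory.
rng = np.random.default_rng(0)
images = rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))
ktrajs = rng.uniform(-np.pi, np.pi, size=(8, 12))

looped = nudft_loop(images, ktrajs)
batched = nudft_batched(images, ktrajs)
```

Both paths return an array of shape `(batch, num_k)` and agree to floating-point tolerance; only the amount of work done in the Python interpreter differs.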