This is my implementation of a parallel (and even heterogeneous) bitonic sort using OpenACC and OpenMP.
While writing the code, I consulted other implementations. To the best of my knowledge, all of those materials are cited in my "Proseminar" paper. The code is licensed as permissively as possible, partly because of the nature of the work involved.