OpenBLAS defaults to the maximum number of available hardware threads (not the number of physical cores), which is usually not optimal. User code should be able to control the thread count directly to get the best performance, using the following functions:
void goto_set_num_threads(int num_threads);
void openblas_set_num_threads(int num_threads);
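
As a minimal sketch of how user code might use these calls, the helper below caps a requested thread count at the number of physical cores before passing it to openblas_set_num_threads. The names choose_blas_threads, configure_blas_threads and the USE_OPENBLAS build flag are hypothetical and not part of OpenBLAS. The physical core count is assumed to be supplied by the caller, since detecting it is platform specific.

```c
/* Hypothetical helper (not part of OpenBLAS): clamp a requested thread
 * count to the range [1, physical_cores]. */
static int choose_blas_threads(int requested, int physical_cores)
{
    if (physical_cores < 1)
        physical_cores = 1;          /* unknown core count: stay single-threaded */
    if (requested < 1)
        return 1;                    /* never ask for fewer than one thread */
    if (requested > physical_cores)
        return physical_cores;       /* avoid oversubscribing physical cores */
    return requested;
}

#ifdef USE_OPENBLAS  /* hypothetical flag: define it when building against OpenBLAS */
#include <cblas.h>   /* OpenBLAS's cblas.h declares openblas_set_num_threads */

/* Apply the clamped count; call this before the first BLAS routine. */
static void configure_blas_threads(int requested, int physical_cores)
{
    openblas_set_num_threads(choose_blas_threads(requested, physical_cores));
}
#endif
```

For example, on a machine assumed to have 8 physical cores and 16 hardware threads, configure_blas_threads(16, 8) would request 8 threads from OpenBLAS. Whether that is actually the fastest setting depends on the workload, so it is worth measuring.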