Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1356: [C++] Use Intel AVX-512 instructions to accelerate the Rle-bit-packing decode #1375

Merged
merged 110 commits into from
May 7, 2023
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
110 commits
Select commit Hold shift + click to select a range
58c3ab6
Use AVX512 to optimize bit-packing decode functions. This will improve
wpleonardo Jan 10, 2023
acbc214
Fix some conficts.
wpleonardo Jan 10, 2023
293d863
Fix some conflicts.
wpleonardo Jan 10, 2023
e7a9119
Fix the code format.
wpleonardo Jan 11, 2023
cfde08f
Modify TestRleVectorDecoder.cc to match the new format.
wpleonardo Jan 11, 2023
8341943
Fix a mistake on function name
wpleonardo Jan 11, 2023
e840649
Modified code into namespace orc
wpleonardo Jan 11, 2023
c7962d5
Modify function name to fix a build issue.
wpleonardo Jan 11, 2023
495a620
Modify code format.
wpleonardo Jan 12, 2023
5c937e6
Fix a build issue about int64 has different printf format between mac…
wpleonardo Jan 12, 2023
a87c281
Fix build issue on windows.
wpleonardo Jan 12, 2023
d8fcbe6
Fix some code format issue and function name.
wpleonardo Jan 12, 2023
668335c
1. Modified the code format;
wpleonardo Jan 14, 2023
46daa2d
1. Use clang-format to modify the code format of TestRleVectorDecoder.cc
wpleonardo Jan 14, 2023
415d1eb
1. Use clang-format -style=google to format code style of TestRleVect…
wpleonardo Jan 15, 2023
cd2f71d
1. Use clang-format to modify the code style of c++/test/TestRleVecto…
Jan 30, 2023
f9ee0b4
Use clang-format to modify code style of files:
Jan 30, 2023
6f8cb56
1.Add an Env parameter "ENABLE_RUNTIME_AVX512" to open or close AVX51…
wpleonardo Jan 31, 2023
c1c2448
Update CMakeLists.txt
wpleonardo Feb 1, 2023
edf164f
Update CMakeLists.txt
wpleonardo Feb 1, 2023
f360582
Merge pull request #3 from wpleonardo/fix_comments
wpleonardo Feb 13, 2023
743ac84
1.Add the dynamic dispatch function to distribute avx512 and default …
Feb 13, 2023
6bc9035
Delete some comments in code.
Feb 13, 2023
ca3af78
Fix some comments.
Feb 14, 2023
eeafccf
Fix some comments
Feb 14, 2023
1beb9b5
Merge pull request #4 from wpleonardo/fix_comments
wpleonardo Feb 15, 2023
d9c562b
1.Modified the CMakelists, delete the part of aarch64 and ORC_RUNTIME…
Feb 16, 2023
0cf5620
Modified the macro name
Feb 16, 2023
08e32f4
Merge pull request #5 from wpleonardo/fix_comments
wpleonardo Feb 16, 2023
8a6b9f7
Merge pull request #6 from wpleonardo/fix_comments
wpleonardo Feb 16, 2023
1b8301f
1.Fixed build error on macos
Feb 17, 2023
1924ecf
Merge pull request #7 from wpleonardo/fix_comments
wpleonardo Feb 17, 2023
3c4f2b8
Merge pull request #8 from wpleonardo/fix_comments
wpleonardo Feb 17, 2023
b1759a1
Merge pull request #9 from wpleonardo/fix_comments
wpleonardo Feb 17, 2023
dc81e79
Merge pull request #10 from wpleonardo/fix_comments
wpleonardo Feb 17, 2023
b37c7dd
Fixed the build error on macos.
Feb 17, 2023
23dd7ff
Fix the build error on macos, and code format.
Feb 17, 2023
6a6f491
Fix build error on macos.
Feb 18, 2023
b2abf44
Fix build error on macos
Feb 18, 2023
36f06aa
Fix a build error about "%ld" and "%lld" on macos.
Feb 18, 2023
3db8d1a
Merge pull request #11 from wpleonardo/fix_comments
wpleonardo Feb 18, 2023
15db3d1
Use std::cout instead of printf function
Feb 19, 2023
d77c81b
Merge pull request #12 from wpleonardo/fix_comments
wpleonardo Feb 20, 2023
42cc703
Fix build error on macos.
Feb 20, 2023
4fbe1d7
Merge pull request #13 from wpleonardo/fix_comments
wpleonardo Feb 28, 2023
9d86e3d
Macos doesn't support AVX512 fully. So skip Macos to support AVX512 d…
Mar 1, 2023
75e4cfa
Add the comments about arch=native compile option.
Mar 1, 2023
284a9a4
Merge pull request #14 from wpleonardo/fix_comments
wpleonardo Mar 1, 2023
2c9f93f
Merge pull request #15 from wpleonardo/fix_comments
wpleonardo Mar 2, 2023
d21705c
Merge pull request #16 from wpleonardo/fix_comments
wpleonardo Mar 2, 2023
197f2e6
Add the cpu flags information in the cmake process.
Mar 2, 2023
1d050af
Modified the cmake check of supoorting AVX512.
Mar 2, 2023
10b7009
Merge pull request #17 from wpleonardo/fix_comments
wpleonardo Mar 3, 2023
a239e47
When user set BUILD_ENABLE_AVX512=on, but the compiler cannot support…
Mar 3, 2023
b2b6aff
Add the comment about -mtune=native in cmake process.
Mar 3, 2023
6f06b79
Merge pull request #18 from wpleonardo/fix_comments
wpleonardo Mar 6, 2023
a0aa823
Merge pull request #19 from wpleonardo/fix_comments
wpleonardo Mar 6, 2023
d7112e9
1.Add the new CI action to test AVX512 feature.
Mar 6, 2023
5b38980
Change the build_type back to Debug, keep consistent with the original.
Mar 6, 2023
2ad64bc
Merge pull request #20 from wpleonardo/fix_comments
wpleonardo Mar 8, 2023
6768165
Fix an error about _mm512_load_si512 on some CPU core when running wi…
Mar 9, 2023
d383035
Merge pull request #21 from wpleonardo/fix_comments
wpleonardo Mar 10, 2023
ce7f6de
Most hotspot of function RleDecoderV2::resetBufferStart locates in sa…
Mar 11, 2023
8f6806b
Modified some cmake options and status message
Mar 14, 2023
e27be9e
Delete macro ORC_HAVE_RUNTIME_AVX512. Modified CMakeLists.txt to choo…
Mar 16, 2023
fe5b6c7
Merge pull request #22 from wpleonardo/fix_comments
wpleonardo Mar 16, 2023
5b0e66d
Modified the code format.
Mar 16, 2023
8c99fcd
1.Delete the redundancy code in CpuInfo file
Mar 16, 2023
0f1adda
Add the cpu flags print on windows.
Mar 17, 2023
440d6d1
Merge pull request #23 from wpleonardo/fix_comments
wpleonardo Mar 17, 2023
070ca0f
Update cmake_modules/ConfigSimdLevel.cmake
wgtmac Mar 17, 2023
21de59a
1. Code format change in c++/src/Bpacking.hh
Mar 17, 2023
ae0d5c2
Code format change about c++/src/CpuInfoUtil.cc
Mar 17, 2023
3f47d1c
Merge pull request #24 from wpleonardo/fix_comments
wpleonardo Mar 17, 2023
1fdfe54
1. Deleted some useless header files included in source file
Mar 20, 2023
3f156b4
1. Code format about c++/src/BpackingAvx512.cc
Mar 20, 2023
4b166ee
1. Delete the redundant buffer array in class UnpackAvx512
Mar 22, 2023
3c21f2e
Use macros to replace some number
wpleonardo Mar 22, 2023
27d5b40
Change RleDecoderV2::readLongs return type back to void.
Mar 24, 2023
7cea68e
Added "how to build&use AVX512 in ORC" in README.md
Mar 27, 2023
3be42ee
1.Modified the description about how to use AVX512 in README.md
Mar 27, 2023
11ceeaa
When compiler doesn't support AVX512, but customer set BUILD_ENABLE_A…
Mar 27, 2023
277d9be
1. Update link information about apple avx512 in CMakeLists.txt
Mar 28, 2023
305a317
Fix an error about if judgement in windows CI test
Mar 28, 2023
4debd50
Add the align header and tailer code in the process of bit-unpacking.
Mar 29, 2023
62d373c
Fix an error in the CI test yaml file on windows platform.
Mar 29, 2023
3dca1d7
Modified the AVX512 enable description in the README.md
Mar 29, 2023
e23ca29
Add "shell: bash" in the CI test on windows, and make CI commands run…
Mar 31, 2023
1a32212
1. In function alignHeaderBoundary and alignTailerBoundary, rename pa…
Apr 11, 2023
fc2c288
Change the parameter bitMaxSize type to const uint32_t
Apr 12, 2023
3468df0
Change some parameter's type to const
wpleonardo Apr 13, 2023
93feaf9
1. Changed the parameters bufferStart, bufferEnd, bitsLeft and curByt…
Apr 17, 2023
3b831f2
Merge pull request #39 from wpleonardo/fix_comments
wpleonardo Apr 18, 2023
1deb2cf
Merge pull request #40 from wpleonardo/fix_comments
wpleonardo Apr 18, 2023
b48ec06
1. Modified vectorUnpack16,vectorUnpack24,vectorUnpack32 to support a…
Apr 18, 2023
b89870a
Added the comments of function alignHeaderBoundary and alignTailerBou…
Apr 18, 2023
fe09a92
Delete useless header file
Apr 18, 2023
596835d
Merge branch 'main' into fix_comments
Apr 18, 2023
e236773
Code format change
Apr 18, 2023
321ab63
Merge pull request #41 from wpleonardo/fix_comments
wpleonardo Apr 19, 2023
df6fe45
Add a parameter comments
Apr 19, 2023
ce77b50
Merge pull request #42 from wpleonardo/fix_comments
wpleonardo Apr 21, 2023
6c84d8d
Merge pull request #43 from wpleonardo/fix_comments
wpleonardo Apr 21, 2023
f3ff215
Change the invoking way about bufferstart,bufferend parameters.
Apr 21, 2023
af96de9
1. Code format change
Apr 22, 2023
d6fd57d
Merge pull request #44 from wpleonardo/fix_comments
wpleonardo Apr 23, 2023
0bfc862
Modified cmakefile about the checking of AVX512.
Apr 23, 2023
e584a42
Because check_cxx_source_run will be hung on windows, change check_cx…
Apr 24, 2023
4d261eb
Change check_cxx_source_runs back to CHECK_CXX_SOURCE_COMPILES
Apr 24, 2023
1f2085e
Merge pull request #45 from wpleonardo/fix_comments
wpleonardo Apr 23, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
155 changes: 153 additions & 2 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,10 @@ option(BUILD_CPP_ENABLE_METRICS
"Enable the metrics collection at compile phase"
OFF)

option(BUILD_ENABLE_AVX512
"Enable AVX512 vector decode of bit-packing"
ON)

# Make sure that a build type is selected
if (NOT CMAKE_BUILD_TYPE)
message(STATUS "No build type selected, default to ReleaseWithDebugInfo")
Expand All @@ -87,6 +91,17 @@ if (BUILD_POSITION_INDEPENDENT_LIB)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
endif ()

if(NOT DEFINED ORC_SIMD_LEVEL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These options and variables look confusing to me. BUILD_ENABLE_AVX512 and ORC_SIMD_LEVEL serve the same purpose. At least one of them should be removed.

If ORC_SIMD_LEVEL and ORC_RUNTIME_SIMD_LEVEL only have default values, then they should be removed because they cannot be changed. Otherwise, they should at least support NONE and AVX512 to be configurable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Delete ORC_RUNTIME_SIMD_LEVEL

set(ORC_SIMD_LEVEL
"DEFAULT"
CACHE STRING "Compile time SIMD optimization level")
endif()
if(NOT DEFINED ORC_RUNTIME_SIMD_LEVEL)
set(ORC_RUNTIME_SIMD_LEVEL
"MAX"
CACHE STRING "Max runtime SIMD optimization level")
endif()

#
# Compiler specific flags
#
Expand Down Expand Up @@ -116,7 +131,7 @@ if (CMAKE_CXX_COMPILER_ID MATCHES "Clang")
set (WARN_FLAGS "${WARN_FLAGS} -Wno-covered-switch-default")
set (WARN_FLAGS "${WARN_FLAGS} -Wno-missing-noreturn -Wno-unknown-pragmas")
set (WARN_FLAGS "${WARN_FLAGS} -Wno-gnu-zero-variadic-macro-arguments")
set (WARN_FLAGS "${WARN_FLAGS} -Wconversion")
set (WARN_FLAGS "${WARN_FLAGS} -Wno-conversion")
if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL "13.0")
set (WARN_FLAGS "${WARN_FLAGS} -Wno-reserved-identifier -Wno-suggest-destructor-override -Wno-suggest-override")
endif()
Expand All @@ -135,7 +150,7 @@ elseif (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
else ()
set (CXX17_FLAGS "-std=c++17")
endif ()
set (WARN_FLAGS "-Wall -Wno-unknown-pragmas -Wconversion")
set (WARN_FLAGS "-Wall -Wno-unknown-pragmas -Wno-conversion")
if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "12.0")
set (WARN_FLAGS "${WARN_FLAGS} -Wno-array-bounds -Wno-stringop-overread") # To compile protobuf in Fedora37
endif ()
Expand All @@ -157,6 +172,134 @@ elseif (MSVC)
set (WARN_FLAGS "${WARN_FLAGS} -wd4146") # unary minus operator applied to unsigned type, result still unsigned
endif ()

include(CheckCXXCompilerFlag)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The architecture detecting logic below worth a separate file under cmake_modules directory and be included here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move lines from 175 to 270 into a separate cmake module.

include(CheckCXXSourceCompiles)
message(STATUS "System processor: ${CMAKE_SYSTEM_PROCESSOR}")

if(NOT DEFINED ORC_CPU_FLAG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we supposed to support ppc, s390x and riscv64? The CI checks do not cover these architectures so we are unable to verify and maintain them.

cc @dongjoon-hyun

if(CMAKE_SYSTEM_PROCESSOR MATCHES "AMD64|X86|x86|i[3456]86|x64")
set(ORC_CPU_FLAG "x86")
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64|ARM64|arm64")
set(ORC_CPU_FLAG "aarch64")
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "^arm$|armv[4-7]")
set(ORC_CPU_FLAG "aarch32")
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "powerpc|ppc")
set(ORC_CPU_FLAG "ppc")
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "s390x")
set(ORC_CPU_FLAG "s390x")
elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "riscv64")
set(ORC_CPU_FLAG "riscv64")
else()
message(FATAL_ERROR "Unknown system processor")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails the build on these processors, though it rarely happens. At least we should not break build which succeeds in the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a new cmake module "cmake_modules/ConfigSimdLevel.cmake" to config AVX512.

endif()
endif()

# Check architecture specific compiler flags
if(ORC_CPU_FLAG STREQUAL "x86")
# x86/amd64 compiler flags, msvc/gcc/clang
if(MSVC)
set(ORC_SSE4_2_FLAG "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this patch aims for AVX512 only, we can remove SSE4 and AVX2 for now. So flags like ORC_AVX2_FLAG, CXX_SUPPORTS_SSE4_2, and CXX_SUPPORTS_AVX2 can be removed for now.

set(ORC_AVX2_FLAG "/arch:AVX2")
set(ORC_AVX512_FLAG "/arch:AVX512")
set(CXX_SUPPORTS_SSE4_2 TRUE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CXX_SUPPORTS_SSE4_2 is not used and can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

else()
set(ORC_SSE4_2_FLAG "-msse4.2")
set(ORC_AVX2_FLAG "-march=haswell")
# skylake-avx512 consists of AVX512F,AVX512BW,AVX512VL,AVX512CD,AVX512DQ
set(ORC_AVX512_FLAG "-march=native -mbmi2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a single set for ORC_AVX512_FLAG

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

# Append the avx2/avx512 subset option also, fix issue ORC-9877 for homebrew-cpp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does fix issue ORC-9877 for homebrew-cpp mean?

Copy link
Contributor Author

@wpleonardo wpleonardo Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for bad reference, already deleted.

set(ORC_AVX2_FLAG "${ORC_AVX2_FLAG} -mavx2")
set(ORC_AVX512_FLAG
"${ORC_AVX512_FLAG} -mavx512f -mavx512cd -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi")
endif()
check_cxx_compiler_flag(${ORC_AVX512_FLAG} CXX_SUPPORTS_AVX512)
if(MINGW)
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65782
message(STATUS "Disable AVX512 support on MINGW for now")
else()
# Check for AVX512 support in the compiler.
set(OLD_CMAKE_REQURED_FLAGS ${CMAKE_REQUIRED_FLAGS})
set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} ${ORC_AVX512_FLAG}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that CMAKE_REQUIRED_FLAGS is not officially documented. Do we any have better alternatives?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can find the CMAKE_REQUIRED_FLAGS information in the cmake document:
https://cmake.org/cmake/help/latest/module/CheckCXXSourceCompiles.html
Is there no need to change CMAKE_REQUIRED_FLAGS ? Is my understanding right?

check_cxx_source_compiles("
#ifdef _MSC_VER
#include <intrin.h>
#else
#include <immintrin.h>
#endif

int main() {
__m512i mask = _mm512_set1_epi32(0x1);
char out[32];
_mm512_storeu_si512(out, mask);
return 0;
}"
CXX_SUPPORTS_AVX512)
set(CMAKE_REQUIRED_FLAGS ${OLD_CMAKE_REQURED_FLAGS})
endif()
# Runtime SIMD level it can get from compiler and ORC_RUNTIME_SIMD_LEVEL
if(CXX_SUPPORTS_SSE4_2 AND ORC_RUNTIME_SIMD_LEVEL MATCHES
"^(SSE4_2|AVX2|AVX512|MAX)$")
set(ORC_HAVE_RUNTIME_SSE4_2 ON)
add_definitions(-DORC_HAVE_RUNTIME_SSE4_2)
endif()
if(CXX_SUPPORTS_AVX2 AND ORC_RUNTIME_SIMD_LEVEL MATCHES "^(AVX2|AVX512|MAX)$")
set(ORC_HAVE_RUNTIME_AVX2 ON)
add_definitions(-DORC_HAVE_RUNTIME_AVX2 -DORC_HAVE_RUNTIME_BMI2)
endif()
if(CXX_SUPPORTS_AVX512 AND ORC_RUNTIME_SIMD_LEVEL MATCHES "^(AVX512|MAX)$")
set(ORC_HAVE_RUNTIME_AVX512 ON)
add_definitions(-DORC_HAVE_RUNTIME_AVX512 -DORC_HAVE_RUNTIME_BMI2)
endif()
if(ORC_SIMD_LEVEL STREQUAL "DEFAULT")
set(ORC_SIMD_LEVEL "AVX512")
endif()

elseif(ORC_CPU_FLAG STREQUAL "ppc")
# power compiler flags, gcc/clang only
set(ORC_ALTIVEC_FLAG "-maltivec")
check_cxx_compiler_flag(${ORC_ALTIVEC_FLAG} CXX_SUPPORTS_ALTIVEC)
if(ORC_SIMD_LEVEL STREQUAL "DEFAULT")
set(ORC_SIMD_LEVEL "NONE")
endif()
elseif(ORC_CPU_FLAG STREQUAL "aarch64")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the logic relevant to aarch64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

# Arm64 compiler flags, gcc/clang only
set(ORC_ARMV8_MARCH "armv8-a")
check_cxx_compiler_flag("-march=${ORC_ARMV8_MARCH}+sve" CXX_SUPPORTS_SVE)
if(ORC_SIMD_LEVEL STREQUAL "DEFAULT")
set(ORC_SIMD_LEVEL "NEON")
endif()
endif()

# Only enable additional instruction sets if they are supported
if(ORC_CPU_FLAG STREQUAL "x86")
if(MINGW)
# Enable _xgetbv() intrinsic to query OS support for ZMM register saves
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mxsave")
endif()
if(ORC_SIMD_LEVEL STREQUAL "AVX512")
if(NOT CXX_SUPPORTS_AVX512)
message(FATAL_ERROR "AVX512 required but compiler doesn't support it.")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ORC_AVX512_FLAG}")
add_definitions(-DORC_HAVE_AVX512 -DORC_HAVE_AVX2 -DORC_HAVE_BMI2
-DORC_HAVE_SSE4_2)
elseif(ORC_SIMD_LEVEL STREQUAL "AVX2")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove levels other than AVX512 for now to make it simpler.

if(NOT CXX_SUPPORTS_AVX2)
message(FATAL_ERROR "AVX2 required but compiler doesn't support it.")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ORC_AVX2_FLAG}")
add_definitions(-DORC_HAVE_AVX2 -DORC_HAVE_BMI2 -DORC_HAVE_SSE4_2)
elseif(ORC_SIMD_LEVEL STREQUAL "SSE4_2")
if(NOT CXX_SUPPORTS_SSE4_2)
message(FATAL_ERROR "SSE4.2 required but compiler doesn't support it.")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${ORC_SSE4_2_FLAG}")
add_definitions(-DORC_HAVE_SSE4_2)
elseif(NOT ORC_SIMD_LEVEL STREQUAL "NONE")
message(WARNING "ORC_SIMD_LEVEL=${ORC_SIMD_LEVEL} not supported by x86.")
endif()
endif()

if (BUILD_CPP_ENABLE_METRICS)
message(STATUS "Enable the metrics collection")
add_compile_definitions(ENABLE_METRICS=1)
Expand All @@ -165,6 +308,14 @@ else ()
add_compile_definitions(ENABLE_METRICS=0)
endif ()

if (BUILD_ENABLE_AVX512 AND CXX_SUPPORTS_AVX512 AND ORC_SIMD_LEVEL STREQUAL "AVX512")
message(STATUS "Enable the AVX512 vector decode of bit-packing")
add_compile_definitions(ENABLE_AVX512=1)
else ()
message(STATUS "Disable the AVX512 vector decode of bit-packing")
add_compile_definitions(ENABLE_AVX512=0)
endif ()

enable_testing()

INCLUDE(CheckSourceCompiles)
Expand Down
96 changes: 96 additions & 0 deletions c++/src/DetectPlatform.hh
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#ifndef ORC_DETECTPLATFORM_HH
#define ORC_DETECTPLATFORM_HH

#if defined(__GNUC__) || defined(__clang__)
DIAGNOSTIC_IGNORE("-Wold-style-cast")
#endif

namespace orc
{
#ifdef _WIN32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Platform dependent function like cpuid can be defined in the file Adaptor.hh.in


#include "intrin.h"
// Windows CPUID
#define cpuid(info, x) __cpuidex(info, x, 0)
#else
// GCC Intrinsics
#include <cpuid.h>
#include <dlfcn.h>

void cpuid(int info[4], int InfoType) {
__cpuid_count(InfoType, 0, info[0], info[1], info[2], info[3]);
}

unsigned long long xgetbv(unsigned int index) {
unsigned int eax, edx;
__asm__ __volatile__(
"xgetbv;"
: "=a" (eax), "=d"(edx)
: "c" (index)
);
return ((unsigned long long) edx << 32) | eax;
}

#endif

#define CPUID_AVX512F 0x00100000
#define CPUID_AVX512CD 0x00200000
#define CPUID_AVX512VL 0x04000000
#define CPUID_AVX512BW 0x01000000
#define CPUID_AVX512DQ 0x02000000
#define EXC_OSXSAVE 0x08000000 // 27th bit

#define CPUID_AVX512_MASK (CPUID_AVX512F | CPUID_AVX512CD | CPUID_AVX512VL | CPUID_AVX512BW | CPUID_AVX512DQ)

enum class Arch {
PX_ARCH = 0,
AVX2_ARCH = 1,
AVX512_ARCH = 2
};

Arch detectPlatform() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename the function and the file name to detect architecture?

Arch detected_platform = Arch::PX_ARCH;
int cpuInfo[4];
cpuid(cpuInfo, 1);

bool avx512_support_cpu = cpuInfo[1] & CPUID_AVX512_MASK;
bool os_uses_XSAVE_XSTORE = cpuInfo[2] & EXC_OSXSAVE;

if (avx512_support_cpu && os_uses_XSAVE_XSTORE) {
// Check if XMM state and YMM state are saved
#ifdef _WIN32
unsigned long long xcr_feature_mask = _xgetbv(0); /* min VS2010 SP1 compiler is required */
#else
unsigned long long xcr_feature_mask = xgetbv(0);
#endif

if ((xcr_feature_mask & 0x6) == 0x6) { // AVX2 is supported now
if ((xcr_feature_mask & 0xe0) == 0xe0) { // AVX512 is supported now
detected_platform = Arch::AVX512_ARCH;
}
}
}

return detected_platform;
}
}

#endif
42 changes: 41 additions & 1 deletion c++/src/RLEv2.hh
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,9 @@

#include <vector>

#define MAX_VECTOR_BUF_8BIT_LENGTH 64
#define MAX_VECTOR_BUF_16BIT_LENGTH 32
#define MAX_VECTOR_BUF_32BIT_LENGTH 16
#define MAX_LITERAL_SIZE 512
#define MIN_REPEAT 3
#define HIST_LEN 32
Expand Down Expand Up @@ -189,13 +192,45 @@ namespace orc {
resetReadLongs();
}

void resetBufferStart(uint64_t len, bool resetBuf, uint32_t backupLen);
unsigned char readByte();

int64_t readLongBE(uint64_t bsz);
int64_t readVslong();
uint64_t readVulong();
void readLongs(int64_t* data, uint64_t offset, uint64_t len, uint64_t fbs);
void plainUnpackLongs(int64_t* data, uint64_t offset, uint64_t len, uint64_t fbs);
void plainUnpackLongs(int64_t *data, uint64_t offset, uint64_t len, uint64_t fbs,
uint64_t& startBit);

#if ENABLE_AVX512
void unrolledUnpackVector1(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector2(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector3(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector4(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector5(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector6(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector7(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector9(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector10(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector11(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector12(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector13(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector14(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector15(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector16(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector17(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector18(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector19(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector20(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector21(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector22(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector23(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector24(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector26(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector28(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector30(int64_t *data, uint64_t offset, uint64_t len);
void unrolledUnpackVector32(int64_t *data, uint64_t offset, uint64_t len);
#endif

void unrolledUnpack4(int64_t* data, uint64_t offset, uint64_t len);
void unrolledUnpack8(int64_t* data, uint64_t offset, uint64_t len);
Expand Down Expand Up @@ -230,6 +265,11 @@ namespace orc {
uint32_t curByte; // Used by anything that uses readLongs
DataBuffer<int64_t> unpackedPatch; // Used by PATCHED_BASE
DataBuffer<int64_t> literals; // Values of the current run
#if ENABLE_AVX512
uint8_t vectorBuf8[MAX_VECTOR_BUF_8BIT_LENGTH + 1]; // Used by vectorially 1~8 bit-unpacking data
uint16_t vectorBuf16[MAX_VECTOR_BUF_16BIT_LENGTH + 1]; // Used by vectorially 9~16 bit-unpacking data
uint32_t vectorBuf32[MAX_VECTOR_BUF_32BIT_LENGTH + 1]; // Used by vectorially 17~32 bit-unpacking data
#endif
};
} // namespace orc

Expand Down
Loading