Skip to content

Commit

Permalink
Merge pull request #126 from meneguzzi/sse2neon
Browse files Browse the repository at this point in the history
Added sse2neon for arm compilation.
  • Loading branch information
georgmartius authored Jan 7, 2024
2 parents 05829db + 406e61d commit cbe3d5c
Show file tree
Hide file tree
Showing 5 changed files with 9,219 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,7 @@ CMakeFiles
Makefile
cmake_install.cmake
install_manifest.txt

# Don't include the build directory and compiled Mac dynamic libraries.
build
*.dylib
6 changes: 6 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/CMakeModules/")

include (FindSSE)
include (GNUInstallDirs)
# include (ExternalProject) # There must be a nicer way to include this
# include (sse2neon)
find_package(OpenMP)

set(MAJOR_VERSION 1)
Expand All @@ -30,6 +32,10 @@ if(NOT WIN32)
add_definitions(-fPIC)
endif()

if(APPLE)
add_definitions(-march=armv8-a+fp+simd+crypto+crc)

This comment has been minimized.

Copy link
@ddennedy

ddennedy Jan 8, 2024

Contributor

@meneguzzi This broke compilation of universal (CMAKE_OSX_ARCHITECTURES='arm64;x86_64') on my M1 build machine for daily Shotcut builds. See https://github.com/mltframework/shotcut/actions/runs/7447186625

The error is error: unknown target CPU 'armv8-a+fp+simd+crypto+crc'
From cmake output -- The C compiler identification is AppleClang 14.0.0.14000029

This comment has been minimized.

Copy link
@meneguzzi

meneguzzi Jan 8, 2024

Contributor

Weird, I tested it months ago. Next time my pull request should include tests as well. My apologies for this. Until next week I won't have bandwidth to fix this. If you have time to try and fix it, I'd appreciate it.

This comment has been minimized.

Copy link
@meneguzzi

meneguzzi Jan 8, 2024

Contributor

I guess an easy fix is to add a check on the architecture, do you happen to know preprocessor constant applies only to Apple Silicon?

This comment has been minimized.

Copy link
@ddennedy

ddennedy Jan 8, 2024

Contributor

I was thinking to change if(APPLE) to if (APPLE AND NOT CMAKE_OSX_ARCHITECTURES), but I have not yet tested it. Where does the strange (to me) architecture string you are using come from? Maybe also add a comment about that with a link. I will try it and report back.

This comment has been minimized.

Copy link
@meneguzzi

meneguzzi Jan 8, 2024

Contributor

I found it in the CMake Documentation here:

https://cmake.org/cmake/help/latest/variable/APPLE.html

I did not find anything authoritative and "CMakey" to zero in on whether it's an ARM or Intel (or Motorola for real old timers).

This comment has been minimized.

Copy link
@ddennedy

ddennedy Jan 8, 2024

Contributor

"APPLE" is not the confusing part, "-march=armv8-a+fp+simd+crypto+crc" is.

This comment has been minimized.

Copy link
@ddennedy

ddennedy Jan 10, 2024

Contributor

I made PR #128 to address this. I think your arch string came from the sse2neon readme. In that project, DLTcollab/sse2neon#502 suggests there is no obvious path to use sse2neon in a mulit-arch universal build. I tried "armv8-a+fp+simd+crypto+crc;x86_64", but it did not work with Apple clang (14). -DCMAKE_OSX_ARCHITECTURES=arm64;x86_64 works now, but I am not certain that it benefits from sse2neon.

endif(APPLE)

### ORC is not used in any active code at the moment ###
# I tried it with 0.4.14
# 0.4.10 did not work (not all opcode implemented)
Expand Down
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# VidStab

[![C/C++ CI](https://github.com/georgmartius/vid.stab/actions/workflows/c-cpp.yml/badge.svg)](https://github.com/georgmartius/vid.stab/actions/workflows/c-cpp.yml)

Vidstab is a video stabilization library which can be plugged-in with Ffmpeg and Transcode.

**Why is it needed**
Expand Down
7 changes: 7 additions & 0 deletions src/motiondetect_opt.c
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,13 @@
#define SSE2_CMP_SUM_ROWS 8
#endif

#ifdef __ARM_NEON__
#include "sse2neon.h"
#define USE_SSE2
#define USE_SSE2_CMP_HOR
#define SSE2_CMP_SUM_ROWS 8
#endif

#ifdef USE_SSE2
/**
\see contrastSubImg using SSE2 optimization, Planar (1 byte per channel) only
Expand Down
Loading

1 comment on commit cbe3d5c

@georgmartius
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I merged the new PR.

Please sign in to comment.