Parabix on ARM

This project adds ARM support for key Parabix operations, such as transposition, long-stream addition, and stream advance, by reimplementing the platform-specific x86 SSE instructions with Neon SIMD instructions. Our code closely follows the structure of idisa_sse_builder to implement these features.

View the Parabix source code on GitLab.

Example Translation for _mm_movemask_ps

Description

Set each bit of mask dst based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.

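The input is a vector of four 32-bit lanes (4 x i32), and the result is an integer bitmask of the lanes' most significant bits (MSBs), with lane i supplying bit i. Neon has no single equivalent instruction, so we simulate this SSE instruction with a load, a right shift, a left shift, and a horizontal vector addition. The right shift by 31 isolates each lane's MSB as 0 or 1, the load supplies the per-lane shift amounts (0, 1, 2, 3), the left shift moves each MSB to its position in the mask, and the horizontal addition combines the four lanes into one integer.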

Implementation
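
The four steps described above can be sketched with C Neon intrinsics. This is a minimal illustration, not the project's IDISA builder code: the function name movemask_ps_sketch is hypothetical, the Neon path assumes an AArch64 target (vaddvq_u32 is not available on 32-bit ARM), and the scalar fallback exists only so the sketch compiles on other platforms.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of Parabix): returns a 4-bit mask in which
 * bit i is the sign bit (MSB) of a[i], mirroring _mm_movemask_ps. */
int movemask_ps_sketch(const float a[4]) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Per-lane left-shift amounts: lane i contributes bit i of the mask. */
    static const int32_t lane_shift[4] = {0, 1, 2, 3};

    /* Reinterpret the four floats as four unsigned 32-bit lanes. */
    uint32x4_t bits = vreinterpretq_u32_f32(vld1q_f32(a));
    /* Right shift: move each lane's MSB down to bit 0 (lane becomes 0 or 1). */
    uint32x4_t msb = vshrq_n_u32(bits, 31);
    /* Load the shift amounts, then left shift each lane by its own index. */
    uint32x4_t placed = vshlq_u32(msb, vld1q_s32(lane_shift));
    /* Horizontal add: the lanes hold disjoint bits, so the sum is the mask. */
    return (int)vaddvq_u32(placed);
#else
    /* Scalar fallback with the same semantics, for non-AArch64 builds. */
    int mask = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t bits;
        memcpy(&bits, &a[i], sizeof bits); /* read the float's bit pattern */
        mask |= (int)(bits >> 31) << i;
    }
    return mask;
#endif
}
```

For example, the input {-1.0f, 2.0f, -3.0f, 4.0f} has its sign bit set in lanes 0 and 2, so the function returns 5 (binary 0101). Because each lane holds a distinct bit after the left shift, the horizontal addition cannot carry and behaves like a bitwise OR across lanes.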

Resources

  1. Ong, “Porting Intel Intrinsics to Arm Neon Intrinsics”, CodeProject, 04-May-2021. [Online]. Available:
    https://www.codeproject.com/Articles/5301747/Porting-Intel-Intrinsics-to-Arm-Neon-Intrinsics.
    [Accessed: 17-Jun-2021].
  2. Simd-Everywhere, “simd-everywhere/simde”, GitHub. [Online]. Available:
    https://github.com/simd-everywhere/simde.
    [Accessed: 17-Jun-2021].
  3. Intrinsics – Arm Developer. [Online]. Available:
    https://developer.arm.com/architectures/instruction-sets/intrinsics.
    [Accessed: 17-Jun-2021].
  4. Intel® Intrinsics Guide. [Online]. Available:
    https://software.intel.com/sites/landingpage/IntrinsicsGuide.
    [Accessed: 17-Jun-2021].