Tuesday, November 6, 2018

Vector Processing Units

Modern general-purpose processors come with specialized hardware called vector processing units. As their name suggests, they perform vector operations: a single instruction operates on a whole vector of data elements at once. Because of this capability, they are also called SIMD (Single Instruction, Multiple Data) units. Complex mathematical operations can be implemented efficiently using SIMD units, particularly in signal processing applications, where the same operation needs to be performed on a set of data points.

In this series of blog posts, I am going to delve into the details of SIMD units and how to utilize them for efficiently implementing vector operations. The SIMD units are exposed to the programmer through special instructions known as vector processing instructions. Popular compilers such as gcc, Microsoft Visual C++, and the Intel compiler try to optimize code by auto-vectorization. However, to get the most out of SIMD units, algorithms need to be reformulated to suit vectorization. In this post, I will show one simple example of adding two vectors using SIMD instructions.
Vector processing units can be accessed through intrinsics, which are just wrappers around assembly instructions. For all the available intrinsics, please refer to the Intel intrinsics page, which gives the details of the different instructions.
The following example gives a peek at how SIMD instructions can be used to implement vector addition.

A naive implementation of vector addition:
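
As a minimal sketch of what such a scalar loop might look like, assuming the vectors are plain `float` arrays of equal length: the function name `add_scalar` and its parameter names are illustrative choices, not taken from the original listing.

```c
#include <stddef.h>

/* Scalar (non-vectorized) addition: one element is processed per
   loop iteration. The name add_scalar is a hypothetical helper. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

With optimization enabled, a compiler may auto-vectorize a loop like this on its own, but nothing in the source code guarantees it.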

The SIMD implementation of vector addition:
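
One possible version is sketched below, assuming an x86 processor with SSE and single-precision `float` data. The intrinsics `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` come from `<xmmintrin.h>`. The function name `add_sse`, the `HAVE_SSE` macro, and the scalar fallback for other targets are illustrative assumptions, not the original listing.

```c
#include <stddef.h>

/* Portability guard (an assumption of this sketch): use SSE only when
   the compiler reports it, otherwise fall back to the scalar loop. */
#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>   /* __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Adds a[i] + b[i] into out[i]. The name add_sse is hypothetical. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    /* Main loop: four floats per iteration in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);   /* unaligned load of b[i..i+3] */
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Tail loop: remaining elements when n is not a multiple of 4. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The unaligned load and store variants are used here so the sketch works for any pointer. If the arrays are known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could be used instead.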



