This tutorial helps programmers benefit from the auto-vectorization algorithms implemented in modern compilers such as gcc. Before you start vectorizing your code, I assume that there are no other bottlenecks (such as dynamic memory allocation) in the critical path. In this tutorial we will use gcc 4.4.1, but the same steps can be applied to newer or older versions.

First of all, there are two issues with auto-vectorization:
1) gcc must know the target architecture (e.g. which SIMD instructions are available).
2) The data structures must be properly aligned in memory.

The first step is to find the architecture of your processor and pass it to gcc. The -march=... flag tells gcc which instruction set it may use, while -mtune=... only tunes instruction scheduling for a given processor without enabling any new instructions. For example, my laptop has a Core 2 Duo, so I use -march=core2. You can find more information here.

The next problem we must solve is letting gcc know about memory alignment. ...
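A minimal sketch of both requirements is shown below. It assumes a GCC-compatible compiler on x86. The file name vec.c, the array size N and the function add_arrays are hypothetical, chosen only for illustration. The arrays are aligned to 16 bytes with GCC's aligned attribute because that is the width of an SSE register. The loop is a plain element-wise addition with a fixed trip count, which is the kind of loop the vectorizer handles most easily.

```c
/* vec.c: hypothetical example, compile with e.g.
 *   gcc -O3 -march=core2 -ftree-vectorizer-verbose=2 -c vec.c
 * (-O3 enables -ftree-vectorize; -ftree-vectorizer-verbose reports which
 *  loops were vectorized on gcc 4.x; newer GCC releases use -fopt-info-vec) */
#include <stddef.h>

/* Hypothetical size: a multiple of 4, so 4-float SSE vectors divide it evenly. */
#define N 1024

/* Global arrays aligned to 16 bytes via the GCC-specific aligned attribute,
   so the compiler knows aligned SSE loads/stores are safe on them. */
float a[N] __attribute__((aligned(16)));
float b[N] __attribute__((aligned(16)));
float c[N] __attribute__((aligned(16)));

/* Element-wise addition: no calls, no data-dependent branches and a
   compile-time trip count, so gcc can turn it into SIMD additions. */
void add_arrays(void)
{
    size_t i;
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}
```

Because a, b and c are distinct global arrays, gcc can prove that they do not overlap. When the same loop works on pointer arguments instead, the compiler may need extra run-time checks or the restrict qualifier before it vectorizes.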