Friday, October 30 • 2:00pm - 2:45pm
Throttling Automatic Vectorization: When Less Is More


SIMD vectors are widely adopted in modern general-purpose processors because they can boost performance and energy efficiency for certain applications.

Compiler-based automatic vectorization is one approach for generating code that makes efficient use of the SIMD units, and has the benefit of avoiding hand development and platform-specific optimizations. 

The Superword-Level Parallelism (SLP) vectorization algorithm is the best-known approach to automatic vectorization of straight-line scalar code, and is implemented in several major compilers.


The existing SLP algorithm greedily packs scalar instructions into vectors starting from stores and traversing the data dependence graph upwards until it reaches loads or non-vectorizable instructions. 
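The bottom-up traversal described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the code of any real compiler: the `Instr` record, the `build_slp_graph` function, and the `"gather"` marker are hypothetical names, the seed group of stores is assumed to be given, and checks a real vectorizer performs (memory adjacency, dependences, scheduling) are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str             # hypothetical SSA-style name of the scalar instruction
    opcode: str           # e.g. "load", "add", "mul", "store"
    operands: tuple = ()  # names of the instructions that produce the inputs


def build_slp_graph(instrs, seed):
    """Greedily group scalar instructions into vector candidates.

    Starts from a seed group of stores and follows operands upwards.
    Growth stops at loads, or at groups whose lanes are not isomorphic;
    the latter are recorded as "gather" nodes (scalars packed into a vector).
    """
    by_name = {i.name: i for i in instrs}
    graph, seen, worklist = [], set(), [tuple(seed)]
    while worklist:
        group = worklist.pop()
        if group in seen:
            continue
        seen.add(group)
        lanes = [by_name[n] for n in group]
        same_opcode = len({i.opcode for i in lanes}) == 1
        same_arity = len({len(i.operands) for i in lanes}) == 1
        if not (same_opcode and same_arity):
            graph.append(("gather", group))  # non-vectorizable: stop here
            continue
        graph.append((lanes[0].opcode, group))
        if lanes[0].opcode == "load":
            continue                         # loads terminate the traversal
        # Operand k of every lane forms the next candidate group.
        for k in range(len(lanes[0].operands)):
            worklist.append(tuple(i.operands[k] for i in lanes))
    return graph


loads = [Instr(n, "load") for n in ("a0", "a1", "b0", "b1")]
stores = [Instr("st0", "store", ("s0",)), Instr("st1", "store", ("s1",))]

# Both lanes compute an add, so the whole graph is isomorphic.
uniform = loads + stores + [Instr("s0", "add", ("a0", "b0")),
                            Instr("s1", "add", ("a1", "b1"))]
# Lane 1 computes a mul instead, so the (s0, s1) group cannot be packed.
mixed = loads + stores + [Instr("s0", "add", ("a0", "b0")),
                          Instr("s1", "mul", ("a1", "b1"))]

uniform_graph = build_slp_graph(uniform, ("st0", "st1"))
mixed_graph = build_slp_graph(mixed, ("st0", "st1"))
```

In the `uniform` case the graph grows all the way to the loads; in the `mixed` case it stops right below the stores with a gather node, which is the kind of scalar-to-vector data movement the abstract goes on to discuss.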

Choosing whether to vectorize is a one-off decision for the whole graph that has been generated. 

This, however, is suboptimal because the graph may contain code that makes vectorization less profitable, since data must be moved from scalar registers into vectors.

The decision does not consider the potential benefits of throttling the graph by removing this harmful code. 

In this work, we propose a solution to overcome this limitation by introducing Throttled SLP (TSLP), a novel vectorization algorithm that finds the optimal graph to vectorize, forcing vectorization to stop earlier whenever this is beneficial.
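To make the idea of throttling concrete, here is a minimal sketch under an assumed cost model. It is not the TSLP algorithm from the talk: the `Node` type, its `cost` and `gather` fields, and the tree-shaped graph are all hypothetical simplifications. Each node's `cost` is vector cost minus scalar cost (negative means profitable), and `gather` is the assumed price of packing that node's scalar results into a vector when the graph is cut just above it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cost: int    # vector cost minus scalar cost; negative is profitable
    gather: int  # assumed cost of packing this node's scalar results
    children: list = field(default_factory=list)


def full_cost(node):
    """All-or-nothing cost: every node in the graph is vectorized."""
    return node.cost + sum(full_cost(c) for c in node.children)


def throttled_cost(node, is_root=True):
    """Cheapest cost when each subtree may be cut and gathered instead.

    The root (the seed stores) is always kept; for any other node we take
    the cheaper of vectorizing it or leaving it scalar and paying `gather`.
    """
    keep = node.cost + sum(throttled_cost(c, False) for c in node.children)
    return keep if is_root else min(keep, node.gather)


# Hypothetical graph: one operand chain is cheap, the other is harmful.
good_loads = Node("loads_a", cost=-1, gather=2)
bad_loads = Node("loads_b", cost=-1, gather=2)
bad_op = Node("expensive_op", cost=4, gather=2, children=[bad_loads])
add = Node("add", cost=-1, gather=2, children=[good_loads, bad_op])
root = Node("stores", cost=-1, gather=0, children=[add])

whole = full_cost(root)          # 0: no gain, so the whole graph is rejected
trimmed = throttled_cost(root)   # -1: cutting above expensive_op pays off
```

With these made-up numbers, the one-off decision rejects the graph because its total cost is not negative, while cutting the harmful subtree and paying one gather leaves a profitable graph. The recursion is a simple tree dynamic program chosen for brevity; real SLP graphs are DAGs and real cost models come from the target.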

Our experiments show that TSLP improves performance across a number of kernels extracted from widely-used benchmark suites, decreasing execution time compared to SLP by 9% on average and up to 14% in the best case. 


Vasileios Porpodas

University of Cambridge

Friday October 30, 2015 2:00pm - 2:45pm PDT
Salon I & Salon II
