author | Sean Silva <silvas@purdue.edu> | 2012-12-20 22:59:36 +0000 |
---|---|---|
committer | Sean Silva <silvas@purdue.edu> | 2012-12-20 22:59:36 +0000 |
commit | 689858b8da235855a6b0b3409f97b2dd9be1a9df (patch) | |
tree | e65fe8337dda78bb174e87ef49409d8e00d6ebeb /docs | |
parent | 56706db45bbc7be0c087451ca9f1d258324df4b2 (diff) | |
docs: Cleanup trailing whitespace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170799 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- | docs/Vectorizers.rst | 22 |
1 files changed, 11 insertions, 11 deletions
diff --git a/docs/Vectorizers.rst b/docs/Vectorizers.rst
index fe6a986..5ec3da4 100644
--- a/docs/Vectorizers.rst
+++ b/docs/Vectorizers.rst
@@ -70,7 +70,7 @@ pointers are disjointed, but in our example, the Loop Vectorizer has no way of
 knowing that the pointers A and B are unique. The Loop Vectorizer handles this
 loop by placing code that checks, at runtime, if the arrays A and B point to
 disjointed memory locations. If arrays A and B overlap, then the scalar version
-of the loop is executed.
+of the loop is executed.
 
 .. code-block:: c++
 
@@ -83,11 +83,11 @@ of the loop is executed.
 Reductions
 ^^^^^^^^^^
 
-In this example the ``sum`` variable is used by consecutive iterations of
+In this example the ``sum`` variable is used by consecutive iterations of
 the loop. Normally, this would prevent vectorization, but the vectorizer can
 detect that 'sum' is a reduction variable. The variable 'sum' becomes a vector
 of integers, and at the end of the loop the elements of the array are added
-together to create the correct result. We support a number of different
+together to create the correct result. We support a number of different
 reduction operations, such as addition, multiplication, XOR, AND and OR.
 
 .. code-block:: c++
@@ -95,7 +95,7 @@ reduction operations, such as addition, multiplication, XOR, AND and OR.
   int foo(int *A, int *B, int n) {
     unsigned sum = 0;
     for (int i = 0; i < n; ++i)
-      sum += A[i] + 5;
+      sum += A[i] + 5;
     return sum;
   }
 
@@ -159,8 +159,8 @@ The Loop Vectorizer can vectorize loops that count backwards.
 Scatter / Gather
 ^^^^^^^^^^^^^^^^
 
-The Loop Vectorizer can vectorize code that becomes scatter/gather
-memory accesses.
+The Loop Vectorizer can vectorize code that becomes scatter/gather
+memory accesses.
 
 .. code-block:: c++
 
@@ -204,13 +204,13 @@ See the table below for a list of these functions.
 Performance
 -----------
 
-This section shows the the execution time of Clang on a simple benchmark:
+This section shows the the execution time of Clang on a simple benchmark:
 `gcc-loops <http://llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/UnitTests/Vectorizer/>`_.
-This benchmarks is a collection of loops from the GCC autovectorization
+This benchmarks is a collection of loops from the GCC autovectorization
 `page <http://gcc.gnu.org/projects/tree-ssa/vectorization.html>`_ by Dorit Nuzman.
 The chart below compares GCC-4.7, ICC-13, and Clang-SVN with and without loop vectorization
 at -O3, tuned for "corei7-avx", running on a Sandybridge iMac.
-The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels.
+The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels.
 
 .. image:: gcc-loops.png
    :width: 100%
@@ -228,7 +228,7 @@ through clang using the command line flag:
 
 .. code-block:: console
 
-   $ clang -fslp-vectorize file.c
+   $ clang -fslp-vectorize file.c
 
 Details
 -------
@@ -237,7 +237,7 @@ The goal of basic-block vectorization (a.k.a. superword-level parallelism) is
 to combine similar independent instructions within simple control-flow regions
 into vector instructions. Memory accesses, arithemetic operations, comparison
 operations and some math functions can all be vectorized using this technique
-(subject to the capabilities of the target architecture).
+(subject to the capabilities of the target architecture).
 
 For example, the following function performs very similar operations on its
 inputs (a1, b1) and (a2, b2). The basic-block vectorizer may combine these