author | dalecurtis@chromium.org <dalecurtis@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98> | 2012-08-02 02:59:42 +0000 |
---|---|---|
committer | dalecurtis@chromium.org <dalecurtis@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98> | 2012-08-02 02:59:42 +0000 |
commit | fce61dcbdd3fb31839ba056b204bf903580a163b (patch) | |
tree | 5a8c37cfe0867ea14e0e37be2a7ce381a4ac2e27 /base | |
parent | 3770d6951c0db5ef047e77bcaee1b5bd9864f952 (diff) | |
Add SSE optimizations to SincResampler.
These are not the same optimizations in the WebKit version of
SincResampler. The WebKit version focuses on aligning the input
vector, resulting in at worst two unaligned loads on each kernel
index; or 2 * kKernelSize / 4 unaligned loads per call.
Instead I chose to focus on keeping the kernel vectors aligned
and eating at worst a single unaligned load on the input vector;
or kKernelSize / 4 unaligned loads per call.
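The aligned-kernel approach described above can be sketched as follows. This is an illustration only, not the code from this change: the names `kSketchKernelSize`, `ConvolveScalarSketch`, and `ConvolveSseSketch` are hypothetical, the kernel size of 32 is an assumed value, and the real SincResampler also interpolates between kernels, which is omitted here. A scalar fallback is used when SSE is not available at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical kernel size for this sketch; must be a multiple of 4.
constexpr std::size_t kSketchKernelSize = 32;

// Scalar reference: dot product of the input window and the kernel.
float ConvolveScalarSketch(const float* input, const float* kernel) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < kSketchKernelSize; ++i)
    sum += input[i] * kernel[i];
  return sum;
}

// SSE variant: the kernel must be 16-byte aligned, the input may not be.
// Each iteration issues one aligned kernel load and one unaligned input
// load, so a call performs kSketchKernelSize / 4 unaligned loads.
float ConvolveSseSketch(const float* input, const float* kernel) {
  assert(reinterpret_cast<std::uintptr_t>(kernel) % 16 == 0);
#if defined(SKETCH_HAS_SSE)
  __m128 sums = _mm_setzero_ps();
  for (std::size_t i = 0; i < kSketchKernelSize; i += 4) {
    const __m128 in = _mm_loadu_ps(input + i);  // unaligned load
    const __m128 k = _mm_load_ps(kernel + i);   // aligned load
    sums = _mm_add_ps(sums, _mm_mul_ps(in, k));
  }
  // Horizontal sum of the four partial sums.
  sums = _mm_add_ps(sums, _mm_movehl_ps(sums, sums));
  sums = _mm_add_ss(sums, _mm_shuffle_ps(sums, sums, 1));
  return _mm_cvtss_f32(sums);
#else
  return ConvolveScalarSketch(input, kernel);
#endif
}
```

Calling `ConvolveSseSketch(input + 1, kernel)` with an aligned `kernel` exercises the unaligned input path; the kernel loads stay aligned regardless of the input offset.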
Performance results from SincResamplerTest.ConvolveBenchmark:
clang version 3.2 (trunk 159409):
Convolve_C took 2100ms for 50000000 iterations.
Convolve_SSE (aligned) took 677ms for 50000000 iterations.
Convolve_SSE (unaligned) took 717ms for 50000000 iterations.
gcc (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3:
Convolve_C took 2183ms for 50000000 iterations.
Convolve_SSE (aligned) took 806ms for 50000000 iterations.
Convolve_SSE (unaligned) took 844ms for 50000000 iterations.
For reference, the original WebKit optimizations:
clang version 3.2 (trunk 159409):
Convolve_C took 2132ms for 50000000 iterations.
Convolve_SSE (aligned) took 1146ms for 50000000 iterations.
Convolve_SSE (unaligned) took 1797ms for 50000000 iterations.
gcc (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3:
Convolve_C took 2209ms for 50000000 iterations.
Convolve_SSE (aligned) took 1450ms for 50000000 iterations.
Convolve_SSE (unaligned) took 4415ms for 50000000 iterations.
In summary, SSE provides a ~2.6x speedup on GCC and a ~3x speedup
on clang.
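The summary figures follow from dividing the Convolve_C time by the Convolve_SSE time for the same compiler. A minimal sketch of that arithmetic, where `SpeedupSketch` is a hypothetical helper and not part of this change:

```cpp
#include <cassert>

// Speedup of an optimized run relative to a baseline run, given
// wall-clock times in milliseconds for the same iteration count.
double SpeedupSketch(double baseline_ms, double optimized_ms) {
  assert(optimized_ms > 0.0);
  return baseline_ms / optimized_ms;
}
```

With the timings listed above, clang gives 2100 / 677 ≈ 3.10 (aligned) and 2100 / 717 ≈ 2.93 (unaligned); gcc gives 2183 / 806 ≈ 2.71 (aligned) and 2183 / 844 ≈ 2.59 (unaligned).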
BUG=133637
TEST=media_unittests + SincResampler/* tests.
Review URL: https://chromiumcodereview.appspot.com/10803003
git-svn-id: svn://svn.chromium.org/chrome/trunk/src@149569 0039d316-1c4b-4281-b951-d872f2087c98
Diffstat (limited to 'base')
0 files changed, 0 insertions, 0 deletions