| author | primiano <primiano@chromium.org> | 2016-03-11 03:21:49 -0800 |
|---|---|---|
| committer | Commit bot <commit-bot@chromium.org> | 2016-03-11 11:23:01 +0000 |
| commit | d76421171daa1327f8e1c10ee710ebf911a90d2f (patch) | |
| tree | 199f61018c9d4b540286ede8fbb5488da1cd1977 | |
| parent | 42c563766f6ee364683246a4d363906691772c0e (diff) | |
Improve performance of the malloc shim layer
This fixes the perf regression introduced by crrev.com/1781573002.
As speculated, crbug.com/593344 (NoBarrier_Load being double-fenced)
is the source of the visible perf regression: it adds two fences to
the malloc fast path.
This CL adds a workaround that falls back to a raw volatile ptr read
on Linux+Clang, relying on the fact that, on the architectures we care
about, a load of an aligned pointer is intrinsically atomic [1,2].
Perf regression:
https://chromeperf.appspot.com/report?sid=1237900c90f9c5e5320a87af8d9e8828fbd1794af07f0a4a335f1fc2a45f120a&start_rev=379480&end_rev=380417
I verified manually that the regression goes away on cc_perftests with
this patch.
[1] Chapter 7 of Part 3A - System Programming Guide
http://download.intel.com/design/processor/manuals/253668.pdf
[2] A3.5.3 Atomicity in the ARM architecture, ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition
http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf
BUG=550886,593872
TEST=cc_perftestsfast --gtest_filter=*PrepareTiles*
Review URL: https://codereview.chromium.org/1777363002
Cr-Commit-Position: refs/heads/master@{#380600}
| -rw-r--r-- | base/allocator/allocator_shim.cc | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/base/allocator/allocator_shim.cc b/base/allocator/allocator_shim.cc
index cc5097e..f689035 100644
--- a/base/allocator/allocator_shim.cc
+++ b/base/allocator/allocator_shim.cc
@@ -72,8 +72,16 @@ bool CallNewHandler() {
 }
 
 inline const allocator::AllocatorDispatch* GetChainHead() {
+  // TODO(primiano): Just use NoBarrier_Load once crbug.com/593344 is fixed.
+  // Unfortunately due to that bug NoBarrier_Load() is mistakenly fully
+  // barriered on Linux+Clang, and that causes visible perf regressions.
   return reinterpret_cast<const allocator::AllocatorDispatch*>(
-      subtle::NoBarrier_Load(&g_chain_head));
+#if defined(OS_LINUX) && defined(__clang__)
+      *static_cast<const volatile subtle::AtomicWord*>(&g_chain_head)
+#else
+      subtle::NoBarrier_Load(&g_chain_head)
+#endif
+      );
 }
 
 }  // namespace
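The trade-off described in the commit message can be illustrated outside of Chromium with a minimal, standalone sketch. It is not the code from this CL: `Dispatch`, `g_head_atomic`, `g_head_word`, `SetHead`, `GetHeadRelaxed` and `GetHeadVolatile` are hypothetical names, and `std::atomic` with `std::memory_order_relaxed` is used here only as a stand-in for what `subtle::NoBarrier_Load` is meant to do (a load with no ordering fences). The volatile variant mirrors the shape of the workaround: a plain read of an aligned, pointer-sized word through a `const volatile` pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for allocator::AllocatorDispatch.
struct Dispatch {
  int id;
};

// Variant A: a relaxed atomic load. This is the intended semantics of a
// "no barrier" load: atomic, but without any ordering fences.
std::atomic<const Dispatch*> g_head_atomic{nullptr};

inline const Dispatch* GetHeadRelaxed() {
  return g_head_atomic.load(std::memory_order_relaxed);
}

// Variant B: a raw volatile read of a pointer-sized word, in the style of the
// workaround. ASSUMPTION: this is only atomic because the target hardware
// guarantees that aligned pointer-sized loads are atomic; the C++ standard
// itself gives no such guarantee for volatile accesses.
std::intptr_t g_head_word = 0;

inline const Dispatch* GetHeadVolatile() {
  return reinterpret_cast<const Dispatch*>(
      *static_cast<const volatile std::intptr_t*>(&g_head_word));
}

// Hypothetical helper that publishes the same pointer to both variables, so
// the two read paths can be compared in a single-threaded demonstration.
inline void SetHead(const Dispatch* d) {
  assert(d != nullptr);
  g_head_atomic.store(d, std::memory_order_relaxed);
  g_head_word = reinterpret_cast<std::intptr_t>(d);
}
```

In single-threaded use both getters return the pointer passed to `SetHead`. The difference only matters under concurrency and in the generated code: the relaxed load is portable, while the volatile read depends on the architecture-level atomicity guarantees cited in [1] and [2].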
