GSS/TLAB GC speedup on N4 (ms):
MemAllocTest 2963 -> 2792
BinaryTrees 2205 -> 2113
Also, measured with -XX:IgnoreMaxFootprint to invoke GC less often
(only when the bump pointer space is filled rather than based on the
target utilization):
MemAllocTest 2707 -> 2590
BinaryTrees 2023 -> 1906
TODO: implement fast paths for array allocations.
Bug: 9986565
Change-Id: I73ff6327b229704f8ae5924ae9b747443c229841
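
For context, here is a minimal sketch of the kind of bump-pointer TLAB
fast path this change relies on. The Thread fields and AllocSlowPath
helper are hypothetical stand-ins; ART's real fast paths are generated
assembly entrypoints, not C++ like this.

#include <cstddef>
#include <cstdint>

// Hypothetical per-thread allocation buffer state.
struct Thread {
  uint8_t* tlab_pos;  // Next free byte in this thread's TLAB.
  uint8_t* tlab_end;  // One past the last usable byte.
};

// Assumed slow path: refills the TLAB, possibly triggering a GC.
void* AllocSlowPath(Thread* self, size_t bytes);

inline void* AllocFromTlab(Thread* self, size_t bytes) {
  uint8_t* pos = self->tlab_pos;
  // No locks or atomics needed: the TLAB is private to this thread.
  if (static_cast<size_t>(self->tlab_end - pos) >= bytes) {
    self->tlab_pos = pos + bytes;
    return pos;
  }
  return AllocSlowPath(self, bytes);
}

The speedup comes from the common case being a compare and an add on
thread-private state, with no synchronization.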
RemoveRememberedSet now deletes the remembered set.
Bug: 16532086
Change-Id: I01092931cc20cd0688dd42eed3dde9ad140889b2
We now create spaces when we need them for collector transitions or
homogeneous compaction by recycling mem maps. Changed the bump
pointer space size to be as large as the heap capacity instead of
1/2 the heap capacity as it used to be. For GSS, bump pointer spaces
are currently set to 32MB.
Changed GSS to have main space == non moving space since we don't
need to copy from the main space.
Fixes GC stress tests 074, 096.
Fixed the test 080 OOM throw with -Xmx2m for the GC stress test; it
was broken since it allocated a 4 MB array before starting the
OOM process.
Bug: 14059466
Bug: 16406852
Change-Id: I62877cfa24ec944a6f34ffac30334f454a8002fd
Disabled adding the main and non moving space to the immune region.
This will enable us to recycle bump pointer spaces for malloc space
-> malloc space compaction as well as collector transitions.
Also added logic for falling back to the non moving space, since we
may copy objects there.
Refactored mod union table logic into MarkReachableObjects.
No measurable performance benefit or regression.
Bug: 14059466
Bug: 16291259
Change-Id: If663d9fdbde943b988173b7f6ac844e5f78a0327
The current semispace copy GC is mainly associated with bump pointer
spaces. Though it squeezes out fragmentation most aggressively, an
extra copy is required to re-establish the data in the ROS/DlMalloc
space so that CMS GCs can happen afterwards. As semispace copy GC is
still stop-the-world, this introduces not only unnecessary overhead
but also a longer response time. Response time here means the duration
between the start of the transition request and the start of the
transition animation, which may impact the user experience.
Using the semispace copy GC to compact the data from one ROS space to
another (or from one DlMalloc space to another) solves this problem.
Although it squeezes out less fragmentation, CMS GCs can run
immediately after the compaction. We apply this algorithm in two cases:
1) Right before throwing an OOM if -XX:EnableHSpaceCompactForOOM is
passed in as true.
2) When the app is switched to the background if the -XX:BackgroundGC
option has the value HSpaceCompact.
For case 1), OOMs are significantly delayed in the Harmony GC stress
test, with a compaction ratio of up to 0.87. For case 2), a compaction
ratio of around 0.5 is observed in both the built-in SMS app and the
browser. Similar results have been obtained on other apps as well.
Change-Id: Iad9eabc6d046659fda3535ae20f21bc31f89ded3
Signed-off-by: Wang, Zuo <zuo.wang@intel.com>
Signed-off-by: Chang, Yang <yang.chang@intel.com>
Signed-off-by: Lei Li <lei.l.li@intel.com>
Signed-off-by: Lin Zang <lin.zang@intel.com>
Useful for command line benchmarks.
Change-Id: Ie525863cd8eff93c64ce76639b1108fbdad91633
Change-Id: I390d3622f8d572ec7e34ea6dff9e1e0936e81ac1
A notion of released vs empty pages helps get a more accurate view of
how much memory is released during heap trimming. Without it, the same
pages may get madvised multiple times without being dirtied in between,
inflating the reported numbers.
Also enabled heap trimming of rosalloc spaces even when we care about
jank. This is safe to do since the trimming process only acquires
locks for short periods of time.
Dalvik PSS reduces from ~52M to ~50M after boot on N4.
Bug: 9969166
Change-Id: I4012e0a2554f413d18efe1a0371fe18d1edabaa9
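
A minimal sketch of the released-vs-empty distinction described above,
with hypothetical names (the real accounting lives in RosAlloc's page
map): a page is only madvised, and only counted as reclaimed, on its
first trim after being dirtied.

#include <sys/mman.h>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PageState : uint8_t { kInUse, kEmpty, kReleased };

class PageMap {
 public:
  explicit PageMap(size_t num_pages) : states_(num_pages, PageState::kInUse) {}

  // Returns the number of bytes newly returned to the kernel.
  size_t Trim(uint8_t* base, size_t page_size) {
    size_t released = 0;
    for (size_t i = 0; i < states_.size(); ++i) {
      if (states_[i] == PageState::kEmpty) {  // kReleased pages are skipped.
        madvise(base + i * page_size, page_size, MADV_DONTNEED);
        states_[i] = PageState::kReleased;
        released += page_size;
      }
    }
    return released;
  }

  // Called when a page's contents are freed but not yet madvised.
  void MarkEmpty(size_t page_index) { states_[page_index] = PageState::kEmpty; }

  // Called when a page is dirtied again by a new allocation.
  void MarkInUse(size_t page_index) { states_[page_index] = PageState::kInUse; }

 private:
  std::vector<PageState> states_;
};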
Added a class called ScopedHeapFill which changes the bytes allocated
counter to be equal to the growth limit. This causes the next
allocation to do a GC and possibly generate an OOM error. This is
useful for tests which need a GC to happen at a specific point.
Change-Id: Ibd8f3d5928b58534c5165ba7c296980002aa2c28
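
A minimal sketch of the ScopedHeapFill idea, using a stub Heap type;
the real class operates on ART's gc::Heap, whose API differs.

#include <cstdint>

// Hypothetical stand-in for the runtime heap.
struct Heap {
  int64_t bytes_allocated;
  int64_t growth_limit;
};

// RAII helper: while in scope, the heap appears full, so the next
// allocation takes the GC slow path (and may throw OOM); the
// destructor restores the real counter.
class ScopedHeapFill {
 public:
  explicit ScopedHeapFill(Heap* heap)
      : heap_(heap), delta_(heap->growth_limit - heap->bytes_allocated) {
    heap_->bytes_allocated += delta_;
  }
  ~ScopedHeapFill() { heap_->bytes_allocated -= delta_; }

 private:
  Heap* const heap_;
  const int64_t delta_;
};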
The max allowed footprint is only updated after the GC, but we can
still allocate even if bytes_allocated > max_allowed_footprint_.
This means that we used to be able to compute a negative free-memory
value whenever bytes_allocated > max_allowed_footprint_.
External bug:
https://code.google.com/p/android/issues/detail?id=72221
Change-Id: I4ef9a534e29211786e82cdcb2582c11ab37a348a
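
The failure mode and fix, sketched with hypothetical names: with
unsigned counters, footprint - allocated wraps to a huge value once
allocation passes the stale footprint limit, so the subtraction has to
be guarded.

#include <cstddef>

size_t GetFreeMemory(size_t max_allowed_footprint, size_t bytes_allocated) {
  if (bytes_allocated >= max_allowed_footprint) {
    return 0;  // Clamp instead of wrapping around to a huge value.
  }
  return max_allowed_footprint - bytes_allocated;
}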
Previously, each garbage collector had data that was only used
during collection. Since only one collector can be running at any
given time, we can make this data be shared between all collectors.
This reduces memory usage since we don't need to keep redundant
information for each GC type. Also reduced how much code is required
to sweep spaces.
Bug: 9969166
Change-Id: I31caf0ee4d572f75e0c66863fe7db12c08ae08e7
The mark compact collector is a four-phase collection: it does a
normal full mark sweep, calculates forwarding addresses for objects in
the from space, updates the references of objects in the from space,
and then moves the objects in the from space.
Support is disabled by default since it needs to have non movable
classes and field arrays. Performance is around 50% as fast.
The main advantage that this has over semispace is that the worst
case memory usage is 50%, since we only need one space instead of two.
TODO: Make field arrays and classes movable. This causes complications
since Object::VisitReferences relies on them: if we update the
fields of an object, but another object later uses this object to
figure out which fields are reference fields, it doesn't work.
Bug: 14059466
Change-Id: I661ed3b71ad4dde124ef80312c95696b4a5665a1
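
A high-level skeleton of the four phases, for orientation only (this
is not ART's actual MarkCompact class):

class MarkCompactSketch {
 public:
  void RunCollection() {
    MarkReachableObjects();          // Phase 1: normal full mark-sweep marking.
    CalculateForwardingAddresses();  // Phase 2: assign each live object its new address.
    UpdateReferences();              // Phase 3: rewrite references to the new addresses.
    MoveObjects();                   // Phase 4: slide objects to their forwarding addresses.
  }

 private:
  void MarkReachableObjects() { /* walk roots, mark transitively */ }
  void CalculateForwardingAddresses() { /* compact live objects toward the start */ }
  void UpdateReferences() { /* visit every field, remap moved targets */ }
  void MoveObjects() { /* copy each object to its forwarding address */ }
};

The worst-case memory advantage follows from the phases operating in
place: nothing ever needs a second space to copy into.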
Based on definitions in:
http://developer.android.com/reference/java/lang/Runtime.html
Bug: 15507122
Change-Id: I02f34682d7ac2d379a07631b5207b6cfb224da6b
This makes more sense since it is what the allocator uses. Also fixed
a place where we were not properly passing in whether or not it was
a large object allocation.
Bug: 15327879
Change-Id: Ieab7af5427f5cdc2760390186b67e2c96d4bafa7
This fixes an issue with heap verification which was caused when
the allocation stack overflowed. This resulted in heap verification
failures since we were storing the newly allocated object in a
handle scope without having it be live either in the live bitmap
or allocation stack. We now push the object into the reserve area
before we do a GC due to allocation stack overflow.
Change-Id: I83b42c4b3250d7eaab1b49e53066e21c8656a740
Change-Id: I4858d9cbed95e5ca560956b9dabd976cebe68333
Static variables aren't thread safe and could cause the zygote to be
created twice.
Bug: 15133494
Change-Id: I65c8f089bed8de93f895b62b3dcff4c936931860
The new root verification prints the root type and owner thread id as
well as the type of the object.
Also a bit of work for planned multi-threaded verification.
Bug: 14289301
Change-Id: Ia73c517dc11ec6dd82f3d945604ee3836b3db536
Also remove the Android.libcxx.mk and other bits of stlport compatibility
mechanics.
Change-Id: Icdf7188ba3c79cdf5617672c1cfd0a68ae596a61
Change-Id: I7f08cc3052fbed93a56ccf1ab7675ae8bc129da9
Deleted SirtRef and replaced it with Handle. Handles are value types
which wrap around a StackReference*.
Renamed StackIndirectReferenceTable to HandleScope.
Added a scoped handle wrapper which wraps around an Object** and
restores it in its destructor.
Renamed Handle::get -> Get.
Bug: 8473721
Change-Id: Idbfebd4f35af629f0f43931b7c5184b334822c7a
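
A minimal sketch of the Handle shape described above, with simplified
stand-in types; the real classes are templated on the mirror type and
tied to the thread's HandleScope.

// Stand-in for mirror::Object.
struct Object;

// A GC-visible slot that holds a raw object pointer.
struct StackReference {
  Object* ref;
};

// Value type wrapping a StackReference*: copying a Handle copies the
// pointer to the slot, not the object, so a moving GC can update the
// slot and every copy of the Handle observes the new address.
class Handle {
 public:
  explicit Handle(StackReference* slot) : slot_(slot) {}
  Object* Get() const { return slot_->ref; }
  void Assign(Object* obj) { slot_->ref = obj; }

 private:
  StackReference* slot_;
};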
Concurrent reference processing currently works by going into native
code from java.lang.ref.Reference.get(). From there, we have a fast
path if the references aren't being processed which returns the
referent without needing to access any locks. In the slow path we
block until reference processing is complete. It may be possible to
improve the slow path if the referent is blackened.
TODO: Investigate doing the fast path in Java code by using racy reads
of a static volatile boolean. This will work as long as there are no
suspend points in between the boolean read and the referent read.
Bug: 14381653
Change-Id: I1546b55be4691fe4ff4aa6d857b234cce7187d87
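
A minimal sketch of the fast/slow path split, with hypothetical names;
the real logic sits behind java.lang.ref.Reference.get()'s native
implementation.

#include <atomic>
#include <condition_variable>
#include <mutex>

struct Object;

class ReferenceProcessorSketch {
 public:
  // Fast path: if references aren't being processed, read the referent
  // with no locks. Slow path: block until processing is complete.
  Object* GetReferent(std::atomic<Object*>* referent) {
    if (!processing_.load(std::memory_order_acquire)) {
      return referent->load(std::memory_order_relaxed);
    }
    std::unique_lock<std::mutex> lock(lock_);
    done_.wait(lock, [this] {
      return !processing_.load(std::memory_order_acquire);
    });
    return referent->load(std::memory_order_relaxed);
  }

  void StartProcessing() { processing_.store(true, std::memory_order_release); }

  void FinishProcessing() {
    {
      std::lock_guard<std::mutex> lock(lock_);
      processing_.store(false, std::memory_order_release);
    }
    done_.notify_all();
  }

 private:
  std::atomic<bool> processing_{false};
  std::mutex lock_;
  std::condition_variable done_;
};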
RecordFree now calls Heap::RecordFree and also updates the
garbage collector's internal bytes-freed accounting.
Change-Id: I8cb03748b0768e3c8c50ea709572960e6e4ad219
We now have an invariant where we never allocate finalizable
objects with the Initialized or Resolved entrypoints. This speeds up
allocation by only doing the check in the slow path.
Before:
MemAllocTest: 3625, 3707, 3641
EvaluateAndApplyChanges: 3448, 3421, 3413
After:
MemAllocTest: 3164, 3109, 3135
EvaluateAndApplyChanges: 3272, 3299, 3353
Bug: 14078487
Change-Id: I2b0534af3e7c75ea5e5257cf3647744f7abfb74e
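
An illustrative sketch of the invariant, with assumed helper
functions: fast entrypoints are only ever used for classes known not
to be finalizable, so the is_finalizable check moves entirely into the
slow entrypoint.

#include <cstddef>

struct Class { bool is_finalizable; };
struct Object;
Object* TryBumpPointerAlloc(Class* klass, size_t size);  // Assumed helper.
Object* AllocWithGc(Class* klass, size_t size);          // Assumed helper.
void RegisterForFinalization(Object* obj);               // Assumed helper.

// Fast entrypoint: callers guarantee klass is never finalizable,
// so no check appears here.
Object* AllocObjectInitialized(Class* klass, size_t size) {
  Object* obj = TryBumpPointerAlloc(klass, size);
  return obj != nullptr ? obj : AllocWithGc(klass, size);
}

// Slow entrypoint: handles finalizable classes (and other hard cases).
Object* AllocObjectSlow(Class* klass, size_t size) {
  Object* obj = AllocWithGc(klass, size);
  if (obj != nullptr && klass->is_finalizable) {
    RegisterForFinalization(obj);  // The only place the check remains.
  }
  return obj;
}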
Change-Id: I8fe107d90a84de065c407b8d29fd106267ac440d
Deleted the set_as_default parameter and added a new function
SetSpaceAsDefault instead.
Change-Id: Ic4c359854d08e64ac0d0df92f0105447adb9df36
- All oat & art files are now placed under /data/dalvik-cache/<isa>/.
- GetDalvikCacheOrDie now requires a mandatory subdirectory argument,
and is implicitly rooted under /data/.
- Added helper methods to convert InstructionSet enums into strings
and vice versa.
(cherry picked from commit 2974bc3d8a5d161d449dd66826d668d87bdc3cbe)
Change-Id: Ic7986938e6a7091a2af675ebafec768f7b5fb8cd
Make volatility for GetFieldObject a template parameter.
Move some trivial mirror::String routines to a -inl.h.
Bug: 14285442
Change-Id: Ie23b11d4f18cb15a62c3bbb42837a8aaf6b68f92
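
A minimal sketch of the template-parameter approach with a simplified,
hypothetical signature (the real method lives on mirror::Object and
takes a MemberOffset):

#include <cstddef>

struct Object;  // Stand-in for mirror::Object.

template <bool kIsVolatile>
Object* GetFieldObject(const void* obj, size_t offset) {
  const char* raw = static_cast<const char*>(obj) + offset;
  if (kIsVolatile) {
    // The branch resolves at compile time, so only callers that
    // instantiate GetFieldObject<true> pay for the volatile load.
    return *reinterpret_cast<Object* const volatile*>(raw);
  }
  return *reinterpret_cast<Object* const*>(raw);
}

Callers select GetFieldObject<true> for volatile fields and
GetFieldObject<false> otherwise, so the common non-volatile path
carries no runtime volatility check.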
Refactored the GarbageCollector to let all of the phases be run by
the collector's RunPhases virtual method. This lets the GC decide
which phases should be concurrent and reduces how much baked in GC
logic resides in GarbageCollector.
Enabled concurrent sweeping in the semi space and non concurrent
mark sweep GCs. Changed the semi-space collector to have a swap semi
spaces boolean which can be changed with a setter.
Fixed tests to pass with the GSS collector; there was an error related
to the large object space limit.
Before (EvaluateAndApplyChanges):
GSS paused GC time 7.81s/7.81s, score: 3920
After (EvaluateAndApplyChanges):
GSS paused GC time 6.94s/7.71s, score: 3900
Benchmark score doesn't go up since the GC happens in the allocating
thread. There is a slight reduction in pause times experienced by
other threads (0.8s total).
Added options for pre sweeping GC heap verification and pre sweeping
rosalloc verification.
Bug: 14226004
Bug: 14250892
Bug: 14386356
Change-Id: Ib557d0590c1ed82a639d0f0281ba67cf8cae938c
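
A sketch of the RunPhases shape, with illustrative (not ART's actual)
class and phase names: the base class no longer bakes in a phase
ordering, and each collector decides which phases run concurrently.

class GarbageCollector {
 public:
  virtual ~GarbageCollector() {}
  // Subclasses drive all of their phases, including any pauses.
  virtual void RunPhases() = 0;
};

class ConcurrentMarkSweepSketch : public GarbageCollector {
 public:
  void RunPhases() override {
    InitializePhase();
    MarkingPhase();  // Runs concurrently with mutators.
    PausePhase();    // Short stop-the-world remark.
    ReclaimPhase();  // Concurrent sweeping.
    FinishPhase();
  }

 private:
  void InitializePhase() { /* set up bitmaps and mark stacks */ }
  void MarkingPhase() { /* concurrent marking */ }
  void PausePhase() { /* remark roots with threads suspended */ }
  void ReclaimPhase() { /* sweep without holding the pause */ }
  void FinishPhase() { /* reset state, record timings */ }
};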
Enabling this flag greatly reduces how much time is spent in the GC.
It was not done previously since it was regressing MemAllocTest. With
these RosAlloc changes, the benchmark score no longer regresses after
we enable the flag.
Changed Run::AllocSlot to only have one mode of allocation. The new
mode is finding the first free bit in the bitmap. This was
previously the slow path but is now the fast path. Some optimizations
which enabled this include always having the alloc bitmap bits which
correspond to invalid slots set to 1. This removes the need for a
bounds check since we will never end up allocating there.
Changed revoking a thread-local buffer to point it at an invalid run.
The invalid run is just a run which always has all the allocation bits
set to 1. When a thread attempts a thread-local allocation from it,
the allocation always fails and takes the slow path. This eliminates
the need for a null check for revoked runs.
Changed zeroing of memory to happen during free; AllocPages should
always return zeroed memory. Added prefetching which happens when we
allocate a run.
Some refactoring to reduce duplicated code.
Ergonomics changes: Changed kStickyGcThroughputAdjustment to 1.0,
this helps reduce GC time.
Measurements (3 samples per benchmark):
Before: MemAllocTest scores: 3463, 3445, 3431
EvaluateAndApplyChanges score | total GC time
Iter 1: 3485, 23.602436s
Iter 2: 3434, 22.499882s
Iter 3: 3483, 23.253274s
After: MemAllocTest scores: 3495, 3417, 3409
EvaluateAndApplyChanges score | total GC time:
Iter 1: 3375, 17.463462s
Iter 2: 3358, 16.185188s
Iter 3: 3367, 15.822312s
Bug: 8788501
Bug: 11790317
Bug: 9986565
Change-Id: Ifd273a054824028dabed27c07c081dde1816f93c
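
A minimal sketch of the first-free-bit allocation and the invalid-slot
trick, with an illustrative layout (RosAlloc's real runs are more
involved; __builtin_ctz assumes GCC/Clang). Bits for slots beyond a
run's capacity stay permanently set, so the scan can never select them
and no bounds check is needed.

#include <cstddef>
#include <cstdint>
#include <cstring>

struct RunSketch {
  static constexpr size_t kWords = 4;  // 4 * 32 = 128 slot bits.
  uint32_t alloc_bitmap[kWords];       // 1 = used or invalid, 0 = free.

  void Init(size_t num_slots) {
    std::memset(alloc_bitmap, 0, sizeof(alloc_bitmap));
    for (size_t i = num_slots; i < kWords * 32; ++i) {
      alloc_bitmap[i / 32] |= 1u << (i % 32);  // Invalid slots look used.
    }
  }

  // Returns the index of the first free slot, or -1 if the run is full.
  // An "invalid run" is one whose bitmap is all ones: every allocation
  // attempt fails and falls through to the slow path, which is how
  // revoked thread-local runs avoid a null check.
  int AllocSlot() {
    for (size_t w = 0; w < kWords; ++w) {
      uint32_t word = alloc_bitmap[w];
      if (word != UINT32_MAX) {
        int bit = __builtin_ctz(~word);  // First zero bit in this word.
        alloc_bitmap[w] |= 1u << bit;
        return static_cast<int>(w * 32 + bit);
      }
    }
    return -1;
  }
};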
Speeds up large object marking since large objects no longer require
a lock. Changed the GCs to use the heap bitmap for marking objects
which aren't in the fast path. This eliminates the need for a
MarkLargeObject function.
Maps before (10 GC iterations):
Mean partial time: 180ms
Mean sticky time: 151ms
Maps after:
Mean partial time: 161ms
Mean sticky time: 101ms
Note: the GC durations are long due to recent ergonomic changes and
because the fast bulk free hasn't yet been enabled. Over 50% of the
GC time is spent in RosAllocSpace::FreeList.
Bug: 13571028
Change-Id: Id8f94718aeaa13052672ccbae1e8edf77d653f62
GC time in FormulaEvaluationActions.EvaluateAndApplyChanges goes from
26.1s to 23.2s. Benchmark score goes down ~50 in
FormulaEvaluationActions.EvaluateAndApplyChanges, and up ~50 in
GenericCalcActions.MemAllocTest.
Bug: 8788501
Change-Id: I412af1205f8b67e70a12237c990231ea62167bc0
Required for:
- Using space bitmaps instead of std::set in the mod union table and
  remembered set.
- Using a bitmap instead of a set for large object marking.
Bug: 13571028
Change-Id: Id024e9563d4ca4278f79607cdb2f81895121b113
We now use the CMS collector instead of the semispace collector when
the phone is booting. We still perform compaction during the zygote
space creation. This reduces time spent in GC by ~2s during boot
and doesn't affect zygote space size.
Changed the space creation logic to create the temp space when a
background transition occurs.
Added a flag to each space which is true if you are allowed to
move objects that are within this space.
Removed the SwapSemiSpaces call from the semi space collector; it is
now the job of the caller to do this with threads suspended. This
simplifies the logic in the zygote compaction / heap transition code
since these do not copy from one semispace to another.
Added Space::Clear to RosAllocSpace and DlMallocSpace. This greatly
simplifies the code used for collector transitions.
Time spent in GC creating zygote space:
Before: 3.4s, After: 1.28s
No change in zygote space size.
Bug: 13878055
Change-Id: I700348ab7d5bf3aa537c0cd70c0fed09aa4b0623
The invalid root dumping now attempts to print the root type.
Change-Id: Ie821296d569f34909ba6e2705f5c347cd2143a3a
Pass in a pre-fence barrier object that sets the array length
instead of setting it after returning from AllocObject().
Fix another potential bug due to the wrong default pre-fence barrier
parameter value. Since this appears error-prone, removed the default
parameter value and made it an explicit parameter.
Fix another potential moving GC bug due to a lack of a SirtRef.
Bug: 13097759
Change-Id: I466aa0e50f9e1a5dbf20be5a195edee619c7514e
Reduced amount of code in mark sweep / semi space by moving
common logic to garbage_collector.cc. Cleaned up mod union tables
and deleted an unused implementation.
Change-Id: I4bcc6ba41afd96d230cfbaf4d6636f37c52e37ea
Bug: 12687968
Change-Id: Ic2a3a7b9943ca64e7f60f4d6ed552a316ea4a6f3
The old sticky ergonomics used a partial/full GC when the bytes until
the footprint limit were < min free. This was suboptimal. The new
sticky GC ergonomics do partial/full GC when the throughput
of the current sticky GC iteration is <= mean throughput of the
partial/full GC.
Total GC time on FormulaEvaluationActions.EvaluateAndApplyChanges.
Before: 26.4s
After: 24.8s
No benchmark score change measured.
Bug: 8788501
Change-Id: I90000305e93fd492a8ef5a06ec9620d830eaf90d
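
The new rule, sketched with hypothetical names: keep running cheap
sticky GCs while their throughput beats the historical mean throughput
of partial/full GCs, and escalate as soon as it doesn't.

enum class GcType { kSticky, kPartial };

GcType NextGcType(double sticky_throughput_bytes_per_s,
                  double mean_partial_throughput_bytes_per_s) {
  if (sticky_throughput_bytes_per_s <= mean_partial_throughput_bytes_per_s) {
    // The sticky GC is no longer freeing memory efficiently; a
    // partial/full GC will reclaim more per unit of GC time.
    return GcType::kPartial;
  }
  return GcType::kSticky;
}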
In the mark sweep collector, rosalloc thread-local buffers were
revoked during the pause. Now they are revoked at the thread
checkpoint instead, which appears to help reduce the pause time.
In Ritz MemAllocTest, the average sticky pause time went down ~20%
(925 us -> 724 us).
Bug: 13394464
Bug: 9986565
Change-Id: I104992a11b46d59264c0b9aa2db82b1ccf2826bc
Bug: 12687968
Change-Id: Ifc9ee86249f7938f51495ea1498cf0f7853a27e8
RecordFree can get negative bytes allocated when transitions from
background compaction to the foreground occur. This caused a DCHECK to
fail on debug builds. Also did some refactoring in
PreProcessReferences.
Bug: 13568814
Change-Id: I57543f1c78544a94f1d241459698b736dba8cfa8
Makes it easier to disable valgrind support.
Change-Id: I1bde792f1b76a2dd968fa03c6142e92fcc3670b0