| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new timing loggers have lower overhead since they only push into
a vector. The new format has two types, a start timing and a stop
timing. You can thing of these as brackets associated with a
timestamp. It uses these to construct various statistics when needed,
such as: Total time, exclusive time, and nesting depth.
Changed PrettyDuration to have a default of 3 digits after the decimal
point.
Exaple of a GC dump with exclusive / total times and indenting:
I/art (23546): GC iteration timing logger [Exclusive time] [Total time]
I/art (23546): 0ms InitializePhase
I/art (23546): 0.305ms/167.746ms MarkingPhase
I/art (23546): 0ms BindBitmaps
I/art (23546): 0ms FindDefaultSpaceBitmap
I/art (23546): 0ms/1.709ms ProcessCards
I/art (23546): 0.183ms ImageModUnionClearCards
I/art (23546): 0.916ms ZygoteModUnionClearCards
I/art (23546): 0.610ms AllocSpaceClearCards
I/art (23546): 1.373ms AllocSpaceClearCards
I/art (23546): 0.305ms/6.318ms MarkRoots
I/art (23546): 2.106ms MarkRootsCheckpoint
I/art (23546): 0.153ms MarkNonThreadRoots
I/art (23546): 4.287ms MarkConcurrentRoots
I/art (23546): 43.461ms UpdateAndMarkImageModUnionTable
I/art (23546): 0ms/112.712ms RecursiveMark
I/art (23546): 112.712ms ProcessMarkStack
I/art (23546): 0.610ms/2.777ms PreCleanCards
I/art (23546): 0.305ms/0.855ms ProcessCards
I/art (23546): 0.153ms ImageModUnionClearCards
I/art (23546): 0.610ms ZygoteModUnionClearCards
I/art (23546): 0.610ms AllocSpaceClearCards
I/art (23546): 0.549ms AllocSpaceClearCards
I/art (23546): 0.549ms MarkRootsCheckpoint
I/art (23546): 0.610ms MarkNonThreadRoots
I/art (23546): 0ms MarkConcurrentRoots
I/art (23546): 0.610ms ScanGrayImageSpaceObjects
I/art (23546): 0.305ms ScanGrayZygoteSpaceObjects
I/art (23546): 0.305ms ScanGrayAllocSpaceObjects
I/art (23546): 1.129ms ScanGrayAllocSpaceObjects
I/art (23546): 0ms ProcessMarkStack
I/art (23546): 0ms/0.977ms (Paused)PausePhase
I/art (23546): 0.244ms ReMarkRoots
I/art (23546): 0.672ms (Paused)ScanGrayObjects
I/art (23546): 0ms (Paused)ProcessMarkStack
I/art (23546): 0ms/0.610ms SwapStacks
I/art (23546): 0.610ms RevokeAllThreadLocalAllocationStacks
I/art (23546): 0ms PreSweepingGcVerification
I/art (23546): 0ms/10.621ms ReclaimPhase
I/art (23546): 0.610ms/0.702ms ProcessReferences
I/art (23546): 0.214ms/0.641ms EnqueueFinalizerReferences
I/art (23546): 0.427ms ProcessMarkStack
I/art (23546): 0.488ms SweepSystemWeaks
I/art (23546): 0.824ms/9.400ms Sweep
I/art (23546): 0ms SweepMallocSpace
I/art (23546): 0.214ms SweepZygoteSpace
I/art (23546): 0.122ms SweepMallocSpace
I/art (23546): 6.226ms SweepMallocSpace
I/art (23546): 0ms SweepMallocSpace
I/art (23546): 2.144ms SweepLargeObjects
I/art (23546): 0.305ms SwapBitmaps
I/art (23546): 0ms UnBindBitmaps
I/art (23546): 0.275ms FinishPhase
I/art (23546): GC iteration timing logger: end, 178.971ms
Change-Id: Ia55b65609468f212b3cd65cda66b843da42be645
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, each garbage collector had data that was only used
during collection. Since only one collector can be running at any
given time, we can make this data be shared between all collectors.
This reduces memory usage since we don't need to have redundant
information for each GC types. Also reduced how much code is required
to sweep spaces.
Bug: 9969166
Change-Id: I31caf0ee4d572f75e0c66863fe7db12c08ae08e7
|
|
|
|
|
|
|
|
|
|
|
| |
Moved the GC top level systrace logging to be inside of Collector::Run.
This prevents cases where we forgot to call it such as background
compaction. Fixed a unit error regarding total pause time. Fixed
negative bytes saved to use the word "expanded".
Bug: 15702709
Change-Id: Ic2991ecad2daa000d0aee9d559b8bc77d8c160aa
|
|
|
|
|
|
|
| |
RecordFree now calls the Heap::RecordFree as well as updates the
garbage collector's internal bytes freed accounting.
Change-Id: I8cb03748b0768e3c8c50ea709572960e6e4ad219
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactored the GarbageCollector to let all of the phases be run by
the collector's RunPhases virtual method. This lets the GC decide
which phases should be concurrent and reduces how much baked in GC
logic resides in GarbageCollector.
Enabled concurrent sweeping in the semi space and non concurrent
mark sweep GCs. Changed the semi-space collector to have a swap semi
spaces boolean which can be changed with a setter.
Fixed tests to pass with GSS collector, there was an error related to
the large object space limit.
Before (EvaluateAndApplyChanges):
GSS paused GC time 7.81s/7.81s, score: 3920
After (EvaluateAndApplyChanges):
GSS paused GC time 6.94s/7.71s, score: 3900
Benchmark score doesn't go up since the GC happens in the allocating
thread. There is a slight reduction in pause times experienced by
other threads (0.8s total).
Added options for pre sweeping GC heap verification and pre sweeping
rosalloc verification.
Bug: 14226004
Bug: 14250892
Bug: 14386356
Change-Id: Ib557d0590c1ed82a639d0f0281ba67cf8cae938c
|
|
|
|
|
|
|
|
| |
People who use DDMS want to see that a GC actually occurs when they
press GC button.
Bug: 14325353
Change-Id: I44e0450c92abf7223d33552ed37f626fe63e1c28
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Speeds up large object marking since large objects no longer required
a lock. Changed the GCs to use the heap bitmap for marking objects
which aren't in the fast path. This eliminates the need for a
MarkLargeObject function.
Maps before (10 GC iterations):
Mean partial time: 180ms
Mean sticky time: 151ms
Maps after:
Mean partial time: 161ms
Mean sticky time: 101ms
Note: the GC durations are long due to recent ergonomic changes and
because the fast bulk free hasn't yet been enabled. Over 50% of the
GC time is spent in RosAllocSpace::FreeList.
Bug: 13571028
Change-Id: Id8f94718aeaa13052672ccbae1e8edf77d653f62
|
|
|
|
|
|
|
|
|
|
|
| |
Required for:
Using space bitmaps instead of std::set in mod union table +
remembered set.
Using a bitmap instead of set for large object marking.
Bug: 13571028
Change-Id: Id024e9563d4ca4278f79607cdb2f81895121b113
|
|
|
|
|
|
|
|
| |
We now reset the GC timings when a SIGQUIT happens, this is useful
for excluding GCs which happen during the initialization of an app
when measuring GC performance.
Change-Id: I68c79bdb279290c12ae588bc7e95ac24908c157e
|
|
|
|
|
|
|
|
|
|
|
| |
We now deflate the monitors when we perform a heap trim. This causes
a pause but it shouldn't matter since we should be in a state where
we don't care about pauses. Memory savings are hard to measure.
Fixed integer overflow bug in GetEstimatedLastIterationThroughput.
Bug: 13733906
Change-Id: I4e0e68add02e7f43370b3a5ea763d6fe8a5b212c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This enables us to collect objects allocated during the GC for both
sticky, partial, and full GC. This also significantly simplifies GC
code. No measured performance impact on benchmarks, but this should
slightly increase sticky GC throughput.
Changed RevokeRosAllocThreadLocalBuffers to happen at most once per
GC. Previously it occured twice if pre-cleaning was enabled.
Renamed HandleDirtyObjectsPhase to PausePhase and enabled it for
non-concurrent GC. This helps reduce duplicated code which was in
both HandleDirtyObjectsPhase for concurrent GC and ReclaimPhase for
non-concurrent GC.
Change-Id: I533414b5c2cd2800f00724418e0ff90e7fdb0252
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| | |
Reduced amount of code in mark sweep / semi space by moving
common logic to garbage_collector.cc. Cleaned up mod union tables
and deleted an unused implementation.
Change-Id: I4bcc6ba41afd96d230cfbaf4d6636f37c52e37ea
|
|\ \
| |/
|/| |
|
| |
| |
| |
| |
| |
| | |
Bug: 12687968
Change-Id: Ic2a3a7b9943ca64e7f60f4d6ed552a316ea4a6f3
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old sticky ergonomics used partial/full GC when the bytes until
the footprint limit was < min free. This was suboptimal. The new
sticky GC ergonomics do partial/full GC when the throughput
of the current sticky GC iteration is <= mean throughput of the
partial/full GC.
Total GC time on FormulaEvaluationActions.EvaluateAndApplyChanges.
Before: 26.4s
After: 24.8s
No benchmark score change measured.
Bug: 8788501
Change-Id: I90000305e93fd492a8ef5a06ec9620d830eaf90d
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the mark sweep collector, rosalloc thread-local buffers were
revoked during the pause. Now, they are revoked at the thread
checkpoint, as opposed to during the pause, which appears to help
reduce the pause time.
In Ritz MemAllocTest, the average sticky pause time went down ~20%
(925 us -> 724 us).
Bug: 13394464
Bug: 9986565
Change-Id: I104992a11b46d59264c0b9aa2db82b1ccf2826bc
|
|
|
|
|
|
| |
Bug: 12687968
Change-Id: Ifc9ee86249f7938f51495ea1498cf0f7853a27e8
|
|
|
|
|
|
| |
This is part of the pause and should be accounted for.
Change-Id: I3165324de810e8fab02719098977402a18013da1
|
|
|
|
|
|
|
|
|
|
|
| |
If enabled, RosAlloc verification checks the allocator internal
metadata and invariants to detect bugs, heap corruptions, and race
conditions. Added runtime options for enabling and disabling
it. Enable it for the debug build.
Bug: 9986565
Bug: 12592026
Change-Id: I923742b87805ae839f1549d78d0d492733da6a58
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Moved the SwapBitmaps function to ContinuousMemMapAllocSpace since
the zygote space uses this function during full GC.
Fixed a place where we were casting a ZygoteSpace to a MallocSpace,
somehow this didn't cause any issues in non-debug builds.
Moved the CollectGarbage in PreZygoteFork before the lock to prevent
an occasional lock level violation caused by attempting to enqueue
java lang references with the a lock.
Bug: 12876255
Change-Id: I77439e46d5b26b37724bdcee3a0948410f1b0eb4
|
|
|
|
|
|
|
|
|
|
|
| |
- Turn the compile-time flags for generational mode into a command
line flag.
- In the generational mode, always collect the whole heap, as opposed
to the bump pointer space only, if a collection is an explicit,
native allocation-triggered or last attempt one.
Change-Id: I7a14a707cc47e6e3aa4a3292db62533409f17563
|
|
|
|
|
|
|
| |
Moved basic sweeping logic into large_object_space.cc.
Renamed SpaceSetMap -> ObjectSet.
Change-Id: I938c1f29f69b0682350347da2bd5de021c0e0224
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the process state changes to a state which does not perceives
jank, we copy from the main free-list backed allocation space to
the bump pointer space and enable the semispace allocator.
When we transition back to foreground, we copy back to a free-list
backed space.
Create a seperate non-moving space which only holds non-movable
objects. This enables us to quickly wipe the current alloc space
(DlMalloc / RosAlloc) when we transition to background.
Added multiple alloc space support to the sticky mark sweep GC.
Added a -XX:BackgroundGC option which lets you specify
which GC to use for background apps. Passing in
-XX:BackgroundGC=SS makes the heap compact the heap for apps which
do not perceive jank.
Results:
Simple background foreground test:
0. Reboot phone, unlock.
1. Open browser, click on home.
2. Open calculator, click on home.
3. Open calendar, click on home.
4. Open camera, click on home.
5. Open clock, click on home.
6. adb shell dumpsys meminfo
PSS Normal ART:
Sample 1:
88468 kB: Dalvik
3188 kB: Dalvik Other
Sample 2:
81125 kB: Dalvik
3080 kB: Dalvik Other
PSS Dalvik:
Total PSS by category:
Sample 1:
81033 kB: Dalvik
27787 kB: Dalvik Other
Sample 2:
81901 kB: Dalvik
28869 kB: Dalvik Other
PSS ART + Background Compaction:
Sample 1:
71014 kB: Dalvik
1412 kB: Dalvik Other
Sample 2:
73859 kB: Dalvik
1400 kB: Dalvik Other
Dalvik other reduction can be explained by less deep allocation
stacks / less live bitmaps / less dirty cards.
TODO improvements: Recycle mem-maps which are unused in the current
state. Not hardcode 64 MB capacity of non movable space (avoid
returning linear alloc nightmares). Figure out ways to deal with low
virtual address memory problems.
Bug: 8981901
Change-Id: Ib235d03f45548ffc08a06b8ae57bf5bada49d6f3
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added a thread local allocator to the heap, each thread has three
pointers which specify the thread local buffer: start, cur, and
end. When the remaining space in the thread local buffer isn't large
enough for the allocation, the allocator allocates a new thread
local buffer using the bump pointer allocator.
The bump pointer space had to be modified to accomodate thread
local buffers. These buffers are called "blocks", where a block
is a buffer which contains a set of adjacent objects. Blocks
aren't necessarily full and may have wasted memory towards the
end. Blocks have an 8 byte header which specifies their size and is
required for traversing bump pointer spaces.
Memory usage is in between full bump pointer and ROSAlloc since
madvised memory limits wasted ram to an average of 1/2 page per
block.
Added a runtime option -XX:UseTLAB which specifies whether or
not to use the thread local allocator. Its a NOP if the garbage
collector is not the semispace collector.
TODO: Smarter block accounting to prevent us reading objects until
we either hit the end of the block or GetClass() == null which
signifies that the block isn't 100% full. This would provide a
slight speedup to BumpPointerSpace::Walk.
Timings: -XX:HeapMinFree=4m -XX:HeapMaxFree=8m -Xmx48m
ritzperf memalloc:
Dalvik -Xgc:concurrent: 11678
Dalvik -Xgc:noconcurrent: 6697
-Xgc:MS: 5978
-Xgc:SS: 4271
-Xgc:CMS: 4150
-Xgc:SS -XX:UseTLAB: 3255
Bug: 9986565
Bug: 12042213
Change-Id: Ib7e1d4b199a8199f3b1de94b0a7b6e1730689cad
|
|
|
|
|
|
|
| |
Printed when you dump the GC performance info.
Bug: 10855285
Change-Id: I3bf7f958305f97c52cb31c03bdd6218c321575b9
|
|
|
|
|
| |
Bug: 9986565
Change-Id: I0eb73b9458752113f519483616536d219d5f798b
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The compacting collector is currently similar to semispace. It works by
copying objects back and forth between two bump pointer spaces. There
are types of objects which are "non-movable" due to current runtime
limitations. These are Classes, Methods, and Fields.
Bump pointer spaces are a new type of continuous alloc space which have
no lock in the allocation code path. When you allocate from these it uses
atomic operations to increase an index. Traversing the objects in the bump
pointer space relies on Object::SizeOf matching the allocated size exactly.
Runtime changes:
JNI::GetArrayElements returns copies objects if you attempt to get the
backing data of a movable array. For GetArrayElementsCritical, we return
direct backing storage for any types of arrays, but temporarily disable
the GC until the critical region is completed.
Added a new runtime call called VisitObjects, this is used in place of
the old pattern which was flushing the allocation stack and walking
the bitmaps.
Changed image writer to be compaction safe and use object monitor word
for forwarding addresses.
Added a bunch of added SIRTs to ClassLinker, MethodLinker, etc..
TODO: Enable switching allocators, compacting on background, etc..
Bug: 8981901
Change-Id: I3c886fd322a6eef2b99388d19a765042ec26ab99
|
|
|
|
|
|
| |
build in master
Change-Id: I56a4dd18ec1c212f9dbb73b14c0c0623b23c87bd
|
|
|
|
|
|
|
|
|
|
| |
There was some confusing systrace messages which made it seem like
pauses were longer than they actually were due to premption occuring
during thread_list->ResumeAll().
Bug: 10612142
Change-Id: I6eeedd1cf85ff38c5b116f15059469db52cbb73b
|
|
|
|
|
|
|
|
|
|
| |
If we dont have concurrent GC enabled, we need to force GC for alloc
when we hit the maximum allowed footprint so that our heap doesn't
keep growing until it hits the growth limit.
Refactored a bit of stuff.
Change-Id: I8eceac4ef01e969fd286ebde3a735a09d0a6dfc1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Card scanning may now be done in parallel. This speeds up sticky and
reduces pause times for all GC types.
Speedup on my mako (ritz perf):
Average pause time for sticky GC (~250 samples):
Without parallel cards scanning enabled: 2.524904215ms
Parallel card scanning (num_gc_threads_): 1.552123552ms
Throughput (~250 samples):
Sticky GC throughput with parallel card scanning: 69MB/s
Sticky GC throughput without parallel card scanning: 51MB/s
Rewrote the mark stack processing to be LIFO and use a prefetch queue
like the non parallel version.
Cleaned up some of the logcat printing for the activity manager
process state listening.
Added unlikely hints to object scanning since arrays and classes are
scanned much less often than normal objects.
Fixed a bug where the number of GC threads was clamped to 1 due to a
bool instead of a size_t.
Fixed a race condition when we added references to the reference
queues. Sharded the reference queue lock into one lock for each reference
type (weak, soft, phatom, finalizer).
Changed timing splits to be different for processing gray objects with
and without mutators paused since sticky GC does both.
Mask out the class bit when visiting fields as an optimization, this is
valid since classes are held live by the class linker.
Partially completed: Parallel recursive mark + finger.
Bug: 10245302
Bug: 9969166
Bug: 9986532
Bug: 9961698
Change-Id: I142d09718c4609b7c2387cb28f517a6983c73288
|
|
|
|
|
|
|
|
|
|
|
|
| |
Needed to update cpplint to handle const auto.
Fixed a few cpplint errors that were being missed before.
Replaced most of the TODO c++0x with ranged based loops. Loops which
do not have a descriptive container name have a concrete type instead
of auto.
Change-Id: Id7cc0f27030f56057c544e94277300b3f298c9c5
|
|
The runtime, compiler, dex2oat, and oatdump now are in seperate trees
to prevent dependency creep. They can now be individually built
without rebuilding the rest of the art projects. dalvikvm and jdwpspy
were already this way. Builds in the art directory should behave as
before, building everything including tests.
Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81
|