summaryrefslogtreecommitdiffstats
path: root/runtime/gc/collector/garbage_collector.cc
Commit message (Collapse)AuthorAgeFilesLines
* More advanced timing loggers.Mathieu Chartier2014-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new timing loggers have lower overhead since they only push into a vector. The new format has two types, a start timing and a stop timing. You can thing of these as brackets associated with a timestamp. It uses these to construct various statistics when needed, such as: Total time, exclusive time, and nesting depth. Changed PrettyDuration to have a default of 3 digits after the decimal point. Exaple of a GC dump with exclusive / total times and indenting: I/art (23546): GC iteration timing logger [Exclusive time] [Total time] I/art (23546): 0ms InitializePhase I/art (23546): 0.305ms/167.746ms MarkingPhase I/art (23546): 0ms BindBitmaps I/art (23546): 0ms FindDefaultSpaceBitmap I/art (23546): 0ms/1.709ms ProcessCards I/art (23546): 0.183ms ImageModUnionClearCards I/art (23546): 0.916ms ZygoteModUnionClearCards I/art (23546): 0.610ms AllocSpaceClearCards I/art (23546): 1.373ms AllocSpaceClearCards I/art (23546): 0.305ms/6.318ms MarkRoots I/art (23546): 2.106ms MarkRootsCheckpoint I/art (23546): 0.153ms MarkNonThreadRoots I/art (23546): 4.287ms MarkConcurrentRoots I/art (23546): 43.461ms UpdateAndMarkImageModUnionTable I/art (23546): 0ms/112.712ms RecursiveMark I/art (23546): 112.712ms ProcessMarkStack I/art (23546): 0.610ms/2.777ms PreCleanCards I/art (23546): 0.305ms/0.855ms ProcessCards I/art (23546): 0.153ms ImageModUnionClearCards I/art (23546): 0.610ms ZygoteModUnionClearCards I/art (23546): 0.610ms AllocSpaceClearCards I/art (23546): 0.549ms AllocSpaceClearCards I/art (23546): 0.549ms MarkRootsCheckpoint I/art (23546): 0.610ms MarkNonThreadRoots I/art (23546): 0ms MarkConcurrentRoots I/art (23546): 0.610ms ScanGrayImageSpaceObjects I/art (23546): 0.305ms ScanGrayZygoteSpaceObjects I/art (23546): 0.305ms ScanGrayAllocSpaceObjects I/art (23546): 1.129ms ScanGrayAllocSpaceObjects I/art (23546): 0ms ProcessMarkStack I/art (23546): 0ms/0.977ms (Paused)PausePhase I/art (23546): 0.244ms ReMarkRoots I/art (23546): 0.672ms (Paused)ScanGrayObjects I/art (23546): 0ms (Paused)ProcessMarkStack I/art (23546): 0ms/0.610ms SwapStacks I/art (23546): 0.610ms RevokeAllThreadLocalAllocationStacks I/art (23546): 0ms PreSweepingGcVerification I/art (23546): 0ms/10.621ms ReclaimPhase I/art (23546): 0.610ms/0.702ms ProcessReferences I/art (23546): 0.214ms/0.641ms EnqueueFinalizerReferences I/art (23546): 0.427ms ProcessMarkStack I/art (23546): 0.488ms SweepSystemWeaks I/art (23546): 0.824ms/9.400ms Sweep I/art (23546): 0ms SweepMallocSpace I/art (23546): 0.214ms SweepZygoteSpace I/art (23546): 0.122ms SweepMallocSpace I/art (23546): 6.226ms SweepMallocSpace I/art (23546): 0ms SweepMallocSpace I/art (23546): 2.144ms SweepLargeObjects I/art (23546): 0.305ms SwapBitmaps I/art (23546): 0ms UnBindBitmaps I/art (23546): 0.275ms FinishPhase I/art (23546): GC iteration timing logger: end, 178.971ms Change-Id: Ia55b65609468f212b3cd65cda66b843da42be645
* Shared single GC iteration accounting for all GCs.Mathieu Chartier2014-06-201-40/+50
| | | | | | | | | | | | Previously, each garbage collector had data that was only used during collection. Since only one collector can be running at any given time, we can make this data be shared between all collectors. This reduces memory usage since we don't need to have redundant information for each GC types. Also reduced how much code is required to sweep spaces. Bug: 9969166 Change-Id: I31caf0ee4d572f75e0c66863fe7db12c08ae08e7
* Fix systrace logging, total paused time, and bytes saved message.Mathieu Chartier2014-06-171-0/+2
| | | | | | | | | | | Moved the GC top level systrace logging to be inside of Collector::Run. This prevents cases where we forgot to call it such as background compaction. Fixed a unit error regarding total pause time. Fixed negative bytes saved to use the word "expanded". Bug: 15702709 Change-Id: Ic2991ecad2daa000d0aee9d559b8bc77d8c160aa
* Add RecordFree to the GarbageCollector interfaceMathieu Chartier2014-05-051-0/+12
| | | | | | | RecordFree now calls the Heap::RecordFree as well as updates the garbage collector's internal bytes freed accounting. Change-Id: I8cb03748b0768e3c8c50ea709572960e6e4ad219
* Enable concurrent sweeping for non-concurrent GC.Mathieu Chartier2014-04-291-84/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactored the GarbageCollector to let all of the phases be run by the collector's RunPhases virtual method. This lets the GC decide which phases should be concurrent and reduces how much baked in GC logic resides in GarbageCollector. Enabled concurrent sweeping in the semi space and non concurrent mark sweep GCs. Changed the semi-space collector to have a swap semi spaces boolean which can be changed with a setter. Fixed tests to pass with GSS collector, there was an error related to the large object space limit. Before (EvaluateAndApplyChanges): GSS paused GC time 7.81s/7.81s, score: 3920 After (EvaluateAndApplyChanges): GSS paused GC time 6.94s/7.71s, score: 3900 Benchmark score doesn't go up since the GC happens in the allocating thread. There is a slight reduction in pause times experienced by other threads (0.8s total). Added options for pre sweeping GC heap verification and pre sweeping rosalloc verification. Bug: 14226004 Bug: 14250892 Bug: 14386356 Change-Id: Ib557d0590c1ed82a639d0f0281ba67cf8cae938c
* Always log explicit GC.Mathieu Chartier2014-04-281-0/+1
| | | | | | | | People who use DDMS want to see that a GC actually occurs when they press GC button. Bug: 14325353 Change-Id: I44e0450c92abf7223d33552ed37f626fe63e1c28
* Replace ObjectSet with LargeObjectBitmap.Mathieu Chartier2014-04-171-6/+6
| | | | | | | | | | | | | | | | | | | | | | | Speeds up large object marking since large objects no longer required a lock. Changed the GCs to use the heap bitmap for marking objects which aren't in the fast path. This eliminates the need for a MarkLargeObject function. Maps before (10 GC iterations): Mean partial time: 180ms Mean sticky time: 151ms Maps after: Mean partial time: 161ms Mean sticky time: 101ms Note: the GC durations are long due to recent ergonomic changes and because the fast bulk free hasn't yet been enabled. Over 50% of the GC time is spent in RosAllocSpace::FreeList. Bug: 13571028 Change-Id: Id8f94718aeaa13052672ccbae1e8edf77d653f62
* Refactor space bitmap to support different alignments.Mathieu Chartier2014-04-141-2/+2
| | | | | | | | | | | Required for: Using space bitmaps instead of std::set in mod union table + remembered set. Using a bitmap instead of set for large object marking. Bug: 13571028 Change-Id: Id024e9563d4ca4278f79607cdb2f81895121b113
* Reset GC timings after SIGQUIT.Mathieu Chartier2014-04-081-0/+8
| | | | | | | | We now reset the GC timings when a SIGQUIT happens, this is useful for excluding GCs which happen during the initialization of an app when measuring GC performance. Change-Id: I68c79bdb279290c12ae588bc7e95ac24908c157e
* Add monitor deflation.Mathieu Chartier2014-04-071-1/+1
| | | | | | | | | | | We now deflate the monitors when we perform a heap trim. This causes a pause but it shouldn't matter since we should be in a state where we don't care about pauses. Memory savings are hard to measure. Fixed integer overflow bug in GetEstimatedLastIterationThroughput. Bug: 13733906 Change-Id: I4e0e68add02e7f43370b3a5ea763d6fe8a5b212c
* Swap allocation stacks in pause.Mathieu Chartier2014-03-281-3/+4
| | | | | | | | | | | | | | | | | This enables us to collect objects allocated during the GC for both sticky, partial, and full GC. This also significantly simplifies GC code. No measured performance impact on benchmarks, but this should slightly increase sticky GC throughput. Changed RevokeRosAllocThreadLocalBuffers to happen at most once per GC. Previously it occured twice if pre-cleaning was enabled. Renamed HandleDirtyObjectsPhase to PausePhase and enabled it for non-concurrent GC. This helps reduce duplicated code which was in both HandleDirtyObjectsPhase for concurrent GC and ReclaimPhase for non-concurrent GC. Change-Id: I533414b5c2cd2800f00724418e0ff90e7fdb0252
* Merge "Refactor some GC code."Mathieu Chartier2014-03-281-3/+7
|\
| * Refactor some GC code.Mathieu Chartier2014-03-281-3/+7
| | | | | | | | | | | | | | | | Reduced amount of code in mark sweep / semi space by moving common logic to garbage_collector.cc. Cleaned up mod union tables and deleted an unused implementation. Change-Id: I4bcc6ba41afd96d230cfbaf4d6636f37c52e37ea
* | Merge "An empty collector skeleton for a read barrier-based collector."Hiroshi Yamauchi2014-03-281-0/+4
|\ \ | |/ |/|
| * An empty collector skeleton for a read barrier-based collector.Hiroshi Yamauchi2014-03-271-0/+4
| | | | | | | | | | | | Bug: 12687968 Change-Id: Ic2a3a7b9943ca64e7f60f4d6ed552a316ea4a6f3
* | Change sticky GC ergonomics to use GC throughput.Mathieu Chartier2014-03-271-2/+11
|/ | | | | | | | | | | | | | | | | The old sticky ergonomics used partial/full GC when the bytes until the footprint limit was < min free. This was suboptimal. The new sticky GC ergonomics do partial/full GC when the throughput of the current sticky GC iteration is <= mean throughput of the partial/full GC. Total GC time on FormulaEvaluationActions.EvaluateAndApplyChanges. Before: 26.4s After: 24.8s No benchmark score change measured. Bug: 8788501 Change-Id: I90000305e93fd492a8ef5a06ec9620d830eaf90d
* Revoke rosalloc thread-local buffers at the checkpoint.Hiroshi Yamauchi2014-03-211-6/+0
| | | | | | | | | | | | | | In the mark sweep collector, rosalloc thread-local buffers were revoked during the pause. Now, they are revoked at the thread checkpoint, as opposed to during the pause, which appears to help reduce the pause time. In Ritz MemAllocTest, the average sticky pause time went down ~20% (925 us -> 724 us). Bug: 13394464 Bug: 9986565 Change-Id: I104992a11b46d59264c0b9aa2db82b1ccf2826bc
* Refactor the garbage collector driver (GarbageCollector::Run).Hiroshi Yamauchi2014-03-201-44/+55
| | | | | | Bug: 12687968 Change-Id: Ifc9ee86249f7938f51495ea1498cf0f7853a27e8
* Add timing split for RevokeAllThreadLocalBuffers.Mathieu Chartier2014-03-101-3/+9
| | | | | | This is part of the pause and should be accounted for. Change-Id: I3165324de810e8fab02719098977402a18013da1
* RosAlloc verification.Hiroshi Yamauchi2014-02-071-0/+6
| | | | | | | | | | | If enabled, RosAlloc verification checks the allocator internal metadata and invariants to detect bugs, heap corruptions, and race conditions. Added runtime options for enabling and disabling it. Enable it for the debug build. Bug: 9986565 Bug: 12592026 Change-Id: I923742b87805ae839f1549d78d0d492733da6a58
* Move SwapBitmaps to ContinuousMemMapAllocSpace.Mathieu Chartier2014-02-031-2/+3
| | | | | | | | | | | | | | | | Moved the SwapBitmaps function to ContinuousMemMapAllocSpace since the zygote space uses this function during full GC. Fixed a place where we were casting a ZygoteSpace to a MallocSpace, somehow this didn't cause any issues in non-debug builds. Moved the CollectGarbage in PreZygoteFork before the lock to prevent an occasional lock level violation caused by attempting to enqueue java lang references with the a lock. Bug: 12876255 Change-Id: I77439e46d5b26b37724bdcee3a0948410f1b0eb4
* Improve the generational mode.Hiroshi Yamauchi2014-01-141-1/+3
| | | | | | | | | | | - Turn the compile-time flags for generational mode into a command line flag. - In the generational mode, always collect the whole heap, as opposed to the bump pointer space only, if a collection is an explicit, native allocation-triggered or last attempt one. Change-Id: I7a14a707cc47e6e3aa4a3292db62533409f17563
* Refactor large object sweeping.Mathieu Chartier2014-01-101-2/+2
| | | | | | | Moved basic sweeping logic into large_object_space.cc. Renamed SpaceSetMap -> ObjectSet. Change-Id: I938c1f29f69b0682350347da2bd5de021c0e0224
* Background compaction support.Mathieu Chartier2014-01-081-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the process state changes to a state which does not perceives jank, we copy from the main free-list backed allocation space to the bump pointer space and enable the semispace allocator. When we transition back to foreground, we copy back to a free-list backed space. Create a seperate non-moving space which only holds non-movable objects. This enables us to quickly wipe the current alloc space (DlMalloc / RosAlloc) when we transition to background. Added multiple alloc space support to the sticky mark sweep GC. Added a -XX:BackgroundGC option which lets you specify which GC to use for background apps. Passing in -XX:BackgroundGC=SS makes the heap compact the heap for apps which do not perceive jank. Results: Simple background foreground test: 0. Reboot phone, unlock. 1. Open browser, click on home. 2. Open calculator, click on home. 3. Open calendar, click on home. 4. Open camera, click on home. 5. Open clock, click on home. 6. adb shell dumpsys meminfo PSS Normal ART: Sample 1: 88468 kB: Dalvik 3188 kB: Dalvik Other Sample 2: 81125 kB: Dalvik 3080 kB: Dalvik Other PSS Dalvik: Total PSS by category: Sample 1: 81033 kB: Dalvik 27787 kB: Dalvik Other Sample 2: 81901 kB: Dalvik 28869 kB: Dalvik Other PSS ART + Background Compaction: Sample 1: 71014 kB: Dalvik 1412 kB: Dalvik Other Sample 2: 73859 kB: Dalvik 1400 kB: Dalvik Other Dalvik other reduction can be explained by less deep allocation stacks / less live bitmaps / less dirty cards. TODO improvements: Recycle mem-maps which are unused in the current state. Not hardcode 64 MB capacity of non movable space (avoid returning linear alloc nightmares). Figure out ways to deal with low virtual address memory problems. Bug: 8981901 Change-Id: Ib235d03f45548ffc08a06b8ae57bf5bada49d6f3
* Thread local bump pointer allocator.Mathieu Chartier2013-12-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added a thread local allocator to the heap, each thread has three pointers which specify the thread local buffer: start, cur, and end. When the remaining space in the thread local buffer isn't large enough for the allocation, the allocator allocates a new thread local buffer using the bump pointer allocator. The bump pointer space had to be modified to accomodate thread local buffers. These buffers are called "blocks", where a block is a buffer which contains a set of adjacent objects. Blocks aren't necessarily full and may have wasted memory towards the end. Blocks have an 8 byte header which specifies their size and is required for traversing bump pointer spaces. Memory usage is in between full bump pointer and ROSAlloc since madvised memory limits wasted ram to an average of 1/2 page per block. Added a runtime option -XX:UseTLAB which specifies whether or not to use the thread local allocator. Its a NOP if the garbage collector is not the semispace collector. TODO: Smarter block accounting to prevent us reading objects until we either hit the end of the block or GetClass() == null which signifies that the block isn't 100% full. This would provide a slight speedup to BumpPointerSpace::Walk. Timings: -XX:HeapMinFree=4m -XX:HeapMaxFree=8m -Xmx48m ritzperf memalloc: Dalvik -Xgc:concurrent: 11678 Dalvik -Xgc:noconcurrent: 6697 -Xgc:MS: 5978 -Xgc:SS: 4271 -Xgc:CMS: 4150 -Xgc:SS -XX:UseTLAB: 3255 Bug: 9986565 Bug: 12042213 Change-Id: Ib7e1d4b199a8199f3b1de94b0a7b6e1730689cad
* Add histogram for GC pause times.Mathieu Chartier2013-11-211-7/+10
| | | | | | | Printed when you dump the GC performance info. Bug: 10855285 Change-Id: I3bf7f958305f97c52cb31c03bdd6218c321575b9
* A custom 'runs-of-slots' memory allocator.Hiroshi Yamauchi2013-11-161-1/+5
| | | | | Bug: 9986565 Change-Id: I0eb73b9458752113f519483616536d219d5f798b
* Compacting collector.Mathieu Chartier2013-11-111-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The compacting collector is currently similar to semispace. It works by copying objects back and forth between two bump pointer spaces. There are types of objects which are "non-movable" due to current runtime limitations. These are Classes, Methods, and Fields. Bump pointer spaces are a new type of continuous alloc space which have no lock in the allocation code path. When you allocate from these it uses atomic operations to increase an index. Traversing the objects in the bump pointer space relies on Object::SizeOf matching the allocated size exactly. Runtime changes: JNI::GetArrayElements returns copies objects if you attempt to get the backing data of a movable array. For GetArrayElementsCritical, we return direct backing storage for any types of arrays, but temporarily disable the GC until the critical region is completed. Added a new runtime call called VisitObjects, this is used in place of the old pattern which was flushing the allocation stack and walking the bitmaps. Changed image writer to be compaction safe and use object monitor word for forwarding addresses. Added a bunch of added SIRTs to ClassLinker, MethodLinker, etc.. TODO: Enable switching allocators, compacting on background, etc.. Bug: 8981901 Change-Id: I3c886fd322a6eef2b99388d19a765042ec26ab99
* Change thread.h to thread-inl.h to pick up missing Thread::Currnet for debug ↵Brian Carlstrom2013-11-061-1/+1
| | | | | | build in master Change-Id: I56a4dd18ec1c212f9dbb73b14c0c0623b23c87bd
* Add more systrace logging to GC.Mathieu Chartier2013-09-051-2/+6
| | | | | | | | | | There was some confusing systrace messages which made it seem like pauses were longer than they actually were due to premption occuring during thread_list->ResumeAll(). Bug: 10612142 Change-Id: I6eeedd1cf85ff38c5b116f15059469db52cbb73b
* Fix non concurrent GC ergonomics.Mathieu Chartier2013-08-191-1/+1
| | | | | | | | | | If we dont have concurrent GC enabled, we need to force GC for alloc when we hit the maximum allowed footprint so that our heap doesn't keep growing until it hits the growth limit. Refactored a bit of stuff. Change-Id: I8eceac4ef01e969fd286ebde3a735a09d0a6dfc1
* More parallel GC, rewritten parallel mark stack processing.Mathieu Chartier2013-08-161-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Card scanning may now be done in parallel. This speeds up sticky and reduces pause times for all GC types. Speedup on my mako (ritz perf): Average pause time for sticky GC (~250 samples): Without parallel cards scanning enabled: 2.524904215ms Parallel card scanning (num_gc_threads_): 1.552123552ms Throughput (~250 samples): Sticky GC throughput with parallel card scanning: 69MB/s Sticky GC throughput without parallel card scanning: 51MB/s Rewrote the mark stack processing to be LIFO and use a prefetch queue like the non parallel version. Cleaned up some of the logcat printing for the activity manager process state listening. Added unlikely hints to object scanning since arrays and classes are scanned much less often than normal objects. Fixed a bug where the number of GC threads was clamped to 1 due to a bool instead of a size_t. Fixed a race condition when we added references to the reference queues. Sharded the reference queue lock into one lock for each reference type (weak, soft, phatom, finalizer). Changed timing splits to be different for processing gray objects with and without mutators paused since sticky GC does both. Mask out the class bit when visiting fields as an optimization, this is valid since classes are held live by the class linker. Partially completed: Parallel recursive mark + finger. Bug: 10245302 Bug: 9969166 Bug: 9986532 Bug: 9961698 Change-Id: I142d09718c4609b7c2387cb28f517a6983c73288
* Fix up TODO: c++0x, update cpplint.Mathieu Chartier2013-08-161-10/+3
| | | | | | | | | | | | Needed to update cpplint to handle const auto. Fixed a few cpplint errors that were being missed before. Replaced most of the TODO c++0x with ranged based loops. Loops which do not have a descriptive container name have a concrete type instead of auto. Change-Id: Id7cc0f27030f56057c544e94277300b3f298c9c5
* Create separate Android.mk for main build targetsBrian Carlstrom2013-07-121-0/+150
The runtime, compiler, dex2oat, and oatdump now are in seperate trees to prevent dependency creep. They can now be individually built without rebuilding the rest of the art projects. dalvikvm and jdwpspy were already this way. Builds in the art directory should behave as before, building everything including tests. Change-Id: Ic6b1151e5ed0f823c3dd301afd2b13eb2d8feb81