| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
This patch fixes the implementation of the x86 vectorization opcodes.
Change-Id: I0028d54a9fa6edce791b7e3a053002d076798748
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
Signed-off-by: Philbert Lin <philbert.lin@intel.com>
|
|
|
|
|
|
| |
Change-Id: I0df8ece13deadf247a425beac0c08b2be5d773f9
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Added non-temporal store support as a hint from the ME.
Added the implementation of the memory barrier
extended instruction that supports non-temporal stores
by explicitly serializing all previous store-to-memory instructions.
Change-Id: I8205a92083f9725253d8ce893671a133a0b6849d
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The semantics of kMirOpNullCheck is to check object for null and
throw exception in that case. However, the implementation for it
is empty. This has been changed and appropriate dataflow have been
added to correctly reflect behavior.
In order to allow testing of implementation, the SpecialMethodInliner
has been updated to get rid of invoke and use this instead. This helps
all optimizations which do not check the MIR_INLINED flag because
when invoke is left in, they believe that invoke will still be done.
Change-Id: I62e425e42bdbc6357246fb949db5f79de73cf358
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
|
|
|
|
|
|
|
| |
Since the branch offset supported by tbz/tbnz is quite small(-32k ~ +32k),
it will be replaced by tst and beq/bneq in the fix-up stage if the branch
offset is too large.
Change-Id: I4cace06bec6425e0f2e1f5f7c471eec08d06bca6
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactors the implementation of the LoadStoreElimination
optimisation pass. Please note that this pass was disabled and not
functional for any of the backends.
The current implementation tracks aliases and handles DalvikRegs as well
as Heap memory regions. It has been tested and it is known to optimise
out the following:
* Load - Load
* Store - Load
* Store - Store
* Load Literals
Change-Id: I3aadb12a787164146a95bc314e85fa73ad91e12b
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To reduce the complexity of calling trampolines in generic code,
introduce an enumeration for entrypoints. Introduce a header that lists
the entrypoint enum and exposes a templatized method that translates an
enum value to the corresponding thread offset value.
Call helpers are rewritten to have an enum parameter instead of the
thread offset. Also rewrite LoadHelper and GenConversionCall this way.
It is now LoadHelper's duty to select the right thread offset size.
Introduce InvokeTrampoline virtual method to Mir2Lir. This allows to
further simplify the call helpers, as well as make OpThreadMem specific
to X86 only (removed from Mir2Lir).
Make GenInlinedCharAt virtual, move a copy to X86 backend, and simplify
both copies. Remove LoadBaseIndexedDisp and OpRegMem from Mir2Lir, as they
are now specific to X86 only.
Remove StoreBaseIndexedDisp from Mir2Lir, as it was only ever used in the
X86 backend.
Remove OpTlsCmp from Mir2Lir, as it was only ever used in the X86 backend.
Remove OpLea from Mir2Lir, as it was only ever defined in the X86 backend.
Remove GenImmedCheck from Mir2Lir as it was neither used nor implemented.
Change-Id: If0a6182288c5d57653e3979bf547840a4c47626e
|
|
|
|
|
|
|
|
| |
On extended testing, I'm seeing a CHECK failure at utility_arm.cc:1201.
This reverts commit fcc36ba2a2b8fd10e6eebd21ecb6329606443ded.
Change-Id: Icae3d49cd7c8fcab09f2f989cbcb1d7e5c6d137a
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactors the implementation of the LoadStoreElimination
optimisation pass. Please note that this pass was disabled and not
functional for any of the backends.
The current implementation tracks aliases and handles DalvikRegs as well
as Heap memory regions. It has been tested and it is known to optimise
out the following:
* Load - Load
* Store - Load
* Store - Store
* Load Literals
Change-Id: Iefae9b696f87f833ef35c451ed4d49c5a1b6fde0
|
|
|
|
|
|
|
|
|
| |
Make the standard implementation in Mir2Lir and the specialized one
in the x86 backend return a pair when wide = "true". Introduce
WideKind enumeration to improve code readability. Simplify generic
code based on this implementation.
Change-Id: I670d45aa2572eedfdc77ac763e6486c83f8e26b4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replaces barriers that enforce ordering of one access type
(e.g. Load) with respect to another (e.g. store) with more general
ones that better reflect both Java requirements and actual hardware
barrier/fence instructions. The old code was inconsistent and
unclear about which barriers implied which others. Sometimes
multiple barriers were generated and then eliminated;
sometimes it was assumed that certain barriers implied others.
The new barriers closely parallel those in C++11, though, for now,
we use something closer to the old naming.
Bug: 14685856
Change-Id: Ie1c80afe3470057fc6f2b693a9831dfe83add831
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for reserving vector registers for the duration of vector loop.
Add support for 16x16 multiplication, shifts, and add reduce.
Changed the vectorization implementation to be able to use the dataflow
elements for SSA recreation and fixed a few implementation details.
Change-Id: I2f358f05f574fc4ab299d9497517b9906f234b98
Signed-off-by: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
Signed-off-by: Olivier Come <olivier.come@intel.com>
Signed-off-by: Udayan Banerji <udayan.banerji@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
parameter""
This reverts commit de68676b24f61a55adc0b22fe828f036a5925c41.
Fixes an API comment, and differentiates between inserting and appending.
Change-Id: I0e9a21bb1d25766e3cbd802d8b48633ae251a6bf
|
|
|
|
|
|
|
|
| |
This reverts commit 2689fbad6b5ec1ae8f8c8791a80c6fd3cf24144d.
Breaks the build.
Change-Id: I9faad4e9a83b32f5f38b2ef95d6f9a33345efa33
|
|
|
|
|
|
|
|
|
|
| |
Splits out more cases of ref registers being loaded or stored. For
code clarity, adds volatile as a flag parameter instead of a separate
method.
On ARM64, continue cleanup. Add flags to print/fatal on size mismatches.
Change-Id: I30ed88433a6b4ff5399aefffe44c14a5e6f4ca4e
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Basic enabling of hard-float for Arm64. In future CLs we'll
consolidate the various targets - there is a lot of overlap.
Compilation remains turned off in this CL, but I expect
to enable a subset shortly. With compilation fully enabled
(including the EXPERIMENTAL opcodes with the exception of
REM and THROW), we get the following run-test results:
003-omnibus-opcode failures:
Classes.checkCast
Classes.arrayInstance
UnresTest2
Haven't gone deep, but these appear to be related to throw/catch and/or
stacktrace.
For REM, the generated code looks reasonable to me - my guess is that
we've got something wrong on the transition to the runtime. Haven't
looked deeper yet, though.
The bulk of the other failure also appear to be related to transitioning
to the runtime system, or handling try/catch.
run-test status:
Status with optimizations disabled, REM_FLOAT/DOUBLE and THROW disabled:
succeeded tests: 94
failed tests: 22
failed: 003-omnibus-opcodes
failed: 004-annotations
failed: 009-instanceof2
failed: 024-illegal-access
failed: 025-access-controller
failed: 031-class-attributes
failed: 044-proxy
failed: 045-reflect-array
failed: 046-reflect
failed: 058-enum-order
failed: 062-character-encodings
failed: 063-process-manager
failed: 064-field-access
failed: 068-classloader
failed: 071-dexfile
failed: 083-compiler-regressions
failed: 084-class-init
failed: 086-null-super
failed: 087-gc-after-link
failed: 100-reflect2
failed: 107-int-math2
failed: 201-built-in-exception-detail-messages
Change-Id: Ib66209285cad8998d77a14781de300af02a96b15
|
|
|
|
|
|
|
|
| |
Reduce LIR memory usage by holding masks by pointers in the
LIR rather than directly and using pre-defined const masks
for the common cases, allocating very few on the arena.
Change-Id: I0f6d27ef6867acd157184c8c74f9612cebfe6c16
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch shows our efforts on resolving the ART limitations:
- passing "float"/"double" arguments via FPR
- passing "long" arguments via single GPR, not pair
- passing more than 3 agruments via GPR.
Work done:
- Extended SpecialTargetRegister enum with kARG4, kARG5, fARG4..fARG7.
- Created initial LoadArgRegs/GenDalvikX/FlushIns version in X86Mir2Lir.
- Unlimited number of long/double/float arguments support
- Refactored (v2)
Change-Id: I5deadd320b4341d5b2f50ba6fa4a98031abc3902
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
Signed-off-by: Dmitry Petrochenko <dmitry.petrochenko@intel.com>
Signed-off-by: Chao-ying Fu <chao-ying.fu@intel.com>
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For 32-bit targets, object references are 32 bits wide both in
Dalvik virtual registers and in core physical registers. Because of
this, object references and non-floating point values were both
handled as if they had the same register class (kCoreReg).
However, for 64-bit systems, references are 32 bits in Dalvik vregs, but
64 bits in physical registers. Although the same underlying physical
core registers will still be used for object reference and non-float
values, different register class views will be used to represent them.
For example, an object reference in arm64 might be held in x3 at some
point, while the same underlying physical register, w3, would be used
to hold a 32-bit int.
This CL breaks apart the handling of object reference and non-float values
to allow the proper register class (or register view) to be used. A
new register class, kRefReg, is introduced which will map to a 32-bit
core register on 32-bit targets, and 64-bit core registers on 64-bit
targets. From this point on, object references should be allocated
registers in the kRefReg class rather than kCoreReg.
Change-Id: I6166827daa8a0ea3af326940d56a6a14874f5810
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables all the DOUBLE and FLOAT opcodes except for REM ones.
It has been tested and passes all Dalvik tests except for:
failed: 018-stack-overflow[pid=1076]
failed: 107-int-math2[pid=1593]
Change-Id: I581f219bde354e3402aa3ad6e24ef15566da5f78
Signed-off-by: Serban Constantinescu <serban.constantinescu@arm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add in some vector instructions. Implement the ConstVector
instruction, which takes 4 words of data and loads it into
an XMM register.
Initially, only the ConstVector MIR opcode is implemented. Others will
be added after this one goes in.
Change-Id: I5c79bc8b7de9030ef1c213fc8b227debc47f6337
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Added a last item in the enumeration in order to be able to notice
when an extended MIR opcode is beyond the last one known to libart, and hence
must be an opcode added by code from a plugin.
- Fixed the naming typo of the enumeration.
Change-Id: I0e021ba54b0e60531338f23ca0ab64755e15229b
Signed-Off-By: Jean Christophe Beyler <jean.christophe.beyler@intel.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Significant refactoring of register handling to unify usage across
all targets & 32/64 backends.
Reworked RegStorage encoding to allow expanded use of
x86 xmm registers; removed vector registers as a separate
register type. Reworked RegisterInfo to describe aliased
physical registers. Eliminated quite a bit of target-specific code
and generalized common code.
Use of RegStorage instead of int for registers now propagated down
to the NewLIRx() level. In future CLs, the NewLIRx() routines will
be replaced with versions that are explicit about what kind of
operand they expect (RegStorage, displacement, etc.). The goal
is to eventually use RegStorage all the way to the assembly phase.
TBD: MIPS needs verification.
TBD: Re-enable liveness tracking.
Change-Id: I388c006d5fa9b3ea72db4e37a19ce257f2a15964
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This CL replaces the typical use of LoadWord/StoreWord
utilities (which, in practice, were 32-bit load/store) in
favor of a new set that make the size explicit. We now have:
LoadWordDisp/StoreWordDisp:
32 or 64 depending on target. Load or store the natural
word size. Expect this to be used infrequently - generally
when we know we're dealing with a native pointer or flushed
register not holding a Dalvik value (Dalvik values will flush
to home location sizes based on Dalvik, rather than the target).
Load32Disp/Store32Disp:
Load or store 32 bits, regardless of target.
Load64Disp/Store64Disp:
Load or store 64 bits, regardless of target.
LoadRefDisp:
Load a 32-bit compressed reference, and expand it to the
natural word size in the target register.
StoreRefDisp:
Compress a reference held in a register of the natural word
size and store it as a 32-bit compressed reference.
Change-Id: I50fcbc8684476abd9527777ee7c152c61ba41c6f
|
|/
|
|
|
|
|
| |
This adds back using LIRSlowPath for ArrayIndexOutOfBoundsException.
And fix the host test crash.
Change-Id: Idbb602f4bb2c5ce59233feb480a0ff1b216e4887
|
|
|
|
| |
This reverts commit 9d46314a309aff327f9913789b5f61200c162609.
|
|
|
|
|
|
|
|
| |
Get rid of launchpads for throwing ArrayOutOfBoundsException
and use LIRSlowPath instead.
Bug: 13170824
Change-Id: I0e27f7a261a6a7fb5c0645e6113a957e098f699e
|
|
|
|
|
|
|
|
|
|
| |
Get rid of launchpads for throwing NPE and use LIRSlowPath instead.
Also clean up some code of using LIRSlowPath for checking div
by zero.
Bug: 13170824
Change-Id: I0c20a49c39feff3eb1f147755e557d9bc0ff15bb
|
|
|
|
|
|
| |
This reverts commit f9487c039efb4112616d438593a2ab02792e0304.
Change-Id: Id48a4aae4ecce73db468587967968a3f7618b700
|
|
|
|
|
|
|
|
|
| |
This reverts commit 081f73e888b3c246cf7635db37b7f1105cf1a2ff.
Change-Id: Ibd777f8ce73cf8ed6c4cb81d50bf6437ac28cb61
Conflicts:
compiler/dex/quick/mir_to_lir.h
|
|
|
|
|
|
|
|
|
| |
Get rid of launchpads for throwing div by zero exception and
use LIRSlowPath instead. Add a CallRuntimeHelper that takes no
argument for the runtime function.
Bug: 13170824
Change-Id: I7e0563e736c6f92bd63e3fbdfe3a777ad333e338
|
|
|
|
|
|
| |
This reverts commit 754ddad084ccb610d0cf486f6131bdc69bae5bc6.
Change-Id: Icd979adee1d8d781b40a5e75daf3719444cb72e8
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is an ARM specific optimization to the compiler
that uses trampoline islands to make calls to runtime
helper functions. The intention is to reduce the size
of the generated code (by 2 bytes per call) without
affecting performance.
By default this is on when generating an OAT file. It is
off when compiling to memory.
To switch this off in dex2oat, use the command line option:
--no-helper-trampolines
Enhances disassembler to print the trampoline entry on the
BL instruction like this:
0xb6a850c0: f7ffff9e bl -196 (0xb6a85000) ; pTestSuspend
Bug: 12607709
Change-Id: I9202bdb7cf21252ad807bd48701f1f6ce8e3d0fe
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
MIPS architecture includes internal registers HI and LO.
Similar to condition codes in other architectures, these internal
resouces must be accounted for during instruction scheduling.
Previously, the Quick backend for MIPS dealt with them by defining
rHI and rLO pseudo registers - treating them as actual registers for
def/use masks. This CL changes the handling of these resources to
be in line with how condition codes are used elsewhere - leaving
register definitions to be used for registers.
Change-Id: Idcd77f3107b0c9b081ad05b1aab663fb9f41492d
|
|/
|
|
|
|
| |
This also reduces sizeof(LIR) by 4 bytes (32-bit builds).
Change-Id: I0cb81f9bf098dfc50050d5bc705c171af26464ce
|
|
|
|
|
|
|
|
|
|
| |
X86 provides stronger memory guarantees and thus the memory barriers can be
optimized. This patch ensures that all memory barriers for x86 are treated
as scheduling barriers. And in cases where a barrier is needed (StoreLoad case),
an mfence is used.
Change-Id: I13d02bf3f152083ba9f358052aedb583b0d48640
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now subtract the frame size from the stack pointer for methods
which have a frame smaller than a certain size. Also changed code to
use slow paths instead of launchpads.
Delete kStackOverflow launchpad since it is no longer needed.
ARM optimizations:
One less move per stack overflow check (without fault handler for
stack overflows). Use ldr pc instead of ldr r12, b r12.
Code size (boot.oat):
Before: 58405348
After: 57803236
TODO: X86 doesn't have the case for large frames. This could case an
incoming signal to go past the end of the stack (unlikely however).
Change-Id: Ie3a5635cd6fb09de27960e1f8cee45bfae38fb33
|
|
|
|
|
|
|
|
| |
Also, move null check elimination temporaries to the
ScopedArenaAllocator and reuse the same variables in the
class initialization check elimination.
Change-Id: Ic746f95427065506fa6016d4931e4cb8b34937af
|
|
|
|
|
|
|
|
|
|
| |
Load Hoisting optimization can re-order operations with
FPU stack due to no dependency set.
Patch adds resource dependency between these operations.
Change-Id: Iccce98c8f3c565903667c03803884d9de1281ea8
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
|
|
|
|
|
|
|
| |
Also move MIR's BasicBlock related code from arena_bit_vector.h to
bit_vector_block_iterator.cc.
Change-Id: I85c224b387d31cf57a1ef1f1a36eaadf22f1c85d
|
|
|
|
|
|
|
|
|
|
| |
The ARM implementation of range argument copying was specialized in some cases.
For all other architectures, it would fall back to generating memcpy. This patch
updates the x86 implementation so it does not call memcpy and instead generates
loads and stores, favoring movement of 128-bit chunks.
Change-Id: Ic891e5609a4b0e81a47c29cc5a9b301bd10a1933
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
|
|
|
|
|
|
|
| |
This reverts commit 8ff67e3338952c70ccf3b609559bf8cc0f379cfd.
Fix applied to loc.fp usage.
Change-Id: I1eb3005392544fcf30c595923ed25bcee2dc4859
|
|
|
|
|
|
|
|
| |
The invalid usage of loc.fp must be corrected before this change can be submitted.
This reverts commit 766a5e5940b469ab40e52770862c81cfec1d835b.
Change-Id: I1173a9bf829da89cccd9c2898f5e11164987a22b
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, ART Quick mode assumes that a double FP register is composed
of two single consecutive FP registers. This is true for ARM and MIPS,
but not x86. This means that only half of the 8 XMM registers are
available for use by x86 doubles.
This patch breaks the assumption that a wide FP RegisterLocation must be
a paired set of FP registers. This is done by making some routines in
common code virtual and overriding them in the X86Mir2Lir class. For
these wide fp locations, the high register is set to the same value as
the low register, in order to minimize changes to common code. In a
couple of places, the common code checks for this case.
The changes are also supposed to allow the possibility of using the XMM
registers for vector operations,but that support is still WIP.
Change-Id: Ic6ef24ea764991c6f4d9fb88d483a619f5a468cb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
X86 supports conditional moves which is useful for reducing branchiness.
This patch adds support to the x86 backend to generate conditional reg
to reg operations. Both encoder and decoder support was added for cmov.
The x86 version of GenMinMax used for generating inlined version Math.min/max
has been updated to make use of the conditional move support.
Change-Id: I92c5428e40aa8ff88bd3071619957ac3130efae7
Signed-off-by: Razvan A Lupusoru <razvan.a.lupusoru@intel.com>
|
|
|
|
| |
Change-Id: I3407b9073776b2b40638491d9316111fa793e4ab
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On X86, kCondUlt and kCondUge are bound to CS and CC,
respectively, while on ARM it's the other way around. The
explicit binding in ConditionCode was wrong and misleading
and could lead to subtle bugs. Therefore, we detach those
constants and clean up usage. The CS and CC conditions are
now effectively unused but we keep them around as they may
eventually be useful.
And some minor cleanup and comments.
Change-Id: Ic5ed81d86b6c7f9392dd8fe9474b3ff718fee595
|