>>12 Note: most ARM data-processing opcodes can be suffixed with `s`, which tells the instruction to set the flags (carry, negative, zero, etc.) based on the result. Compare that to x86, where most arithmetic opcodes clobber the flags even if you don't want them to. So there are `add` and `adds` mnemonics, and then there are `addle` and `addsle`, which execute only if the flags satisfy the condition (here `le`, less than or equal). That avoids a large portion of branches and also makes code more readable. An `if` reads like a Python if, just with a FORCED FLAG SUFFIX instead of Python's FIOC.
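To make the flag business concrete, here is a rough Python model of `adds` and `addle`. It is only a sketch: the function names and the flags dict are made up for illustration, and it covers just the `le` condition (Z set, or N different from V).

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide


def adds(a, b):
    """Model of `adds`: return (result, flags) for the 32-bit sum a + b."""
    a &= MASK
    b &= MASK
    full = a + b
    result = full & MASK
    flags = {
        "N": (result >> 31) == 1,  # negative: top bit of the result
        "Z": result == 0,          # zero
        "C": full > MASK,          # carry out of bit 31
        # signed overflow: operands share a sign, result has the other one
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1 == 1,
    }
    return result, flags


def cond_le(flags):
    """`le` (signed less than or equal) holds when Z is set or N != V."""
    return flags["Z"] or flags["N"] != flags["V"]


def addle(rd, a, b, flags):
    """Model of `addle`: write a + b only if `le` holds, else keep rd.

    No `s` suffix, so the flags are not touched either way.
    """
    return (a + b) & MASK if cond_le(flags) else rd


# 5 + (-5) == 0, so Z is set and a following `addle` executes.
_, zero_flags = adds(5, -5)
# 1 + 2 == 3, positive and non-zero, so a following `addle` is skipped.
_, pos_flags = adds(1, 2)
```

With `zero_flags`, `addle(10, 10, 1, zero_flags)` gives 11; with `pos_flags` the destination stays at 10. No branch needed in either case.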
ARM also discourages the use of big immediate values, compared to x86 with its unaligned, variable-length opcodes. But the existing immediate mechanism is very powerful, since the immediate can be placed almost anywhere in a register: in the classic 32-bit encoding it is an 8-bit value rotated right by an even number of bits. For example, you can elegantly load a byte at any byte position in a register, or xor any byte. And when you need a large immediate, you just load it from the function's header (using PC+offset). Base ARM can also load several registers at once with a single instruction (`ldm`, for example 4 registers), so it has some SIMD-like capabilities built into the core architecture.
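If you want to check whether a constant fits the immediate field, here is a small Python sketch. It assumes the classic 32-bit ARM data-processing encoding (an 8-bit value rotated right by twice a 4-bit amount); `encode_arm_immediate` is a made-up helper name, not something from any toolchain.

```python
def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def encode_arm_immediate(value):
    """Return (imm8, rot) such that value == imm8 rotated right by 2*rot,
    or None if the value does not fit and has to be loaded from memory."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Rotating left by 2*rot undoes a rotate right by 2*rot.
        imm8 = rol32(value, 2 * rot)
        if imm8 <= 0xFF:
            return imm8, rot
    return None
```

So `encode_arm_immediate(0xFF000000)` finds the byte `0xFF` with `rot` 4 (rotate right by 8), while something like `0x12345678` returns `None` and is the case where you fall back to a PC-relative load.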