>>104
What's really going on is you're taking great care about endianness, presuming that it is in conflict, taking the time to realign your data and perform two extra copies to avoid screwing it up.
Suppose: the raw stream is in memory, the CPU clock rate is in the low/mid tens of MHz with memory clocked to match, memory is not byte-addressable, the cache is write-through and there are
two or three other processors talking to RAM. Demux will still be pretty quick in isolation, while using something like 4-8x as much memory bandwidth.
All that
probably isn't the case, but you're
probably not writing console games for portables. You're
probably not even writing code, but if you do, please don't let the likes of that narcissist tell you what you can and cannot do.