assembly - ARMv8 NEON vector permute
Consider memory laid out such that 8 consecutive 4-byte blocks read [abcdefgh]. I load these into 2 registers, v0.4s and v1.4s, so that v0.4s = [abcd] and v1.4s = [efgh], where each character represents a 32-bit block. I want to reorder them to obtain [abef] and [cdgh] in 2 (possibly different) registers.
My approach at the moment is to first reverse the 64-bit halves of [efgh] to get [ghef]. Then I can use extract to get [abef] and [ghcd]. Then I can again reverse the 64-bit halves of [ghcd] to get [cdgh].
Can anyone tell me a better approach?
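To make the lane movements concrete, here is a rough scalar C model of that sequence. It is only a sketch: the `vec4s` type and the `ext8`, `swap_halves`, and `permute` helpers are hypothetical names, and `ext8` imitates `EXT Vd.16B, Vn.16B, Vm.16B, #8` under the assumption that lane 0 is written leftmost. Under that assumption the intermediates come out as [efab] and [cdgh] rather than [abef] and [ghcd], so the last half swap lands on the other register, but the instruction count is the same.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register viewed as four 32-bit
   lanes (.4s); lane[0] is the lowest-addressed element after a plain load. */
typedef struct { uint32_t lane[4]; } vec4s;

/* Scalar model of EXT Vd.16B, Vn.16B, Vm.16B, #8: the low half of the
   result is the high half of n, the high half is the low half of m. */
static vec4s ext8(vec4s n, vec4s m) {
    vec4s r = {{ n.lane[2], n.lane[3], m.lane[0], m.lane[1] }};
    return r;
}

/* Swapping the two 64-bit halves is EXT of a register with itself. */
static vec4s swap_halves(vec4s v) {
    return ext8(v, v);
}

/* In: v0 = [abcd], v1 = [efgh]. Out: *lo = [abef], *hi = [cdgh]. */
static void permute(vec4s v0, vec4s v1, vec4s *lo, vec4s *hi) {
    vec4s t = swap_halves(v1);  /* [ghef] */
    vec4s x = ext8(t, v0);      /* [efab] */
    *hi = ext8(v0, t);          /* [cdgh] */
    *lo = swap_halves(x);       /* [abef] */
}
```

Calling `permute` with v0 = {1, 2, 3, 4} and v1 = {5, 6, 7, 8} fills `lo` with {1, 2, 5, 6} and `hi` with {3, 4, 7, 8}, which costs four EXT-style operations in this model.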