c - Understanding asm blocks written for GCC


What does the following assembly mean in simple C? (It is meant to be compiled with GCC.)

    asm volatile
    (
        "mov.d %0,%4\n\t"
        "l1: bge %2,%3,l2\n\t"
        "gslqc1 $f2,$f0,0(%1)\n\t"
        "gslqc1 $f6,$f4,0(%5)\n\t"
        "madd.d %0,%0,$f6,$f2\n\t"
        "madd.d %0,%0,$f4,$f0\n\t"
        "add %1,%1,16\n\t"
        "add %2,%2,2\n\t"
        "add %5,%5,16\n\t"
        "j l1\n\t"
        "l2: nop\n\t"
        :"=f"(sham)
        :"r"(foo),"r"(bar),"r"(ro),"f"(sham),"r"(bo)
        :"$f0","$f2","$f4","$f6"
    );

After several hours of searching and reading, I've come up with the following assembly code in AT&T syntax:

    mov.d %xmm0,%xmm1
    l1: bge %ebx,%ecx,l2
    gslqc1 $f2,$f0,0(%eax)
    gslqc1 $f6,$f4,0(%esi)
    madd.d %xmm0,%xmm0,$f6,$f2
    madd.d %xmm0,%xmm0,$f4,$f0
    add %eax,%eax,16
    add %ebx,%ebx,2
    add %esi,%esi,16
    jmp l1
    l2: nop

I'm in the process of finding a way to run this on Windows, and I will update when I figure out a way (after fixing the mistakes I'm sure I've made).

I have little experience with x86 assembly. That said, I vaguely recognize a loop, but I haven't been able to find what the instruction gslqc1 means, or what the purpose of the loop might be.

If you have questions for me, I'll be happy to answer them. If you have insights, I'd love to hear them. Thank you for your time.

Edit:

The function I am dealing with performs singular value decomposition (SVD), so it has to do with matrices.

I'm updating the code below with comments of my own. The original writer of the assembly did not write these, and I am about 80% confident they are correct, given my research into the asm block notation for GCC.

    asm volatile
    (
        "mov.d %0,%4\n\t"
        "l1: bge %2,%3,l2\n\t"
        "gslqc1 $f2,$f0,0(%1)\n\t"
        "gslqc1 $f6,$f4,0(%5)\n\t"
        "madd.d %0,%0,$f6,$f2\n\t"
        "madd.d %0,%0,$f4,$f0\n\t"
        "add %1,%1,16\n\t"
        "add %2,%2,2\n\t"
        "add %5,%5,16\n\t"
        "j l1\n\t"
        "l2: nop\n\t"
        :"=f"(sham)        /* corresponds to %0 in the code above */
        :"r"(foo),         /* corresponds to %1 */
         "r"(bar),         /* %2 */
         "r"(ro),          /* %3 */
         "f"(sham),        /* %4 */
         "r"(bo)           /* %5 */
        :"$f0","$f2","$f4","$f6"
    );

I assumed this was x86, but I was wrong. I believe the above is MIPS64 assembly written for a processor in the Loongson family.

Thank you for your interest in my question. I appreciate your time. Again, if there are other questions, I am happy to try my best to answer them.

P.S. The original code can be found here, and the assembly I am asking about starts on line 189.

This isn't an answer, but it doesn't fit in a comment either. Given that you omit several critical pieces of information (what processor the source instructions are for, the data types of the parameters, a general sense of what the code is doing, etc.), it's hard to come up with an answer.

In a general sense, I'd be thinking of something like:

    #include <immintrin.h>

    /* Rough sketch only: note that sham is never updated here. */
    float messy(const float *foo, int bar, int ro, const float *bo)
    {
        float sham = 0;

        while (bar < ro)
        {
            __m256 a = _mm256_load_ps(foo);
            __m256 b = _mm256_load_ps(bo);

            __m256 c = _mm256_add_ps(a, a);
            __m256 d = _mm256_add_ps(b, b);

            foo += 2;
            bar += 2;
            bo += 2;
        }

        return sham;
    }

That's not going to be quite right, since (among other things) sham isn't getting set. But it's a place to start. Without the details of madd.d (which are hard to find without knowing what hardware we're talking about), that's as close as I can get you.

Just to emphasize what I said in my comment, the original code does not appear to be well written (modifying read-only parameters, double jumps, no comments, etc.).
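
If the question's later guess is right and this is MIPS64 code for a Loongson processor, the loop can be read as a scalar dot-product accumulation. The sketch below rests on several assumptions that the question does not confirm: that `foo` and `bo` point to `double` data, that `madd.d fd,fr,fs,ft` computes `fs * ft + fr` (the standard MIPS definition), and that `gslqc1` loads 16 bytes, i.e. two adjacent doubles, into the two named registers. Which register receives which half is a guess, but both loads use the same order, so the sum pairs matching elements either way. The name `dot_pairs` and the use of `long` for the counters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical scalar reading of the asm loop (not the original author's code).
 * sham is the running sum (%0, seeded from %4), foo and bo are the two
 * input arrays (%1 and %5), bar is the counter (%2) and ro is the limit (%3).
 */
static double dot_pairs(double sham, const double *foo, long bar, long ro,
                        const double *bo)
{
    assert(foo != NULL && bo != NULL);

    while (bar < ro) {                  /* l1: bge %2,%3,l2            */
        /* Each gslqc1 is assumed to fetch two adjacent doubles. */
        sham = foo[1] * bo[1] + sham;   /* madd.d %0,%0,$f6,$f2        */
        sham = foo[0] * bo[0] + sham;   /* madd.d %0,%0,$f4,$f0        */
        foo += 2;                       /* add %1,%1,16 (16 bytes)     */
        bar += 2;                       /* add %2,%2,2                 */
        bo  += 2;                       /* add %5,%5,16 (16 bytes)     */
    }                                   /* j l1                        */
    return sham;                        /* l2: nop                     */
}
```

Note that `bar` only counts iterations: the pointers are advanced independently and are never indexed by it, so a call starting at `bar = 2` still reads from the start of `foo` and `bo`. If `ro - bar` is odd, the final iteration reads one element past the range, which the asm appears to do as well.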

