performance - Logarithm in C++ and assembly

September 15, 2014

Apparently the MSVC++ 2017 toolset v141 (x64 Release configuration) doesn't use the x87 FYL2X instruction through a C/C++ intrinsic. Instead, C++ log() or log2() usages result in a real call to a long function, which seems to implement an approximation of the logarithm (without using FYL2X). The performance I measured is also strange: log() (natural logarithm) is 1.7667 times faster than log2() (base-2 logarithm), even though the base-2 logarithm should be easier for the processor, because it stores the exponent in binary format (and the mantissa too), and this seems to be why the CPU instruction FYL2X calculates the base-2 logarithm (multiplied by a parameter).
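The point about the binary exponent can be made concrete. A positive normal double is stored as x = m * 2^e with m in [1, 2), so log2(x) = e + log2(m): the integer part comes straight from the exponent field, and only log2 on [1, 2) needs a real approximation. The sketch below assumes IEEE-754 binary64 doubles and positive normal inputs; `split_double` and `log2_via_split` are hypothetical names for illustration, not a description of how MSVC's log2() is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Result of splitting x into x = m * 2^e.
struct ExpMant {
    int e;     // unbiased binary exponent
    double m;  // mantissa in [1, 2)
};

// Hypothetical helper: reads the exponent field of a positive, normal
// IEEE-754 double directly. No handling of zero, denormals, inf or NaN.
ExpMant split_double(double x) {
    assert(std::isnormal(x) && x > 0);
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);             // well-defined type punning
    const int biased = int((bits >> 52) & 0x7FF);    // 11-bit biased exponent
    // Keep the 52 fraction bits, set the exponent field to the bias (1023),
    // which yields a double in [1, 2).
    const std::uint64_t mant_bits =
        (bits & 0x000FFFFFFFFFFFFFull) | (std::uint64_t(1023) << 52);
    double m;
    std::memcpy(&m, &mant_bits, sizeof m);
    return {biased - 1023, m};
}

// log2(x) = e + log2(m); only the [1, 2) part still needs an approximation
// (std::log2 is used for it here, purely for illustration).
double log2_via_split(double x) {
    const ExpMant s = split_double(x);
    return s.e + std::log2(s.m);
}
```

For example, 10.0 splits into e = 3 and m = 1.25. Note that the same split also gives ln(x) = e * ln 2 + ln(m), so the exponent shortcut is equally available to log().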

Here is the code I used for the measurements:

    #include <chrono>
    #include <cmath>
    #include <cstdint>  // int64_t
    #include <cstdio>

    const int64_t cnlogs = 100 * 1000 * 1000;

    void benchmarklog2() {
      double sum = 0;
      auto start = std::chrono::high_resolution_clock::now();
      for (int64_t i = 1; i <= cnlogs; i++) {
        sum += std::log2(double(i));
      }
      auto elapsed = std::chrono::high_resolution_clock::now() - start;
      // elapsed time in seconds (microseconds * 1e-6)
      double sec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
      // printing the sum keeps the compiler from discarding the loop
      printf("log2: %.3lf ops/sec calculated %.3lf\n", cnlogs / sec, sum);
    }

    void benchmarkln() {
      double sum = 0;
      auto start = std::chrono::high_resolution_clock::now();
      for (int64_t i = 1; i <= cnlogs; i++) {
        sum += std::log(double(i));
      }
      auto elapsed = std::chrono::high_resolution_clock::now() - start;
      double sec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
      printf("ln: %.3lf ops/sec calculated %.3lf\n", cnlogs / sec, sum);
    }

    int main() {
      benchmarklog2();
      benchmarkln();
      return 0;
    }
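The two benchmark functions differ only in the function being timed. A possible refactoring (a sketch, not the original code) passes that function as a lambda to one hypothetical helper, `run_log_benchmark`, and uses std::chrono::steady_clock, which is guaranteed to be monotonic, instead of high_resolution_clock, which may be an alias for the adjustable system clock:

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: evaluates f(1.0) .. f(n), prints the throughput and
// returns the accumulated sum, so the results are observably used and the
// loop cannot be optimized away.
template <class F>
double run_log_benchmark(const char* name, std::int64_t n, F f) {
    assert(n > 0);
    double sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 1; i <= n; i++) {
        sum += f(double(i));
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    const double seconds = std::chrono::duration<double>(elapsed).count();
    std::printf("%s: %.3f ops/sec calculated %.3f\n", name, n / seconds, sum);
    return sum;
}
```

Usage: `run_log_benchmark("log2", cnlogs, [](double x) { return std::log2(x); });`. Wrapping std::log2 in a lambda avoids having to select one overload of the overloaded std::log2 when taking its address, and lets the compiler inline the call into the loop.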

The output on a Ryzen 1800X is:

    log2: 95152910.728 ops/sec calculated 2513272986.435
    ln: 168109607.464 ops/sec calculated 1742068084.525
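As a sanity check on these numbers: since ln i = ln 2 * log2 i, the two loops accumulate the same quantity up to a constant factor, and the printed sums agree with that to all printed digits. The speed ratio quoted above follows from the same output (N = 10^8):

$$
\sum_{i=1}^{N} \ln i = \ln 2 \cdot \sum_{i=1}^{N} \log_2 i \approx 0.6931471806 \times 2513272986.435 \approx 1742068084.525
$$

$$
\frac{168109607.464}{95152910.728} \approx 1.7667
$$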

So, to elucidate these phenomena (no usage of FYL2X, and the strange performance difference), I would like to test the performance of FYL2X and, if it's faster, use it instead of <cmath>'s functions. MSVC++ doesn't allow inline assembly on x64, so an assembly-file function that uses FYL2X is needed.

Could you answer with the assembly code for such a function, which uses FYL2X or a better instruction for computing the logarithm (without the need for a specific base), if there is one on newer x86_64 processors?

Here is the assembly code using FYL2X:

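    _data segment
    _data ends

    _text segment

    public srlog2muld

    ; xmm0l = tolog
    ; xmm1l = tomul
    srlog2muld proc
      movq qword ptr [rsp+16], xmm1  ; spill tomul to the caller's shadow space
      movq qword ptr [rsp+8], xmm0   ; spill tolog to the caller's shadow space
      fld qword ptr [rsp+16]         ; st(0) = tomul
      fld qword ptr [rsp+8]          ; st(0) = tolog, st(1) = tomul
      fyl2x                          ; st(1) = st(1) * log2(st(0)), then pop
      fstp qword ptr [rsp+8]         ; store the result and pop (x87 stack empty again)
      movq xmm0, qword ptr [rsp+8]   ; the return value goes in xmm0
      ret
    srlog2muld endp

    _text ends

    end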
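FYL2X replaces ST(1) with ST(1) * log2(ST(0)) and pops the x87 stack, so srlog2muld returns tomul * log2(tolog). Because of the multiplier, the one instruction covers any base via log_b(x) = log2(x) * (1 / log2(b)). The following portable C++ reference mirrors that semantics using std::log2 instead of the x87 unit; `ref_log2_mul`, `ln_via_log2` and `log10_via_log2` are hypothetical names, and results may differ from the assembly version in the last bits:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reference for srlog2muld: tomul * log2(tolog),
// computed with std::log2 rather than the x87 FYL2X instruction.
double ref_log2_mul(double tolog, double tomul) {
    assert(tolog > 0);
    return tomul * std::log2(tolog);
}

// Base conversion through the multiplier:
// ln(x) = log2(x) * ln(2), log10(x) = log2(x) * log10(2).
constexpr double kLn2 = 0.69314718055994530942;       // ln(2)
constexpr double kLog10Of2 = 0.30102999566398119521;  // log10(2)

double ln_via_log2(double x) { return ref_log2_mul(x, kLn2); }
double log10_via_log2(double x) { return ref_log2_mul(x, kLog10Of2); }
```

This is the same identity FYL2X is built around: a single base-2 primitive plus a multiplication gives every other base.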

The calling convention follows https://docs.microsoft.com/en-us/cpp/build/overview-of-x64-calling-conventions, e.g.:

The x87 register stack is unused. It may be used by the callee, but must be considered volatile across function calls.

The prototype in C++ is:

    extern "C" double __fastcall srlog2muld(const double tolog, const double tomul);
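The linkage specification has to be spelled `extern "C"` (capital C) so that the linker looks for the unmangled assembly symbol. Per Microsoft's documentation, `__fastcall` is accepted but ignored by the x64 compiler, so it can be dropped. A minimal sketch of a header that binds to the assembly routine on MSVC x64 and falls back to plain C++ elsewhere; `log2_mul` is a hypothetical wrapper name, and the MSVC branch assumes the .asm file is assembled with ml64 and linked into the project:

```cpp
#include <cassert>
#include <cmath>

#if defined(_MSC_VER) && defined(_M_X64)
// Assumption: srlog2muld is provided by the ml64-assembled .asm file.
extern "C" double srlog2muld(double tolog, double tomul);

inline double log2_mul(double tolog, double tomul) {
    return srlog2muld(tolog, tomul);
}
#else
// Portable fallback with the same meaning: tomul * log2(tolog).
inline double log2_mul(double tolog, double tomul) {
    assert(tolog > 0);
    return tomul * std::log2(tolog);
}
#endif
```

Keeping the benchmark code written against `log2_mul` lets the same source build on compilers or platforms where the MASM file is not available.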

The performance is about 1.8 times slower than std::log2() and more than 3 times slower than std::log():

    log2: 94803174.389 ops/sec calculated 2513272986.435
    fpu log2: 52008300.525 ops/sec calculated 2513272986.435
    ln: 169392473.892 ops/sec calculated 1742068084.525

The benchmarking code is as follows:

    void benchmarkfpulog2() {
      double sum = 0;
      auto start = std::chrono::high_resolution_clock::now();
      for (int64_t i = 1; i <= cnlogs; i++) {
        sum += srplat::srlog2muld(double(i), 1);  // 1 * log2(i)
      }
      auto elapsed = std::chrono::high_resolution_clock::now() - start;
      // elapsed time in seconds (microseconds * 1e-6)
      double sec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
      printf("fpu log2: %.3lf ops/sec calculated %.3lf\n", cnlogs / sec, sum);
    }
