The full source code for this example is below, but the most important part is the assembly code and results.
The test uses the Windows high performance timer to see how long it takes to execute a function written in Intel x64 assembly.
The simple loop below sets the rax register to a large number (2.33 billion, chosen because I am running on a 2.33 GHz processor) and repeatedly decrements it until it reaches zero. My console application then prints the number of milliseconds this took. On my machine it took roughly 1000 ms, so the CPU completed about one loop iteration per clock cycle, even though each iteration executes two instructions: a decrement followed by a conditional jump.
```asm
.CODE
runAssemblyCode PROC
    mov rax, 2330 * 1000 * 1000
start:
    dec rax
    jnz start
    ret
runAssemblyCode ENDP
END
```
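To complicate things even more, here I perform more independent operations in the same loop. Instead of one decrement, I now do five, each on a different register (rcx, rdx, r9, r10 and rax). That is six instructions per iteration instead of two, yet the total time taken is only about 2000 milliseconds, just twice as long as the first loop.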
```asm
.CODE
runAssemblyCode PROC
    mov rax, 2330 * 1000 * 1000
start:
    dec rcx
    dec rdx
    dec r9
    dec r10
    dec rax
    jnz start
    ret
runAssemblyCode ENDP
END
```
As for the finer details of Intel CPU pipelining, I couldn't explain how to calculate the expected execution time for these examples. They do, however, demonstrate quite well that a carefully crafted sequence of instructions can complete more than one instruction per clock cycle, so code can run faster than the clock speed of your CPU alone would have you believe.
To run this application yourself, create a Visual Studio 2010 (or newer) C++ project and add two files: a .ASM file containing one of the assembly listings above, and a .CPP file containing the C++ code below. You will also need to enable the MASM build customisation: right-click the project, choose Build Customizations and tick masm. Because the assembly uses 64-bit registers, the project must also target the x64 platform.
```cpp
#include <Windows.h>
#include <memory>
#include <iostream>

using namespace std;

// Implemented in the .ASM file; extern "C" prevents C++ name mangling.
extern "C" void runAssemblyCode();

// Measures elapsed time using the Windows high-resolution performance counter.
class Timer
{
public:
    Timer()
    {
        QueryPerformanceFrequency(&_ticksPerSecond);
        Reset();
    }

    void Reset()
    {
        QueryPerformanceCounter(&_startedAt);
    }

    long long GetElapsedMilliseconds()
    {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        // Convert elapsed counter ticks to milliseconds.
        return (now.QuadPart - _startedAt.QuadPart) * 1000 / _ticksPerSecond.QuadPart;
    }

private:
    LARGE_INTEGER _startedAt;
    LARGE_INTEGER _ticksPerSecond;
};

int wmain(int argc, wchar_t* argv[])
{
    Timer timer;
    runAssemblyCode();
    auto elapsed = timer.GetElapsedMilliseconds();
    cout << elapsed << endl;
    return 0;
}
```