caching - What are the semantics of the Super Queue and Line Fill Buffers?
I have a question regarding the Haswell microarchitecture (Intel Xeon E5-2640 v3 CPU). According to the CPU's specifications and other resources I have found, there are 10 LFBs and the size of the super queue is 16. I have two questions related to the LFBs and the super queue:
1) What is the maximum degree of memory-level parallelism the system can provide: 10 or 16 (LFBs or SQ)?
2) According to some sources, every L1D miss is recorded in the SQ and the SQ assigns a line fill buffer, while other sources write that the SQ and LFBs can work independently. Could you briefly explain how the SQ works?
Here is an example figure (not Haswell) of the SQ and LFB. References: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
For (1), logically the maximum parallelism is limited by the least-parallel part of the pipeline, i.e., the 10 LFBs, and this is strictly true for demand-load parallelism when prefetching is disabled or cannot help. In practice it is more complicated once a load is at least partly helped by prefetching, since the wider queues between L2 and RAM can be used to make the observed parallelism greater than 10. A practical approach is direct measurement: given the measured latency to RAM and the observed throughput, you can calculate the effective parallelism for a particular load.
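As a minimal sketch of that direct-measurement approach, the snippet below applies Little's law (average outstanding requests = arrival rate × latency) to a measured latency and observed throughput. The latency and bandwidth figures are illustrative assumptions, not measurements from the Xeon E5-2640 v3 in the question.

```python
# Estimate effective memory-level parallelism from measured latency and
# observed throughput using Little's law. Input numbers are assumptions
# chosen only to illustrate the calculation.

CACHE_LINE_BYTES = 64

def effective_parallelism(latency_ns: float, bandwidth_gb_s: float) -> float:
    """Little's law: outstanding cache lines = (lines completed per second) * latency."""
    lines_per_second = bandwidth_gb_s * 1e9 / CACHE_LINE_BYTES
    return lines_per_second * latency_ns * 1e-9

if __name__ == "__main__":
    # Example: ~80 ns memory latency and ~8 GB/s achieved single-core demand
    # bandwidth imply roughly 10 lines in flight, consistent with 10 LFBs
    # being the limit for pure demand loads.
    print(f"{effective_parallelism(80.0, 8.0):.1f} lines in flight")
```

If the observed parallelism for a prefetch-friendly access pattern comes out noticeably higher than 10, that is a sign the L2-to-memory queues (e.g., the super queue) are contributing beyond the LFB limit.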
For (2), my understanding is that it is the other way around: demand misses in L1 first allocate an LFB (unless, of course, they hit an existing LFB) and may involve the "superqueue" (or whatever it is called these days) later, if they also miss higher in the cache hierarchy. The diagram you included seems to confirm that: the path from L1 goes through the LFB queue.