And I just tested shuffle & division on the newest Skylake-X processor. Division latency is entirely hidden there: shuffle alone costs only a 2.5% slowdown, and shuffle plus division costs the same 2.5%, so division adds no further drop at all. The funny thing is that actual single-thread performance is lower on a modern 4 GHz Skylake-X than on a 6-year-old 2.7 GHz Ivy Bridge notebook processor, even with the original Cryptonight. It shows once again that Cryptonight is memory-latency bound and that we can add a lot of computation without affecting performance.
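To illustrate why extra computation can be free in a memory-latency-bound loop, here is a rough toy model in Python. It is only a sketch: the function names and the 80 ns / 20 ns figures are my own illustrative assumptions, not measurements from either processor. It assumes the extra computation overlaps fully with the outstanding memory access, so it ignores second-order effects such as the 2.5% shuffle cost mentioned above.

```python
def iteration_time_ns(memory_latency_ns, compute_ns, overlapped=True):
    """Estimate the time of one loop iteration in a toy model.

    overlapped=True  : compute runs while the memory access is in flight,
                       so the slower of the two determines the time.
    overlapped=False : compute waits for the memory access, so times add up.
    """
    if overlapped:
        return max(memory_latency_ns, compute_ns)
    return memory_latency_ns + compute_ns


def slowdown_percent(base_ns, new_ns):
    """Relative slowdown of new_ns versus base_ns, in percent."""
    return (new_ns - base_ns) / base_ns * 100.0


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    MEMORY_LATENCY_NS = 80.0   # assumed cost of one scratchpad access
    DIVISION_NS = 20.0         # assumed cost of the added division

    base = iteration_time_ns(MEMORY_LATENCY_NS, 0.0)
    hidden = iteration_time_ns(MEMORY_LATENCY_NS, DIVISION_NS)
    exposed = iteration_time_ns(MEMORY_LATENCY_NS, DIVISION_NS, overlapped=False)

    print("overlapped slowdown: %.1f%%" % slowdown_percent(base, hidden))
    print("serialized slowdown: %.1f%%" % slowdown_percent(base, exposed))
```

In this model the division only starts to cost anything once its latency exceeds the memory latency, or once it can no longer overlap with the memory access.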
Last edited: May 31, 2018, 10:28 am