ERMS = Enhanced Repeat Move String
FSRM = Fast Short Repeat Move String
With the new Zen 3 CPUs, Fast Short REP MOV (FSRM) finally joins AMD’s CPU feature set, analogous to the X86_FEATURE_FSRM flag already known from Intel. Intel introduced the feature in 2019 with the Ice Lake client microarchitecture. AMD is now evidently using it to speed up REP MOVSB for short and very short operations. On Intel, the improvement applies to string lengths between 1 and 128 bytes, and one can assume that AMD’s implementation will look the same for compatibility reasons; vendors usually coordinate on such things anyway.
As early as the Ivy Bridge generation (2012), Intel decided on a major revision of REP MOVS and introduced the CPUID ERMSB bit (Enhanced REP MOVSB) to indicate that the CPU can execute the byte-granular REP MOVSB and REP STOSB (move and store string) instructions quickly and efficiently. Besides the addition of FSRM to the x86 feature flags, ERMS is therefore also very interesting: it allows bandwidth to be increased considerably, which is a significant advantage and a logical complement to FSRM.
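Both features are reported through CPUID leaf 7 (sub-leaf 0): ERMS in EBX bit 9 and FSRM in EDX bit 4. The following is a minimal sketch of how a user-space program might query them, assuming a GCC or Clang toolchain that ships the `<cpuid.h>` helper header. The struct and function names are made up for illustration and do not come from any vendor or kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header (assumed toolchain) */
#endif

/* Bit positions in CPUID leaf 7, sub-leaf 0. */
#define CPUID7_EBX_ERMS (1u << 9)  /* Enhanced REP MOVSB/STOSB */
#define CPUID7_EDX_FSRM (1u << 4)  /* Fast Short REP MOV */

/* Hypothetical container for the two feature bits. */
struct rep_mov_features {
    bool erms;
    bool fsrm;
};

/* Pure decoding step: turn raw EBX/EDX register values into flags. */
static struct rep_mov_features decode_leaf7(uint32_t ebx, uint32_t edx)
{
    struct rep_mov_features f;
    f.erms = (ebx & CPUID7_EBX_ERMS) != 0;
    f.fsrm = (edx & CPUID7_EDX_FSRM) != 0;
    return f;
}

/* Query the running CPU. Reports both features as absent when
 * leaf 7 is unavailable or the build target is not x86. */
static struct rep_mov_features query_rep_mov_features(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        ebx = edx = 0;
#endif
    (void)eax;
    (void)ecx;
    return decode_leaf7(ebx, edx);
}
```

Splitting the decoding from the actual CPUID call keeps the bit logic checkable on any machine, regardless of which features the local CPU reports.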
Put simply, the MOVS instruction copies data from one memory area to another. Prefixing MOVS with REP repeats the instruction as many times as the count register (CX/ECX/RCX) specifies beforehand. Unlike FSRM, however, ERMS mainly concerns larger blocks from 256 bits upwards and requires the copy to run in the forward direction (direction flag cleared). ERMS also has a major disadvantage: it needs a few cycles to get going (startup latency). For smaller operations, FSRM may therefore be the better alternative, depending on the size.
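To make the mechanism concrete, here is a small sketch of a byte copy built directly on REP MOVSB, assuming GCC or Clang extended inline assembly on x86. The helper name `copy_rep_movsb` is hypothetical; real C libraries choose between several copy strategies depending on size and CPU features, so this only illustrates the instruction itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst with REP MOVSB.
 * Falls back to memcpy on non-x86 targets. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    void *d = dst;
    /* (R/E)DI = destination, (R/E)SI = source, (R/E)CX = repeat count.
     * The x86 calling conventions leave the direction flag cleared on
     * function entry, so the copy runs forward without an explicit CLD. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
    return dst;
}
```

The count in the CX register is what REP counts down. With a count of zero the instruction does nothing, so the helper is safe for empty copies.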
Of course, all of this also depends on the operating system and must first be implemented in the kernel. But since Intel has been using this for years, AMD should be pushing at an open door with Vermeer.
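On Linux, whether the kernel has picked up the features can be read from the `flags` line in `/proc/cpuinfo`, where they appear as `erms` and `fsrm`. The sketch below assumes a Linux system; the function names are illustrative, and on systems without `/proc/cpuinfo` the lookup simply reports the flag as absent.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole whitespace-separated token in `line`. */
static bool line_has_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    if (len == 0)
        return false;
    for (const char *p = strstr(line, flag); p != NULL;
         p = strstr(p + 1, flag)) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}

/* Check the first "flags" line of /proc/cpuinfo (Linux only).
 * Returns false if the file cannot be opened. */
static bool cpuinfo_has_flag(const char *flag)
{
    FILE *fp = fopen("/proc/cpuinfo", "r");
    if (fp == NULL)
        return false;

    char line[8192];
    bool found = false;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            found = line_has_flag(line, flag);
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Calling `cpuinfo_has_flag("fsrm")` on a Zen 3 system should only return true once the running kernel knows about the feature, which is exactly the operating-system dependency described above.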