With asm CSE enabled as in 2.5, I think it should also be doable to use ebp as a general-purpose register when the stack frame is omitted; this should squeeze out another few percent.