> Handling 1..4(6) bytes is less efficient than handling surrogate *pairs*.

But surrogate pairs break array-like fast character access anyway, don't they? Once a string contains them, UTF-16 code unit N is no longer necessarily code point N, so you lose O(1) indexing just as you do with UTF-8. And there's a lot of room for optimizing UTF-8 operations, for instance http://bjoern.hoehrmann.de/utf-8/decoder/dfa/. See also the UTF-8 Everywhere manifesto at http://www.utf8everywhere.org/.
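A small Python sketch of both points, assuming only the standard library. `utf16_units` and `decode_utf8` are hypothetical helpers written for illustration. The first shows a non-BMP character turning into two UTF-16 code units, so code-unit indexes drift from code-point indexes. The second is a hand-written state-machine UTF-8 decoder modeled on the WHATWG Encoding Standard's decoder. It is *not* Hoehrmann's table-driven DFA from the link above, and it raises on malformed input rather than emitting U+FFFD.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (via a little-endian encode)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 into code points with a small state machine.

    Assumption: strict mode, raising ValueError on any malformed input
    (overlong forms, encoded surrogates, > U+10FFFF, truncation).
    """
    code_points = []
    need = 0                    # continuation bytes still expected
    cp = 0                      # code point being accumulated
    lower, upper = 0x80, 0xBF   # allowed range for the next continuation byte
    for b in data:
        if need == 0:
            if b <= 0x7F:                       # 1-byte (ASCII)
                code_points.append(b)
            elif 0xC2 <= b <= 0xDF:             # 2-byte lead (C0/C1 are overlong)
                need, cp = 1, b & 0x1F
            elif 0xE0 <= b <= 0xEF:             # 3-byte lead
                need, cp = 2, b & 0x0F
                if b == 0xE0:
                    lower = 0xA0                # reject overlong 3-byte forms
                elif b == 0xED:
                    upper = 0x9F                # reject encoded surrogates D800..DFFF
            elif 0xF0 <= b <= 0xF4:             # 4-byte lead
                need, cp = 3, b & 0x07
                if b == 0xF0:
                    lower = 0x90                # reject overlong 4-byte forms
                elif b == 0xF4:
                    upper = 0x8F                # reject code points above U+10FFFF
            else:
                raise ValueError(f"invalid lead byte {b:#04x}")
        else:
            if not lower <= b <= upper:
                raise ValueError(f"invalid continuation byte {b:#04x}")
            lower, upper = 0x80, 0xBF           # only the first continuation is restricted
            cp = (cp << 6) | (b & 0x3F)
            need -= 1
            if need == 0:
                code_points.append(cp)
    if need:
        raise ValueError("truncated sequence")
    return code_points


text = "a\U0001F600b"  # 'a', U+1F600 (outside the BMP), 'b'
units = utf16_units(text)
print(len(text), len(units))    # 3 4
print([hex(u) for u in units])  # ['0x61', '0xd83d', '0xde00', '0x62']
print([hex(c) for c in decode_utf8(text.encode("utf-8"))])  # ['0x61', '0x1f600', '0x62']
```

Both encodings are variable-width: the 3-code-point string takes 4 UTF-16 units because U+1F600 becomes the surrogate pair D83D DE00, so neither one gives constant-time access by code point without an extra index.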