Under the compact string model in Java 9 and later, a fairly frequent workflow is:
1) get external string
2) figure out if it is UTF-8, UTF-16, or some other recognizable encoding
3) validate the byte stream
4) figure out if the code points in the incoming string can be represented in Latin-1
5) instantiate a Java String using either the Latin-1 encoder or the UTF-16 encoder
I know some or all of these steps use HotSpot intrinsics, and that the JIT/VM then does inlining, folding and so on, but I wonder how fast a single custom assembly function that does all of these steps at once could be.
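To make the "all at once" idea concrete, here is a rough scalar sketch in C that fuses steps 3 and 4: it validates UTF-8 and, in the same pass, records whether every code point fits in Latin-1 (U+0000 to U+00FF). The names `utf8_scan` and `utf8_scan_result` are made up for illustration and are not part of the JDK or any library. Encoding detection (step 2) and the actual String construction (step 5) are left out, and a real version would be vectorized rather than byte-at-a-time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
typedef struct {
    bool valid;       /* input is well-formed UTF-8 */
    bool fits_latin1; /* valid, and every code point is <= U+00FF */
} utf8_scan_result;

static utf8_scan_result utf8_scan_invalid(void)
{
    utf8_scan_result r = { false, false };
    return r;
}

/* One pass over the buffer: validate UTF-8 and check Latin-1 range. */
utf8_scan_result utf8_scan(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    utf8_scan_result r = { true, true };
    size_t i = 0;

    while (i < len) {
        uint8_t b = buf[i];
        uint32_t cp;
        size_t n; /* number of continuation bytes expected */

        if (b < 0x80) { i++; continue; }                    /* ASCII */
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; n = 1; }
        else if ((b & 0xF0) == 0xE0)     { cp = b & 0x0F; n = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; n = 3; }
        else return utf8_scan_invalid(); /* stray continuation, C0/C1, or > F4 */

        if (len - i <= n) return utf8_scan_invalid();       /* truncated */

        for (size_t k = 1; k <= n; k++) {
            uint8_t c = buf[i + k];
            if ((c & 0xC0) != 0x80) return utf8_scan_invalid();
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values above U+10FFFF. */
        if (n == 2 && cp < 0x800) return utf8_scan_invalid();
        if (cp >= 0xD800 && cp <= 0xDFFF) return utf8_scan_invalid();
        if (n == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return utf8_scan_invalid();

        if (cp > 0xFF) r.fits_latin1 = false;
        i += n + 1;
    }
    return r;
}
```

If `fits_latin1` comes back true, the caller could narrow to one byte per character; otherwise it would transcode to UTF-16. Whether fusing the steps like this actually beats the existing intrinsics is exactly the open question, so treat it as a starting point for measuring, not an answer.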
Previous blog post on HN:
I wonder about the Joules per byte. AFAIK AVX units are quite expensive energy-wise.
What do Linux utilities like sed and awk use for text manipulation? They were very slow when I was changing a few table names in a SQL file.
I see a lot of applications trying to take advantage of SIMD, but what happens when you try to run them on systems that don't support these instructions? My guess is that you need to write multiple files, each taking advantage of a different set of instructions, and then dynamically figure out which one to use at runtime with cpuid. But isn't that cumbersome, and a way to inflate a codebase dramatically?
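For what it's worth, runtime dispatch does not have to mean many files. Below is a minimal sketch, assuming GCC or Clang on x86, that keeps a scalar and an AVX2 version of the same toy routine (counting non-ASCII bytes) in one file and picks one on first use. The function names are invented for the example; `__builtin_cpu_supports` and the `target` attribute are compiler extensions, so other compilers would need raw cpuid or their own equivalent. On non-x86 targets the sketch simply falls back to the scalar path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1
#include <immintrin.h>
#endif

typedef size_t (*count_fn)(const uint8_t *, size_t);

/* Portable fallback: a byte is non-ASCII when its high bit is set. */
static size_t count_non_ascii_scalar(const uint8_t *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        count += buf[i] >> 7;
    return count;
}

#ifdef HAVE_X86_DISPATCH
/* Compiled with AVX2 enabled for this function only. */
__attribute__((target("avx2")))
static size_t count_non_ascii_avx2(const uint8_t *buf, size_t len)
{
    size_t count = 0, i = 0;
    for (; i + 32 <= len; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(buf + i));
        /* movemask gathers the high bit of each of the 32 bytes. */
        count += (size_t)__builtin_popcount((unsigned)_mm256_movemask_epi8(v));
    }
    for (; i < len; i++) /* scalar tail */
        count += buf[i] >> 7;
    return count;
}
#endif

/* Ask the CPU once which implementation is safe to run. */
static count_fn resolve_count_non_ascii(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return count_non_ascii_avx2;
#endif
    return count_non_ascii_scalar;
}

/* Public entry point. The lazy initialization is not synchronized;
   a real library would resolve once at startup or use an atomic. */
size_t count_non_ascii(const uint8_t *buf, size_t len)
{
    static count_fn impl = NULL;
    if (impl == NULL)
        impl = resolve_count_non_ascii();
    assert(impl != NULL);
    return impl(buf, len);
}
```

Callers only ever see `count_non_ascii`, so the extra cost is one function pointer plus one more implementation per instruction set you care about. It is still more code to write and test than a single scalar version, so the concern about bloat is fair; the dispatch part itself tends to stay small.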