lcms speed

A note for other users of open-source color management systems who want more transform speed from the LittleCMS library:

Turning off the one-entry cache cuts runtime by 40%, unless you are transforming large blocks of identical pixels, which is the one case a one-entry cache actually suits.
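A minimal sketch of why a one-entry cache only pays off on uniform data. This is not lcms code: `expensive_transform`, `OneEntryCache` and `transform_pixels` are hypothetical names, and the arithmetic inside `expensive_transform` is a placeholder for real color math. Every pixel pays for the comparison against the cached entry, but only a run of repeated pixels gets anything back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts full evaluations so the effect of the cache can be observed. */
static unsigned long g_evals = 0;

/* Hypothetical stand-in for the expensive part of a color transform
   (in a real CMS, e.g. interpolation through a lookup table). */
static void expensive_transform(const uint16_t in[3], uint16_t out[3])
{
    g_evals++;
    out[0] = (uint16_t)(65535 - in[0]);
    out[1] = (uint16_t)(in[1] >> 1);
    out[2] = (uint16_t)(in[2] | 1);
}

/* One-entry cache: remembers only the most recent input pixel and its result. */
typedef struct {
    int      valid;
    uint16_t in[3];
    uint16_t out[3];
} OneEntryCache;

/* Transforms n 16-bit RGB pixels. Pass cache == NULL to disable caching. */
static void transform_pixels(const uint16_t *src, uint16_t *dst, size_t n,
                             OneEntryCache *cache)
{
    assert(n == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < n; i++) {
        const uint16_t *in = src + 3 * i;
        uint16_t *out = dst + 3 * i;

        if (cache == NULL) {              /* cache off: always evaluate */
            expensive_transform(in, out);
            continue;
        }
        /* Cache on: every pixel pays for this compare (plus, for a shared
           cache, any locking); it only pays back when the pixel repeats. */
        if (cache->valid && memcmp(cache->in, in, sizeof cache->in) == 0) {
            memcpy(out, cache->out, sizeof cache->out);
            continue;
        }
        expensive_transform(in, out);
        memcpy(cache->in, in, sizeof cache->in);
        memcpy(cache->out, out, sizeof cache->out);
        cache->valid = 1;
    }
}
```

On a buffer of N identical pixels the expensive path runs once with the cache and N times without it. On typical image data, where neighbouring pixels usually differ, it runs about N times either way, so the compare-and-copy is pure overhead.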

Replacing the general-purpose byte packing and unpacking functions with inline, encoding-specific equivalents cuts another 15% off the runtime.
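A minimal sketch of the difference, assuming a 16-bit working representation and 8-bit RGB input. `PixelFormat`, `unpack_generic`, `unpack_rgb8` and the buffer helpers are hypothetical names, not lcms functions. The generic version consults a format descriptor and branches per channel behind an indirect call per pixel; the specialized version hard-codes one encoding so the compiler can inline and unroll it. Packing is the mirror image and should benefit the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run-time format descriptor that a general-purpose
   unpacker has to consult for every pixel. */
typedef struct {
    int channels;        /* e.g. 3 for RGB */
    int bytes_per_chan;  /* 1 or 2 */
    int big_endian;      /* byte order, only used when bytes_per_chan == 2 */
} PixelFormat;

typedef const uint8_t *(*UnpackFn)(const PixelFormat *, const uint8_t *, uint16_t *);

/* General-purpose unpacker: decodes one pixel into 16-bit working values,
   branching on the descriptor for every channel. Returns the advanced pointer. */
static const uint8_t *unpack_generic(const PixelFormat *fmt,
                                     const uint8_t *p, uint16_t *w)
{
    assert(fmt->bytes_per_chan == 1 || fmt->bytes_per_chan == 2);
    for (int c = 0; c < fmt->channels; c++) {
        if (fmt->bytes_per_chan == 1) {
            w[c] = (uint16_t)(p[0] * 257);   /* widen 8 to 16 bits: 0xAB -> 0xABAB */
            p += 1;
        } else {
            w[c] = fmt->big_endian ? (uint16_t)((p[0] << 8) | p[1])
                                   : (uint16_t)((p[1] << 8) | p[0]);
            p += 2;
        }
    }
    return p;
}

/* Generic buffer loop: one call through a function pointer per pixel. */
static void unpack_buffer_generic(const PixelFormat *fmt, UnpackFn fn,
                                  const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        src = fn(fmt, src, dst + (size_t)fmt->channels * i);
}

/* Encoding-specific unpacker for 8-bit RGB: no descriptor, no branches,
   fixed channel count, so the compiler can inline and unroll it. */
static inline void unpack_rgb8(const uint8_t *p, uint16_t w[3])
{
    w[0] = (uint16_t)(p[0] * 257);
    w[1] = (uint16_t)(p[1] * 257);
    w[2] = (uint16_t)(p[2] * 257);
}

static void unpack_buffer_rgb8(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        unpack_rgb8(src + 3 * i, dst + 3 * i);
}
```

Both paths produce identical 16-bit values for 8-bit RGB input; the specialized one simply skips the per-pixel decisions the generic one has to make.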

Compound savings: 49%, or roughly a 2x speedup, which matches what someone once claimed on an lcms mailing list without providing the code.
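Assuming the 15% is measured against the runtime that remains after the cache is turned off (the reading under which the two figures compound to 49%), the arithmetic is:

```latex
\begin{aligned}
t_{\text{new}} &= (1 - 0.40)\,(1 - 0.15)\,t_{\text{old}} = 0.60 \times 0.85\,t_{\text{old}} = 0.51\,t_{\text{old}} \\
\text{savings} &= 1 - 0.51 = 0.49 \\
\text{speedup} &= \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1}{0.51} \approx 1.96
\end{aligned}
```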

Future work: cached performance could be improved further by observing that all the thread-safe locking I found in lcms-1.17 is unnecessary if you accept thread-local caches on the stack. Forget the locking and inline the cache comparisons. I had no need to implement this, so it remains theoretical.
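The future-work idea can be sketched as follows. It is untested against lcms, it assumes 16-bit RGB pixels, and every name in it (`eval_pixel`, `pixel_key`, `transform_stack_cached`) is hypothetical. The cache is a handful of local variables, so each thread that calls the function gets its own copy on its own stack and no lock is needed; the three channels are packed into one 64-bit key so the hit test is a single integer comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expensive per-pixel transform (stands in for LUT interpolation). */
static void eval_pixel(const uint16_t in[3], uint16_t out[3])
{
    out[0] = (uint16_t)(65535 - in[0]);
    out[1] = (uint16_t)(in[1] >> 1);
    out[2] = (uint16_t)(in[2] | 1);
}

/* Pack three 16-bit channels into one 64-bit key so the cache test is a
   single integer compare instead of a loop or memcmp. */
static inline uint64_t pixel_key(const uint16_t in[3])
{
    return (uint64_t)in[0] | ((uint64_t)in[1] << 16) | ((uint64_t)in[2] << 32);
}

/* The one-entry cache lives in local variables, i.e. on the calling
   thread's stack, so concurrent calls never share it and need no lock.
   Returns the number of cache misses (full evaluations) for illustration. */
static size_t transform_stack_cached(const uint16_t *src, uint16_t *dst, size_t n)
{
    uint64_t cached_key = 0;
    uint16_t cached_out[3] = {0, 0, 0};
    int have_cache = 0;
    size_t misses = 0;

    for (size_t i = 0; i < n; i++) {
        const uint16_t *in = src + 3 * i;
        uint16_t *out = dst + 3 * i;
        uint64_t key = pixel_key(in);
        if (!have_cache || key != cached_key) {   /* miss: evaluate and remember */
            eval_pixel(in, cached_out);
            cached_key = key;
            have_cache = 1;
            misses++;
        }
        out[0] = cached_out[0];
        out[1] = cached_out[1];
        out[2] = cached_out[2];
    }
    return misses;
}
```

The trade-off is that the cache no longer survives between calls, so the first pixel of every call is a miss; for buffers of any real size that is one extra evaluation per call.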

