Beyond multi-core parallelism: faster Mandelbrot with SIMD