Can Multimodal LLMs "See" Images? A Deep Dive with ASCII Art