Three Algorithms for YSH Syntax Highlighting

  • I personally find syntax highlighting an annoying distraction, but I know this is a minority (and unpopular) viewpoint. For me, it actually has negative value, especially if I find myself spending time troubleshooting it (which I have done extensively over the years) rather than actually working on the true problem at hand. I can't think of a single case where automatic syntax highlighting helped me solve a hard problem, but I have certainly wasted a lot of time futzing around with it.

    The vast majority of code you are reading is almost definitionally syntactically correct, unless you are in the process of editing it. In that case, syntax highlighting can provide a lightweight proxy for correctness, which I suspect is where much of the enthusiasm comes from. What I personally want is immediate feedback on the actual correctness of the code, and syntax is just a subset.

    That is not to say that highlighting is never useful. I just want it to be manual, like when you search for something with / in vim. Then the highlighted items actually pop and my eyes can go directly to the area I want to focus on. I immediately clear the highlighting as soon as I'm done because otherwise it creates a visual distraction.

    In my estimation, what we actually need more of are smarter, faster compilers that can immediately respond to edit changes and highlight only the problem areas in the code. Typically, this should be exactly where my cursor is. I should ideally be programming in a state where everything above the cursor can be assumed correct, but there might be a problem with the current word, which is helpfully reported to me exactly where my focus already lies.

  • Coarse parsing is really good for the basics in almost all programming languages. But it’s not good at semantic detail, even though editors like Vim try to put some in there. One of the most notable ones is splitting Identifier up by adding Function. These have routinely then been misused and inconsistently applied, with the result that historically a language like JavaScript would look completely different from C; I think there was some tidying up of things a few years ago, but can’t remember—I wrote a deliberately simple colorscheme that discards most of those differences anyway. Sometimes you’ll find Function being used for a function’s name at definition time; sometimes at call time too/instead; sometimes a `function` keyword instead.

    In many languages, it’s simply not possible to match function names in definitions or calls using coarse parsing. C is definitely such a language. A large part of the problem is when you don’t have explicit delimiting syntax. That’s what you need. Oils, by contrast, looks to say `proc demo { … }`, so you can look for the `proc` keyword.
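
    As a rough illustration of why an explicit keyword helps, here is a minimal Python sketch of a coarse matcher for `proc` definitions. The regular expression, the allowed name characters, and the helper name `find_proc_names` are assumptions made for this example; they are not the real YSH grammar or anything taken from oils.vim.

```python
import re

# Assumption: a definition is a line that starts with the keyword `proc`
# followed by a name. The name character set here is a guess for the sketch.
PROC_DEF = re.compile(r"^\s*proc\s+([A-Za-z_][A-Za-z0-9_-]*)", re.MULTILINE)


def find_proc_names(source: str) -> list[str]:
    """Return the names of proc definitions found by a coarse line match."""
    return PROC_DEF.findall(source)


sample = """\
proc demo {
  echo hi
}
echo proc   # not a definition: `proc` is an argument here
  proc inner-name (x) {
    echo $x
  }
"""

print(find_proc_names(sample))
```

    A matcher like this can still be fooled by a line inside a multi-line string that happens to start with `proc`, which is the kind of case coarse parsing has to handle separately. No comparably short pattern exists for C, where nothing marks the start of a function definition.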

    Vim’s syntax highlighting is unfortunately rather limited, and if you try to stretch what it’s capable of, it can get arbitrarily slow. It’s my own fault, but the Rust syntax files try to be too clever, and on certain patterns of curly braces after a few hundred lines, any editing can have multiple seconds of lag. I wish there were better tools for identifying what’s making it slow. I tried to figure it out once, but gave up.

    I’ve declared coarse parsing really good for the basics in almost all programming languages, and that explicit delimiting syntax is necessary. This leads to probably my least favourite limitation in Vim syntax highlighting: you can’t model indent-based mode switching. In Markdown, for example (keep the leading two spaces, they’re fine):

       Text
    
             Code
    
      1.  Text
    
             Vim says code, actually text
    
                     Code
    
    reStructuredText highlighting suffers greatly too, though it honestly can’t be highlighted correctly without a full parser (the appropriate mode inside the indented block can’t be known statically).

    This is a real problem for my own lightweight markup language too, which uses meaningful indentation.
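
    The Markdown case above can be sketched in a few lines of Python. This is a simplified model, not a CommonMark implementation: it assumes a single level of list nesting and ignores fenced blocks and lazy continuation lines. The function names `classify_naive` and `classify_with_context` are made up for the example. The point is only that the second version has to remember a column number between lines, which is the state the comment says Vim's syntax rules can't model.

```python
import re

# A list marker such as "1.", "2)", "-", "*" or "+", followed by spaces.
LIST_ITEM = re.compile(r"^( *)(\d+[.)]|[-*+])( +)")


def indent_of(line: str) -> int:
    return len(line) - len(line.lstrip(" "))


def classify_naive(lines):
    """Fixed rule: four or more leading spaces means code."""
    return ["code" if indent_of(l) >= 4 else "text" for l in lines if l.strip()]


def classify_with_context(lines):
    """Measure indentation relative to the enclosing list item's content."""
    content_indent = 0  # column where the current container's content starts
    labels = []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no label in this sketch
        indent = indent_of(line)
        marker = LIST_ITEM.match(line)
        if marker and indent < content_indent + 4:
            # A new list item: its content starts right after the marker.
            content_indent = len(marker.group(0))
            labels.append("text")
            continue
        if indent < content_indent:
            content_indent = 0  # dedent: we have left the list item
        labels.append("code" if indent - content_indent >= 4 else "text")
    return labels


sample = [
    "Text",
    "",
    "    Code",
    "",
    "1.  Text",
    "",
    "    Vim says code, actually text",
    "",
    "        Code",
]

print(classify_naive(sample))
print(classify_with_context(sample))
```

    The two results differ only on the fourth non-blank line, the paragraph that belongs to the list item.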

  • (author here) I just noticed this link doesn’t work on my iPad because of the captcha – this is the same content:

    https://github.com/oils-for-unix/oils.vim/blob/main/doc/algo...

  • It's so nice when an editor can do completely accurate syntax-highlighting for a language. I think there is a subconsciously disturbing effect in being presented with false-positive and false-negative colouring here and there, which is what traditional "good-enough" hacky syntax highlighting tends to produce.

  • vim’s syntax engine doesn’t track context. it matches tokens, not structure. in langs like ysh where command and expression modes mix mid-line, this breaks. no memory of nesting, no awareness of why you’re in a mode. one bad match and sync collapses. it’s not about regex power or file size. the engine just isn’t built to follow structure. stop layering hacks. generate semantic tokens outside, let vim just render them.
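
    To make "tracking context" concrete, here is a toy Python sketch of a tokenizer that keeps an explicit mode stack while scanning one line. It assumes only two modes and treats `$[ ... ]` as the sole way to enter expression mode, which is a large simplification; it is not the algorithm from the linked document, and the function name `mode_spans` is invented for the example.

```python
def mode_spans(line: str):
    """Split a line into (start, end, mode) spans using a mode stack.

    Toy model: "command" is the base mode, "$[" enters "expr" mode, and
    brackets inside an expression nest until the matching "]" closes it.
    """
    stack = ["command"]
    spans = []
    start = 0
    i = 0
    while i < len(line):
        if line.startswith("$[", i):
            if i > start:
                spans.append((start, i, stack[-1]))
            stack.append("expr")
            start = i
            i += 2
        elif line[i] == "[" and stack[-1] == "expr":
            stack.append("expr")  # nested bracket inside an expression
            i += 1
        elif line[i] == "]" and stack[-1] == "expr":
            stack.pop()
            i += 1
            if stack[-1] == "command":
                # The outermost expression just closed.
                spans.append((start, i, "expr"))
                start = i
        else:
            i += 1
    if start < len(line):
        spans.append((start, len(line), stack[-1]))
    return spans


line = "echo $[a[0] + 1] done"
for start, end, mode in mode_spans(line):
    print(mode, repr(line[start:end]))
```

    Because the stack remembers that the inner `]` closes a nested bracket rather than the expression, the word `done` is correctly reported as command mode. An external tool producing spans like these could hand them to the editor purely for rendering, which is the division of labour this comment argues for.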