Writing a minimal Lua implementation with a virtual machine from scratch in Rust

  • Since tokens are the leaves of the AST, there are a lot of them and they can take up a lot of space. To save memory, it is a good idea to store only a file location instead of a full token. Whenever token information is needed, just lex again from that file location to get the full token. This only works for languages with a context-free lexical syntax, of course (I'm not entirely sure "context-free" is the right term here, but you get what I mean).

    Storing row/column in file location data is wasteful - just a file offset should be enough. Whenever the row/column coordinates are needed (normally only in user messages) they can be quickly recomputed.

    In effect, parsed tokens can be stored as just an offset - a 4 or 8 byte integer.
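
    A minimal sketch of the idea, assuming byte offsets and a precomputed table of line starts. The names `LineIndex`, `TokenRef` and `relex_at` are made up for illustration, and the "re-lexer" only handles word-like tokens:

```rust
/// Maps byte offsets to 1-based (row, column) pairs on demand.
struct LineIndex {
    /// Byte offset at which each line starts; the first entry is always 0.
    line_starts: Vec<usize>,
}

impl LineIndex {
    fn new(source: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in source.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// Returns the 1-based row and 1-based byte column of `offset`.
    fn row_col(&self, offset: usize) -> (usize, usize) {
        // Number of line starts at or before `offset` is the 1-based row.
        let row = self.line_starts.partition_point(|&start| start <= offset);
        let col = offset - self.line_starts[row - 1] + 1;
        (row, col)
    }
}

/// A token stored as nothing but its starting byte offset (4 bytes).
#[derive(Clone, Copy)]
struct TokenRef(u32);

/// Hypothetical re-lexer: recovers an identifier, keyword or number
/// starting at `offset`. Returns "" if `offset` points at anything else.
fn relex_at(source: &str, offset: usize) -> &str {
    let rest = &source[offset..];
    let len = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    &rest[..len]
}

fn main() {
    let source = "local x = 1\nprint(x)\n";
    let index = LineIndex::new(source);
    let tok = TokenRef(12); // points at the `p` of `print`
    let (row, col) = index.row_col(tok.0 as usize);
    println!("{}:{} {}", row, col, relex_at(source, tok.0 as usize));
}
```

    The binary search keeps the row/column lookup at O(log lines), so it only costs anything when a message is actually printed.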

  • The article looks great and I’m looking forward to reading it; this comment is not a criticism of the article.

    This API is the only bad thing about Rust!

      .expect("Could not read file")
    
    It’s so unfortunate to have an API that reads

      .expect("thing we don’t expect")
    
    I think we should all just forget it’s there and use

      .unwrap_or_else(|_| panic!("thing we don't expect"))

  • While working on tokenization and parsing, there have been two "lights clicking on" moments that I think every dev working on a PL implementation should have:

    - Tokens are the leaves of your syntax trees

    - File locations are relative, not absolute

    It's easier to build a parser that doesn't buy into these things, but it's way harder to build tooling and good error messaging if you don't.
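
    One way to read those two points, as a rough sketch: every leaf is a token that stores only its width, and absolute offsets are computed by walking down from a known start. This assumes whitespace is kept as tokens too, so the widths add up; the `Node` type and its methods are invented for illustration:

```rust
/// A syntax tree whose leaves are tokens. Nodes store no absolute position.
enum Node {
    /// A leaf: the token kind and its width in bytes.
    Token { kind: &'static str, len: usize },
    /// An interior node: just its children, in source order.
    Tree(Vec<Node>),
}

impl Node {
    /// Width in bytes: a token's own length, or the sum of the children.
    fn width(&self) -> usize {
        match self {
            Node::Token { len, .. } => *len,
            Node::Tree(children) => children.iter().map(Node::width).sum(),
        }
    }

    /// Collects (kind, absolute offset) for each token, given where this
    /// node starts. Positions are derived, never stored.
    fn token_offsets(&self, start: usize, out: &mut Vec<(&'static str, usize)>) {
        match self {
            Node::Token { kind, .. } => out.push((*kind, start)),
            Node::Tree(children) => {
                let mut pos = start;
                for child in children {
                    child.token_offsets(pos, out);
                    pos += child.width();
                }
            }
        }
    }
}

fn main() {
    // `x = 1`, with whitespace kept as tokens.
    let stmt = Node::Tree(vec![
        Node::Token { kind: "name", len: 1 },
        Node::Token { kind: "space", len: 1 },
        Node::Token { kind: "=", len: 1 },
        Node::Token { kind: "space", len: 1 },
        Node::Token { kind: "number", len: 1 },
    ]);
    let mut out = Vec::new();
    stmt.token_offsets(100, &mut out); // pretend the statement starts at byte 100
    println!("{:?}", out);
}
```

    Because nothing in the subtree mentions byte 100, the same subtree can be reused unchanged if text earlier in the file is edited, which is what makes this shape attractive for tooling.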

  • Hey folks just saw this, author here. Happy to answer questions!

  • There's also Luster[1].

    [1] https://github.com/kyren/luster

  • Does Rust have computed goto, which really helps interpreter speed?

    It basically means you can do something like "goto opcode_table[*(++ip)];"

    GCC offers it as a non-standard extension to C.

      https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
    
    FORTRAN has had it since 1957. But Pascal and C purged "evil computed GOTO" and only offered non-computed goto. Then Java etc. purged non-computed goto.
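
    As far as I know, stable Rust has no labels-as-values equivalent. The usual substitute is a `loop` around a `match`, which the compiler is free to lower to a jump table, though it is still one shared dispatch point rather than a jump at the end of every handler. A minimal sketch with a made-up opcode set (`Op` and `run` are illustrative names, not from the article):

```rust
/// Hypothetical opcodes for a tiny stack machine.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

/// Runs `code` and returns the top of the stack at `Halt`.
/// Returns `None` on stack underflow or if execution runs off the end.
fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip = 0;
    loop {
        // Fetch, then dispatch through a single `match`.
        let op = *code.get(ip)?;
        ip += 1;
        match op {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_mul(b));
            }
            Op::Halt => return stack.pop(),
        }
    }
}

fn main() {
    // (2 + 3) * 4
    let code = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::Halt];
    println!("{:?}", run(&code));
}
```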

  • Thanks for sharing! A great learning
