Hacker News

Writing a minimal Lua implementation with a virtual machine from scratch in Rust

by finite_jest on 1/16/2022, 1:54:28 AM with 9 comments

by jstimpfle on 1/16/2022, 10:11:22 AM
Since tokens are the leaves of the AST, there are a lot of them and they can take up a lot of space. To save memory, it is a good idea to store only a file location instead of a full token. Whenever token information is needed, just lex again, starting at the file location, to get the full token. This only works for languages with a context-free lexical syntax, of course (I'm not entirely sure "context-free" is the right term here, but you get what I mean).
Storing row/column in file location data is wasteful - just a file offset should be enough. Whenever the row/column coordinates are needed (normally only in user messages) they can be quickly recomputed.
In effect, parsed tokens can be stored as just an offset - a 4 or 8 byte integer.
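A minimal sketch of that idea, assuming a toy lexer with only identifiers, numbers, and single-character symbols. The names `TokenRef`, `lex_at`, `token_offsets`, and `line_col` are made up for illustration and are not from the article:

```rust
/// Hypothetical token kinds for a toy language.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Ident(&'a str),
    Number(&'a str),
    Symbol(char),
}

impl<'a> Token<'a> {
    /// Length of the token in bytes.
    fn len(&self) -> usize {
        match self {
            Token::Ident(s) | Token::Number(s) => s.len(),
            Token::Symbol(c) => c.len_utf8(),
        }
    }
}

/// What actually gets stored: only a byte offset (4 bytes here).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TokenRef(u32);

/// Re-lex one token starting at byte offset `start`.
/// Returns None at end of input or on whitespace.
fn lex_at(src: &str, start: usize) -> Option<Token<'_>> {
    let rest = &src[start..];
    let first = rest.chars().next()?;
    if first.is_ascii_alphabetic() || first == '_' {
        let len = rest
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(rest.len());
        Some(Token::Ident(&rest[..len]))
    } else if first.is_ascii_digit() {
        let len = rest
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(rest.len());
        Some(Token::Number(&rest[..len]))
    } else if first.is_whitespace() {
        None
    } else {
        Some(Token::Symbol(first))
    }
}

/// Lex the whole input once, keeping only the start offset of each token.
fn token_offsets(src: &str) -> Vec<TokenRef> {
    let mut out = Vec::new();
    let mut pos = 0;
    while let Some(c) = src[pos..].chars().next() {
        match lex_at(src, pos) {
            Some(tok) => {
                out.push(TokenRef(pos as u32));
                pos += tok.len();
            }
            None => pos += c.len_utf8(), // skip whitespace
        }
    }
    out
}

/// Recompute 1-based (line, column) from a byte offset on demand.
/// The column is counted in bytes, which is an assumption of this sketch.
fn line_col(src: &str, offset: usize) -> (usize, usize) {
    let before = &src[..offset];
    let line = before.bytes().filter(|&b| b == b'\n').count() + 1;
    let col = match before.rfind('\n') {
        Some(newline) => offset - newline,
        None => offset + 1,
    };
    (line, col)
}
```

Calling `lex_at(src, r.0 as usize)` on a stored `TokenRef` gives back the full token, and `line_col` is only needed when a message is shown to the user. The trade-off is extra lexing work on each lookup in exchange for a smaller tree.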
by da39a3ee on 1/16/2022, 5:11:39 AM
The article looks great and I’m looking forward to reading it; this comment is not a criticism of the article.
This API is the only bad thing about Rust!
```
  .expect("Could not read file")
```
It’s so unfortunate to have an API that reads
```
  .expect("thing we don’t expect")
```
I think we should all just forget it’s there and use
```
  .unwrap_or_else(|_| panic!("thing we don’t expect"))
```
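For reference, on a `Result` the closure passed to `unwrap_or_else` receives the error value, so the two styles can be sketched side by side like this. The function names and the `path` parameter are hypothetical, chosen only for the example:

```rust
use std::fs;

/// Panics with "Could not read file: <error>" if the read fails.
fn read_with_expect(path: &str) -> String {
    fs::read_to_string(path).expect("Could not read file")
}

/// Same behaviour, but the panic message is built explicitly,
/// which also lets it mention the path.
fn read_with_unwrap_or_else(path: &str) -> String {
    fs::read_to_string(path)
        .unwrap_or_else(|err| panic!("Could not read file {}: {}", path, err))
}
```

Both panic on failure; the second is wordier but reads as "unwrap, or else panic with this message".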
by duped on 1/16/2022, 9:51:52 AM
Working on tokenization and parsing, there have been two "lights clicking on" moments that I think every dev working on a PL implementation should have:
- Tokens are the leaves of your syntax trees
- File locations are relative, not absolute
It's easier to build a parser that doesn't buy into these things, but it's way harder to build tooling and good error messaging if you don't.
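One possible reading of those two points, as a rough sketch: every leaf is a token (whitespace included), nodes store only their width, and absolute offsets are computed while walking down from the root. The `Node` type and its methods are invented for this example:

```rust
/// A syntax tree whose leaves are tokens. No node stores an absolute position.
enum Node {
    Token { text: String },
    Tree { children: Vec<Node> },
}

/// Small helper to build a token leaf.
fn tok(text: &str) -> Node {
    Node::Token { text: text.to_string() }
}

impl Node {
    /// Width in bytes: a token's own length, or the sum of the children.
    /// (A real implementation would probably cache this.)
    fn width(&self) -> usize {
        match self {
            Node::Token { text } => text.len(),
            Node::Tree { children } => children.iter().map(Node::width).sum(),
        }
    }

    /// Collect (absolute_offset, text) for every token. Each child's offset
    /// is the parent's offset plus the widths of its earlier siblings.
    fn tokens_with_offsets<'a>(&'a self, base: usize, out: &mut Vec<(usize, &'a str)>) {
        match self {
            Node::Token { text } => out.push((base, text.as_str())),
            Node::Tree { children } => {
                let mut offset = base;
                for child in children {
                    child.tokens_with_offsets(offset, out);
                    offset += child.width();
                }
            }
        }
    }
}
```

Because positions are derived rather than stored, changing the length of an early token does not invalidate anything recorded in later subtrees, which is the kind of property editor tooling tends to want.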
by eatonphil on 1/16/2022, 6:37:23 AM
Hey folks just saw this, author here. Happy to answer questions!
by xvilka on 1/16/2022, 4:29:11 AM
There's also Luster[1].
[1] https://github.com/kyren/luster
by cgoto89798 on 1/16/2022, 5:04:19 AM
Does Rust have computed goto, which really helps interpreter speed?
It basically means you can do something like "goto opcode_table[*(++ip)];"
GCC offers it as a non-standard extension to C.
https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
FORTRAN has had it since 1957. But Pascal and C purged "evil computed GOTO" and only offered non-computed goto. Then Java etc. purged non-computed goto.
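As far as I know, stable Rust has no computed goto or labels-as-values. The usual substitute is a `loop` around a `match` on the opcode, which the compiler is free to lower to a jump table. A minimal sketch with a made-up stack machine (the `Op` enum and `run` function are hypothetical, not the article's VM):

```rust
/// Hypothetical bytecode for a tiny stack machine.
#[derive(Debug, Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

/// Run the program and return the top of the stack at `Halt`.
/// Returns None on stack underflow or if execution runs off the end.
fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip = 0;
    loop {
        // Fetch the next instruction and advance the instruction pointer.
        let op = *code.get(ip)?;
        ip += 1;
        // Dispatch: one match arm per opcode instead of one label per opcode.
        match op {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_mul(b));
            }
            Op::Halt => return stack.pop(),
        }
    }
}
```

Unlike the `goto opcode_table[*(++ip)]` style, this has a single dispatch point at the top of the loop rather than one at the end of each handler. Whether that matters for speed depends on the compiler and the CPU, so measure before assuming.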
by debdut on 1/16/2022, 5:17:49 AM
Thanks for sharing! A great learning