What is a parser mismatch vulnerability? (2022)

  • I did research on parser differentials for my bachelor's thesis. My initial hope was that I would find a few mismatches for formats without a formal specification. I found mismatches for _every single_ pair of parsers I looked at and that included formats with formal specifications. My personal takeaway was "If you use one parser for validation and another parser for evaluation, you're fucked. No exceptions."

  • As the article mentions, Postel's Law is likely to create vulnerabilities. It makes individual systems more robust, but the whole becomes fragile.

  • > Well, these browsers "helpfully" fix the URL to change backslashes into regular forward slashes, I suppose because people sometimes type in URLs and get their forward and back slashes confused.

    More likely because Windows has historically used \ rather than the / that's standard in Unixish systems. Windows people are used to typing \, so it's indeed somewhat helpful for the browser to accept either (e.g., in file:// URLs).

  • Odd that the article doesn't use the more standard term "parser differential", with "differential fuzzing" as the fuzzing community's method for finding those.

  • This is a LANGSEC concept. A broader survey can be found at: https://www.computer.org/csdl/proceedings-article/spw/2023/1...

  • I guess if we add all the problems in IT that were caused by bugs and poor designs of parsers/serializations, e.g. SQL injections, XSS, null byte vulns etc., we get billions of human hours in damages.

    What should be instead is an absolutely clear serialization format into a byte string of ANY data structure that must processed by two different programs.

    Parsers are programs, they should "parse" bytes, not strings, like we humans do.

  • If BABLR succeeds in creating a shared instruction set for defining parsers, you'd just have portable parser grammars running on compatible parser VMs

  • Usually? a result of the parser not having a machine-readable specification.

    For parsing proper, `bison --xml` is useful if you're allergic to code-generation. I don't have an equivalent for lexing.

  • Honestly we should have a name for such class of bugs. It's not an "I didn't know" kind of mistake. Every person sufficiently intelligent to program should figure out by themselves that having 2 parser implementations can cause various undesired consequences.

  • Usually, some not verified and cleaned enough external input text managed to get into some complex and often brain damaged text parser (printf,sql,etc).