Should be the other way around: the LLM should check against the language spec to verify compliance.
Hey, I'm Aaron, co-founder of Mito. Funnily enough, "diff detection" in spreadsheets was pretty much the first thing we made when building Mito. We built Git for Excel to enable better collaboration around Excel models -- turns out Excel power users would rather play in single-player mode. So it's funny to be exploring spreadsheet difference detection again a few years later. This time, we're thinking about it purely in single-player mode, to understand the impact of LLM-generated code on your data.
Cool work! You and your team may be interested in these two recent CHI papers from Microsoft Research, both on very relevant topics to what you've been doing:
1) “What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models (https://arxiv.org/abs/2304.06597) -- they tackle a problem similar to the one you described above
2) On the Design of AI-powered Code Assistants for Notebooks (https://arxiv.org/abs/2301.11178) -- uses Mito as part of their case study