Ironically, one of the reasons markdown (and other text-based file formats) became popular is that you could use regular find/grep to analyze it, and version control to manage it.
My flow is to go through the Pandoc JSON AST and then use Jq. This works for other input formats, too.
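For anyone who hasn't looked at the Pandoc JSON AST, here's a minimal sketch of the shape you end up querying. The sample document is hand-written to mimic what `pandoc -t json` emits (it is not real output from a run), and `inline_text` and `headers` are made-up helper names. It only looks at top-level blocks and a few inline types.

```python
import json

# Hand-written sample in the shape `pandoc -t json` emits (an assumption for
# illustration; a real run would read this from Pandoc's output instead).
SAMPLE_AST = json.loads("""
{
  "pandoc-api-version": [1, 23],
  "meta": {},
  "blocks": [
    {"t": "Header", "c": [1, ["usage", [], []], [{"t": "Str", "c": "Usage"}]]},
    {"t": "Para", "c": [{"t": "Str", "c": "Run"}, {"t": "Space"}, {"t": "Str", "c": "it."}]},
    {"t": "Header", "c": [2, ["install", [], []], [{"t": "Str", "c": "Install"}]]}
  ]
}
""")


def inline_text(inlines):
    """Flatten a list of inline nodes to plain text (Str, Space, SoftBreak, Code only)."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
        elif node["t"] == "Code":
            # Code carries [attributes, text]
            parts.append(node["c"][1])
    return "".join(parts)


def headers(ast):
    """Return (level, text) for every top-level Header block."""
    return [
        (block["c"][0], inline_text(block["c"][2]))
        for block in ast["blocks"]
        if block["t"] == "Header"
    ]


print(headers(SAMPLE_AST))
```

The jq side of the same idea would be roughly `pandoc -t json example.md | jq '.blocks[] | select(.t == "Header")'`.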
Kind of aligned with this is MarkdownDB, providing an SQLite backend to your Markdown files [0]. Cool to see this, I feel the structure of .md files is not always equally respected or regarded as a data serialisation target.
I think you'd benefit from having some more real-world-ish examples in the README, as someone who doesn't intuit what I'd want to use this for.
Please don’t reimplement JQ. That problem is already solved. Instead, just provide a tool that can convert your target syntax into JSON, then it can be piped to JQ for querying.
Cool thanks for sharing! I'll have to check this out. I've wanted something similar.
After trying a bunch of the usual ones, the only "notes system" I've stuck with is just a directory of markdown files that's automatically committed to git on any change using watchexec.
I've wanted to add a little smarts to it so I could use it to track tasks (eg. sort, prune completed, forward incomplete tasks over to the next day's journal, collect tasks from "projects", etc.) so I started writing some Rust code using markdown-rs. Then, to round-trip markdown with changes, only the javascript version of the library currently supports serializing github flavored markdown. So then I actually dumped the markdown ast to json from rust and picked it up in js to serialize it for a proof of concept. That's about as far as I've got so far. But while markdown-rs saves position information, it doesn't save source token information (like, * and - are both list items) so you can't reliably round-trip.
FWIW, the other thing I was hoping to do was treat markdown documents as trees (based on headings) and use an xpath kind of language to pull out sections. Anyway, will check out your code, thanks for posting.
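To make the heading-path idea concrete, here's a rough sketch. The slash-separated selector syntax and the `select_section` name are invented for illustration, not taken from any existing tool. It only recognises ATX (`#`) headings and doesn't special-case fenced code blocks.

```python
import re

# ATX headings only: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def select_section(markdown, path):
    """Return the text under the heading chain named by `path`.

    `path` is a hypothetical slash-separated selector such as "Guide/Install".
    Nested sub-headings inside the selected section are kept in the output.
    """
    wanted = path.split("/")
    stack = []  # (level, title) for each currently open heading
    out = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            # A heading closes every open heading at the same or deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, title))
            titles = [t for _, t in stack]
            # Keep sub-headings of the selected section, not the section's own heading.
            if titles[: len(wanted)] == wanted and len(titles) > len(wanted):
                out.append(line)
            continue
        titles = [t for _, t in stack]
        if titles[: len(wanted)] == wanted:
            out.append(line)
    return "\n".join(out).strip()


doc = "# Guide\nintro\n## Install\npip it\n### Linux\napt\n## Usage\nrun\n# Other\nx"
print(select_section(doc, "Guide/Install"))
```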
Interesting; one thing you may have learned researching existing tools and libraries: many of them serialize markdown to html before running structured extraction/manipulation - even stuff like converting to pdf.
The core assumption here is that Markdown was/is designed to be serializable to html - this is why a markdown document/AST is mostly not a tree structure, even for tree-ish elements such as sub-sections. Instead, it is flat: an array of elements in order of appearance in the document. Apparently this most closely matches the structure of html, at both the block and inline levels. Only Lists and Blockquotes (afair) support nesting.
Ex: h1 -> paragraph -> h2 -> paragraph is not nested, it is an array of four ordered elements.
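If you do want sub-sections as a tree, the flat array folds into one with a small stack. A minimal sketch, assuming each block is a made-up `(kind, level, text)` tuple rather than any real parser's node type:

```python
def nest(blocks):
    """Fold a flat list of blocks into a tree keyed on heading level.

    Each block is a hypothetical (kind, level, text) tuple; level is 0 for
    non-heading blocks. The returned root is a synthetic level-0 node.
    """
    root = {"title": None, "level": 0, "body": [], "children": []}
    stack = [root]
    for kind, level, text in blocks:
        if kind == "heading":
            # Close every open section at the same or deeper level.
            # The root has level 0, so it is never popped.
            while stack[-1]["level"] >= level:
                stack.pop()
            node = {"title": text, "level": level, "body": [], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # Non-heading blocks belong to the innermost open section.
            stack[-1]["body"].append(text)
    return root


# The four-element example from above: h1 -> paragraph -> h2 -> paragraph.
flat = [
    ("heading", 1, "Title"),
    ("paragraph", 0, "intro"),
    ("heading", 2, "Details"),
    ("paragraph", 0, "more"),
]
tree = nest(flat)
```

Here the h2 ends up as a child of the h1, and each paragraph lands in the body of the section it follows.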
Anyway, you might throw a task at Cursor or Copilot to see how an equivalent implementation using html fares against your test suite; you may be able to develop more quickly.
Why not MD -> json, then use jq? That would be half a static site generator there!
Thanks for sharing! No immediate use-case for me right now, but good to know something like this exists.
I wanted to point out little nitpicks for the documented shell invocations:
cat example.md | mdq '# usage'
This can be changed into a stdin file redirect to avoid invoking an extra `cat` process (see Useless use of cat [1]): mdq '# usage' < example.md
In a similar fashion, you can avoid an extra `echo` process here: echo "$ISSUE_TEXT" | mdq -q '- [x] I have searched for existing issues'
by changing to this: mdq -q '- [x] I have searched for existing issues' <<< "$ISSUE_TEXT"
[1]: https://en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat
Thanks for sharing this, Yuval! Thanks as well for using permissive licenses so I can use this at work.
I worked on a project converting word docs to markdown so they could more easily be ingested into an LLM. One issue was that context windows used to be very short, so we would basically split on `\n#` to get sections, but this turns into a whole thing where you have to make guesses about which header level is appropriate to split at, and then you turn each section into a separate chunk in FAISS. Anyways, we ended up using HTML instead of MD because there's so much tooling for traversing HTML and not MD. This would have been helpful for that.
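For reference, a slightly less naive version of the `\n#` split might look like the sketch below. It is not what we actually ran; `split_sections` and `max_level` are names made up here. It splits only at headings at or above a chosen level and skips `#` lines inside fenced code blocks, with fence tracking that is a simple toggle and does not follow the full CommonMark rules.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s")
# A fence opener/closer: up to three spaces, then three backticks or tildes.
FENCE = re.compile(r"^\s{0,3}(`{3}|~{3})")


def split_sections(markdown, max_level=2):
    """Split markdown into chunks at headings of level <= max_level.

    Deeper headings stay inside their parent chunk. Lines inside fenced code
    blocks are never treated as headings.
    """
    chunks, current = [], []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_level and current:
            # Start a new chunk; flush what has accumulated so far.
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

Each returned chunk would then be what gets embedded and stored, so `max_level` becomes the one knob for the "which header level" guess.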
This is one of those moments where you come across a tool _just_ at the right moment. I have a task for which this will be perfect
I've always wanted a "literate programming" / jupyter-style notebook based on markdown. Maybe this could help make something like that possible.
Thanks! I have to grapple with some markdown across multiple repos and this'll be a helpful tool in the toolchest.
congrats on your tool, will check it out. I have a side question on markdown: cursor messes up markdown generation quite often for me. I think its responses are always in markdown with sections for code and asking it to generate markdown breaks it. So the question: any ideas on how to have cursor generate markdown?
How is it parsing? Just normal string and regex matching or transforming markdown to an intermediate structured language?
What purpose does this serve that grep doesn't?
Love this! One person's opinion - I’d change it to mq - fewer chars are always better for a command
> GitHub PRs are Markdown documents, and some organizations have specific templates with checklists for all reviewers to complete. Enforcing these often requires ugly regexes that are a pain to write and worse to debug
This is because GitHub is not building the features we need; instead they are putting their energy towards the AI land grab. Bitbucket, by contrast, has a feature where you can block PRs using a checkbox list outside of the description box. There are better ways to solve this first example from OP's README. Cool project. I write mainly MDX these days, so it would be cool to see support for that dialect.