Update: I admit I'm starting to wonder if this is something like a core law of contemporary computer design: source formats are textual and do not display imagery.
So, I came across some Markdown editors that almost get me there. But I'll probably end up designing my own workflow for limited HTML authoring.
This makes me wonder what a more image-native computer system could be like, where integration of imagery or even video is less of a third-party tooling question...
Plain HTML?
Compact file format. Contents & image links are as easy to edit as plain text ('cause it is). Images move around on the rendered page as desired, but stay in their place in the filesystem, where they can optionally be changed independently of the HTML that references them. If looks matter, go crazy & add some CSS.
Sorting such lists among each other is a few directory or file renames away.
Win-win.
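If you'd rather not type the image links by hand, here is a minimal sketch of the idea in Python. It is only an illustration under some assumptions: the function name build_page, the list of image suffixes, and the layout (the HTML file sitting next to the image directory) are all made up for this example. It lists the images in filename order, so renaming a file is what reorders the page.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote

# Assumption: which file suffixes count as images; adjust to taste.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}


def build_page(image_dir, title="Images"):
    """Return a plain HTML page that references every image in image_dir.

    The links are relative, so the page is assumed to be saved in the
    parent directory of image_dir. The images themselves are never touched.
    """
    directory = Path(image_dir)
    images = sorted(
        (p for p in directory.iterdir()
         if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES),
        key=lambda p: p.name,  # filename order: rename a file to move it
    )
    items = []
    for p in images:
        src = quote(f"{directory.name}/{p.name}")  # percent-encode spaces etc.
        alt = escape(p.stem, quote=True)
        items.append(f'  <li><img src="{src}" alt="{alt}"></li>')
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        f"<head><meta charset=\"utf-8\"><title>{escape(title)}</title></head>",
        "<body>",
        "<ul>",
        *items,
        "</ul>",
        "</body>",
        "</html>",
    ]
    return "\n".join(lines) + "\n"
```

Something like Path("index.html").write_text(build_page("images"), encoding="utf-8") would then write the page next to an images directory. The result is still plain HTML, so you can keep editing it by hand afterwards.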