# vmarkdown
## Why this shape
The public AST follows the DSL direction from your sketch:

- `Document` owns `[]BlockNode`
- `BlockNode` is a V sum type
- `InlineNode` is a V sum type

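One deliberate adjustment was made for production parsing: `ListItemNode.children` is `[]BlockNode` rather than `[]InlineNode`, because list items parsed by `md4c` can contain whole blocks (paragraphs, nested lists, code blocks), not just inline text.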
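
To make the shape concrete, here is a trimmed-down sketch of how such an AST can be declared with V sum types. It is illustrative only: the struct names mirror the ones above, but the fields, the reduced variant lists, and the `para` and `count_blocks` helpers are assumptions, not the definitions in `vmarkdown/ast.v`. Because `ListItemNode.children` holds blocks, a list item can contain a paragraph or another list.

```v
// Illustrative sketch only: fields and variant lists are assumptions,
// not the definitions in vmarkdown/ast.v.
struct TextNode {
	text string
}

struct StrongNode {
	children []InlineNode
}

type InlineNode = StrongNode | TextNode

struct ParagraphNode {
	children []InlineNode
}

// List items hold blocks, so a paragraph or a nested list can live inside one.
struct ListItemNode {
	children []BlockNode
}

struct ListNode {
	ordered bool
	items   []ListItemNode
}

type BlockNode = ListNode | ParagraphNode

struct Document {
	blocks []BlockNode
}

// Hypothetical helper: wraps plain text in a one-node paragraph.
fn para(text string) BlockNode {
	return ParagraphNode{
		children: [InlineNode(TextNode{ text: text })]
	}
}

// Counts every block in the tree, including blocks nested inside list items.
fn count_blocks(blocks []BlockNode) int {
	mut n := 0
	for b in blocks {
		n++
		match b {
			ParagraphNode {}
			ListNode {
				for item in b.items {
					n += count_blocks(item.children)
				}
			}
		}
	}
	return n
}

doc := Document{
	blocks: [
		para('intro'),
		BlockNode(ListNode{
			items: [
				ListItemNode{
					children: [para('first item')]
				},
			]
		}),
	]
}
println(count_blocks(doc.blocks))
```
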
## Layout

- `vmarkdown/ast.v`: AST types
- `vmarkdown/parser.v`: md4c-backed parser and event builder
- `vmarkdown/serialize.v`: normalized stable IDs, chunk collection, and in-memory incremental ingest
- `vmarkdown/render.v`: HTML, plain-text, and JSON renderers
- `vmarkdown/c/md4c_bridge.c`: thin callback adapter
- `thirdparty/md4c`: vendored upstream parser

## Quick Start

```v
import vmarkdown

doc := vmarkdown.parse('# hello\n\nworld')!
println(doc.stable_id())
```

Run the bundled example with:

```sh
v run examples/basic.v
```

Rendering helpers:

```v
import vmarkdown

markdown := '# hello\n\nworld'

html := vmarkdown.render_html(markdown)!
text := vmarkdown.render_text(markdown)!
json := vmarkdown.render_json(markdown)!
normalized_markdown := vmarkdown.render_markdown(markdown)!
markdown_from_html := vmarkdown.html_to_markdown(html)!
```

AST pretty printing:

```v
import vmarkdown
markdown := '# PollyDB\n\nA **structured** memory.'
doc := vmarkdown.parse(markdown)!
println(doc.pretty())
```

Example output:

```text
Document
├─ Heading(level=1) "PollyDB"
├─ Paragraph "A **structured** memory with a [link](https://example.com)."
├─ UnorderedList(start=1)
│ ├─ ListItem(level=1, number=0)
│ │ └─ Paragraph "first item"
│ └─ ListItem(level=1, number=0)
│ └─ Paragraph "second item"
└─ CodeBlock(lang="v") "println("hi")\n"
```

## Stable ID
There are now two encoding paths:

- `stable_id()` / `encode()`: uses the binary protocol intended for PollyDB-facing storage keys.
- `semantic_stable_id()` / `semantic_encode()`: uses the older normalized semantic byte stream, kept for comparison and debugging.

The binary protocol follows the type-tagged layout direction from your DSL notes. Current block tags are:

- `HeadingNode`: `0x01` + level (`u8`) + content_len (varint) + encoded inline data
- `ParagraphNode`: `0x02` + content_len (varint) + encoded inline data
- `ListNode`: `0x03` + is_ordered (`u8`) + item_count (`u16`) + start (`u16`) + encoded items
- `MetaNode`: `0x04` + kv_pairs_count (`u16`) + encoded key/value pairs
- `BlockquoteNode`: `0x05` + content_len (varint) + encoded child blocks
- `CodeBlockNode`: `0x06` + lang_len (varint) + lang + content_len (varint) + content
- `HorizontalRuleNode`: `0x07`

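A minimal sketch of this layout for three of the tags. Two details are assumptions because the list above does not pin them down: the varint is taken to be unsigned LEB128, and the heading's inline data is passed in as an already-encoded byte slice whose byte length is the content_len. The helper names (`put_varint`, `encode_heading`, `encode_code_block`, `encode_hr`) are hypothetical, not functions exported by vmarkdown.

```v
// Sketch only. Assumption: "varint" means unsigned LEB128, i.e. 7 payload
// bits per byte, low bits first, high bit set on every byte except the last.
fn put_varint(mut out []u8, value u64) {
	mut v := value
	for v >= 0x80 {
		out << u8((v & 0x7f) | 0x80)
		v >>= 7
	}
	out << u8(v)
}

// HeadingNode: 0x01 + level (u8) + content_len (varint) + inline data.
// Assumption: inline_data is already encoded; content_len is its byte length.
fn encode_heading(level u8, inline_data []u8) []u8 {
	mut out := [u8(0x01), level]
	put_varint(mut out, u64(inline_data.len))
	out << inline_data
	return out
}

// CodeBlockNode: 0x06 + lang_len (varint) + lang + content_len (varint) + content.
fn encode_code_block(lang string, content string) []u8 {
	mut out := [u8(0x06)]
	put_varint(mut out, u64(lang.len))
	out << lang.bytes()
	put_varint(mut out, u64(content.len))
	out << content.bytes()
	return out
}

// HorizontalRuleNode has no payload, only its tag byte.
fn encode_hr() []u8 {
	return [u8(0x07)]
}

println(encode_heading(1, 'PollyDB'.bytes()))
println(encode_code_block('v', 'println("hi")\n'))
println(encode_hr())
```

Because every field is either fixed-width or length-prefixed, a decoder can skip any node without understanding its payload, which is what makes the tag-first layout convenient for storage keys.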
Notes on stability:
- Plain text is normalized by collapsing repeated whitespace and trimming leading and trailing whitespace.
- Code text keeps internal spacing but normalizes newlines to `\n`.
- Structural changes change IDs.
- If the binary protocol changes in the future, previously computed `stable_id()` values will also change.
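
The two text-normalization rules can be sketched as follows. This is an approximation, not the library's code: which characters count as whitespace (here ASCII space, tab, and the line-break control characters) is an assumption, and `normalize_text` / `normalize_code` are hypothetical names.

```v
// Sketch only: the whitespace set below is an assumption.
// ASCII whitespace (space, \t, \n, \v, \f, \r); bytes of multi-byte
// UTF-8 characters never match, so they pass through untouched.
fn is_ascii_ws(c u8) bool {
	return c == 32 || (c >= 9 && c <= 13)
}

// Plain text: collapse each whitespace run into one space and trim both ends.
fn normalize_text(s string) string {
	mut out := []u8{}
	mut pending_space := false
	for c in s.bytes() {
		if is_ascii_ws(c) {
			// Record a gap only after some text, which trims the leading edge.
			pending_space = out.len > 0
			continue
		}
		if pending_space {
			out << u8(32)
			pending_space = false
		}
		out << c
	}
	// A gap still pending here is trailing whitespace and is dropped.
	return out.bytestr()
}

// Code text: keep all spacing, but map \r\n and lone \r to \n.
fn normalize_code(s string) string {
	return s.replace('\r\n', '\n').replace('\r', '\n')
}

println(normalize_text('  A   structured\n\tmemory  '))
println(normalize_code('a  b\r\nc\rd'))
```
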
## Markdown Render
`to_markdown()` and `render_markdown()` emit Markdown from the parsed AST:

- This is semantic round-trip, not source-exact round-trip.
- Output formatting is normalized.
- Original trivia like exact blank lines, marker style, or emphasis delimiter choice is not preserved.
- Tests cover the renderer on nested lists, blockquotes, mixed list-item blocks, complex link/image destinations, and code span/code fence delimiter safety.
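
Delimiter safety means picking backtick delimiters that the content cannot accidentally close. Under CommonMark, a closing code fence must be at least as long as the opening one, and a code span ends at the first backtick run of exactly its delimiter length. The sketch below shows one common strategy, using one backtick more than the longest run in the content; the helper names are hypothetical and this is not necessarily the renderer's exact approach.

```v
// Length of the longest run of consecutive backticks in s.
fn longest_backtick_run(s string) int {
	mut longest := 0
	mut current := 0
	for c in s.bytes() {
		if c == 96 { // 96 is the backtick character
			current++
			if current > longest {
				longest = current
			}
		} else {
			current = 0
		}
	}
	return longest
}

// Fenced code block: at least three backticks and longer than any backtick
// run in the body, so no line of the body can close the fence early.
fn code_fence_for(content string) string {
	mut n := longest_backtick_run(content) + 1
	if n < 3 {
		n = 3
	}
	return '`'.repeat(n)
}

// Code span: the delimiter is longer than any backtick run inside. Pad with
// one space on each side when the content touches a backtick, or when it
// already starts and ends with a space (CommonMark strips one such pair).
fn code_span(content string) string {
	delim := '`'.repeat(longest_backtick_run(content) + 1)
	touches_backtick := content.starts_with('`') || content.ends_with('`')
	space_wrapped := content.starts_with(' ') && content.ends_with(' ')
	not_all_spaces := content.trim(' ') != ''
	if touches_backtick || (space_wrapped && not_all_spaces) {
		return '${delim} ${content} ${delim}'
	}
	return '${delim}${content}${delim}'
}

println(code_fence_for('```v\nprintln("hi")\n```\n'))
println(code_span('a`b'))
```
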
## HTML To Markdown
`html_to_markdown()` converts HTML back to Markdown using V's `net.html` parser.

- Intended for clean HTML, especially the HTML produced by `render_html()`.
- Supports headings, paragraphs, blockquotes, lists, links, images, `pre`/`code`, `strong`/`em`, `hr`, and `br`.
- Unsupported tags are flattened, best effort, to their children/text.
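
Best-effort flattening can be pictured as a recursive walk: each supported tag wraps its children's output in Markdown syntax, and any other tag contributes only its children's output. The sketch uses a hypothetical three-field element type instead of the actual `net.html` node types and handles only `p`, `strong`, and `em`.

```v
// Hypothetical element type standing in for net.html nodes.
// An empty tag marks a text node.
struct El {
	tag      string
	text     string
	children []El
}

// Concatenates the Markdown produced for each child.
fn inner(e El) string {
	mut s := ''
	for child in e.children {
		s += to_md(child)
	}
	return s
}

// Supported tags add Markdown syntax; the else branch handles unsupported
// tags by dropping the tag itself and keeping its children's output.
fn to_md(e El) string {
	return match e.tag {
		'' { e.text }
		'p' { inner(e) + '\n\n' }
		'strong' { '**${inner(e)}**' }
		'em' { '*${inner(e)}*' }
		else { inner(e) }
	}
}

tree := El{
	tag: 'p'
	children: [
		El{
			tag: 'span'
			children: [El{ text: 'plain ' }]
		},
		El{
			tag: 'strong'
			children: [El{ text: 'bold' }]
		},
	]
}
print(to_md(tree))
```
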
Incremental ingest is available through the in-memory store:

```v
import vmarkdown

markdown := '# hello\n\nworld'

mut store := vmarkdown.new_memory_store()
result := store.ingest(markdown)!
println(result.root_id)
println(result.added.len)
println(result.reused.len)
```

If you want PollyDB to own the final write path, you can split ingest into planning and commit:

```v
import vmarkdown

markdown := '# hello\n\nworld'

mut store := vmarkdown.new_memory_store()
plan := vmarkdown.plan_ingest(markdown, store)!
result := vmarkdown.commit_ingest_plan(mut store, plan)!
println(plan.to_add.len)
println(result.root_id)
```

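The point of the split is that planning only reads the store, while committing is the only step that writes. A minimal sketch of that contract, under heavy simplification: the store is a hypothetical set of IDs, blocks are plain strings, and each block's text stands in for its `stable_id()`. `MemStore`, `Plan`, `make_plan`, and `commit` are illustrative names, not the library's types.

```v
// Hypothetical store: the set of stable IDs already persisted.
struct MemStore {
mut:
	ids map[string]bool
}

struct Plan {
	to_add []string // IDs that a commit would write
	reused []string // IDs already present (or repeated within this input)
}

// Stand-in for stable_id(): the block text itself serves as the key.
fn block_id(block string) string {
	return block
}

// Planning only reads the store; it never modifies it.
fn make_plan(blocks []string, store MemStore) Plan {
	mut to_add := []string{}
	mut reused := []string{}
	for b in blocks {
		id := block_id(b)
		if id in store.ids || id in to_add {
			reused << id
		} else {
			to_add << id
		}
	}
	return Plan{
		to_add: to_add
		reused: reused
	}
}

// Committing is the only step that writes to the store.
fn commit(mut store MemStore, plan Plan) {
	for id in plan.to_add {
		store.ids[id] = true
	}
}

mut store := MemStore{}
first := make_plan(['# hello', 'world'], store)
commit(mut store, first)
second := make_plan(['# hello', 'changed'], store)
println('${second.to_add.len} to add, ${second.reused.len} reused')
```

Since `second` is never committed, the store still holds only the two IDs from `first`; a caller such as PollyDB could inspect or reject a plan before any write happens.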
The ingest plan also exposes a pure semantic diff for top-level blocks:

```v
import vmarkdown

markdown := '# hello\n\nworld'
store := vmarkdown.new_memory_store()

plan := vmarkdown.plan_ingest(markdown, store)!
for entry in plan.diff {
	println('${entry.op} ${entry.path} ${entry.kind} ${entry.id}')
}
summary := plan.diff_summary()
for line in summary.lines {
	println(line)
}
```

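For top-level blocks, a diff keyed on stable IDs can be sketched with membership tests on the old and new ID lists. This is a simplified, hypothetical version (`DiffEntry`, `diff_top_level`, and the op names are assumptions). Because the comparison is by ID only, a block that merely moves keeps its ID and is reported as unchanged, and removed entries carry their path in the old document.

```v
struct DiffEntry {
	op   string // 'added', 'removed', or 'unchanged' (names are assumptions)
	path string
	id   string
}

// Compares two ordered lists of top-level block IDs by membership only.
fn diff_top_level(old_ids []string, new_ids []string) []DiffEntry {
	mut entries := []DiffEntry{}
	for i, id in new_ids {
		op := if id in old_ids { 'unchanged' } else { 'added' }
		entries << DiffEntry{
			op: op
			path: 'blocks[${i}]'
			id: id
		}
	}
	for i, id in old_ids {
		// Removed blocks are reported at their path in the old document.
		if id !in new_ids {
			entries << DiffEntry{
				op: 'removed'
				path: 'blocks[${i}]'
				id: id
			}
		}
	}
	return entries
}

for e in diff_top_level(['h1', 'p1', 'p2'], ['h1', 'p3']) {
	println('${e.op} ${e.path} ${e.id}')
}
```
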
Paths are recursive block paths, for example:

```text
blocks[0]
blocks[1].items[0].children[1]
```

When a nested structure changes, both the changed descendant and any affected ancestor containers can appear in the diff.
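
The path scheme and the ancestor rule can be sketched with a hypothetical node type that has `items` (for lists) and `children` (for list items). `collect_paths` emits one path per node, depth first; `ancestor_paths` lists every container above a given path, i.e. the containers that may also appear in the diff when that descendant changes. Both functions are illustrative and not part of vmarkdown.

```v
// Hypothetical node: lists use `items`, list items use `children`.
struct Node {
	kind     string
	items    []Node
	children []Node
}

// Appends "path kind" for every node, depth first.
fn collect_paths(nodes []Node, field string, prefix string, mut out []string) {
	for i, n in nodes {
		path := if prefix == '' { '${field}[${i}]' } else { '${prefix}.${field}[${i}]' }
		out << '${path} ${n.kind}'
		collect_paths(n.items, 'items', path, mut out)
		collect_paths(n.children, 'children', path, mut out)
	}
}

// Every proper dot-separated prefix of a path names an ancestor container.
fn ancestor_paths(path string) []string {
	parts := path.split('.')
	mut out := []string{}
	for i in 1 .. parts.len {
		out << parts[..i].join('.')
	}
	return out
}

doc := [
	Node{
		kind: 'paragraph'
	},
	Node{
		kind: 'list'
		items: [
			Node{
				kind: 'item'
				children: [Node{ kind: 'paragraph' }, Node{ kind: 'code' }]
			},
		]
	},
]
mut paths := []string{}
collect_paths(doc, 'blocks', '', mut paths)
for p in paths {
	println(p)
}
println(ancestor_paths('blocks[1].items[0].children[1]'))
```
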
## Notes

- The parser currently targets the core node types from your DSL sketch.
- `MetaNode` is kept in the AST for your PollyDB layer, but `md4c` does not emit it directly.
- Raw HTML, tables, and some extended spans are not yet projected into dedicated V nodes.