Esoteric topic only of interest to extreme Markdown nerds

agiletortoise · November 10, 2020, 8:34pm

Drafts currently ships with two Markdown engines:

MultiMarkdown 6. This is Fletcher Penney’s C implementation, which I have only made minor modifications to wrap the C code in suitable cross-platform frameworks. Currently version 6.5.2 ships in the app.
GHMarkdownParser. This is an implementation of GitHub-Flavored extensions to Markdown, built on the discount parser. This is also the same GitHub-Flavored parser used in Marked.

The MultiMarkdown parser is great, standard, and not changing, other than to be updated periodically to newer releases.

These engines have nothing to do with syntax highlighting, which is handled separately. They only affect Markdown output in previews, templates, and via scripting.

The GHMarkdownParser is fine. It’s a good, fast implementation, but it is not maintained and has a few quirks relative to GitHub’s own implementation. So I have planned for a while to switch to cmark-gfm, the C library actually used and maintained by GitHub.

There are a few differences; for example, GitHub’s version converts [ ] task marks to checkboxes. There are some different options available as well, but they are similar to those in GHMarkdownParser. And there are certainly syntactic variances, but it seems like a no-brainer to switch to the version GitHub uses, so as to be consistent and predictable with their parsing.
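To make the task-mark difference concrete, here is a minimal Python sketch of what "converting [ ] to a checkbox" means for a single list line. It is only an illustration: `render_task_item` and `TASK_ITEM` are hypothetical names, and the HTML it emits is an assumption, not the exact output of cmark-gfm or GHMarkdownParser.

```python
import html
import re

# Matches a list line that starts with a task marker, e.g. "- [ ] text" or "- [x] text".
TASK_ITEM = re.compile(r"^[-*+] \[( |x|X)\] (.*)$")


def render_task_item(line: str) -> str:
    """Return an HTML <li> for one Markdown list line (illustrative only)."""
    match = TASK_ITEM.match(line)
    if match is None:
        # Not a task item: strip the bullet and emit a plain list item.
        return f"<li>{html.escape(line.lstrip('-*+ '))}</li>"
    # An "x" or "X" inside the brackets marks the box as checked.
    checked = " checked" if match.group(1) in "xX" else ""
    text = html.escape(match.group(2))
    return f'<li><input type="checkbox" disabled{checked}> {text}</li>'
```

A parser without this behavior would leave the literal "[ ]" in the list item text, which is the kind of visible difference people might notice after a switch.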

So, long story already not short, the question is whether there is any compelling reason to keep GHMarkdownParser in the app. I would rather just swap it out and not have the app cluttered with deprecated libraries.

It might cause a few people to have to adjust to some minor Markdown syntax differences between the parsers, but I doubt it would even be noticed by most - and the cmark-gfm implementation is lightning fast and maintained.

Anyone think of any compelling reason not to just dump GHMarkdownParser and replace it with cmark-gfm?

sylumer · November 10, 2020, 8:53pm

To put this in context, here’s GitHub’s blog post on how (and why) they effectively did the same migration.

I don’t remember any firestorm coming from this, so I guess that the transition wasn’t particularly rough, though GitHub do note that there was a bit of incompatible existing content that they had to work around. That wouldn’t be an option for Drafts, as the content is in the user’s own iCloud data store, not some centralised AgileTortoise hosted database.

I would say as long as the change is sign-posted for users well in advance of the change to the public release, then this is just a natural progression in line with what GitHub themselves did.

technodad · November 11, 2020, 1:10am

I’d favor the switch. I have two use cases. The first is document production using MultiMarkdown, which accounts for 80% of my use. The second is the odd documentation page on GitLab, which “extends” the CommonMark specification. Overall, having CommonMark as one of the paths forward makes sense.

martinpacker · November 11, 2020, 8:37am

(I guess I’m “one of the usual suspects” so I consider myself rounded up) …

… One issue with GitHub Markdown is that the heading IDs for cross referencing work differently from how they work with most other Markdown processors.

One puts dashes between the words and lower cases them.
The other slams the words together.
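
For illustration, the two conventions can be sketched in Python as below. This is a rough approximation under stated assumptions: `hyphenated_id` and `collapsed_id` are hypothetical names, both lower-case the text, and both keep only ASCII letters and digits, which real Markdown processors may handle differently.

```python
import re


def hyphenated_id(heading: str) -> str:
    """Lower-case the heading and join its words with dashes."""
    words = re.findall(r"[a-z0-9]+", heading.lower())
    return "-".join(words)


def collapsed_id(heading: str) -> str:
    """Lower-case the heading and run its words together."""
    words = re.findall(r"[a-z0-9]+", heading.lower())
    return "".join(words)


print(hyphenated_id("Header IDs and Links"))  # header-ids-and-links
print(collapsed_id("Header IDs and Links"))   # headeridsandlinks
```

A link written against one form will not resolve against the other, which is why documents break when moved between processors.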

I don’t remember which way round it is. But is it feasible for the new GFM converter to make this easier?

(I have documentation that falls foul of this for various projects.)

But maybe this is an irrelevance.

In any case I would want Drafts to align itself with what GitHub is doing - as it is valuable to write in Markdown in Drafts and push to GitHub.

technodad · November 11, 2020, 1:23pm

I checked the GitLab Markdown documentation section on Header IDs and Links and it follows the GFM convention of converting spaces in IDs into hyphens.

Digging deeper, GitLab uses the CommonMarker library to render Markdown, which is “GitHub’s fork of the reference parser for CommonMark”. Sounds like this could be a good cross-check on rendering, and gives further evidence that CommonMark/GFM is a good alternate syntax choice.

Are we being esoteric enough now?

martinpacker · November 11, 2020, 1:30pm

Did we talk about [[ ... ]] internal links and refid’s yet?

Andreas_Haberle · November 11, 2020, 1:39pm

Yes please switch!

I think we can catch problems and issues during the Beta tests

Andreas_Haberle · November 11, 2020, 1:40pm

I guess that is not connected to the Parser at all.
This is part of the syntax definition.

Am I right?

martinpacker · November 11, 2020, 2:45pm

It’s related in that - either built in or by action - we’d want stuff linked to using this syntax to translate into valid refids/ids. Remember it’s possible to use the [[...]] syntax within a draft.

technodad · November 11, 2020, 10:35pm

Have to insert my usual rant about standards here, as Multimarkdown uses [header text] for auto linking to in-document Markdown headers, a feature I use pretty much daily.

martinpacker · November 12, 2020, 8:23am

Can I assume that doesn’t work with GFM?

(If it did work with GFM compatibly I could teach md2pptx to use it for inter-slide linking in the PowerPoint it generates. I could still do it if there were two competing link syntaxes but it would be messier.)

technodad · November 12, 2020, 1:16pm

I’ll check, but the spec suggests otherwise. The syntax for links on the same page is:
- This links to [a different section on the same page, using a "#" and the header ID](#header-ids-and-links).

GFM automatically converts Markdown headers to anchors as well, e.g.,

"# This header has spaces in it" generates a linkable anchor of "this-header-has-spaces-in-it", which you can then use as the target of an in-page link per above.

I prefer the simplicity of MultiMarkdown auto links if you are considering adding slide linking to md2pptx. The fact that the header text is the same as the link target makes it trivial to generate the links via scripts. (I do this when producing formal minutes documents, for example - I use BBEdit text factories to generate both the TOC and the section headers from a bulleted list of agenda topics.)
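
As a rough idea of what such a script could look like, here is a minimal Python sketch (not the BBEdit text factory itself). It assumes the "[Header Text][]" form that MultiMarkdown documents for automatic cross-references; `topics_from_bullets` and `minutes_skeleton` are hypothetical names.

```python
def topics_from_bullets(text: str) -> list[str]:
    """Collect the text of each "- " or "* " bullet line."""
    return [
        line[2:].strip()
        for line in text.splitlines()
        if line.startswith(("- ", "* "))
    ]


def minutes_skeleton(topics: list[str]) -> str:
    """Build a TOC of auto links followed by one section header per topic."""
    # The link target is just the header text, so no ID mangling is needed.
    toc = [f"- [{topic}][]" for topic in topics]
    sections = [f"## {topic}\n" for topic in topics]
    return "\n".join(toc) + "\n\n" + "\n".join(sections)


agenda = "- Budget\n- Any other business"
print(minutes_skeleton(topics_from_bullets(agenda)))
```

With hyphenated anchors, the same script would also need a function to derive each ID from its header text, and that function would have to match the renderer's rules exactly.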

martinpacker · November 12, 2020, 7:04pm

That works for me because - correct me if I’m wrong - I don’t have to mangle the heading text. (I’m not sure I could teach md2pptx to do it reliably, and my TOC generation in mdpre already has issues with this.)

I would just pick up headings and do a second sweep to fix up forward references.
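
A minimal sketch of that two-sweep idea, in Python. This is not md2pptx's actual code: `resolve_links` is a hypothetical name, the "[Header Text][]" link form is an assumption, and replacing a link with "text (section N)" is a stand-in for whatever a real tool would emit.

```python
import re

# ATX heading: one to six "#", the text, and optional closing "#"s.
HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")
# Auto link of the assumed form "[Header Text][]".
AUTO_LINK = re.compile(r"\[([^\]]+)\]\[\]")


def resolve_links(lines):
    """Resolve auto links, including ones that point at later headings."""
    # First sweep: number each unique heading text in order of appearance.
    targets = {}
    for line in lines:
        match = HEADING.match(line)
        if match:
            targets.setdefault(match.group(1), len(targets) + 1)

    # Second sweep: rewrite links whose text matches a known heading.
    def replace(match):
        text = match.group(1)
        if text in targets:
            return f"{text} (section {targets[text]})"
        return match.group(0)  # Unknown target: leave the link untouched.

    return [AUTO_LINK.sub(replace, line) for line in lines]
```

Because every heading is collected before any link is rewritten, a reference on the first line can point at a heading on the last.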

The difficult part - off topic for here - is coaxing python-pptx to give me the links. (I need that for other purposes, such as making glossaries and footnotes work better in md2pptx.)