Markdown italics and snake case

Shamelessly taking a slightly lazy way out for this one… my grasp of regular expressions is loose at the best of times…

Markdown italics are defined in the GitHub Markdown syntax as:

"match": "((?<![\\\\\\*_])(\\*|_)(?![\\\\\\*_])([\\S?].*?)(?<![\\\\|\\*])(\\2)(?![\\\\|\\*]))",

How would I refine the above so that the following doesn’t happen?

no_italics_here

i.e. italicisation of every other word in a snake cased phrase.
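To make the problem concrete, here is a small sketch that runs the pattern from the question through Python's `re` module. The JSON escaping has been removed. Python is only a stand-in here: the editor's own regex engine may differ in edge cases. Group 3 is the text the highlighter would italicise.

```python
import re

# The italics pattern from the question, with the JSON escaping removed.
ITALICS = re.compile(
    r"((?<![\\\*_])(\*|_)(?![\\\*_])([\S?].*?)(?<![\\|\*])(\2)(?![\\|\*]))"
)

for text in ["no_italics_here", "one_two_three_four_five"]:
    # group(3) is the span between the delimiters, i.e. what gets italicised
    print(text, "->", [m.group(3) for m in ITALICS.finditer(text)])

# no_italics_here -> ['italics']
# one_two_three_four_five -> ['two', 'four']
```

The opening and closing underscores only have to avoid touching a backslash, `*` or `_`. Letters on either side are allowed, so every other word in a snake-cased phrase is matched.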

I’m guessing I just need to ensure that the expression looks for a space preceding the opening underscore and after the closing underscore, but if anyone could save me some processing power by offering an assist, I’d be grateful!

Also, I’m assuming that such a modification wouldn’t break the way Markdown italics syntax is supposed to work?

You know, I would consider that a bug with the default syntaxes. Surprised no one has pointed that out, but if the Markdown parsers don’t output italics for that, the highlighting should not mark it.

Pretty easy fix. The regex pattern already uses a negative lookbehind and a negative lookahead to rule out certain characters before the opening delimiter and after the closing one. I think the easy fix is to add \w (any word character) to those groups, like:

((?<![\\\*_\w])(\*|_)(?![\\\*_])([\S?].*?)(?<![\\|\*])(\2)(?![\\|\*\w]))

That’s the un-escaped version. If you are editing the JSON directly in a text editor you’ll need to escape the backslashes.
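Here is a quick way to sanity-check the suggested fix outside the editor, again using Python's `re` as a stand-in for the editor's engine. The helper name `italic_spans` is hypothetical. `json.dumps` is used to show the backslash-escaped form you would paste into a JSON grammar file.

```python
import json
import re

# Suggested fix (unescaped): \w added to the opening lookbehind and the
# closing lookahead, so a delimiter touching a word character is ignored.
FIXED = re.compile(
    r"((?<![\\\*_\w])(\*|_)(?![\\\*_])([\S?].*?)(?<![\\|\*])(\2)(?![\\|\*\w]))"
)

def italic_spans(text):
    """Return the text each match would render in italics (group 3)."""
    return [m.group(3) for m in FIXED.finditer(text)]

print(italic_spans("no_italics_here"))         # []
print(italic_spans("some _italic_ text"))      # ['italic']
print(italic_spans("un*frigging*believable"))  # [] : mid-word * is lost too

# json.dumps doubles every backslash, giving the form for a JSON grammar file.
print(json.dumps(FIXED.pattern))
```

The last test case shows a side effect. Because the \w guard applies to both delimiters, mid-word `*` emphasis stops matching as well as mid-word `_`.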

I will update the built-in syntaxes to catch this, however.


Although, looking back at the Markdown spec, it explicitly says that emphasis in the middle of a word should be supported. So un*frigging*believable should render as unfriggingbelievable with "frigging" in italics… which all the Markdown parsers seem to support.

Yet none of them support un_frigging_believable. It comes out literally as un_frigging_believable, underscores and all, even though the spec says * and _ are interchangeable. :man_shrugging:

Markdown. Am I right?

The parsers also do not treat inline ** bold the same as italics.


Thanks! Gotta love Markdown, eh? :wink:

That’s fun. I guess the consensus is that mid-word emphasis has to use *, and a mid-word _ should be ignored. I think I’ll have to split the italic and bold syntax rules into separate * and _ rules to do this properly.
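A split along those lines might look roughly like the sketch below. It assumes a highlighter that applies two independent rules: `*` may open and close emphasis inside a word, and `_` may not. The names `STAR_ITALICS`, `UNDERSCORE_ITALICS` and `split_italic_spans` are hypothetical. Each pattern is simply one of the two from this thread with the delimiter group narrowed to a single character. Python's `re` stands in for the editor's engine.

```python
import re

# Hypothetical * rule: the original guards, so mid-word emphasis still works.
STAR_ITALICS = re.compile(
    r"((?<![\\\*_])(\*)(?![\\\*_])([\S?].*?)(?<![\\|\*])(\2)(?![\\|\*]))"
)

# Hypothetical _ rule: the \w guards, so a delimiter inside a word is ignored.
UNDERSCORE_ITALICS = re.compile(
    r"((?<![\\\*_\w])(_)(?![\\\*_])([\S?].*?)(?<![\\|\*])(\2)(?![\\|\*\w]))"
)

def split_italic_spans(text):
    """Collect italicised spans from the * rule first, then the _ rule."""
    spans = []
    for rule in (STAR_ITALICS, UNDERSCORE_ITALICS):
        spans.extend(m.group(3) for m in rule.finditer(text))
    return spans

print(split_italic_spans("un*frigging*believable"))  # ['frigging']
print(split_italic_spans("un_frigging_believable"))  # []
print(split_italic_spans("a *b* and _c_"))           # ['b', 'c']
```

The same idea would presumably carry over to the bold rules with `**` and `__` as the delimiters. That part is not shown here.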


This makes sense to me. It explains why there are different ways of italicising or emboldening text, which I’d never really considered before. Glad I asked the question!

Putting “Markdown” and “spec” next to one another is a crime against humanity.

Gruber’s Markdown and other older parsers do as his syntax documentation says. But lots of programmers who use intra_word_underscores in their code—and are too lazy to put them inside backticks—wanted an exception for that particular case, so parser writers added it. They usually point it out as a difference between their parser and Gruber’s, but it’s sometimes hard to find the reference.


Yet another reason I’m glad I don’t support underscores like this in md2pptx. :slight_smile:

(The claim is that anything you write for md2pptx is supported by other Markdown processors, but not vice versa: the mapping is not bijective.)