Pasting text from Mail seems to strip URLs

I’m making an action to extract news stories from the Elixir Radar digest into tasks for OmniFocus (basically just keeping busy during lockdown). I copy the text from the email and create a new draft as the first step, but it fails at the first hurdle. Here’s an example of the text (with some editing to get past the forum’s limit on the number of links for new users):

A Brief Guide to OTP in Elixir

Gints Dreimanis gives an overview of what OTP is, as well as a clear explanation of processes, GenServer, and Supervisor.

As you can see, lovely referral URLs are included. But when you paste this text into Drafts, you only get the title:

A Brief Guide to OTP in Elixir
Gints Dreimanis gives an overview of what OTP is, as well as a clear explanation of processes, GenServer, and Supervisor.

What am I doing wrong? I want the raw content of the clipboard, without any pre-processing. Thanks.

It looks like you are copying rich text, but then pasting it into Drafts which is plain text - so you are just getting the plain text part.

You would have to pre-process to extract the rich elements you want (the links) and turn them into additional plain text content.

Thanks for confirming my fears, @sylumer.

So I guess I’ll have to make a feature request. It seems that handling rich text would be a normal use case, especially since Drafts can handle Markdown and even its support forum supports pasting rich text into Markdown, as you can see from my question. Surely I’m not the first person who has wanted to do this kind of thing?

Anyway, life goes on, I just wanted to avoid hitting AppleScript or Swift or whatever since the main point of this exercise for me was to learn something about Drafts actions. Thanks for your help.

No problem, but note that you would face this with any plain-text editor. It isn’t really the sort of feature those applications would have. Usually the pre-processing happens either in the source app or somewhere in between, so that the content is shared in a valid plain text format such as Markdown.

For example, Safari can share multi-part content that Drafts is then able to take advantage of and get separate elements all in plain text.

Bit of help for anyone who does want to get links from the clipboard. Here’s a snippet (run it with some rich text on the clipboard):

$ pbpaste -Prefer public.rtf | grep HYPERLINK

The docs for pbpaste say to use “rtf”, but this just returns text. Switch to “public.rtf” and you get something useful to chew over with sed or something less archaic:

{\field{\*\fldinst{HYPERLINK "https://sendy.elixir-radar.com/l/2nPONT9TSV5oJoXU1azM763A/Zs763892RZY7QK8wmEOn31ECRg/T9jm1gfoSoU3TDqgUqnB2A"}}{\fldrslt
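If only the URLs are needed, the HYPERLINK fields can be reduced to bare links. This is a minimal sketch, assuming every link appears in the RTF as a `HYPERLINK "..."` field like the one above; the function name `extract_rtf_links` is made up for illustration.

```shell
# Hypothetical helper: read RTF on stdin and print one link target per line.
extract_rtf_links() {
  # grep -o prints only the matching HYPERLINK "..." parts, one per line;
  # sed then strips the keyword and the surrounding quotes.
  grep -o 'HYPERLINK "[^"]*"' | sed -e 's/^HYPERLINK "//' -e 's/"$//'
}

# On a Mac, with rich text on the clipboard:
#   pbpaste -Prefer public.rtf | extract_rtf_links
```

This throws away the link titles, so it is only useful when the URLs alone are enough.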

Is there any kind of open source library likely to run on iOS that could turn RTF into Markdown/HTML?

(I mention HTML as it might end up having to be a combination of the two - which is still Markdown and still usable in Drafts.)

I’m looking at textutil right now, @martinpacker, for the Mac. Not currently interested in the weird and wonderful world of iOS.

Edited:

This snippet outputs some well-formed HTML to the clipboard, which Drafts is happy to paste as plain text. Of course by the time I’ve made a script to do this I might as well do all the processing outside of Drafts…

pbpaste -Prefer public.rtf | textutil -cat html -stdin -stdout | pbcopy

And finally, here’s an all-encompassing bit of shellery to go from a clipboard with rich text to a clipboard with Markdown:

pbpaste -Prefer public.rtf | textutil -stdin -convert html -stdout | pandoc --from=html --to=markdown | pbcopy

Phew, that was fun. Hope this helps someone else working in the same problem space.

Edit: it’s quite bad Markdown! Working on that now. Publish early and often seems to be my way!

Thank you. I wondered after your first reply whether you would get to Markdown. It will be interesting to see the quality of the Markdown created.

Obviously some HTML can’t be rendered as Markdown. (In md2pptx I’ve had to adopt things like HTML <span> wrapping to get some effects. I try to minimise that, of course. And in that case I have automation to make the wrapping easier.)

While I get the gist of what you are doing in looking to simplify complex formatting to simpler Markdown formatting, it is worth noting that HTML is valid Markdown, so HTML in its native form technically counters the above statement.

Very aware of that. Take an example of a draft with <p>....</p> bracketing in it. You’d be bemused - and really not want to edit it in that form. And I think that sort of construct is rather common.

Ok, this is better (for my input):

pbpaste -Prefer public.rtf | textutil -stdin -convert html -stdout -excludedelements "(head,table,div,ul,p,font,span)" | pandoc --from=html --to=markdown | pbcopy

But there’s an errant opening bracket “[” for each link and some spurious .ul and \ action to handle. I might have reached the end of the easy bit of this since I can’t really exclude any more elements… Going to get some coffee and push on. Here’s the current output:

[[A Brief Guide to OTP in Elixir]{.ul}](https://sendy.elixir-radar.com/l/2nPONT9TSV5oJoXU1azM763A/Zs763892RZY7QK8wmEOn31ECRg/T9jm1gfoSoU3TDqgUqnB2A)\ [[serokell.io]{.ul}](http://serokell.io/)\ Gints Dreimanis gives an overview of what OTP is, as well as a clear explanation of processes, GenServer, and Supervisor.\
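One possible stopgap for output like this is to post-process it with sed. This is only a sketch, assuming the spurious markup is always a `[text]{.ul}` span and a trailing backslash at the end of a line, as in the output above; the function name `strip_ul_spans` is hypothetical.

```shell
# Hypothetical clean-up filter for pandoc's Markdown output on stdin.
strip_ul_spans() {
  # First expression: unwrap [text]{.ul} spans, keeping only the text.
  # Second expression: drop a hard-line-break backslash at the end of a line.
  sed -e 's/\[\([^][]*\)\]{\.ul}/\1/g' -e 's/\\$//'
}

# Example:
#   ... | pandoc --from=html --to=markdown | strip_ul_spans | pbcopy
```

Getting pandoc to emit cleaner Markdown in the first place is likely to be more robust than patching its output afterwards.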

And big thanks to @sylumer and @martinpacker for their input/hints.

Got it (change output format to MultiMarkdown):

pbpaste -Prefer public.rtf | textutil -stdin -convert html -stdout -excludedelements "(head,table,div,ul,p,font,span)" | pandoc --from=html --to=markdown_mmd | pbcopy

Gives:

[<u>A Brief Guide to OTP in Elixir</u>](https://sendy.elixir-radar.com/l/2nPONT9TSV5oJoXU1azM763A/Zs763892RZY7QK8wmEOn31ECRg/T9jm1gfoSoU3TDqgUqnB2A) [<u>serokell.io</u>](http://serokell.io/) Gints Dreimanis gives an overview of what OTP is, as well as a clear explanation of processes, GenServer, and Supervisor.

Which is at least adjacent to what I want.

Edit: if you add “,u” to the excluded elements list, you get exactly what I want, without the <u></u> wrapping the title.

I did some conversions for my web site’s migration last year. I found switching away from Pandoc Markdown to a different flavour (I have a feeling I used markdown_mmd) got me around quite a few spurious mis-conversions.

It may be worth checking your mileage with a different --to value.

Two minds with but one thought, our posts crossed in the night.

On the Mac, incorporating output from shell commands is possible using the ShellScript object, so if you have gotten the output you want worked out, you can work that into an action pretty easily.

Related to some of the side discussions in this thread, I have posted an example action for HTML > Markdown conversion using the Turndown library.

As for the original idea of getting the rich text on the clipboard in other forms, that is something on my list. I don’t think it would ever be part of the paste behavior, but it is something that could be done in scripted actions.

The ShellScript object definitely saved me from an afternoon of fighting with my least favourite language, AppleScript, so I salute you, oh Agile Tortoise!

Final script to polish the input into clean Markdown:

pbpaste -Prefer public.rtf | textutil -stdin -convert html -stdout -excludedelements "(head,table,div,ul,p,font,span,u)" | pandoc --from=html --to=markdown_mmd --wrap=preserve | pbcopy

The last change was to tell pandoc to preserve line-breaks. Otherwise it seemed to take delight in splitting the link titles. Now the final output looks like:

[A Brief Guide to OTP in Elixir](https://sendy.elixir-radar.com/l/2nPONT9TSV5oJoXU1azM763A/Zs763892RZY7QK8wmEOn31ECRg/T9jm1gfoSoU3TDqgUqnB2A)  

[serokell.io](http://serokell.io/)  

Gints Dreimanis gives an overview of what OTP is, as well as a clear explanation of processes, GenServer, and Supervisor.
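For reuse, the final pipeline can be wrapped in a shell function that checks for its tools first. This is a sketch rather than the published Action: the function name `rich_clipboard_to_markdown` and the up-front check are assumptions, while the pipeline itself is the one shown above.

```shell
# Hypothetical wrapper: convert rich text on the clipboard to MultiMarkdown.
rich_clipboard_to_markdown() {
  # Fail early, with a message on stderr, if any tool in the pipeline is missing.
  for tool in pbpaste textutil pandoc pbcopy; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done

  # Same pipeline as above: RTF -> trimmed HTML -> MultiMarkdown -> clipboard.
  pbpaste -Prefer public.rtf \
    | textutil -stdin -convert html -stdout \
        -excludedelements "(head,table,div,ul,p,font,span,u)" \
    | pandoc --from=html --to=markdown_mmd --wrap=preserve \
    | pbcopy
}
```

pandoc is not part of macOS and has to be installed separately, which is the main reason for the check.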

The simple Action is up in the directory, for the terminally-curious. It only does the Draft-creation part of my little project. I’ll do the OmniFocus part separately (maybe even just repurpose an existing Action) and then glue the two together so a theoretical other person can use the Markdown for their own nefarious reasons.

And the final piece of the puzzle (subject to a bit of light fettling) is in the directory now.

Elixir Radar (Markdown) to OmniFocus.

Ironically, OmniFocus only supports RTF pasting or obscure JavaScript for putting titled links into a Note, so I’ve effectively thrown away some functionality by doing all this (i.e. links lose their title when sent to OmniFocus via the x-callback-url mechanism).

Still, it was a valuable learning experience and may have other uses for me and others down the line.