[Source](https://brettterpstra.com/scripting-readability-markdownify-for-clipping-web-pages/ "Permalink to Scripting Readability and Markdownify for clipping web pages")
# Scripting Readability and Markdownify for clipping web pages
![read2text header image][1]
I wanted to share a handy tool that I realized I use daily but rarely talk about. I call it Read2Text, but it’s really just a Frankenstein script which combines [Python Readability][2] ([license][3]) with [html2text][4] ([license][5]). The combination allows you to grab web pages, process them with a port of [Arc90’s Readability][6] and convert the HTML to Markdown, ready for pasting or piping to a text file.
[nvALT][7] has this built in, but it’s been a little crashy lately. I find it more reliable to just do this from the command line. If you install it somewhere in your `PATH` (both the `read2text` script and the “readability” folder), you can run `read2text http://brettterpstra.com/keybinding-madness/ | pbcopy` to put the result on the clipboard.
You’ll get a Markdown-ified version of the page, with links, image links, headers, code blocks and text intact, but no comments, sidebars, ads, etc. It’s not perfect, but it does a solid job and cleanup only takes me a minute, even on huge sites. I use this most of the time instead of clipping to Evernote these days.
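For reference, the install step mentioned above might look something like the sketch below. The function name `install_read2text` and both default directories (`~/Downloads/read2text` for the unpacked download, `~/bin` for the install target) are assumptions made for illustration, not part of the original script, so adjust them to wherever you actually keep things.

```shell
# Hypothetical helper: copy the script and its "readability" folder
# into a directory that is on your PATH.
# Usage: install_read2text [source_dir] [bin_dir]
install_read2text() {
  local src="${1:-$HOME/Downloads/read2text}"  # assumed download location
  local bin="${2:-$HOME/bin}"                  # assumed install directory

  mkdir -p "$bin" || return 1
  # The script and the "readability" folder need to sit side by side.
  cp "$src/read2text" "$bin/" || return 1
  cp -R "$src/readability" "$bin/" || return 1
  chmod +x "$bin/read2text"

  # Add the install directory to PATH for this session if it is missing.
  case ":$PATH:" in
    *":$bin:"*) ;;
    *) PATH="$bin:$PATH"; export PATH ;;
  esac
}
```

Running `install_read2text` with no arguments uses the assumed defaults. To make the `PATH` change stick across sessions, add the install directory to `PATH` in your `.bash_profile`.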
I alias it to `rtt` in my `.bash_profile`, and often redirect the output straight to a text file in my nvALT folder (note the escaped space in the filename): `rtt http://grml.org/zsh/zsh-lovers.html > ~/Dropbox/Notes/nvALT2.1/zsh\ lovers.md`
Now I have a new note that automatically shows up in nvALT with the text of the zsh-lovers page (yeah, I tried switching to zsh this morning. I’ll have to come back to that). Anyway, I thought others might find this hack of use, so I’m making the download available below.
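If you clip into a notes folder a lot, the alias and redirect described above can be wrapped up in `.bash_profile`. This is only a sketch: the `clip_note` function and the `NOTES_DIR` variable are names invented for this example, and the default folder is simply the nvALT path used earlier.

```shell
# Short alias for interactive use.
alias rtt='read2text'

# Hypothetical helper: clip a URL into a Markdown note.
# Usage: clip_note URL "note title"
clip_note() {
  local url="$1"
  local title="$2"
  # NOTES_DIR is an assumed variable; it defaults to the nvALT folder.
  local notes_dir="${NOTES_DIR:-$HOME/Dropbox/Notes/nvALT2.1}"

  if [ -z "$url" ] || [ -z "$title" ]; then
    echo "usage: clip_note URL TITLE" >&2
    return 1
  fi

  # Call read2text directly rather than the alias, since aliases are not
  # expanded in non-interactive shells by default. Quoting the path keeps
  # titles with spaces in one piece.
  read2text "$url" > "$notes_dir/$title.md"
}
```

With that in place, `clip_note http://grml.org/zsh/zsh-lovers.html "zsh lovers"` writes `zsh lovers.md` into the notes folder without any manual escaping.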
#### Gather CLI v2.1.6
[Download Gather CLI v2.1.6][8]
A Frankensteinian combination of html2text and Arc90 Readability. This command line tool makes it simple to clip web pages into Markdown text without ads and comments.
Published 01/04/12.
Updated 09/18/23.
[Donate][9] • [More info…][10]
By the way, I also have [a web service][11] for this. You can get [raw markdown][12] or a [nice interface][13] for previewing and copying. There’s also an API and bookmarklets for integration into your favorite browser. Have fun!
[1]: /uploads/2012/01/read2textheader.jpg "read2text header"
[2]: https://github.com/gfxmonk/python-readability/blob/master/README
[3]: http://www.apache.org/licenses/LICENSE-2.0
[4]: http://www.aaronsw.com/2002/html2text/
[5]: https://github.com/aaronsw/html2text/blob/master/COPYING
[6]: http://lab.arc90.com/2009/03/02/readability/
[7]: http://brettterpstra.com/project/nvalt/
[8]: https://cdn3.brettterpstra.com/downloads/gather-cli-2.1.6.pkg
[9]: https://brettterpstra.com/donate/
[10]: http://brettterpstra.com/projects/gather-cli "More information on Gather CLI"
[11]: http://markdownrules.com/
[12]: http://fuckyeahmarkdown.com/go/?u=http%3A%2F%2Fbrettterpstra.com%2Fscripting-readability-markdownify-for-clipping-web-pages%2F&read=1
[13]: http://fuckyeahmarkdown.com/go/?read=1&showframe=1&u=http%3A%2F%2Fbrettterpstra.com%2Fscripting-readability-markdownify-for-clipping-web-pages%2F