[Source](https://brettterpstra.com/scripting-readability-markdownify-for-clipping-web-pages/ "Permalink to Scripting Readability and Markdownify for clipping web pages")

# Scripting Readability and Markdownify for clipping web pages

![read2text header image][1]

I wanted to share a handy tool that I realized I use daily but rarely talk about. I call it Read2Text, but it’s really just a Frankenstein script which combines [Python Readability][2] ([license][3]) with [html2text][4] ([license][5]). The combination allows you to grab web pages, process them with a port of [Arc90’s Readability][6] and convert the HTML to Markdown, ready for pasting or piping to a text file.

[nvALT][7] has this built in, but it’s been a little crashy lately. I find it more reliable to just do this from the command line. If you install both the `read2text` script and the “readability” folder in your path, you can run `read2text http://brettterpstra.com/keybinding-madness/ | pbcopy`. You’ll get a Markdown-ified version of the page, with links, image links, headers, code blocks and text intact, but no comments, sidebars, ads, etc. It’s not perfect, but it does a solid job and cleanup only takes me a minute, even on huge sites.

I use this most of the time instead of clipping to Evernote these days. I alias it in my `.bash_profile` to `rtt`, and often redirect the output straight to a text file in my nvALT folder (note the escaped space in the filename, without which the shell would split the path in two):

`rtt http://grml.org/zsh/zsh-lovers.html > ~/Dropbox/Notes/nvALT2.1/zsh\ lovers.md`

Now I have a new note that automatically shows up in nvALT with the text of the zsh-lovers page (yeah, I tried switching to zsh this morning. I’ll have to come back to that).

Anyway, I thought others might find this hack useful, so I’m making the download available below.

#### Gather CLI v2.1.6

[Download Gather CLI v2.1.6][8]

A Frankensteinian combination of html2text and Arc90 Readability. This command line tool makes it simple to clip web pages into Markdown text without ads and comments.
Published 01/04/12. Updated 09/18/23.

[Donate][9] • [More info…][10]

By the way, I also have [a web service][11] for this. You can get [raw markdown][12] or a [nice interface][13] for previewing and copying. There’s also an API and bookmarklets for integration into your favorite browser. Have fun!

[1]: /uploads/2012/01/read2textheader.jpg "read2text header"
[2]: https://github.com/gfxmonk/python-readability/blob/master/README
[3]: http://www.apache.org/licenses/LICENSE-2.0
[4]: http://www.aaronsw.com/2002/html2text/
[5]: https://github.com/aaronsw/html2text/blob/master/COPYING
[6]: http://lab.arc90.com/2009/03/02/readability/
[7]: http://brettterpstra.com/project/nvalt/
[8]: https://cdn3.brettterpstra.com/downloads/gather-cli-2.1.6.pkg
[9]: https://brettterpstra.com/donate/
[10]: http://brettterpstra.com/projects/gather-cli "More information on Gather CLI"
[11]: http://markdownrules.com/
[12]: http://fuckyeahmarkdown.com/go/?u=http%3A%2F%2Fbrettterpstra.com%2Fscripting-readability-markdownify-for-clipping-web-pages%2F&read=1
[13]: http://fuckyeahmarkdown.com/go/?read=1&showframe=1&u=http%3A%2F%2Fbrettterpstra.com%2Fscripting-readability-markdownify-for-clipping-web-pages%2F
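
The alias-and-redirect workflow described in the post can also be wrapped in a small shell function, so that note titles with spaces don’t need escaping by hand. This is only a sketch, assuming `read2text` is on your path as described above; the function name `clip_note` and the `NOTES_DIR` variable are made up for illustration and aren’t part of the download.

```shell
# Sketch only: `clip_note` and NOTES_DIR are hypothetical names, not part of
# read2text. Assumes the `read2text` command from the post is on your PATH.

# Short alias, as described in the post (goes in ~/.bash_profile).
alias rtt='read2text'

# Folder that nvALT watches; override by setting NOTES_DIR beforehand.
NOTES_DIR="${NOTES_DIR:-$HOME/Dropbox/Notes/nvALT2.1}"

# Usage: clip_note URL TITLE
# Clips URL to Markdown and saves it as "$NOTES_DIR/TITLE.md".
clip_note() {
  if [ "$#" -ne 2 ]; then
    echo "usage: clip_note URL TITLE" >&2
    return 2
  fi
  local url=$1 title=$2 tmp

  mkdir -p "$NOTES_DIR" || return 1

  # Write to a temp file first so a failed fetch doesn't leave an empty note.
  tmp=$(mktemp "${TMPDIR:-/tmp}/clip_note.XXXXXX") || return 1
  if read2text "$url" > "$tmp"; then
    # Quoting the destination keeps titles like "zsh lovers" in one piece.
    mv "$tmp" "$NOTES_DIR/$title.md"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

With that in place, `clip_note http://grml.org/zsh/zsh-lovers.html "zsh lovers"` should do the same job as the redirect shown earlier. Because the note starts life as a `mktemp` file, it ends up readable only by your own user, which seems fine for a personal notes folder.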