MobileRead Forums - View Single Post

totob · 08-27-2011, 06:23 AM

I spend an inordinate amount of my time reading scholarly articles on my ereader.
PDFs on e-ink are OK-ish, but not great (no reflow or bad reflow, formatting intended for print, sometimes a terrible choice of font, etc.). I'd like to convert my article collection to EPUB.
Converting from PDF to EPUB is notoriously finicky, so I'd like to convert the HTML available from the publisher's website instead, using a recipe to trim the fat (headers, footers, generally useless crap), convert tables, and use a decent-resolution image for each figure. From the GUI, I tried to use the newsreader interface to download and convert an article; however, unsurprisingly given how it was designed, it only accepts RSS feeds as input (or am I missing something?). I nonetheless managed to download the page in question, but I ended up with raw HTML in my EPUB. Apparently, not the right approach...
I could use wget to download the page (and maybe the CSS files?), but I would then need some additional processing to work out which image files to download and incorporate, which negates a big advantage of recipes. Another option might be to use web2disk to download the webpage with recursion limited to 1 or 2 levels, and then convert from HTML to EPUB?
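To make the "additional processing" in the wget route concrete, here is a minimal sketch of what it might look like, using only the Python standard library. It is not a calibre recipe and not how calibre does it internally; the names `ImageCollector` and `find_image_urls` and the example.org URL are hypothetical, and it assumes the figures appear as plain `<img src="...">` tags (high-resolution versions linked through `<a href>` would need extra handling).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img src=...> in a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are also routed here by HTMLParser.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative paths against the URL the page was downloaded from.
        url = urljoin(self.base_url, src)
        if url not in self.image_urls:
            self.image_urls.append(url)


def find_image_urls(html, base_url):
    """Return the de-duplicated image URLs of a page, in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.image_urls


if __name__ == "__main__":
    # Assumed sample input; in practice this would be the page saved by wget.
    page = '<p><img src="fig1.png"> <img src="/static/fig2.jpg"/></p>'
    for url in find_image_urls(page, "http://example.org/articles/a1.html"):
        print(url)
```

Each URL in the returned list could then be passed to wget in turn. This is exactly the bookkeeping I would rather leave to a recipe.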

So, to make a long story short: is there a way to use a recipe on a webpage that is not an RSS feed? Or is there another way to address my problem?

#1 · totob · Junior Member · Posts: 1 · Karma: 10 · Join Date: Aug 2011 · Device: pocket edge · Thread title: Converting scholarly articles