#1 |
Zealot
Posts: 135
Karma: 13892
Join Date: Mar 2010
Device: Ipad, Kindle 7
Quick and easy way to turn a website into a book?
Say I’ve used a tool to download my whole website and I have a bunch of .html files. I then want to turn these files into a pdf book with each page of the website linked so I can read it offline. These files don’t necessarily need to be in any order, though it would be nice if the file structure was the same. What’s the easiest way to do this in Calibre? Or is there another (OSX or online) tool appropriate for the job?
#2 |
Evangelist
Posts: 401
Karma: 1597305
Join Date: Mar 2010
Device: Ipod G4, MacOS 10.12, Calibre, Pocketbook Touch HD 3
I've done this to a lesser degree.
In theory, you could download the HTML files from a site (using something like SiteSucker). However, depending on the site, there's usually LOTS going on: ads etc. And a lot of (most?) sites these days aren't static HTML but are generated by a CMS (like this site).

Anyway, if the HTML is vanilla enough, you could download the site. Then I would create a new library* (especially if there are gazillions of pages) and import the files into the blank library. Then convert them into EPUBs. Then, using the EpubMerge plugin, merge them. Once you've got a merged ebook, you can move it into your "normal" library (if you wish).

Having said all that, there is lots that can go wrong. I would convert one HTML file first, view it, and check that it's actually readable. To be honest, I post-process any HTML I import into Calibre to remove menus, ads, images, formatting, styles etc. etc. So I try not to do it often.

That's how I'd do it. There may well be a scripted or easier way, but my programming skills are waaaaay out of date. Good luck!

* I create a new/temp library because I don't want to miss/overlook a file etc., and it's a way of quarantining. I usually delete my temp library after this sort of thing. Your mileage may vary.
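For anyone who would rather script the "convert each page to EPUB" step than click through it, here is a rough Python sketch. It assumes Calibre's `ebook-convert` command-line tool is installed and on your PATH. The function names and the output naming scheme (folder names joined with underscores) are made up for illustration, and as far as I understand the Calibre docs, `--max-levels 0` stops the HTML importer from following links into other pages. The merge itself would still be done with EpubMerge inside Calibre.

```python
from pathlib import Path
import subprocess

HTML_SUFFIXES = {".html", ".htm"}


def build_convert_commands(site_dir, out_dir):
    """Return one ebook-convert command (as an argument list) per HTML file."""
    site_dir, out_dir = Path(site_dir), Path(out_dir)
    commands = []
    for page in sorted(site_dir.rglob("*")):
        if not page.is_file() or page.suffix.lower() not in HTML_SUFFIXES:
            continue
        # Flatten "blog/post.html" to "blog_post.epub" so names don't collide.
        rel = page.relative_to(site_dir).with_suffix("")
        out_file = out_dir / ("_".join(rel.parts) + ".epub")
        # Assumption: --max-levels 0 converts this page only, no linked pages.
        commands.append(
            ["ebook-convert", str(page), str(out_file), "--max-levels", "0"]
        )
    return commands


def convert_all(site_dir, out_dir):
    """Run every conversion; stops at the first failure (needs Calibre)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_convert_commands(site_dir, out_dir):
        subprocess.run(cmd, check=True)
```

Calling `convert_all("downloaded_site", "epubs")` (both folder names are placeholders) should leave one EPUB per page in `epubs`, ready to import into a temp library. As suggested above, it is worth converting a single page by hand first and checking that the result is readable.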
#3 |
Zealot
I had used SiteSucker and was thinking of importing the HTML into BBEdit or TextWrangler, stripping the HTML out automatically, then combining the files and turning them into a PDF. It wouldn't keep the formatting and wouldn't be pretty.
Thanks for the pointers on merging with EpubMerge, will try that later.
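The strip-and-combine idea can also be sketched in Python with only the standard library, as an alternative to doing it by hand in a text editor. This is a minimal sketch, not a polished tool: the class and function names are made up, it assumes the pages are roughly UTF-8, and it only produces one big block of plain text (turning that into a PDF is a separate step).

```python
from html.parser import HTMLParser
from pathlib import Path

SKIP = {"script", "style"}  # contents of these tags are dropped entirely
BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects visible text, adding a line break after block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip all markup and return the text, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def combine_site(site_dir):
    """Concatenate the text of every HTML file, each under a path heading."""
    site_dir = Path(site_dir)
    chunks = []
    for page in sorted(site_dir.rglob("*")):
        if page.is_file() and page.suffix.lower() in {".html", ".htm"}:
            text = html_to_text(page.read_text(encoding="utf-8", errors="replace"))
            chunks.append(f"== {page.relative_to(site_dir).as_posix()} ==\n{text}")
    return "\n\n".join(chunks)
```

Something like `Path("site.txt").write_text(combine_site("downloaded_site"), encoding="utf-8")` (placeholder names) would write the combined text to one file. Menus and footers are text too, so they will still show up on every page.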
#4 |
Evangelist
I'm sure I had an app (retired, so is brain) that removed the fluffy HTML but left headings, bold, italic. I may have imagined it.
However, for removing ALL HTML, I use Clean Text. I'm not sure it does batch jobs though. Clean Text is excellent and does what it says on the tin.
I wonder if something like Brackets would clean up the HTML but not remove it completely?
Sorry I can't be more help. I feel your pain.
Edit: doesn't look like Brackets is helpful in this case...
Last edited by skb; 06-18-2019 at 05:24 PM. Reason: Brain snap
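The "remove the fluffy HTML but leave headings, bold, italic" behaviour can be approximated with a short Python script, in case no app turns up. This is only a sketch using the standard library's `html.parser`; the names are made up, and keeping `<p>` tags as well is my own assumption so that paragraphs survive. Everything else (divs, spans, attributes, scripts, styles) is dropped, while the text inside is kept.

```python
from html import escape
from html.parser import HTMLParser

# Tags that survive the cleanup; every other tag is removed (its text stays).
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "b", "strong", "i", "em"}
SKIP = {"script", "style"}  # these are removed together with their contents


class LightCleaner(HTMLParser):
    """Re-emits only whitelisted tags, without attributes, plus the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self._skip_depth += 1
        elif tag in KEEP and not self._skip_depth:
            self.out.append(f"<{tag}>")  # attributes (class, style) dropped

    def handle_endtag(self, tag):
        if tag in SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in KEEP and not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            # Re-escape &, < and > so the output is still valid HTML text.
            self.out.append(escape(data, quote=False))


def clean_html(html):
    """Return html with only headings, paragraphs, bold and italic kept."""
    cleaner = LightCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

Running `clean_html` over each downloaded file before importing it into Calibre would keep the basic structure. It will not remove menu or advert text, which still needs to be cut out by hand or with site-specific rules.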