MobileRead Forums > E-Book Software > Calibre

Old 06-16-2019, 12:38 PM   #1
bounce
Zealot
Posts: 135
Karma: 13892
Join Date: Mar 2010
Device: Ipad, Kindle 7
Quick and easy way to turn a website into a book?

Say I’ve used a tool to download my whole website and now have a bunch of .html files. I want to turn these files into a PDF book, with each page of the website linked, so I can read it offline. The files don’t necessarily need to be in any order, though it would be nice if the file structure were preserved. What’s the easiest way to do this in Calibre? Or is there another tool (OS X or online) better suited to the job?
Old 06-16-2019, 05:30 PM   #2
skb
Evangelist
Posts: 401
Karma: 1597305
Join Date: Mar 2010
Device: Ipod G4, MacOS 10.12, Calibre, Pocketbook Touch HD 3
I've done this to a lesser degree.

In theory, you could download HTML files from a site (using something like SiteSucker).

However, depending on the site, there's usually LOTS going on: ads etc. And a lot of (most?) sites these days aren't static HTML but are generated by a CMS (like this site).

Anyway, if the HTML is vanilla enough, you could download the site. Then I would create a new library* (especially if there are gazillions of pages) and import the files into the blank library. Then convert them into EPUBs. Then, using the EpubMerge plugin, merge them. Once you've got a merged ebook, you can move it into your "normal" library (if you wish).

Having said all that, there is lots that can go wrong. I would convert one HTML file first, view it, and check that it's actually readable.
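
The convert-then-check step can also be scripted. This is only a rough sketch, assuming calibre's command-line tool `ebook-convert` is installed and on the PATH; the folder names and the helper names `build_commands` and `convert_all` are made up for illustration. It builds the commands first, so a single file can be tried before converting the whole site.

```python
import subprocess
from pathlib import Path


def build_commands(src_dir, out_dir):
    """Return one ebook-convert command (as an argument list) per .html file."""
    src, out = Path(src_dir), Path(out_dir)
    commands = []
    for html in sorted(src.rglob("*.html")):
        # Mirror the site's folder layout under out_dir so file names don't collide.
        target = (out / html.relative_to(src)).with_suffix(".epub")
        commands.append(["ebook-convert", str(html), str(target)])
    return commands


def convert_all(src_dir, out_dir, limit=None):
    """Run the conversions; limit=1 converts a single file as a smoke test."""
    for cmd in build_commands(src_dir, out_dir)[:limit]:
        # Create the output folder for this file before calling calibre.
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Something like `convert_all("site", "epubs", limit=1)` would convert just the first page; if that one reads well in the calibre viewer, drop the limit and merge the resulting EPUBs with EpubMerge afterwards.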

To be honest, I post-process any HTML I import into Calibre to remove menus, ads, images, formatting, styles etc. So I try not to do it often.

That's how I'd do it - there may well be a scripted or easier way but my programming skills are waaaaay out of date.

Good luck!

* I create a new/temp library because I don't want to miss or overlook a file, and it's a way of quarantining. I usually delete my temp library after this sort of thing. Your mileage may vary.
Old 06-18-2019, 05:16 PM   #3
bounce
Zealot
I had used SiteSucker and was thinking of importing the HTML into BBEdit or TextWrangler, stripping the HTML out automatically, then combining the files and turning them into a PDF. That wouldn't keep the formatting and wouldn't be pretty.
Thanks for the pointers on merging with EpubMerge; I'll try that later.
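
For what it's worth, the strip-and-combine idea doesn't need a text editor. Here is a minimal sketch using only Python's standard library; the names `TextOnly`, `html_to_text` and `combine` are invented for this example, and it assumes the pages are UTF-8. It throws away all markup, including script and style contents, and joins the pages into one block of plain text.

```python
from html.parser import HTMLParser
from pathlib import Path

# Closing one of these tags ends a line in the plain-text output.
BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnly(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(markup):
    """Return the text of one page, one line per block, whitespace collapsed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def combine(src_dir):
    """Join the text of every .html file under src_dir, one titled chunk per file."""
    src = Path(src_dir)
    chunks = []
    for page in sorted(src.rglob("*.html")):
        text = html_to_text(page.read_text(encoding="utf-8", errors="replace"))
        chunks.append(f"== {page.relative_to(src)} ==\n{text}")
    return "\n\n".join(chunks)
```

Writing the result of `combine("site")` to a .txt file gives a single document that could then be converted to PDF. As noted, none of the formatting survives.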
Old 06-18-2019, 05:22 PM   #4
skb
Evangelist
I'm sure I had an app (retired, so is brain) that removed the fluffy HTML but left headings, bold, italic. I may have imagined it.
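
This is not that app, but the same idea (drop the fluff, keep headings, bold and italic) can be sketched in a few lines with Python's standard library. The tag lists and the names `KeepBasics` and `clean_html` are assumptions chosen for this example, not something from a real tool, and real sites will likely need the lists adjusted.

```python
from html import escape
from html.parser import HTMLParser

# Tags that survive (without their attributes); everything else is unwrapped.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "b", "strong", "i", "em"}
# Tags that are removed together with everything inside them.
DROP_WITH_CONTENT = {"script", "style", "nav"}


class KeepBasics(HTMLParser):
    """Rebuild the page using only the tags in KEEP."""

    def __init__(self):
        super().__init__()
        self.out = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip += 1
        elif not self._skip and tag in KEEP:
            self.out.append(f"<{tag}>")  # class, style and other attributes are dropped

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif not self._skip and tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            # Re-escape &, < and > so the output is still valid HTML text.
            self.out.append(escape(data, quote=False))


def clean_html(markup):
    """Return markup reduced to headings, paragraphs, bold and italic."""
    parser = KeepBasics()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Running each downloaded page through `clean_html` before importing it into Calibre would take care of menus, scripts and inline styles, though images and links are lost too with these tag lists.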

However, for removing ALL HTML, I use Clean Text. I'm not sure it does batch jobs though. Clean Text is excellent and does what it says on the tin.

I wonder if something like Brackets would clean up the HTML but not remove it completely?

Sorry I can't be more help. I feel your pain.

Edit: doesn't look like Brackets is helpful in this case...

Last edited by skb; 06-18-2019 at 05:24 PM. Reason: Brain snap
All times are GMT -4.

