Multi Slide HTML files

Archon · 01-17-2011, 05:55 PM

Hey all,

Occasionally, I run into a book file that is HTML and has each chapter as a separate slide.

Such as:

arryn.html
asoiaf.css
barath.html
greyjoy.html
im_map-north.png
im_map-south.png
lannis.html
martell.html
slide2.html
slide3.html
slide4.html
slide5.html
slide6.html
slide7.html
slide8.html
slide9.html
slide10.html
slide11.html
slide12.html
slide13.html
slide14.html
slide15.html
slide16.html
slide17.html
slide18.html
slide19.html
slide20.html
slide21.html
slide22.html
slide23.html
slide24.html
slide25.html
slide26.html
slide27.html
slide28.html
slide29.html
slide30.html
slide31.html
slide32.html
slide33.html
slide34.html
slide35.html
slide36.html
slide37.html
slide38.html
slide39.html
slide40.html
slide41.html
slide42.html
slide43.html
slide44.html
slide45.html
slide46.html
slide47.html
slide48.html
slide49.html
slide50.html
slide51.html
slide52.html
slide53.html
slide54.html
slide55.html
slide56.html
slide57.html
slide58.html
slide59.html
slide60.html
slide61.html
slide62.html
slide63.html
slide64.html
slide65.html
slide66.html
slide67.html
slide68.html
slide69.html
slide70.html
slide71.html
slide72.html
slide73.html
slide74.html
stark.html
targ.html
toc.html
tully.html
tyrell.html

I am on a Mac and have been using textutil in the terminal to concatenate the slides and then find and replace unwanted text.

Is there an easier way to import these files into Calibre?

TIA

Happy Monday
Archon

CazMar · 01-17-2011, 06:06 PM

This is probably a result of converting a scanned book from PDF to EPUB. Unfortunately, if OCR (optical character recognition) software is not used, the PDF file simply becomes a series of "pictures" of pages, much like scanning a series of photographs. PDF documents produced from a word processor treat the content as text, so they convert to EPUB quite easily. A lot of old books are scanned as "page images" rather than text, and I don't know if there is a solution. You could try searching Google Books or archive.org for the original PDF of the book and, if you have access to a good OCR program, try the conversion to text, but don't expect miracles!

toddos · 01-17-2011, 06:54 PM

You can create a new HTML page that links to each of those pages in order (kinda like a table of contents), and then import that HTML page. Calibre should find all of the rest that are linked and put them into the same zip, and when you convert to other formats like epub it should just do the right thing.
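For anyone who would rather not type out 70-odd links by hand, the linking page described above could be generated with a short script. This is only a sketch under a few assumptions: the reading order is taken to be the numeric order of the slideN.html files (the house pages such as stark.html or arryn.html would need to be added wherever they belong), and the names `build_index`, `write_index`, and the output name `index.html` are made up for illustration.

```python
import re
import sys
from html import escape
from pathlib import Path


def natural_key(name):
    """Sort key that puts slide2.html before slide10.html."""
    # re.split with a capturing group alternates text and digit runs,
    # so strings and ints always line up at the same positions.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def build_index(filenames, title="Contents"):
    """Return HTML for a page linking to each file in natural order."""
    ordered = sorted(filenames, key=natural_key)
    items = "\n".join(
        f'    <li><a href="{escape(name, quote=True)}">{escape(name)}</a></li>'
        for name in ordered
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</body>\n"
        "</html>\n"
    )


def write_index(folder, pattern="slide*.html", out_name="index.html"):
    """Write the linking page into the folder and return its path."""
    folder = Path(folder)
    # Assumption: only files matching the pattern belong in the book.
    names = [p.name for p in folder.glob(pattern) if p.name != out_name]
    out = folder / out_name
    out.write_text(build_index(names), encoding="utf-8")
    return out


if __name__ == "__main__":
    # Usage: python make_index.py /path/to/book/folder
    print(write_index(sys.argv[1] if len(sys.argv) > 1 else "."))
```

The numeric sort matters because a plain alphabetical sort would place slide10.html ahead of slide2.html. The resulting index.html is the page you would then add to calibre.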

DoctorOhh · 01-18-2011, 05:02 AM

Quote:

Originally Posted by Archon

toc.html

I am on a Mac and have been using textutil in the terminal to concatenate the slides and then find and replace unwanted text.

Is there an easier way to import these files into Calibre?

Generally, all you have to do is add the toc.html or the index.html to calibre, and it will gather the other linked HTML files into one zip file ready for conversion.
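The same thing can likely be scripted with calibre's `ebook-convert` command-line tool, which takes an input file and an output file. The sketch below assumes `ebook-convert` is on your PATH (on a Mac it may need to be added from inside the calibre application bundle) and that pointing it at toc.html pulls in the linked pages the same way adding the file through the GUI does. The helper names `convert_command` and `convert` and the output name `book.epub` are hypothetical.

```python
import shutil
import subprocess


def convert_command(entry_page, output_file):
    """Build the ebook-convert call: input file first, output file second."""
    return ["ebook-convert", str(entry_page), str(output_file)]


def convert(entry_page, output_file):
    """Run the conversion, failing early if calibre's CLI is not installed."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert was not found on PATH")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(convert_command(entry_page, output_file), check=True)


if __name__ == "__main__":
    # Assumption: run from the folder that contains toc.html.
    convert("toc.html", "book.epub")
```

The output format is picked from the extension of the output file name, so swapping `book.epub` for another supported extension should change the target format.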

Archon · 01-18-2011, 07:55 AM

Thanks toddos and dwanthny, that will save me a lot of time with these files.

And thanks to Kovid and all the developers for making it so easy to import a split file.

Archon
