Old 07-14-2010, 11:50 PM   #1
jUgGsY
Junior Member
jUgGsY began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jul 2010
Device: Aldiko App - Android 2.1 HTC Hero
Detecting Chapters in PDF -> ePub conversion

OK, so I've searched the forums for the past couple of hours, and I can't seem to find an answer that solves my particular problem.

The PDF file I have doesn't have the chapters tagged as h1 or h2, or at least Calibre isn't detecting them as such when I convert it to ePub. So when I just throw the ePub onto my phone, the loading is unbearable, as the ePub ends up split into only 6 xhtml files (and not at any logical spots).

Is there an XPath expression that somebody could cook up that would detect the chapters and split them properly? I know there are more tedious ways of doing it myself (opening the ePub in Sigil and inserting chapter breaks), but for 100+ chapters that would take a while.

From what I can tell, the chapter headings are just bold-faced text, e.g. "Chapter 12".



If you need any more information, I'll try and give it.
Old 07-15-2010, 02:37 AM   #2
toddos
Guru
toddos ought to be getting tired of karma fortunes by now.
 
Posts: 695
Karma: 822675
Join Date: May 2010
Device: Kobo Aura, Nokia Lumia 920 (Freda)
Run a conversion from PDF to epub with debugging turned on and then look at the intermediate HTML output (conversion goes something like PDF -> very rough HTML -> cleaned up HTML -> epub). Looking at the cleaned up HTML step should help you figure out how to find chapter breaks.
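
As a rough sketch of the debug run described above, the snippet below builds an `ebook-convert` command line using calibre's `--debug-pipeline` option, which writes the intermediate stages of the conversion into a folder. The file names `book.pdf`, `book.epub` and the folder name `debug` are placeholders, and the helper `build_debug_command` is just an illustrative name, not part of calibre.

```python
def build_debug_command(pdf_path, epub_path, debug_dir):
    """Return the argument list for a PDF -> ePub conversion that also
    saves the intermediate conversion stages into debug_dir."""
    return ["ebook-convert", pdf_path, epub_path, "--debug-pipeline", debug_dir]


# Placeholder file names; replace them with your own paths.
cmd = build_debug_command("book.pdf", "book.epub", "debug")
print(" ".join(cmd))

# To actually run it (requires calibre to be installed and on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

After the run, the HTML files in the debug folder are the ones to inspect for how the chapter headings are marked up.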

Or you could even take the intermediate HTML, modify it yourself to make chapters <h1> or <h2> elements, load the HTML into calibre, and convert to epub from the HTML rather than PDF.
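
If the intermediate HTML happens to be well-formed, one way to do that modification is with a small script instead of by hand. The following is only a sketch: it assumes the chapter headings are `<b>` elements whose whole text looks like "Chapter 12", that the markup parses as XML, and that the elements carry no XML namespace (real XHTML output usually does, so the tag names would need adjusting). The function name `promote_bold_chapters` is made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Matches text that consists only of "Chapter" followed by a number.
CHAPTER_RE = re.compile(r"^\s*Chapter\s+\d+\s*$", re.IGNORECASE)


def promote_bold_chapters(root, heading_tag="h2"):
    """Rename <b> elements that look like chapter titles to heading_tag.

    Returns the number of elements that were changed.
    """
    count = 0
    # list() so that renaming tags cannot interfere with the iteration.
    for elem in list(root.iter("b")):
        text = "".join(elem.itertext())
        if CHAPTER_RE.match(text):
            elem.tag = heading_tag
            count += 1
    return count


# Tiny stand-in for the intermediate HTML.
sample = (
    "<html><body><b>Chapter 12</b>"
    "<p>Some <b>bold</b> text.</p></body></html>"
)
root = ET.fromstring(sample)
changed = promote_bold_chapters(root)
result = ET.tostring(root, encoding="unicode")
print(changed, result)
```

Ordinary bold text inside paragraphs is left alone because it does not match the "Chapter N" pattern.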
Old 07-15-2010, 04:53 AM   #3
eping
ePub Maker
eping began at the beginning.
 
Posts: 120
Karma: 16
Join Date: Dec 2009
Location: Mordor
Device: iPad,Kindle 3, Nook 2
Maybe you can use a regular expression

If you have the HTML files converted from the PDF (every PDF is converted to HTML first, then to ePub), you can use a regular expression tool to replace all the chapter names with proper HTML heading markup.
For example, if your chapter headings have this structure:
<b>Chapter 12</b>
you can use "<b>(Chapter \d+)</b>" to match them all, and replace them with
"<h2>$1</h2>"
Either way, this takes some working knowledge of regular expressions.

And if your chapter names are marked up with varying HTML, a single pattern won't catch them all, and they can only be fixed manually.
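
The `<b>` to `<h2>` replacement above can be sketched in Python roughly like this. It assumes the headings appear exactly as `<b>Chapter N</b>` with no attributes or extra whitespace. Note that Python's `re` module writes the back-reference as `\1` where many editors use `$1`. The function name is only illustrative.

```python
import re

# Same pattern as above: a bold element containing "Chapter" and a number.
CHAPTER_PATTERN = re.compile(r"<b>(Chapter \d+)</b>")


def bold_chapters_to_headings(html):
    """Replace <b>Chapter N</b> with <h2>Chapter N</h2>."""
    return CHAPTER_PATTERN.sub(r"<h2>\1</h2>", html)


sample = "<b>Chapter 12</b><p>It was <b>not</b> a chapter.</p>"
converted = bold_chapters_to_headings(sample)
print(converted)
```

Once the headings are real `<h2>` elements, calibre's chapter detection should have something to work with when converting the edited HTML to ePub.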
All times are GMT -4.

