Old 10-04-2010, 08:32 PM   #1
penguintri
Enthusiast
penguintri began at the beginning.
 
Posts: 29
Karma: 10
Join Date: Apr 2010
Device: Paperwhite 2
TOC Creation Problem

Lately I have been having problems with TOCs in my converted books: the only entry is one for "Start".

I used to be able to run all my books through calibre and it would add TOCs. I have been able to fix this by changing all the chapter headings to <h> tags in Sigil, but this is long-winded and I'm sure I didn't have to do this before.

From reading this forum, I tried enabling preprocessing, as I thought it would detect the chapters and place them in <h> tags. However, this didn't appear to work and they just stayed in <p> tags.

One of my books that had a working TOC was all in bold. I edited just one line of the CSS in Sigil to remove the bold, and after that, even without conversion, Calibre showed "Start" as the only TOC entry.

Any help with this would be greatly appreciated.
Old 10-04-2010, 09:57 PM   #2
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Preprocessing isn't guaranteed to work. It tries to match the most common cases, but there are myriad ways that chapters can be marked. It also doesn't work for epub. If your source format is epub, try renaming the file to .zip instead of .epub, add it to the book record as a compressed HTML bundle, and convert that format instead.
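
The renaming step can also be scripted. This is only a minimal sketch: the function name `epub_to_zip_copy` is made up for illustration, and it copies the file rather than renaming it so the original epub stays untouched.

```python
import shutil
from pathlib import Path


def epub_to_zip_copy(epub_path):
    """Copy an .epub to a sibling .zip file and return the new path.

    An EPUB is already a zip archive, so only the extension changes.
    (Hypothetical helper, not part of Calibre.)
    """
    src = Path(epub_path)
    if src.suffix.lower() != ".epub":
        raise ValueError(f"expected an .epub file, got {src.name}")
    dst = src.with_suffix(".zip")
    shutil.copyfile(src, dst)  # the original file is left in place
    return dst
```

The resulting .zip is the file you would then add to the book record.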

Also note that preprocessing won't build the TOC. It just finds the most common types of chapter headings and wraps h2 tags around them; the XPath expression for chapter detection still needs to match the contents of those h2 tags.
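
To give a rough idea of what "wrapping h2 tags around chapter headings" means, here is a sketch of the general technique. It is not Calibre's actual preprocessing code: the pattern and the name `wrap_chapter_headings` are assumptions for illustration, and the pattern covers only one simple way of marking chapters.

```python
import re

# Hypothetical pattern: a <p> whose text starts with "Chapter <number or word>",
# optionally followed by a title. Real books mark chapters in many other ways,
# and this would also catch an ordinary sentence that begins with "Chapter 1".
CHAPTER_P = re.compile(
    r"<p[^>]*>\s*(Chapter\s+[\w-]+[^<]*?)\s*</p>",
    re.IGNORECASE,
)


def wrap_chapter_headings(html):
    """Replace <p>Chapter ...</p> paragraphs with <h2>Chapter ...</h2>."""
    return CHAPTER_P.sub(r"<h2>\1</h2>", html)


sample = "<p>Chapter 1</p><p>It was a dark night.</p><p>Chapter Two: Dawn</p>"
print(wrap_chapter_headings(sample))
# <h2>Chapter 1</h2><p>It was a dark night.</p><h2>Chapter Two: Dawn</h2>
```

Once the headings are h2 elements, the chapter-detection XPath has something to match against.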


If you open a bug with some examples I can see if your case is something that can be accommodated.
Old 10-05-2010, 05:37 AM   #3
penguintri
Ah, I didn't realize preprocessing didn't work with epubs. I'll have to try that when I get home from work.

What formats does preprocessing work with?

Thanks for your help.
Old 10-05-2010, 09:48 AM   #4
ldolse
At the moment it works for lit, html, rtf, txt, and lrf. I will be adding pdb in the next release. I wasn't concerned about epub because most of the bad epubs floating around were sourced from one of those formats.

However, the launch of the iPad and iBooks seems to have incentivized the darknet users enough to release torrents of crappy epubs lately (converted from the source formats above). The problem is that the bad epubs are already split into 260-300 KB chunks at arbitrary points, so unless I also look into recombining them into a single flow, they'll still be fairly crappy even after preprocessing.
Old 10-05-2010, 10:47 AM   #5
haegar333
Member
haegar333 began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Oct 2010
Device: Be-Book
H2 Tags

Quote:
Originally Posted by ldolse

Also note preprocessing won't build the TOC, it just finds the most common types of chapter headings and wraps h2 tags around them, but the xpath for chapter detection still needs to match the contents of those h2 tags.
Hi,
Just started with Calibre, and for the TOC generation I got stuck the same way as described here. I also have a fundamental problem understanding the logic of the preprocessing and XHTML conversion:

If not even the preprocessor can detect chapter headings, what's the point of a full-fledged XPath processor? And if h2 tags are inserted by the preprocessor, why do we then still need XPath to "modify" the found h2 tags? For what purpose?

My ASCII text test book also ends up in one or two <p> blocks without any recognition of chapter headlines. Am I stuck now?
Old 10-05-2010, 01:21 PM   #6
ldolse
If you're working from ASCII text you should try some of the different text input options. Preprocessing won't work unless you first choose the right text input option. Text Input defaults to assuming hard line breaks with an empty space between paragraphs.

It sounds like you need to enable the "Treat each line as a paragraph" option under text input.
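
The difference between the two ways of reading a text file can be sketched like this. It only illustrates the idea and is not Calibre's text input code; both function names are made up.

```python
import re


def paragraphs_blank_line_separated(text):
    """Hard-wrapped text: blank lines separate paragraphs; wrapped lines are joined."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(line.strip() for line in block.splitlines()) for block in blocks]


def paragraphs_one_per_line(text):
    """Every non-empty line is treated as its own paragraph."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# A file with no blank lines between paragraphs
text = "Chapter 1\nIt was a dark night.\nThe rain fell."

print(paragraphs_blank_line_separated(text))
# ['Chapter 1 It was a dark night. The rain fell.']
print(paragraphs_one_per_line(text))
# ['Chapter 1', 'It was a dark night.', 'The rain fell.']
```

With the first reading, the chapter heading is buried inside one big paragraph, so nothing downstream can pick it out as a heading. With the second, it ends up in a paragraph of its own.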

There are multiple stages in Calibre's conversion pipeline. Preprocessing is a very early stage, and it just does some additional reformatting of the doc. Chapter detection happens at a later stage, by which point Calibre has created, and expects, well-formed XHTML. It's at this point that you need to use XPath.

XPath works great without any preprocessing when you have well-formed HTML content, and there is plenty of well-formed content out there. ASCII txt is not an example of well-formed content...

Last edited by ldolse; 10-05-2010 at 01:26 PM.
Old 10-05-2010, 01:24 PM   #7
ldolse
If you want all h2 tags, just change the XPath to this:
Code:
//h:h2
Note that you can create variations on that quite easily by clicking the little magic wand next to the XPath textbox to launch the XPath wizard.
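
If you want to see what that expression selects outside of Calibre, here is a small sketch using Python's standard library. It assumes the `h:` prefix stands for the XHTML namespace, and the sample document is made up. ElementTree supports only a subset of XPath, so the expression is written as `.//h:h2` there.

```python
import xml.etree.ElementTree as ET

# Assumption: "h" is bound to the XHTML namespace
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h2>Chapter 1</h2>
    <p>It was a dark night.</p>
    <h2>Chapter 2</h2>
    <p>Chapter 3 is not a heading here.</p>
  </body>
</html>"""

root = ET.fromstring(doc)
# ".//h:h2" is ElementTree's spelling of "//h:h2": every h2 at any depth
headings = [h2.text for h2 in root.findall(".//h:h2", XHTML_NS)]
print(headings)  # ['Chapter 1', 'Chapter 2']
```

Only real h2 elements are picked up. The paragraph that merely mentions "Chapter 3" is ignored, which is why the headings have to be tagged before this stage.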
Old 10-06-2010, 09:31 AM   #8
haegar333
I have some basic knowledge of XPath, so yes, it's no problem to find all h2 tags in an XHTML file, assuming they have been created by the preprocessor. I understand now that if this is not the case, you cannot do anything useful with the built-in filters and XPath expressions, but rather need some extra preprocessing outside Calibre.

You are absolutely right that ASCII is not well-formatted input. Actually, my major input format would be PDF, but when Calibre fails there I could still use some external pdf2text tools and then easily modify the ASCII text.

Thus Calibre is not a full-fledged end-to-end converter, but can only be used as the final step of a longer conversion chain.
Old 10-06-2010, 08:36 PM   #9
ldolse
Check out the sample doc in this thread:
https://www.mobileread.com/forums/sho...d.php?t=100747

That might give you a better idea what preprocessing does for chapter headings.

All times are GMT -4.