08-04-2009, 05:22 AM | #1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2009
Device: sony prs-505
|
TOC not identifying all chapters
Hi,
I have just bought a Sony PRS-505 and downloaded the calibre software. My books are all in PDF, and I have converted some of them to EPUB. Because I am new to the software, I left the conversion settings at their defaults. I am quite happy with the results, apart from the chapters. When I do a conversion, it picks up chapters correctly up to a certain number and then stops. For example, in a book with 51 chapters I get chapters 1 to 20 (so that is fine), but after 20 I only get chapter 30, chapter 40 and chapter 50. It does this for most of my books, so it misses the chapters between 30 and 40, 40 and 50, and so on. Does anybody know why this is happening? Is there a setting that I need to change? |
08-05-2009, 12:07 AM | #2 |
Lord of the Universe
Posts: 670
Karma: 737849
Join Date: Jan 2008
Location: Maturin , Venezuela
Device: Sony Reader PRS-505 / PSP
|
Try modifying the "Table of Contents" and/or the "Look and Feel" settings. For that you'll need to read the XPath tutorial.
If that doesn't help, let me know. If the book is not in copyright, you might find it here or on another site already formatted for your Sony. If it's not in copyright and you can't find it, post it here and I'll test it. Btw, welcome to MobileRead. |
08-05-2009, 04:28 AM | #3 |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2009
Device: sony prs-505
|
OK, thanks catire. I will try that.
|
08-05-2009, 08:34 AM | #4 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
PDF conversion is error-prone. At one point in the process a regex attempts to identify the chapters and marks them up so that the XPath expression will catch them later. Editing the XPath from the default won't help with PDF.
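To illustrate the two stages described above, here is a minimal sketch using only the Python standard library. It is not calibre's code: the sample markup, the heading pattern and the `chapter` class name are all assumptions made up for the example. It shows why a heading that the regex stage misses can never be recovered by changing the XPath stage.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical converter output: chapter headings are plain paragraphs.
# The third heading is deliberately garbled, as PDF text extraction sometimes does.
raw = (
    '<body>'
    '<p>Chapter 1</p><p>It was a dark night.</p>'
    '<p>Chapter 2</p><p>The rain had stopped.</p>'
    '<p>Chap ter 3</p><p>Morning came slowly.</p>'
    '</body>'
)

# Stage 1: a regex rewrites heading-like paragraphs into marked-up headings.
# Pattern and class name are illustrative assumptions, not calibre's.
heading = re.compile(r'<p>\s*(Chapter\s+\d+)\s*</p>', re.IGNORECASE)
marked = heading.sub(r'<h2 class="chapter">\1</h2>', raw)

# Stage 2: an XPath-style query collects the marked headings for the TOC.
root = ET.fromstring(marked)
toc = [h.text for h in root.findall(".//h2[@class='chapter']")]

print(toc)  # ['Chapter 1', 'Chapter 2']
```

The garbled third heading is still an ordinary `<p>` after stage 1, so no query for marked headings in stage 2 will find it.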
You could open a bug on this and attach your PDF; I can look into enhancing the regex if possible. That said, this will continue to happen with other PDFs, as the conversion can cause numerous variations in the output text. Unfortunately, the best way to fix these in general is by hand. Now that Sigil is out, it is trivial to scroll through the book and mark the missed chapters so that they're added to the TOC. Note that if you're using calibre 0.6.x you'll need to wait a couple of days until a related crash fix is released for Sigil; watch for version 0.1.1.
Last edited by ldolse; 08-05-2009 at 08:39 AM. |
08-05-2009, 09:58 AM | #5 |
Junior Member
Posts: 8
Karma: 10
Join Date: Aug 2009
Device: sony prs-505
|
I will wait for the Sigil crash fix to come out.
Also, I was wondering how you enhance the regex for the PDF conversion. Which file do you open, and what do you edit? |
08-05-2009, 10:35 AM | #6 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
It's in preprocess.py. You'd need to download a local version, edit it, and then update your local install.
On OS X I'd use the following command from the directory containing the source file to update my calibre installation (fixed according to Kovid's input):
Code:
calibre-debug --update module calibre.ebooks.preprocess ./preprocess.py
These are the regexes doing the chapter detection there:
Code:
# Detect Chapters to match default XPATH in GUI
(re.compile(r'(?=<(/?br|p))(<(/?br|p)[^>]*)?>\s*(?P<chap>(<i><b>|<i>|<b>)?(Chapter|Epilogue|Prologue|Book|Part)\s*(\d+|\w+)?(</i></b>|</i>|</b>)?)(</?p[^>]*>|<br[^>]*>)\n?((?=(<i>)?\s*\w+(\s+\w+)?(</i>)?(<br[^>]*>|</?p[^>]*>))((?P<title>(<i>)?\s*\w+(\s+\w+)?(</i>)?)(<br[^>]*>|</?p[^>]*>)))?', re.IGNORECASE), chap_head),
(re.compile(r'(?=<(/?br|p))(<(/?br|p)[^>]*)?>\s*(?P<chap>([A-Z \'"!]{5,})\s*(\d+|\w+)?)(</?p[^>]*>|<br[^>]*>)\n?((?=(<i>)?\s*\w+(\s+\w+)?(</i>)?(<br[^>]*>|</?p[^>]*>))((?P<title>.*)(<br[^>]*>|</?p[^>]*>)))?'), chap_head),
Last edited by ldolse; 08-05-2009 at 01:20 PM. |
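One possible reading of the first regex, offered as a guess rather than a diagnosis: the `(\d+|\w+)?` part accepts at most one number or word after the keyword. If the chapter numbers in a book are spelled out (which the thread does not confirm), "Chapter Twenty" and "Chapter Thirty" would match, while "Chapter Twenty-One" or "Chapter Twenty One" would not. The sketch below uses a heavily simplified stand-in for the pattern, not the calibre regex itself, and `multi_token` is a hypothetical variant, not a tested patch for preprocess.py.

```python
import re

# Simplified stand-in for the quoted pattern (an assumption, NOT calibre's regex):
# a <p> tag, a keyword, then at most ONE number-or-word token before </p>.
single_token = re.compile(
    r'<p[^>]*>\s*(?P<chap>(Chapter|Epilogue|Prologue|Book|Part)\s*(\d+|\w+)?)\s*</p>',
    re.IGNORECASE,
)

# Hypothetical variant: allow further words joined by a space or a hyphen.
multi_token = re.compile(
    r'<p[^>]*>\s*(?P<chap>(Chapter|Epilogue|Prologue|Book|Part)\s*(\d+|\w+([ -]\w+)*)?)\s*</p>',
    re.IGNORECASE,
)


def detected(pattern, lines):
    """Return the heading text for every line the pattern recognises."""
    found = []
    for line in lines:
        match = pattern.search(line)
        if match:
            found.append(match.group('chap'))
    return found


# Made-up sample headings for illustration.
sample = [
    '<p>Chapter Twenty</p>',
    '<p>Chapter Twenty-One</p>',
    '<p>Chapter Twenty One</p>',
    '<p>Chapter Thirty</p>',
    '<p>Chapter 31</p>',
]

print(detected(single_token, sample))
print(detected(multi_token, sample))
```

With the single-token pattern only the one-word and numeric headings are detected; the variant picks up all five. Whether this is what is happening in a given book depends on what the PDF conversion actually produced, so it is worth checking the converted HTML before editing anything.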
08-05-2009, 11:29 AM | #7 |
creator of calibre
Posts: 44,334
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Actually in 0.6 the correct syntax is
Code:
calibre-debug --update module calibre.ebooks.preprocess ./preprocess.py |
|