#1
Connoisseur
Posts: 75
Karma: 77598
Join Date: Apr 2010
Device: Sony PRS-650, Hanvon h516, Kobo Aura HD
Text to ePub Chapter conversion
I turned on the heuristics to spot the chapters in some text files during conversion. I found that files of the format:

"I didn't intend it to sound as if I was shooting off firecrackers." She stood up. "Will you come with me? I'll talk while we're driving."

Chapter II

THEY rode north through the oyster-colored rain in a cream convertible roadster .............

worked fine. However, some of the text files have chapter titles like so:

It was a trap.

Chapter V. THE HOMELY ONE

IN his dangerous career.................

In this case the heuristics just aren't spotting the chapters. Any advice?
#2
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
You could use Markdown before conversion to identify formatting/structuring. Heuristics can then be switched off. Calibre is fully Markdown aware. I realise that this doesn't answer your issue directly but if you use plain text sources a lot then you may find Markdown generally useful.
Full details can be found at http://daringfireball.net/projects/markdown/ (You don't need to download Markdown, just understand the syntax, which is very simple.)

For a quick test: identify chapters in your book by starting each chapter line with the prefix "## " (hash-hash-space), then convert using:

Structure Detection
Insert page breaks before = //*[name()='h1' or name()='h2']

Table of Contents
Level 1 Toc = //h:h2

TXT Input
Paragraph style = "off" and Formatting style = "markdown".

(Textile formatting could also be used as calibre is also Textile aware. It's a bit more complicated than Markdown, but more flexible.)
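If there are a lot of chapters, the "## " prefixing can be scripted rather than done by hand. The following is only a rough sketch, not anything calibre provides: the function name `mark_chapters` and the pattern are assumptions, and the pattern assumes every chapter heading sits on its own line and starts with "Chapter" followed by a Roman numeral or digits. Because the match is anchored to the start of a line, the word "Chapter" inside a sentence is left alone.

```python
import re

# Assumption: a chapter heading is a whole line that begins with "Chapter",
# then a Roman numeral or digits, optionally followed by a period and a title.
CHAPTER_LINE = re.compile(
    r"^(Chapter[ \t]+(?:[IVXLCDM]+|\d+)\b.*)$",
    re.MULTILINE,
)


def mark_chapters(text: str) -> str:
    """Prefix each chapter heading line with '## ' (Markdown level-2 heading)."""
    return CHAPTER_LINE.sub(r"## \1", text)


# Sample text made up for illustration, loosely based on the excerpt above.
sample = (
    "It was a trap.\n"
    "\n"
    "Chapter V. THE HOMELY ONE\n"
    "\n"
    "IN his dangerous career he had skipped Chapter V of many books.\n"
)

marked = mark_chapters(sample)
print(marked)
```

The marked-up text can then be saved and converted with the settings listed above. A pattern this loose will also catch an ordinary line such as "Chapter I read last night", so it is worth skimming the result before converting.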
#3
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Heuristics can't catch all scenarios; there are infinite possibilities for the way chapters can be formatted. In your case the word 'chapter' is explicitly covered, so my guess is that the problem is related to the underlying HTML code.
A tangent to Agama's solution would be to cleanse the HTML by converting to text with Markdown or Textile, and then converting back from text to ePub/Mobi with heuristics enabled. In some cases this will eliminate whatever HTML formatting tripped up heuristics, while basic formatting like italics/bold is retained.
#4
Connoisseur
Posts: 75
Karma: 77598
Join Date: Apr 2010
Device: Sony PRS-650, Hanvon h516, Kobo Aura HD
Quote:
And while I get your statement about heuristics being difficult, I would have said that a line starting with the word 'Chapter' followed by a number (admittedly in Roman numerals) was a good candidate for a chapter break.
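To make that rule concrete, here is a small sketch of "a line starting with 'Chapter' followed by a number" as a regular expression. This is not how calibre's heuristics are written; the names `ROMAN`, `HEADING` and `is_chapter_heading` are invented for the example. It accepts strict Roman numerals or digits, with or without a trailing period and title, so both formats from the first post pass.

```python
import re

# Strict Roman numerals (1 to 3999). The lookahead rejects an empty numeral.
ROMAN = (
    r"(?=[MDCLXVI])M{0,3}(?:CM|CD|D?C{0,3})"
    r"(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
)

# "Chapter", a Roman numeral or digits, an optional period, an optional title.
HEADING = re.compile(
    rf"^\s*(?:Chapter|CHAPTER)\s+(?:{ROMAN}|\d+)\.?(?:\s+\S.*)?$"
)


def is_chapter_heading(line: str) -> bool:
    """Return True if the line looks like a chapter heading under this rule."""
    return HEADING.match(line.strip()) is not None


for candidate in [
    "Chapter II",
    "Chapter V. THE HOMELY ONE",
    "It was a trap.",
    "Chapter IIII",
]:
    print(candidate, "->", is_chapter_heading(candidate))
```

Even this narrow rule gives a false positive on a sentence like "Chapter I was the hardest to write", which is a fair illustration of why the heuristics can't be right every time.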
#5
Connoisseur
Posts: 75
Karma: 77598
Join Date: Apr 2010
Device: Sony PRS-650, Hanvon h516, Kobo Aura HD
Quote:
About this, I got the impression from the help that the XPath chapter detection strings only work on HTML. Is this not the case?
Last edited by dvd8n; 10-18-2012 at 11:22 AM.
#6
Connoisseur
Posts: 75
Karma: 77598
Join Date: Apr 2010
Device: Sony PRS-650, Hanvon h516, Kobo Aura HD
#7
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
Glad to hear it. Presumably you only changed "Chapter" where it was not embedded within a sentence; otherwise you would get a literal ## in front of those instead of triggering <h2> headings!
Most of my sources are plain text, which is what makes the combination of calibre and Markdown so useful. You can do quite a bit of formatting and structuring very quickly, especially if your text editor has good regex facilities (e.g. Notepad++, which is free). For a 2nd level Toc you can use a ### prefix and //h:h3. You can include links within your book (can be useful for footnotes) and references to images (though of course they won't display in the plain text, only in the ePub).
Last edited by Agama; 10-19-2012 at 06:41 AM.
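For the two-level Toc, a script can hand out both prefixes in one pass. This is just a sketch under an assumed layout: it supposes the book has "Part ..." lines as the top level with "Chapter ..." lines beneath them, which may not match your files. The names `LEVELS` and `mark_levels` and the sample headings are made up for the example.

```python
import re

# Hypothetical layout: "Part <numeral>" lines are level 1 of the Toc (## -> h2)
# and "Chapter <numeral>" lines are level 2 (### -> h3).
LEVELS = [
    (re.compile(r"^(Part[ \t]+(?:[IVXLCDM]+|\d+)\b.*)$", re.MULTILINE), "## "),
    (re.compile(r"^(Chapter[ \t]+(?:[IVXLCDM]+|\d+)\b.*)$", re.MULTILINE), "### "),
]


def mark_levels(text: str) -> str:
    """Prefix Part lines with '## ' and Chapter lines with '### '."""
    for pattern, prefix in LEVELS:
        # The default argument binds the current prefix to this lambda.
        text = pattern.sub(lambda m, p=prefix: p + m.group(1), text)
    return text


# Invented sample text, for illustration only.
sample = (
    "Part I\n"
    "\n"
    "Chapter I. THE TRAP\n"
    "\n"
    "Some text.\n"
    "\n"
    "Chapter II\n"
    "\n"
    "More text.\n"
)

marked = mark_levels(sample)
print(marked)
```

With text marked up this way, Level 1 Toc = //h:h2 and Level 2 Toc = //h:h3 should pick up the parts and chapters respectively.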