10-17-2012, 04:58 PM   #1
dvd8n
Connoisseur
Posts: 51
Karma: 77598
Join Date: Apr 2010
Device: Sony PRS-650, Hanvon h516, Kobo Aura HD
Text to ePub Chapter conversion

I turned on the heuristics to spot the chapters in some text files during conversion. I found that files of the format:



"I didn't intend it to sound as if I was shooting off firecrackers." She stood up. "Will you come with me? I'll talk while we're driving."


Chapter II

THEY rode north through the oyster-colored rain in a cream convertible roadster .............




worked fine. However, some of the text files have chapter titles like so:



It was a trap.


Chapter V. THE HOMELY ONE

IN his dangerous career.................




In this case the heuristics just aren't spotting the chapters.

Any advice?
10-18-2012, 03:33 AM   #2
Agama
Guru
Posts: 667
Karma: 436517
Join Date: Jul 2010
Location: UK
Device: PRS-300 (R.I.P.), PW2, Nexus7
You could use Markdown before conversion to identify formatting/structuring. Heuristics can then be switched off. Calibre is fully Markdown aware. I realise that this doesn't answer your issue directly but if you use plain text sources a lot then you may find Markdown generally useful.

Full details can be found at http://daringfireball.net/projects/markdown/
(You don't need to download Markdown, just understand the syntax - which is very simple.)


For a quick test:

Identify chapters in your book by starting each chapter line with the prefix "## " (hash-hash-space) then convert using:

Structure Detection

Insert page breaks before = //*[name()='h1' or name()='h2']

Table of Contents

Level 1 Toc = //h:h2

TXT Input

Paragraph style = "off" and Formatting style = "markdown".


(Textile formatting could also be used as calibre is also Textile aware. It's a bit more complicated than Markdown, but more flexible.)
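For anyone who prefers the command line, the settings above can be sketched as an `ebook-convert` call. This is only a sketch: the option names (`--formatting-type`, `--paragraph-type`, `--page-breaks-before`, `--level1-toc`) are my assumption of how the GUI fields map to the CLI, so check them against `ebook-convert --help` for your calibre version. The file names are placeholders, and the helper `build_convert_args` is hypothetical.

```python
import shlex


def build_convert_args(src, dst):
    """Return an ebook-convert argument list mirroring the GUI settings above.

    The option names are assumed to correspond to the GUI fields; verify
    them with `ebook-convert --help` before relying on this.
    """
    return [
        "ebook-convert", src, dst,
        # TXT Input: Formatting style = markdown, Paragraph style = off
        "--formatting-type", "markdown",
        "--paragraph-type", "off",
        # Structure Detection: Insert page breaks before
        "--page-breaks-before", "//*[name()='h1' or name()='h2']",
        # Table of Contents: Level 1 Toc
        "--level1-toc", "//h:h2",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; nothing is executed here.
    print(shlex.join(build_convert_args("book.txt", "book.epub")))
```

The function only builds the argument list; pass it to `subprocess.run` yourself once you have confirmed the options.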
10-18-2012, 07:13 AM   #3
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Heuristics can't catch every scenario; there are endless ways a chapter heading can be formatted. In your case the word 'chapter' is explicitly covered, so my guess is that the problem lies in the underlying html code.

A variation on Agama's solution would be to cleanse the html by converting to text with markdown or textile, and then converting back from text to ePub/Mobi with heuristics enabled. In some cases this will eliminate whatever html formatting tripped up heuristics, while basic formatting like italics/bold would be retained.
10-18-2012, 11:17 AM   #4
dvd8n
Quote:
Originally Posted by ldolse
Heuristics can't catch every scenario; there are endless ways a chapter heading can be formatted. In your case the word 'chapter' is explicitly covered, so my guess is that the problem lies in the underlying html code.

A variation on Agama's solution would be to cleanse the html by converting to text with markdown or textile, and then converting back from text to ePub/Mobi with heuristics enabled. In some cases this will eliminate whatever html formatting tripped up heuristics, while basic formatting like italics/bold would be retained.
I think you might be misunderstanding me. The file that I am converting is plain vanilla text - no html.

And while I get your point about heuristics being difficult, I would have said that a line starting with the word 'Chapter' followed by a number (admittedly in Roman numerals) was a good candidate for a chapter break.
10-18-2012, 11:19 AM   #5
dvd8n
Quote:
Originally Posted by Agama
You could use Markdown before conversion to identify formatting/structuring.
So from what you are saying, I could do a global replace of the word 'Chapter' with '## Chapter', put the suggested strings in the chapter detection xpath fields, and Bob's your uncle?

About this, I got the impression from the help that the xpath chapter detection strings only functioned on html. Is this not the case?

Last edited by dvd8n; 10-18-2012 at 11:22 AM.
10-18-2012, 02:55 PM   #6
dvd8n
Quote:
Originally Posted by Agama
You could use Markdown before conversion to identify formatting/structuring.
This worked 100%!



David
10-19-2012, 03:11 AM   #7
Agama
Glad to hear it. Presumably you only changed "Chapter" where it starts a line and not where it is embedded within a sentence; otherwise you would get a literal ## in the middle of the text instead of an <h2> heading!
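A minimal sketch of that difference, in Python rather than a text editor. The pattern and the helper name `mark_chapters` are my own assumptions, not anything calibre provides, and the last sentence of the sample is made up to show the pitfall. It compares a blind global replace with a regex anchored to the start of a line:

```python
import re

# "Chapter" only at the start of a line, followed by a Roman or Arabic numeral.
CHAPTER_LINE = re.compile(r"^(Chapter\s+(?:[IVXLCDM]+|\d+)\b.*)$", re.MULTILINE)


def mark_chapters(text):
    """Prefix chapter heading lines with '## ' so Markdown renders them as <h2>."""
    return CHAPTER_LINE.sub(r"## \1", text)


# Sample based on the thread; the final sentence is invented for illustration.
sample = (
    "It was a trap.\n"
    "\n"
    "Chapter V. THE HOMELY ONE\n"
    "\n"
    "IN his dangerous career he had read Chapter V of the manual.\n"
)

naive = sample.replace("Chapter", "## Chapter")  # also hits the mid-sentence use
anchored = mark_chapters(sample)                 # only touches the heading line

print(naive)
print(anchored)
```

The same pattern should work in any editor with multi-line regex search and replace, though the exact syntax for the back-reference varies between editors.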

Most of my sources are plain text, which is what makes the combination of calibre and Markdown so useful. You can do quite a bit of formatting and structuring very quickly, especially if your text editor has good regex facilities (e.g. Notepad++, which is free).

For a 2nd level Toc you can use a ### prefix and //h:h3. You can also include links within your book (useful for footnotes) and references to images (though of course they won't display in the plain text, only in the ePub).

Last edited by Agama; 10-19-2012 at 06:41 AM.
All times are GMT -4.