Old 12-21-2011, 08:10 AM   #1
rbruce1314
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
txt to epub problems (chaptering)

Calibre has developed some strange "chaptering" problems with adding books.

I've just added two texts (plain, UTF8) to my library. Both are identical in that the layout (spaces between chapter end and next; space below chapter to subheading; style of Chapter etc.) is the same.

I converted them to epub: one produced 25 chapters as expected, the other split the (similar size) text into four sections. The split layout is as seen on "fbreader for android", but confirmed on Sigil.

The other observation is that the non-chaptered epub text has all the gaps between chapters, paragraphs etc. removed, leaving an almost continuous text, even though the original text had gaps between paras and 3-line gaps between chapters.
Oh, and I have made sure the 'preserve spaces' box is ticked in 'txt input options'.....

So, any ideas what is going on? Can someone confirm what the input/conversion/output settings should be to get epub chapters from plain text? This unpredictability is driving me mad!

Thanks
Old 12-21-2011, 10:14 AM   #2
theducks
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
There is no single setting because there is no single plain text layout.

Auto (setting) conversion is just not cutting it for you on this one.

You will need to view the text file in a simple editor (Notepad...), then fiddle with the Input Structure settings (there is balloon help for each choice) until they best describe what you see.

Chapters are detected by the XPath expression under Common Options > Structure Detection (patterns and keywords).
Old 12-22-2011, 02:40 AM   #3
Agama
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
With plain text documents I always use Markdown formatting prior to ePub conversion. This requires spending a little time preparing the text document but gives good control of the generated ePub.

You could start with simply using ## at the start of chapter title lines, (these become HTML H2 tags which calibre's default Detect Chapters Xpath will pick up), and leave a single blank line between paragraphs.

Calibre has Markdown support which works very well: at conversion time, on the Text Input page, set: Paragraph style to "off" and Formatting style to "markdown".

Full details of Markdown can be found at http://daringfireball.net/projects/markdown/
(You don't need to download Markdown, just understand the syntax.)
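The preparation step described above could be scripted along these lines. This is only a minimal sketch: the function name mark_chapters and the pattern are hypothetical, and it assumes every chapter title sits alone on its line as "Chapter" followed by an Arabic or Roman number. Anything else (untitled chapters, "Foreword", and so on) would still need marking by hand.

```python
import re

# Assumption: a chapter title is a line holding only "Chapter <number>",
# where the number is Arabic digits or Roman numeral letters.
CHAPTER_LINE = re.compile(
    r"^[ \t]*(Chapter[ \t]+(?:[0-9]+|[IVXLCDM]+))[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def mark_chapters(text: str) -> str:
    """Prefix each chapter title line with '## ' so Markdown makes it an H2."""
    return CHAPTER_LINE.sub(lambda m: "## " + m.group(1), text)


if __name__ == "__main__":
    sample = "Chapter 1\n\nIt was dark.\n\nChapter XII\n\nIt was still dark.\n"
    print(mark_chapters(sample))
```

The output can then be converted with Paragraph style "off" and Formatting style "markdown" as described above.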
Old 12-22-2011, 05:58 AM   #4
rbruce1314
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
Thanks both - I did a lot of testing and solved it, but the behaviour is quite (IMHO) bizarre.

The unchaptered conversion had Roman numerals for the chapters.
Despite layout being:

Chapter XII

(note the gap), Calibre still wouldn't recognise it as a chapter break. Once I converted to arabic numerals (identical spacing) the chaptering went without a hitch.
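For anyone who wants to automate the same workaround, here is a rough sketch that rewrites "Chapter XII" style headings with Arabic numbers before the file goes to Calibre. The names roman_to_int and arabicise_headings are made up for illustration, and the pattern assumes the heading is alone on its line and starts with the word "Chapter". It does not validate the numeral, so a malformed one would still be converted.

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Assumption: headings look like "Chapter XII" on a line of their own.
HEADING_RE = re.compile(r"^([ \t]*Chapter[ \t]+)([IVXLCDM]+)[ \t]*$", re.MULTILINE)


def roman_to_int(numeral: str) -> int:
    """Convert an upper-case Roman numeral using the subtractive rule."""
    total = 0
    # Pair each symbol with its successor; the last symbol is paired with a space.
    for symbol, following in zip(numeral, numeral[1:] + " "):
        value = ROMAN_VALUES[symbol]
        if following in ROMAN_VALUES and ROMAN_VALUES[following] > value:
            total -= value  # e.g. the I in IV counts as -1
        else:
            total += value
    return total


def arabicise_headings(text: str) -> str:
    """Replace the Roman numeral in each chapter heading with its Arabic value."""
    return HEADING_RE.sub(
        lambda m: f"{m.group(1)}{roman_to_int(m.group(2))}", text
    )


if __name__ == "__main__":
    print(arabicise_headings("Chapter XII\n\nIt was dark.\n"))
```

Only heading lines are touched, so a capital "I" inside ordinary sentences is left alone.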

I wonder if Kovid realizes this oddity in behaviour, and if so whether a bug report is worth making.....
Old 12-22-2011, 07:57 AM   #5
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by rbruce1314 View Post
I wonder if Kovid realizes this oddity in behaviour, and if so whether a bug report is worth making.....
The default for chapter detection is to use an xpath. Due to TXT not having any markup this is not possible. Heuristics are used to determine what is a chapter in a TXT file. In this case roman numerals are not supported for determining if we are looking at a chapter or not.

ldolse was the one who wrote the chapter heuristic detection code. He would have to say whether it's possible to add Roman numerals or not. I seem to remember they were excluded because of false positives.

Calibre's heuristic processing is designed to be conservative. In cases where it's a maybe it prefers not to make any changes. Due to TXT having no markup it is impossible to create an automated conversion that handles every case perfectly. Your best bet is to fiddle with the options, find the one that gets closest to what you want, then make any corrections with Sigil. Or pre-format the file with Markdown (or Textile) before conversion and use the appropriate option for Markdown (or Textile).
Old 12-23-2011, 03:33 AM   #6
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
I'd have to go back and look at the code, but I thought Roman Numerals which were all CAPS should be detected by heuristics. Anything using 'Chapter whatever' should also have been detected though, so not sure what's going on. Under text input your formatting is set to auto or heuristic? If it's auto it's possible that something about the file is triggering another formatting scheme, you can try forcing the heuristic setting.
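Forcing the setting can also be done from the command line. The sketch below only builds the argument list. It assumes calibre's ebook-convert tool is on the PATH and that the TXT input options are spelled --formatting-type and --paragraph-type, as I understand them from the calibre documentation; check `ebook-convert --help` against your installed version before relying on it. The helper name build_convert_command is hypothetical.

```python
from typing import List, Optional


def build_convert_command(
    src: str,
    dst: str,
    formatting_type: str = "heuristic",
    paragraph_type: Optional[str] = None,
) -> List[str]:
    """Build an ebook-convert argument list with an explicit TXT formatting type.

    Assumed option names: --formatting-type and --paragraph-type.
    """
    command = ["ebook-convert", src, dst, "--formatting-type", formatting_type]
    if paragraph_type is not None:
        command += ["--paragraph-type", paragraph_type]
    return command


if __name__ == "__main__":
    # Print the command rather than running it, so this works without calibre.
    print(" ".join(build_convert_command("book.txt", "book.epub")))
    # To actually run it: subprocess.run(build_convert_command(...), check=True)
```

Passing formatting_type="markdown" with paragraph_type="off" would correspond to the Markdown settings suggested earlier in the thread.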
Old 12-23-2011, 08:29 AM   #7
rbruce1314
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
Quote:
Originally Posted by ldolse View Post
I'd have to go back and look at the code, but I thought Roman Numerals which were all CAPS should be detected by heuristics. Anything using 'Chapter whatever' should also have been detected though, so not sure what's going on. Under text input your formatting is set to auto or heuristic? If it's auto it's possible that something about the file is triggering another formatting scheme, you can try forcing the heuristic setting.
Formatting set to 'auto' (I'm not that technical in this aspect to meddle..).

No doubt though - replacing (capitalised) Romans with identically spaced Arabics solved it....You're named as the expert so if YOU don't know what's going on, I doubt if I ever could...........
Old 12-23-2011, 09:30 AM   #8
theducks
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by rbruce1314 View Post
Formatting set to 'auto' (I'm not that technical in this aspect to meddle..).

No doubt though - replacing (capitalised) Romans with identically spaced Arabics solved it....You're named as the expert so if YOU don't know what's going on, I doubt if I ever could...........
Roman chapter numbers are a pain, especially when not preceded by 'Chapter' to validate that the 'I' is a chapter number and not just part of a sentence. You get 1 to 3 I's, with 1 'I' possible on either side of V or X, then X on either side of L or C, and C on either side of D or M (Modesitt is famous for reaching these heights).
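To illustrate the false-positive problem, here is a small sketch using the standard strict Roman numeral pattern. It is not Calibre's actual heuristic code, just an illustration; is_roman and looks_like_chapter are hypothetical names.

```python
import re

# Strict Roman numerals from 1 to 3999: thousands, hundreds, tens, units.
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def is_roman(token: str) -> bool:
    """True if token is a well-formed upper-case Roman numeral."""
    # The pattern also matches the empty string, so reject that explicitly.
    return bool(token) and ROMAN_RE.fullmatch(token) is not None


def looks_like_chapter(line: str) -> bool:
    """Safer test: require the word 'Chapter' before the numeral."""
    words = line.split()
    return len(words) == 2 and words[0].lower() == "chapter" and is_roman(words[1])


if __name__ == "__main__":
    for token in ["XII", "I", "MIX", "DIV", "IIII"]:
        print(token, is_roman(token))
    print(looks_like_chapter("Chapter XII"))
```

Ordinary words such as "I", "MIX" and "DIV" pass the numeral test on their own, which is why a bare numeral is risky to treat as a chapter break without the 'Chapter' keyword in front of it.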
Old 12-30-2011, 06:23 AM   #9
rbruce1314
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
another weird one then:

text original:

Baroness Emmuska Orczy


Foreword. 3
1 A Roland For His Oliver 5
2 A Fool's Paradise 24
3 On The Brink 40
4 Carissimo 63
5 The Toys 85
6 Honour Among 108
7 An Over-Sensitive Heart 125



Foreword


(part of an opening chapter 1 is 3 pages on)
When Calibre converts to epub, it insists on taking "7 An Over-Sensitive Heart" as a new chapter, then ignores "Foreword" as a new 'chapter'. No matter what I do I can't change the formatting - UNLESS I remove the numbers before the titles!!! Then it formats correctly.

What on earth is going on, or am I just unlucky??
Old 12-30-2011, 11:19 AM   #10
theducks
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by rbruce1314 View Post
another weird one then:

text original:

Baroness Emmuska Orczy


Foreword. 3
1 A Roland For His Oliver 5
2 A Fool's Paradise 24
3 On The Brink 40
4 Carissimo 63
5 The Toys 85
6 Honour Among 108
7 An Over-Sensitive Heart 125



Foreword


(part of an opening chapter 1 is 3 pages on)
When Calibre converts to epub, it insists on taking "7 An Over-Sensitive Heart" as a new chapter, then ignores "Foreword" as a new 'chapter'. No matter what I do I can't change the formatting - UNLESS I remove the numbers before the titles!!! Then it formats correctly.

What on earth is going on, or am I just unlucky??
Foreword is not a keyword in the current Structure Detection setting.

BTW you are showing an inline TOC just before the Foreword.

Why fight to get it perfect? Just tune it up with Sigil after the basic conversion.
Old 12-30-2011, 02:50 PM   #11
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Agree with theducks, you shouldn't expect perfect conversions from Calibre when using heuristics. While there is certainly lots of room for improvement with heuristics, the primary goal of the feature is not to perfectly guess the document structure for any poorly formatted document - that's not really an achievable goal.

Before heuristics was added Calibre would seemingly arbitrarily split documents every 260KB to guarantee that the document was compatible with Adobe based readers and other memory limited devices. Fixing these arbitrary split points then involved a ton of work in Sigil or some other app - many files needed to be manually merged and then re-split. The primary goal of heuristics therefore wasn't to perfectly detect every chapter but to eliminate all that manual labor by guessing 'good' split points so that the post conversion cleanup effort is minimal. For many files heuristics will guess every split point correctly, but for many others it will make a few mistakes - this should be expected.

If you really want more control from the very beginning then learning markdown as Agama suggested earlier is your best bet.

btw, the behaviors you see in that most recent conversion are all expected limitations in the current heuristics functionality. 'foreword' isn't in the dictionary based approach that heuristics uses, and inline text TOCs trip heuristics up - the last item will often get detected as a chapter.
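One way to sidestep the inline TOC problem is to strip the TOC from the text before conversion. A rough sketch follows; strip_inline_toc is a hypothetical helper, and it assumes a TOC entry is a line that ends in a page number and that entries come in runs of at least three consecutive lines. It does not restore a trailing newline, and it will misfire on other blocks of lines ending in numbers, so check the result by eye.

```python
import re

# Assumption: a TOC entry is some text followed by whitespace and a page number.
TOC_ENTRY = re.compile(r"^\s*\S.*\s\d+\s*$")


def strip_inline_toc(text: str, min_run: int = 3) -> str:
    """Drop runs of at least min_run consecutive lines that look like TOC entries."""
    kept = []
    run = []  # current run of candidate TOC lines
    for line in text.splitlines():
        if TOC_ENTRY.match(line):
            run.append(line)
            continue
        if len(run) < min_run:
            kept.extend(run)  # too short to be a TOC, keep the lines
        run = []
        kept.append(line)
    if len(run) < min_run:
        kept.extend(run)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = (
        "Baroness Emmuska Orczy\n\n"
        "Foreword. 3\n"
        "1 A Roland For His Oliver 5\n"
        "2 A Fool's Paradise 24\n\n"
        "Foreword\n"
    )
    print(strip_inline_toc(sample))
```

The min_run threshold is there so that a single body line ending in a number, such as a sentence ending in a year, is left in place.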

Last edited by ldolse; 12-30-2011 at 02:55 PM.
Old 12-30-2011, 03:02 PM   #12
susan_cassidy
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
Most, if not all, of Baroness Orczy's books are already available in ePub format. Why not just download one of them?
Old 12-31-2011, 09:33 AM   #13
rbruce1314
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
Quote:
Originally Posted by susan_cassidy View Post
Most, if not all, of Baroness Orczy's books are already available in ePub format. Why not just download one of them?
Doing the last first, that's exactly what I did - but the d/l version I tried had no TOC at all!!!!

On the previous, I haven't had 'heuristics' switched on at all - it seemed to make matters worse - so I've stuck with 'auto'.

I'm not (much of) a control freak, and I'm quite happy with the large chunks (as mentioned in the 260KB above) you still get from many d/l sites - it's just that I'd prefer it not to recognise gaps at all than to arbitrarily decide the seventh of seven continuous chapter titles is worth treating as a new chapter despite the gaps below it.


Thanks for all the help guys - calibre is still outstanding in what it does. It's just that I always try to make sure it's not me that's screwing up by poor settings etc.
Old 12-31-2011, 02:49 PM   #14
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by rbruce1314 View Post
On the previous, I haven't had 'heuristics' switched on at all - it seemed to make matters worse - so I've stuck with 'auto'.
With TXT input, 'auto' tries to detect the formatting type automatically. One of these types is heuristic. Typically, if Markdown or Textile is not detected, the heuristic type is used. What the heuristic type does is turn on a limited set of the heuristic options (you can add more by manually enabling heuristics). Basically, the ones that were written specifically for TXT input are turned on in this case.