#1
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
txt to epub problems (chaptering)
Calibre has developed some strange chaptering problems when adding books.
I've just added two texts (plain, UTF-8) to my library. Both have identical layouts (blank lines between the end of one chapter and the next, a gap below the chapter heading before the subheading, the same style of "Chapter" heading, etc.). I converted both to EPUB: one produced 25 chapters as expected, but the other split the text (of similar size) into just four sections. I first saw this split in FBReader for Android and then confirmed it in Sigil.

The other thing I noticed is that the unchaptered EPUB has had all the gaps between chapters, paragraphs etc. removed, leaving almost continuous text, even though the original had blank lines between paragraphs and 3-line gaps between chapters. Oh, and I have made sure the "Preserve spaces" box is ticked in TXT Input options.

So, any ideas what is going on? Can someone confirm what the input/conversion/output settings should be to get EPUB chapters from plain text? This unpredictability is driving me mad!

Thanks
#2
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
There is no single setting because there is no single plain-text layout.
Auto conversion just isn't cutting it for you on this one. You will need to view the text file in a simple editor (Notepad...), then fiddle with the TXT Input structure settings (there is balloon help for each choice) until you find the one that best describes what you see. Chapters are detected by the XPath expression (patterns and keywords) under Common Options > Structure Detection.
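One way to "view the text file" a bit more systematically is to list the lines that stand alone between blank lines, since those are the usual chapter-heading candidates. A minimal sketch, assuming headings are short lines with a blank line (or the start of the file) above and below them. It is only a diagnostic and does not reproduce calibre's own detection; `candidate_headings` and its `max_len` cut-off are hypothetical.

```python
def candidate_headings(text: str, max_len: int = 60) -> list[tuple[int, str]]:
    """Return (line_number, line) for short lines that have a blank line
    (or the start/end of the file) both above and below them."""
    lines = text.splitlines()
    found = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip blank lines and anything too long to be a heading.
        if not stripped or len(stripped) > max_len:
            continue
        blank_above = i == 0 or not lines[i - 1].strip()
        blank_below = i == len(lines) - 1 or not lines[i + 1].strip()
        if blank_above and blank_below:
            found.append((i + 1, stripped))
    return found


sample = "Chapter 1\n\nIt was a dark night.\nThe rain fell.\n\nChapter II\n\nMorning came.\nThe end.\n"
for number, line in candidate_headings(sample):
    print(number, line)
```

Comparing the output for a file that chapters correctly against one that doesn't can show which heading style is the odd one out.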
#3
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
With plain-text documents I always apply Markdown formatting before EPUB conversion. This means spending a little time preparing the text document, but it gives good control over the generated EPUB.
You could start by simply putting ## at the start of each chapter title line (these become HTML h2 tags, which calibre's default Detect Chapters XPath will pick up) and leaving a single blank line between paragraphs. Calibre's Markdown support works very well: at conversion time, on the TXT Input page, set Paragraph style to "off" and Formatting style to "markdown". Full details of Markdown can be found at http://daringfireball.net/projects/markdown/ (you don't need to download Markdown, just understand the syntax).
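The preparation described above can be partly automated. The sketch below assumes, hypothetically, that every chapter heading is a line holding only "Chapter" plus an Arabic or Roman number. It adds the ## prefix and collapses longer gaps to the single blank line Markdown uses between paragraphs. A file with a different heading style needs a different pattern, so review the result before converting.

```python
import re

# Hypothetical heading pattern: a line holding only "Chapter" plus an Arabic
# number or a Roman numeral. Adjust it to match your own file.
CHAPTER_LINE = re.compile(r"^[ \t]*(chapter[ \t]+(?:\d+|[ivxlcdm]+))[ \t]*$",
                          re.IGNORECASE | re.MULTILINE)


def to_markdown(text: str) -> str:
    """Prefix chapter heading lines with '## ' so Markdown renders them as <h2>."""
    marked = CHAPTER_LINE.sub(r"## \1", text)
    # Markdown separates paragraphs with one blank line, so collapse runs of
    # three or more newlines (e.g. 3-line gaps between chapters) to one blank line.
    return re.sub(r"\n{3,}", "\n\n", marked)


print(to_markdown("Chapter XII\n\n\n\nIt began at dawn.\n"))
```

The output would then be converted with Formatting style "markdown" and Paragraph style "off", as suggested above.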
#4
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
Thanks both. I did a lot of testing and solved it, but the behaviour is (IMHO) quite bizarre.
The unchaptered conversion had Roman numerals for the chapters. Despite the layout being "Chapter XII" (note the gap), calibre still wouldn't recognise it as a chapter break. Once I converted the numerals to Arabic (with identical spacing), the chaptering went without a hitch. I wonder whether Kovid is aware of this oddity, and if so whether a bug report is worth making...
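The manual fix (Roman to Arabic chapter numbers) can be scripted. A minimal sketch, assuming headings are upper-case Roman numerals alone on a line after the word "Chapter"; `roman_to_int` and `arabic_chapters` are hypothetical helpers, and the converter does not reject malformed numerals such as "IIX".

```python
import re

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert an upper-case Roman numeral, honouring subtractive pairs (IV, XL, XC...)."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # A smaller value written before a larger one is subtracted.
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


# Hypothetical heading layout: "Chapter XII" alone on its line.
HEADING = re.compile(r"^([ \t]*Chapter[ \t]+)([IVXLCDM]+)[ \t]*$", re.MULTILINE)


def arabic_chapters(text: str) -> str:
    """Rewrite 'Chapter XII' heading lines as 'Chapter 12'; other lines are untouched."""
    return HEADING.sub(lambda m: m.group(1) + str(roman_to_int(m.group(2))), text)


print(arabic_chapters("Chapter XII\n\nI went out.\n\nChapter XLIV\n"))
```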
#5
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
ldolse was the one who wrote the heuristic chapter-detection code. He would have to say whether it's possible to add Roman numerals or not. I seem to remember they were excluded because of false positives. Calibre's heuristic processing is designed to be conservative: where a change is only a maybe, it prefers not to make it. Because TXT has no markup, it is impossible to create an automated conversion that handles every case perfectly. Your best bet is to fiddle with the options, find the one that gets closest to what you want, then make any corrections in Sigil. Or format the file with Markdown (or Textile) before conversion and choose the matching Formatting style option.
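To illustrate why bare Roman numerals are risky, here is a strict pattern for the numerals 1 to 3999 applied to a few stand-alone lines. This is an illustration only, not calibre's code: ordinary text such as the pronoun "I", "MIX" or "CD" is a perfectly valid numeral, which is the kind of false positive a conservative heuristic would want to avoid.

```python
import re

# Strict upper-case Roman numerals from 1 to 3999.
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def looks_like_roman(line: str) -> bool:
    """True if the whole (stripped) line is a valid Roman numeral."""
    token = line.strip()
    return bool(token) and ROMAN.fullmatch(token) is not None


for line in ["XII", "I", "MIX", "CD", "DID", "Hello"]:
    print(f"{line!r}: {looks_like_roman(line)}")
```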
|
#6
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
I'd have to go back and look at the code, but I thought Roman numerals in all caps should be detected by heuristics. Anything using "Chapter whatever" should also have been detected, so I'm not sure what's going on. Under TXT Input, is your formatting set to auto or heuristic? If it's auto, something in the file may be triggering another formatting scheme; you can try forcing the heuristic setting.
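On the command line, calibre's ebook-convert takes the same TXT input choice through its --formatting-type option. The sketch below only builds the argument list; the option name and its choices reflect my understanding of calibre's TXT input options, so confirm them with `ebook-convert book.txt book.epub --help` for your version. The file names are placeholders and `build_txt_to_epub` is a hypothetical helper.

```python
import shlex

# TXT input formatting choices as I understand calibre's --formatting-type option;
# verify against `ebook-convert book.txt book.epub --help` for your version.
FORMATTING_TYPES = {"auto", "plain", "heuristic", "textile", "markdown"}


def build_txt_to_epub(src: str, dest: str, formatting: str = "heuristic") -> list[str]:
    """Return an ebook-convert argument list that forces the TXT formatting type."""
    if formatting not in FORMATTING_TYPES:
        raise ValueError(f"unknown formatting type: {formatting!r}")
    return ["ebook-convert", src, dest, "--formatting-type", formatting]


# Placeholder file names; pass the list to subprocess.run(..., check=True) to convert.
print(shlex.join(build_txt_to_epub("book.txt", "book.epub")))
```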
|
#7
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
No doubt about it, though: replacing the (capitalised) Roman numerals with identically spaced Arabic numerals solved it. You're named as the expert, so if YOU don't know what's going on, I doubt I ever could...
|
#8
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
X on either side of L or C or M (Modesitt is famous for reaching these heights).
|
#9
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
Another weird one, then. The original text:

Baroness Emmuska Orczy
Foreword. 3
1 A Roland For His Oliver 5
2 A Fool's Paradise 24
3 On The Brink 40
4 Carissimo 63
5 The Toys 85
6 Honour Among 108
7 An Over-Sensitive Heart 125
Foreword

(part of an opening chapter; chapter 1 is three pages on)

When calibre converts this to EPUB, it insists on taking "7 An Over-Sensitive Heart" as a new chapter, then ignores "Foreword" as a new chapter. No matter what I do, I can't change the formatting, UNLESS I remove the numbers before the titles! Then it formats correctly. What on earth is going on, or am I just unlucky?
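If you don't need to keep the inline contents list (calibre builds the EPUB's navigation table of contents during conversion), one option is to drop those lines before converting. A minimal sketch, where the hypothetical `TOC_ENTRY` pattern treats a line as a TOC entry if it has the shape "number, title, page number":

```python
import re

# Hypothetical pattern for an inline TOC entry: "<number> <title> <page number>",
# e.g. "7 An Over-Sensitive Heart 125". The title must start with a non-digit.
TOC_ENTRY = re.compile(r"^\s*\d+\s+\D.*?\s+\d+\s*$")


def strip_inline_toc(text: str) -> str:
    """Drop every line that looks like an inline table-of-contents entry."""
    kept = [line for line in text.splitlines() if not TOC_ENTRY.match(line)]
    return "\n".join(kept)


sample = "Foreword. 3\n1 A Roland For His Oliver 5\n7 An Over-Sensitive Heart 125\nForeword\n"
print(strip_inline_toc(sample))
```

The pattern can also catch a body line such as "12 men arrived at 6", so check what was removed.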
#10
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
BTW, you are showing an inline TOC just before the Foreword.
|
#11
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Agree with theducks: you shouldn't expect perfect conversions from calibre when using heuristics. While there is certainly lots of room for improvement, the primary goal of heuristics is not to perfectly guess the structure of any poorly formatted document; that's not really an achievable goal.

Before heuristics was added, calibre would split documents at seemingly arbitrary points every 260KB to guarantee that they were compatible with Adobe-based readers and other memory-limited devices. Fixing these arbitrary split points then involved a ton of work in Sigil or some other app: many files needed to be manually merged and then re-split. The primary goal of heuristics therefore wasn't to detect every chapter perfectly, but to eliminate all that manual labor by guessing good split points so that post-conversion cleanup is minimal. For many files heuristics will guess every split point correctly; for many others it will make a few mistakes, and this should be expected. If you really want more control from the very beginning, learning Markdown as Agama suggested earlier is your best bet.

BTW, the behavior you see in that most recent conversion reflects expected limitations of the current heuristics: "foreword" isn't in the dictionary that heuristics uses, and inline text TOCs trip heuristics up (the last item will often be detected as a chapter).

Last edited by ldolse; 12-30-2011 at 02:55 PM.
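A dictionary-based check is easy to picture. The word list below is hypothetical (it is not calibre's actual list), but it shows the effect ldolse describes: a line starting with a listed word is accepted as a heading, while "Foreword" is simply not recognised.

```python
import re

# Hypothetical keyword list standing in for a dictionary-based heuristic.
# "foreword" is deliberately missing, mirroring the limitation described above.
HEADING_WORDS = ("chapter", "book", "part", "section", "prologue", "epilogue")
KEYWORD_HEADING = re.compile(r"^\s*(?:%s)\b" % "|".join(HEADING_WORDS), re.IGNORECASE)


def is_keyword_heading(line: str) -> bool:
    """True if the line starts with one of the listed heading words."""
    return KEYWORD_HEADING.match(line) is not None


for line in ["Chapter 7", "PROLOGUE", "Foreword", "Partly cloudy"]:
    print(f"{line!r}: {is_keyword_heading(line)}")
```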
#12
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
Most, if not all, of Baroness Orczy's books are already available in ePub format. Why not just download one of them?
|
#13
Groupie
Posts: 179
Karma: 1021404
Join Date: Apr 2010
Location: Stroud, UK
Device: Xgody tablet, LG G3 (Android), moon+reader
On the previous point: I haven't had heuristics switched on at all (it seemed to make matters worse), so I've stuck with "auto". I'm not (much of
Thanks for all the help, guys; calibre is still outstanding at what it does. It's just that I always try to make sure it's not me screwing up with poor settings etc.
|
#14
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
With TXT input, "auto" tries to detect the formatting automatically, and heuristic is one of the types it can pick. Typically, if Markdown or Textile is not detected, the heuristic type is used. What the heuristic type does is turn on a limited set of the heuristic options, basically the ones written specifically for TXT input (you can add more by enabling heuristics manually).
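The decision order described above can be sketched as follows. The marker patterns are hypothetical stand-ins chosen for illustration; calibre's real detector uses its own rules and thresholds, so this only shows the "Markdown or Textile, otherwise heuristic" fallback.

```python
import re

# Hypothetical marker patterns; calibre's real detector uses its own rules.
MARKDOWN_MARKERS = [r"(?m)^#{1,6}\s", r"\[[^\]]+\]\([^)]+\)", r"(?m)^\s*[*+-]\s"]
TEXTILE_MARKERS = [r"(?m)^h[1-6]\.\s", r"(?m)^p\.\s", r"(?m)^bq\.\s"]


def guess_formatting(text: str) -> str:
    """Return 'markdown', 'textile' or 'heuristic', falling back to
    heuristic when neither kind of markup is found."""
    md = sum(len(re.findall(p, text)) for p in MARKDOWN_MARKERS)
    tx = sum(len(re.findall(p, text)) for p in TEXTILE_MARKERS)
    if md == 0 and tx == 0:
        return "heuristic"
    return "markdown" if md >= tx else "textile"


print(guess_formatting("## Chapter 1\n\nText"))
print(guess_formatting("Chapter 1\n\nIt was dark."))
```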
|