Old 09-29-2015, 09:54 PM   #1
morgon
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2015
Device: none
detecting chapters in plain txt

Hi,

I would like to convert a plain (ascii) text file to epub and generate a toc.

As this is a text file, xpath does not make much sense (at least to me), but I can construct a regex that allows chapters to be detected.

How can I teach calibre to use this regex to detect chapters when converting plain text to epub?

many thanks!
Old 09-29-2015, 10:05 PM   #2
theducks
Well trained by Cats
Posts: 29,762
Karma: 54401244
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
The default xpath looks for these keywords:
((chapter|book|section|part)\s+)|((prolog|prologue|epilogue)\s*)
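
To see which heading texts a keyword pattern of this kind would catch, here is a minimal Python sketch. The balanced form of the pattern and the sample headings are assumptions for illustration; calibre applies its pattern to heading elements in the converted HTML, not to raw text lines.

```python
import re

# Assumed, balanced form of the keyword pattern quoted above.
KEYWORDS = re.compile(
    r"((chapter|book|section|part)\s+)|((prolog|prologue|epilogue)\s*)",
    re.IGNORECASE,
)

def looks_like_chapter(heading: str) -> bool:
    """Return True if the heading text contains one of the keywords."""
    return KEYWORDS.search(heading) is not None

# Invented sample headings.
samples = ["Chapter 1", "PART TWO", "Epilogue", "The Long Night", "14"]
matches = [s for s in samples if looks_like_chapter(s)]
print(matches)
```

Headings such as a bare number or a free-form title are not caught by the keywords alone, which is the case where a custom expression becomes necessary.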
Old 10-17-2015, 04:49 PM   #3
porphyry5
Connoisseur
Posts: 63
Karma: 10
Join Date: Apr 2013
Device: Kobo Clara, Onyx Boox Monte Cristo
Quote:
Originally Posted by morgon
Hi,

I would like to convert a plain (ascii) text file to epub and generate a toc.

As this is a text file, xpath does not make much sense (at least to me), but I can construct a regex that allows chapters to be detected.

How can I teach calibre to use this regex to detect chapters when converting plain text to epub?

many thanks!
I think the controlling factor is the line ending: any line of text that does NOT end with a period, or other sentence ender, will generate a bolded chapter title line on a fresh page and an entry in the auto-generated TOC.
I know the above is true for lines consisting only of a number, or of the word Chapter or Part followed by a single word or a number.
If the last character of such a line is a period, it is treated like any other sentence in the text.
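
As a rough way to check a text file against the rule described in this post, the following Python sketch flags the lines that would count as titles. It is only a model of the behaviour as described here, not calibre's actual implementation; the names `SENTENCE_ENDERS`, `TITLE_SHAPE` and `would_be_title` are hypothetical.

```python
import re

# Assumed set of "sentence enders".
SENTENCE_ENDERS = ".!?"

# Hypothetical model of the title lines described above: a bare number, or
# "Chapter"/"Part" followed by a single word or a number.
TITLE_SHAPE = re.compile(r"^(\d+|(chapter|part)[ \t]+\w+)$", re.IGNORECASE)

def would_be_title(line: str) -> bool:
    """Return True if the line fits the described title shape."""
    text = line.strip()
    if not text or text[-1] in SENTENCE_ENDERS:
        return False
    return TITLE_SHAPE.match(text) is not None

# Invented sample lines.
lines = ["Chapter One", "Chapter One.", "12", "Part 3", "It was a dark night."]
titles = [l for l in lines if would_be_title(l)]
print(titles)
```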
Old 10-17-2015, 10:56 PM   #4
kovidgoyal
creator of calibre
Posts: 43,826
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You can use a regex inside an xpath expression. And, as documented here: http://manual.calibre-ebook.com/conv...l#introduction

you need to look at the HTML produced by the intermediate steps in the conversion to figure out what xpath to use.
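
For illustration, the calibre manual shows expressions of the form `//h:h2[re:test(., 'chapter|section', 'i')]`, where the regex sits inside the xpath. The Python sketch below emulates that selection with the standard library only (its ElementTree does not support `re:test`), so it shows the idea rather than calibre's own evaluation. The sample markup, the `CHAPTER_RE` pattern and the `find_chapters` helper are assumptions, not taken from calibre.

```python
import re
import xml.etree.ElementTree as ET

# Invented stand-in for the intermediate HTML produced from a TXT file.
SAMPLE = """<html><body>
<h1>Chapter 1</h1><p>Some text.</p>
<h1>Interlude</h1><p>More text.</p>
<h1>CHAPTER 2</h1><p>Final text.</p>
</body></html>"""

# Hypothetical chapter regex: the part you would place inside re:test(...).
CHAPTER_RE = re.compile(r"^chapter\s+\d+", re.IGNORECASE)

def find_chapters(xhtml: str, tag: str = "h1") -> list:
    """Return the text of every <tag> element whose text matches CHAPTER_RE."""
    root = ET.fromstring(xhtml)
    found = []
    for el in root.iter(tag):
        text = "".join(el.itertext()).strip()
        if CHAPTER_RE.search(text):
            found.append(text)
    return found

chapters = find_chapters(SAMPLE)
print(chapters)
```

Which tag to test (h1, h2, p, ...) depends on what the intermediate HTML actually contains, which is why inspecting it comes first.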
Old 10-17-2015, 11:09 PM   #5
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by porphyry5
I think the controlling factor is the line ending: any line of text that does NOT end with a period, or other sentence ender, will generate a bolded chapter title line on a fresh page and an entry in the auto-generated TOC.
I know the above is true for lines consisting only of a number, or of the word Chapter or Part followed by a single word or a number.
If the last character of such a line is a period, it is treated like any other sentence in the text.
That's more to do with the markdown-to-HTML step (if applicable) of the conversion.
The xpath sees an h1 tag.
Old 10-23-2015, 02:37 PM   #6
porphyry5
Connoisseur
Posts: 63
Karma: 10
Join Date: Apr 2013
Device: Kobo Clara, Onyx Boox Monte Cristo
Quote:
Originally Posted by eschwartz
That's more to do with the markdown-to-HTML step (if applicable) of the conversion.
The xpath sees an h1 tag.
The original poster said
Quote:
Originally Posted by morgon
As this is a text file, xpath does not make much sense (at least to me), but I can construct a regex that allows chapters to be detected.

How can I teach calibre to use this regex to detect chapters when converting plain text to epub?
That is, he was asking how to do it without using xpath.
Old 10-23-2015, 03:14 PM   #7
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by porphyry5
The original poster said

That is, he was asking how to do it without using xpath.
No, that is in fact a crystal-clear example of "asking how to do it using xpath".
After all, the xpath expression is where you would put a regex.



This is beside the fact that your mention of heuristics operating on the text does not, in fact, help unless one is willing to rewrite the book.

As I said -- that is markdown-to-html.
Converting from one format to another.

You have to have the book conform to those heuristics, not the other way around.
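
One way to make a book conform, sketched in Python: use your own chapter regex to rewrite the matching lines as Markdown headings before conversion, so that the markdown-to-html step emits h1 tags for them. The pattern `CHAPTER_LINE` and the helper `mark_chapters` are hypothetical, and the sketch assumes the conversion treats the TXT input as Markdown.

```python
import re

# Hypothetical chapter pattern for this example; substitute your own regex.
CHAPTER_LINE = re.compile(
    r"^(chapter[ \t]+\d+.*)$",
    re.IGNORECASE | re.MULTILINE,
)

def mark_chapters(text: str) -> str:
    """Prefix every matching line with '# ' so Markdown renders it as an h1."""
    return CHAPTER_LINE.sub(r"# \1", text)

# Invented sample text.
raw = "Chapter 1\n\nIt began quietly.\n\nChapter 2\n\nIt ended loudly.\n"
marked = mark_chapters(raw)
print(marked)
```

The body text is left untouched; only lines matched by the regex gain the heading marker.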
Old 10-23-2015, 03:43 PM   #8
porphyry5
Connoisseur
Posts: 63
Karma: 10
Join Date: Apr 2013
Device: Kobo Clara, Onyx Boox Monte Cristo
Quote:
Originally Posted by eschwartz
No, that is in fact a crystal-clear example of "asking how to do it using xpath".
Apparently we are arguing over the meaning of @morgon's statement "As this is a text file xpath does not make much sense...". You tell him how to do what he wants using xpath, I tell him how he can structure his plain text, without using xpath, to achieve the same result.

As has been said before, there is more than one way to skin a cat.