detecting chapters in plain txt

morgon · 09-29-2015, 09:54 PM

Hi,

I would like to convert a plain (ascii) text file to epub and generate a toc.

As this is a text file, xpath does not make much sense (at least to me), but I can construct a regex that allows chapters to be detected.

How can I teach calibre to use this regex to detect chapters when converting plain text to epub?

many thanks!

theducks · 09-29-2015, 10:05 PM

porphyry5 · 10-17-2015, 04:49 PM

Quote:

Originally Posted by morgon

Hi,

I would like to convert a plain (ascii) text file to epub and generate a toc.

As this is a text file, xpath does not make much sense (at least to me), but I can construct a regex that allows chapters to be detected.

How can I teach calibre to use this regex to detect chapters when converting plain text to epub?

many thanks!

I think the controlling factor is this: any line of text that does NOT end with a period or other sentence ender will generate a bolded chapter title line on a fresh page and an entry in the auto-generated TOC.
I know the above is true for lines consisting only of a number, or the word Chapter or Part followed by a single word or a number.
If the last character of such a line is a period, it is treated like any other sentence in the text.
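
A rough sketch of the rule described above, in Python. It is only a reading of the behaviour reported in this post, not calibre's actual code; the function name `looks_like_heading` and the exact set of sentence enders are assumptions.

```python
import re

# Assumed set of "sentence enders"; the post only names the period explicitly.
SENTENCE_ENDERS = ".!?"

# Line shapes the post reports as working: a bare number, or "Chapter"/"Part"
# followed by a single word or a number.
HEADING_SHAPE = re.compile(r"^(\d+|(chapter|part)\s+\w+)$", re.IGNORECASE)


def looks_like_heading(line: str) -> bool:
    """Return True if the line fits the heuristic described in the post."""
    stripped = line.strip()
    if not stripped or stripped[-1] in SENTENCE_ENDERS:
        return False
    return HEADING_SHAPE.match(stripped) is not None


for sample in ["Chapter 1", "Part Two", "12", "Chapter 1.", "It was a dark night"]:
    print(f"{sample!r}: {looks_like_heading(sample)}")
```

Running this over a text file before conversion would show which lines are likely to be picked up as chapter titles, assuming the heuristic really works as described.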

kovidgoyal · 10-17-2015, 10:56 PM

You can use a regex inside an xpath expression, as documented here: http://manual.calibre-ebook.com/conv...l#introduction

You need to look at the HTML produced by the intermediate steps in the conversion to figure out what xpath to use.
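
As a hedged illustration of a regex inside an xpath expression: calibre's XPath support provides `re:test(., 'pattern', 'i')` for regular-expression matching. The sketch below only builds such an expression string in Python and checks the regex against sample lines with the standard `re` module. The helper name `build_chapter_xpath`, the sample pattern, and the guess that each chapter line ends up in an `h:p` element are all assumptions; the real tag has to be read off the intermediate HTML.

```python
import re


def build_chapter_xpath(pattern: str, tag: str = "h:p") -> str:
    """Wrap a regex in an XPath expression that uses re:test().

    `tag` is a guess; inspect the intermediate HTML to find the real one.
    """
    if "'" in pattern:
        raise ValueError("pattern must not contain a single quote")
    return f"//{tag}[re:test(., '{pattern}', 'i')]"


# Hypothetical regex for headings such as "Chapter 12" or "CHAPTER IV".
chapter_regex = r"^\s*chapter\s+(\d+|[ivxlc]+)\s*$"

print(build_chapter_xpath(chapter_regex))

# Sanity-check the regex itself before pasting the expression into calibre.
for line in ["Chapter 12", "CHAPTER IV", "He read the next chapter aloud."]:
    print(line, "->", bool(re.search(chapter_regex, line, re.IGNORECASE)))
```

The printed expression is the kind of string that would go into the chapter detection field of the conversion settings, once the tag name has been confirmed against the intermediate HTML.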

eschwartz · 10-17-2015, 11:09 PM

Quote:

Originally Posted by porphyry5

I think the controlling factor is this: any line of text that does NOT end with a period or other sentence ender will generate a bolded chapter title line on a fresh page and an entry in the auto-generated TOC.
I know the above is true for lines consisting only of a number, or the word Chapter or Part followed by a single word or a number.
If the last character of such a line is a period, it is treated like any other sentence in the text.

That's more to do with the markdown-to-HTML step (if applicable) of the conversion.
The xpath sees an h1 tag.

porphyry5 · 10-23-2015, 02:37 PM

Quote:

Originally Posted by eschwartz

That's more to do with the markdown-to-HTML step (if applicable) of the conversion.
The xpath sees an h1 tag.

The original poster said

Quote:

Originally Posted by morgon

As this is a text file, xpath does not make much sense (at least to me), but I can construct a regex that allows chapters to be detected.

How can I teach calibre to use this regex to detect chapters when converting plain text to epub?

That is, he was asking how to do it without using xpath.

eschwartz · 10-23-2015, 03:14 PM

Quote:

Originally Posted by porphyry5

The original poster said

That is, he was asking how to do it without using xpath.

No, that is in fact a crystal-clear example of "asking how to do it using xpath".
On account of xpath is where you would put a regex.

This is beside the fact that your mention of heuristics operating on the text does not, in fact, help unless one is willing to rewrite the book.

As I said -- that is markdown-to-html.
Converting from one format to another.

You have to have the book conform to those heuristics, not the other way around.
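
One way to make a book conform without hand-editing it is to preprocess the text. The sketch below prefixes every line that matches a chapter regex with `# `, so that a Markdown-to-HTML step would turn it into a heading. The function name `mark_chapters` and the sample regex are assumptions, and the approach presumes the text is then converted with Markdown formatting.

```python
import re


def mark_chapters(text: str, pattern: str) -> str:
    """Prefix lines matching `pattern` with '# ' (a Markdown level-1 heading)."""
    chapter_re = re.compile(pattern, re.IGNORECASE)
    out = []
    for line in text.splitlines():
        if chapter_re.match(line.strip()):
            # Blank lines around the heading keep it separate from paragraphs.
            out.extend(["", "# " + line.strip(), ""])
        else:
            out.append(line)
    return "\n".join(out)


sample = "Chapter 1\nIt was a dark night.\nChapter 2\nMorning came."
print(mark_chapters(sample, r"chapter\s+\d+$"))
```

The original file is left untouched if the result is written to a new file, so the regex can be adjusted and the script rerun until the headings look right.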

porphyry5 · 10-23-2015, 03:43 PM

Quote:

Originally Posted by eschwartz

No, that is in fact a crystal-clear example of "asking how to do it using xpath".

Apparently we are arguing over the meaning of @morgon's statement "As this is a text file xpath does not make much sense...". You tell him how to do what he wants using xpath, I tell him how he can structure his plain text, without using xpath, to achieve the same result.

As has been said before, there is more than one way to skin a cat.


theducks · 09-29-2015, 10:05 PM

The default xpath looks for these keywords: (chapter|book|section|part)\s+)|((prolog|prologue|epilogue)
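
To see which lines those keywords would catch, the pattern can be tried with Python's `re` module. This is a sketch only: the regex below is the keyword fragment from this post with its parentheses balanced by assumption, and case-insensitive matching is also an assumption made here.

```python
import re

# Keyword fragment from the post, with parentheses balanced by assumption.
DEFAULT_KEYWORDS = re.compile(
    r"((chapter|book|section|part)\s+)|(prolog|prologue|epilogue)",
    re.IGNORECASE,
)

lines = ["Chapter One", "PART 2", "Epilogue", "1", "The End"]
for line in lines:
    print(f"{line!r}: {bool(DEFAULT_KEYWORDS.search(line))}")
```

A bare number such as `1` contains none of the keywords, so a book whose chapters are only numbered would need a custom expression.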
