Multiple line regexp - Page 2

kovidgoyal · 11-02-2010, 11:57 AM

As far as I know, XPath will match only single tags. If you have an expression that matches multiple tags, calibre will treat them as separate entries in the TOC.
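A minimal sketch of that behaviour, using Python's standard `xml.etree.ElementTree` rather than calibre itself. The sample markup and the class name `chapter` are assumptions made up for illustration; the point is only that an expression matching two tags yields two separate results, one per tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: chapter number and chapter title sit in two separate <p> tags.
SAMPLE = """<body>
<p class="chapter">1.</p>
<p class="chapter">The Beginning</p>
<p class="text">It was a dark night.</p>
</body>"""

root = ET.fromstring(SAMPLE)

# The expression matches both heading paragraphs, but each match is its own element.
matches = root.findall(".//p[@class='chapter']")

# One candidate TOC entry per matched tag; nothing joins "1." and "The Beginning".
toc_entries = [p.text for p in matches]
print(toc_entries)
```

To get a single entry such as "1. The Beginning", the two tags would have to be merged in the markup before the expression is applied.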

Manichean · 11-02-2010, 12:00 PM

And there goes that idea. If the class names for the <p> tags are identical and unique across the entire document, you might want to use those for matching instead. If not, use chapter numbers...

janvanmaar · 11-02-2010, 12:16 PM

@Kovid: Thanks for the clarification (and, of course, for the nice piece of SW!)

@Manichean: The <p> tags are not unique. Clearly, I can do the chapter numbers - the very first regexp in this thread does that reliably.
I think I can do better than that, though, by processing the XHTML with a simple Python script that first matches the chapter numbers with a regexp and then merges the matching tag with the subsequent tag.
A better possibility would be to construct a hidden TOC, but from this thread https://www.mobileread.com/forums/sho...d.php?t=105019 it seems that this is currently broken for MOBI output.
Unfortunately, no nice generic solution seems to be available, as far as I can see.
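A rough sketch of the kind of script described above. The markup shape is an assumption: a paragraph containing only a chapter number (with an optional trailing dot) directly followed by a paragraph containing the title as plain text. The pattern, the function name `merge_chapter_headings`, and the `<h2 class="chapter">` output are illustrative choices, not the regexp from earlier in the thread.

```python
import re

# Assumed input: <p ...>NUMBER[.]</p> immediately followed by <p ...>TITLE</p>.
CHAPTER_PAIR = re.compile(
    r'<p\b[^>]*>\s*(?P<num>\d+)\.?\s*</p>\s*'   # paragraph holding only the chapter number
    r'<p\b[^>]*>\s*(?P<title>[^<]+?)\s*</p>',   # next paragraph, plain-text title
    re.IGNORECASE,
)


def merge_chapter_headings(xhtml: str) -> str:
    """Replace each number/title paragraph pair with a single heading tag."""
    def _merge(match: re.Match) -> str:
        return f'<h2 class="chapter">{match.group("num")}. {match.group("title")}</h2>'

    return CHAPTER_PAIR.sub(_merge, xhtml)


sample = (
    '<p class="c1">1.</p>\n'
    '<p class="c1">The Beginning</p>\n'
    '<p class="c1">It was a dark night.</p>'
)
merged = merge_chapter_headings(sample)
print(merged)
```

With the heading in a single tag, one XPath expression (for example, one selecting `h2` elements with that class) matches one tag per chapter. The sketch will also merge any ordinary paragraph that happens to contain only a number, and it will swallow the first body paragraph when a chapter has no title, so it would need checking against the real book.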

ldolse · 11-02-2010, 12:43 PM

Your best bet (aside from the script you mentioned) is to convert to epub, edit the doc in Sigil to make it look the way you want, and then convert from epub to mobi. You could also use the debug output to grab an intermediate version of the html (the version from after pre-processing), edit that as required, then convert from html to mobi.
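The two routes above could be scripted roughly as follows, assuming calibre's `ebook-convert` command-line tool is on the PATH and using its `--debug-pipeline` option to write the intermediate stages to a folder. The file names, the folder name, and the helper `build_convert_command` are placeholders invented for this sketch; the manual editing step in between is not automated.

```python
import subprocess


def build_convert_command(source, target, debug_dir=None):
    """Build an ebook-convert command line; optionally dump intermediate stages."""
    cmd = ["ebook-convert", source, target]
    if debug_dir is not None:
        # Ask calibre to write the intermediate conversion stages into debug_dir.
        cmd += ["--debug-pipeline", debug_dir]
    return cmd


def run(cmd):
    """Run one conversion step; raises CalledProcessError if calibre reports failure."""
    subprocess.run(cmd, check=True)


# Route 1: convert to epub, edit the epub in Sigil by hand, then convert to mobi.
to_epub = build_convert_command("book.html", "book.epub")
to_mobi = build_convert_command("book.epub", "book.mobi")

# Route 2: dump the debug output, then edit the pre-processed html by hand.
with_debug = build_convert_command("book.html", "book.mobi", debug_dir="debug_out")

print(to_epub, to_mobi, with_debug, sep="\n")
```

Nothing is executed until `run(...)` is called on one of the command lists, so the commands can be inspected first.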

I just looked at the source code, and the current code doesn't actually seem to handle the case of a '.' appearing just after a numeric chapter header. If you open a bug with an example, I can see to it that that case is added to the default preprocessing. bugs.calibre-ebook.com

janvanmaar · 11-02-2010, 01:02 PM

The Sigil program looks nice. I will try this route first, thanks for the suggestion.

I'm not sure what you mean by the second paragraph, though. As far as I can tell, if the dot is not present after the chapter number, the behaviour is exactly the same (I have just tested). I guess the problem is that I don't know what kind of preprocessing you mean here. Sorry, I am new to this - got my first ebook reader ever yesterday.


