Ancestor element text extraction (chapter titles) in PDF header templates

NullNix · 07-31-2016, 02:34 PM

So I'm trying to turn an ebook into a PDF for various people who want paper copies (some people are stuck in the past or something). I've resigned myself to the fact that I can't put distinct content in the headers on facing pages, but I'd like to put the chapter title (the contents of the closest preceding h1 element) into each header. Reactivating rusted ten-years-unused XPath neurons, I've tried (ignore the spurious space inserted in '.innerHTML', I don't know what the forum software is playing at):

--pdf-header-template '<p id="chaptitle" style="text-align:center;"></p><script>document.getElementById("chaptitle").inn erHTML = document.evaluate("ancestor-or-self::h1[last()]/text()", "", null, XPathResult.STRING_TYPE,null).stringValue;</script>'

but unfortunately this yields an immense stream of

JS: NotSupportedError: DOM Exception 9: The implementation did not support the requested type of object or operation.

and no header content at all. Further, it seems that *any* use of ancestor-or-self in this construction has the same effect.

Has anyone done this sort of thing before (extracted text, attribute content, or something similar related to a node ancestral to the content on the current page during PDF conversion)? What am I doing wrong? It's probably something really stupid...
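
One thing that stands out in the call above is the second argument: document.evaluate() expects a context Node there, and the snippet passes an empty string. That may be enough on its own to trigger the NotSupportedError, though nothing in this thread confirms it. Note also that ancestor-or-self:: only matches elements that enclose the context node, whereas an h1 that comes earlier in the text sits on the preceding:: axis. A minimal sketch of a call with a real context node follows. The helper name nearestPrecedingH1Text and the choice of axis are assumptions for illustration, and whether the book's content is reachable from a header template script at all is not established here.

```javascript
// Hypothetical helper (not part of calibre): return the text of the nearest
// h1 that precedes contextNode in document order, or "" if there is none.
// `doc` is a DOM Document; `contextNode` must be a real Node, not a string.
function nearestPrecedingH1Text(doc, contextNode) {
  if (!contextNode || typeof contextNode.nodeType !== "number") {
    throw new TypeError("contextNode must be a DOM Node");
  }
  const STRING_TYPE = 2; // numeric value of XPathResult.STRING_TYPE
  // On a reverse axis such as preceding::, [1] selects the closest match.
  const result = doc.evaluate(
    "string(preceding::h1[1])", contextNode, null, STRING_TYPE, null
  );
  return result.stringValue;
}
```

In a browser this would be called as nearestPrecedingH1Text(document, someElement), where someElement is any element on the page of interest.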

kovidgoyal · 07-31-2016, 02:53 PM

You want to put chapter titles in the header, look at the manual, it has examples of doing that.

http://manual.calibre-ebook.com/conv...verting-to-pdf

See the use of _SECTION_

NullNix · 07-31-2016, 04:23 PM

Quote:

Originally Posted by kovidgoyal

You want to put chapter titles in the header, look at the manual, it has examples of doing that.

http://manual.calibre-ebook.com/conv...verting-to-pdf

See the use of _SECTION_

Oh ho! The question now is... was this introduced in the last week or two, or can I just not read? I've been more or less *living* on that page and in the PDF template substitution code and completely failed to notice the existence of _SECTION_ at all. ... but then given that I also overlooked odd_page and even_page I think I simply missed it.

Calibre is, as ever, the king of pre-emptive feature additions, and also of stunningly fast responses to questions!

I needed to do a few regex substitutions on it (don't want the chapter number in there, if there is one) but that works very well, thank you!
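
For anyone wanting to do the same, here is a minimal sketch of that kind of substitution. The actual patterns used are not given in this thread, so the numbering formats handled below ("Chapter 3: Title", "3. Title") are assumptions, and stripChapterNumber is a hypothetical helper name.

```javascript
// Hypothetical helper: remove a leading chapter number from a section title.
// The numbering formats handled here are assumptions, not taken from the thread.
function stripChapterNumber(title) {
  const stripped = title
    // "Chapter 3: Title", "Chapter 3 - Title", "chapter 12. Title"
    .replace(/^\s*chapter\s+\d+\s*[:.\-]?\s*/i, "")
    // "3. Title", "3: Title", "3 - Title"
    .replace(/^\s*\d+\s*[:.\-]\s*/, "")
    .trim();
  // If nothing is left (e.g. the title was just "Chapter 7"), keep the original.
  return stripped || title.trim();
}
```

In a header template the function would be applied to whatever text calibre substitutes for _SECTION_, for example inside a script element. A title that contains a quote character may break a string literal built that way, so treat this as a starting point only.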

eschwartz · 07-31-2016, 07:12 PM

The forum software breaks up words that are too long once they reach the cutoff length. But for the purposes of code, you can use the [CODE][/CODE]* tags (which also monospaces it).

* -- Rendered visible through the (now recursive) power of [NOPARSE][/NOPARSE]

NullNix · 08-01-2016, 06:35 AM

Quote:

Originally Posted by eschwartz

The forum software breaks up words that are too long once they reach the cutoff length. But for the purposes of code, you can use the [CODE][/CODE]* tags (which also monospaces it).

Yeah, I tried CODE, but it tried to render the thing in a one-line-high box in the preview, which I figured would be far harder to read than any alternative.

kovidgoyal · 08-01-2016, 11:49 AM

Quote:

Originally Posted by NullNix

Oh ho! The question now is... was this introduced in the last week or two, or can I just not read?

It's been there for donkey's years.

NullNix · 08-07-2016, 07:48 AM

Aside: the attached patch gives you a _TL_SECTION_ which is nailed to top-level section names only. It's quite nasty: in particular, the code in do_paged_render() is too repetitive. But it works, and lets you do things in the header like populating it only when the top-level section has not changed since the last header was laid out (i.e. since the previous page): this gives you the behaviour you get in most real books, where the header is silently omitted when new chapters start.
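
The header rule described above can be sketched as a pure function over the top-level section name of each page. This only illustrates the rule and is not the code from the patch: headersForPages is a hypothetical name, and how a header template would learn the previous page's section is left open here.

```javascript
// Hypothetical illustration: given the top-level section name for each page,
// in page order, return the header text for each page. A page gets a header
// only if its top-level section is the same as the previous page's, so the
// first page of every chapter (and of the book) is left blank.
function headersForPages(topLevelSections) {
  return topLevelSections.map((section, i) =>
    i > 0 && topLevelSections[i - 1] === section ? section : ""
  );
}
```

For a five-page book whose pages belong to "One", "One", "Two", "Two", "Two", only the second, fourth and fifth pages would carry a header.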

(MobileRead's anaemic attachment system won't let me upload it without calling it .txt, not .diff, but it is still a diff. Ugh. Bring back mailing lists, all is forgiven.)

kovidgoyal · 08-08-2016, 12:23 AM

https://github.com/kovidgoyal/calibr...cb254ee0a4f3c2

NullNix · 08-13-2016, 07:45 AM

Quote:

Originally Posted by kovidgoyal

https://github.com/kovidgoyal/calibr...cb254ee0a4f3c2

Oh good, that's better code than my repetitive mess all around. Better name, too. (And a bit more code I don't have to forward-port!)

Thank you

NullNix · 08-13-2016, 07:46 AM

Hm, there's a typo in the manual section describing it, though:

Quote:

Similarly, there is a variable named _TOP_LEVEL_SECTION_ that can be used to ge the name of the current top-level section.

'ge' should obviously be 'get'.



