Old 06-24-2015, 07:38 AM   #1
lealla
Enthusiast
Posts: 39
Karma: 714
Join Date: Jun 2015
Device: Kobo Aura H2O
What is the XPath for "Split HTML at the word 'chapter'"

Hi all,

Can anybody tell me the XPath for "split the HTML wherever the word 'chapter' is found"?

I'm referring to an XPath expression that can be used in the calibre editor, in the 'Split at multiple locations' dialog box.

This is to find ANY occurrence of the word chapter in an EPUB, regardless of whether it is in an <h>, <p> or <span> tag. Any tag.

Thank you! I've tried and tried, but only failed so far!
Old 06-24-2015, 08:24 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Do you want it to split on the actual word "chapter"? I don't think you can do that.

Splitting on any tag node which contains the word "chapter":
Code:
//*[re:test(., "chapter", "i")]


P.S. calibre includes an XPath builder wizard.
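
One thing to keep in mind with the expression above: inside the predicate, `.` is the string value of the element, which in XPath includes the text of all its descendants. So `//*` with that test matches not only the paragraph that holds the word but also its ancestors, such as `<body>` and `<html>`, and calibre may refuse to split on those. The sketch below only imitates that behaviour with Python's standard library; `DOC`, `string_value` and `matching_tags` are made-up names, ElementTree stands in for calibre's XPath engine, and the sample markup is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature of a converted book, not a real file from this thread.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>unnamed</title></head>
<body class="calibre">
<p class="calibre2">Chapter One</p>
<p class="calibre2">It was a dark night.</p>
<p class="calibre2">Chapter Two</p>
</body></html>"""


def string_value(element):
    """Approximate the XPath string value: all descendant text, concatenated."""
    return "".join(element.itertext())


def matching_tags(root, pattern):
    """Return the local tag names of elements whose string value matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        el.tag.split("}")[-1]  # drop the "{namespace}" prefix ElementTree adds
        for el in root.iter()  # document order, like //*
        if regex.search(string_value(el))
    ]


root = ET.fromstring(DOC)
matches = matching_tags(root, "chapter")
print(matches)  # ['html', 'body', 'p', 'p']
```

If that is what is happening, restricting the element type, for example `//*[local-name()='p' and re:test(., "chapter", "i")]`, should leave the ancestors out. I have not tested that variant in calibre.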
Old 06-25-2015, 07:15 AM   #3
lealla
Enthusiast
Posts: 39
Karma: 714
Join Date: Jun 2015
Device: Kobo Aura H2O
Hi eschwartz, thank you for your reply.

Yes, I would like to split the file wherever there is the word chapter within the text itself (not within the coding).

I tried //*[re:test(., "chapter", "i")] but I got an error saying

"Cannot split on the
tag"

With the weird space between 'the' and 'tag'.

I don't know if this helps, but here is how the actual text is set up at the moment:

*EDIT - I just got rid of all the <span> and <div> tags, but this hasn't made any difference.
Here is the beginning of my HTML as it looks at the moment:

Quote:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta name="generator" content="http://calibre-ebook.com"/>
<title>unnamed</title>
<meta name="author" content="lealla"/>
<meta name="creation-time" content="2014-6-11"/>
<meta name="revision-time" content="2014-6-11"/>

<link href="stylesheet.css" rel="stylesheet" type="text/css"/>
<link href="page_styles.css" rel="stylesheet" type="text/css"/>
</head>
<body class="calibre">

<p class="calibre2">Chapter One</p><p class="calibre2">
I'd like it to split the HTML just before each chapter heading. It's not a great EPUB, since it was converted from a PDF. I can't use any of the classes (calibre2, etc.) in the builder wizard, as they appear all over the place and not just in chapter headings.

Would it be worth uninstalling and reinstalling calibre?

Thank you again for your help

Last edited by lealla; 06-25-2015 at 08:08 AM.
Old 06-25-2015, 06:57 PM   #4
gbm
Wizard
Posts: 2,181
Karma: 8888888
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
Try this:
Code:
//*[((name()='p' ) and re:test(., 'chapter|book|section|part|prologue|epilogue\s+', 'i')) or @class = 'chapter']

bernie
Old 06-25-2015, 07:21 PM   #5
theducks
Well trained by Cats
Posts: 31,001
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
FWIW, uninstalling calibre will not clear (bad) settings.
You can simply delete the entire (hidden) calibre configuration folder while calibre is NOT running; it will be re-created with the defaults. To find the folder, use the button under Preferences > Miscellaneous that opens it.

BTW, are you discussing CONVERSIONS rather than (hand) editing using the Editor?
Old 06-26-2015, 03:32 AM   #6
lealla
Enthusiast
Posts: 39
Karma: 714
Join Date: Jun 2015
Device: Kobo Aura H2O
Thanks bernie!

That worked a little bit, in that it did split the document, but it didn't only split it on the word chapter.
There are 19 chapters, but I ended up with 90 html files.
I couldn't see any recurring pattern: the splits fell on different words, none of which were part of the markup, such as "The", "She" and "It". It did also split on the word Chapter, though.

As far as code goes, all the tags preceding the words are the same: </p><p class="calibre2">

Thanks to theducks' advice, I deleted the calibre configuration folder before I started, but this didn't seem to make much difference either way (it was a good tip nonetheless).

Just to confirm, yes this is within the Calibre Edit Book area, I'm right clicking within the code area and selecting "Split at multiple locations" and then pasting the code into the dialogue box that pops up.

Thank you for taking the time to help, I really appreciate it. I'm not too crash hot at all this, but I'm learning loads as I go.
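
A possible explanation for the 90 files, offered as a guess rather than a diagnosis: the suggested pattern is not anchored, so it matches any paragraph that contains "book", "part", "section" and so on anywhere in its text, including inside longer words, and the trailing `\s+` applies only to "epilogue" because alternation binds loosest. The sketch below uses Python's `re` module on made-up paragraphs; `ANCHORED` is an assumed tightening, not something posted in this thread, and it assumes calibre's `re:test` accepts the same regex syntax.

```python
import re

# The pattern as posted; \s+ belongs to the last alternative only.
POSTED = r"chapter|book|section|part|prologue|epilogue\s+"

# Assumed tightening: the keyword must start the paragraph as a whole word.
ANCHORED = r"^\s*(chapter|prologue|epilogue)\b"

# Made-up paragraphs; only the first one is a heading.
PARAGRAPHS = [
    "Chapter One",
    "The book lay open on the table.",
    "She took part in the search.",
    "It was in the next apartment.",
    "Nothing happened.",
]


def hits(pattern):
    """Return the paragraphs that the pattern matches, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [text for text in PARAGRAPHS if regex.search(text)]


posted_hits = hits(POSTED)      # heading plus three ordinary paragraphs
anchored_hits = hits(ANCHORED)  # heading only
print(len(posted_hits), len(anchored_hits))  # 4 1
```

Carried over to the split dialog, the anchored version might look like `//*[local-name()='p' and re:test(., '^\s*chapter\b', 'i')]`, which I have not tried in calibre.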

Last edited by lealla; 06-26-2015 at 04:02 AM.
All times are GMT -4.