04-24-2020, 05:05 PM   #1
LoganW
Splitting many large html files

So I know from the manual and from another thread that you can only split on one html file at a time.

But I have a book with 60+ large files that I'd like to split, and rather than opening each one and right-clicking and splitting on <h2> I would just prefer to automate it. Can that be done with a conversion and XPath somehow? The book is currently in AZW3 format.
04-24-2020, 05:27 PM   #2
gbm
Yes, using the calibre ebook editor:
https://manual.calibre-ebook.com/edi...ing-html-files

What I do is combine all the HTML files in the ebook into one file, then right-click and choose "Split at multiple locations", then use an XPath expression to do the splits.

bernie
[Attached screenshot: Screenshot from 2020-04-24 17-22-03.png]
04-24-2020, 05:33 PM   #3
jackie_w
If you're referring to splitting using the calibre Editor ... this isn't automated but it's what I do:
  1. Use the Merge option to merge all the text files into one huge file.
  2. Then use the normal right-click split on <h2> on that file.
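
For anyone who wants to see what "merge, then split on <h2>" amounts to, here is a minimal sketch in Python using only the standard library. It is not how the calibre Editor does it internally; the sample document, the function names `split_at_headings` and `chunk_text`, and the assumption that every heading is a direct child of <body> are all made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for the single file produced by the Merge step.
MERGED = f"""<html xmlns="{XHTML_NS}"><body>
<p>Front matter</p>
<h2>Chapter 1</h2><p>First text</p>
<h2>Chapter 2</h2><p>Second text</p>
</body></html>"""


def split_at_headings(xhtml, tags=("h2",)):
    """Group the children of <body> into chunks, one chunk per heading."""
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML_NS}}}body")
    split_tags = {f"{{{XHTML_NS}}}{t}" for t in tags}
    chunks = [[]]
    for child in body:
        # A heading opens a new chunk, unless the current chunk is still empty.
        if child.tag in split_tags and chunks[-1]:
            chunks.append([])
        chunks[-1].append(child)
    return chunks


def chunk_text(chunk):
    """Return the text of one chunk on a single line, for display only."""
    return " ".join("".join(el.itertext()).strip() for el in chunk)


chunks = split_at_headings(MERGED)
for number, chunk in enumerate(chunks, start=1):
    print(number, chunk_text(chunk))
```

Passing `tags=("h1", "h2")` would start a new chunk at either heading level. In a real book each chunk would be written out as its own file, which the Editor takes care of for you.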
04-24-2020, 05:36 PM   #4
LoganW
That makes sense. And if I wanted to split on both h1 and h2, would I do something like the following?

Code:
//*[name()='h1' or name()='h2']
04-24-2020, 05:43 PM   #5
jackie_w
You'd need to try it. I'm afraid XPath expressions are not my strong point.

I think I've used this one in the past:
Code:
//h:h1|//h:h2
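
A rough illustration of why the h: prefix matters: the files in an e-book are normally XHTML, so every tag lives in the XHTML namespace, and a plain //h2 looks for a tag with no namespace. The sketch below uses Python's standard-library ElementTree, which only supports a subset of XPath (no "|" operator), so it is a stand-in and not what calibre runs. The sample document is made up.

```python
import xml.etree.ElementTree as ET

# Bind the "h" prefix to the XHTML namespace, as the expression above assumes.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample document, not taken from the book in question.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Part One</h1><h2>Chapter 1</h2><p>Text</p><h2>Chapter 2</h2>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Without a prefix the query looks for <h2> in no namespace and finds nothing.
unprefixed = root.findall(".//h2")

# With the prefix bound to the XHTML namespace the headings are found.
# ElementTree has no "|" operator, so the two queries are run separately;
# note that this lists all h1 results before the h2 results.
headings = root.findall(".//h:h1", NS) + root.findall(".//h:h2", NS)
titles = [h.text for h in headings]

print(len(unprefixed))
print(titles)
```

With a full XPath 1.0 engine the single expression //h:h1|//h:h2 returns the same headings in document order, which is what you want for splitting.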
04-24-2020, 06:54 PM   #6
gbm
Yes that works.

I have also used something like this to match on the class attribute when the chapter headings used a tag other than an h*, e.g. a p tag.
Code:
//*[re:test(@class, "chapter|chapter1", "i")]
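
To show roughly what that expression selects, here is a sketch that imitates it in plain Python. It assumes re:test does an unanchored, case-insensitive search when given the "i" flag, and it uses the standard `re` module in place of the XPath extension function. The sample markup and the helper name `elements_with_class` are made up.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample where chapter headings are marked only by a class.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p class="Chapter">One</p><p class="body">Text</p>
<div class="chapter1 first">Two</div><p>No class</p>
</body></html>"""


def elements_with_class(root, pattern):
    """Return every element whose class attribute matches the pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    # A missing class attribute is treated as an empty string.
    return [el for el in root.iter() if rx.search(el.get("class", ""))]


root = ET.fromstring(SAMPLE)
matches = elements_with_class(root, "chapter|chapter1")
print([el.text for el in matches])
```

Because the sketch searches rather than anchors, "chapter" on its own would also match "chapter1" here. It is worth checking which elements match before splitting, since any element with a matching class becomes a split point, whatever its tag.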
bernie

Last edited by gbm; 04-24-2020 at 06:56 PM.