Old 03-08-2012, 02:13 PM   #1
osnova
Kindler of the Flame
Posts: 582
Karma: 646016
Join Date: Oct 2009
Location: US of A
Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus
How to automatically split (x)html in epub?

Let's say I have created an epub by hand with only one large (x)html file for the text (so, not up to epub spec). This file is very large, and each chapter/section starts with an <h2> or some other consistent tag. Furthermore, the file has thousands of internal hyperlinks.

Question: Is there a tool that would split this (x)html into many files (one per chapter/section) and modify the internal hyperlinks accordingly? I don't want any other changes to the tags, though.
Old 03-08-2012, 02:35 PM   #2
mmat1
Berti
Posts: 1,197
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
Quote:
Originally Posted by osnova View Post
Let's say I have created an epub by hand with only one large (x)html file for the text (so, not up to epub spec).
You can do it with Sigil.

First replace "<h2"
with "<hr class="sigilChapterBreak" /><h2"

Then press F6

Done.

By the way, what do you mean by "(so, not up to epub spec)"? How large is it?
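For a very large file, the same search-and-replace could also be done outside Sigil before the file is imported. A minimal sketch, assuming every chapter heading starts with a literal "<h2"; the function name add_breaks and the file names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: put Sigil's chapter-break marker in front of every <h2.
# Reads the file given as $1 and writes the marked copy to stdout.
add_breaks() {
    sed 's|<h2|<hr class="sigilChapterBreak" /><h2|g' "$1"
}

# Example run on a tiny sample file (names are illustrative only).
tmp=$(mktemp -d)
printf '%s\n' '<body>' '<h2 id="c1">One</h2>' '<p>text</p>' \
    '<h2 id="c2">Two</h2>' '</body>' > "$tmp/book.xhtml"
add_breaks "$tmp/book.xhtml" > "$tmp/book_marked.xhtml"
grep -c 'sigilChapterBreak' "$tmp/book_marked.xhtml"   # prints 2
```

The marked copy would then be opened in Sigil and split with F6 as described above.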
Old 03-08-2012, 02:42 PM   #3
DiapDealer
Grand Sorcerer
Posts: 28,853
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
You can do that with Sigil... however... I would not claim that it will absolutely leave all other tags (not to mention the formatting of the xhtml) alone. Turning off HTMLTidy will minimize the changes made to your code, but quite simply put... stuff's going to get changed (in addition to that which is necessary to accommodate the splitting/link-maintaining).
Old 03-08-2012, 04:30 PM   #4
osnova
Kindler of the Flame
Quote:
Originally Posted by mmat1 View Post
You can do it with sigil.

First exchange "<h2"
to "<hr class="sigilChapterBreak" /><h2"

Then press F6

Done.
Thank you so much. I'll try it. The last time I looked at Sigil (about a year ago), it choked on such large files. I usually do everything by hand (using EmEditor).

Quote:
Originally Posted by mmat1 View Post
btw. "(so, not up to epub spec)" What's this ?? . How long is it ??
If you look at the link to the OSNOVA List below, you'll see that I tend to make huge book collections and books. For example, a Bible commentary that has 9 huge volumes plus the Bible itself in one file. Or the works of Jonathan Edwards. I'd like to convert all my mobi files to epubs as well.

Last edited by osnova; 03-08-2012 at 04:34 PM.
Old 03-08-2012, 04:35 PM   #5
osnova
Kindler of the Flame
Quote:
Originally Posted by DiapDealer View Post
You can do that with Sigil... however... I would not claim that it will absolutely leave all other tags (not to mention the formatting of the xhtml) alone.
I just want to avoid the radical changes that, e.g., Calibre makes.
Old 03-08-2012, 05:55 PM   #6
mmat1
Berti
Quote:
Originally Posted by osnova View Post
The last time I looked at Sigil (was about a year ago), it coughed on such large files. I usually do everything by hand (using emeditor).
Fine. DiapDealer is right, Sigil will make some changes. But that's nothing compared to the "formatting" calibre creates.

The epub will be a bit smaller than the mobi. I made one (just out of curiosity) that is 53 MB. It works, at least on a PocketBook mobile device. Up to 20 MB does not seem to be a critical size.
Old 03-08-2012, 06:42 PM   #7
DiapDealer
Grand Sorcerer
Oh yes. There'll be no drastic changes like if you converted with calibre. I didn't mean to imply that. I just didn't know how much of a perfectionist I might be dealing with, so I erred on the side of caution.

Sigil can still be quite sluggish when dealing with very large, single html files (more so in Book View than Code View, but still...), but the more you split it up... the more responsive it becomes. Depending on how huge your file is, it could still be quite painful—and possibly lock up. But I certainly wouldn't be afraid to try it.
Old 03-08-2012, 06:54 PM   #8
osnova
Kindler of the Flame
Quote:
Originally Posted by mmat1 View Post
I made one (just for curiosity), which has 53Mb
Did you use Sigil for this one?
Old 03-09-2012, 03:45 AM   #9
mmat1
Berti
Quote:
Originally Posted by osnova View Post
Did you use Sigil for this one?
Yes, it gets slow, but it works
Old 03-09-2012, 02:18 PM   #10
osnova
Kindler of the Flame
Just reporting that Sigil worked as you described even with a large file (it took a while though). I wish it were using multithreading because I have many CPU cores and only one (?) was taken up by the process. Anyway, thank you.
Old 03-10-2012, 05:04 PM   #11
JSWolf
Resident Curmudgeon
Posts: 80,660
Karma: 150249619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Run Sigil without loading any files at all. Turn off Tidy, then load your ePub and do all the splitting. That way Sigil makes the fewest harmful changes.
Old 03-11-2012, 05:05 PM   #12
SBT
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
Under Unix-type operating systems (incl. OS X), you could use the csplit command, e.g.
csplit -f "chapters/" -b "%2.2d.xhtml" big_file.xhtml "/<h2/" "{*}"
That'll split your file into chapters/00.xhtml, chapters/01.xhtml, ... (the chapters directory has to exist already, and the -b option and the "{*}" repeat count are GNU csplit extensions).
However, everything before the first <h2> tag ends up in 00.xhtml, and the other files lack the enclosing <html><head>...</body></html> tags. Of course, a few shell commands can fix that, but I'll leave that as an exercise for the reader.
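To see what the pieces look like, the csplit call above can be tried on a tiny sample first. This is only a sketch: the sample content is made up, and it assumes GNU csplit is installed.

```shell
#!/bin/sh
# Demo of the csplit call on a tiny made-up file (GNU csplit needed for -b and {*}).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p chapters   # csplit does not create the output directory itself
cat > big_file.xhtml <<'EOF'
<html><head><title>t</title></head>
<body>
<h2 id="c1">One</h2>
<p>first</p>
<h2 id="c2">Two</h2>
<p>second</p>
</body></html>
EOF
# -s suppresses the byte counts csplit normally prints for each piece.
csplit -s -f "chapters/" -b "%2.2d.xhtml" big_file.xhtml "/<h2/" "{*}"
ls chapters                   # 00.xhtml holds everything before the first <h2
head -n 1 chapters/01.xhtml   # each later piece starts at its <h2 line
```

Only 00.xhtml keeps the opening <html><head> part and only the last piece keeps the closing tags, which is the gap described above.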
Old 03-12-2012, 02:19 AM   #13
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
Originally Posted by SBT View Post
Under unix-type operating systems (incl. OSX), you could use the csplit command, e.g.
csplit -f "chapters/" -b "%2.2d.xhtml" big_file.xhtml "/<h2/" "{*}"
That'll split your file into chapters/00.xhtml, chapters/01.xhtml, ...
However, everything before the first <h2> tag ends up in 00.xhtml, and the other files lack the enclosing <html><head>...</body></html> tags. Of course, a few shell commands can fix that, but I'll leave that as an exercise to the reader
But this will not help you if you have internal links...
Old 03-12-2012, 05:20 AM   #14
SBT
Fanatic
Well, that's another trivial scripting exercise left for the reader, then
( ~10 lines will do it)
Old 03-12-2012, 07:37 AM   #15
SBT
Fanatic
This seems to do the job:
Code:
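#!/bin/bash
# Usage: pass the big xhtml file as $1 (needs GNU sed, grep and csplit).
mkdir -p chapters
# Insert </body></html> before every line containing <h2, re-emit the saved
# header (line 1 through the <body line) in front of each such line, then
# cut the stream after every </html>.
sed '/<h2/s/^/<\/body>\n<\/html>\n/' "$1" | sed -n '1,/<body/{1h;1!H};/<h2/{x;p;x};p' | csplit -f "chapters/" -b "%2.2d.xhtml" - "/<\/html>/+1" "{$(( $(grep -c '<h2' "$1") - 1 ))}"
cd chapters || exit 1
# Rewrite each href="#id" so it names the chapter file that now holds that id;
# links whose target is in the same file keep their "#id" form.
for f in ??.xhtml
do for t in $(grep -ho "href=.#[^\"']\+" "$f" | cut -c8- | sort -u)
   do sed -i "s/\([\"']\)#${t}['\"]/\1$(grep -l "\(name\|id\)=['\"]${t}['\"]" ??.xhtml | grep -v "${f}" | head -n 1)#${t}\1/g" "$f"
   done
done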
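After a split like this, it may be worth checking that every link still resolves. The following is a rough sketch rather than part of the script above: check_links is a hypothetical helper, it assumes the chapters/NN.xhtml layout produced above, and it assumes link targets are marked with id= or name= attributes.

```shell
#!/bin/sh
# Hypothetical checker: report links whose target id/name cannot be found
# in the chapter file they point to. Returns 1 if any link is broken.
check_links() {
    dir=$1
    bad=0
    for f in "$dir"/??.xhtml; do
        # Every href value containing '#', e.g. 02.xhtml#note3 or #note3.
        for link in $(grep -o "href=[\"'][^\"']*#[^\"']*" "$f" | sed "s/^href=[\"']//" | sort -u); do
            case $link in *://*) continue ;; esac    # skip external URLs
            file=${link%%"#"*}                        # part before '#'
            id=${link#*"#"}                           # part after '#'
            [ -n "$file" ] || file=$(basename "$f")   # "#id" means the same file
            if ! grep -Eq "(name|id)=[\"']${id}[\"']" "$dir/$file" 2>/dev/null; then
                echo "$(basename "$f"): broken link $link"
                bad=1
            fi
        done
    done
    return $bad
}

# Tiny made-up example: two good links and one broken link.
work=$(mktemp -d)
mkdir -p "$work/chapters"
echo '<p><a href="01.xhtml#c1">ok</a> <a href="01.xhtml#missing">bad</a></p>' > "$work/chapters/00.xhtml"
echo '<h2 id="c1">One</h2><p><a href="#c1">top</a></p>' > "$work/chapters/01.xhtml"
check_links "$work/chapters" || echo "at least one link needs fixing"
```

The id is used as a regular expression here, so ids containing characters such as "." are matched loosely; that seemed acceptable for a quick sanity check.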

Last edited by SBT; 03-12-2012 at 10:54 AM. Reason: Even more compact solution...