03-08-2012, 02:13 PM | #1 |
Kindler of the Flame
Posts: 582
Karma: 646016
Join Date: Oct 2009
Location: US of A
Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus
How to automatically split (x)html in epub?
Let's say I have created an epub by hand with only one large (x)html file for the text (so, not up to epub spec). This file is very large, and each chapter/section starts with an <h2> or some other consistent tag. Furthermore, the file has thousands of internal hyperlinks.
Question: Is there a tool that would split this (x)html into many files (one per chapter/section) and update the internal hyperlinks accordingly? I don't want any other changes to the tags, though.
03-08-2012, 02:35 PM | #2 | |
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
First, replace "<h2" with "<hr class="sigilChapterBreak" /><h2", then press F6. Done.
By the way, what do you mean by "(so, not up to epub spec)"? And how large is the file?
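If you'd rather not do that search-and-replace inside Sigil, the same marker can be inserted from the command line before loading the book. This is a minimal sketch, assuming every chapter starts with "<h2"; the function name add_sigil_markers and the file names are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: put Sigil's chapter-break marker in front of every
# "<h2" so that Sigil can later split the file at those markers.
# $1 = input file, $2 = output file (both placeholders).
add_sigil_markers() {
  # "\/" in the replacement produces a literal "/" (the s/// delimiter is "/").
  sed 's/<h2/<hr class="sigilChapterBreak" \/><h2/g' "$1" > "$2"
}
```

Usage: `add_sigil_markers big_file.xhtml marked.xhtml`, then put marked.xhtml back into the book in place of the original file and split in Sigil as described above.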
03-08-2012, 02:42 PM | #3 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
You can do that with Sigil... however... I would not claim that it will absolutely leave all other tags (not to mention the formatting of the xhtml) alone. Turning off HTMLTidy will minimize the changes made to your code, but quite simply put... stuff's going to get changed (in addition to that which is necessary to accommodate the splitting/link-maintaining).
03-08-2012, 04:30 PM | #4 | ||
Kindler of the Flame
Posts: 582
Karma: 646016
Join Date: Oct 2009
Location: US of A
Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus
Last edited by osnova; 03-08-2012 at 04:34 PM.
03-08-2012, 04:35 PM | #5 |
Kindler of the Flame
Posts: 582
Karma: 646016
Join Date: Oct 2009
Location: US of A
Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus
03-08-2012, 05:55 PM | #6 | |
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
An epub will be a bit smaller than a mobi. Just out of curiosity, I made one that was 53 MB, and it worked, at least on a PocketBook mobile device. Up to 20 MB does not seem to be a critical size.
03-08-2012, 06:42 PM | #7 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Oh yes. There'll be no drastic changes like there would be if you converted with calibre. I didn't mean to imply that. I just didn't know how much of a perfectionist I might be dealing with, so I erred on the side of caution.
Sigil can still be quite sluggish when dealing with very large single html files (more so in Book View than in Code View, but still...), but the more you split it up, the more responsive it becomes. Depending on how huge your file is, it could still be quite painful, and Sigil might even lock up. But I certainly wouldn't be afraid to try it.
03-08-2012, 06:54 PM | #8 |
Kindler of the Flame
Posts: 582
Karma: 646016
Join Date: Oct 2009
Location: US of A
Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus
03-09-2012, 03:45 AM | #9 |
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
03-09-2012, 02:18 PM | #10 |
Kindler of the Flame
Posts: 582
Karma: 646016
Join Date: Oct 2009
Location: US of A
Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus
Just reporting that Sigil worked as you described, even with a large file (it took a while, though). I wish it used multithreading: I have many CPU cores, and only one (?) seemed to be busy. Anyway, thank you.
03-10-2012, 05:04 PM | #11 |
Resident Curmudgeon
Posts: 73,970
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Start Sigil without loading any files at all. Turn off Tidy, then load your ePub and do all the splitting. That will keep harmful changes to a minimum.
03-11-2012, 05:05 PM | #12 |
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
Under Unix-type operating systems (incl. OS X), you could use the csplit command (-b and "{*}" are GNU csplit extensions, so on OS X you may need GNU coreutils), e.g.
Code:
csplit -f "chapters/" -b "%2.2d.xhtml" big_file.xhtml "/<h2/" "{*}"
That'll split your file into chapters/00.xhtml, chapters/01.xhtml, ... (the chapters directory must already exist). However, everything before the first <h2> tag ends up in 00.xhtml, and the other files lack the enclosing <html><head>...<body> and </body></html> tags. Of course, a few shell commands can fix that, but I'll leave that as an exercise for the reader.
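One possible take on that "few shell commands" exercise is sketched below. It is a rough sketch under stated assumptions, not a tested tool: wrap_pieces is a made-up name; it treats everything up to and including the first line containing "<body" as the header, and assumes the original closing </body></html> ends up at the end of the last piece.

```bash
#!/bin/bash
# Hypothetical helper: make every csplit piece a standalone XHTML file.
# $1 = original big file, $2 = directory holding the ??.xhtml pieces.
wrap_pieces() {
  local src=$1 dir=$2
  local header pieces last f
  # Header = lines 1 .. first line containing "<body" (sed prints it, then quits).
  header=$(sed '/<body/q' "$src")
  pieces=("$dir"/??.xhtml)
  last=${pieces[${#pieces[@]}-1]}
  for f in "${pieces[@]}"; do
    {
      # 00.xhtml already starts with the original header.
      if [ "$f" != "${pieces[0]}" ]; then printf '%s\n' "$header"; fi
      cat "$f"
      # The last piece already ends with the original </body></html>.
      if [ "$f" != "$last" ]; then printf '</body>\n</html>\n'; fi
    } > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

Usage: `wrap_pieces big_file.xhtml chapters`. The internal links still point at the old single file, so they need fixing separately.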
03-12-2012, 02:19 AM | #13 | |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
03-12-2012, 05:20 AM | #14 |
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
Well, that's another trivial scripting exercise left for the reader, then (~10 lines will do it).
03-12-2012, 07:37 AM | #15 |
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
This seems to do the job:
Code:
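#!/bin/bash
# Usage: split.sh big_file.xhtml  (needs GNU sed, grep and csplit;
# two-digit piece names, so at most 100 pieces)
mkdir -p chapters
# Before each <h2 line, close the document (</body></html>) and repeat the
# original header (lines up to the first <body), then split after each
# </html>: 00.xhtml holds everything before the first <h2, then one piece per <h2.
sed '/<h2/s/^/<\/body>\n<\/html>\n/' "$1" |
  sed -n '1,/<body/{1h;1!H};/<h2/{x;p;x};p' |
  csplit -f "chapters/" -b "%2.2d.xhtml" - "/<\/html>/+1" "{$(( $(grep -c '<h2' "$1") - 1 ))}"
cd chapters || exit 1
# Point each href="#id" at the piece that now holds id="id" or name="id"
# (unchanged if it's the same file; assumes unique ids without "/").
for f in ??.xhtml
do
  for t in $(grep -ho "href=.#[^\"']\+" "$f" | cut -c8-180)
  do
    sed -i "s/\([\"']\)#${t}['\"]/\1$(grep -l "\(name\|id\)=['\"]${t}['\"]" ??.xhtml | grep -v "${f}")#${t}\1/" "$f"
  done
done
Last edited by SBT; 03-12-2012 at 10:54 AM. Reason: Even more compact solution...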
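After a split like this it may be worth checking that every internal link still resolves. A rough sketch, assuming double-quoted href attributes and ids without regex-special characters; check_links is a hypothetical name, and it only checks that the target file contains a matching id= or name= attribute.

```bash
#!/bin/bash
# Hypothetical post-split check: list every relative link of the form
# href="file.xhtml#id" or href="#id" whose target id/name is missing.
# Returns non-zero if any broken link was found.
check_links() {
  local dir=$1 f link file id bad=0
  for f in "$dir"/*.xhtml; do
    # Values of all double-quoted hrefs that contain a fragment (#...).
    for link in $(grep -o 'href="[^"]*#[^"]*"' "$f" | sed 's/^href="//; s/"$//'); do
      case $link in *://*) continue ;; esac   # skip external URLs
      file=${link%%#*}
      id=${link#*#}
      # An empty file part means the link points into the same file.
      if [ -z "$file" ]; then file=$(basename "$f"); fi
      if ! grep -qE "(id|name)=[\"']${id}[\"']" "$dir/$file" 2>/dev/null; then
        echo "$(basename "$f"): broken link $link"
        bad=1
      fi
    done
  done
  return "$bad"
}
```

Usage: `check_links chapters && echo "all links OK"`.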