|  03-08-2012, 02:13 PM | #1 | 
| Kindler of the Flame            Posts: 582 Karma: 646016 Join Date: Oct 2009 Location: US of A Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus | 
				
				How to automatically split (x)html in epub?
			 
			
			Let's say I have created an epub by hand with only one large (x)html file for the text (so, not up to epub spec).  This file is very large, and each chapter/section starts with an <h2> or some other consistent tag.  Furthermore, this file has thousands of internal hyperlinks. Question: Is there a tool that would split this (x)html into many files (one per chapter/section) and modify the internal hyperlinks accordingly? I don't want any other changes to the tags though. | 
|   |   | 
|  03-08-2012, 02:35 PM | #2 | |
| Berti            Posts: 1,197 Karma: 4985964 Join Date: Jan 2012 Location: Zischebattem Device: Acer Lumiread | Quote: 
 First replace every "<h2" with "<hr class="sigilChapterBreak" /><h2", then press F6. Done. By the way, what do you mean by "(so, not up to epub spec)"? How long is it? | |
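If you would rather do that replacement outside Sigil, the same substitution can be sketched with sed. This is only an illustration of the search-and-replace described above: the function name and the file names in the usage comment are made up, and it assumes every chapter really does start with "<h2".

```shell
#!/bin/bash
# Sketch: insert Sigil's chapter-break marker before every "<h2".
# Reads XHTML on stdin and writes the marked-up XHTML to stdout.
# add_chapter_breaks is a hypothetical helper name, not part of Sigil.
add_chapter_breaks() {
    sed 's|<h2|<hr class="sigilChapterBreak" /><h2|g'
}

# Example (file names are placeholders):
#   add_chapter_breaks < big_file.xhtml > marked.xhtml
```

The marked file still has to be opened in Sigil and split there with F6; the sed step only places the markers.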
|   |   | 
|  03-08-2012, 02:42 PM | #3 | 
| Grand Sorcerer            Posts: 28,863 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			You can do that with Sigil... however... I would not claim that it will absolutely leave all other tags (not to mention the formatting of the xhtml) alone. Turning off HTMLTidy will minimize the changes made to your code, but quite simply put... stuff's going to get changed (in addition to that which is necessary to accommodate the splitting/link-maintaining).
		 | 
|   |   | 
|  03-08-2012, 04:30 PM | #4 | ||
| Kindler of the Flame            Posts: 582 Karma: 646016 Join Date: Oct 2009 Location: US of A Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus | Quote: 
 Quote: 
 Last edited by osnova; 03-08-2012 at 04:34 PM. | ||
|   |   | 
|  03-08-2012, 04:35 PM | #5 | 
| Kindler of the Flame            Posts: 582 Karma: 646016 Join Date: Oct 2009 Location: US of A Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus | |
|   |   | 
|  03-08-2012, 05:55 PM | #6 | |
| Berti            Posts: 1,197 Karma: 4985964 Join Date: Jan 2012 Location: Zischebattem Device: Acer Lumiread | Quote: 
 Epub will be a bit smaller than mobi. I made one (just out of curiosity) that is 53 MB. It works, at least on a PocketBook mobile device. Up to 20 MB does not seem to be a critical size. | |
|   |   | 
|  03-08-2012, 06:42 PM | #7 | 
| Grand Sorcerer            Posts: 28,863 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Oh yes. There'll be no drastic changes like if you converted with calibre. I didn't mean to imply that. I just didn't know how much of a perfectionist I might be dealing with, so I erred on the side of caution.   Sigil can still be quite sluggish when dealing with very large, single html files (more so in Book View than Code View, but still...), but the more you split it up... the more responsive it becomes. Depending on how huge your file is, it could still be quite painful—and possibly lock up. But I certainly wouldn't be afraid to try it.   | 
|   |   | 
|  03-08-2012, 06:54 PM | #8 | 
| Kindler of the Flame            Posts: 582 Karma: 646016 Join Date: Oct 2009 Location: US of A Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus | |
|   |   | 
|  03-09-2012, 03:45 AM | #9 | 
| Berti            Posts: 1,197 Karma: 4985964 Join Date: Jan 2012 Location: Zischebattem Device: Acer Lumiread | |
|   |   | 
|  03-09-2012, 02:18 PM | #10 | 
| Kindler of the Flame            Posts: 582 Karma: 646016 Join Date: Oct 2009 Location: US of A Device: K DX,3,KT,KP,KF, KFHD; Nook C, PRS600, iPad, Xoom, N900, N810, Zaurus | 
			
			Just reporting that Sigil worked as you described even with a large file (it took a while though).  I wish it were using multithreading because I have many CPU cores and only one (?) was taken up by the process.  Anyway, thank you.
		 | 
|   |   | 
|  03-10-2012, 05:04 PM | #11 | 
| Resident Curmudgeon            Posts: 80,675 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | 
			
			Run Sigil without loading any files at all. Turn off Tidy and then load your ePub and do all the splitting. That will make the fewest harmful changes.
		 | 
|   |   | 
|  03-11-2012, 05:05 PM | #12 | 
| Fanatic            Posts: 580 Karma: 810184 Join Date: Sep 2010 Location: Norway Device: prs-t1,  tablet, Nook Simple, assorted kindles, iPad | 
			
			Under Unix-type operating systems (incl. OS X), you could use the csplit command, e.g.
csplit -f "chapters/" -b "%2.2d.xhtml" big_file.xhtml "/<h2/" "{*}"
That'll split your file into chapters/00.xhtml, chapters/01.xhtml, ... However, everything before the first <h2> tag ends up in 00.xhtml, and the other files lack the enclosing <html><head>...</body></html> tags. Of course, a few shell commands can fix that, but I'll leave that as an exercise to the reader   | 
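One way the missing wrapper tags could be patched in afterwards is sketched below. It is a rough illustration, not a tested tool: wrap_chunks is a made-up name, and it assumes the opening <body ...> tag sits on its own line in 00.xhtml, that the last chunk still ends with the original </body></html>, and that there are at most 100 chunks (two-digit file names).

```shell
#!/bin/bash
# Sketch: turn each csplit chunk into a complete document.
# wrap_chunks is a hypothetical helper; pass it the directory holding 00.xhtml, 01.xhtml, ...
wrap_chunks() {
    local dir=$1 header last f tmp
    # Everything in the first chunk up to and including the <body ...> line.
    header=$(sed '/<body/q' "$dir/00.xhtml")
    # Remember the last chunk; the glob expands in sorted order.
    for f in "$dir"/[0-9][0-9].xhtml; do last=$f; done
    for f in "$dir"/[0-9][0-9].xhtml; do
        tmp=$f.tmp
        {
            # 00.xhtml already starts with the original header.
            [ "$f" = "$dir/00.xhtml" ] || printf '%s\n' "$header"
            cat "$f"
            # The last chunk already carries the original closing tags.
            [ "$f" = "$last" ] || printf '</body>\n</html>\n'
        } > "$tmp"
        mv "$tmp" "$f"
    done
}

# Example, after running the csplit command from this post:
#   wrap_chunks chapters
```

This only fixes the document skeleton. Internal links that point to an anchor in another chunk still need the target file name added in front of the "#".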
|   |   | 
|  03-12-2012, 02:19 AM | #13 | |
| Wizard            Posts: 4,520 Karma: 121692313 Join Date: Oct 2009 Location: Heemskerk, NL Device: PRS-T1, Kobo Touch, Kobo Aura | Quote: 
 | |
|   |   | 
|  03-12-2012, 05:20 AM | #14 | 
| Fanatic            Posts: 580 Karma: 810184 Join Date: Sep 2010 Location: Norway Device: prs-t1,  tablet, Nook Simple, assorted kindles, iPad | 
			
			Well, that's another trivial scripting exercise left for the reader, then    ( ~10 lines will do it) | 
|   |   | 
|  03-12-2012, 07:37 AM | #15 | 
| Fanatic            Posts: 580 Karma: 810184 Join Date: Sep 2010 Location: Norway Device: prs-t1,  tablet, Nook Simple, assorted kindles, iPad | 
			
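			This seems to do the job:
Code:
#!/bin/bash
# Usage: pass the big (x)html file as $1; the chapters are written to ./chapters
mkdir -p chapters
# Close the document before each <h2, re-open it with a copy of the header
# (everything up to <body), then split after every </html>.
sed  '/<h2/s/^/<\/body>\n<\/html>\n/' "$1" | sed -n '1,/<body/{1h;1!H};/<h2/{x;p;x};p'  | csplit -f "chapters/" -b "%2.2d.xhtml" - "/<\/html>/+1" "{$(( $(grep -c '<h2' "$1") - 1 ))}"
cd chapters
# Point each internal "#target" link at the chapter file that now holds the target.
for f in ??.xhtml
do for t in $(grep -ho "href=.#[^\"']\+" "$f"|cut -c8-180)
   do sed -i "s/\([\"']\)#${t}['\"]/\1$(grep -l "\(name\|id\)=['\"]${t}['\"]" ??.xhtml|grep -v "${f}")#${t}\1/" "$f"
   done
done
Last edited by SBT; 03-12-2012 at 10:54 AM. Reason: Even more compact solution... | 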
|   |   | 