01-20-2013, 05:32 AM | #1 |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
splitting html files?
Hello, I hope you people can help me.
I want to convert an HTML file into an epub manually, without a converter like calibre (I love calibre, but I want to learn how to do the conversion myself). I know it is recommended to split the HTML file into multiple parts (especially for older, slower ereaders). I could do it with cut and paste, but this becomes tedious on big files. Is there a program that does the splitting automatically? I want to split the HTML file at a certain tag (like div class="xxx" or "h2"). I already found a small program called HTML Splitter (from around 2004). Basically, this program does what I want, but there is a problem: at the end, it adds an unwanted "br". Also, the closing tags "body" and "html" (and the unwanted br tag) are written in upper case. But in XHTML they have to be lower case, so of course the resulting HTML parts are not valid XHTML. Is there another program that does the same? Just splitting an XHTML file into multiple XHTML files at a certain tag? Thanks in advance. |
01-20-2013, 06:55 AM | #2 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
Why not just use Sigil?
Press Ctrl+Enter at the end of each chapter, just before the following <h2> tag. It is also possible to do this through search and replace, adding <hr class="sigil_split_marker" /> at each split point. Then choose Edit > Split At Markers. Either way, work on a saved copy. |
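For anyone who would rather script the search-and-replace step, here is a minimal Python sketch. It assumes chapters start at <h2> and that no marker is wanted before the first heading; the function name add_split_markers is made up for this example and is not part of Sigil.

```python
import re

# Marker string quoted in the post above; Sigil splits the file at these.
MARKER = '<hr class="sigil_split_marker" />'


def add_split_markers(xhtml: str, tag: str = "h2") -> str:
    """Insert MARKER before every opening <tag> except the first one."""
    pattern = re.compile(r"<%s\b" % re.escape(tag), re.IGNORECASE)
    seen = 0

    def repl(match):
        nonlocal seen
        seen += 1
        # The first heading has no chapter before it, so leave it alone.
        if seen == 1:
            return match.group(0)
        return MARKER + "\n" + match.group(0)

    return pattern.sub(repl, xhtml)
```

Run it on a saved copy of the file's text, paste the result back into Sigil's code view, and then split at the markers as described above.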
01-20-2013, 07:38 AM | #3 |
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
|
Maybe I missed something, but as far as I know, Sigil can only save a file as epub? I want to be able to save it as HTML.
|
01-20-2013, 08:15 AM | #4 |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
An epub is just a zipfile full of html files (among other things). Use Sigil to split the html file the way you want it, and then unzip the epub and snag the html files. You may have to fix some links afterward. I have to say, though, that that seems like a very long driveway to a small and rather unimpressive house.
You'll spend a lot of time looking for tools that will "automatically" help you construct an epub by hand. Last edited by DiapDealer; 01-20-2013 at 08:17 AM. |
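The "unzip the epub and snag the html files" step can also be scripted. This is a rough sketch using Python's standard zipfile module; extract_html is a hypothetical helper name, and it flattens everything into one folder, so two files with the same name in different folders of the epub would overwrite each other.

```python
import zipfile
from pathlib import Path


def extract_html(epub_path, out_dir):
    """Copy every .html/.xhtml/.htm file from the epub into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                # Drop the folder part (e.g. OEBPS/Text/) and keep the file name.
                target = out / Path(name).name
                target.write_bytes(zf.read(name))
                written.append(target.name)
    return written
```

Links to stylesheets and images are relative to the original folder layout, so they may need fixing afterward, as noted above.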
01-20-2013, 08:16 AM | #5 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
That is true, you can only save as epub in Sigil. But epub is nothing more than a collection of html files and their associated images all zipped together.
In Sigil, if you right-click on any of these files you can select Open With and open it in any other editor you like. Or you can use a zip program to open the epub and work with the files in any program you like... but you need to make sure they are zipped up in a certain order, with certain files not compressed (which ones escapes me now). There is a Tweak EPUB tool built into calibre that takes care of this. Sorry to repeat... DiapDealer got in first! If Sigil makes things too simple, you can stay in code view in Sigil and muck about in the HTML all you like. For me, I work in both views: code view to tweak and book view to preview. It is easier for me to join broken sentences in book view than in code view. Last edited by mrmikel; 01-20-2013 at 08:20 AM. |
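On the zipping order: as far as I know, the EPUB container format wants the file named mimetype to be the first entry in the archive and stored without compression, while everything else may be compressed. A minimal Python sketch of that packing step follows. The name build_epub is invented for this example, src_dir is assumed to already contain mimetype, META-INF/container.xml and the content files, and epub_path should point outside src_dir.

```python
import zipfile
from pathlib import Path


def build_epub(src_dir, epub_path):
    """Zip src_dir into epub_path with 'mimetype' first and uncompressed."""
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # First entry: mimetype, stored as-is (no compression).
        zf.write(src / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else follows, compressed normally.
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src).as_posix()
            if path.is_file() and rel != "mimetype":
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

This only covers the archive layout; it does not check that the OPF, the manifest, or the XHTML files themselves are valid.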
01-20-2013, 11:42 AM | #6 |
Sigil developer
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
You can also right click on any file or files in Sigil and use Save As to export them if you want to avoid unzipping.
|
01-20-2013, 03:00 PM | #7 |
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
|
If you have a Perl interpreter, you could do something like this:
Code:
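#!/usr/bin/perl
use strict;
use warnings;

# Read the whole input file in one go instead of line by line.
$/ = undef;

my $filename = $ARGV[0] or die "Usage: $0 file.html\n";
open(my $in, '<', $filename) or die "Cannot open $filename: $!\n";
my $data = <$in>;
close($in);

# Split at every <splitmarker>; the marker itself is dropped.
my @parts = split(/<splitmarker>/, $data);

my $count = 1;
for my $part (@parts) {
    open(my $out, '>', "outfile_$count.html") or die "Cannot write outfile_$count.html: $!\n";
    print $out $part;
    close($out);
    $count++;
}
You'll then want to go back and add the starting and ending <html> tags, <head> tags, etc. from the first file to each of the other files. |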
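The last step (adding the <html>, <head> and closing tags back to every part) can be folded into the split itself. Below is a rough Python sketch of that idea, not a drop-in replacement for the script above. It assumes you pass in only the content between <body> and </body>, that chapters start at <h2>, and that Python 3.7 or later is used (older versions do not split on empty matches). HEAD, TAIL and split_body are placeholder names, and the header is a bare-bones example you would replace with your own.

```python
import re

# Placeholder header and footer; swap in your own DOCTYPE, stylesheet link, etc.
HEAD = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>{title}</title>
</head>
<body>
"""
TAIL = "</body>\n</html>\n"


def split_body(body: str, tag: str = "h2"):
    """Split body markup before each <tag> and wrap every piece as a full document."""
    # Zero-width lookahead: the heading stays at the start of its own piece.
    pieces = re.split(r"(?=<%s\b)" % re.escape(tag), body)
    pieces = [p for p in pieces if p.strip()]
    docs = []
    for number, piece in enumerate(pieces, start=1):
        docs.append(HEAD.format(title="Part %d" % number) + piece + TAIL)
    return docs
```

Each returned string can then be written to its own file, for example outfile_1.html, outfile_2.html and so on, as in the Perl version.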
01-21-2013, 08:37 PM | #8 |
Connoisseur
Posts: 57
Karma: 1010
Join Date: Jul 2011
Device: Archos A70 eReader, Kindle Touch, Sony PRS-T2
|
On Linux you can use csplit.
|
01-22-2013, 04:13 AM | #9 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
csplit alone will not output correct HTML files, as they will be missing the header and final closing tags. But I use csplit for all my books; this is what I do:
1. Put the whole book (at least the main part; the title page, notes, etc. can be done separately) in a single XHTML file. Format as desired.
2. Add the head stuff before each chapter, i.e. something like:
Code:
</body>
</html>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:ops="http://www.idpf.org/2007/ops" xml:lang="en">
<head>
<title>Chapter IV</title>
<link href="css/style.css" type="text/css" rel="stylesheet" />
</head>
<body>
3. Split the file at every line containing "encoding" (book.xhtml is a placeholder for your own file name):
Code:
csplit book.xhtml /encoding/ '{*}'
|
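For readers without csplit (on Windows, say), the same line-based split can be approximated in a few lines of Python. This is only a sketch of the idea: split_at_lines is a made-up name, and unlike plain csplit it does not produce an empty first piece when the very first line already matches.

```python
import re


def split_at_lines(text, pattern="encoding"):
    """Start a new chunk at every line that matches pattern."""
    regex = re.compile(pattern)
    chunks = [[]]
    for line in text.splitlines(keepends=True):
        # A matching line opens a new chunk, unless the current one is still empty.
        if regex.search(line) and chunks[-1]:
            chunks.append([])
        chunks[-1].append(line)
    return ["".join(chunk) for chunk in chunks if chunk]
```

With the head block from step 2 in place, each chunk starts at its own <?xml ... encoding ...?> line and ends with the </body> and </html> lines that precede the next one.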