Old 01-20-2013, 05:32 AM   #1
NASCARaddicted
Addict
Posts: 340
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch, Pocketbook Touch HD
splitting html files?

Hello, I hope you people can help me.

I want to convert an HTML file into an epub manually, without a converter like calibre (I love calibre, but I want to learn how to do the conversion myself).

I know it is recommended to split the HTML file into multiple parts (especially for older, slower ereaders). I could do it with cut and paste, but that becomes tedious with big files. Is there a program that does the splitting automatically? I want to split the HTML file at a certain tag (like <div class="xxx"> or <h2>).

I already found a small program called HTML Splitter (from around 2004). Basically, this program does what I want, but there is a problem: at the end, it adds an unwanted <br> tag. Also, the closing </body> and </html> tags (and the unwanted <br> tag) are written in upper case. In XHTML they have to be lower case, so the resulting parts are not valid XHTML.

Is there another program that does the same thing, just splitting an XHTML file into multiple XHTML files at a certain tag?

Thanks in advance.
Old 01-20-2013, 06:55 AM   #2
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Why not just use Sigil?

Press Ctrl+Enter at the end of each chapter, just before the following <h2> tag. It is also possible to do this through search and replace, by adding <hr class="sigil_split_marker" /> at each split point and then choosing Edit, Split At Markers.
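
The search-and-replace step could also be done outside Sigil before importing the file. Here is a minimal Python sketch, assuming every chapter starts at an <h2> tag. The helper name add_split_markers is made up for illustration; only the marker string comes from the post above.

```python
import re

# Marker text as quoted in the post above.
MARKER = '<hr class="sigil_split_marker" />'


def add_split_markers(html, tag="h2"):
    """Return html with MARKER and a newline inserted before every opening <tag>."""
    # \b keeps <h2 ...> and <h2> but skips closing tags and names like <h20>.
    opening = re.compile(r"<%s\b" % re.escape(tag), re.IGNORECASE)
    # A function replacement avoids any escape processing of MARKER.
    return opening.sub(lambda m: MARKER + "\n" + m.group(0), html)
```

This puts a marker before the first <h2> as well, so anything ahead of it (a title block, for instance) ends up in a file of its own.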

Either way, work on a saved copy.
Old 01-20-2013, 07:38 AM   #3
NASCARaddicted
Maybe I missed something, but as far as I know, Sigil can only save a file as epub? I want to be able to save it as HTML.
Old 01-20-2013, 08:15 AM   #4
DiapDealer
Grand Sorcerer
Posts: 27,536
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
An epub is just a zipfile full of html files (among other things). Use Sigil to split the html file the way you want it, and then unzip the epub and snag the html files. You may have to fix some links afterward. I have to say, though, that that seems like a very long driveway to a small and rather unimpressive house.
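
Since the epub is an ordinary zip archive, the "unzip and snag" step can be scripted too. A rough Python sketch using only the standard zipfile module; the function name extract_html and the list of file extensions are assumptions made for this example.

```python
import zipfile
from pathlib import Path

# Assumed set of extensions used for the text files inside the epub.
HTML_EXTENSIONS = (".html", ".htm", ".xhtml")


def extract_html(epub_path, out_dir):
    """Copy every HTML/XHTML member of the epub into out_dir; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if name.lower().endswith(HTML_EXTENSIONS):
                # Flatten: keep only the file name, not the folder inside the epub.
                target = out / Path(name).name
                target.write_bytes(epub.read(name))
                written.append(target)
    return written
```

Because the folder structure is flattened, two files with the same name in different folders would overwrite each other, and links to stylesheets or images may need fixing afterward.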

You'll spend a lot of time looking for tools that will "automatically" help you construct an epub by hand.

Last edited by DiapDealer; 01-20-2013 at 08:17 AM.
Old 01-20-2013, 08:16 AM   #5
mrmikel
That is true, you can only save as epub in Sigil. But epub is nothing more than a collection of html files and their associated images all zipped together.

In Sigil, if you right-click on any of these files, you can select Open With and open it in any other editor you like.

Or you can use a zip program to open the epub and work with the files in any program you like... but you need to make sure they are zipped up in a certain order, with certain files stored without compression (which ones escapes me now). There is a Tweak ePub tool that facilitates this, and it is built into calibre.
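
For what it's worth, the file in question is the one named mimetype: the EPUB container rules expect it to be the first entry in the archive and to be stored without compression. A minimal Python sketch of that zipping order, assuming an unpacked book folder that already contains a mimetype file; zip_epub and its arguments are made-up names, not part of Sigil or calibre.

```python
import zipfile
from pathlib import Path


def zip_epub(src_dir, epub_path):
    """Zip an unpacked epub folder: mimetype first and stored, everything else deflated.

    Assumes epub_path lies outside src_dir, so the archive does not include itself.
    """
    src = Path(src_dir)
    mimetype = src / "mimetype"
    with zipfile.ZipFile(epub_path, "w") as epub:
        # First entry, written without compression.
        epub.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            if path.is_file() and path != mimetype:
                epub.write(path, path.relative_to(src).as_posix(),
                           compress_type=zipfile.ZIP_DEFLATED)
```

The result is still worth running through an epub checker, since this sketch only takes care of the ordering and compression.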

Sorry to repeat... DiapDealer got in first!

If Sigil makes things too simple, you can stay in code view in Sigil and muck about in the html all you like. For me, I work in both views - code view to tweak and book view to preview. It is easier for me to join broken sentences in book view than code view.

Last edited by mrmikel; 01-20-2013 at 08:20 AM.
Old 01-20-2013, 11:42 AM   #6
meme
Sigil developer
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
You can also right click on any file or files in Sigil and use Save As to export them if you want to avoid unzipping.
Old 01-20-2013, 03:00 PM   #7
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
If you have a Perl interpreter, you could do something like this:

Code:
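#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl split.pl mybook.html
my $filename = shift @ARGV or die "Usage: $0 file.html\n";

# Slurp the whole file into one string.
open(my $in, '<', $filename) or die "Cannot open $filename: $!\n";
my $data = do { local $/; <$in> };
close($in);

# Split on the marker; the marker text itself is dropped from the output.
my @parts = split(/<splitmarker>/, $data);

# Write each part to outfile_1.html, outfile_2.html, ...
my $count = 1;
for my $part (@parts) {
    open(my $out, '>', "outfile_$count.html") or die "Cannot write outfile_$count.html: $!\n";
    print {$out} $part;
    close($out);
    $count++;
}
Save it as split.pl, change "<splitmarker>" to match what you're splitting on (note that split drops the matched text from the output), change the output filename if you want (currently outfile_1.html, outfile_2.html, ... outfile_n.html), and then run "perl split.pl mybook.html" or whatever.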

You'll want to then go back and add the starting and ending <html> tags, <head> tags, etc. from the first file to each of the other files.
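
That last manual step can be folded into the split itself. Below is a Python sketch of the same idea, not a port of the Perl script. It assumes the document has a single <body> element and that chapters start at <h2>, and it uses plain regular expressions rather than a real HTML parser. split_html is a made-up name.

```python
import re


def split_html(source, tag="h2"):
    """Split the body before each opening <tag>; return a list of complete documents."""
    body_open = re.search(r"<body\b[^>]*>", source, re.IGNORECASE)
    if not body_open:
        raise ValueError("no <body> tag found")
    body_close = re.search(r"</body\s*>", source[body_open.end():], re.IGNORECASE)
    if not body_close:
        raise ValueError("no </body> tag found")
    close_at = body_open.end() + body_close.start()

    header = source[:body_open.end()]   # declaration, doctype, <head>, <body>
    footer = source[close_at:]          # </body> and </html>
    body = source[body_open.end():close_at]

    # Cut positions: start of body, every opening tag, end of body.
    starts = [m.start() for m in
              re.finditer(r"<%s\b" % re.escape(tag), body, re.IGNORECASE)]
    cuts = [0] + starts + [len(body)]
    parts = [body[a:b] for a, b in zip(cuts, cuts[1:])]

    # Wrap each non-empty part in the shared header and footer.
    return [header + "\n" + part.strip() + "\n" + footer
            for part in parts if part.strip()]
```

Each returned string can then be written to outfile_1.html, outfile_2.html, and so on. Every part gets the same <title> from the original head, so that may still need editing by hand.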
Old 01-21-2013, 08:37 PM   #8
neufsix
Connoisseur
Posts: 57
Karma: 1010
Join Date: Jul 2011
Device: Archos A70 eReader, Kindle Touch, Sony PRS-T2
On Linux, you can use csplit.
Old 01-22-2013, 04:13 AM   #9
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
csplit alone will not output correct HTML files, as they will be missing the header and final closing tags. But I use csplit for all my books; this is what I do:

1. Put the whole book (at least the main part; the title page, notes, etc. can be done separately) in a single XHTML file. Format as desired.

2. Add the head stuff before each chapter, i.e. something like:

Code:
</body>
</html>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:ops="http://www.idpf.org/2007/ops" xml:lang="en">
<head>
  <title>Chapter IV</title>
  <link href="css/style.css" type="text/css" rel="stylesheet" />
</head>
<body>
3. Now use csplit:

Code:
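csplit mybook.xhtml /encoding/ '{*}'
Here mybook.xhtml stands for the single file from step 1. This splits at every ({*}) appearance of the string "encoding", which is uncommon enough to usually give no problem. The cut falls just before each matching line, so every piece starts at its <?xml ...?> declaration and ends with the </body></html> pair placed ahead of the next one. Then rename and move the resulting files (xx00, xx01, ...) to their final location. This part can be done with a script.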
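
As an illustration of the rename-and-move script, here is a small Python sketch. The target names (chapter01.xhtml, chapter02.xhtml, ...), the function name rename_pieces, and the choice to skip empty pieces are all assumptions made for this example.

```python
import re
import shutil
from pathlib import Path


def rename_pieces(src_dir, dest_dir, template="chapter{:02d}.xhtml"):
    """Move csplit pieces (xx00, xx01, ...) into dest_dir under numbered names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    pieces = [p for p in Path(src_dir).iterdir()
              if re.fullmatch(r"xx\d+", p.name)]
    # Sort by the numeric suffix so xx100 comes after xx99.
    pieces.sort(key=lambda p: int(p.name[2:]))
    moved = []
    number = 0
    for piece in pieces:
        if piece.stat().st_size == 0:
            continue  # leave empty pieces where they are
        number += 1
        target = dest / template.format(number)
        shutil.move(str(piece), str(target))
        moved.append(target)
    return moved
```

Called as rename_pieces(".", "OEBPS"), it would move the non-empty pieces from the current folder into a folder named OEBPS (a hypothetical location) and return the new paths.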

All times are GMT -4.