10-29-2011, 03:13 AM | #1 |
Addict
Posts: 311
Karma: 1078442
Join Date: Oct 2011
Location: Netherlands
Device: Kindle Paperwhite
|
epub to epub converting: making chapters?
Hello,
I have some poorly formatted epubs. Each one is just a single run of text, without chapters. Is there a way for Calibre to recognize the chapters, using epub to epub conversion, and make every chapter begin on a new page? Please note that I haven't used calibre much and know very little about it. Thank you.
Last edited by Joy736; 10-29-2011 at 10:01 AM. |
10-29-2011, 05:52 AM | #2 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Convert from ePub to HTMLZ and then convert back HTMLZ to ePub with Heuristics enabled.
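For anyone who prefers the command line, here is a minimal sketch of that round trip using calibre's `ebook-convert` tool, driven from Python. It assumes calibre is installed and `ebook-convert` is on the PATH. The file names and the helper names `roundtrip_commands` and `run_roundtrip` are made up for illustration; `--enable-heuristics` is the command-line counterpart of the Heuristics setting in the conversion dialog.

```python
import subprocess
from pathlib import Path


def roundtrip_commands(src, workdir="."):
    """Build the two ebook-convert calls: EPUB -> HTMLZ, then HTMLZ -> EPUB.

    The output names (<stem>.htmlz, <stem>_fixed.epub) are arbitrary choices
    for this sketch, not anything calibre requires.
    """
    src = Path(src)
    htmlz = Path(workdir) / (src.stem + ".htmlz")
    fixed = Path(workdir) / (src.stem + "_fixed.epub")
    return [
        # Step 1: convert the EPUB to HTMLZ with default settings.
        ["ebook-convert", str(src), str(htmlz)],
        # Step 2: convert back to EPUB with heuristic processing switched on.
        ["ebook-convert", str(htmlz), str(fixed), "--enable-heuristics"],
    ]


def run_roundtrip(src, workdir="."):
    """Run both steps in order, stopping if either conversion fails."""
    for cmd in roundtrip_commands(src, workdir):
        subprocess.run(cmd, check=True)
```

Calling `run_roundtrip("book.epub")` would leave `book.htmlz` and `book_fixed.epub` next to the original, so the source file is never overwritten.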
|
10-29-2011, 06:01 AM | #3 |
Addict
Posts: 311
Karma: 1078442
Join Date: Oct 2011
Location: Netherlands
Device: Kindle Paperwhite
|
|
10-29-2011, 07:43 AM | #4 |
Linux User
Posts: 2,279
Karma: 6123806
Join Date: Sep 2010
Location: Heidelberg, Germany
Device: none
|
The HTML step is not necessary - Calibre can convert from epub to epub. A copy of the original epub will then be stored as the ORIGINAL_EPUB format.
Calibre will then make the chapters for you, provided the chapter recognition pattern matches. If it doesn't, you have to adapt the pattern yourself, or use the epub tweaking feature (explode, edit manually, implode/convert from there...). |
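As a rough illustration of adapting the pattern from the command line: `ebook-convert` takes a `--chapter` option (an XPath expression that selects chapter titles) and a `--chapter-mark` option that controls how detected chapters are marked. The sketch below is only an assumption about what a book might need. The XPath shown treats every `h2` element as a chapter title, and the helper name and file names are hypothetical; the expression has to be adapted to the actual markup of the book.

```python
import subprocess


def chapter_convert_command(src, dst, chapter_xpath="//*[name()='h2']"):
    """Build an ebook-convert call that uses a custom chapter pattern.

    chapter_xpath is an example expression only; inspect the book's HTML
    to find out which elements really carry the chapter titles.
    """
    return [
        "ebook-convert", src, dst,
        # XPath expression used to detect chapter titles.
        "--chapter", chapter_xpath,
        # Insert a page break before each detected chapter.
        "--chapter-mark", "pagebreak",
    ]


def convert_with_chapters(src, dst, chapter_xpath="//*[name()='h2']"):
    """Run the conversion and raise if ebook-convert reports an error."""
    subprocess.run(chapter_convert_command(src, dst, chapter_xpath), check=True)
```

The same two settings are available in the GUI conversion dialog under Structure Detection, so this is just another way of reaching them.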
10-29-2011, 09:17 AM | #5 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
@frostschutz, HTMLZ with default settings will clean up a lot of poorly formatted HTML. This coupled with Heuristics will find most chapters without needing to write pattern matches.
|
10-30-2011, 02:54 AM | #6 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
The other problem with many poorly formatted ePubs is that they were originally converted from Lit/text/whatever using a version of Calibre without heuristics (or with heuristics disabled), so there are seemingly random split points every 260K. HTMLZ will merge the random split points back together. Without doing that first, heuristics wouldn't be able to find the chapters correctly.
|
10-30-2011, 07:42 AM | #7 |
Linux User
Posts: 2,279
Karma: 6123806
Join Date: Sep 2010
Location: Heidelberg, Germany
Device: none
|
I didn't know calibre conversion behaved differently depending on input format used. Most converters I'm familiar with go to common ground first and go on from there so the input format does not matter (so instead of any->any it's really any->specific->any). Thanks for the info, I'll experiment with HTMLZ in the future when conversion doesn't go as expected.
|
10-30-2011, 10:53 AM | #8 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
ePub 'is' the common ground with Calibre - and under many scenarios your advice of skipping htmlz and going ePub->ePub is perfectly legit. It's only because the OP specifically mentioned crappy ePubs and trouble with chapters that HTMLZ is a better route in this case.
|
10-30-2011, 04:47 PM | #9 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Even though the conversion process is always the same (Input -> OEB -> output), different input will produce different output. Converting to HTMLZ (or any other format) and then converting that instead of the EPUB will introduce differences in the final output.
Think about this like converting a VHS to an MP4 file vs a DVD to an MP4. No matter what, you are going to get similar output (the same video content), but there will be a difference in how the final output from each input looks, due to differences in what the inputs support and how they are structured. Even if you were to take an MP3 and a WMA and convert both to AAC, you would still get very different output files even though they sound alike. |