#1 |
Enthusiast
Posts: 27
Karma: 53696
Join Date: Nov 2012
Device: Sony PRS T-1
AZW to EPUB conversion is split with one page per paragraph
Calibre 2.15 on Windows 8.1.
No custom conversion settings. The target reader is a Sony PRS-T1.

When converting an AZW3 file to EPUB, the size increased from 0.6MB to 3MB. The EPUB didn't contain any obviously large files: the cover image was 103kB and the OPF file was 600kB (but compressed well). However, the EPUB contained a lot of HTML files. When reading the EPUB I discovered that the text had been split with one page per paragraph, and the page count was 4911 (the AZW3 original has 593 pages).

Is there a particular setting I should look at to stop it from splitting at the paragraph level?

I added metadata from the internet before converting, keeping the original cover, but that hasn't contributed much to the OPF. Most of the OPF size comes from the entries for each individual HTML file.

The conversion log is attached (with a lot of "Detected chapter" entries).

The file I'm trying to convert is copyrighted, but I can create a bug report with the input and output files attached if necessary (as described here: https://www.mobileread.com/forums/sho...d.php?t=186697 ).

Last edited by steinarb; 01-09-2015 at 03:53 AM. Reason: Title change
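For anyone wanting to check the same symptom, an EPUB is a ZIP archive, so the number and size of its HTML files can be counted directly. The following is a minimal sketch using only the Python standard library; the function name `summarize_epub` and the file name in the usage comment are made up for illustration and are not part of Calibre.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_epub(epub_file):
    """Return (file counts, uncompressed byte totals) keyed by file suffix.

    epub_file may be a path or a file-like object; an EPUB is a ZIP archive.
    """
    counts = Counter()
    sizes = Counter()
    with zipfile.ZipFile(epub_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Suffix such as ".html" or ".opf"; "" for names like "mimetype".
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix] += 1
            sizes[suffix] += info.file_size
    return counts, sizes


# Hypothetical usage (the path is an assumption):
# counts, sizes = summarize_epub("converted.epub")
# print(counts.most_common(5))
```

A book that was split per paragraph should show a count of HTML files in the thousands, close to the reader's page count.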
#2 |
Enthusiast
Posts: 27
Karma: 53696
Join Date: Nov 2012
Device: Sony PRS T-1
I have found the culprit, I think.
I opened the AZW3 original in Edit Book and looked at a random HTML file. Many, perhaps all, of the <p> elements look like this:
Code:
<p class="chapter">“Yes,” he said. “It is.”</p>
This matches the following condition in the chapter detection XPath expression:
Code:
@class = 'chapter'
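To see why this produces one "chapter" per paragraph, the class test can be tried on a small sample. The sketch below uses the limited XPath support in Python's standard `xml.etree.ElementTree`, not Calibre's own XPath engine. Only the first paragraph comes from the book; the rest of the sample markup and the function name `count_chapter_class` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup: the first <p> is from the post, the rest is made up.
SAMPLE = """<html><body>
<h1>Chapter 1</h1>
<p class="chapter">“Yes,” he said. “It is.”</p>
<p class="chapter">She nodded.</p>
<p class="chapter">Nobody spoke for a while.</p>
</body></html>"""


def count_chapter_class(xhtml):
    """Count elements whose class attribute is exactly 'chapter'."""
    root = ET.fromstring(xhtml)
    return len(root.findall(".//*[@class='chapter']"))


print(count_chapter_class(SAMPLE))
```

Every paragraph carrying the class is matched, so a rule that starts a new chapter on each match will split the book at every paragraph.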
#3 |
Enthusiast
Posts: 27
Karma: 53696
Join Date: Nov 2012
Device: Sony PRS T-1
What I did to fix this was to:
The resulting EPUB looks OK in Calibre's EPUB reader; the TOC points to the correct places and contains meaningful entries for the real chapters. The resulting EPUB also has a size of 0.6MB, instead of 3MB.
#4 |
Well trained by Cats
Posts: 30,934
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
That is not a bug
By default, Calibre looks for certain words, set in the 'Structure Detection' section of Conversion, and splits on them or turns them into headings. Were you expecting Calibre to count the results and go 'That is an absurd number of Chapters'? IMHO p class="chapter" was a weak choice for a paragraph selector name, and one I have not seen used by common ebook publishers.
#5 |
Enthusiast
Posts: 27
Karma: 53696
Join Date: Nov 2012
Device: Sony PRS T-1
I know that. I never said it was.
Quote:
I had a problem. I found a fix for the problem. I posted my findings as responses to my original problem. This makes it useful for someone who has the same problem and might google for a solution.
Quote:
The previous books of the same series had no such problem. I would use "silly choice for a paragraph selector name" rather than "weak choice for a paragraph selector name", but at least it made it pretty obvious what was happening when I found it. I worried for a bit that I would have to XSLT or perl-modify the individual HTML files of the AZW3 file, or mess with the chapter detection XPath expression, but was relieved when removing the chapter mark gave a satisfactory result.
Last edited by steinarb; 01-10-2015 at 03:28 PM. Reason: spelling error
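For completeness, the heavier alternative mentioned above (rewriting the individual HTML files) could look roughly like the sketch below. It was not needed in this thread. It is a regex-based illustration, not a general HTML rewriter: the function name `strip_chapter_class` is made up, it assumes the attribute is written exactly as `class="chapter"`, and removing the class also drops any CSS styling tied to it.

```python
import re

# Matches class="chapter" inside an opening <p ...> tag only.
_P_CHAPTER = re.compile(r'(<p\b[^>]*?)\s+class="chapter"')


def strip_chapter_class(html):
    """Remove class="chapter" from <p> tags, leaving other elements alone."""
    return _P_CHAPTER.sub(r"\1", html)


print(strip_chapter_class('<p class="chapter">“Yes,” he said. “It is.”</p>'))
```

Elements other than `<p>` that legitimately use the class, such as a real chapter heading, are left untouched.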
#6 |
Wizard
Posts: 1,269
Karma: 5935030
Join Date: Jun 2011
Location: Ontario, Canada
Device: Kobo Aura HD
I find the default Calibre page splitting options to be a hindrance. They are no doubt great (probably essential) when converting non-e-book documents to e-book formats. However, e-book formats (epub, mobi, azw3, etc.) are probably already split (or at least have page breaks), and adding heuristics to create new splits can often muddle things, so I change these defaults in my own config:
Under Structure Detection, the Chapter Mark setting is changed from "Page Break" to none, and Insert Page Breaks before XPath is disabled ( / ).
Some books also need page breaks removed from their CSS where they are used inappropriately. (I've seen many books with a CSS page-break-before: always on their chapter headings that also have some kind of graphic at the top of the page. When run through an e-book conversion, this causes the graphic to be put on a page by itself.)
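The same two settings can also be passed on the command line for a single conversion. The sketch below only builds the argument list for Calibre's `ebook-convert` tool using its `--chapter-mark` and `--page-breaks-before` options; the helper name `build_convert_command` and the file names are assumptions, and the command is not executed here.

```python
def build_convert_command(src, dst):
    """Build an ebook-convert call with chapter marking and extra page breaks off."""
    return [
        "ebook-convert", src, dst,
        "--chapter-mark", "none",      # do not insert a page break or rule at detected chapters
        "--page-breaks-before", "/",   # "/" disables the page-break XPath
    ]


cmd = build_convert_command("book.azw3", "book.epub")  # hypothetical file names
print(" ".join(cmd))

# To actually run it (requires Calibre's command line tools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the options per conversion leaves the saved defaults alone, which may be preferable when only a few books misbehave.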
#7 |
Junior Member
Posts: 1
Karma: 10
Join Date: May 2018
Device: kobo
Thanks for posting how you fixed it. I had been stuck on this for a while with epub->epub; I didn't think to look at the raw text.