#1 |
Member
Posts: 15
Karma: 10
Join Date: Nov 2007
Location: Germany
Device: Sony PRS-300
|
Prevent pagebreak between two html files
Hello
Since my Sony ereader seems to have problems with large html files, I've had to break them down into smaller files. That brings me to the problem that ADE puts a hard pagebreak between two text paragraphs located in two consecutive html files (which is fine if each file contains a different chapter, but not if the files were split for purely technical reasons). Take an epub with the following opf entries:
Code:
<item id="section-1_part1" href="section-1_part1.html" media-type="application/xhtml+xml"/>
<item id="section-1_part2" href="section-1_part2.html" media-type="application/xhtml+xml"/>
[...]
<itemref idref="section-1_part1"/>
<itemref idref="section-1_part2"/>
Is there a solution or a hack to prevent page breaks between consecutive html files, so that the text flows as if the two files were one?
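For an automated generator, manifest and spine entries of this shape could be emitted roughly as follows. This is only a minimal sketch: the function name `opf_entries` is made up for illustration, and a real opf file needs more than this (package metadata, the ncx entry, and so on).

```python
import xml.etree.ElementTree as ET

def opf_entries(part_names):
    """Build <manifest> and <spine> elements for the given part file stems.

    Hypothetical helper: assumes every part is an .html file whose id
    equals its file stem, as in the entries quoted in the post.
    """
    manifest = ET.Element("manifest")
    spine = ET.Element("spine")
    for name in part_names:
        ET.SubElement(manifest, "item", {
            "id": name,
            "href": name + ".html",
            "media-type": "application/xhtml+xml",
        })
        # Spine order decides reading order; each itemref points at a manifest id.
        ET.SubElement(spine, "itemref", {"idref": name})
    return manifest, spine

manifest, spine = opf_entries(["section-1_part1", "section-1_part2"])
print(ET.tostring(manifest, encoding="unicode"))
print(ET.tostring(spine, encoding="unicode"))
```

Each itemref is a separate spine item, which is why the reading system treats the two parts as separate documents.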
#2 |
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
I don't think there's any way of avoiding this, unfortunately. Could you not combine the two HTML files into one to avoid it, or have the break occur at a different place where it wouldn't matter?
|
#3 | |
Member
Posts: 15
Karma: 10
Join Date: Nov 2007
Location: Germany
Device: Sony PRS-300
|
Quote:
And as I ran into this problem while writing an epub generation library (my intention is to glue a document parser, e.g. markdown, to an epub generator backend, so that an article/book/document can be written in a simple text-based document language), you can understand that handpicking the pagebreak, while being the only reasonable solution, might not be a desirable or even viable option for an automated process.
Seems I have to live with that inconvenience... Anyway, thanks for the info Harry.
|
#4 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
The limit for a single xhtml file is around 300 KB uncompressed, let's say 265 KB to be safe. If you keep your files within that limit, you should be fine. Most people here split the files at each chapter. There are books with chapters larger than 300 KB, but not that many.
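As a rough illustration of staying under such a limit, block-level elements could be grouped greedily by their encoded size. This is a sketch under assumptions: `split_by_size` is a made-up name, the input is assumed to be a list of already separated block-level HTML strings, and the size of the surrounding xhtml boilerplate (head, body tags) is not counted.

```python
def split_by_size(blocks, limit_bytes=265 * 1024):
    """Group block-level HTML strings into chunks of at most limit_bytes (UTF-8).

    A single block larger than the limit still ends up in a chunk of its own;
    this sketch never splits inside a block.
    """
    chunks, current, size = [], [], 0
    for block in blocks:
        block_size = len(block.encode("utf-8"))
        # Start a new chunk when adding this block would exceed the limit.
        if current and size + block_size > limit_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(block)
        size += block_size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be written out as its own xhtml file.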
|
#5 | |
Member
Posts: 15
Karma: 10
Join Date: Nov 2007
Location: Germany
Device: Sony PRS-300
|
Quote:
That being said, please don't get me wrong - it was never my intention to always store content in one big file. Apart from the obvious chapter pagebreak, splitting is also good practice for technical reasons: navigating (i.e. jumping directly to specific points) imposes fewer constraints on the reader's hardware if the navigation points are located in smaller files. That is a good enough reason for me.
But as always there's an exception to the rule: the book Flowers for Algernon by Daniel Keyes contains no pagebreaks at all, since the book is organized as a diary of the protagonist. (I only own the hardcopy, though I'd be interested in how they'd manage that in a digital version, if they'd do it that way at all.) My intention was to cover those cases as well, if only on principle.
Last edited by thydere; 06-09-2011 at 02:01 AM.
|
#6 |
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
I guess that all you can really do in that case is look for suitable points at which to split the file. You could split immediately after an image, or immediately before <Hx> tags, for example.
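A sketch of how such split points might be picked automatically, assuming the content is already available as a list of top-level block strings. The name `candidate_splits` and the regular expressions are illustrative only; real-world markup would need a proper parser rather than regexes.

```python
import re

# A block that starts with an h1..h6 opening tag.
HEADING = re.compile(r"<h[1-6][\s>]", re.IGNORECASE)
# A block that is just an image, optionally wrapped in a single <p> or <div>.
IMAGE_ONLY = re.compile(
    r"<(p|div)[^>]*>\s*<img\b[^>]*>\s*</\1>|<img\b[^>]*>", re.IGNORECASE
)

def candidate_splits(blocks):
    """Return indices i where a file break before blocks[i] would not cut running text."""
    points = []
    for i in range(1, len(blocks)):
        starts_heading = HEADING.match(blocks[i].lstrip()) is not None
        follows_image = IMAGE_ONLY.fullmatch(blocks[i - 1].strip()) is not None
        if starts_heading or follows_image:
            points.append(i)
    return points
```

A generator could then pick the last candidate index that still keeps the current file under its size limit.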
|
#7 | |
Member
Posts: 15
Karma: 10
Join Date: Nov 2007
Location: Germany
Device: Sony PRS-300
|
Quote:
Fortunately I do not want to create a general-purpose epub creation program, but a backend library that is intended to be glued to a frontend document parser. The difference is that while a converter like calibre or stanza has to recreate/guess the document structure (with a little help from the user in calibre's case), I expect the already created structure together with the sectionized content. The postprocessing work from that point is relatively simple: just create the html/toc/stylesheet/image/whatchamacallit files making up the oebps part of the epub from the document structure.
The big work lies mostly with the front end and the processing pipeline in the middle. It takes a text document, runs it through the appropriate parser (markdown in my case, but that's relatively exchangeable as long as there's html + processing instructions at the end), then parses the resulting html looking for xinclude / xml preprocessing directives which describe the further processing of the document (including external sections into the text, resizing images to fit the proper resolution, creating images/graphs from inline definitions, including references, running some external program and including the result, cooking coffee, whatever). This process (hopefully) generates a plethora of information about the content of the files, which essentially results in the structural metadata that the epub backend uses to create the ebook - and gives it pointers on where exactly to cut the text to pieces.
What that means is that I try to solve the issue by declaring it to be the problem of the person writing the front end parser (uhm... which will actually be me again - I knew there was a hole in my theory).
Which doesn't mean that html files cannot be preprocessed and used as input - in fact for my first prototype I used a simple xhtml frontend that works similarly to what you proposed (creating a content tree by parsing for the hx elements, copying over the dc elements and adapting those that differ in their epub form, ...), tried it on some of the html-ized ebooks in my collection and got some nice results out of it.
Once again, thank you both for your input.
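The "content tree from hx elements" idea could look roughly like the following. This is a sketch using Python's standard `html.parser`, not the prototype described in the post; the class name `HeadingTree` and the dict layout are assumptions made for illustration.

```python
from html.parser import HTMLParser

class HeadingTree(HTMLParser):
    """Collect h1-h6 headings into nested {'level', 'title', 'children'} dicts."""

    def __init__(self):
        super().__init__()
        self.root = {"level": 0, "title": None, "children": []}
        self._stack = [self.root]  # path from the root to the most recent heading
        self._level = None         # level of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            node = {"level": self._level,
                    "title": "".join(self._text).strip(),
                    "children": []}
            # Climb up until the top of the stack is a shallower heading.
            while self._stack[-1]["level"] >= node["level"]:
                self._stack.pop()
            self._stack[-1]["children"].append(node)
            self._stack.append(node)
            self._level = None

parser = HeadingTree()
parser.feed("<h1>Book</h1><p>text</p><h2>One</h2><h3>Sub</h3><h2>Two</h2>")
book = parser.root["children"][0]
print(book["title"], [child["title"] for child in book["children"]])
```

The resulting tree can serve both as the toc source and as a list of places where the text may be cut into separate files.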
|