MobileRead Forums > E-Book Formats > ePub
06-08-2011, 08:26 AM #1
thydere
Location: Germany
Device: Sony PRS-300
Prevent page break between two HTML files

Hello,

Since my Sony e-reader seems to have problems with large HTML files, I've had to break them down into smaller files.
This brings me to the problem that ADE puts a hard page break between two text paragraphs located in two consecutive HTML files (which is fine if each file contains a different chapter, but not if the files were split only for technical reasons).

An EPUB with the following OPF entries:
Code:
<item id="section-1_part1" href="section-1_part1.html" media-type="application/xhtml+xml"/>
<item id="section-1_part2" href="section-1_part2.html" media-type="application/xhtml+xml"/>
[...]
<itemref idref="section-1_part1"/>
<itemref idref="section-1_part2"/>
has a hard page break between the last page of section-1_part1.html and the first page of section-1_part2.html.

Is there a solution or a hack to prevent page breaks between consecutive HTML files, so that the text flows as if the two files were one?
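For reference, a generator that splits one section into several files emits one manifest `<item>` and one spine `<itemref>` per part, in reading order, as in the OPF excerpt above. A minimal Python sketch of that step follows; `opf_entries` is a hypothetical helper, not part of any library. It only reproduces the entries shown: as described in this post, ADE still starts each spine item on a new page, so this does not by itself avoid the break.

```python
def opf_entries(section: str, n_parts: int) -> tuple[list[str], list[str]]:
    """Build the manifest <item> lines and spine <itemref> lines for a
    section that was split into n_parts consecutive XHTML files.
    (Hypothetical helper for illustration.)"""
    items, itemrefs = [], []
    for i in range(1, n_parts + 1):
        ident = f"{section}_part{i}"
        items.append(
            f'<item id="{ident}" href="{ident}.html" '
            'media-type="application/xhtml+xml"/>'
        )
        # Spine order is reading order; each itemref is a separate file.
        itemrefs.append(f'<itemref idref="{ident}"/>')
    return items, itemrefs


if __name__ == "__main__":
    items, refs = opf_entries("section-1", 2)
    print("\n".join(items + refs))
```

The manifest lines belong inside `<manifest>` and the itemrefs inside `<spine>`; the `[...]` in the excerpt above stands for whatever sits between them.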
06-08-2011, 08:40 AM #2
HarryT
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Onyx T68, N7,
I don't think there's any way of avoiding this, unfortunately. Could you not combine the two HTML files into one to avoid it, or have the break occur at a different place where it wouldn't matter?
06-08-2011, 09:23 AM #3
thydere
Location: Germany
Device: Sony PRS-300
Quote:
Originally Posted by HarryT
I don't think there's any way of avoiding this, unfortunately. Could you not combine the two HTML files into one to avoid it, or have the break occur at a different place where it wouldn't matter?
Combining the HTML files would result in the aforementioned problem of not being able to open the EPUB on the Sony reader, or on any other reader that imposes a file size limit on EPUB content.

And since I ran into this problem while writing an EPUB generation library (my intention is to glue a document parser, e.g. Markdown, to an EPUB generator backend, so that an article/book/document can be written in a simple text-based markup language), you can understand that hand-picking the page break, while being the only reasonable solution, might not be a desirable or even viable option for an automated process.

It seems I'll have to live with that inconvenience...

Anyway, thanks for the info Harry.
06-08-2011, 02:05 PM #4
Toxaris
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
The limit for a single XHTML file is around 300 KB uncompressed; let's say 265 KB to be safe. If you keep your files within that limit, you should be fine. Most people here split the files at each chapter. There are books with chapters larger than 300 KB, but not many.
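A minimal Python sketch of checking an existing EPUB against that figure, using only the standard `zipfile` module. It assumes "265 KB" means 265 × 1024 bytes and treats `.html`, `.xhtml` and `.htm` entries as content files; `oversized_content` is a hypothetical helper name. `ZipInfo.file_size` is the uncompressed size, which is what the limit refers to.

```python
import io
import zipfile

# Toxaris's "safe" figure, assuming 1 KB = 1024 bytes.
SAFE_LIMIT = 265 * 1024
CONTENT_EXTS = (".html", ".xhtml", ".htm")


def oversized_content(epub, limit=SAFE_LIMIT):
    """Return the (X)HTML entries in an EPUB (path or file object) whose
    uncompressed size exceeds `limit` bytes."""
    with zipfile.ZipFile(epub) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.filename.lower().endswith(CONTENT_EXTS)
            and info.file_size > limit  # file_size = uncompressed size
        ]


if __name__ == "__main__":
    # Build a tiny fake EPUB in memory to exercise the check.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("OEBPS/big.html", "x" * (300 * 1024))
        zf.writestr("OEBPS/small.html", "<p>hi</p>")
    print(oversized_content(buf))  # ['OEBPS/big.html']
```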
06-09-2011, 01:37 AM #5
thydere
Location: Germany
Device: Sony PRS-300
Quote:
Originally Posted by Toxaris
The limit for a single XHTML file is around 300 KB uncompressed; let's say 265 KB to be safe. If you keep your files within that limit, you should be fine. Most people here split the files at each chapter. There are books with chapters larger than 300 KB, but not many.
Splitting the content at chapter boundaries is already done, since it naturally mimics the design most books follow. I just wanted a contingency plan for documents that have larger chapters, or none at all. Currently that plan is to cut in front of any element whose size, combined with the size of the preceding elements, would exceed 256 KB. (Thanks for the size info. When I finished the EPUB part yesterday I initially set the limit to 128 KB, since I had noticed it not working with larger files, and until now I hadn't taken the time to pin down the exact size.)
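The cutting rule described here can be sketched as a greedy pass over the serialized block elements. This is a minimal Python sketch, assuming the content is already available as a list of serialized top-level elements and measuring size in UTF-8 bytes; it ignores the overhead of the `<html>`/`<head>` wrapper each part needs, so leave some headroom. `chunk_elements` is a hypothetical name.

```python
def chunk_elements(elements: list[str], limit: int = 256 * 1024) -> list[list[str]]:
    """Greedily pack serialized block elements into chunks, starting a new
    chunk in front of any element that would push the running size past
    `limit` bytes. (Hypothetical helper for illustration.)"""
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for el in elements:
        n = len(el.encode("utf-8"))
        # Cut in front of this element if it would overflow the chunk.
        if current and size + n > limit:
            chunks.append(current)
            current, size = [], 0
        current.append(el)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

An element that is larger than the limit on its own still ends up alone in its own chunk, since this rule can only cut between elements.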

That being said, please don't get me wrong: it was never my intention to always store content in one big file. Apart from the obvious chapter page break, splitting is also good practice for technical reasons: navigating (e.g. jumping directly to specific points) puts fewer demands on the reader's hardware if the navigation points are located in smaller files. That's reason enough for me.

But as always, there's an exception to the rule: the book Flowers for Algernon by Daniel Keyes contains no page breaks at all (I only own the hardcopy, though I'd be interested in how a digital version would handle it, if it were done that way at all), since the book is organized as the protagonist's diary.

My intention was to cover those cases as well, if only on principle.

Last edited by thydere; 06-09-2011 at 02:01 AM.
06-09-2011, 03:52 AM #6
HarryT
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Onyx T68, N7,
I guess that all you can really do in that case is look for suitable points at which to split the file. You could split immediately after an image, or immediately before <Hx> tags, for example.
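One way to combine this suggestion with the size limit is to cut at the last "good" boundary (after an image, before a heading) that still keeps the part under the limit, and to fall back to a plain size cut otherwise. A minimal Python sketch, assuming the content is a flat list of `(tag, serialized_html)` pairs for top-level elements; an image wrapped in a `<p>` or `<div>` would need an extra check. The function names are hypothetical.

```python
import re

HEADING = re.compile(r"h[1-6]")


def is_good_break(prev_tag: str, next_tag: str) -> bool:
    """A file boundary here is unlikely to matter visually: right after
    an image, or right before a heading."""
    return prev_tag == "img" or HEADING.fullmatch(next_tag) is not None


def split_preferring_breaks(elements, limit):
    """elements: list of (tag, serialized_html) pairs in document order.
    Cut at the last 'good' boundary that keeps a chunk within `limit`
    bytes; fall back to a plain size cut when there is none."""
    chunks, start = [], 0
    while start < len(elements):
        size, end, last_good = 0, start, None
        while end < len(elements):
            if end > start and is_good_break(elements[end - 1][0], elements[end][0]):
                last_good = end  # cutting here keeps elements[start:end]
            n = len(elements[end][1].encode("utf-8"))
            if end > start and size + n > limit:
                break  # elements[end] would overflow this chunk
            size += n
            end += 1
        # Everything left fits: take it all. Otherwise prefer the good cut.
        cut = end if end == len(elements) or last_good is None else last_good
        chunks.append(elements[start:cut])
        start = cut
    return chunks
```

In this sketch a heading is kept together with the paragraph after it whenever possible, because the cut is placed in front of the heading rather than right after it.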
06-10-2011, 06:25 AM #7
thydere
Location: Germany
Device: Sony PRS-300
Quote:
Originally Posted by HarryT
I guess that all you can really do in that case is look for suitable points at which to split the file. You could split immediately after an image, or immediately before <Hx> tags, for example.
Along the way I'll probably have to analyze the HTML content semantically and do something like this, though I'll drop that issue for the moment until it really becomes a problem and I have more real-world test cases on which to base a decision.

Fortunately, I don't want to create a general-purpose EPUB creation program, but a backend library intended to be glued to a frontend document parser. The difference is that while a converter like Calibre or Stanza has to recreate or guess the document structure (with a little help from the user in Calibre's case), I expect the already-created structure together with the sectioned content. The post-processing work from that point is relatively simple: just create the HTML/TOC/stylesheet/image/whatchamacallit files making up the OEBPS part of the EPUB from the document structure.

Most of the work lies in the frontend and the processing pipeline in the middle. It takes a text document and runs it through the appropriate parser (Markdown in my case, but that's fairly interchangeable as long as there's HTML plus processing instructions at the end). It then parses the resulting HTML, looking for XInclude / XML preprocessing directives which describe the further processing of the document (including external sections in the text, resizing images to the proper resolution, creating images/graphs from inline definitions, including references, running some external program and including the result, making coffee, whatever). This process (hopefully) generates a plethora of information about the content of the files, which essentially becomes the structural metadata the EPUB backend uses to create the e-book, and which gives it pointers on where exactly to cut the text into pieces.

In other words, I try to solve the issue by declaring it the problem of whoever writes the frontend parser (uhm... which will actually be me again; I knew there was a hole in my theory).

That doesn't mean HTML files can't be preprocessed and used as input. In fact, for my first prototype I used a simple XHTML frontend that works similarly to what you proposed (creating a content tree by parsing for the hx elements, copying over the DC elements and adapting those that differ in their EPUB form, ...), tried it on some of the HTML-ized e-books in my collection, and got some nice results out of it.
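The heading-based content tree mentioned here can be sketched with Python's standard `html.parser`. This is an illustration only, not the prototype described in the post: `HeadingCollector` and `build_tree` are hypothetical names, and the tree is a plain nested `dict` in which each `<hN>` becomes a child of the nearest preceding heading with a smaller N.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every <h1>..<h6>, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None

    def handle_data(self, data):
        # Text inside nested inline tags (<em>, <a>, ...) is collected too.
        if self._level is not None:
            self._text.append(data)


def build_tree(headings):
    """Nest flat (level, title) pairs into a tree of dicts."""
    root = {"title": None, "level": 0, "children": []}
    stack = [root]
    for level, title in headings:
        node = {"title": title, "level": level, "children": []}
        # Pop until the top of the stack is a shallower heading (or root).
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root
```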


Once again, thank you both for your input.
All times are GMT -4.

