MobileRead Forums > E-Book Software > Calibre

Old 01-21-2009, 01:24 PM   #16
llasram
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
Quote:
Originally Posted by akash
I have no idea how the Reader manages pagination of the text. I know that it's possible to insert a page break in an RTF and the Reader will break the page accordingly for a Calibre conversion to ePub.
Do you mean "the screenful of text I see on the Reader, and navigate to the next one via pressing the 'forward' and 'backward' buttons"? Or do you mean "the 'pages' the Reader indicates by the little page numbers in the margins"?

If you mean the former, then there's nothing to be done really, other than using a fixed-page-oriented format like PDF instead. If you mean the latter, that's where the Adobe page-map stuff will help you out[*]. You'll need to do some custom modification of your markup and/or conversion process, but I'd be willing to at least help get you started.

* Although googling for more info on it revealed that it's given some IDPF people a case of the hissy fits and they want it to die die die. And I can't really blame them, as it seems that NCX already supports almost exactly the same information.
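
For comparison, here is a rough sketch of how one page boundary might be recorded both ways. It assumes an Adobe page-map entry is a page element with name and href attributes, and that the NCX equivalent is a pageTarget inside pageList. The file name chapter1.xhtml, the anchor id page-12, and both helper function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
OPF_NS = "http://www.idpf.org/2007/opf"


def page_map_entry(number, href):
    """Build one Adobe-style <page> element (page-map flavour)."""
    return ET.Element("{%s}page" % OPF_NS, {"name": str(number), "href": href})


def ncx_page_target(number, href, play_order):
    """Build the roughly equivalent NCX <pageTarget> element."""
    target = ET.Element(
        "{%s}pageTarget" % NCX_NS,
        {
            "id": "page-%d" % number,
            "type": "normal",
            "value": str(number),
            "playOrder": str(play_order),
        },
    )
    # The NCX form carries a visible label in addition to the link target.
    label = ET.SubElement(target, "{%s}navLabel" % NCX_NS)
    ET.SubElement(label, "{%s}text" % NCX_NS).text = str(number)
    ET.SubElement(target, "{%s}content" % NCX_NS, {"src": href})
    return target


if __name__ == "__main__":
    href = "chapter1.xhtml#page-12"  # hypothetical file and anchor
    print(ET.tostring(page_map_entry(12, href), encoding="unicode"))
    print(ET.tostring(ncx_page_target(12, href, 1), encoding="unicode"))
```

Both elements end up pointing at the same anchor in the content; the NCX version just adds a label and a play order, which is why the two look nearly redundant.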
Old 01-21-2009, 01:29 PM   #17
llasram
Reticulator of Tharn
Quote:
Originally Posted by kovidgoyal
I suspect that's one part of the OPS spec that's going to change. It's rather ridiculous to not support JavaScript, and in a few years, when portable devices are powerful enough to handle JavaScript, it will make absolutely no sense.
I don't know... Is the "should not" because requiring scripting would limit the devices books could "run" on, or because the IDPF doesn't think scripting is really appropriate for book-like content? Terrible grammar aside, the OPS spec has this bit in section 2.5.1 (General Notes on SVG Usage):

Quote:
OPS supports the full SVG 1.1 Recommendation. The only exception is that since OPS is not targeting interactive content. SVG animation and scripting features are not supported and must not be used by publication authors; a Reading System should not render such content. [italics added]
Old 01-21-2009, 01:37 PM   #18
kovidgoyal
creator of calibre
Posts: 45,410
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by llasram
I don't know... Is the "should not" because requiring scripting would limit the devices books could "run" on, or because the IDPF doesn't think scripting is really appropriate for book-like content? Terrible grammar aside, the OPS spec has this bit in section 2.5.1 (General Notes on SVG Usage):
Regardless of the IDPF's reasons for doing this, ebooks are going to become truly digital, which means they will become interactive. Game books, alternate storylines, author-provided alternate look and feel: all of this is easy to implement using scripting. As people's conception of the ebook moves further and further from the pbook, I predict that the decision to leave out scripting is going to seem more and more short-sighted.

I suspect the reason for disallowing scripting in the current iteration of the spec is simply to not set the bar too high for viewers. But who knows...
Old 01-21-2009, 03:50 PM   #19
Bierkonig
Member
Posts: 22
Karma: 10
Join Date: Dec 2008
Device: Sony PRS-700
Yes, I'm talking about the page numbers AdobeDE uses to delimit the text rather than pages as "screens full of text" (which obviously change when you change font sizes, though the location of page numbers within the document does not).

AdobeDE is turning the HTML OCR output from 4 scanned pages into 5 pages of ePub. I want to figure out if there's a way to build a document (from the HTML OCR source) where those 4 pages end up as a 4-page document and the page breaks are where the original page breaks were. Currently those page breaks are denoted as <hr> in the HTML output from the OCR.

I'm not using 4- or 5-page documents but rather 2000- and 3000-page reference manuals. I want the ability to go to page 1773 within the document in the Reader and read the same sentence that would be at the top of page 1773 of the scanned paper. The pages in those manuals contain too much text to read on a single Reader page unless it were set in a 6pt font, so I want the ability to read a few screens full of text for a single page of scanned input, and then, without any blank space, start the next page of scanned input (with the appropriate page number in the right margin).

I know that nearly absolute page break (page content) control is a feature of PDF. But PDF is so inefficient and slow, and the reflowable formatting of ABBYY FineReader's HTML output of the OCR is much, much better than that of its PDF output.

I'm reading the ePub best practices document pages on page map with interest, but I think I'm a little bit over my head in terms of implementation.

Thanks very much for any further guidance.
Old 01-21-2009, 05:03 PM   #20
llasram
Reticulator of Tharn
Quote:
Originally Posted by Bierkonig
I'm reading the ePub best practices document pages on page map with interest, but I think I'm a little bit over my head in terms of implementation.

Thanks very much for any further guidance.
Do you know any programming languages? (Python...?)

Can you tell from examining the HTML ABBYY FineReader produces how it's indicating the beginning/end of pages? If it has a standard, simple way of doing it, I might write a general-purpose tool for adding the page-map (and/or NCX pageList).
Old 01-21-2009, 06:12 PM   #21
Bierkonig
Member
Alas, no programming languages, but I'm getting a little better at adapting found code as a template.

The form of the ABBYY output is very straightforward....

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=WINDOWS-1252">
<meta name="generator" content="ABBYY FineReader 9.0">
<meta name="author" content="">
<meta name="description" content="">
<meta name="keywords" content="">

<title></title>
<style type="text/css">
table.main {}
tr.row {}
td.cell {}
div.block {}
div.paragraph {}
.font0 { font:6.00pt "Arial", sans-serif; }
.font1 { font:40.00pt "Arial", sans-serif; }
.font2 { font:5.00pt "Arial Narrow", sans-serif; }
.font3 { font:6.00pt "Arial Narrow", sans-serif; }
.font4 { font:7.00pt "Arial Narrow", sans-serif; }
.font5 { font:8.00pt "Arial Narrow", sans-serif; }
.font6 { font:11.00pt "Arial Narrow", sans-serif; }
.font7 { font:12.00pt "Arial Narrow", sans-serif; }
.font8 { font:13.00pt "Arial Narrow", sans-serif; }
.font9 { font:15.00pt "Arial Narrow", sans-serif; }
.......
</style>
</head>

<body>
<p></p>
<p><span class=font9>CHAPTER I</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>

<hr>

<p></p>
<p><span class=font9>CHAPTER I</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>

<hr>

<p></p>
<p><span class=font9>CHAPTER I</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>
<p><span class=font9>text</span></p>

<hr>

<p></p>
<p><span class=font9>CHAPTER 2</span></p>
<p><span class=font6>text</span></p>
<p><span class=font3>text</span></p>
<p><span class=font4>text</span></p>
<p><span class=font2>text</span></p>

<hr>

<p><span class=font9>text</span></p>
<p><span class=font8>text</span></p>
<p><span class=font4>text</span></p>
<p><span class=font9>text</span></p>

<hr>

Thus, that would be pages 1-5. Each chapter begins with <p></p>.

Each page break is represented by <hr>.

That's it.
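
A minimal sketch of how a tool could pick up those breaks, assuming the ABBYY output always uses a bare <hr> between pages as in the sample above. It swaps each <hr> for an empty anchor that a page-map or NCX pageList could later point at. The function name add_page_anchors and the page-N id scheme are made up for illustration; they are not part of calibre or ABBYY.

```python
import re

# Matches <hr>, <HR>, <hr/>, <hr class="x"> and similar variants.
HR_TAG = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)
BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def add_page_anchors(html, first_page=1):
    """Replace each <hr> page break with an anchor marking the page that follows.

    Returns the modified HTML and a list of (page_number, anchor_id) pairs.
    Regular expressions are used because the OCR output is tag soup, not XML.
    """
    body = BODY_TAG.search(html)
    start = body.end() if body else 0
    head, rest = html[:start], html[start:]

    chunks = HR_TAG.split(rest)
    pages = []
    out = [head]
    last = len(chunks) - 1
    for index, chunk in enumerate(chunks):
        # Whatever follows the final <hr> is not a page if it holds no text
        # (for example just closing tags), so it gets no anchor.
        is_trailer = index == last and not ANY_TAG.sub("", chunk).strip()
        if not is_trailer:
            number = first_page + len(pages)
            anchor_id = "page-%d" % number
            pages.append((number, anchor_id))
            out.append('<a id="%s"></a>' % anchor_id)
        out.append(chunk)
    return "".join(out), pages
```

Calling add_page_anchors(html) on the sample would give five anchors, page-1 through page-5, and the trailing <hr> would not create a sixth page. A blank scanned page in the middle of the book still counts as a page, so the numbering stays aligned with the paper original.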
Old 01-21-2009, 06:15 PM   #22
kovidgoyal
creator of calibre
@llasram If you want to do this, the best way would be to add another option to html2epub, --page-boundaries, that would accept an XPath selector.
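
A sketch of what such an option might do internally, written against the Python standard library rather than calibre's own code. It assumes the input is already well-formed XHTML, that every element matched by the selector starts a new page, and that page 1 starts at the top of the file. The build_page_map function, the page-N ids, and the file name are hypothetical, and ElementTree only understands a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
OPF_NS = "http://www.idpf.org/2007/opf"


def build_page_map(xhtml, selector, file_name):
    """Give every element matched by `selector` an id and list it in a page-map.

    `selector` uses the prefix "h" for the XHTML namespace, e.g. ".//h:hr".
    Returns the parsed (and now id-tagged) document root and the page-map element.
    """
    root = ET.fromstring(xhtml)
    page_map = ET.Element("{%s}page-map" % OPF_NS)

    # Page 1 begins at the top of the file, before any boundary element.
    ET.SubElement(page_map, "{%s}page" % OPF_NS, {"name": "1", "href": file_name})

    boundaries = root.findall(selector, {"h": XHTML_NS})
    for number, boundary in enumerate(boundaries, start=2):
        # Keep an existing id so links elsewhere in the book do not break.
        if boundary.get("id") is None:
            boundary.set("id", "page-%d" % number)
        href = "%s#%s" % (file_name, boundary.get("id"))
        ET.SubElement(page_map, "{%s}page" % OPF_NS, {"name": str(number), "href": href})
    return root, page_map


if __name__ == "__main__":
    doc = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        "<p>first page</p><hr/><p>second page</p>"
        "</body></html>"
    )
    _, page_map = build_page_map(doc, ".//h:hr", "book.xhtml")
    print(ET.tostring(page_map, encoding="unicode"))
```

Taking a selector instead of hard-coding <hr> would let the same code handle sources that mark pages differently, at the cost of requiring well-formed input. A boundary element at the very end of the file would produce one page too many here and would need to be dropped by the caller.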
Old 04-23-2009, 09:41 AM   #23
setzer
Enthusiast
Posts: 33
Karma: 264
Join Date: Mar 2009
Device: Sony PRS-505, Amazon Kindle2, Palm, iPhone
Hello,

I also need some help with "page-map".
I've read the ePub Best Practices document and I generally understand how it works,
but I would like to know where the code has to be put.
I think it will be its own file, but how should it be named?
What's the file type of this thing?

If anyone knows, I would be very happy for an answer.

Thanks, and sorry for my bad English :P
Old 10-31-2009, 01:51 PM   #24
martin-a
Junior Member
Posts: 2
Karma: 10
Join Date: Oct 2009
Location: New York, US
Device: none
Sorry to reopen such an old thread, but it seems to be an ongoing topic.

Hope I didn't miss it, but particularly to @Bierkonig: is your interest more in preserving which page particular content is on, i.e. page 586 should have the same content in the paper and the eBook, or (maybe and) in having each eBook page display as a single screenful?

In the first case, you only need to replicate the structure (possibly down to sentence accuracy, e.g. for religious texts like the Bible) in some way and then possibly accept that a particular paper page displays as 2 screens.

I just think that structural accuracy for reference and display accuracy are two separate issues...

Thoughts welcome!