Old 10-15-2008, 09:26 PM   #16
Alfy
Liseur de Bonne Aventure
Posts: 374
Karma: 2176666
Join Date: Sep 2008
Location: Paris, France
Device: PRS T1
Quote:
Originally Posted by Xenophon View Post
Many documents are nearly that simple. But PostScript and its descendant, PDF, are really fully general page description languages. That means, among other things, that any glyph can be placed at any location on the page, in any size, rotation, thickness, etc. And there's a very rich language for computing those locations. In PostScript, it's Turing-complete (that is, you can use it to compute anything you can compute in any ordinary programming language). I'm not sure whether PDF is quite that complex. In any case, the simplest PDF files might not be difficult to de-tag and reflow, but add in even a little of that complexity and it gets a whole lot more difficult.

PDF really wasn't originally designed for reflow. It's really a page description language. Period.

Xenophon

P.S. I invite more knowledgeable geeky types to correct any mistakes I've made in this explanation.
Thanks for the additional info.

I went on the ADE forum to ask the question there, and here is the answer I got:

It's not an issue with the 505, this is expected behavior. When you zoom in on a PDF, the contents are "reflowed" - essentially stripping a lot of the formatting so we can enlarge the font sizes. Because of technical limitations with PDF files (and the current implementation of the reflow algorithms), we do not reflow across pages, so you will get gaps between pages.
Also, because of limitations with the PDF file structure itself (it is not easily reflowable content), line breaks will appear in odd places.


The bits I find interesting:
- They talk of the "current implementation" as being a cause, so there might be hope for a brighter future.
- They say "we", but what does it mean? I assume it's the PRS software that reflows text, not ADE?

I completely understand that PDFs weren't originally designed for reflow, but I still find it amazing how difficult, sometimes impossible, it is to get a proper HTML document from a PDF. Even Acrobat Pro can't get it right 90% of the time...
Old 10-15-2008, 09:29 PM   #17
pilotbob
Grand Sorcerer
Posts: 19,832
Karma: 11844413
Join Date: Jan 2007
Location: Tampa, FL USA
Device: Kindle Touch
Quote:
Originally Posted by Alfy View Post
It's not an issue with the 505, this is expected behavior.
This is the typical "working as designed" answer. I've used it myself as a developer. However, this doesn't mean their design is correct. Why did they decide on this behavior? What makes it better? I would ask them that.

BOb
Old 10-15-2008, 10:43 PM   #18
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Alfy View Post
But if my understanding of the PDF format is correct, it's just text and pictures which are encapsulated using page structures and tags to "fix" the exact look. Unless the document has an exceptionally complicated layout, would it not be easy for the PRS505's software to simply remove the tags to generate a simple flow?

I am asking because I, like many, have been trying unsuccessfully to reflow PDF documents for my Sony Reader, and my biggest hurdle was removing all the tags and page structures from a 400-page document and replacing them with the bare minimum to produce an acceptable result (otherwise, even using the "export to html" from Acrobat Pro, I still got some very weird results). Obviously, I gave up after a while, but I was wondering whether, with the proper coding skills, it would be not too difficult to automate the process? And then, even better, to include it in the reader's own software?
Why do you believe it is necessary to redo the tags? It is not clear to me that the Sony implementation even uses the tags or needs them. Digital Editions does not.

Dale
Old 10-15-2008, 10:51 PM   #19
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Alfy View Post
Thanks for the additional info.

I went on the ADE forum to ask the question there, and here is the answer I got:

It's not an issue with the 505, this is expected behavior. When you zoom in on a PDF, the contents are "reflowed" - essentially stripping a lot of the formatting so we can enlarge the font sizes. Because of technical limitations with PDF files (and the current implementation of the reflow algorithms), we do not reflow across pages, so you will get gaps between pages.
Also, because of limitations with the PDF file structure itself (it is not easily reflowable content), line breaks will appear in odd places.


The bits I find interesting:
- They talk of the "current implementation" as being a cause, so there might be hope for a brighter future.
- They say "we", but what does it mean? I assume it's the PRS software that reflows text, not ADE?

I completely understand that PDFs weren't originally designed for reflow, but I still find it amazing how difficult, sometimes impossible, it is to get a proper HTML document from a PDF. Even Acrobat Pro can't get it right 90% of the time...
The PRS software was written by the ADE team so they have ownership of it. This is why they say "we". They will need to pull two pages at a time in order to flow past the page boundary. That is doable but they haven't done it. The original PDF group did that for the PocketPC implementation.

PDF is based loosely on PostScript and its ilk, PageMaker. A document is really a set of glyphs and a set of locations on the page. It looks to you like a book, but that is really an illusion. In many cases a word may not even show up with its letters side by side in the data. If the document has been edited, then the data can be in a totally different portion of the file in some cases. It can get very messy. If the PDF was created in one clean shot then it will translate pretty easily, but not all PDF files were done that way. The ADE group is new and young and they are working hard I believe but they have a steep learning curve.
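
To make the "set of glyphs and a set of locations" point concrete, here is a minimal Python sketch. It is not how ADE or any real PDF library works. The `Glyph` record, the `reading_order` function and the two tolerance values are hypothetical, and it assumes a single column of unrotated text. It only shows that text order has to be rebuilt from coordinates when the file order is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph
    y: float      # baseline height, measured up from the bottom of the page
    width: float  # advance width of the glyph


def reading_order(glyphs, line_tol=2.0, space_gap=1.5):
    """Rebuild text lines from glyphs stored in arbitrary order (assumed thresholds)."""
    lines = []  # each entry: [baseline_y, [glyphs on that line]]
    # Visit glyphs from the top of the page down (larger y is higher up).
    for g in sorted(glyphs, key=lambda g: -g.y):
        if lines and abs(lines[-1][0] - g.y) <= line_tol:
            lines[-1][1].append(g)
        else:
            lines.append([g.y, [g]])
    text = []
    for _, row in lines:
        row.sort(key=lambda g: g.x)  # left to right within a line
        out = row[0].char
        for prev, cur in zip(row, row[1:]):
            # A horizontal gap wider than space_gap is treated as a word break.
            if cur.x - (prev.x + prev.width) > space_gap:
                out += " "
            out += cur.char
        text.append(out)
    return text
```

Multiple columns, rotated text or wrap-around pictures would defeat a sort this simple, which is a fair picture of why real conversions go wrong.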

Dale
Old 10-15-2008, 11:24 PM   #20
pilotbob
Grand Sorcerer
Posts: 19,832
Karma: 11844413
Join Date: Jan 2007
Location: Tampa, FL USA
Device: Kindle Touch
Quote:
Originally Posted by DaleDe View Post
The ADE group is new and young and they are working hard I believe but they have a steep learning curve.

Dale
That's a silly excuse. Adobe is the inventor of PDF and has been creating tools to create/read/edit them for quite a while. Adobe claims they are the best tools on the market. They have more PDF knowledge in house than any other company. If you are saying that they can't leverage that knowledge to create DE then they are doing something wrong. I would bet there are a few people on the DE team that were on the Acrobat team.

BOb
Old 10-16-2008, 10:06 AM   #21
Alfy
Liseur de Bonne Aventure
Posts: 374
Karma: 2176666
Join Date: Sep 2008
Location: Paris, France
Device: PRS T1
Quote:
Originally Posted by pilotbob View Post
That's a silly excuse. Adobe is the inventor of PDF and has been creating tools to create/read/edit them for quite a while. Adobe claims they are the best tools on the market. They have more PDF knowledge in house than any other company. If you are saying that they can't leverage that knowledge to create DE then they are doing something wrong. I would bet there are a few people on the DE team that were on the Acrobat team.

BOb
Actually, I would say the Adobe team as a whole has been quite unable to make the format compatible with or transferable to anything else. I got Acrobat 9 Pro recently and have been experimenting with the export to Word and export to HTML features, and while they aren't broken, they're quite incapable of producing satisfying results. But then, Adobe themselves say that is not their intention, so perhaps the format is indeed THAT incompatible with... well, just about anything else!

By the way, I found one way that seems to give good results when converting a PDF into HTML: running ReadIris OCR on a text-based PDF document gives a result that looks quite good. It's a bit ridiculous, considering there's nothing to OCR, but that might prove an effective tool for converting PDFs. It'll do nothing for protected PDFs, obviously.
Old 10-16-2008, 11:23 AM   #22
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Alfy View Post
By the way, I found one way that seems to give good results when converting a PDF into HTML: running ReadIris OCR on a text-based PDF document gives a result that looks quite good. It's a bit ridiculous, considering there's nothing to OCR, but that might prove an effective tool for converting PDFs. It'll do nothing for protected PDFs, obviously.
Nice find. There is something to OCR, and that is exactly why PDF is so tough to translate. The OCR scans the rendered page, which solves the problem of the page data being scattered all over the file instead of being sequential.

Dale
Old 10-16-2008, 11:41 AM   #23
DDHarriman
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Yes.

The best solution so far for converting back from PDF seems to be to treat it as what it was basically designed to be: an exact reproduction of the paper document it mimics.
So one must simply treat it as a paper document and, if it’s not protected, apply OCR to it.

Omnipage Pro 16 and FineReader Pro 9 do an outstanding job in this area.
Old 10-16-2008, 06:44 PM   #24
Alfy
Liseur de Bonne Aventure
Posts: 374
Karma: 2176666
Join Date: Sep 2008
Location: Paris, France
Device: PRS T1
MMMhhh... I guess I can make it work, although it requires quite a bit of work. The recognition works fine, and for text with inline pictures, without wrap-around, it should work fine, certainly better than any other conversion process I've tried so far. Alas, both the ReadIris and Omnipage HTML outputs include arbitrary page breaks to correspond to the PDF version. Is anyone here knowledgeable enough with either of these programs to know if there's an option to remove this behaviour?
Old 10-16-2008, 09:11 PM   #25
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Alfy View Post
MMMhhh... I guess I can make it work, although it requires quite a bit of work. The recognition works fine, and for text with inline pictures, without wrap-around, it should work fine, certainly better than any other conversion process I've tried so far. Alas, both the ReadIris and Omnipage HTML outputs include arbitrary page breaks to correspond to the PDF version. Is anyone here knowledgeable enough with either of these programs to know if there's an option to remove this behaviour?
Once you have the HTML it should be easy to correct this behavior. You can just search for the br tags and replace them.
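
For anyone who would rather script that search and replace, a rough Python sketch follows. It rests on a guess about the markup: that the forced breaks show up as `<br>` or `<hr>` tags whose attributes mention `page-break`. I have not checked what ReadIris or Omnipage actually emit, so look at your own file first and adjust the pattern. The name `strip_page_breaks` is just for illustration.

```python
import re

# Assumed markup: a <br> or <hr> tag whose attributes mention "page-break",
# for example <br style="page-break-before:always">. Adjust to your file.
PAGE_BREAK_TAG = re.compile(r"<(?:br|hr)\b[^>]*page-break[^>]*>\s*", re.IGNORECASE)


def strip_page_breaks(html: str) -> str:
    """Remove forced page-break tags so the text flows continuously."""
    return PAGE_BREAK_TAG.sub("", html)
```

Ordinary `<br>` line breaks without a page-break attribute are left alone, so paragraph formatting inside a page should survive.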

Dale
Old 10-17-2008, 11:43 AM   #26
DDHarriman
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
In Omnipage Pro, choose HTML 3.2 as the output. In the save screen, choose formatted text, then choose Option (on the right side) and check whether "insert page breaks" is activated. If so, deactivate it and save again.

I also advise you to experiment with the output options to see which ones give you better results.
Old 10-17-2008, 12:05 PM   #27
Jim Lester
Evangelist
Posts: 416
Karma: 14682
Join Date: May 2008
Location: SF Bay Area
Device: Nook HD, Nook for Windows 8
Quote:
Originally Posted by pilotbob View Post
This is the typical "working as designed" answer. I've used it myself as a developer. However, this doesn't mean their design is correct. Why did they decide on this behavior? What makes it better? I would ask them that.

BOb
We do not reflow across pages to reduce the demands for both memory and processor on mobile devices. For instance, in order to get a proper page count for a reflowed PDF (if we were to reflow across pages), the entire PDF would need to be loaded and rendered.
Old 10-17-2008, 12:54 PM   #28
gwynevans
Wizzard
Posts: 1,402
Karma: 2000000
Join Date: Nov 2007
Location: UK
Device: iPad 2, iPhone 6s, Kindle Voyage & Kindle PaperWhite
Quote:
Originally Posted by Jim Lester View Post
We do not reflow across pages to reduce the demands for both memory and processor on mobile devices. For instance, in order to get a proper page count for a reflowed PDF (if we were to reflow across pages), the entire PDF would need to be loaded and rendered.
On the other hand, Mobipocket readers seem to manage pretty well with an approximate page count, so maybe the exact page count could be dropped to see where that gets you? Wouldn't that mean that all you'd need to do would be to continue to read in the following pages until you'd filled the screen, which seems as if it shouldn't be significantly more than you currently do...

If someone really wants the accurate page count, then they would still be able to use the non-reflowed view to locate it.
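
A rough Python sketch of that "keep reading pages until the screen is full" idea, for illustration only. It is not how the Adobe reflow code works. `load_page`, `words_from` and `fill_screen` are made-up names, and the screen is simplified to a grid of `cols` by `rows` characters.

```python
from typing import Callable, Iterator, List


def words_from(load_page: Callable[[int], str], start: int, page_count: int) -> Iterator[str]:
    """Yield words page by page, loading each source page only when it is reached."""
    for n in range(start, page_count):
        yield from load_page(n).split()


def fill_screen(words: Iterator[str], cols: int, rows: int) -> List[str]:
    """Greedily wrap words into at most `rows` lines of at most `cols` characters."""
    lines: List[str] = []
    current = ""
    for word in words:
        candidate = word if not current else current + " " + word
        # Accept the word if it fits, or if the line is empty (overlong word).
        if len(candidate) <= cols or not current:
            current = candidate
            continue
        lines.append(current)
        current = word
        if len(lines) == rows:
            # Screen is full. The word now in `current` belongs to the next
            # screen; a real reader would carry it over instead of dropping it.
            return lines
    if current and len(lines) < rows:
        lines.append(current)
    return lines
```

Because `words_from` is a generator, only the pages needed to fill one screen are ever loaded. The price is the one Jim Lester mentions: without laying out the whole document there is no exact page count for the reflowed view.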

Last edited by gwynevans; 10-17-2008 at 12:56 PM.
Old 10-17-2008, 05:53 PM   #29
Alfy
Liseur de Bonne Aventure
Posts: 374
Karma: 2176666
Join Date: Sep 2008
Location: Paris, France
Device: PRS T1
Quote:
Originally Posted by Jim Lester View Post
We do not reflow across pages to reduce the demands for both memory and processor on mobile devices. For instance, in order to get a proper page count for a reflowed PDF (if we were to reflow across pages), the entire PDF would need to be loaded and rendered.
But when do you need an accurate page count? Yes, there are tables of contents and glossaries, but for many novels, those don't even exist. Plus, if it is easy to let the documents reflow at the cost of the page count, then why not make it an option?

I am not sure how important ebooks are for Adobe and the PDF format, but if the format is to be successful on the current generation of readers (and probably the following ones; a screen the size of a paperback is what people really want in the end), SOME way has to be found to make the docs appear correctly without complex user intervention like the one I mentioned above in the thread...
Old 10-17-2008, 05:54 PM   #30
Alfy
Liseur de Bonne Aventure
Posts: 374
Karma: 2176666
Join Date: Sep 2008
Location: Paris, France
Device: PRS T1
By the way, Jim Lester is the person from Adobe who nicely answered when I went to their forum to enquire about reflow. Thanks for taking the time to post here!