
MobileRead Forums > E-Book Software > Calibre > Development

10-30-2011, 02:40 PM   #31
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by MrWarper
If you need ideas, I'd have a look at PDF.js. After all, I doubt conversion from PDF to HTML can go beyond that.
There is a big difference between rendering a PDF on a canvas and turning a PDF into HTML. PDF.js is not converting the PDF to HTML. It is rendering it using HTML5 features.

One of the big issues with the demo for PDF.js is that the text is not selectable or copyable. You are literally looking at a series of images. At least this is the case with Chrome 15.
11-01-2011, 09:54 PM   #32
MrWarper
Zealot
Posts: 133
Karma: 2142
Join Date: Oct 2011
Location: Spain
Device: I'm an iRex man: 8x DR1000S, 4x DR800SG, 4x DR800S
Hello,

Quote:
Originally Posted by user_none
One of the big issues with the demo for PDF.js is that the text is not selectable or copyable. You are literally looking at a series of images. At least this is the case with Chrome 15.
Supposedly, the Firefox thingy will be different from the current Chrome PDF engine. Even so, a very common use for PDFs is as containers for a bunch of images. If that's the case, no conversion can be done, period. OTOH, rendering text as an image... shame on Google.

Quote:
Originally Posted by user_none
There is a big difference between rendering a PDF on a canvas and turning a PDF into HTML. PDF.js is not converting the PDF to HTML. It is rendering it using HTML5 features.
HTML5 is HTML. If the application extracts the PDF contents and shows them on screen the right way (not as images), it is converting PDF to HTML in the best possible way: to be displayed on the browser, which is nothing but an HTML viewer on steroids.

Whether you can use that HTML directly or not, for example by saving the file as HTML, is a wholly different kettle of fish. Since it is open source, you can always get the code and use it to make a straight converter.
11-02-2011, 06:49 AM   #33
user_none
Sigil & calibre developer
Quote:
Originally Posted by MrWarper
Supposedly, the Firefox thingy will be different from the current Chrome PDF engine. Even so, a very common use for PDFs is as containers for a bunch of images. If that's the case, no conversion can be done, period. OTOH, rendering text as an image... shame on Google.
I'm not talking about Chrome's PDF engine. I posted the link to the demo of PDF.js, which will run in any HTML5-capable browser (Chrome, Firefox, Safari...). The demo demonstrates what will be included in Firefox. I mentioned that I was using Chrome on the off chance that PDF.js has special handling for selectable and copyable text in Firefox.


Quote:
Originally Posted by MrWarper
HTML5 is HTML. If the application extracts the PDF contents and shows them on screen the right way (not as images), it is converting PDF to HTML in the best possible way: to be displayed on the browser, which is nothing but an HTML viewer on steroids.
HTML5 is much more than HTML, and PDF.js is not converting the PDF to HTML and then inserting the new elements into the DOM. It is using a canvas element, which:

Quote:
Originally Posted by Wikipedia
The canvas element is part of HTML5 and allows for dynamic, scriptable rendering of 2D shapes and bitmap images. It is a low level, procedural model that updates a bitmap and does not have a built-in scene graph.
HTML is not produced by a canvas-based system. The PDF is just rendered, much the same way Acrobat or Foxit Reader renders a PDF. This is similar to how HTML5 games are written: they are not written by pushing out <p> elements; JavaScript is used to manipulate drawing on a canvas.
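A minimal Python sketch of the distinction, under stated assumptions: `TextRun`, `render_to_canvas_ops` and `convert_to_html` are hypothetical names and do not reflect PDF.js's actual code. The renderer emits only positioned drawing operations, standing in for canvas calls such as `fillText(text, x, y)`; the converter emits `<p>` elements that could be inserted into the DOM or saved as a file.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    """Hypothetical: one piece of text positioned on a PDF page (in points)."""
    text: str
    x: float
    y: float
    font_size: float


def render_to_canvas_ops(runs: list[TextRun]) -> list[tuple]:
    """Canvas-style output: a flat list of drawing operations.

    Each tuple stands in for setting ctx.font and calling
    ctx.fillText(text, x, y). Nothing records where a paragraph starts
    or ends; once the operations run, only pixels remain on the canvas.
    """
    ops = []
    for run in runs:
        ops.append(("setFont", f"{run.font_size}px serif"))
        ops.append(("fillText", run.text, run.x, run.y))
    return ops


def convert_to_html(paragraphs: list[str]) -> str:
    """Converter-style output: structured markup that can be saved as a file."""
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)


page = [
    TextRun("The canvas element is part", 72, 100, 12),
    TextRun("of HTML5.", 72, 114, 12),
]
ops = render_to_canvas_ops(page)
html_out = convert_to_html(["The canvas element is part of HTML5."])
print(ops[1])    # ('fillText', 'The canvas element is part', 72, 100)
print(html_out)  # <p>The canvas element is part of HTML5.</p>
```

The draw list carries no `<p>` elements or paragraph boundaries at all, which is the sense in which a canvas-based renderer produces no HTML.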

Quote:
Originally Posted by MrWarper
Whether you can use that HTML directly or not, for example by saving the file as HTML, is a wholly different kettle of fish. Since it is open source, you can always get the code and use it to make a straight converter.
There is no HTML to use. It does not produce HTML and then insert it into the DOM; it renders. Again, PDF.js is not useful here because it does not convert to HTML; it renders using a series of drawing commands.
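What a "straight converter" built from a renderer would have to add can be sketched roughly in Python. Everything here is an assumption for illustration: the `TextRun` type, the `runs_to_paragraphs` heuristic (single column, 14 pt line height, a larger vertical gap starts a new paragraph) and `paragraphs_to_html` are hypothetical and not part of PDF.js. The point is that paragraph structure has to be guessed from coordinates before any HTML exists.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    """Hypothetical: a piece of text a renderer would draw at (x, y), in points."""
    text: str
    x: float
    y: float


def runs_to_paragraphs(runs: list[TextRun], line_height: float = 14.0,
                       tolerance: float = 1.0) -> list[str]:
    """Rebuild paragraphs from positioned text runs.

    Illustrative assumptions: single-column text, y grows down the page,
    and a vertical gap larger than one line height starts a new paragraph.
    """
    # 1. Sort top-to-bottom, then group runs that share (roughly) a baseline.
    lines: list[tuple[float, list[TextRun]]] = []
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        if lines and abs(lines[-1][0] - run.y) <= tolerance:
            lines[-1][1].append(run)
        else:
            lines.append((run.y, [run]))

    # 2. Join each line left-to-right; open a new paragraph on a large gap.
    paragraphs: list[str] = []
    prev_y = None
    for y, line_runs in lines:
        text = " ".join(r.text for r in sorted(line_runs, key=lambda r: r.x))
        if prev_y is None or y - prev_y > line_height + tolerance:
            paragraphs.append(text)
        else:
            paragraphs[-1] += " " + text
        prev_y = y
    return paragraphs


def paragraphs_to_html(paragraphs: list[str]) -> str:
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)


page = [
    TextRun("There is no HTML", 72, 100),
    TextRun("to use.", 160, 100),
    TextRun("It renders.", 72, 114),
    TextRun("A new paragraph.", 72, 142),
]
print(paragraphs_to_html(runs_to_paragraphs(page)))
```

Multiple columns, hyphenated words and running headers or footers would defeat a heuristic this simple.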
11-03-2011, 03:08 AM   #34
MrWarper
Zealot
I think we could argue a bit, but PDF.js surely doesn't look so good on closer examination. A real shame; I thought it would make it easier to get rid of so much PDFed shit.
11-05-2011, 07:09 AM   #35
Agama
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
Quote:
Originally Posted by roffLOL
You wouldn't be interested in a PDF -> HTML converter? I'm currently developing one. For single page (one page per page, not those documents with double columns), justified PDF documents...
Yes, I would be very interested. How are the options for removing page numbers, headers & footers coming along?

The quality of the result is more important than speed of conversion - I can always have a cup of tea while I wait!

Looking forward to road-testing it, so keep us all informed of your progress.

Tags
conversion, pdf





All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.