Create reflowable content for the Sony Reader with deskUNPDF


sammykrupa
05-12-2007, 01:50 PM
Docudesk's new program is out, and it is excellent (on Mac at least!):

http://labs.docudesk.com/latest-technologies/2007/5/8/create-reflowable-content-for-the-sony-reader-with-deskunpdf.html

gdxf
05-12-2007, 03:26 PM
I've tested the Windows version. For pdf files based on images, the lrf output is not good enough for me; obviously the conversion depends entirely on the program's OCR capability. In this respect the program does not have much of an advantage over other OCR software.

gdxf
05-12-2007, 03:48 PM
For text-based pdf documents, this program does a wonderful job. Conversion is fast, and batch file processing is great. It makes me wonder whether there could be a program that can reflow image-based pdfs to lrf without OCR.

nekokami
05-12-2007, 09:45 PM
I wish it had an output other than lrf, so we iLiad users could use it. But I guess that's what PDFtoHTML is for -- now that we have fbreader to read html. :)

jimmyzou
05-13-2007, 10:24 AM
This is a really wonderful tool for Sony Reader users. I tried it and immediately made it my first choice, ahead of Scansoft's PDF converter, which I had been using before.

tsgreer
05-13-2007, 04:53 PM
This thing is awesome so far. Not sure if I can create a linked Table of Contents yet since I just downloaded it, but I like it better than Libriate for creating .lrf files. I can finally have italics and some formatting when I make books. I can also do illustrated versions now too. Yay!

kovidgoyal
05-13-2007, 06:00 PM
You'd get more features with pdftohtml + html2lrf/BookDesigner

tsgreer
05-13-2007, 06:20 PM
You'd get more features with pdftohtml + html2lrf/BookDesigner

Well, I'm on a basic non-Intel, non-Windows Mac, so my options were pretty limited until this came out. I don't know any programming code and I don't use Terminal, so I am the guy that has to wait for the nice and easy GUIs to come out. I may try to figure out the programming stuff, but I just don't have enough time in the day...

kovidgoyal
05-13-2007, 07:11 PM
Ah, that would explain your reluctance. The hard part is really installing the tools, not using them. A simple use case would look like:

pdftohtml my.pdf
html2lrf my.html


But yeah, until you can get past the installation hurdle, you're better off with the GUI.
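
For anyone who does get both tools installed, the two commands above can be wrapped in a small shell function. This is only a sketch under assumptions: convert_pdf is a made-up helper name (not part of either tool), both programs are assumed to be on the PATH, and pdftohtml is assumed to write my.html next to my.pdf, as the example above implies. Setting DRY_RUN=1 just prints the commands, which is a safe way to check what would run.

```shell
#!/bin/sh
# Sketch only: chain the two commands from the post for a single PDF.
# Assumptions: pdftohtml and html2lrf are installed and on the PATH, and
# pdftohtml writes <name>.html alongside <name>.pdf.

# convert_pdf is a hypothetical helper name, not part of either tool.
# Set DRY_RUN=1 to print the commands instead of running them.
convert_pdf() {
    pdf=$1
    html="${pdf%.pdf}.html"    # my.pdf -> my.html
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "pdftohtml $pdf"
        echo "html2lrf $html"
    else
        # Only run the second step if the first one succeeded.
        pdftohtml "$pdf" && html2lrf "$html"
    fi
}
```

To convert one file, call convert_pdf my.pdf. To do a whole folder, something like for f in *.pdf; do convert_pdf "$f"; done should work in the same shell session.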

ddesk
05-14-2007, 04:46 PM
The final release version of deskUNPDF Professional is spec'd to perform PDF-HTML conversion and to handle pdf-BBeB TOC conversions and internal links. The OCR engine will be enabled for extracting text from images and fixing text from PDFs with non-standard font encodings (all of this is detailed in the readme file). On the pdftohtml->html2lrf solution, I can tell you that deskUNPDF will outperform pdftohtml hands down in creating structured text, paragraphs, etc. from PDFs. Besides this, doing an extra conversion (pdf-html-lrf vs pdf-lrf) is always going to be more lossy.

kovidgoyal
05-14-2007, 05:02 PM
That's great. Are you going to release the pdf->html converter as a standalone app/library as well? What's it written in?

dsyzling
05-15-2007, 03:23 AM
Re pdftohtml: does this extract embedded images? Last time I tried the 0.39 Windows command line tool it only extracted text (in simple mode). Complex mode converted to png, but for final conversion to lrf that wasn't too useful for me. All formatting, headings and document structure were lost as well.


Darren

nekokami
05-15-2007, 07:35 AM
... fixing text from PDFs with non-standard font encodings
This is particularly interesting to me. I've had a couple of PDFs that I wasn't able to convert using other tools because of non-standard encodings.

ddesk
05-16-2007, 11:51 AM
That's great, are you going to release the pdf->html converter as a standalone app/library as well. What's it written in?
The entire conversion engine, PDF to all formats, will be available as an API. It is written in Java, which is compiled to native code for various platforms using GCJ (incidentally, we have an article on our labs site about building an OS X cross compiler for GCJ). For the initial release, the library will be available as a Java class (linked via JNI) and a COM component for Windows. That said, our main focus is on creating simple-to-use end-user applications. It's nice to have different tools available, especially open source ones. Have you thought of creating simple installer packages for your Python prs500-gui app? Getting all of the needed dependencies installed was a pretty high bar (at least on OS X), too much for the average user. With all of the features it offers, I know it would be a welcome contribution.

kovidgoyal
05-16-2007, 01:22 PM
There is an installer for Windows, and for Linux it's just a couple of commands. However, I don't have convenient access to an OS X machine, so I can't maintain an OS X installer. It's a pity...

A cross platform text extraction engine for PDF is a really useful thing. I'm looking forward to it.

sic
05-16-2007, 02:07 PM
How can I do html->lrf conversion with docudesk's software?
I'd prefer not to convert the html to pdf... that loses the structure and adds noise such as page headers and footers.
I tried using your PDF virtual printer, and the results are acceptable... maybe it just needs some more config, i.e. turning these things off.

JSWolf
05-16-2007, 03:14 PM
how can I do html->lrf conversion with docudesk's software?
I'd prefer not to convert the html to pdf... that forgets about the structure and adds noise such as page headers footers
I tried using your PDF virtual printer, the results are acceptable... maybe it's just some more config i.e. turning these things off.

HTML2LRF would do what you want.

http://www.mobileread.com/forums/showthread.php?t=10582

Enjoy!

sic
05-16-2007, 05:30 PM
I tried a couple of converters.
HTML2LRF does not preserve formatting.
It did create a good TOC though...

kovidgoyal
05-16-2007, 06:10 PM
i tried a couple of converters
HTML2LRF does not preserve formatting
It did create a good TOC though...

Umm, which html2lrf are you talking about?

JSWolf
05-16-2007, 10:54 PM
i tried a couple of converters
HTML2LRF does not preserve formatting
It did create a good TOC though...
Did you use HTML2LRF that came with LIBPRS500? Or did you use some other HTML2LRF? I did link you to the proper version. However, if you used some other version, then no wonder it did not work.

Jon