05-12-2007, 02:50 PM | #1 |
Reader of the Reader
Posts: 103
Karma: 107
Join Date: Apr 2006
Device: Sony Reader PRS-500
|
Create reflowable content for the Sony Reader with deskUNPDF
Docudesk's new program is out, and it is excellent (on Mac at least!):
http://labs.docudesk.com/latest-tech...deskunpdf.html |
05-12-2007, 04:26 PM | #2 |
Enthusiast
Posts: 48
Karma: 27
Join Date: Oct 2006
Device: Sony Reader PRS-500
|
I've tested the Windows version. For image-based PDF files, the LRF output is not what I'd want; the conversion obviously depends entirely on the program's OCR capability. In this respect the program does not have much advantage over other OCR software.
|
05-12-2007, 04:48 PM | #3 |
Enthusiast
Posts: 48
Karma: 27
Join Date: Oct 2006
Device: Sony Reader PRS-500
|
For text-based PDF documents, this program does a wonderful job. Conversion is fast, and batch file processing is great. It makes me wonder whether there could be a program that can reflow image-based PDF to LRF without OCR.
|
05-12-2007, 10:45 PM | #4 |
fruminous edugeek
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
|
I wish it had an output other than lrf, so we iLiad users could use it. But I guess that's what PDFtoHTML is for -- now that we have fbreader to read html.
|
05-13-2007, 11:24 AM | #5 |
Connoisseur
Posts: 96
Karma: 11
Join Date: Jul 2006
Location: Montreal
Device: Sony Reader; Kobo; Nook color
|
This is a really wonderful tool for Sony Reader users. I tried it and immediately made it my first priority, ahead of Scansoft's PDF Converter, which I used before.
|
05-13-2007, 05:53 PM | #6 |
Lovin' the e-book life...
Posts: 633
Karma: 2509
Join Date: Nov 2006
Location: Colorado
Device: Ebookwise 1150, Sony PRS-505, Amazon Kindle, BeBook (with OpenInkpot)
|
This thing is awesome so far. Not sure if I can create a linked Table of Contents yet since I just downloaded it, but I like it better than Libriate for creating .lrf files. I can finally have italics and some formatting when I make books. I can also do illustrated versions now too. Yay!
|
05-13-2007, 07:00 PM | #7 |
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You'd get more features with pdftohtml + html2lrf/BookDesigner
|
05-13-2007, 07:20 PM | #8 | |
Lovin' the e-book life...
Posts: 633
Karma: 2509
Join Date: Nov 2006
Location: Colorado
Device: Ebookwise 1150, Sony PRS-505, Amazon Kindle, BeBook (with OpenInkpot)
|
Quote:
|
|
05-13-2007, 08:11 PM | #9 |
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Ah that would explain your reluctance. The hard part is really installing the tools, not using them. A simple use case would look like
Code:
pdftohtml my.pdf
html2lrf my.html |
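For several files at once, the two commands above could be wrapped in a small script. What follows is only a minimal sketch in Python, assuming both tools are on the PATH and that pdftohtml writes my.html next to my.pdf as in the example above. The function names `build_commands` and `convert_all` are made up for illustration, and no extra command-line options are used.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path):
    """Return the two command lines from the example for one PDF.

    Assumption: pdftohtml writes <name>.html next to <name>.pdf,
    as in the my.pdf -> my.html example.
    """
    pdf = Path(pdf_path)
    html = pdf.with_suffix(".html")
    return [
        ["pdftohtml", str(pdf)],
        ["html2lrf", str(html)],
    ]


def convert_all(pdf_paths, dry_run=True):
    """Run (or, with dry_run=True, only collect) the commands for each PDF."""
    commands = []
    for pdf_path in pdf_paths:
        for cmd in build_commands(pdf_path):
            if not dry_run:
                # check=True stops at the first tool that reports an error
                subprocess.run(cmd, check=True)
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands that would be executed for my.pdf
    for cmd in convert_all(["my.pdf"]):
        print(" ".join(cmd))
```

With `dry_run=True` nothing is executed, so the command list can be checked before running the real conversion with `dry_run=False`.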
05-14-2007, 05:46 PM | #10 |
Darren
Posts: 4
Karma: 51
Join Date: Apr 2007
Location: Plano, Texas
Device: PPC-6700/PRS-500
|
The final release version of deskUNPDF Professional is spec'd to perform PDF-to-HTML conversion and to handle PDF-to-BBeB TOC conversions and internal links. The OCR engine will be enabled for extracting text from images and fixing text from PDFs with non-standard font encodings (all of this is detailed in the readme file). On the pdftohtml->html2lrf solution, I can tell you that deskUNPDF will outperform pdftohtml hands down in creating structured text, paragraphs, etc. from PDFs. Besides this, doing an extra conversion (pdf-html-lrf vs pdf-lrf) is always going to be more lossy.
|
05-14-2007, 06:02 PM | #11 |
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That's great. Are you going to release the pdf->html converter as a standalone app/library as well? What's it written in?
|
05-15-2007, 04:23 AM | #12 |
Member
Posts: 13
Karma: 10
Join Date: Dec 2006
Device: Sony Reader
|
Re pdftohtml: does this extract embedded images? Last time I tried the 0.39 Windows command-line tool, it only extracted text (in simple mode). Complex mode converted to PNG, but for final conversion to LRF that wasn't too useful for me. All formatting, headings, and document structure were lost as well.
Darren |
05-15-2007, 08:35 AM | #13 | |
fruminous edugeek
Posts: 6,745
Karma: 551260
Join Date: Oct 2006
Location: Northeast US
Device: iPad, eBw 1150
|
Quote:
|
|
05-16-2007, 12:51 PM | #14 | |
Darren
Posts: 4
Karma: 51
Join Date: Apr 2007
Location: Plano, Texas
Device: PPC-6700/PRS-500
|
Quote:
|
|
05-16-2007, 02:22 PM | #15 |
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There is an installer for Windows, and for Linux it's just a couple of commands. However, I don't have convenient access to an OSX machine, so I can't maintain an OSX installer. It's a pity...
A cross platform text extraction engine for PDF is a really useful thing. I'm looking forward to it. |