| 
			
			 | 
		#1 | 
| 
			
			
			
			 Reader of the Reader 
			
			Posts: 103 
				Karma: 107 
				Join Date: Apr 2006 
				
				
				
				Device: Sony Reader PRS-500 
				
				
				 | 
	
	
	
		
		
			
			 
				
				Create reflowable content for the Sony Reader with deskUNPDF
			 
			
			
			Docudesk's new program is out, and it is excellent (on Mac at least!): 
		
	
		
		
		
		
		
		
		
		
		
		
	
	http://labs.docudesk.com/latest-tech...deskunpdf.html  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#2 | 
| 
			
			
			
			 Enthusiast 
			
			Posts: 48 
				Karma: 27 
				Join Date: Oct 2006 
				
				
				
				Device: Sony Reader PRS-500 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I've tested the Windows version.  For image-based pdf files, the lrf output was not what I had hoped for; the conversion obviously depends entirely on the program's OCR capability.  In this respect the program does not have much advantage over other OCR software.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#3 | 
| 
			
			
			
			 Enthusiast 
			
			Posts: 48 
				Karma: 27 
				Join Date: Oct 2006 
				
				
				
				Device: Sony Reader PRS-500 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			For text-based pdf documents, this program does a wonderful job. Conversion is fast, and batch file processing is great.   It makes me wonder whether there could be a program that can reflow image-based pdfs to lrf without OCR.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#4 | 
| 
			
			
			
			 fruminous edugeek 
			
			Posts: 6,745 
				Karma: 551260 
				Join Date: Oct 2006 
				Location: Northeast US 
				
				
				Device: iPad, eBw 1150 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I wish it had an output other than lrf, so we iLiad users could use it. But I guess that's what PDFtoHTML is for -- now that we have fbreader to read html.  
		
	
		
		
		
		
		
		
		
		
		
		
	
	 
		 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#5 | 
| 
			
			
			
			 Connoisseur 
			
			Posts: 96 
				Karma: 11 
				Join Date: Jul 2006 
				Location: Montreal 
				
				
				Device: Sony Reader; Kobo; Nook color 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			This is a really wonderful tool for Sony Reader users. I tried it and immediately made it my first choice, ahead of the Scansoft PDF converter I used before.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#6 | 
| 
			
			
			
			 Lovin' the e-book life... 
			
			Posts: 633 
				Karma: 2509 
				Join Date: Nov 2006 
				Location: Colorado 
				
				
				Device: Ebookwise 1150, Sony PRS-505, Amazon Kindle, BeBook (with OpenInkpot) 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			This thing is awesome so far. Not sure if I can create a linked Table of Contents yet since I just downloaded it, but I like it better than Libriate for creating .lrf files.  I can finally have italics and some formatting when I make books. I can also do illustrated versions now too. Yay!
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#7 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			You'd get more features with pdftohtml + html2lrf/BookDesigner
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#8 | |
| 
			
			
			
			 Lovin' the e-book life... 
			
			Posts: 633 
				Karma: 2509 
				Join Date: Nov 2006 
				Location: Colorado 
				
				
				Device: Ebookwise 1150, Sony PRS-505, Amazon Kindle, BeBook (with OpenInkpot) 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#9 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Ah, that would explain your reluctance. The hard part is really installing the tools, not using them. A simple use case would look like this: 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Code: 
	pdftohtml my.pdf
	html2lrf my.html  | 
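
For anyone who wants to script the two commands above, here is a minimal sketch in Python. It rests on assumptions not stated in the post: that both tools are on the PATH, and that `pdftohtml my.pdf` leaves a `my.html` next to the input, as the second command implies. The function names `build_commands` and `convert` are made up for illustration and are not part of either tool.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path):
    """Return the two command lines from the post for a single PDF.

    Assumption: pdftohtml writes <name>.html next to the input file,
    which is what the second command in the post relies on.
    """
    pdf = Path(pdf_path)
    html = pdf.with_suffix(".html")
    return [
        ["pdftohtml", str(pdf)],
        ["html2lrf", str(html)],
    ]


def convert(pdf_path, dry_run=False):
    """Run (or, with dry_run=True, only print) the two steps in order."""
    commands = build_commands(pdf_path)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        # Fail early with a clear message if a tool is not installed.
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH")
        # check=True stops the pipeline if a step exits with an error.
        subprocess.run(cmd, check=True)
    return commands
```

Calling `convert("my.pdf", dry_run=True)` only prints the two commands, which is a safe way to check the file names before running anything. Looping `convert` over a folder of PDFs would give a rough batch mode.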
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#10 | 
| 
			
			
			
			 Darren 
			
			Posts: 4 
				Karma: 51 
				Join Date: Apr 2007 
				Location: Plano, Texas 
				
				
				Device: PPC-6700/PRS-500 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			The final release version of deskUNPDF Professional is spec'd to perform PDF-HTML conversion and to handle pdf-BBeB TOC conversions and internal links. The OCR engine will be enabled for extracting text from images and for fixing text from PDFs with non-standard font encodings (all of this is detailed in the readme file).  On the pdftohtml->html2lrf solution, I can tell you that deskUNPDF will outperform pdftohtml hands down in creating structured text, paragraphs, etc. from PDFs.  Besides this, doing an extra conversion (pdf-html-lrf vs pdf-lrf) is always going to be more lossy.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#11 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			That's great. Are you going to release the pdf->html converter as a standalone app/library as well? What's it written in?
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#12 | 
| 
			
			
			
			 Member 
			
			Posts: 13 
				Karma: 10 
				Join Date: Dec 2006 
				
				
				
				Device: Sony Reader 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Re pdftohtml: does it extract embedded images? Last time I tried the 0.39 Windows command line tool, it only extracted text (in simple mode). Complex mode converted to png, but for final conversion to lrf that wasn't too useful for me. All formatting, headings, and document structure were lost as well. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Darren  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#13 | |
| 
			
			
			
			 fruminous edugeek 
			
			Posts: 6,745 
				Karma: 551260 
				Join Date: Oct 2006 
				Location: Northeast US 
				
				
				Device: iPad, eBw 1150 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#14 | |
| 
			
			
			
			 Darren 
			
			Posts: 4 
				Karma: 51 
				Join Date: Apr 2007 
				Location: Plano, Texas 
				
				
				Device: PPC-6700/PRS-500 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#15 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			There is an installer for Windows, and for Linux it's just a couple of commands. However, I don't have convenient access to an OSX machine, so I can't maintain an OSX installer. It's a pity... 
		
	
		
		
		
		
		
		
		
		
		
		
	
	A cross platform text extraction engine for PDF is a really useful thing. I'm looking forward to it.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 