Old 12-02-2008, 06:21 AM   #1
BBRags
Posts: 59
Karma: 12
Join Date: Nov 2008
Device: None
Confused about DJVU files and converting to LRF

I'm trying to convert a book at the Internet Archive into an LRF file for my Sony 505. The text is available in TXT, DJVU and PDF formats. The TXT file isn't pure text, but contains some HTML. The name of the TXT file also includes the letter sequence DJVU, suggesting that it is related to the DJVU format. Meanwhile, the actual DJVU file is about 20 times the size of the TXT file (about 20 MB), and the PDF file is around 50 MB.

Which is the best file format for converting to LRF? What is the best program for converting one of these formats to a compact LRF?
Old 12-04-2008, 07:37 PM   #2
Patricia
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
I believe that the text files are automatic conversions from the djvu files, which is why "djvu" is preserved in the file name even though the result is txt.

I convert the text files in exactly the same way as I convert any other text file. I drop the file into a Doc and then edit, before using either Book Designer or Calibre for the final conversion.
Unfortunately, the Internet Archive text files are of very poor quality, and require many hours of proofreading before conversion.
1. You will need to strip out the headers and footers.
2. You need to restore the italics.
3. You need to correct the OCR errors.
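
Step 1 can be partly automated before the manual proofreading starts. What follows is only a rough Python sketch, not a tested tool: it assumes that running headers and footers show up as short lines that repeat many times once digits are ignored, and that page numbers sit on a line by themselves. The function name and the thresholds (min_repeats, max_len) are made up for illustration, and the output still needs checking, since a short line of real text that repeats often enough would be dropped too.

```python
import re
from collections import Counter

# A line holding nothing but a 1-4 digit number is treated as a page number.
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")


def normalize(line):
    """Collapse whitespace and digits so 'TITLE 12' and 'TITLE 13' compare equal."""
    return re.sub(r"\d+", "#", " ".join(line.split())).upper()


def strip_running_lines(text, min_repeats=5, max_len=60):
    """Drop bare page numbers and short lines that recur at least min_repeats times.

    Both thresholds are guesses for illustration; tune them per book.
    """
    lines = text.splitlines()
    counts = Counter(
        normalize(line)
        for line in lines
        if line.strip() and len(line.strip()) <= max_len
    )
    kept = []
    for line in lines:
        stripped = line.strip()
        if PAGE_NUMBER.match(line):
            continue  # bare page number
        if stripped and len(stripped) <= max_len and counts[normalize(line)] >= min_repeats:
            continue  # likely running header or footer
        kept.append(line)
    return "\n".join(kept)
```

Restoring italics and correcting OCR errors (steps 2 and 3) still have to be done by hand against the page images.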
Old 12-04-2008, 07:51 PM   #3
RWood
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
I have used pdflrf to convert DJVU files to LRF in the past. The problem you will face is that it renders each full page of text as a full-page image, which will not be readable on a 6" screen.

I have converted a number of books from the Internet Archive to LRF (part of the Harvard Classics series) and I had to bypass the provided TXT files due to the quality of their OCR. What I did was to edit the PDFs in Adobe Acrobat to remove the header and footer of each page and then convert the resultant file in ABBYY PDF Transformer 2.0. This yielded a far superior OCR that required perhaps only 10% of the editing time that the Internet Archive TXT files would have required.
Old 12-05-2008, 12:31 PM   #4
BBRags
Thanks Patricia and RWood. The OCR errors and format irregularities in the TXT files are pretty bad. Plus, the pagination of the original book (from which the scan was taken) has been retained, so I need to go back and excise the footers. This is going to be LOTS of work.

RWood, did you use the Internet Archive's PDFs? Are they text or image-based? The PDF for my book is 50 MB. I didn't download it because my ISP meters bandwidth usage, and my connection is in constant use already.
Old 12-08-2008, 04:37 PM   #5
BobC
Posts: 691
Karma: 3026110
Join Date: Dec 2008
Location: Lancashire, U.K.
Device: BeBook 1, BeBook Pure, Kobo Glo, (and HD),Energy Sistem EReader Pro +
The DJVU format files are quite complex, consisting of a number of "layers". The text contained in them is OCR'd from the original scans and, as noted earlier, is flaky in parts (some files I have downloaded have whole pages of OCR'd text missing). I know there is a foreground and a background image layer, as you can switch off the background layer for easier viewing.

You can extract the text from the DJVU, but you finish up with the same text as you can download from the Internet Archive direct.
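
For anyone who wants to try the extraction themselves, one way is the djvutxt command-line tool that ships with DjVuLibre. The Python wrapper below is just a sketch under that assumption: the function names are mine, and book.djvu / book.txt are placeholder file names. It builds the command (optionally limited to a page range via djvutxt's --page option) and runs it only if djvutxt is found on the PATH.

```python
import shutil
import subprocess


def build_djvutxt_command(djvu_path, txt_path, pages=None):
    """Return the argument list for a djvutxt call (DjVuLibre).

    pages is an optional page spec such as "1-10", passed through --page.
    """
    cmd = ["djvutxt"]
    if pages:
        cmd.append(f"--page={pages}")
    cmd += [djvu_path, txt_path]
    return cmd


def extract_text(djvu_path, txt_path, pages=None):
    """Dump the hidden OCR text layer of a DJVU file into a text file."""
    if shutil.which("djvutxt") is None:
        raise RuntimeError("djvutxt not found; install DjVuLibre first")
    subprocess.run(build_djvutxt_command(djvu_path, txt_path, pages), check=True)


# Example with placeholder file names:
# extract_text("book.djvu", "book.txt", pages="1-10")
```

The result is the same OCR text layer, so it has the same errors as the downloadable text file.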

As the PDFs are not text searchable I think they are just image containers.

The "text" layer in the DJVU enables you to search the text and display the corresponding image page.

Bottom line: either extract the text from the DJVU or just grab the .djvu.txt file, depending on whether you want to manually edit the text to align with the original before converting to eBook format. Both versions suffer from page numbers, headers, etc. being interspersed with the text.

BobC