07-07-2011, 04:07 AM | #1 |
Member
Posts: 13
Karma: 10
Join Date: Apr 2009
Device: Amazon Kindle 3
|
Help converting Djvu to mobi
I've been looking for a book that I can't find in my country.
A friend of mine got me a scanned DjVu version of the book, and I'm trying to convert it so I can read it on the Kindle. I understand that there is software that can convert it to PDF, but I would prefer to convert it to mobi: the scanned DjVu is really bad, and I would rather have it in a reflowable format. I was able to save the DjVu as txt via WinDjView and then load the txt into Word. The book has illustrations, so I will have to do some screen captures, edit the pictures and load them into Word. Plus I have to clean out all the page numbers and headers. But the biggest problem I'm having now is that the txt file has a line break at the end of every line (because of the limited page width of the scanned document). Is there a way to eliminate these breaks without having to delete them manually one by one? |
07-07-2011, 05:40 AM | #2 |
Junior Member
Posts: 5
Karma: 10
Join Date: May 2011
Location: Queensland, Australia
Device: Kindle 3
|
HTML
You could use some very simple HTML tags to mark the beginnings of the paragraphs. All the lines in each paragraph will then flow together, because HTML treats line breaks inside a paragraph as ordinary whitespace.
Here is a little filter program written in Perl which expects to read plain text from STDIN and prints simple HTML to STDOUT:

#!/usr/bin/perl
#
# Convert plain text with a blank line between paragraphs into HTML
#
use strict;
use warnings;

# Read all of STDIN into one string
my $text = do { local $/; <STDIN> };
$text = '' unless defined $text;

$text =~ s/\r//g;           # make all text look like Unix text
$text =~ s/\x0c//g;         # drop form feeds (page breaks)
$text =~ s/\[\d+\]//g;      # [32] etc. tags from a .PDF saved as .txt
$text =~ s/\A\s+//;         # trim leading whitespace
$text =~ s/\s+\z//;         # trim trailing whitespace

# Escape the characters that are special in HTML
$text =~ s/&/&amp;/g;
$text =~ s/</&lt;/g;
$text =~ s/>/&gt;/g;

$text =~ s/\n{2,}/\n<p>/g;    # Convert a blank line into a paragraph
$text =~ s/\n[ \t]+/\n<p>/g;  # Convert a new-line followed by whitespace into a paragraph
$text =~ s/\n(?!<p>)/ /g;     # Convert remaining new-lines into spaces

print "<HTML><HEAD><TITLE>From text2HTML</TITLE></HEAD><BODY>\n\n<p>";
print $text;
print "\n\n</BODY></HTML>\n";
#EOF |
07-07-2011, 07:35 AM | #3 |
Groupie
Posts: 155
Karma: 200000
Join Date: Dec 2009
Location: Britania
Device: Android
|
That only helps if it has double-line breaks in the first place.
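(And if you're using Word, you may as well just use Search and Replace All. Lessee... "^l" means a manual line break and "^p" means a paragraph mark. So replace ^l^l with ^p, and then replace each remaining ^l with a single space, so the words at the line ends don't run together. If Word imported every line as its own paragraph, the breaks will be ^p rather than ^l: in that case, first swap ^p^p for a placeholder that doesn't occur in the text, replace the remaining ^p with a space, then turn the placeholder back into ^p.) |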
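For anyone who would rather script the cleanup than do it in Word, the same idea (blank lines separate paragraphs, every other line break becomes a space) can be sketched in Python. This is only a rough sketch, not something from the posts above: the function name `reflow` is made up for illustration, and the rule that glues a word split by a hyphen at a line end back together is an assumption that will also merge genuinely hyphenated words that happen to break there. As noted above, it only helps if the text has blank lines between paragraphs.

```python
import re


def reflow(text: str) -> str:
    """Join hard-wrapped lines into paragraphs separated by one blank line."""
    # Normalise line endings and drop form feeds (page breaks).
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\x0c", "")
    paragraphs = []
    # One or more blank (or whitespace-only) lines separate paragraphs.
    for block in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        if not lines:
            continue
        joined = " ".join(lines)
        # Assumption: "la- zy" is a word that was hyphenated at a line end.
        joined = re.sub(r"(\w)- (?=[a-z])", r"\1", joined)
        paragraphs.append(joined)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "The quick brown\nfox jumps over the la-\nzy dog.\n\nSecond para-\ngraph here.\n"
    print(reflow(sample))
```

Page numbers and running headers would still need to be removed separately, since they look like ordinary lines to this function.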
07-07-2011, 09:46 AM | #4 |
Bah, humbug!
Posts: 39,073
Karma: 157049943
Join Date: Jun 2009
Location: Chesapeake, VA, USA
Device: Kindle Oasis, iPad Pro, & a Samsung Galaxy S9.
|
Moderator Notice: Thread closed. MR is firmly opposed to ebook piracy. |