04-13-2009, 01:40 PM | #1 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
How to properly reformat txt for conversion
Hi,
I did a few PDF conversions with OCR software, but some of these files have formatting problems: the text does not fill the whole width of the page. Is there a way to reformat these files so they use the whole page? (I'm using Word, but any software will do. I've tried a few things, but there are line breaks (\n) right in the middle of the lines and I have no idea how to remove them. I even tried selecting formatted text in FineReader, but it's still the same deal with these PDFs.) The originals were laid out in columns, so once in Word all the text keeps that same column width, which takes far too many pages and converts very badly to other formats. Thanks!

Edit: Well, I found a way: OCR in FineReader, save it to PDF... yes, PDF! Import it into Solid PDF Converter and convert to DOC, open it in Word, then copy and paste into WordPad... yes, WordPad. Finally, copy and paste into KompoZer and save as HTML... A long process, but it works! Does anyone have any shortcuts?

Edit 2: Scratch all that, FineReader can save to LIT... I'll test this soon!

Last edited by Student1; 04-13-2009 at 02:50 PM.
04-13-2009, 09:07 PM | #2 |
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
Text files like these usually have a fixed line length, with a hard line break at the end of each line.
It is easy to change this into a reflowable doc. There are several ways:

1. Paste the text file into a doc and run stingo's Word macro, which you can find here: https://www.mobileread.com/forums/showthread.php?t=8793
2. Try putting the text into BookCreator and tidying it there very quickly. It is here: https://www.mobileread.com/forums/showthread.php?t=28313
3. Run it through Gutenmark (google to find it).
4. Or do a search and replace. I paste the text file into a Word doc, then click on "Edit" and open "Find and Replace":
   1. Search for ^p^p. (Click on "More" and "Special" to find the paragraph symbols.) Replace with ## or @@. This is a placeholder for the paragraph breaks: you want to strip out the single line breaks but leave the genuine paragraphs.
   2. Search for ^p. Replace with a space.
   3. Search for your placeholder (## or @@). Replace with a paragraph mark (^p).
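For anyone comfortable with a little scripting, the same placeholder trick can be done outside Word. This is only a minimal Python sketch, assuming the OCR output is plain text in which real paragraphs are separated by a blank line (if your file marks paragraphs some other way, e.g. only by indentation, it will run them together). The function name `unwrap_text` is just for illustration:

```python
import re


def unwrap_text(text: str, placeholder: str = "##") -> str:
    """Join hard-wrapped lines but keep genuine paragraph breaks.

    Mirrors the three find-and-replace steps above:
      1. blank line between paragraphs (Word's ^p^p) -> placeholder
      2. every remaining single line break (^p)      -> space
      3. placeholder                                 -> paragraph break
    The placeholder must not already occur in the text.
    """
    if placeholder in text:
        raise ValueError(f"placeholder {placeholder!r} already occurs in the text")
    # Treat Windows (\r\n) and old Mac (\r) line endings as plain \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 1: one or more blank lines (possibly holding stray spaces)
    # mark a real paragraph break; protect it with the placeholder.
    text = re.sub(r"\n[ \t]*\n\s*", placeholder, text)
    # Step 2: any single line break left over is just a wrap point.
    text = text.replace("\n", " ")
    # Collapse runs of spaces/tabs (e.g. where a line already ended in a space).
    text = re.sub(r"[ \t]+", " ", text)
    # Step 3: turn the placeholder back into a paragraph break.
    paragraphs = [p.strip() for p in text.split(placeholder)]
    return "\n\n".join(p for p in paragraphs if p)
```

For example, `unwrap_text("It was a dark and\nstormy night.\n\nThe rain fell\nin torrents.\n")` returns `"It was a dark and stormy night.\n\nThe rain fell in torrents."`. As with the Word method, pick a placeholder that never appears in the book itself; the sketch refuses to run if it does.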
04-14-2009, 12:39 AM | #3 |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505
Thanks for the help!! I think this will do it =)!
04-14-2009, 06:08 AM | #4 |
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
You're very welcome, Student1.
Yet another way is to put the text file into Book Designer. It usually manages to convert it into a reflowable format, though it sometimes makes mistakes or loses all the line breaks.
04-14-2009, 08:28 AM | #5 |
Dry fruit
Posts: 1,157
Karma: 1047086
Join Date: Dec 2008
Location: Paris, France
Device: Bookeen Opus + HTC Desire HD
If you have to use WordPad, do try Jarte (http://www.jarte.com/), a very clean word processor based on WordPad... and it's free!
04-17-2009, 01:23 AM | #6 | |
Groupie
Posts: 159
Karma: 170
Join Date: Feb 2009
Device: PRS-505