Old 08-04-2009, 01:39 PM   #1
siulayhumga
Connoisseur
Posts: 55
Karma: 12
Join Date: Jun 2009
Device: Sony PRS 505
Help with reflow text file

I am trying to convert a lot of old text files to EPUB so I can read them on my Sony 505.

Most of them were scanned in the mid-90s and have a lot of formatting errors, such as extra line breaks in the middle of a line. I guess OCR technology for PCs was weak back then.

I don't want to use a text editor to remove all the CRLFs and re-wrap everything, because that would give me a "wall of text".

Does anyone know a program that will do this kind of paragraph "reflow"?

Thanks
Old 08-04-2009, 02:39 PM   #2
KACartlidge
Author
Posts: 12
Karma: 24
Join Date: Aug 2009
Device: Sony Reader PR505
Re-wrapping Text

I know I'm blowing my own trumpet, but try my free eBook PDF tool from my site.

I know you don't want a PDF, but that's fine. Opening the text file in the tool will automatically make a reasonable attempt to re-wrap it. You are then presented with that wrapped text on screen and, instead of clicking to create a PDF, just select the text, copy it and paste it into whatever editor you want.

Of course you can just save the wrapped text over the original with a single click, but you may prefer to keep the original. It's amateur software, so it may not be spot on, but at no cost it's worth a try. It tries to be intelligent with its wrapping by looking at full stops and quote marks in addition to line breaks.
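
For readers who want to see what a punctuation-based re-wrap can look like, here is a minimal Python sketch. It is not the code of the tool described above, whose internals are not shown in this thread; the function name `reflow` and the set of "sentence-ending" characters are assumptions made for illustration.

```python
# Characters that, as the last thing on a line, are assumed to mark a paragraph end.
# This set is an illustrative guess, not taken from any real tool.
SENTENCE_END = ('.', '!', '?', '"', "'", '\u201d', '\u2019')


def reflow(text):
    """Join hard-wrapped lines into paragraphs.

    A paragraph ends at a blank line, or after a line whose last
    character is sentence-ending punctuation or a closing quote.
    """
    paragraphs = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(' '.join(current))
                current = []
            continue
        current.append(line)
        if line.endswith(SENTENCE_END):
            # The line ends like a sentence: treat the break as real.
            paragraphs.append(' '.join(current))
            current = []
    if current:
        paragraphs.append(' '.join(current))
    # Paragraphs are separated by one blank line in the output.
    return '\n\n'.join(paragraphs)
```

The obvious weakness is a sentence that happens to end exactly where the scanned line wraps: it will be treated as a paragraph end, so the result still needs a quick read-through.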
Old 08-05-2009, 09:57 AM   #3
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
In this thread here on MobileRead there is an attachment for InterParse4.

InterParse4 has a bit of a strange interface, but once you understand it, you can do a LOT with plain text. It's designed to help you do exactly what you are asking about. It has a backout feature (like undo) that lets you experiment.

It comes with quite a number of fixes built in.

m a r
Old 08-13-2009, 05:58 AM   #4
MikeB1972
Gnu
Posts: 1,222
Karma: 15625359
Join Date: Jul 2009
Location: UK
Device: BeBook,JetBook Lite,PRS-300-350-505-650,+ran out of space to type
Hi

I've been having a look at reflowing documents and came up with the attached.

It's still very much in beta, but have a play; the interface should be fairly self-explanatory.

Feel free to contact me with any questions / requests.

Regards

MikeB
Attached Files
File Type: zip Reflow.zip (19.7 KB, 492 views)
Old 08-20-2009, 03:25 PM   #5
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
If the file is reasonably regular, apart from the erroneous line breaks in the middle of paragraphs, my very much under-construction Python script might be able to get it into slightly better shape:

Try running it with:

pacify.py -i filename.txt -p

or with:

pacify.py -i filename.txt -rp

It writes its results to output.txt in the directory from which you run it.

- Ahi
Attached Files
File Type: zip pacify.zip (3.0 KB, 402 views)
Old 07-24-2010, 06:16 PM   #6
slex
Addict
Posts: 294
Karma: 1196776
Join Date: Nov 2008
Location: Bulgaria
Device: Kindle 4 NT, Onyx Boox M92
You can use Calibre and do:

ebook-convert unreflowedtextfile.txt reflowedtextfile.txt

And if you want to do it for an entire folder and you use Linux, then use the following command from the CLI:

for file in *.txt ; do ebook-convert "$file" reflowed-"$file" ; done
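
For anyone not on a Linux shell, the same loop can be sketched in Python. This assumes Calibre's `ebook-convert` is installed and on the PATH; the helper names `build_commands` and `convert_folder` are made up for this example. Unlike the shell loop, the sketch skips files that already start with `reflowed-`, so a second run does not convert its own output.

```python
import subprocess
from pathlib import Path


def build_commands(folder):
    """Return one ebook-convert argument list per .txt file in folder."""
    commands = []
    for src in sorted(Path(folder).glob('*.txt')):
        if src.name.startswith('reflowed-'):
            # Output of an earlier run; do not convert it again.
            continue
        dst = src.with_name('reflowed-' + src.name)
        commands.append(['ebook-convert', str(src), str(dst)])
    return commands


def convert_folder(folder):
    """Run ebook-convert on every .txt file in folder (requires Calibre)."""
    for cmd in build_commands(folder):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(cmd, check=True)
```

Calling `convert_folder('.')` from the folder holding the text files should then behave like the one-liner above.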

Last edited by slex; 07-24-2010 at 06:56 PM.
Old 07-30-2010, 09:12 PM   #7
Tazina
Junior Member
Posts: 7
Karma: 10
Join Date: Jun 2010
Device: Sony PRS-300
Hi and I know this is a really old thread but I had the same problem and here's how I fix it.

What to type is in the " " symbols and what I can't reproduce is in ().
1. Need word processor with find and replace features.
2. Open document.
3. Find ".(paragraph mark or crlf)". What matters is that there is no space between the . and the paragraph mark.
4. Replace all of those with "....". This is so that later you can find the real ends of your paragraphs.
5. Now remove all the paragraph marks in the document. Yes, you'll have a wall of text but it's ok.
6. Find "...."
7. Replace that with the ".(paragraph mark or crlf)"
8. Your document should look normal now.

The assumption is that a . followed directly by a paragraph mark will mostly occur only at the end of a paragraph. In books with lots of quotes I also do a find/replace for these characters: " ? !
For those, use one of them instead of the . as the first character, but always follow it with four periods (....). Use four because when a character in a book stops talking, the words sometimes trail off into three periods (...).
So you would find "?(paragraph mark)" and replace it with "?....", or find ""(paragraph mark)" and replace it with ""....", and at the end replace "?...." with "?(paragraph mark)" again.
Just remember: no spaces in the find or replace strings.
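
The find/replace steps above can also be scripted. The following Python sketch is one possible reading of the method, not the poster's own procedure: it uses "@@@" as the placeholder instead of "....", keeps the punctuation mark and appends the placeholder to it, and replaces the remaining line breaks with a space rather than deleting them so that words do not run together. The function name is hypothetical.

```python
import re

PLACEHOLDER = '@@@'  # assumed not to occur anywhere in the text


def reflow_with_placeholder(text, placeholder=PLACEHOLDER):
    """Placeholder-based reflow: mark real paragraph ends, flatten, restore."""
    # Normalise CRLF so only '\n' has to be handled below.
    text = text.replace('\r\n', '\n')
    # Steps 3-4: a line break directly after . ? ! or " is assumed to be a
    # real paragraph end, so it is swapped for the placeholder.
    marked = re.sub(r'([.?!"])\n+', lambda m: m.group(1) + placeholder, text)
    # Step 5: every remaining line break is a stray one; turn it into a space.
    flat = re.sub(r'\n+', ' ', marked)
    # Steps 6-7: put the real paragraph breaks back.
    return flat.replace(placeholder, '\n')
```

As with the manual version, a line that merely happens to end with a full stop is treated as a paragraph end, and a line break after a heading without punctuation is joined to the next line.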

Hope this helps someone later on. I use this on about every fourth or fifth document I convert, or on old ones I find on a floppy... hehe

Doran
Old 07-31-2010, 04:35 AM   #8
Jellby
frumious Bandersnatch
Posts: 7,514
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by Tazina View Post
Then what you do is instead of the . as the first character just put in one of the others but always follow it with four .... -- use four periods because sometimes if a character in a book stops talking the words will trail off to ... three periods.
It's not unusual to have rows of four dots too, when an ellipsis (3 dots) is followed or preceded by a normal full stop. Whether or not this is correct depends on the language and the time the text was written. I'd rather use some other character or combination, such as "%%%", or "¬", or "@@@", which is much less likely to appear anywhere in the text. In case of doubt, just search for what you intend to use before doing the actual replace, to see whether it is already used.
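
That "search first" check is easy to automate. A minimal Python sketch, with a hypothetical function name and the three candidates mentioned above as the default list:

```python
# Candidate placeholders: "%%%", "@@@" and the not sign (U+00AC).
CANDIDATES = ['%%%', '@@@', '\u00ac']


def pick_placeholder(text, candidates=CANDIDATES):
    """Return the first candidate that does not occur anywhere in text."""
    for candidate in candidates:
        if candidate not in text:
            return candidate
    # Nothing is safe: better to stop than to corrupt the text.
    raise ValueError('every candidate placeholder already occurs in the text')
```

If every candidate is already in use, the function raises an error instead of guessing, so the list can simply be extended with another string.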
Old 07-31-2010, 11:49 AM   #9
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
"¬" is the symbol for logical negation. I use it as such in my documents!
Old 07-31-2010, 06:36 PM   #10
Solitaire1
Samurai Lizard
Posts: 14,188
Karma: 66544976
Join Date: Nov 2009
Device: NookColor
If the ebook is formatted with a blank line between each paragraph or with a spaced indent (say five blank spaces), OpenOffice.org (OO.o) can easily convert it into a properly flowing document.

If the paragraphs are indented, do the following:

- Open the text file in OO.o

- Replace the indent spaces with an unused character (such as "~").

- Save it as an HTML file.

- Open the HTML file.

- Select the entire document, and change the paragraph style to "body text."

- Click on the HTML Source view. This will bring up the document's HTML code.

- Replace the unused characters with "</p><p>"

- Close the HTML Source view.

- Insert and delete a space in the document (this will cause OO.o to clean up the HTML code).

- Close the document.

The document should be formatted with paragraphs separated. You can now format the document the way that you want.
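
If you would rather skip the OO.o round trip, the indented case can be approximated with a short script. This is a different technique from the steps above, shown only as a sketch: it treats any line that starts with a tab or at least `min_indent` spaces as the start of a new paragraph and writes bare `<p>` elements. The function name and the default of two spaces are assumptions.

```python
import html
import re


def indented_text_to_html(text, min_indent=2):
    """Turn indent-delimited paragraphs into a sequence of <p> elements."""
    # A line opening with a tab or min_indent+ spaces starts a new paragraph.
    indent = re.compile(r'^(?: {%d,}|\t)' % min_indent)
    paragraphs = []
    current = []
    for line in text.splitlines():
        if indent.match(line) and current:
            # New indent seen: close the paragraph collected so far.
            paragraphs.append(' '.join(current))
            current = []
        stripped = line.strip()
        if stripped:
            current.append(stripped)
    if current:
        paragraphs.append(' '.join(current))
    # Escape &, < and > so the text cannot break the markup.
    return '\n'.join('<p>%s</p>' % html.escape(p, quote=False) for p in paragraphs)
```

The output is only the paragraph markup; it still has to be wrapped in a full HTML document before conversion.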

If the paragraphs are separated by blank lines, then do the following:

- Open the text file in OO.o

- Save it as an HTML file.

- Open the HTML file.

- Select the entire document, and change the paragraph style to "body text."

- Click on the HTML Source view. This will bring up the document's HTML code.

- Replace each "<p>" in the document with a blank space.

- Replace each "</p>" in the document with a blank space.

- Replace each "<br><br>" in the document with "</p><p>"

- Insert a "<p>" at the beginning of the document's text.

- Delete the last "<p>" in the text.

- Close the HTML Source view.

- Insert and delete a space in the document (this will cause OO.o to clean up the HTML code).

- Close the document.

The document should be formatted with paragraphs separated. You can now format the document the way that you want.
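
The blank-line case can be sketched the same way without OO.o. Again, this is an alternative to the steps above rather than a transcription of them, and the function name is hypothetical: the text is split wherever one or more blank lines occur, and the hard line breaks inside each block are collapsed into single spaces.

```python
import html
import re


def blank_line_text_to_html(text):
    """Turn blank-line-separated paragraphs into a sequence of <p> elements."""
    text = text.replace('\r\n', '\n')
    # One or more blank (or whitespace-only) lines separate paragraphs.
    blocks = re.split(r'\n\s*\n', text)
    # split() + join collapses line breaks and repeated spaces inside a block.
    paragraphs = [' '.join(block.split()) for block in blocks]
    # Escape &, < and > and drop blocks that turned out to be empty.
    return '\n'.join(
        '<p>%s</p>' % html.escape(p, quote=False) for p in paragraphs if p
    )
```

As with the indented variant, the result is bare paragraph markup that still needs an HTML wrapper around it.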