
MobileRead Forums > E-Book Formats > Workshop

11-15-2008, 11:10 PM #1
ficbot
Wizard
Posts: 2,389
Karma: 4115574
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
Tell me there is an easier way!

I am looking for an easier way to convert a large batch of plain-text e-books that my sister sent me. Many of the files are messy: some have far too many paragraph breaks, some have none at all, and there are other such issues. I was importing them into OpenOffice and going through them manually to remove the breaks (using the CF option on import, since some had no breaks at all), but I was still getting a lot of garbage in them once I converted them into PDB files to load onto the iPod.

I finally figured out a way to get the books to appear satisfactorily in the finished file, but it is very labour-intensive. It uses NeoOffice and KompoZer, which is an HTML editor:

1) Open it in NeoOffice and, if it gives the option, choose 'cf' only
2) Manually scan the document for large gaps and remove them
3) Save the file as a plain text document
4) Re-open the file
5) Select all and copy
6) Paste it into a new window in KompoZer
7) In KompoZer, select all and copy
8) Paste this into a new NeoOffice document
9) Save this as a Word file
10) Use a conversion program to convert Word to PDB

Isn't there an easier way? All I want is regular old text with one line break between paragraphs, nothing fancy. It seems, though, that depending on the program originally used to make the text file, tabs or special characters are used to indicate the line breaks. I don't see them in NeoOffice, but I do once the file is converted. The only way I have found to get "clean" text is to paste it into a web page editor, which generates proper paragraph breaks where the line breaks are; when I paste that back into NeoOffice, everything is fine. But this whole process can take upwards of 15 minutes per book!

I am on a Mac and I don't have any Microsoft software on it. I have NeoOffice and Pages. I am willing to buy a new program if need be, but since I am on a Mac, I suspect my options may be limited.

Advice?
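For anyone who wants to check which invisible characters a file actually contains before cleaning it, the counts can be taken straight from the raw bytes. The Python sketch below is illustrative only (the function name `describe_breaks` is made up for this example). It tallies the three common line-ending conventions (CR LF from Windows, bare LF from Unix and Mac OS X, bare CR from classic Mac OS) plus tabs, form feeds and other control characters; a file that mixes conventions is a plausible source of the stray breaks described above.

```python
def describe_breaks(data: bytes) -> dict:
    """Count line-ending styles and invisible characters in raw file bytes.

    Hypothetical helper for illustration; it works on bytes, so nothing is
    translated or hidden the way an office program might do.
    """
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                        # Windows line endings
        "CR": data.count(b"\r") - crlf,      # bare CR (classic Mac OS)
        "LF": data.count(b"\n") - crlf,      # bare LF (Unix, Mac OS X)
        "TAB": data.count(b"\t"),
        "FORMFEED": data.count(b"\x0c"),
        # Any other ASCII control character (bytes 0-31 not counted above).
        "OTHER_CONTROL": sum(1 for b in data if b < 32 and b not in (9, 10, 12, 13)),
    }
```

For example, `describe_breaks(open("book.txt", "rb").read())` on a file made on Windows would typically report mostly `CRLF` line endings.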
11-16-2008, 03:52 AM #2
pepak
Fanatic
Posts: 594
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-505
Quote:
Originally Posted by ficbot View Post
Isn't there an easier way? All I want is regular old text, one line break between paragraphs, nothing fancy.
You will not be able to avoid manual work entirely. That said, it is often quite possible to fix errors like the ones you describe using regular expressions. You will likely need a programmer's editor (such as PSPad) to make full use of them, though.

(It might be a good idea to do your editing in such an editor even if you decide learning regexps is not worth the effort: these editors tend to be much better at displaying special characters than your average office application.)
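The kind of regular-expression cleanup pepak describes can also be run as a small script instead of in an editor. The sketch below uses Python's `re` module, not PSPad's own search syntax, and its steps are assumptions about what "messy" means here: it unifies line endings, turns tabs and non-breaking spaces into plain spaces, trims spaces at both ends of each line, and collapses runs of blank lines so that paragraphs end up separated by exactly one blank line. The name `clean_text` is hypothetical.

```python
import re

def clean_text(text: str) -> str:
    """Normalize a messy plain-text book so paragraphs are separated by one blank line."""
    # Convert Windows (CR LF) and classic Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace tabs and non-breaking spaces with ordinary spaces.
    text = re.sub(r"[\t\u00a0]+", " ", text)
    # Remove spaces at the end and the start of every line (this also drops indents).
    text = re.sub(r" +\n", "\n", text)
    text = re.sub(r"\n +", "\n", text)
    # Collapse two or more consecutive line breaks into exactly one blank line.
    text = re.sub(r"\n{2,}", "\n\n", text)
    return text.strip() + "\n"
```

Note that this does not rejoin lines that were hard-wrapped inside a paragraph; that needs a separate step.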
11-16-2008, 07:29 AM #3
HarryT
eBook Enthusiast
Posts: 62,754
Karma: 40390405
Join Date: Nov 2006
Location: UK
Device: PW2, iPad Retina Mini, iPhone 4, MS Surface Pro, Onyx T68, N7,
What is the original source of the material? If you download free books from sources like PG (Project Gutenberg), they should be reasonably well formatted.
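Project Gutenberg plain-text files are typically hard-wrapped at around 70 characters per line, with a blank line between paragraphs. Assuming a file follows that layout, reflowing it only requires joining the lines inside each paragraph. A minimal Python sketch (the name `unwrap_paragraphs` is illustrative):

```python
import re

def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines, assuming paragraphs are separated by blank lines."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line (possibly containing spaces or tabs) marks a paragraph boundary.
    blocks = re.split(r"\n[ \t]*\n", text)
    paragraphs = []
    for block in blocks:
        # Join the wrapped lines of one paragraph with single spaces.
        joined = " ".join(line.strip() for line in block.split("\n"))
        joined = re.sub(r" {2,}", " ", joined).strip()
        if joined:
            paragraphs.append(joined)
    return "\n\n".join(paragraphs) + "\n"
```

Poetry, tables and tables of contents would be joined as well, so those sections still need a manual check.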
11-16-2008, 09:41 AM #4
Ralph Sir Edward
Gentleman & Cynic
Posts: 5,554
Karma: 13104900
Join Date: Jan 2008
Location: 5 generation native Texan
Device: BeBook/Openinkpot, CYbook 3rd gen awaiting RTF software upgrade
I can't help you in detail with a Mac, but what you need is a good hex editor. I use HexEdit 3.10 on a Windows machine. With a good hex editor, you can open the .txt file and see all the non-printing control characters. Then you can find/replace them with whatever you need, as a batch process. It may take 2 or 3 passes to process the file exactly the way you want it, but it works. And you can make changes this way in any plain-text file format (HTML, RTF, etc.).
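To get a hex-editor-style view of a file without installing anything, a few lines of Python will print each byte's offset, hex value and printable character, with control characters such as CR (`0d`) and LF (`0a`) shown as dots. This is a rough stand-in for a real hex editor like the one mentioned above, not a description of HexEdit itself; the name `hex_dump` is hypothetical.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex values, printable ASCII."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; control and non-ASCII bytes become ".".
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(rows)
```

Once the offending bytes are identified, the batch find/replace can be done on the raw bytes as well, for example `data.replace(b"\r\n", b"\n")`.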
11-24-2008, 01:13 PM #5
tompe
Grand Sorcerer
Posts: 7,019
Karma: 3896796
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Nexus 7, Nexus 5, iPad 2, Kindle PW
That was a new method for me. I would say that what you need is Emacs, but if you are not already an Emacs user, this solution probably has too steep a learning curve. The same could be said of my second default approach, which is to write a Perl script.
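The script approach works in any scripting language. As an illustration, here is the same idea in Python rather than Perl (function names are hypothetical): it cleans every `.txt` file in one folder and writes the results to another, so a whole batch is handled in one run. The `latin-1` input encoding is an assumption chosen because it decodes any byte sequence without errors; files saved as UTF-8 or MacRoman would need that argument changed.

```python
import re
from pathlib import Path

def normalize(text: str) -> str:
    """Unify line endings and leave exactly one blank line between paragraphs."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+\n", "\n", text)   # drop trailing spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)   # collapse extra blank lines
    return text.strip() + "\n"

def convert_folder(src: Path, dst: Path, encoding: str = "latin-1") -> int:
    """Clean every .txt file in src, write it to dst as UTF-8, and return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.txt")):
        text = path.read_text(encoding=encoding)
        (dst / path.name).write_text(normalize(text), encoding="utf-8")
        count += 1
    return count
```

Usage would be along the lines of `convert_folder(Path("books"), Path("cleaned"))`; the cleaned files would still need converting to PDB with whatever tool is already in use.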
11-25-2008, 06:21 PM #6
brewt
Boo-Frickety-Hoo-Erizer
Posts: 254
Karma: 686
Join Date: Oct 2007
Device: SONY PRS 350!
Get yer shoppin' shoes on.

Go get TextSpresso from http://www.taylor-design.com/textspresso/overview.htm

Yes, it's for PC or Mac, too. $25.

With this, you can batch convert to remove HTML garbage and bad line feeds, recombine sentences, and be left with reasonably formatted text. A small handful of button pushes and yer done.

-bjc
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.