
MobileRead Forums > E-Book Formats > Workshop

11-15-2008, 11:10 PM   #1
ficbot
Wizard
Posts: 2,409
Karma: 4132096
Join Date: Sep 2008
Device: Kindle Paperwhite/iOS Kindle App
Tell me there is an easier way!

I am looking for an easier way to convert a large batch of plain-text ebooks that my sister sent me. Many of the files are messy: some have far too many paragraph breaks, others have none at all, and there are other such issues. I was importing them into OpenOffice and going through them manually to remove the breaks (choosing the CF option on import, since some had no breaks at all), but I was still getting a lot of garbage in them once I converted them into PDB files to load on the iPod.

I finally figured out a way to get the books to appear satisfactorily in the finished file, but it is very labour-intensive. It uses NeoOffice and KompoZer, an HTML editor:

1) Open the file in NeoOffice and, if it gives the option, choose 'CF' only
2) Manually scan the document for large gaps and remove them
3) Save the file as a plain text document
4) Re-open the file
5) Select all and copy
6) Paste it into a new window in KompoZer
7) In KompoZer, select all and copy
8) Paste this into a new NeoOffice document
9) Save this as a Word file
10) Use a conversion program to convert the Word file to PDB

Isn't there an easier way? All I want is regular old text with one line break between paragraphs, nothing fancy. It seems, though, that depending on the program originally used to create the text file, tabs or special characters are used to mark the line breaks. I don't see them in NeoOffice, but they show up once the file is converted. The only way I have found to get "clean" text is to paste it into a web page editor, which generates proper paragraph breaks wherever the line breaks are; when I paste that back into NeoOffice, everything is fine. But this whole process can take upwards of 15 minutes per book!
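The invisible characters described above may simply be different line-ending conventions: CR alone (classic Mac OS), LF alone (Unix and Mac OS X) or CR+LF (Windows). As a minimal sketch, assuming Python 3 and a file read as raw bytes, the hypothetical helpers below count which convention a file uses and convert everything to plain LF:

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CR+LF (Windows), bare CR (classic Mac OS) and bare LF (Unix) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CR+LF pair also contains one CR and one LF, so subtract those.
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }


def normalize_line_endings(data: bytes) -> bytes:
    """Convert all line endings to LF; CR+LF must be replaced before bare CR."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


sample = b"First line\r\nSecond line\rThird line\n"
print(line_ending_counts(sample))      # {'CRLF': 1, 'CR': 1, 'LF': 1}
print(normalize_line_endings(sample))  # b'First line\nSecond line\nThird line\n'
```

A file that reports a mix of conventions is a good hint that it was edited in more than one program before it arrived.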

I am on a Mac and I don't have any Microsoft software on it. I have NeoOffice and Pages. I am willing to buy a new program if need be, but since I am on a Mac, I suspect my options may be limited.

Advice?
11-16-2008, 03:52 AM   #2
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Quote:
Originally Posted by ficbot
Isn't there an easier way? All I want is regular old text, one line break between paragraphs, nothing fancy.
You will not be able to avoid manual work entirely. That said, it is often quite possible to fix errors like the ones you describe using regular expressions. You will likely need a programmer's editor (such as PSPad) to make full use of them, though.

(It might be a good idea to do your editing in such an editor even if you decide learning regexps is too much work to be worth it: these editors tend to be much better at displaying special characters than your average office application.)
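As an illustration of the regular-expression approach, here is a minimal sketch in Python 3, whose `re` module stands in for an editor's regex find/replace. The function name `unwrap_paragraphs` is hypothetical, and it assumes paragraphs are separated by at least one blank line while single line breaks inside a paragraph are just hard wraps:

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines and leave exactly one blank line between paragraphs."""
    # Normalize Windows (CR+LF) and classic Mac (CR) line endings to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Paragraph break: two newlines with only spaces/tabs between them, plus any further blank lines.
    paragraphs = re.split(r"\n[ \t]*\n\s*", text.strip())
    cleaned = []
    for para in paragraphs:
        # Inside a paragraph, line breaks are just wrapping: collapse all whitespace to one space.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)


sample = "It was a dark\nand stormy night.\n\n\n\nThe rain fell\n  in torrents.\n"
print(unwrap_paragraphs(sample))
```

Files with one paragraph per line and no blank lines need the opposite rule (treat every newline as a paragraph break), so it is worth looking at a file before deciding which rule to apply.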
11-16-2008, 07:29 AM   #3
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
What is the original source of the material? If you download free books from sources like Project Gutenberg (PG), they should be reasonably well formatted.
11-16-2008, 09:41 AM   #4
Greg Anos
Grand Sorcerer
Posts: 11,248
Karma: 35000000
Join Date: Jan 2008
Device: Pocketbook
I can't help you in detail with a Mac, but what you need is a good hex editor. I use HexEdit 3.10 on a Windows machine. With a good hex editor, you can open the .txt file and see all the non-printing control characters. Then you can find them and replace them with whatever you need, as a batch process. It may take 2 or 3 passes to process the file exactly the way you want it, but it works. And you can make changes this way to any plain-text markup (HTML, RTF, etc.).
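For anyone without a hex editor, the same inspect-then-replace idea can be sketched in Python 3 by working on a file's raw bytes. This is not how HexEdit works internally; `control_byte_report` and `apply_passes` are hypothetical helpers, and the passes in the example (dropping form feeds and tabs, then converting CR+LF to LF) are only placeholders for whatever a given file needs:

```python
from collections import Counter


def control_byte_report(data: bytes) -> list:
    """List each non-printing ASCII byte as hex with its count, like scanning a file in a hex editor."""
    counts = Counter(b for b in data if b < 0x20 or b == 0x7F)
    return [(f"0x{b:02X}", n) for b, n in sorted(counts.items())]


def apply_passes(data: bytes, passes: list) -> bytes:
    """Run a series of byte-level find/replace passes, in order."""
    for old, new in passes:
        data = data.replace(old, new)
    return data


sample = b"Chapter 1\r\n\tIt was late.\x0c\r\n"
print(control_byte_report(sample))  # [('0x09', 1), ('0x0A', 2), ('0x0C', 1), ('0x0D', 2)]
# Placeholder passes: drop form feeds, drop tabs, then turn CR+LF into LF.
cleaned = apply_passes(sample, [(b"\x0c", b""), (b"\t", b""), (b"\r\n", b"\n")])
print(cleaned)  # b'Chapter 1\nIt was late.\n'
```

To use it on a real book, read the file in binary mode (`open(path, "rb").read()`), look at the report, choose the passes, and write the result back in binary mode.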
11-24-2008, 01:13 PM   #5
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Using a hex editor was a new method for me. I would say that what you need is Emacs, but if you are not already an Emacs user, this solution probably has too steep a learning curve. The same could be said of my second default approach, which is to write a Perl script.
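A batch script along those lines might look like this minimal sketch, written in Python 3 rather than Perl. `tidy` and `batch_tidy` are hypothetical names; the sketch assumes the books are `.txt` files in one folder, decodes them as UTF-8 (replacing undecodable bytes rather than failing), and writes cleaned copies to a separate folder so the originals stay untouched:

```python
import re
from pathlib import Path


def tidy(text: str) -> str:
    """Basic cleanup: LF line endings, no trailing spaces, at most one blank line in a row."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+\n", "\n", text)  # strip spaces/tabs at the end of each line
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines into a single blank line
    return text.strip() + "\n"


def batch_tidy(src_dir: str, out_dir: str) -> int:
    """Clean every .txt file in src_dir, write the results to out_dir, return the number processed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    processed = 0
    for path in sorted(Path(src_dir).glob("*.txt")):
        # Assumption: files are UTF-8; undecodable bytes become U+FFFD instead of raising an error.
        text = path.read_text(encoding="utf-8", errors="replace")
        (out / path.name).write_text(tidy(text), encoding="utf-8")
        processed += 1
    return processed
```

Usage would be something like `batch_tidy("books", "books-clean")`, where both folder names are placeholders. Files in an older encoding such as MacRoman would need a different `encoding` argument.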
11-25-2008, 06:21 PM   #6
brewt
Boo-Frickety-Hoo-Erizer
Posts: 251
Karma: 686
Join Date: Oct 2007
Device: Kobo Glo HD!
Get yer shoppin' shoes on.

Go get TextSpresso from http://www.taylor-design.com/textspresso/overview.htm

Yes, it's for PC or Mac, too. $25.

With this, you can batch convert files to remove HTML garbage and bad line feeds and to recombine sentences, leaving reasonably formatted text. A small handful of button pushes and yer done.

-bjc
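TextSpresso's internals aren't described here, but as a rough illustration of what stripping HTML garbage involves, this minimal sketch uses the `html.parser` module from Python 3's standard library. `TextExtractor` and `strip_html` are hypothetical names; the sketch assumes `<p>` starts a paragraph, `<br>` is a line break, and anything inside `<script>` or `<style>` should be dropped:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, mapping <p> to paragraph breaks and <br> to line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into plain characters
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line between paragraphs
    return text.strip()


html_doc = ("<html><head><style>p {color: red}</style></head><body>"
            "<p>Hello world</p><p>Line one<br>line two &amp; more</p></body></html>")
print(strip_html(html_doc))
```

One limitation: line breaks inside the HTML source are kept as they are, so hard-wrapped markup would still need an unwrapping step afterwards.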

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Easier German Novels | synlor | Reading Recommendations | 15 | 08-05-2013 01:27 AM
Easier way to use bookmarks in the K3 browser | AthenaAtDelphi | Amazon Kindle | 3 | 10-14-2010 10:54 PM
Which is easier on the eyes? | talaivan | Which one should I buy? | 8 | 11-05-2008 02:12 PM
Easier Navigation for Connect | jerryleejr | Sony Reader | 5 | 06-20-2008 05:15 PM
Easier on the hands and eyes? | AnnCook | Which one should I buy? | 17 | 05-25-2008 11:42 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.