01-05-2015, 12:01 PM   #2
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
Posts: 31,079
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by phossler
I had a terribly formatted book, so I had to go back to the raw TXT and try to start over.

That file was clean ASCII with CR-LF's separating paragraphs (#1 and #2)

I added it to Calibre and converted to epub using the default TXT input preferences. I didn't see any knobs to turn that would help.

The conversion made some H2 and H3 assumptions and divided the text in unexpected places. For example (#3), the TOC text from the TXT file was pretty much converted into the first 42 files, with 2 or 3 lines per file. The bulk of the text was in the last 4 files. Since the TOC appeared twice in the ASCII text file, it was converted twice, driving up the number of 2-line files.

Q1 - Why were so many 2- or 3-line files created? What conversion logic decides what becomes an H2 or H3 and what gets its own file?

Q2 - Is there a better way to add and convert TXT files?

Q3 - RegEx will clean or fix a lot. For example

<p>6</p> into <h1>Chapter 6</h1>

but it can still be a lot of fiddly work. Are there any options or plugins that might help?

Thanks
Have you played with Preferences: Input Options: (TXT input): Structure? Auto is not always best.

Personally, I am in no big rush, so I use an EPUB editor (Sigil in my case, as I have dozens of saved searches) to do the various line 'Join' cleanups.
Then I do a spell-check pass to find gross damage (words split by a gap, leftover hyphens, lost hyphens).
And I merge the (section) files once the basics have been smoothed.
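
For anyone who would rather script the first pass than click through saved searches, here is a rough Python sketch of two of the cleanups discussed above: the `<p>6</p>` to `<h1>Chapter 6</h1>` replacement from the quoted post, and a simple line 'Join'. The function names and the join rule (previous paragraph ends in a lowercase letter, comma, or semicolon and the next one starts lowercase) are my own assumptions, not anything calibre or Sigil does internally, so check the results by eye.

```python
import re

# Hypothetical helpers for illustration; not part of calibre or Sigil.

# A paragraph that contains nothing but a number, e.g. <p>6</p>.
BARE_NUMBER_PARA = re.compile(r"<p>\s*(\d+)\s*</p>")

# Assumed heuristic for a wrongly split paragraph: the first one ends with a
# lowercase letter, comma, or semicolon, and the next one starts lowercase.
SPLIT_PARA = re.compile(r"([a-z,;])\s*</p>\s*<p>\s*([a-z])")


def promote_chapter_numbers(html: str) -> str:
    """Turn bare-number paragraphs into chapter headings."""
    return BARE_NUMBER_PARA.sub(r"<h1>Chapter \1</h1>", html)


def join_split_paragraphs(html: str) -> str:
    """Merge paragraph pairs that look like one sentence broken in two."""
    return SPLIT_PARA.sub(r"\1 \2", html)


if __name__ == "__main__":
    sample = "<p>6</p>\n<p>The cat sat on the</p>\n<p>mat.</p>\n<p>Next one.</p>"
    print(join_split_paragraphs(promote_chapter_numbers(sample)))
```

The same two patterns should also work as search-and-replace expressions in an editor's regex mode, which is closer to the saved-search approach; the script only helps if you have a lot of files to get through.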