01-05-2015, 10:35 AM   #1
phossler
Wizard
Posts: 1,087
Karma: 447222
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
Best way for .TXT to be edited?

I had a badly formatted book, so I went back to the raw TXT file to try to start over.

That file was clean ASCII with CR-LFs separating paragraphs (screenshots #1 and #2).

I added it to Calibre and converted to epub using the default TXT input preferences. I didn't see any knobs to turn that would help.

Calibre made some H2 and H3 assumptions and divided the text in unexpected places. For example (#3), the TOC text from the TXT file was mostly converted into the first 42 files, with 2 or 3 lines per file. The bulk of the text ended up in the last 4 files. Since the TOC appeared twice in the ASCII text file, it was converted twice, driving up the number of 2-line files.

Q1 - Why were so many 2- or 3-line files created? What conversion logic decides where the H2s and H3s go and when to start a separate file?

Q2 - is there a better way to add and convert txt files?

Q3 - RegEx will clean or fix a lot. For example

<p>6</p> into <h1>Chapter 6</h1>

but it can still be a lot of fiddly work. Are there any options or plugins that might help?
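To show the kind of substitution I mean in Q3, here is a minimal Python sketch. The function name and the sample text are made up for illustration, and it assumes each chapter number sits alone in a plain <p> tag with no attributes; real converted files may well need a looser pattern.

```python
import re

# Matches a paragraph whose only content is a number, e.g. <p>6</p>.
# Assumption: the <p> tag carries no attributes such as class="...".
BARE_NUMBER_PARAGRAPH = re.compile(r"<p>\s*(\d+)\s*</p>")


def promote_chapter_numbers(html: str) -> str:
    """Turn <p>N</p> into <h1>Chapter N</h1>; leave other paragraphs alone."""
    return BARE_NUMBER_PARAGRAPH.sub(r"<h1>Chapter \1</h1>", html)


sample = "<p>6</p>\n<p>It was 6 in the morning.</p>"
print(promote_chapter_numbers(sample))
```

Only the paragraph that contains nothing but a number is promoted; the second paragraph keeps its "6" untouched because the pattern requires the digits to be the whole content of the tag.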

Thanks
Attached Thumbnails
1.JPG
2.JPG
3.JPG