Quote:
Originally Posted by Archon
I would like to learn more about cleaning up text files that need editing, e.g. removing page numbers, adding indents to paragraphs, and creating a template for all text files to be converted to.
What should I begin learning about to accomplish this?
I have started to read a little about regular expressions and Perl, and I can use other programs or the Terminal (Mac OS X) to convert files. I am a somewhat experienced computer user but am not a programmer by any means.
I was hoping a Guru could tell me what to focus on to get Calibre to clean these files as they are imported.
OR
Give me a shortlist of what to learn to create scripts or small programs (AppleScript or Perl?) that I could drop a TXT or RTF file on and have it cleaned up and converted.
Taking all helpful advice.
Archon
I take an alternate approach (total EPUB bias here).
I import format x into Calibre.
I fix my metadata first.
Then I convert to EPUB, getting the paragraphs detected properly, and I don't spend a lot of time fine-tuning the regex for that 'perfect' conversion.
(My experience: each document needs a slightly different approach. One size does not fit all.)
Then I use Sigil for the rest.
A "clean" book takes less than 5 minutes in Sigil/FlightCrew.
A messy one (Word-sourced?) can take 30 minutes to trim the gross cruft.
A really bad one (UC OCR?) might go to an hour-plus or get tossed.
Note: I run multiple monitors, so I can have both versions displayed at once for visual comparison. I also use a programmable keypad with frequently used keystroke patterns (Del Del space), so my right hand controls the mouse and my left punches a macro button, which reduces hand motion back and forth.
YMMV
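
If you still want the "drop a txt file on it" route from the original question, here is a rough sketch in Python (my choice for illustration; Perl would work just as well). It is not something Calibre or Sigil does for you, and the function name, the page-number pattern, and the four-space indent are all assumptions you would tune per document. It drops lines that look like bare page numbers, rejoins hard-wrapped lines, and indents each paragraph.

```python
import re

# Assumed pattern: a line holding only a number, optionally written as
# "Page 12" or "- 12 -". Real files vary, so expect to adjust this.
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?-?\s*\d{1,4}\s*-?\s*$", re.IGNORECASE)


def clean_text(raw: str, indent: str = "    ") -> str:
    """Drop page-number lines, unwrap paragraphs, and indent each one."""
    # Remove every line that looks like a standalone page number.
    lines = [ln for ln in raw.splitlines() if not PAGE_NUMBER.match(ln)]
    text = "\n".join(lines)

    # Treat one or more blank lines as a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for para in paragraphs:
        # Join hard-wrapped lines inside a paragraph into one line.
        joined = " ".join(part.strip() for part in para.splitlines() if part.strip())
        if joined:
            cleaned.append(indent + joined)

    return "\n\n".join(cleaned) + "\n"
```

To use it, read the file into a string, pass it to clean_text, and write the result back out. Two known weak spots: a legitimate line that is only a number (a year on its own line, say) gets deleted, and a page number surrounded by blank lines in the middle of a paragraph leaves that paragraph split in two. That is the same "each document needs a slightly different approach" problem as above.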