Thread: Perl processing
11-23-2007, 04:57 PM #2
maxk
Join Date: Oct 2007
Device: Sony PRS-505
Since no one else has replied, I thought I'd offer my beginner's input after messing around for a few hours.

I'm not sure what Office has to do with it. Do you mean using it as an automatic paragraph formatter by importing the text into it? Would OpenOffice help? It opens all the Office formats. I don't have Office either.

All the existing processors seem to do a reasonably good job for most generic cases. There are also a few modules on CPAN that can reformat messed-up text, which might help Book Designer import things better.

I've found it easier to just run the text file through a few sed filters rather than setting up a big Perl script. It's the same search/replace as in Perl, but you can get results faster if you don't need anything fancy. If you're already familiar with this, maybe it will help other Unix-type people who haven't realised how easy command-line text processing can be.

E.g., one HTML file wasn't importing into Book Designer properly because every line ended with "<space><br>". I fixed the hard returns by running:

Code:
cat file.txt | sed -e 's/ <br>$/ /g' | more
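To try the substitution safely before touching a real book, you can feed it a couple of made-up lines. The sample text and the file name sample.txt are invented for illustration. The trailing `g` flag from the command above is dropped here because it makes no difference: `$` anchors the pattern to the end of the line, so it can match at most once per line.

```shell
#!/bin/sh
# Two invented sample lines, each ending in "<space><br>" like the broken file.
printf 'It was a dark <br>\nand stormy night. <br>\n' > sample.txt

# Replace the trailing " <br>" with a single space; lines that don't end
# that way pass through unchanged.
sed -e 's/ <br>$/ /' sample.txt
```

Each output line keeps a trailing space where the `<br>` used to be, so the words still separate when Book Designer rejoins the lines.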
Then check the output. If it needs a few more sed replacements, add them to the command:

Code:
cat file.txt |\
      sed -e 's/ <br>$/ /g' | \
      sed -e 's/-<br>$/- /g' | \
      more
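Instead of chaining one sed per pattern, sed also accepts several `-e` expressions in a single invocation and applies them in order to each line, so one process does the work of the whole chain. A minimal sketch on invented sample lines (the file name sample2.txt is hypothetical):

```shell
#!/bin/sh
# Invented sample: one line ends in " <br>", another in "-<br>".
printf 'The quick <br>\nbrown fox -<br>\njumps.\n' > sample2.txt

# Both substitutions run in order on every line, equivalent to piping
# the text through two separate sed commands.
sed -e 's/ <br>$/ /' \
    -e 's/-<br>$/- /' \
    sample2.txt
```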
... then eventually redirect the output to a new file (> newfile.txt).

Code:
cat file.txt |\
      sed -e 's/ <br>$/ /g' | \
      sed -e 's/-<br>$/- /g' > newfile.txt
The sed pipeline can be run through Cygwin or any *nix shell. You can put it in a shell script, replacing file.txt with $1 and newfile.txt with $2, for bulk processing; you'll then also have a file of commonly used replace patterns handy for future conversions.
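A minimal sketch of that idea, written as a shell function so it can be pasted into a script, with $1 as the input file and $2 as the output file. The name `fixbr`, the bulk-processing loop, and the `.clean` output suffix are hypothetical; add further patterns as extra `-e` lines.

```shell
#!/bin/sh
# fixbr (hypothetical name): clean up broken line endings before
# importing into Book Designer. $1 = input file, $2 = output file.
fixbr() {
    # Collect commonly used replace patterns here, one -e per pattern.
    sed -e 's/ <br>$/ /' \
        -e 's/-<br>$/- /' \
        "$1" > "$2"
}

# Bulk processing: clean every .txt file in the current directory,
# writing each result next to it as name.clean.
for f in *.txt; do
    [ -e "$f" ] || continue   # no .txt files: the glob stays unexpanded
    fixbr "$f" "${f%.txt}.clean"
done
```

Writing to a separate `.clean` file leaves the originals untouched, so a bad pattern costs nothing but a re-run.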

I can't see how to automate these "special cases" of broken text files; the problems are too specific. But once the text is clean enough, Book Designer will import it and do its magic amazingly well. Once the text is in Book Designer, its excellent regex search/replace can help with the rough edges.

Last edited by maxk; 11-23-2007 at 05:02 PM.