Old 11-20-2007, 11:13 AM   #1
alexxxm
Addict
alexxxm has a complete set of Star Wars action figures.
 
Posts: 205
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
Perl processing

Since I don't own MS Office, I'm out of luck using BookDesigner.

I was wondering whether any Perl lovers read this forum and could share ideas on basic formatting of txt files with Perl one-liners, e.g.

s/[[:^print:]]//g to eliminate nonprinting chars

... it's all about not reinventing the wheel each time!
For example: how should accented characters be handled? Which punctuation should be accepted? And so on...
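
One possible answer to the accented-characters question, as a rough sketch rather than a definitive recipe: delete only the ASCII control bytes and leave everything else alone. The function name strip_control is made up for illustration, and the sketch assumes a POSIX shell with tr. Because every byte of an accented letter in UTF-8 or Latin-1 is 0x80 or above, such letters pass through untouched.

```shell
# Hypothetical helper: read text on stdin, write cleaned text on stdout.
# Deletes ASCII control bytes (octal 000-010, 013, 014, 016-037, 177)
# but keeps tab (011), newline (012) and carriage return (015).
# Bytes >= 0x80 are not in the set, so accented characters survive.
strip_control() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037\177'
}
```

Usage would be something like `strip_control < book.txt > book_clean.txt` (file names are placeholders). Note that a bare `s/[[:^print:]]//g` under `perl -pe` also matches the newline at the end of each line, and on undecoded input it treats the bytes of accented letters as nonprinting too. A Perl variant along the lines of `perl -CSD -lpe 's/[^\p{Print}\t]//g'` should avoid both problems for UTF-8 input.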

Alessandro
Old 11-23-2007, 05:57 PM   #2
maxk
Enthusiast
maxk began at the beginning.
 
Posts: 36
Karma: 14
Join Date: Oct 2007
Device: Sony PRS-505
Since no one else replied, I thought I'd offer my beginner's input after messing around for a few hours.

Not sure what Office has to do with it. Do you mean using it as an automatic paragraph formatter by importing into it? Would OpenOffice help? It opens all the Office formats. I don't have Office either.

All the existing processors seem to do a reasonably good job for most generic cases. There are a few modules on CPAN that can reformat messed-up text, which might help Book Designer import things better.

I've found it easier to just run the text file through a few sed filters rather than setting up a big Perl script. It's the same search/replace as Perl, but you get results faster if you don't need anything fancy. If you're already familiar with this, maybe it will still help other Unix-type people who haven't realised how easy command-line text processing is.

E.g. I fixed hard returns in one HTML file that wasn't importing into Book Designer properly because it had "<space><br>" at the end of every line, by running

Code:
cat file.txt | sed -e 's/ <br>$/ /g' | more
Then check the output. If it needs a few more sed replacements, add them to the command:

Code:
cat file.txt |\
      sed -e 's/ <br>$/ /g' | \
      sed -e 's/-<br>$/- /g' | \
      more
... then eventually redirect the output to a new file (> newfile.txt).

Code:
cat file.txt |\
      sed -e 's/ <br>$/ /g' | \
      sed -e 's/-<br>$/- /g' \
      >newfile.txt
This can be done through Cygwin or a *nix shell. You can put it in a shell script and swap file.txt with $1 and newfile.txt with $2 for bulk processing. Over time the script becomes a handy collection of commonly used replace patterns for future conversions.
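
A minimal sketch of that script idea, assuming a POSIX shell with sed available. The function name clean_text and the file name clean.sh are made up for illustration; the two patterns are the ones from the commands above, passed as two -e expressions to a single sed call instead of two piped ones.

```shell
#!/bin/sh
# clean_text IN OUT: apply the collected sed replacements to IN, write OUT.
# (Hypothetical name; the patterns are the two from this post.)
# IN and OUT must be different files, because OUT is truncated first.
clean_text() {
    sed -e 's/ <br>$/ /' \
        -e 's/-<br>$/- /' \
        "$1" > "$2"
}

# When saved as a script (say clean.sh), forward the two arguments:
#   sh clean.sh file.txt newfile.txt
if [ "$#" -eq 2 ]; then
    clean_text "$1" "$2"
fi
```

For bulk processing, something like `for f in *.txt; do sh clean.sh "$f" "clean_$f"; done` would run every file through the same patterns. New replacements go in as extra -e lines.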

I can't see how to automate these "special cases" of broken text files; the problems are too specific. But once the text is clean enough, Book Designer will import it and do its magic amazingly well. Once it's in Book Designer, its regex search/replace can help with the rough edges.

Last edited by maxk; 11-23-2007 at 06:02 PM.
Old 11-26-2007, 07:05 AM   #3
alexxxm
Quote:
Originally Posted by maxk
Since no one else replied, I thought I'd offer my beginner's input after messing around for a few hours.
Thanks a lot, much appreciated.

Quote:
Originally Posted by maxk
Not sure what Office has to do with it? You mean as an automatic paragraph formatter by importing into it? Would OpenOffice help? That opens all the office formats. I don't have Office either.
I believed BookDesigner could not be used without MS Office installed. I'll check.


Quote:
Originally Posted by maxk
I've found it easier to just run the text file through a few sed filters rather than setting up a big perl script, it's the same search/replace as perl but you can get some faster results if you don't need anything fancy.
That's the same way I use Perl one-liners.
I'm no sed expert, but I'll give it a try, thanks!
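
For comparison, here is a small sketch of the two sed filters from the post above next to what should be an equivalent Perl one-liner. The function names sed_fix and perl_fix are made up for illustration; both read stdin and write stdout, and the sketch assumes sed and perl are on the PATH.

```shell
# The two replacements as a single sed call (one -e per pattern).
sed_fix() {
    sed -e 's/ <br>$/ /' -e 's/-<br>$/- /'
}

# The same two substitutions as a Perl one-liner.
# -p loops over the input lines and prints each one after the code runs;
# in Perl, $ also matches just before the trailing newline, so the
# newline itself is kept.
perl_fix() {
    perl -pe 's/ <br>$/ /; s/-<br>$/- /'
}
```

Either one drops into a pipeline the same way, e.g. `perl_fix < file.txt | more`.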

Alessandro
Old 11-26-2007, 07:13 AM   #4
alexxxm
Specifically, I get the error:
<Cannot convert to rtf format, Check your MS Word installation>

This happens even when I follow the advice at http://www.mobileread.com/forums/sho...&postcount=199
Alessandro
All times are GMT -4.