MobileRead Forums - View Single Post

SBT · 12-10-2011, 01:32 PM

@opitz: Thanks for your kind words; any kind of feedback is welcome.

When you've finished proofreading in LibreOffice, or just want to return to editing in a plain text editor, you can use the following function, which is the reverse of makeproofread (which I think I'll rename txt2proof, for consistency).

I thought it would be a good idea to read input from STDIN if no filename is given; I'll probably add that functionality to all the functions where appropriate.
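A minimal sketch of that STDIN fallback, as it might look in any other function. The function name `count_words` is hypothetical (it is not one of the posted functions). It uses the `${1:-/dev/stdin}` parameter expansion, which substitutes `/dev/stdin` when the first argument is missing or empty, and it assumes a system that provides `/dev/stdin`, such as Linux.

```shell
# Hypothetical example function, not one of the posted ones.
# Usage: count_words [inputfile]   (reads STDIN if no inputfile is given)
count_words() {
  # Use the first argument if it is set and non-empty, else /dev/stdin.
  local inputfile=${1:-/dev/stdin}
  # Count the words in the chosen input.
  wc -w < "$inputfile"
}

# Both forms work:
#   count_words notes.txt
#   cat notes.txt | count_words
```

Note that some `wc` implementations pad the number with spaces, so strip them if you compare the result in a script.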

Code:

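function proof2txt {
# Usage: proof2txt [inputfile.html]
# If no inputfile is given, input is read from STDIN.
# Output is written to STDOUT.
local inputfile=${1:-/dev/stdin}
# Handle text marked as italic/bold.
# LibreOffice inserts </I> and <I> (ditto for bold) at the end and beginning
# of italic sections that span several lines; these pairs are joined first.
# sed -n slurps the whole file into the hold space and prints it once at the
# last line (without -n, every line would also be echoed as it is read, and
# the joined text printed twice).
# The remaining <I>, </I>, <B>, </B> tags then have their enclosing < and >
# replaced by &lt; and &gt;, so lynx shows them as literal text.
# lynx renders the HTML as plain text, and grep drops lines starting with a
# three-digit image reference such as "[001.jpg]".
sed -n '1h;1!H;${g;s/<\/I>\n<I>/\n/g;s/<\/B>\n<B>/\n/g;p;}' "$inputfile" |\
sed 's/<\(\/\?[BI]\)>/\&lt;\1\&gt;/g' |\
lynx -dump -stdin |\
grep -v "^   \[[0-9]\{3,3\}\.jpg\]"
}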
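To see what the two sed stages do before lynx gets the text, here is a small sketch that runs only those stages on a made-up two-line sample. The function name `proof2txt_tags` is hypothetical (it is just the first half of the pipeline, without lynx and grep), and it assumes GNU sed, since `\?` in the pattern and `\n` in the replacement are GNU extensions.

```shell
# Hypothetical helper: the two sed stages of the pipeline, reading STDIN.
proof2txt_tags() {
  # Stage 1: slurp all lines into the hold space; at the last line, turn
  # "</I>\n<I>" and "</B>\n<B>" into a plain newline, then print once.
  sed -n '1h;1!H;${g;s/<\/I>\n<I>/\n/g;s/<\/B>\n<B>/\n/g;p;}' |
  # Stage 2: escape the remaining <I>, </I>, <B>, </B> as &lt;..&gt;.
  sed 's/<\(\/\?[BI]\)>/\&lt;\1\&gt;/g'
}

# An italic section split across two lines, as LibreOffice writes it.
printf '%s\n' '<P><I>first line</I>' '<I>second line</I></P>' | proof2txt_tags
# prints:
# <P>&lt;I&gt;first line
# second line&lt;/I&gt;</P>
```

The italic section ends up with a single opening and a single closing marker, which lynx then passes through as the literal text `<I>` and `</I>`.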

Last edited by SBT; 12-10-2011 at 01:36 PM.