Hey!! The file://.+\) pattern worked a treat, thanks!
Right now I use a PDF-to-text tool to convert the PDF to text,
and then I use Notepad++ to do all of the following:
first I run the file://.+\:53 pattern (this one changes from document to document),
then I replace \r\n\r\n with a unique word,
then I eliminate all \r\n,
and then I convert the unique word back to \r\n\r\n.
This gets rid of the sentence \r\n but KEEPS the paragraph \r\n.
Alas, SOME books "lose" the double \r\n when converted.
GRRRRR :-)
Maybe adding in a conversion for .\r\n to \r\n\r\n might work. (Yep, it worked.)
Basically you have to look over the document and do all kinds of custom replace strings to get it to look right.
I wish there was an INTELLIGENT converter that would retain the critical formatting for readability and get rid of the "junk" that is not RELEVANT on an ereader: footers, headers,
extra spacing which is just "lost" formatting from the conversion process, etc.