02-27-2010, 11:39 PM   #6
nerys
Hey!! The file://.+\) pattern worked a treat, thanks!

Right now I use a PDF-to-text tool to convert the PDF to text.

Then I use Notepad++ to do all of the following:

file://.+\:53 (this pattern changes from document to document)

Then I replace \r\n\r\n with a unique word.

I then eliminate all remaining \r\n.

And then I convert the unique word back to \r\n\r\n.

This gets rid of the \r\n inside sentences but KEEPS the \r\n\r\n between paragraphs.
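
For anyone who would rather script it than click through Notepad++, the three replace steps above might look something like this in Python. This is only a rough sketch: the function name and the PLACEHOLDER value are made up for illustration, and it assumes the text still has Windows-style \r\n line endings. One deliberate difference: it swaps each single \r\n for a space instead of deleting it outright, so words at the end of one line do not fuse with the next.

```python
# Hypothetical "unique word": any string that never occurs in the book will do.
PLACEHOLDER = "@@PARAGRAPH@@"


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Step 1: protect the paragraph breaks (double line break).
    text = text.replace("\r\n\r\n", PLACEHOLDER)
    # Step 2: remove the remaining single line breaks.
    # Assumption: a space is inserted so adjacent words stay separated.
    text = text.replace("\r\n", " ")
    # Step 3: put the paragraph breaks back.
    return text.replace(PLACEHOLDER, "\r\n\r\n")


if __name__ == "__main__":
    sample = (
        "It was a dark\r\nand stormy night.\r\n\r\n"
        "The rain fell\r\nin torrents."
    )
    print(unwrap_paragraphs(sample))
```

If the converter left trailing spaces at the end of each line, this will produce double spaces, which would need one more replace pass.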

Alas, SOME books "lose" the double \r\n when converted.

GRRRRR :-)

Maybe adding a conversion from .\r\n to .\r\n\r\n, done before the other steps, might work. (Yep, it worked.)
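
A scripted version of that period fix could look roughly like the following. Again just a sketch under assumptions: the function name is invented, the text is assumed to use \r\n line endings, and the period is kept in the replacement. It treats any line that ends in a period and is followed by a single line break as the end of a paragraph, which will guess wrong whenever a wrapped line happens to end a sentence mid-paragraph.

```python
import re


def restore_paragraph_breaks(text: str) -> str:
    """Turn 'period + single line break' into a paragraph break."""
    # \.        a literal period (escaped, since a bare . matches any character)
    # \r\n      the line break right after it
    # (?!\r\n)  skip places that already have a double line break
    return re.sub(r"\.\r\n(?!\r\n)", ".\r\n\r\n", text)


if __name__ == "__main__":
    sample = (
        "First paragraph ends here.\r\n"
        "Second paragraph starts\r\nand continues.\r\n\r\n"
        "Third."
    )
    print(restore_paragraph_breaks(sample))
```

This has to run before the single line breaks are removed, otherwise there is nothing left for the pattern to match.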

Basically you have to look over the document and do all kinds of custom replace strings to get it to look right.

I wish there were an INTELLIGENT converter that would retain the formatting that is critical for readability and get rid of the "junk" that is not RELEVANT on an ereader: footers, headers, and the extra spacing that is just formatting "lost" in the conversion process.