One thing to double-check: all too often, the OCR text layer used for search has never been proofread, and its quality can be absolutely atrocious. What I tend to do now is start by extracting the images from the PDF, cleaning them up, and then OCRing them myself. OTOH, this is often a case of the game not being worth the candle: too much effort for too little return.
On a brighter note, if you look for messages by Tex2002ans, you will find much help.
See this recent thread for instance:
From print to ePub - how I did it.
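
For anyone who wants to try the extract, clean up, OCR route mentioned at the top, here is a rough sketch of how it could be scripted. It is only an illustration, not a tested workflow: it assumes the command-line tools `pdfimages` (from poppler-utils), ImageMagick's `convert`, and `tesseract` are installed and on your PATH. The function names, the `raw`/`clean` folder layout, and the 40% deskew threshold are my own placeholders, and real scans usually need more cleanup than a grayscale-and-deskew pass.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def extract_cmd(pdf: Path, out_dir: Path) -> List[str]:
    # pdfimages writes the embedded images as <prefix>-NNN.png when given -png.
    return ["pdfimages", "-png", str(pdf), str(out_dir / "page")]


def cleanup_cmd(src: Path, dst: Path) -> List[str]:
    # Assumed minimal cleanup: convert to grayscale and straighten the page.
    return ["convert", str(src), "-colorspace", "Gray", "-deskew", "40%", str(dst)]


def ocr_cmd(image: Path, lang: str = "eng") -> List[str]:
    # tesseract takes an output base name and writes <base>.txt next to it.
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def run_pipeline(pdf: Path, work_dir: Path, lang: str = "eng") -> List[Path]:
    """Extract images, clean each one, OCR it, and return the .txt paths."""
    for tool in ("pdfimages", "convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    raw = work_dir / "raw"
    clean = work_dir / "clean"
    raw.mkdir(parents=True, exist_ok=True)
    clean.mkdir(parents=True, exist_ok=True)

    subprocess.run(extract_cmd(pdf, raw), check=True)

    texts = []
    for img in sorted(raw.glob("page-*.png")):
        cleaned = clean / img.name
        subprocess.run(cleanup_cmd(img, cleaned), check=True)
        subprocess.run(ocr_cmd(cleaned, lang), check=True)
        texts.append(cleaned.with_suffix(".txt"))
    return texts
```

You would call it along the lines of `run_pipeline(Path("book.pdf"), Path("work"))` and then proofread the resulting text files. The OCR output still needs a human pass, which is where most of the effort goes.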