11-01-2011, 02:56 PM   #8
MacEvansCB
Device: Nook Color
Get the new PDF converter finished!!!!!!

dwanthny ... I HAVE read the PDF sticky. Several times.

This is *NOT* a ligature problem. There are no problems at all with words that have any of the 'f' ligatures in them. And this is not a problem with an 'LL' ligature disappearing or turning into some other character.

I can copy/paste an offending paragraph from the PDF into TextEdit and all the double L's paste just fine AND show as two distinct characters. Also, when I move the cursor through the words in the PDF, the L's show as two separate characters. If a ligature is a single character, is it displayed as one character or as two separate characters in a PDF? And do ligatures copy/paste as two separate characters?
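One way to make the copy/paste check less of an eyeball test is to scan the pasted text for actual ligature code points. This is only a rough sketch: it assumes the paragraph has been pasted into a Python string, and the function name `find_ligatures` is made up for illustration. It looks only at the Unicode Latin ligature range U+FB00 to U+FB06 (ff, fi, fl, ffi, ffl, long s-t, st), which contains no "ll" ligature at all.

```python
import unicodedata


def find_ligatures(text):
    """Return (index, character, Unicode name) for each Latin ligature found.

    find_ligatures is a hypothetical helper name, not part of Calibre.
    """
    hits = []
    for index, char in enumerate(text):
        # U+FB00..U+FB06 holds the Latin ligatures: ff, fi, fl, ffi, ffl,
        # long s-t and st. There is no "ll" entry in this range.
        if "\ufb00" <= char <= "\ufb06":
            hits.append((index, char, unicodedata.name(char)))
    return hits
```

An empty result for a pasted paragraph would mean the text layer holds ordinary separate letters there, which fits with this not being a ligature problem.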

And please note that this happens ONLY, and ALWAYS, in conversions from a SANS SERIF font. Conversions from a SERIF font do not have this problem AT ALL. So do people generating PDFs use ligatures only with Helvetica or Arial, but not with Times?

Doing a word-by-word search/replace is practically impossible, given the gigantic number of different words with both single L's and double L's. I have started cleaning one file with two search/replace passes: "l " -> "ll" for words with an embedded "ll", and "l " -> "ll " for words with "ll" at the end of the word. But both have to be run as "find next" followed by either "replace" or "ignore", so each hit has to be decided on individually. This is ridiculous in a 200,000 word document.
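The hit-by-hit decision could in principle be handed to a word list. The following is a minimal sketch, not a Calibre feature: it assumes the damage is always "ll" coming out as "l " (as the two search patterns above imply), and the names `repair_double_l`, `DAMAGED` and the `words` argument are all made up for illustration. It only changes a hit when exactly one of the two repairs produces a known word, and it collects everything else for manual review.

```python
import re

# A word fragment ending in "l", then one space. The lookahead captures the
# letters that follow without consuming them, so the next fragment can still
# be matched on its own.
DAMAGED = re.compile(r"\b([A-Za-z]*l) (?=([A-Za-z]*))")


def repair_double_l(text, words):
    """Return (repaired_text, fragments_left_for_manual_review).

    words is a set of lowercase known words (hypothetical input; a real run
    would load a full dictionary file).
    """
    unresolved = []

    def fix(match):
        head, tail = match.group(1), match.group(2)
        if head.lower() in words:
            return match.group(0)  # already a real word, e.g. "cool "
        joined_ok = bool(tail) and (head + "l" + tail).lower() in words
        ended_ok = (head + "l").lower() in words
        if joined_ok and not ended_ok:
            return head + "l"      # embedded "ll": drop the stray space
        if ended_ok and not joined_ok:
            return head + "l "     # "ll" at end of word: keep the space
        unresolved.append(head + " " + tail)  # ambiguous or unknown
        return match.group(0)

    return DAMAGED.sub(fix, text), unresolved
```

With a decent word list this should settle the obvious cases automatically and leave only the ambiguous ones for the "find next" routine.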

I guess I should just try copy/paste as it seems to work just as messily as Calibre conversion does.

I don't have a choice here most of the time ... I HAVE to work from PDF originals.

It would be REALLY REALLY WONDERFUL if the new PDF engine were given more priority.

Between problems like this and problems with wrap/unwrap, I have to spend WAY TOO MUCH time scrubbing through PDF conversions.