04-29-2011, 09:43 PM   #5
ldolse
Generally speaking, that readme is not quite correct: a lot of preprocessing happens at the input stage, and Heuristics has normally already been run on that output. For PDF and MOBI, however, there is also an earlier debug file in the input directory that shows the actual output of the input plugin.

Heuristics does have a small list of words, and 'Prelude' isn't among them, though it could still be caught by one of the other heuristics patterns, such as all-uppercase text. The fact that the second conversion is catching it suggests it's being picked up at some point.

You could also try simplifying the XPath to just use 'h2': click the magic wand next to the XPath and type h2 in the first box.
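
If it helps to see what an 'h2'-only expression would select, here is a rough sketch. As far as I know the wizard turns that into something like //h:h2, where h stands for the XHTML namespace, but treat that as an assumption. The sample markup and the find_h2_headings helper are made up for illustration and use Python's standard library, not calibre's own code:

```python
import xml.etree.ElementTree as ET

# XHTML namespace; the "h" prefix below is assumed to map to this.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample standing in for a converted book's markup.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h2>Prelude</h2>
    <p>Some text.</p>
    <h2>Chapter One</h2>
    <p>More text.</p>
    <h3>A subsection</h3>
  </body>
</html>"""


def find_h2_headings(xhtml):
    """Return the text of every h2 element, in document order."""
    root = ET.fromstring(xhtml)
    matches = root.findall(".//h:h2", namespaces={"h": XHTML_NS})
    # itertext() also collects text inside nested tags such as <span>.
    return ["".join(el.itertext()).strip() for el in matches]


print(find_h2_headings(SAMPLE))
```

Every h2 is matched regardless of its wording, so a heading like 'Prelude' no longer depends on any word list. The h3 is left alone.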