08-19-2010, 11:26 PM   #11
ldolse
Wizard
Device: PRS-650, iPhone
This was done for PDF a while back, via the 'Preprocess input file to possibly improve structure detection' option. I just double-checked, and it still appears to work well in the latest version of Calibre. Is that function still in preprocess.py?

The regular expressions used for PDF don't apply to LIT/TXT files, but couldn't new regexes be defined for each input format and then selected automatically, based on the input format, when that box is checked? I've been running into more LIT/TXT files with this problem lately, so if this is a good approach I could try to get it working.
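
As a rough sketch of the per-format idea: the rule table, the function name `preprocess`, and the patterns themselves are all hypothetical and only illustrate the dispatch. They are not Calibre's actual API or its PDF rules, and real LIT/TXT rules would need tuning against real files.

```python
import re

# Hypothetical per-format rules: each entry is (compiled pattern, replacement).
# Illustrative only; these are not the rules Calibre uses.
PREPROCESS_RULES = {
    "txt": [
        # Join a line ending in a lowercase letter or comma to the next
        # line when that line starts with a lowercase letter.
        (re.compile(r"(?<=[a-z,])\n(?=[a-z])"), " "),
    ],
    "lit": [
        # Merge a paragraph break that falls in the middle of a sentence.
        (re.compile(r"(?<=[a-z,])\s*</p>\s*<p[^>]*>\s*(?=[a-z])"), " "),
    ],
}


def preprocess(text, input_format, enabled=True):
    """Apply the rules registered for input_format if the option is checked."""
    if not enabled:
        return text
    # Formats with no registered rules pass through unchanged.
    for pattern, replacement in PREPROCESS_RULES.get(input_format.lower(), []):
        text = pattern.sub(replacement, text)
    return text
```

With a table like this, supporting another input format would only mean adding another key, and the checkbox would map onto the `enabled` argument.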

There used to be an option to tweak the average line length detection logic, but that appears to have been removed (from the GUI at least).
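
For illustration, a tunable line length check could look something like the sketch below. It is an assumption about the general technique, not Calibre's implementation: it uses the median line length as the "typical" length, and `unwrap` and its `factor` parameter are made-up names standing in for the tweakable option.

```python
import statistics


def typical_line_length(text):
    """Median length of the non-blank lines, or 0 if there are none."""
    lengths = [len(line) for line in text.splitlines() if line.strip()]
    return statistics.median(lengths) if lengths else 0


def unwrap(text, factor=0.8):
    """Join lines that look hard-wrapped.

    A line is treated as wrapped when it is at least `factor` times the
    typical line length and does not end in sentence punctuation.
    """
    threshold = typical_line_length(text) * factor
    out = []
    merge_next = False
    for line in text.splitlines():
        if merge_next and line.strip():
            # The previous line looked wrapped, so continue it here.
            out[-1] = out[-1].rstrip() + " " + line.lstrip()
        else:
            out.append(line)
        stripped = line.rstrip()
        merge_next = (
            bool(stripped)
            and len(stripped) >= threshold
            and not stripped.endswith((".", "!", "?", '"'))
        )
    return "\n".join(out)
```

Raising `factor` makes the check more conservative (fewer lines joined), and lowering it joins more aggressively.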