MobileRead Forums - View Single Post

speakingtohe · 04-24-2010, 07:57 PM

I am by no means an expert, although I can usually make it work.
When you click on Structure Detection and then on one of the magic wands (I just use the Header one), a dialog box comes up showing the raw input.
At the top is an example expression, which you have to modify.
The regex (the example expression) is made up of pattern-matching expressions and/or the literal characters you want to match. I believe there is a limit on its length, but I am not sure what it is.

To see how this works, copy a bit of text from the preview window, paste it into the Regex line, and click Test. This highlights the text you copied in light grey, which is hard to see, but if you click in the preview window it turns yellow.
If that line of text occurs more than once, you can scroll down and see it highlighted everywhere it occurs.
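For anyone who wants to try the same idea outside the dialog, here is a minimal sketch in plain Python, assuming the regex box behaves like Python's re module. The page_text sample and the "My Book Title" header are made up for illustration. re.escape is only there so the pasted text is treated as ordinary characters.

```python
import re

# Made-up page text; the repeated line stands in for a header.
page_text = (
    "My Book Title\n"
    "It was a dark night.\n"
    "My Book Title\n"
    "The rain kept falling.\n"
)

# Treat the pasted text as plain characters, not as a pattern.
pattern = re.compile(re.escape("My Book Title"))

# Every place the text occurs, as (start, end) positions.
matches = [m.span() for m in pattern.finditer(page_text)]
print(len(matches), "occurrences found")
```

Each entry in matches corresponds to one of the spots that would be highlighted when scrolling through the preview.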

I still don't totally understand the pattern matching, so I won't confuse you with my conceptions and misconceptions there.

The book I did had a footer or header that contained a web address surrounded by brackets (), etc. So I just matched a distinctive part of it and kept putting .'s before and after until it matched the whole thing. The . (period) matches any single character.
If you had a line that said (This page is printed), the brackets can't be entered in the ordinary way, because they have a special meaning in a regex. But .This page is printed. would match it. (Putting a backslash in front of each bracket, as in \( and \), also works.)
Not the most elegant solution, especially if you use 37 .'s, but quick and dirty is okay on occasion for me.
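Here is a small sketch of the bracket problem in plain Python, assuming the regex box follows Python's re rules. It tries the same line three ways: pasted as-is, with a . on each side, and with the brackets escaped by a backslash.

```python
import re

line = "(This page is printed)"

# Pasted as-is, the brackets become a regex group, so only the words
# inside are matched and the brackets themselves are left behind.
as_is = re.search(r"(This page is printed)", line)

# Quick and dirty: a . on each side stands in for the brackets.
dots = re.search(r".This page is printed.", line)

# Tidier: a backslash makes each bracket a literal character.
escaped = re.search(r"\(This page is printed\)", line)

for name, m in (("as-is", as_is), ("dots", dots), ("escaped", escaped)):
    print(name, "matched:", repr(m.group(0)))
```

The . version is looser than the escaped one, since a . will accept any character in that position, not just a bracket. For a distinctive header line that rarely matters.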

I don't think it matters whether it is a header or a footer in PDFs, or whether it is centered or not.
Case does matter: Chapter ...... is not the same as CHAPTER ......
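A quick sketch of the case issue in plain Python, with made-up chapter lines. The (?i) prefix is standard Python regex syntax for ignoring case. Whether the calibre box accepts it the same way is an assumption on my part, so test it in the dialog first.

```python
import re

# Made-up sample with three different capitalisations.
text = "Chapter One\nCHAPTER TWO\nchapter three\n"

# Plain pattern: only the exact capitalisation is matched.
exact = re.findall(r"Chapter", text)

# (?i) at the start tells the regex to ignore case.
any_case = re.findall(r"(?i)chapter", text)

print("exact:", exact)
print("ignoring case:", any_case)
```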

I am pretty new at the Python stuff myself and of an advanced age, so not real fast on the uptake, but it isn't impossible, just a bit daunting at times.

First step, IMO, is to type chapter (in the correct case) into the regex box, click in the preview window, and scroll down to see what is highlighted.

You probably know that for PDFs 0.04 is a good default value for line unwrapping.

Just remember, it is all easy once you have done it.
Helen
