#1
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2009
Device: Kindle 2

Removing ABBYY header in a PDF

I have a few PDF files that someone else converted using ABBYY PDF Transformer.

Each page has a graphic in both top corners. When I look at the page in Calibre's header wizard, the markup behind the graphic is this:

	<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a>
	<p> <a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>er</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>er</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>B</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>2</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>B</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>2</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>B</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>.0</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>B</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>.0</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>A</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>A</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>w . </b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>A B B YY.com</b></a></p>
	<p> <a href="http://www.abbyy.com/buy"><b>.A B BYY.com</b></a></p>

When converted to a MOBI file, I get a bunch of lines that start with "PDF Transform" followed by several letters, a couple of "Click here to buy", some more letters and then "A B B YY.com" and ".A B BYY.com".

On my Kindle 2, this takes up about 2.5 pages that I have to skip through every 3 or so pages, which is annoying. Can someone please tell me what I need to enter in the "Header Regular Expression" box? Thanks in advance!

#2
Banned
Posts: 82
Karma: 10
Join Date: Aug 2009
Device: Tolino Shine 3

I have the same issue, so if anybody has a fix I would be grateful as well.

#3
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505

Have you tried opening the HTML in Notepad and doing a global search/replace on the offending lines (with the replace field left blank)?
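For illustration, here is roughly what that Notepad approach amounts to, as a small Python sketch: one literal Replace All per distinct watermark link, each with an empty replacement. The fragment list is copied from the markup in post #1; `WATERMARK_LINK`, `FRAGMENTS` and `replace_all` are made-up names, and real files may contain variants that are not listed here.

```python
# One literal "Replace All" (empty replacement) per distinct watermark link.
# Fragments are copied from the markup quoted in post #1; your files may differ.
WATERMARK_LINK = '<a href="http://www.abbyy.com/buy"><b>{}</b></a>'
FRAGMENTS = [
    "PDF Transform", "Y", "er", "B", "2", ".0", "A",
    "Click here to buy", "w", "w . ", "A B B YY.com", ".A B BYY.com",
]
TARGETS = [WATERMARK_LINK.format(fragment) for fragment in FRAGMENTS]


def replace_all(html: str, targets: list) -> str:
    """Delete every literal occurrence of each target string, like Notepad's Replace All."""
    for target in targets:
        html = html.replace(target, "")
    return html


# A page shaped like the one in post #1, followed by some real text.
page = "".join(f"<p> {WATERMARK_LINK.format(f)}</p>" for f in FRAGMENTS)
page += "<p>Chapter 1</p>"
cleaned = replace_all(page, TARGETS)
print("abbyy" in cleaned)  # False: every watermark link is gone
```

The catch is visible in the list itself: a literal search needs one entry per distinct string, which is why a single regular expression is more attractive here.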
		 
		
	
		
		
		
		
		
		
		
		
		
		
	

#4
Banned
Posts: 82
Karma: 10
Join Date: Aug 2009
Device: Tolino Shine 3

That's what I have been doing until now, except with TextCrawler and a macro, but it would be easier if the junk could be removed on import, especially since I have to run the app in a VM.

#5
Enthusiast
Posts: 25
Karma: 16
Join Date: Aug 2009
Device: Pocketbook 360, Sony PRS-T1

	PDF Transform .+ \.com

PDF Transform = start of the text that should be removed
.+ = one or more characters
\.com = end of the text that should be removed (the backslash makes the dot a literal ".")
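A quick way to see why the pattern above may not fire is to try it against a sample shaped like the markup in post #1. This sketch assumes, and it is only an assumption, that the header regex is matched against that markup rather than against the plain text shown on the Kindle. In the markup, "PDF Transform" is followed by `</b>` rather than a space, and no ".com" is preceded by a space, so the literal spaces in the pattern prevent any match.

```python
import re

# Sample shaped like the markup quoted in post #1 (abbreviated), all on one line.
LINK = '<a href="http://www.abbyy.com/buy"><b>{}</b></a>'
markup = LINK.format("PDF Transform") + "".join(
    f"<p> {LINK.format(text)}</p>"
    for text in ["Y", "er", "Click here to buy", "A B B YY.com", ".A B BYY.com"]
)

suggested = r"PDF Transform .+ \.com"
print(re.search(suggested, markup))          # None
print(re.search(r"PDF Transform ", markup))  # None: the next character is "<", not a space
print(re.search(r" \.com", markup))          # None: ".com" never follows a space
print(re.search(r"PDF Transform.+\.com", markup) is not None)  # True once the spaces are dropped
```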

#6
Banned
Posts: 82
Karma: 10
Join Date: Aug 2009
Device: Tolino Shine 3

Interestingly, it didn't do anything for that, but changing it to PDF.+\.com does remove all the junk.

Cheers, must learn regex.
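Why PDF.+\.com works on this junk can be shown with a short sketch. It assumes the header markup sits on a single line of the text calibre searches and that the pattern is applied with Python's default `re` flags, under which `.` does not match a line break. `.+` is greedy, so the match runs from the first "PDF" to the last ".com" on that line; a lazy `.+?` would stop at the first ".com", which is inside the next link's `href`, and leave most of the junk behind. The sample text is made up.

```python
import re

# Abbreviated header shaped like post #1's markup, on one line, then a line of real text.
LINK = '<a href="http://www.abbyy.com/buy"><b>{}</b></a>'
header = LINK.format("PDF Transform") + "".join(
    f"<p> {LINK.format(text)}</p>" for text in ["Y", "Click here to buy", ".A B BYY.com"]
)
page = header + "\n<p>It was a dark and stormy night.</p>"

# Greedy: from the first "PDF" to the LAST ".com" on the same line.
greedy = re.sub(r"PDF.+\.com", "", page)
# Lazy: from "PDF" to the FIRST ".com", which sits in the next link's href.
lazy = re.sub(r"PDF.+?\.com", "", page)

print(greedy.splitlines()[0])       # <a href="http://www.abbyy.com/buy"><b></b></a></p>
print("Click here to buy" in lazy)  # True: the lazy version leaves junk behind
```

The same greediness is also the risk: if a line of real text happened to contain "PDF" and a later ".com", everything between them on that line would be removed too, so the wizard preview is worth checking.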

#7
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2009
Device: Kindle 2

Doesn't work when I put it in Calibre. The text is still there. How did you do it, HairyBiker?

#8
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2009
Device: Kindle 2

#9
Banned
Posts: 82
Karma: 10
Join Date: Aug 2009
Device: Tolino Shine 3

I just selected the PDF, chose Convert, then in Structure Detection clicked Remove Header and entered "PDF.+\.com" in the box in place of the default one. If you click on the wizard, it will show you what is being removed.
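The same conversion can in principle be scripted instead of clicked through. The sketch below only builds and runs an `ebook-convert` command. It assumes a calibre version with the Search & Replace options `--sr1-search` / `--sr1-replace`; my understanding is that newer releases moved the header regex there from Structure Detection, so check `ebook-convert --help` for your version. `build_convert_command` and `convert` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Pattern from post #9.
HEADER_REGEX = r"PDF.+\.com"


def build_convert_command(src: Path, dest: Path, regex: str = HEADER_REGEX) -> List[str]:
    """Build an ebook-convert call that deletes every match of `regex`.

    Assumes a calibre release with the Search & Replace options
    --sr1-search / --sr1-replace; check `ebook-convert --help` for yours.
    """
    return [
        "ebook-convert", str(src), str(dest),
        "--sr1-search", regex,
        "--sr1-replace", "",  # empty replacement: every match is deleted
    ]


def convert(src: Path, dest: Path, regex: str = HEADER_REGEX) -> None:
    """Run the conversion; raises if ebook-convert is missing or exits non-zero."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert is not on PATH")
    subprocess.run(build_convert_command(src, dest, regex), check=True)
```

For example, `convert(Path("book.pdf"), Path("book.mobi"))`; ebook-convert picks the output format from the file extension. Passing the arguments as a list also avoids shell-quoting trouble with the backslash in the pattern.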
		 
		
	
		
		
		
		
		
		
		
		
		
		
	

#10
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2009
Device: Kindle 2

Did all that. I still can't get rid of all of it.

#11
Banned
Posts: 82
Karma: 10
Join Date: Aug 2009
Device: Tolino Shine 3

Strange. Could you send me a copy of one that doesn't get removed?

#12
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505

If you have a large number of files that need identical editing, it might be more efficient to write a script that passes them through grep and then pipes the output to calibre's command-line converter.
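As a minimal sketch of the grep half of that idea, in Python rather than shell: drop every line that contains the watermark URL, the way `grep -v` would, and write the cleaned copies to a separate folder that can then be handed to calibre's converter. It assumes the watermark links sit on lines of their own in the extracted HTML, which may not hold for your files; `drop_marked_lines`, `clean_folder` and the marker string are illustrative choices, not anything from calibre.

```python
from pathlib import Path
from typing import List

# Assumed marker: every watermark fragment in post #1 links to this URL.
MARKER = "abbyy.com/buy"


def drop_marked_lines(text: str, marker: str = MARKER) -> str:
    """Keep only the lines that do not contain `marker` (like `grep -v marker`)."""
    kept = [line for line in text.splitlines(keepends=True) if marker not in line]
    return "".join(kept)


def clean_folder(src_dir: Path, out_dir: Path, pattern: str = "*.html") -> List[Path]:
    """Write a filtered copy of every matching file in src_dir into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob(pattern)):
        text = src.read_text(encoding="utf-8", errors="replace")
        dest = out_dir / src.name
        dest.write_text(drop_marked_lines(text), encoding="utf-8")
        written.append(dest)
    return written
```

The trade-off against the regex approach is that a line filter removes the whole line, so any real text sharing a line with the watermark goes too.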

#13
Banned
Posts: 82
Karma: 10
Join Date: Aug 2009
Device: Tolino Shine 3

If I were better at Linux command scripting, that is what I would do, but since I am still learning it ...

#14
Junior Member
Posts: 9
Karma: 10
Join Date: Oct 2009
Device: prs 505

Hi, I had a similar problem recently (see https://www.mobileread.com/forums/showthread.php?t=59282).

My problem was that the regex editor doesn't show you the text the regex acts on. Try something like: Code:

	(?ism)<a href="http://www.abbyy.com/buy"><b>(\w|\s)*</b></a>(<br>)?
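Testing that pattern against the fragments listed in post #1 suggests one gap, assuming calibre treats it as a Python regular expression: `(\w|\s)*` only allows letters, digits, underscores and whitespace, so the fragments containing a dot (".0", "w . ", "A B B YY.com", ".A B BYY.com") are left in place. The `VARIANT` pattern below, which allows any text except `<` inside the `<b>` element, is my own adjustment, not something from the thread; the sample wraps each fragment the way post #14's pattern expects.

```python
import re

# Pattern from post #14, unchanged.
POSTED = re.compile(r'(?ism)<a href="http://www.abbyy.com/buy"><b>(\w|\s)*</b></a>(<br>)?')
# Hypothetical variant: any run of non-"<" characters inside <b>, dots escaped in the URL.
VARIANT = re.compile(r'(?is)<a href="http://www\.abbyy\.com/buy"><b>[^<]*</b></a>(<br>)?')

# Distinct fragments from post #1.
FRAGMENTS = ["PDF Transform", "Y", "er", "B", "2", ".0", "A",
             "Click here to buy", "w", "w . ", "A B B YY.com", ".A B BYY.com"]
sample = "".join(f'<a href="http://www.abbyy.com/buy"><b>{f}</b></a><br>' for f in FRAGMENTS)


def leftovers(pattern: re.Pattern, html: str) -> list:
    """Return the <b> texts that survive after deleting every match of `pattern`."""
    return re.findall(r"<b>([^<]*)</b>", pattern.sub("", html))


print(leftovers(POSTED, sample))   # ['.0', 'w . ', 'A B B YY.com', '.A B BYY.com']
print(leftovers(VARIANT, sample))  # []
```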

#15
Zealot
Posts: 115
Karma: 150
Join Date: Jul 2008
Location: Netherlands Veenendaal
Device: Palm T5, Sony PRS-505, Nook Color

The problem I have right now is that it doesn't highlight the text the regexp works on. The regexp does work when I convert the PDF to EPUB. I wanted to show what is displayed and what I think should happen, but now I'm getting an error: Code:

	ERROR: ERROR: Unhandled exception: <b>WindowsError</b>:[Error 6] The handle is invalid
	Traceback (most recent call last):
	  File "site-packages\calibre\gui2\convert\regex_builder.py", line 101, in button_clicked
	  File "site-packages\calibre\gui2\convert\regex_builder.py", line 90, in open_book
	  File "site-packages\calibre\ebooks\oeb\iterator.py", line 141, in __enter__
	  File "site-packages\calibre\customize\conversion.py", line 208, in __call__
	  File "site-packages\calibre\ebooks\pdf\input.py", line 33, in convert
	  File "site-packages\calibre\ebooks\pdf\pdftohtml.py", line 49, in pdftohtml
	  File "subprocess.py", line 614, in __init__
	  File "subprocess.py", line 735, in _get_handles
	  File "subprocess.py", line 761, in _make_inheritable
	WindowsError: [Error 6] The handle is invalid

Regards,
Joop
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| removing unwanted pages ABBYY finereader | sovre | Workshop | 3 | 08-04-2011 04:05 AM | 
| Removing Header from .IMP | ronin688 | Fictionwise eBookwise | 2 | 12-12-2010 08:36 PM | 
| Removing a header | pckopp | Calibre | 1 | 12-11-2010 02:33 PM | 
| Removing header syntax. | boromirofborg | Calibre | 0 | 07-21-2010 01:33 AM | 
| PDF Conversion - Removing Header / Footer Text | heb | Sony Reader | 9 | 07-12-2010 12:02 AM |