10-11-2012, 02:34 PM | #1 |
Addict
Posts: 224
Karma: 10
Join Date: Jul 2012
Device: Kindle
|
Those pesky "file:///" headers/footers
Hola!
Regex/tag code noob here! I successfully implemented the "ABC Amber" solution, but I simply cannot find a solution in Calibre for these header/footer issues. The input file is a PDF. If I convert the file to HTMLZ and then look at the HTML source, the footer info looks like this (copied and pasted; I altered only the book title info to protect the innocent!):
Code:
<p class="calibre1">|/eMaaa/Inbbb/Chccc,%20C.J.%20-%20Tddd%20of%20Seee%20and%20Jfff,%20The%20v2.htm (2 of 230)15-8-2005 22:23:09</p>
Code:
file:///H|/eMaaa/Inbbb/Chccc,%20C.J.%20-%20Tddd%20of%20Seee%20and%20Jfff,%20The%20v2.htm (1 of 230)15-8-2005 22:23:09
Code:
file:///.+\d
Code:
<p.*?> and <b.*?>
I would like to keep any "normal" headers/footers, so 'cropping' or 'stripping' ALL headers/footers is a last resort. So either I am using the builder feature improperly, or my regex codes are missing the mark totally... or both, LoL.
Thanks for any pointers!
MontyJ

Update: I found a long way around most of the problem:
1. Add this code in the conversion SEARCH regex definition:
Code:
<p.*?>file:///\S\|.*?</p>
3. Then "Add book" and select the HTMLZ file
4. Select CONVERT to EPUB, MOBI, or whatever
The offending header/footer code is now gone. However, I still have a few leftover artifacts, like the occasional "|" character. An "empty string" is left over in the HTML file, and this may be the issue. This code DOES NOT WORK directly on EPUB or MOBI files as is, so I still need to figure that out as well to save that extra conversion step!
Last edited by MontyJ; 10-11-2012 at 11:50 PM. Reason: Update |
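Calibre's search and replace is based on Python regular expressions, so the expression from the update can be tried in plain Python before running a conversion. The sketch below is only an illustration under some assumptions: the second sample is the `file:///H|` line from the post wrapped in the same `<p class="calibre1">` tag (the post shows it without tags), the "Chapter One" paragraph is made up, and `SAFER_PATTERN`, `TAIL_PATTERN` and `strip_footers` are hypothetical names, not anything Calibre provides.

```python
import re

# Sample 1: copied from the post (paragraph that starts with "|", no "file:///").
truncated_footer = ('<p class="calibre1">|/eMaaa/Inbbb/Chccc,%20C.J.%20-%20Tddd'
                    '%20of%20Seee%20and%20Jfff,%20The%20v2.htm (2 of 230)15-8-2005 22:23:09</p>')

# Sample 2: the "file:///H|" line from the post; the <p> wrapper is an ASSUMPTION.
full_footer = ('<p class="calibre1">file:///H|/eMaaa/Inbbb/Chccc,%20C.J.%20-%20Tddd'
               '%20of%20Seee%20and%20Jfff,%20The%20v2.htm (1 of 230)15-8-2005 22:23:09</p>')

# Hypothetical ordinary paragraph that should survive.
normal_para = '<p class="calibre1">Chapter One</p>'

# The expression from the update, unchanged.
POST_PATTERN = re.compile(r'<p.*?>file:///\S\|.*?</p>')

# ASSUMED variant: [^>]* and [^<]* cannot run across tag boundaries,
# so a match cannot start in an earlier paragraph on the same line.
SAFER_PATTERN = re.compile(r'<p[^>]*>file:///\S\|[^<]*</p>')

# ASSUMED variant: match on the "(N of M)d-m-yyyy hh:mm:ss" tail instead,
# which also catches paragraphs that lost their "file:///H" prefix.
TAIL_PATTERN = re.compile(
    r'<p[^>]*>[^<]*\(\d+ of \d+\)\d{1,2}-\d{1,2}-\d{4} \d{2}:\d{2}:\d{2}</p>')


def strip_footers(html, pattern):
    """Replace every match of pattern with an empty string."""
    return pattern.sub('', html)


# Normal paragraph and footer on one source line.
same_line = normal_para + full_footer

print(repr(strip_footers(full_footer, POST_PATTERN)))       # footer removed
print(repr(strip_footers(truncated_footer, POST_PATTERN)))  # left untouched
print(repr(strip_footers(same_line, POST_PATTERN)))         # normal text removed too
print(repr(strip_footers(same_line, SAFER_PATTERN)))        # normal text kept
print(repr(strip_footers(truncated_footer, TAIL_PATTERN)))  # truncated footer removed
```

Two things show up here. The expression from the update does not touch the sample paragraph that begins with `|/eMaaa`, which may or may not be related to the leftover artifacts. And because `.` matches everything except a newline, `<p.*?>` can begin at an earlier `<p` on the same line and swallow ordinary text up to the footer; the `[^>]*` form avoids that. Whether either variant fits a given book depends on how the PDF was converted, so it is worth testing on a copy first.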
10-12-2012, 11:24 PM | #2 |
null operator (he/him)
Posts: 20,457
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@MontyJ - did you try converting from PDF to PRC using the MobiCreator program? In my experience, about 90% of the time it gets rid of headers and footers without any effort on my part.
Then you can convert the PRC to EPUB and use Sigil or whatever to tidy up loose ends. BR |
10-12-2012, 11:53 PM | #3 |
Addict
Posts: 224
Karma: 10
Join Date: Jul 2012
Device: Kindle
|
Thanks BR, will look into that.
There is one other "insert obnoxious ad" outfit that does not seem to have a solution, however. It is ABB YYY or something like that; it puts big yellow images in every corner of a page with a clickable "Buy Now" link. While most of the image can be removed automatically with regex code like the above, they somehow embed the closing TAGS of the last section of the image within normal text, and the placement is random, not predictable. So if you look for a simple string with opening and closing tags and remove it, you remove some amount of normal text as well. Since it is a random process, the only way to get it out is a manual edit of every page. |
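To make the problem above concrete, here is a small Python sketch. Everything in it is hypothetical: the markup, the `buy-now` link text and the `unwrap_ad_links` helper are invented for illustration, since the thread does not show what the real ad markup looks like. It assumes the ad is an `<a>` link with a recognisable `href` wrapping an `<img>`, with the closing `</a>` landing somewhere inside ordinary text.

```python
import re

# HYPOTHETICAL page fragment: the ad link opens before the image,
# but its closing tag sits in the middle of normal text.
page = ('<p><a href="http://example.invalid/buy-now"><img src="corner1.png"/>'
        'It was a dark and stormy night,</a> and the rain fell in torrents.</p>')

# Naive removal: delete everything from the opening tag to the closing tag.
naive = re.sub(r'<a[^>]*>.*?</a>', '', page, flags=re.DOTALL)

# ASSUMED alternative: match only links whose href looks like the ad,
# capture what is inside, and put the inner text back without the image.
AD_LINK = re.compile(r'<a\b[^>]*href="[^"]*buy-now[^"]*"[^>]*>(.*?)</a>', re.DOTALL)
AD_IMG = re.compile(r'<img\b[^>]*/?>')


def unwrap_ad_links(html):
    """Drop the ad link tags and any image inside them, keep the enclosed text."""
    def keep_text(match):
        return AD_IMG.sub('', match.group(1))
    return AD_LINK.sub(keep_text, html)


print(naive)                  # the start of the sentence is gone
print(unwrap_ad_links(page))  # the full sentence is kept
```

The naive version loses "It was a dark and stormy night," along with the ad, which is the behaviour described in the post. The second version only works if the ad links share something recognisable, such as a common `href`; if they do not, a manual edit may still be the only option.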