09-19-2010, 06:02 AM | #16 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
But the conversion from epub to epub is still having no effect. Here's a snippet of converted output from the calibre internal reader: Oh, for heaven's sake," I muttered. Maybe if I'd had drinks with dinner, I'd have run, too, but I doubted it. "Don't be an old stick in the mud," Catherine called back. Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html Stick in the mud? I caught u... I reckon I could 1. convert epub to rtf. 2. use MS Word find + replace all to remove spam. 3. convert back from rtf to epub, but that's a chore for many books & I'm concerned that a double conversion could degrade the final epub? Last edited by cybmole; 09-19-2010 at 06:05 AM. |
|
09-19-2010, 06:14 AM | #17 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Thanks for removing it. The problem is that it wasn't always wrapped in <p> tags like your first example. Hopefully this will work better:
Code:
(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>\s*)? |
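For anyone who wants to try the expression outside calibre first, here is a minimal sketch in Python (calibre's search and replace uses Python-style regular expressions). The pattern is copied unchanged from the post above; the sample markup and the `strip_spam` helper are made up for illustration, since the real HTML inside the epub was not posted.

```python
import re

# Pattern copied as posted; split over two raw strings only for readability.
SPAM = re.compile(
    r'(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*'
    r'(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>\s*)?'
)

def strip_spam(html: str) -> str:
    """Hypothetical helper: delete every match of the spam pattern."""
    return SPAM.sub('', html)

# Assumed markup: a bold, linked spam line sitting between two paragraphs.
sample = (
    '<p>"Don\'t be an old stick in the mud," Catherine called back.</p>\n'
    '<b>Generated by ABC Amber LIT Converter, '
    '<a href="http://www.processtext.com/abclit.html">'
    'http://www.processtext.com/abclit.html</a></b><br>\n'
    '<p>Stick in the mud?</p>'
)

cleaned = strip_spam(sample)
print(cleaned)
```

Note that `[^<]*` runs up to the next tag, so on plain text with no tags after the spam it would keep going to the end of the line. That is worth checking against your own files before running it over a whole library.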
09-19-2010, 06:33 AM | #18 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
thanks - that worked - it missed just a single instance on the very 1st line of the epub (so maybe that's in the metadata).
I have now found an alternative (mobi) source which I can use for this series, but I wanted to learn how to remove this spam in case I encounter it again in other books. I also have a lit source but it's incomplete - only 1-16, and vols 4 & 16 still have DRM. Is LIT generally a "better" source? I'm still learning about formats. I read on Kindle 3 (mobi) but I'm buying my son a new Sony PRS-350 (epub) for his birthday, so I will want to put my calibre collection into both output formats. |
09-19-2010, 06:40 AM | #19 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
I wouldn't say Lit is best; generally commercial sources are the best. But for non-commercial sources, Lit and HTML were the most common source formats. Generally any PDFs, LRFs, epubs, etc. floating around out there were originally converted from a Lit or HTML file.
The only format that is really bad is PDF. PDF is difficult in general even from a good PDF source, but when dealing with files like this, lots of information gets lost in the translation to PDF and back. |
09-19-2010, 06:51 AM | #20 | |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Quote:
just one final observation - I wondered how you deduced that my example had gone LIT - pdf & back again. I'd guessed that the LIT converter had been used to go from lit to epub... but when I actually follow that spam link, I see that the lit converter does not do epub, so now I see how you reached your conclusion. Thus my new rule of thumb will be IF encounter amber lit spam THEN look for better source :-) |
|
09-19-2010, 07:02 AM | #21 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
You've got the right idea.
Amber Lit converter is a piece of trialware that converts from Lit to pdf; the free version inserts those spam messages. The tags in your document used the name 'calibre' in their class names, which indicates that someone had used calibre to convert from the ABC pdf to the epub you downloaded. |
09-19-2010, 10:23 PM | #22 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle 3
|
Hi there!
I found this post very interesting since I have the same problem!! But unfortunately I do not understand anything from reading your post... I think I will try to click the "remove header" option and copy this: file:///.+(\d|(txt|html|htm)) into the dialog box below. Will that work?? It happens like you said: I have the name of the author and the page number in the middle of the text, and I would love to remove that when I convert PDF to MOBI. So if someone could explain to me how to do it I would really appreciate it! Thank you very much to everyone! |
09-20-2010, 04:50 AM | #23 | |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Quote:
|
|
09-26-2010, 08:35 PM | #24 |
Enthusiast
Posts: 27
Karma: 10
Join Date: Aug 2010
Location: Ontario Canada
Device: Kindle 2; Kindle WIFI 6";IPAD 2
|
page number
I have tried a number of examples and have had some luck, but how do I remove the following type of page numbers?
<br>1/451<br> I tried <br>*/451 and I tried <br>+\d etc. |
09-26-2010, 08:52 PM | #25 |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
09-26-2010, 09:02 PM | #26 | |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
|
|
09-27-2010, 11:03 AM | #27 |
Enthusiast
Posts: 27
Karma: 10
Join Date: Aug 2010
Location: Ontario Canada
Device: Kindle 2; Kindle WIFI 6";IPAD 2
|
short tutorial tried, no luck
Yes, I have tried the tutorial. It was helpful for most things, just not this page number issue, or I am just not getting it.
|
09-27-2010, 11:07 AM | #28 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Without intending any offense, at least in one point, it's the latter: You should have a look at where you put your quantifiers (each one repeats only the single character or group immediately before it). The tutorial actually discusses a case very similar to yours when quantifiers are introduced. You should be able to use the regexp developed there with minor adjustments.
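To make the point about quantifier placement concrete, here is a small Python sketch (calibre uses Python-style regular expressions) that runs the two attempts from the question against the sample line. The variable names are only for illustration.

```python
import re

line = '<br>1/451<br>'

# '<br>*/451': the * applies only to '>', so this reads as
# '<br', then any number of '>', then '/451'. The '1' gets in the way.
attempt_1 = re.search(r'<br>*/451', line)

# '<br>+\d': the + also applies only to '>', so this finds '<br>'
# followed by one digit and stops there.
attempt_2 = re.search(r'<br>+\d', line)

# Moving the quantifier so it follows \d repeats the digit instead.
digits = re.search(r'<br>\d+', line)

print(attempt_1)          # no match object
print(attempt_2.group())  # only the tag and the first digit
print(digits.group())     # the tag and the whole first number
```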
|
09-27-2010, 11:08 AM | #29 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Try:
Code:
<br>\s*\d+\/\d+\s*<br> |
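A quick way to check this expression before using it in a conversion is to run it in Python, which uses the same regex flavour as calibre. The sample text below is invented for the test; only the pattern comes from the post above.

```python
import re

# Pattern as posted: <br>, optional whitespace, "current/total", optional whitespace, <br>.
PAGE_NO = re.compile(r'<br>\s*\d+\/\d+\s*<br>')

# Made-up sample with two page markers, one padded with spaces.
sample = 'end of one page<br>1/451<br>start of the next<br> 2/451 <br>more text'

# Deleting the match outright also removes both <br> tags around it.
removed = PAGE_NO.sub('', sample)

# Replacing with a space keeps the neighbouring words apart.
spaced = PAGE_NO.sub(' ', sample)

print(removed)
print(spaced)
```

Because both `<br>` tags are part of the match, deleting it outright joins the text on either side. Whether that is what you want depends on how the PDF was laid out.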
09-27-2010, 11:12 AM | #30 |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Can you copy and paste a sample string into an MR code block?
The <br> tag appears to be the older HTML convention, not the <br /> style. |
|