MobileRead Forums > E-Book Software > Calibre > Conversion

Old 09-19-2010, 06:02 AM   #16
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by ldolse View Post
I thought you were converting from pdf to epub?

Code:
(<A name=\d+>\s*</a>|<p[^>]*>)\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>|</p>)
That should take care of more variants of the spam, including yours.

Please remove the book from your previous post - it doesn't matter whether you own it, the problem is posting it to a public bulletin board that doesn't condone piracy.
OK, I've removed the attachment.

But the epub-to-epub conversion is still having no effect. Here's a snippet of the converted output from the calibre internal reader:

Oh, for heaven's sake," I muttered. Maybe if I'd had drinks with dinner, I'd have run, too, but I doubted it.

"Don't be an old stick in the mud," Catherine called back. Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html

Stick in the mud? I caught u...


I reckon I could: 1. convert epub to rtf; 2. use MS Word find + replace all to remove the spam; 3. convert back from rtf to epub. But that's a chore for many books, and I'm concerned that a double conversion could degrade the final epub?

Last edited by cybmole; 09-19-2010 at 06:05 AM.
Old 09-19-2010, 06:14 AM   #17
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Thanks for removing it. The problem is that the spam isn't always wrapped in <p> tags as it was in your first example. Hopefully this will work better:
Code:
(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>\s*)?
Note the file is still going to be plenty messed up; it's been through multiple layers of munging with garbage in it: lit -> ABC pdf -> Calibre -> Calibre. You're better off trying to find a source for the original Lit.
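
For anyone who wants to check the two expressions outside calibre, here is a minimal sketch using Python's `re` module. It assumes the spam sits in the middle of a paragraph, as in the snippet cybmole posted. The `<p>` wrapping in `sample` is a guess at the underlying markup, not taken from the actual book, and calibre may apply the pattern with different flags.

```python
import re

# First attempt: needs an anchor or <p> tag right before the spam
# and a <br> or </p> right after it.
first_try = re.compile(
    r'(<A name=\d+>\s*</a>|<p[^>]*>)\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*'
    r'(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>|</p>)'
)

# Second attempt: the leading anchor and the trailing <br> are both optional.
second_try = re.compile(
    r'(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*'
    r'(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>\s*)?'
)

# Assumed markup: the spam is appended to ordinary paragraph text.
sample = (
    '<p>"Don\'t be an old stick in the mud," Catherine called back. '
    'Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html</p>\n'
    '<p>Stick in the mud? I caught u...</p>'
)

# The first pattern finds nothing because no <p> sits directly before "Generated".
print(first_try.search(sample))

# The second pattern strips the spam and leaves the paragraph tags alone.
print(second_try.sub('', sample))
```

Note that `[^<]*` runs up to the next `<`, so in markup with no tag after the spam it would keep going past the end of the line.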
Old 09-19-2010, 06:33 AM   #18
cybmole
Wizard
Thanks - that worked. It missed just a single instance on the very 1st line of the epub (so maybe that's in the metadata).

I have now found an alternative (mobi) source which I can use for this series, but I wanted to learn how to remove this spam in case I encounter it again in other books.

I also have a lit source but it's incomplete - only 1-16, and vols 4 & 16 still have DRM.

Is LIT generally a "better" source? I'm still learning about formats. I read on a Kindle 3 (mobi) but I'm buying my son a new Sony PRS-350 (epub) for his birthday, so I will want to put my calibre collection into both output formats.
Old 09-19-2010, 06:40 AM   #19
ldolse
Wizard
I wouldn't say Lit is best; generally commercial sources are the best. But for non-commercial sources, Lit and HTML were the most common source formats. Generally any PDFs, LRFs, epubs, etc. floating around out there were originally converted from a Lit or HTML file.

The only format that is really bad is PDF. PDF is difficult in general, even from a good PDF source, but when dealing with files like this, lots of information gets lost in the translation to PDF and back.
Old 09-19-2010, 06:51 AM   #20
cybmole
Wizard
Quote:
Originally Posted by ldolse View Post
I wouldn't say Lit is best; generally commercial sources are the best. But for non-commercial sources, Lit and HTML were the most common source formats. Generally any PDFs, LRFs, epubs, etc. floating around out there were originally converted from a Lit or HTML file.

The only format that is really bad is PDF. PDF is difficult in general, even from a good PDF source, but when dealing with files like this, lots of information gets lost in the translation to PDF and back.
Thanks - I'd discovered by experiment that PDFs are difficult.

Just one final observation: I wondered how you deduced that my example had gone Lit -> pdf and back again. I'd guessed that the Lit converter had been used to go from lit to epub...

But when I actually follow that spam link, I see that the Lit converter does not do epub, so now I see how you reached your conclusion.

Thus my new rule of thumb will be: IF encounter Amber Lit spam THEN look for better source :-)
Old 09-19-2010, 07:02 AM   #21
ldolse
Wizard
You've got the right idea.

Amber Lit converter is a piece of trialware that converts from Lit to pdf; the free version inserts those spam messages.

The tags in your document used the name 'calibre' in their class names, which indicates that someone had used calibre to convert from the ABC pdf to the epub you downloaded.
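
A rough way to check for that fingerprint yourself is to list the class names in the book's HTML. This is only a sketch: the markup in `html` is made up for illustration, and real calibre output will differ in detail.

```python
import re

# Illustrative markup only, not taken from the book discussed in this thread.
html = (
    '<p class="calibre1">Some text</p>'
    '<span class="calibre3 bold">more</span>'
    '<p class="intro">x</p>'
)

# Collect class attribute values that start with "calibre".
calibre_classes = re.findall(r'class="(calibre[^"]*)"', html)
print(calibre_classes)
```

A non-empty list suggests the file passed through calibre at some point, though it says nothing about what the source format was.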
Old 09-19-2010, 10:23 PM   #22
derfel_spain
Member
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle 3
Hi there!

I found this post very interesting since I have the same problem!! But unfortunately I do not understand anything.

Reading your post... I think I should click "remove header" and copy this: file:///.+(\d|(txt|html|htm)) into the dialog box below, and it will work??

As you said, I have the name of the author and the page number in the middle of the text, and I would love to remove them when I convert PDF to MOBI. So if someone could explain to me how to do it, I would really appreciate it!

Thank you very much to everyone!
Old 09-20-2010, 04:50 AM   #23
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by derfel_spain View Post
Hi there!

I found this post very interesting since I have the same problem!! But unfortunately I do not understand anything.

Reading your post... I think I should click "remove header" and copy this: file:///.+(\d|(txt|html|htm)) into the dialog box below, and it will work??

As you said, I have the name of the author and the page number in the middle of the text, and I would love to remove them when I convert PDF to MOBI. So if someone could explain to me how to do it, I would really appreciate it!

Thank you very much to everyone!
If your header is a "file:///..."-type header, then yes, you copy the above where you said. If not, you need to create a new regular expression to describe the header. I've tried to explain how to do that in this post.
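
As a rough illustration of both cases, here is a small Python sketch. The file path, the author name "Jane Doe" and the page number are all invented for the example, so the second expression is only a starting point to adapt to your own header.

```python
import re

# Expression from the post above: "file:///", then anything on the same line,
# ending in a digit or a txt/html/htm extension.
file_header = re.compile(r'file:///.+(\d|(txt|html|htm))')

text = "end of one page\nfile:///C:/Books/example.html\nstart of the next page"
cleaned_file = file_header.sub('', text)
print(repr(cleaned_file))

# Hypothetical header: an author name followed by a page number ("Jane Doe 17").
author_header = re.compile(r'\s*Jane Doe\s+\d+')

line = "end of a sentence Jane Doe 17 and the start of the next"
cleaned_author = author_header.sub('', line)
print(cleaned_author)
```

The `.+` is greedy, so on a line with more text after the file name it will run on to the last digit or extension it can find. Test against a few real pages before trusting it.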
Old 09-26-2010, 08:35 PM   #24
PCreighton
Enthusiast
Posts: 27
Karma: 10
Join Date: Aug 2010
Location: Ontario Canada
Device: Kindle 2; Kindle WIFI 6";IPAD 2
page number

I have tried a number of examples and have had some luck, but how do I remove the following type of page number?
<br>1/451<br>
I tried <br>*/451, I tried <br>+\d, etc.
Old 09-26-2010, 08:52 PM   #25
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by PCreighton View Post
I have tried a number of examples and have had some luck, but how do I remove the following type of page number?
<br>1/451<br>
I tried <br>*/451, I tried <br>+\d, etc.
Try escaping the forward slash with a backslash.
Old 09-26-2010, 09:02 PM   #26
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by PCreighton View Post
I have tried a number of examples and have had some luck, but how do I remove the following type of page number?
<br>1/451<br>
I tried <br>*/451, I tried <br>+\d, etc.
Check out this short tutorial, it may help.
Old 09-27-2010, 11:03 AM   #27
PCreighton
Enthusiast
short tutorial tried, no luck

Yes, I have tried the tutorial. It was helpful for most things, just not this page number issue, or I am just not getting it.
Old 09-27-2010, 11:07 AM   #28
Manichean
Wizard
Quote:
Originally Posted by PCreighton View Post
Yes, I have tried the tutorial. It was helpful for most things, just not this page number issue, or I am just not getting it.
Without intending any offense, on at least one point it's the latter: you should have a look at where you put your quantifiers (a quantifier repeats only the element immediately before it). The tutorial actually discusses a case very similar to yours when quantifiers are introduced. You should be able to use the regexp developed there with minor adjustments.
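
To make the quantifier point concrete, here is a short sketch in Python's `re` module, run against the page-number string from the question. It assumes the regex engine behaves like Python's, which is worth checking in your own setup.

```python
import re

sample = "<br>1/451<br>"

# In "<br>+\d" the + repeats only the ">" before it, so this finds "<br>1" and stops.
plus_attempt = re.search(r"<br>+\d", sample).group(0)
print(plus_attempt)

# In "<br>*/451" the * also applies to ">" alone; nothing matches the "1" before "/451".
star_attempt = re.search(r"<br>*/451", sample)
print(star_attempt)

# Put the quantifier on the thing that actually repeats: the digits.
digit_runs = re.findall(r"\d+", sample)
print(digit_runs)

# To repeat a whole tag, group it first.
grouped = re.fullmatch(r"(<br>)+", "<br><br>") is not None
ungrouped = re.fullmatch(r"<br>+", "<br><br>") is not None
print(grouped, ungrouped)
```

So both original attempts fail for the same reason: the quantifier was attached to the closing `>` of the tag, not to the page number.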
Old 09-27-2010, 11:08 AM   #29
ldolse
Wizard
Try:
Code:
<br>\s*\d+\/\d+\s*<br>
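
A quick way to try that expression before running a conversion is the sketch below, in Python. The sample text is invented, and replacing the match with an empty string is an assumption about what the removal does, so the surrounding text ends up joined with no break.

```python
import re

# Expression from the post above: <br>, optional spaces, digits, "/", digits, optional spaces, <br>.
page_number = re.compile(r'<br>\s*\d+\/\d+\s*<br>')

# Made-up text with two page markers, one padded with spaces.
text = "end of page one.<br>1/451<br>Start of page two.<br> 2/451 <br>More text."
cleaned = page_number.sub('', text)
print(cleaned)

# In Python's re module the slash needs no escape; both spellings match the same text.
escaped = re.fullmatch(r'\d+\/\d+', '1/451') is not None
unescaped = re.fullmatch(r'\d+/\d+', '1/451') is not None
print(escaped, unescaped)
```

If a line break should survive where the page number was, substituting `'<br>'` in place of `''` is one option.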
Old 09-27-2010, 11:12 AM   #30
theducks
Well trained by Cats
Can you copy and paste a sample string into a MR code block?
The <br> tag looks like the older HTML convention; the newer style is <br />.
All times are GMT -4.