Old 10-23-2009, 08:25 PM   #16
charleski
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
Quote:
Originally Posted by hairybiker View Post
If I was better at Linux command scripting then that is what I would do, but since I am still learning it ...
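Well, the hard part about grep is just working out what regex you need to use. Bear in mind that grep works on whole lines: it can keep or drop a line, but it cannot edit inside one. If you've sorted out the exact form of the regex that matches the unwanted lines in your html, then you just need to use
grep -v regex input_html_file > edited_file
ebook-convert edited_file output_file.mobi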

You can download a version of grep that will work on Windows systems, so you can do the whole thing in a command window on the same VM.
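Because grep selects whole lines, it cannot help when the unwanted markup shares a line with real text. A minimal sketch of the same pre-cleaning step as a substitution in Python follows. The function name clean_html and the file names in the test are made up for illustration, and the pattern is whatever regex you have already worked out.

```python
import re
from pathlib import Path


def clean_html(src, dst, pattern):
    """Copy src to dst with every match of pattern deleted.

    Returns the number of matches removed, so an unexpected 0
    shows straight away that the regex did not match anything.
    """
    text = Path(src).read_text(encoding="utf-8")
    cleaned, count = re.subn(pattern, "", text)
    Path(dst).write_text(cleaned, encoding="utf-8")
    return count
```

The cleaned file can then be passed to ebook-convert exactly as in the post above.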
Old 11-14-2009, 02:41 PM   #17
Punksmurf
Member
Punksmurf began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Nov 2009
Device: Sony PRS-600
I'm having the same trouble here. Here's a piece of what I'm working through:

Code:
There was a lot of art in the supposedly natural falling of women's hair. Her features were even and possessed the particular properties and proportions that appealed to him, though he could not define precisely what these were. His shyness loomed up inside him, so that he did not trust himself to speak. </p><p>
"I am Sheen," she said. "I would like to challenge you to a Game." </p><p>
<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>er</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>er</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>B</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>2</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>B</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>2</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>B</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>.0</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>B</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>.0</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>A</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>A</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>w</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>w</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>w</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>w</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>w</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>o m</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>. </b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>w</b></a></p><p>
<a href="http://www.abbyy.com/buy"><b>A B B YY.c </b>She could </a>not be a top player. Stile knew every ranking player on every a<a href="http://www.abbyy.com/buy">ge-ladder by sight and style, </a></p><p>
<a href="http://www.abbyy.com/buy"><b>.A B BYY.com</b></a></p><p>
<a href="http://www.abbyy.com/buy">and she was on no </a>ladder. Therefore she was a dilettante, an occasional participant, possibly of some skill in selected modes but in no way a serious competitor. Her body was too lush for most physical sports; the top females in track, ball games and swimming were small breasted, lean-fleshed, and lanky, and this in no way described Sheen. Therefore he would have no physical competition here. </p><p>
Yet she was beautiful, and he was unable to speak. So he nodded acquiescence. She took his arm in an easy gesture of familiarity that startled him. Stile had known women, of course; they came to him seeking the notoriety of his company, and the known fact of his hesitancy lent them compensating</p><p>
I've a little experience with regexp, but I am by no means a pro. However, I managed to come up with the following:

Code:
<p>\s*<a href="http://www.abbyy.com/buy">(<b>.*?</b>)?|</a>(</p>)?
...which works perfectly over at our friends at http://regexpal.com (which is a lot quicker to work with than Calibre).
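For anyone who wants to check the expression outside a browser, here is a small sketch that applies it with Python's re module to a shortened excerpt of the sample above. The sample string is trimmed by hand, and this only shows what the regex does on that text. It says nothing about what Calibre feeds to the regex during conversion.

```python
import re

# The expression from the post, unchanged.
PATTERN = re.compile(
    r'<p>\s*<a href="http://www.abbyy.com/buy">(<b>.*?</b>)?|</a>(</p>)?'
)

# Hand-shortened excerpt of the sample text.
sample = (
    '"I am Sheen," she said. </p><p>\n'
    '<a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a></p><p>\n'
    '<a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p>\n'
    'Yet she was beautiful.</p>'
)

# Delete every match; the real text and its paragraph tags remain.
cleaned = PATTERN.sub("", sample)
print(cleaned)
```

Note that the first alternative only fires when the link directly follows a <p>, so a link that starts in the middle of a line (as in the "ge-ladder" line of the sample) keeps its opening tag.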

In Calibre, however, this seems not to be working. To narrow it down, I can't manage to select

Code:
<p>
<a
even with

Code:
<p>\s<a
or even
Code:
<p>\s?<a
Does anyone of you have any idea? Thank you and all!
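One thing worth ruling out, sketched below in Python: \s and \s? match at most one whitespace character, so they fail if the line break between the tags is a Windows-style CR+LF pair (or if any indentation follows it), while \s* still matches. Whether that is what happens inside Calibre is an assumption, not something confirmed in this thread.

```python
import re

unix = "<p>\n<a"       # one whitespace character between the tags
windows = "<p>\r\n<a"  # two whitespace characters (CR + LF)

exactly_one = re.compile(r"<p>\s<a")
at_most_one = re.compile(r"<p>\s?<a")
any_amount = re.compile(r"<p>\s*<a")

for name, text in (("unix", unix), ("windows", windows)):
    print(
        name,
        bool(exactly_one.search(text)),
        bool(at_most_one.search(text)),
        bool(any_amount.search(text)),
    )
```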

Related: can anyone tell me where to find the 'debug' button mentioned at http://calibre.kovidgoyal.net/user_m...l#introduction? I tried starting calibre-debug.exe, which does give some insight, though into things I am not currently interested in.


...After that, I'm going to have to figure out how to get the document flow right again, as every line ends with </p><p>... sigh, why do people insist on using PDFs to share eBooks... after all the trouble they've gone through to scan and OCR it and all!

Last edited by Punksmurf; 11-14-2009 at 04:07 PM. Reason: First posting a good-for-nothing post and then pasting in the text I originally typed doesn't make it all vanish...!
Old 02-24-2010, 05:00 AM   #18
vipulmalhotra
Junior Member
vipulmalhotra began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: none
Try something like:

<p>\s*<a href="http://www.omrhome.com/">(<b>.*?</b>)?|</a>(</p>)?
Old 03-17-2010, 02:50 AM   #19
vipulmalhotra
Junior Member
vipulmalhotra began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: none
Removing ABBYY header in a PDF

why not try the one below

Last edited by vipulmalhotra; 03-17-2010 at 02:52 AM.
Old 03-17-2010, 02:51 AM   #20
vipulmalhotra
Junior Member
vipulmalhotra began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: none
Removing ABBYY header in a PDF

why not try <a href="http://www.gingerwebs.com/"><b>Y</b></a></p><p>
Old 07-12-2010, 02:55 PM   #21
SavalBork
Member
SavalBork began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Jul 2010
Device: Kindle 2
Sorry to resurrect an old thread, but did this ever get resolved? I know next to nothing about scripting and/or Calibre (in fact, it rather annoys me), so I don't want to start learning until I know a solution is possible. Thanks for your time.
Old 07-20-2010, 01:50 AM   #22
radicalnomad
Junior Member
radicalnomad began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Kindle 2
I just tried the following:

<a href="http://www.abbyy.com/buy">(<b>.*?</b>)?|(</p>)?

and it worked fine for me, though it may be a case by case.
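A caveat on "case by case", sketched in Python on a made-up one-line sample: the second alternative, (</p>)?, matches every </p> in the text, not just the ones next to the ABBYY links, and nothing in the expression removes the closing </a>. Calibre's own handling may differ, so treat this only as a picture of what the bare regex does.

```python
import re

# The expression from the post, unchanged.
PATTERN = re.compile(r'<a href="http://www.abbyy.com/buy">(<b>.*?</b>)?|(</p>)?')

# Made-up sample: one real paragraph, one watermark paragraph.
sample = '<p>Hello</p><p><a href="http://www.abbyy.com/buy"><b>Y</b></a></p>'

result = PATTERN.sub("", sample)
print(result)  # → <p>Hello<p></a>
```

Both closing </p> tags are gone, including the one that belonged to the real paragraph, and a stray </a> is left behind.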
Old 10-10-2010, 05:02 PM   #23
romanov99
Junior Member
romanov99 began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Oct 2010
Device: Kindle
Is this a problem in Calibre, or the content of the PDF?

I'm seeing this when I try to transform a PDF into a MOBI file and put it on my Kindle. What's not clear to me from reading this thread is whether this is an issue with the content of the PDF itself that just needs to be stripped out, or whether Calibre is using the free version of this abbyy.com software, which I just need to upgrade.

Sorry for the newb question, but can someone clear up my confusion?
Old 10-10-2010, 05:18 PM   #24
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by romanov99 View Post
What's not clear to me from reading this thread is whether this is an issue with the content of the PDF itself that just needs to be stripped out,
This is an issue with the content of the PDF itself. Calibre does not add text to your file and uses its own conversion engine.
Old 10-11-2010, 02:07 AM   #25
travger
Evangelist
Posts: 480
Karma: 270594
Join Date: Aug 2010
Device: palm tx, Windows7, Galaxy A5
I have mostly used Mobipocket Creator for my pdf>prc and it seems to automatically strip a lot of the things people here gripe about - page numbers, text on the top of the page... It creates an MS doc and an html file together with the prc, so I can correct spelling and such if I want to and make a new prc.
Old 12-18-2010, 03:33 AM   #26
SavalBork
Member
SavalBork began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Jul 2010
Device: Kindle 2
Quote:
Originally Posted by travger View Post
I have mostly used Mobipocket Creator for my pdf>prc and it seems to automatically strip a lot of the things people here gripe about - page numbers, text on the top of the page... It creates an MS doc and an html file together with the prc, so I can correct spelling and such if I want to and make a new prc.
I use the same program; however, I still get a "W_Click here to buy_W" twice on each page; did you find a way around that? Thank you.

Oh, also, is there a setting to create a doc file? My standard output files are prc, html, opf, xml, jpg, and a copy of the pdf, no doc file.

Last edited by SavalBork; 12-18-2010 at 03:37 AM.
Old 12-18-2010, 04:09 AM   #27
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by SavalBork View Post
Oh, also, is there a setting to create a doc file?
No, doc format is not supported.
Old 12-25-2010, 12:01 AM   #28
vulcan_girl
Groupie
Posts: 156
Karma: 1010345
Join Date: Jun 2009
Device: PRS 350
I didn't want to start a new thread to ask almost the same question.

I found one formula to remove headers from ABC Converter pdfs, and it works fine. Now I've unearthed an older pdf with a slightly different header and I can't figure out what I need to change to make it work. If someone could help me, I'd be very appreciative.

Here is the new header:
ABC Amber Text Converter Trial version, http://www.processtext.com/abctxt.html

Here is the header and formula that works:

[Generated by ABC Amber LIT Converter,
http://www.processtext.com/abclit.html]


(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>\s*)?

What do I need to change? I have no idea how to create one of these.
Old 12-25-2010, 07:46 AM   #29
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by vulcan_girl View Post
I didn't want to start a new thread to ask almost the same question.

I found one formula to remove headers from ABC Converter pdfs, and it works fine. Now I've unearthed an older pdf with a slightly different header and I can't figure out what I need to change to make it work. If someone could help me, I'd be very appreciative.

Here is the new header:
ABC Amber Text Converter Trial version, http://www.processtext.com/abctxt.html

Here is the header and formula that works:

[Generated by ABC Amber LIT Converter,
http://www.processtext.com/abclit.html]


(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>\s*)?

What do I need to change? I have no idea how to create one of these.
These "formulas" are called regular expressions and are, generally, just a way to describe texts. Your problem might be that you need to describe what the header looks like in the XHTML intermediate stage Calibre produces while converting. Personally, I'd recommend that you try to follow the tutorial from the manual. If you still have questions after that, ask.
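To make that concrete, here is a rough sketch in Python. It shows that the working expression depends on the literal words "Generated by", which the older header lacks, and it tries a looser pattern on the plain-text form of both headers. LOOSE is a hypothetical pattern written for this illustration, not a tested Calibre setting, and the real intermediate XHTML may wrap the header in tags that it does not account for.

```python
import re

# The working expression from the question, joined onto one line.
OLD = re.compile(
    r'(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber'
    r'[^<]*(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?'
    r'\s*(<br>\s*)?'
)

# Hypothetical looser pattern: "Generated by" and the brackets are optional.
LOOSE = re.compile(
    r'\[?(?:Generated\s+by\s+)?ABC\s+Amber[^<]*?processtext\.com/\w+\.html\]?'
)

old_header = (
    "[Generated by ABC Amber LIT Converter,\n"
    "http://www.processtext.com/abclit.html]"
)
new_header = (
    "ABC Amber Text Converter Trial version, "
    "http://www.processtext.com/abctxt.html"
)

print(bool(OLD.search(old_header)))  # the original matches its own header
print(bool(OLD.search(new_header)))  # but not the one without "Generated by"
print(repr(LOOSE.sub("", old_header)), repr(LOOSE.sub("", new_header)))
```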
Old 12-25-2010, 10:34 AM   #30
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Manichean View Post
These "formulas" are called regular expressions and are, generally, just a way to describe texts. Your problem might be that you need to describe what the header looks like in the XHTML intermediate stage Calibre produces while converting. Personally, I'd recommend that you try to follow the tutorial from the manual. If you still have questions after that, ask.
Personally, I just wait and do it in Sigil, where I can see the target of my 'find' (and fix the #*&* error in my replace before it propagates).

ABBYY went to a lot of effort to make sure a single regex will NOT remove the trial-ware marking code.

Switch to Code View in Sigil and you will see that they vary the coding; it just RENDERS the same.
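Since the coding varies, one option is to work in steps instead of relying on a single expression. The sketch below is based only on the excerpt posted in #17 and will not cover variants that excerpt does not show. It unwraps every link to abbyy.com/buy, drops the bold watermark text inside it, keeps any real text the link swallowed, and then removes the paragraphs left empty. The name strip_abbyy and the three patterns are made up for this example.

```python
import re

# Any link to the ABBYY "buy" page, whatever it wraps (may span lines).
ABBYY_LINK = re.compile(r'<a href="http://www\.abbyy\.com/buy">(.*?)</a>', re.DOTALL)
# In the excerpt, the watermark fragments are always bold.
BOLD = re.compile(r"<b>.*?</b>", re.DOTALL)
# Paragraphs that contain nothing but whitespace after the clean-up.
EMPTY_PARA = re.compile(r"<p>\s*</p>")


def strip_abbyy(html):
    """Remove ABBYY watermark links but keep real text caught inside them."""
    unwrapped = ABBYY_LINK.sub(lambda m: BOLD.sub("", m.group(1)), html)
    return EMPTY_PARA.sub("", unwrapped)


line = (
    '<a href="http://www.abbyy.com/buy"><b>A B B YY.c </b>She could </a>'
    "not be a top player."
)
print(strip_abbyy(line))  # → She could not be a top player.
```

It is still worth checking the result in Sigil's Code View afterwards, as suggested above.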