10-23-2009, 08:25 PM | #16 | |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Quote:
grep regex input_html_file > edited_file
ebook-convert edited_file output_file.mobi
You can download a version of grep that will work on Windows systems, so you can do the whole thing in a command window on the same VM. |
|
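The two-step idea quoted above (clean the HTML first, then hand the result to ebook-convert) can also be sketched in Python. Note that grep selects whole lines rather than deleting the matched text inside a line, so this sketch uses a regex substitution instead. The function names and file names are made up for illustration; the only external tool assumed is calibre's `ebook-convert input_file output_file` command.

```python
import re
import subprocess
from pathlib import Path


def clean_html(src, dst, pattern):
    """Delete every match of `pattern` from src, write the result to dst.

    Returns the number of matches removed. Hypothetical helper.
    """
    text = Path(src).read_text(encoding="utf-8")
    # DOTALL lets "." span line breaks inside a tag pair.
    cleaned, count = re.subn(pattern, "", text, flags=re.DOTALL)
    Path(dst).write_text(cleaned, encoding="utf-8")
    return count


def convert_command(edited, output):
    """Build the ebook-convert call for the cleaned file."""
    return ["ebook-convert", str(edited), str(output)]


def run_conversion(edited, output):
    """Run ebook-convert; requires calibre's command line tools on PATH."""
    subprocess.run(convert_command(edited, output), check=True)
```

Typical use would be `clean_html("book.html", "edited.html", some_pattern)` followed by `run_conversion("edited.html", "book.mobi")`, where the file names and `some_pattern` are placeholders.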
11-14-2009, 02:41 PM | #17 |
Member
Posts: 16
Karma: 10
Join Date: Nov 2009
Device: Sony PRS-600
|
I'm having the same trouble here. Here's a piece of what I'm working through:
Code:
There was a lot of art in the supposedly natural falling of women's hair. Her features were even and possessed the particular properties and proportions that appealed to him, though he could not define precisely what these were. His shyness loomed up inside him, so that he did not trust himself to speak. </p><p> "I am Sheen," she said. "I would like to challenge you to a Game." </p><p> <a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>er</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>er</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>B</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>2</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>B</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>2</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>B</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>.0</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>B</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>.0</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>A</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>A</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>o m</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>. 
</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>w</b></a></p><p> <a href="http://www.abbyy.com/buy"><b>A B B YY.c </b>She could </a>not be a top player. Stile knew every ranking player on every a<a href="http://www.abbyy.com/buy">ge-ladder by sight and style, </a></p><p> <a href="http://www.abbyy.com/buy"><b>.A B BYY.com</b></a></p><p> <a href="http://www.abbyy.com/buy">and she was on no </a>ladder. Therefore she was a dilettante, an occasional participant, possibly of some skill in selected modes but in no way a serious competitor. Her body was too lush for most physical sports; the top females in track, ball games and swimming were small breasted, lean-fleshed, and lanky, and this in no way described Sheen. Therefore he would have no physical competition here. </p><p> Yet she was beautiful, and he was unable to speak. So he nodded acquiescence. She took his arm in an easy gesture of familiarity that startled him. Stile had known women, of course; they came to him seeking the notoriety of his company, and the known fact of his hesitancy lent them compensating</p><p> Code:
<p>\s*<a href="http://www.abbyy.com/buy">(<b>.*?</b>)?|</a>(</p>)?
In Calibre, however, this does not seem to work. To narrow it down, I can't even manage to select
Code:
<p> <a
Code:
<p>\s<a
Code:
<p>\s?<a
Related: can anyone tell me where to find the 'debug' button mentioned at http://calibre.kovidgoyal.net/user_m...l#introduction? I tried starting calibre-debug.exe, which does give some insight, though into things I am not currently interested in.
...After that, I'm going to have to figure out how to get the document flow right again, as every line ends with </p><p>... sigh, why do people insist on using PDFs to share eBooks... after all the trouble they've gone through to scan and OCR it and all!
Last edited by Punksmurf; 11-14-2009 at 04:07 PM. Reason: First posting a good-for-nothing post and then pasting in the text I originally typed doesn't make it all vanish...! |
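For anyone who wants to check the expression outside Calibre, here is a minimal sketch using Python's re module (Calibre is written in Python, but how it applied the pattern in this case is not shown in the thread). The helper name `strip_abbyy` and the shortened sample strings are made up for illustration; the pattern is the one from the post, with only the dots in the URL escaped.

```python
import re

# Pattern from the post above; only the dots in the URL have been escaped.
ABBYY_LINK = r'<p>\s*<a href="http://www\.abbyy\.com/buy">(<b>.*?</b>)?|</a>(</p>)?'


def strip_abbyy(html, dotall=True):
    """Delete every match of ABBYY_LINK from html (hypothetical helper)."""
    flags = re.DOTALL if dotall else 0
    return re.sub(ABBYY_LINK, "", html, flags=flags)


# Shortened stand-in for the sample in the post.
sample = (
    '<p> "I am Sheen," she said. </p>'
    '<p> <a href="http://www.abbyy.com/buy"><b>PDF Transform</b></a></p>'
    '<p> <a href="http://www.abbyy.com/buy"><b>Y</b></a></p>'
    '<p> Yet she was beautiful.</p>'
)
print(strip_abbyy(sample))

# One watermark in the post has a line break inside <b>...</b>.
wrapped = '<p> <a href="http://www.abbyy.com/buy"><b>. \n</b></a></p>'
print(repr(strip_abbyy(wrapped, dotall=False)))  # the <b> element is left behind
print(repr(strip_abbyy(wrapped)))                # removed completely
```

Two things are worth keeping in mind with this pattern. Without DOTALL, `.` does not match a line break, so a watermark whose `<b>` text wraps onto a new line is only partly removed. And the `|</a>(</p>)?` alternative deletes every `</a>` in the file, including the ones that close genuine links.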
02-24-2010, 05:00 AM | #18 |
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: none
|
Try something like:
<p>\s*<a href="http://www.omrhome.com/">(<b>.*?</b>)?|</a>(</p>)?
|
03-17-2010, 02:50 AM | #19 |
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: none
|
Removing ABBYY header in a PDF
Why not try the one below?
Last edited by vipulmalhotra; 03-17-2010 at 02:52 AM. |
03-17-2010, 02:51 AM | #20 |
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2010
Device: none
|
Removing ABBYY header in a PDF
Why not try <a href="http://www.gingerwebs.com/"><b>Y</b></a></p><p>
|
07-12-2010, 02:55 PM | #21 |
Member
Posts: 12
Karma: 10
Join Date: Jul 2010
Device: Kindle 2
|
Sorry to resurrect an old thread, but did this ever get resolved? I know next to nothing about scripting and/or Calibre (in fact, it rather annoys me), so I don't want to start learning until I know a solution is possible. Thanks for your time.
|
07-20-2010, 01:50 AM | #22 |
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Kindle 2
|
I just tried the following:
<a href="http://www.abbyy.com/buy">(<b>.*?</b>)?|(</p>)?
and it worked fine for me, though results may vary case by case. |
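As quoted, the second alternative in that expression matches any `</p>` in the file, not only the ones next to a watermark. A stricter option is to match one complete watermark paragraph at a time. The sketch below is an assumption about what such a pattern could look like in Python's re syntax, not something posted in this thread; `watermark_paragraph` and `strip_watermarks` are illustrative names, and the URL is passed in so the same code covers other watermark links.

```python
import re


def watermark_paragraph(url):
    """Compile a pattern for one whole <p> <a href=url><b>..</b></a></p> block."""
    return re.compile(
        r'<p>\s*<a href="' + re.escape(url) + r'">\s*<b>[^<]*</b>\s*</a>\s*</p>'
    )


def strip_watermarks(html, url):
    """Remove complete watermark paragraphs and leave everything else alone."""
    return watermark_paragraph(url).sub("", html)


html = (
    '<p>See <a href="http://example.org/">this link</a>.</p>'
    '<p> <a href="http://www.abbyy.com/buy"><b>Click here to buy</b></a></p>'
    '<p> <a href="http://www.abbyy.com/buy"><b>. \n</b></a></p>'
)
print(strip_watermarks(html, "http://www.abbyy.com/buy"))
```

Because `[^<]*` also matches line breaks, no extra flag is needed. The trade-off is that anchors wrapping real book text (as in the sample from post #17) are left untouched and still need a separate pass.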
10-10-2010, 05:02 PM | #23 |
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2010
Device: Kindle
|
Is this a problem in Calibre, or the content of the PDF?
I'm seeing this when I try to transform a PDF into a MOBI file and put it on my Kindle. What's not clear to me from reading this thread is whether this is an issue with the content of the PDF itself that just needs to be stripped out, or whether Calibre is using the free version of this abbyy.com software, which I would just need to upgrade.
Sorry for the newb question, but can someone clear up my confusion? |
10-10-2010, 05:18 PM | #24 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
This is an issue with the content of the PDF itself. Calibre does not add text to your file and uses its own conversion engine.
|
10-11-2010, 02:07 AM | #25 |
Evangelist
Posts: 480
Karma: 270594
Join Date: Aug 2010
Device: palm tx, Windows7, Galaxy A5
|
I have mostly used Mobipocket Creator for my pdf>prc conversions, and it seems to automatically strip a lot of the things people here gripe about: page numbers, text at the top of the page... It creates an MS doc and an html file together with the prc, so I can correct spelling and such if I want to and make a new prc.
|
12-18-2010, 03:33 AM | #26 | |
Member
Posts: 12
Karma: 10
Join Date: Jul 2010
Device: Kindle 2
|
Quote:
Oh, also, is there a setting to create a doc file? My standard output files are prc, html, opf, xml, jpg, and a copy of the pdf, no doc file. Last edited by SavalBork; 12-18-2010 at 03:37 AM. |
|
12-18-2010, 04:09 AM | #27 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
|
12-25-2010, 12:01 AM | #28 |
Groupie
Posts: 156
Karma: 1010345
Join Date: Jun 2009
Device: PRS 350
|
I didn't want to start a new thread to ask almost the same question.
I found one formula to remove headers from ABC Converter pdfs, and it works fine. Now I've unearthed an older pdf with a slightly different header and I can't figure out what I need to change to make it work. If someone could help me, I'd be very appreciative.
Here is the new header:
ABC Amber Text Converter Trial version, http://www.processtext.com/abctxt.html
Here is the header and formula that works:
[Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html]
(<A name=\d+>\s*</a>)?\s*(<[biu][^>]*>)?\s*Generated\s+by\s+(ABC)?\s+Amber[^<]*(<a\shref=.*?processtext.*?>)?\s*(.*?processtext.*?</a>)?(</[ibu]>)?\s*(<br>\s*)?
What do I need to change? I have no idea how to create one of these. |
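The working formula insists on the literal words "Generated by", which the older header does not contain, so one option is to make that prefix optional. Below is a minimal sketch in Python's re syntax, assuming the header appears in the HTML as the plain text quoted above. The markup wrappers the original formula allows for (the `<A name=...>` anchor, `<b>`/`<i>`/`<u>` tags, a linked URL, a trailing `<br>`) are left out for brevity and would need to be added back for real files. `ABC_HEADER` and `strip_abc_header` are illustrative names.

```python
import re

# Matches both header variants quoted in the post (plain-text form only).
ABC_HEADER = re.compile(
    r'\[?'                                   # optional opening bracket
    r'(?:Generated\s+by\s+)?'                # "Generated by" is now optional
    r'ABC\s+Amber\s+\w+\s+Converter'         # e.g. "ABC Amber LIT Converter"
    r'(?:\s+Trial\s+version)?'               # only present in the older header
    r',\s*http://www\.processtext\.com/abc\w+\.html'
    r'\]?'                                   # optional closing bracket
)


def strip_abc_header(text):
    """Remove every ABC Amber header line matched by ABC_HEADER."""
    return ABC_HEADER.sub("", text)


new_header = "ABC Amber Text Converter Trial version, http://www.processtext.com/abctxt.html"
old_header = "[Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html]"
print(repr(strip_abc_header(new_header)))
print(repr(strip_abc_header(old_header)))
```

The same idea carries over to the original formula: wrapping `Generated\s+by\s+` in an optional group is the smallest change that lets it reach the word "Amber" in the older header.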
12-25-2010, 07:46 AM | #29 | |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
Quote:
|
|
12-25-2010, 10:34 AM | #30 | |
Well trained by Cats
Posts: 29,802
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
ABBYY went to a lot of effort to make sure a single Regex will NOT remove the TRIAL WARE marking code. Switch to Code View in Sigil and you will see that they vary the coding; it just RENDERS the same. |
|
|