Old 05-06-2015, 12:44 PM   #1
msshain
Junior Member
Posts: 1
Karma: 10
Join Date: May 2015
Device: none
Calibre - How to erase page? numbers after heuristic processing

The heuristic processing worked great to unify paragraphs converting PDF to ePub. I am getting numbers at various intervals though. Please see example (24 & 25) below:

nonrealistic view suggested by quantum theory. 24 Einstein protested: “I cannot seriously believe in [the quantum theory] because it cannot be reconciled with the idea that physics should represent a reality in time and space, free from spooky actions at a distance.” 25 It was in a discussion of the EPR paper that Erwin Schrödinger first coined the term “entanglement.”

Any ideas how to omit these, thanks.

Last edited by msshain; 05-06-2015 at 02:00 PM. Reason: Improve title
Old 05-06-2015, 01:10 PM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Sounds like page numbers. You will need to add a regex under Search and Replace to get rid of those.
Old 05-06-2015, 04:23 PM   #3
theducks
Well trained by Cats
Posts: 30,932
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by msshain
The heuristic processing worked great to unify paragraphs converting PDF to ePub. I am getting numbers at various intervals though. Please see example (24 & 25) below:

nonrealistic view suggested by quantum theory. 24 Einstein protested: “I cannot seriously believe in [the quantum theory] because it cannot be reconciled with the idea that physics should represent a reality in time and space, free from spooky actions at a distance.” 25 It was in a discussion of the EPR paper that Erwin Schrödinger first coined the term “entanglement.”

Any ideas how to omit these, thanks.
Your example shows page numbers embedded within normal text (a very bad OCR).

This is a slightly tedious EDITOR job, not a conversion job.

A REGEX in a conversion expects a FIXED pattern in how the page number appears, for example:
Long Winded 56
57 Short Story
Long Winded 103
104 Short Story

When it is (semi) random, you need to step through each find. There will be many patterns to find, and you create a unique REGEX for each pattern you discover.

BTW, this is probably a case where you should NOT let Heuristics clean up. The page pattern might have been easier to discover before the attempt to join lines. Every PDF is unique in the issues it presents (see the sticky about PDF).
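
As a rough illustration of the FIXED-pattern case, here is a minimal Python sketch. It assumes the running headers from the example above ("Long Winded" with a trailing number, "Short Story" with a leading number) still sit on their own lines, which would only hold before lines are joined. The function name and sample text are made up for illustration.

```python
import re

# Assumption: each running header occupies a whole line, in one of the two
# fixed shapes from the example ("Long Winded 56" / "57 Short Story").
HEADER_RE = re.compile(
    r"^(?:Long Winded \d+|\d+ Short Story)[ \t]*$\n?",
    flags=re.MULTILINE,
)

def strip_running_headers(text: str) -> str:
    """Delete whole lines that match one of the fixed header shapes."""
    return HEADER_RE.sub("", text)

sample = (
    "view suggested by quantum theory.\n"
    "Long Winded 56\n"
    "57 Short Story\n"
    "Einstein protested in 1935.\n"
)
cleaned = strip_running_headers(sample)
print(cleaned)
```

Because the pattern is anchored to whole lines, a number inside a sentence (the year in the last line) is left alone. Once the lines have been joined, this anchor is gone and the pattern no longer applies.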
Old 05-06-2015, 05:17 PM   #4
eschwartz
theducks -- if you use S&R in the conversion settings, it operates before line unwrapping. Handy.

Of course, you lose the ability to step through each match and confirm.

There are advantages either way.
Old 05-06-2015, 06:00 PM   #5
theducks
Quote:
Originally Posted by eschwartz
theducks -- if you use S&R in the conversion settings, it operates before line unwrapping. Handy.

Of course, you lose the ability to step through each match and confirm.

There are advantages either way.
I am a WYSIWYG make-a-REGEX kind of guy, so I use an Editor and forgo the
Old 05-06-2015, 06:50 PM   #6
eschwartz
You do actually get to preview the parsed XHTML that is extracted from the PDF -- it is part of the S&R wizard. So it isn't as dangerous as it could be.
Old 05-07-2015, 05:27 AM   #7
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Zapping numbers without manual checks is bad for dates, and guns, and times
E.g. He was shot in '66 with a colt 45. At 11am
We called 911 but.....
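
To make the risk concrete, here is a small Python sketch that lists every hit of a deliberately naive "any standalone number" pattern together with some surrounding text, so each one can be checked by eye before anything is deleted. The pattern, the helper name `list_matches`, and the sample sentence are illustrative assumptions, not anything calibre provides.

```python
import re

# Deliberately naive pattern: any standalone run of digits.
NAIVE_NUMBER_RE = re.compile(r"\b\d+\b")

def list_matches(text: str, width: int = 15) -> list[tuple[str, str]]:
    """Return (number, surrounding context) pairs for manual review."""
    hits = []
    for m in NAIVE_NUMBER_RE.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        hits.append((m.group(), text[start:end]))
    return hits

sample = "He was shot in '66 with a colt 45. At 11am we called 911 but..."
for number, context in list_matches(sample):
    print(f"{number:>4}  ...{context}...")
```

Every number this reports in the sample is part of the story, so a blind replace-all with such a pattern would damage the text.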
Old 05-07-2015, 06:05 AM   #8
BetterRed
null operator (he/him)
Posts: 21,657
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Actually, they could be reference link numbers. Is there a numbered reference list at the back of the book, and do the numbers embedded in the text bear any relationship to the references with the same numbers?

I suspect those quotes may be from the correspondence between E and S on the latter's thought experiments on cats in boxes and all that, and E's statement claiming God doesn't play dice etc.

If they are, then you might want to 'fix' them in the editor by recreating the links.

BR
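
If the numbers do turn out to be note references, recreating the links could look roughly like the Python sketch below. It rests on several assumptions: a marker is a 1-3 digit number that follows sentence-ending punctuation (or a closing quote) and a space, the set of valid note numbers is known from the list at the back of the book, and the `note-N` / `ref-N` ids are hypothetical names chosen for illustration.

```python
import re

# Assumption: a marker follows ". ", "! ", "? " or a closing quote plus a
# space, and is itself followed by a space.
MARKER_RE = re.compile(r'(?<=[.!?”"] )(\d{1,3})(?= )')

def link_markers(text: str, known_notes: set[int]) -> str:
    """Wrap numbers that match a known note in a superscript link."""
    def repl(m: re.Match) -> str:
        n = int(m.group(1))
        if n not in known_notes:
            return m.group(0)  # not a known note: leave the number untouched
        return f'<sup><a href="#note-{n}" id="ref-{n}">{n}</a></sup>'
    return MARKER_RE.sub(repl, text)

sample = (
    "nonrealistic view suggested by quantum theory. 24 Einstein protested: "
    "“... free from spooky actions at a distance.” 25 It was in a discussion"
)
linked = link_markers(sample, known_notes={24, 25})
print(linked)
```

Restricting the replacement to numbers that exist in the reference list is one way to avoid linking a stray number, but each hit is still worth confirming in the editor.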
Old 05-07-2015, 11:20 AM   #9
eschwartz
Quote:
Originally Posted by cybmole
Zapping numbers without manual checks is bad for dates, and guns, and times
E.g. He was shot in '66 with a colt 45. At 11am
We called 911 but.....
The right regex would look for numbers wrapped in the pre-tested footer XHTML.
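
A sketch of that idea in Python, under a loud assumption: the markup `<p class="footer">24</p>` is hypothetical, and the real wrapper has to be read off the XHTML preview for the PDF in question. The function name is made up as well.

```python
import re

# Hypothetical markup: assume each page number appears as its own paragraph,
# e.g. <p class="footer">24</p>. Check the actual preview and adapt.
FOOTER_RE = re.compile(r'<p class="footer">\s*\d+\s*</p>\s*')

def strip_footers(xhtml: str) -> str:
    """Remove only the numbers that sit inside the footer wrapper."""
    return FOOTER_RE.sub("", xhtml)

sample = (
    "<p>He was shot in '66 with a colt 45.</p>\n"
    '<p class="footer">24</p>\n'
    "<p>We called 911 but...</p>\n"
)
cleaned = strip_footers(sample)
print(cleaned)
```

Since the pattern requires the wrapper tags, the dates, calibers, and phone numbers in the body text are never candidates for removal.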