MobileRead Forums > E-Book Software > Calibre > Conversion
06-16-2020, 09:24 AM #1
Blaineoreski
Zealot
Posts: 110
Karma: 16268
Join Date: Apr 2020
Device: none
Convert Epub > RTF or TXT: Any Way To Eliminate Page Numbers?

Merged thread: "800-Page Long Book: After Conversion Every Line Has Words Stuck Together" (starts at post #5)

Hi,

When I convert a book from EPUB to either TXT or RTF, the physical page numbers persist. So even though you just have a scrolling screen of text, every six paragraphs or so you'll have a line with just

16

Is there any way to set the conversion options so that the TXT and RTF output files have the page numbers excised?

I was wondering: could it be done via regular expressions?

Or is there another setting that one could rig to eliminate page numbers?

Or perhaps there's a built-in feature in EPUB > TXT/RTF conversion that handles this and that I'm missing.
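On the regular-expression idea: a minimal Python sketch that post-processes the converted TXT file, assuming each page number sits alone on its own line (like the `16` above). This is not a calibre setting, and `strip_page_number_lines` is a hypothetical helper; any other line holding only a short number (a bare chapter number, say) would be removed as well, so skim the result.

```python
import re

# Assumption: a page number is a line holding only 1-4 digits, optionally
# padded with spaces or tabs. The trailing optional group also removes one
# blank line after it, so a blank/number/blank run collapses to one blank line.
PAGE_NUMBER_LINE = re.compile(
    r"^[ \t]*\d{1,4}[ \t]*(?:\n|\Z)(?:[ \t]*\n)?",
    re.MULTILINE,
)


def strip_page_number_lines(text: str) -> str:
    """Remove every line that contains nothing but a short number."""
    return PAGE_NUMBER_LINE.sub("", text)


if __name__ == "__main__":
    sample = "End of a paragraph.\n\n16\n\nStart of the next paragraph.\n"
    print(strip_page_number_lines(sample))
```

To use it on a real file, read the TXT output in text mode (which turns Windows `\r\n` line endings into `\n`), pass the contents through `strip_page_number_lines`, and write the result to a new file. RTF would need a different pattern, since its text is interleaved with RTF markup.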

Many thanks!

Sincerely,

Blaine

Last edited by BetterRed; 06-18-2020 at 06:52 PM. Reason: Merge note
06-16-2020, 09:29 AM #2
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,501
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
You seem to have an unusual ePub that has the page numbers of the original book in the ePub text.

I suspect that if you look at the HTML in the ePub, you'll be able to spot the numbers and remove them with a simple search/replace.
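A sketch of that search/replace, assuming the page numbers appear in the HTML as a `<p>` or `<div>` holding nothing but the number (for example `<p class="pg">16</p>`, where the class name is invented; check how your book actually marks them). The same pattern could be tried in the regex search mode of calibre's Editor; the Python below instead applies it to every HTML file inside the EPUB and writes a new copy. The helper names are hypothetical, and it assumes UTF-8 content files.

```python
import re
import zipfile
from typing import Tuple

# Assumed markup: <p ...>16</p> or <div ...>16</div>, possibly with spaces
# around the number. \b stops "<p" from also matching tags such as <pre>.
PAGE_NUMBER_ELEMENT = re.compile(
    r"<(p|div)\b[^>]*>\s*\d{1,4}\s*</\1>\s*",
    re.IGNORECASE,
)

HTML_SUFFIXES = (".xhtml", ".html", ".htm")


def remove_page_number_elements(markup: str) -> Tuple[str, int]:
    """Return the cleaned markup and how many elements were removed."""
    return PAGE_NUMBER_ELEMENT.subn("", markup)


def clean_epub(src: str, dst: str) -> int:
    """Copy the EPUB at src to dst with page-number elements removed."""
    removed = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():  # keeps the original entry order
            data = zin.read(item.filename)
            if item.filename.lower().endswith(HTML_SUFFIXES):
                text, count = remove_page_number_elements(data.decode("utf-8"))
                removed += count
                data = text.encode("utf-8")
            # EPUB requires the "mimetype" entry to be stored uncompressed.
            method = (zipfile.ZIP_STORED if item.filename == "mimetype"
                      else zipfile.ZIP_DEFLATED)
            zout.writestr(item, data, compress_type=method)
    return removed
```

For example, `clean_epub("book.epub", "book-clean.epub")` returns the number of elements removed (both file names are placeholders). Run it on a copy, then convert the cleaned EPUB to TXT or RTF as before.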
06-16-2020, 07:59 PM #3
BetterRed
null operator (he/him)
Posts: 20,565
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by pdurrant
You seem to have an unusual ePub that has the page numbers of the original book in the ePub text.

I suspect that if you look at the HTML in the ePub, you'll be able to spot the numbers and remove them with a simple search/replace.
↑ ↑ ↑ ✔

The ebook may have started life as ink-on-paper or PDF, which was put through an OCR scanner to create text, which in turn was converted to EPUB without first removing the print artefacts, such as page numbers, footnotes, etc.

BR
06-17-2020, 03:00 AM #4
Blaineoreski
Zealot
Posts: 110
Karma: 16268
Join Date: Apr 2020
Device: none
Hi pdurrant & BetterRed,

Got it! Yep. Thanks.

Sincerely,

Blaine
06-17-2020, 03:05 AM #5
Blaineoreski
Zealot
Posts: 110
Karma: 16268
Join Date: Apr 2020
Device: none
800-Page Long Book: After Conversion Every Line Has Words Stuck Together

Hi,

Using EPUB and even PDF as the source, I've converted to:

docx
text
rtf

It seems like every line has at least one pair of words that needed a space but didn't get one:

eighteenthcentury

betterthan

atleast

I have access to different versions of the book: EPUB and Adobe PDF. Still, converting to any of the three formats above, I get the same stuck-together-words issue.

What do you think is happening?

Sincerely,

Blaine
06-17-2020, 07:47 AM #6
jhowell
Grand Sorcerer
Posts: 6,496
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
It sounds like the book was scanned from a paper copy and the OCR conversion was done poorly and was never proofread and corrected.

Another possibility is that the book was converted to PDF and then back to a reflowable format. That sometimes loses word breaks.
06-17-2020, 10:06 AM #7
retiredbiker
Addict
Posts: 387
Karma: 1638210
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
You imply (but don't say) that the original EPUB (or PDF) does not have the problem. You could open the EPUB version in calibre's Editor and look for some of the problem words. If they are run together in the original, then of course you're stuck; but if there is something there giving them space that is not coming over in conversion, you can probably fix it there with a search and replace.
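One way to do that check without opening every file in the Editor, as a Python sketch under stated assumptions: the book is an EPUB whose HTML files are UTF-8, and you pick one joined pair from the output (say `better` and `than`). It searches for the first word followed by the second with only whitespace, tags (such as `</span>`) or entities (such as `&#160;`) in between, and reports what actually sits there. `find_in_epub` and `describe_gap` are hypothetical names.

```python
import re
import zipfile
from typing import Iterator, Tuple

HTML_SUFFIXES = (".xhtml", ".html", ".htm")

# Whitespace, a tag such as </span>, or an entity such as &#160; or &nbsp;.
GAP = r"(?:\s|<[^>]+>|&#?\w+;)*"


def find_in_epub(epub_path: str, left: str, right: str) -> Iterator[Tuple[str, str]]:
    """Yield (file name, raw snippet) for each place where `left` is followed
    by `right` with only whitespace, tags or entities between them."""
    pattern = re.compile(re.escape(left) + GAP + re.escape(right))
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if name.lower().endswith(HTML_SUFFIXES):
                text = epub.read(name).decode("utf-8", errors="replace")
                for match in pattern.finditer(text):
                    yield name, match.group(0)


def describe_gap(snippet: str, left: str, right: str) -> str:
    """Say whether the source has anything at all between the two words."""
    if snippet == left + right:
        return "joined in the source"
    return "something sits between them in the source"
```

For instance, loop over `find_in_epub("book.epub", "better", "than")` (placeholder path) and print each snippet with `describe_gap`. If every hit comes back as joined in the source, the words were already run together in the EPUB; if something sits between them, that markup is a candidate for a search and replace that puts a real space there.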
06-17-2020, 04:49 PM #8
BetterRed
null operator (he/him)
Posts: 20,565
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@Blaineoreski: is this thread referring to the same 'book' as the thread "Convert Epub > RTF or TXT: Any Way To Eliminate Page Numbers?"

If it is, I will merge the two threads; the page numbers, joined words and other issues almost certainly stem from the problems inherent in converting OCR-scanned text from PDFs or printed pages.

Even if you didn't do the original conversion of PDF or scan to what you have, you should read this … before Posting PDF Questions

And this ==>> How to ask a question about conversion problems

BR
06-18-2020, 10:12 AM #9
Blaineoreski
Zealot
Posts: 110
Karma: 16268
Join Date: Apr 2020
Device: none

Hi retiredbiker, jhowell & BetterRed,

Thanks for the replies!

@BetterRed

You wrote:

Quote:
is this thread referring to the same 'book' as the thread "Convert Epub > RTF or TXT: Any Way To Eliminate Page Numbers?"

If it is, I will merge the two threads; the page numbers, joined words and other issues almost certainly stem from the problems inherent in converting OCR-scanned text from PDFs or printed pages.
Yes. Sounds good.

Sincerely,

Blaine
All times are GMT -4.