06-16-2020, 09:24 AM | #1 |
Zealot
Posts: 110
Karma: 16268
Join Date: Apr 2020
Device: none
|
Convert Epub > RTF or TXT: Any Way To Eliminate Page Numbers?
Merged this thread - 800 - Page Long Book: After Conversion Every Line Has Words Stuck Together Post #5
Hi,
When I convert a book from EPUB to either TXT or RTF, the physical page numbers persist. So even though you just have a scrolling screen of text, every 6 paragraphs or so you'll have a line with just a number on it, e.g. 16.
Is there any way to set the conversion options so that the TXT / RTF output leaves out the page numbers? I was thinking it could be done via regular expressions? Or is there another setting that one could rig to eliminate page numbers? Or perhaps there's a built-in feature of the EPUB > TXT / RTF conversion that handles this and that I'm missing.
Many thanks! Sincerely, Blaine
Last edited by BetterRed; 06-18-2020 at 06:52 PM. Reason: Merge note |
06-16-2020, 09:29 AM | #2 |
The Grand Mouse 高貴的老鼠
Posts: 71,501
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
You seem to have an unusual ePub that has the page numbers of the original book in the ePub text.
I suspect that if you look at the HTML in the ePub, you'll be able to spot the numbers and remove them with a simple search/replace. |
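A minimal sketch of the search/replace idea, applied to the converted text rather than the HTML. It assumes the page numbers show up as lines holding nothing but a short number, like the "16" in post #1. The pattern, the four-digit limit, and the function name `strip_page_number_lines` are illustrative assumptions, not a built-in conversion setting.

```python
import re

# Assumption: a stray page number is a line made up only of 1-4 digits,
# optionally padded with spaces or tabs. Adjust the limit for longer books.
PAGE_NUMBER_LINE = re.compile(r"^[ \t]*\d{1,4}[ \t]*(?:\r?\n|\Z)", re.MULTILINE)


def strip_page_number_lines(text: str) -> str:
    """Remove every line that contains nothing but a short number."""
    return PAGE_NUMBER_LINE.sub("", text)


if __name__ == "__main__":
    sample = "End of one page.\n16\nStart of the next page, in 1984.\n"
    print(strip_page_number_lines(sample))
```

Numbers inside a sentence are left alone because the pattern must match the whole line. A legitimate line consisting only of a number (a year used as a heading, say) would be removed too, so it is worth checking a few matches by eye before replacing everything.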
06-16-2020, 07:59 PM | #3 | |
null operator (he/him)
Posts: 20,565
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
The ebook may have started life as ink-on-paper or PDF, which was put through an OCR scanner to create text, which in turn was converted to EPUB without first removing the print artefacts - like page numbers, footnotes etc. BR |
|
06-17-2020, 03:00 AM | #4 |
Zealot
Posts: 110
Karma: 16268
Join Date: Apr 2020
Device: none
|
Hi pdurrant & BetterRed,
Got it! Yep. Thanks. Sincerely, Blaine |
06-17-2020, 03:05 AM | #5 |
Zealot
Posts: 110
Karma: 16268
Join Date: Apr 2020
Device: none
|
800 - Page Long Book: After Conversion Every Line Has Words Stuck Together
Hi,
I'm converting from EPUB, and even PDF, to DOCX, TXT and RTF. It seems like nearly every line has words that need spaces but didn't get them:
eighteenthcentury
betterthan
atleast
I've got access to different versions of the book: Epub & Adobe. Still, going into any of the above 3 formats I get the same stuck-together-words issue.
What do you think is happening? Sincerely, Blaine |
06-17-2020, 07:47 AM | #6 |
Grand Sorcerer
Posts: 6,496
Karma: 84420419
Join Date: Nov 2011
Location: Tampa Bay, Florida
Device: Kindles
|
It sounds like the book was scanned from a paper copy and the OCR conversion was done poorly and was never proofread and corrected.
Another possibility is that the book was converted to PDF and then back to a reflowable format. That sometimes loses word breaks. |
06-17-2020, 10:06 AM | #7 |
Addict
Posts: 387
Karma: 1638210
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
You imply (but don't say) that the original epub (or pdf) does not have the problem. You could open the epub version in the Editor and look for some of the problem words. If they are run together in the original, of course you're stuck, but if there is something there giving them space, that is not coming over in conversion, you can probably fix it there with a search and replace.
|
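One way to run the check described above without opening the book by hand is sketched below. It relies on the fact that an EPUB is a ZIP archive of (X)HTML files, and reports whatever raw markup sits between the two halves of a problem word. The function name `separators_between`, the 80-character gap limit, and the file-extension filter are assumptions made for illustration.

```python
import re
import zipfile


def separators_between(epub_path, first, second, max_gap=80):
    """List (file name, raw text found between `first` and `second`).

    An empty separator means the words are already joined in the source.
    Anything else (a tag, an entity, a line break) is what was providing
    the space before conversion.
    """
    pattern = re.compile(
        re.escape(first) + r"(.{0,%d}?)" % max_gap + re.escape(second),
        re.DOTALL,
    )
    found = []
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            # Assumption: the book's text lives in files with these extensions.
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            markup = book.read(name).decode("utf-8", errors="replace")
            for match in pattern.finditer(markup):
                found.append((name, match.group(1)))
    return found


if __name__ == "__main__":
    # "book.epub" is a placeholder path.
    for name, sep in separators_between("book.epub", "eighteenth", "century"):
        print(name, repr(sep))
```

If the separator comes back as an empty string, the words are run together in the original and there is nothing for a conversion setting to preserve. If it comes back as something like a tag or an entity, that is the item to target with a search and replace.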
06-17-2020, 04:49 PM | #8 |
null operator (he/him)
Posts: 20,565
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Blaineoreski - is this thread referring to the same 'book' as this thread Convert Epub > RTF or TXT: Any Way To Eliminate Page Numbers??
If it is, I will merge the two threads. The page numbers, joined words and other issues almost certainly stem from the problems inherent in converting OCR-scanned text from PDFs or printed pages.
Even if you didn't do the original conversion of the PDF or scan to what you have, you should read this … before Posting PDF Questions
And this ==>> How to ask a question about conversion problems
BR |
06-18-2020, 10:12 AM | #9 | |
Zealot
Posts: 110
Karma: 16268
Join Date: Apr 2020
Device: none
|
Hi retiredbiker & Jhowell & BetterRed
Thanks for the replies! @BetterRed You wrote: Quote:
Sincerely, Blaine |
|