#1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2016
Device: iPad Mini
|
PDF to EPUB problem
I recently tried converting a PDF of a book into an ebook. In the result, every page of the PDF appeared in the EPUB almost like a picture, and then the text was poorly copied into choppy lines in the EPUB. How do I fix this?
|
#2 |
The Grand Mouse 高貴的老鼠
Posts: 73,886
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
|
#3 |
Evangelist
Posts: 448
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
PDF to Epub Process
It sounds like you may have a PDF with a bad text layer. Here is what I use to convert these miserable things:
PDF to EPUB Conversion Using Calibre and LibreOffice Writer (Short version)

1. Test your PDF for a reasonable text layer. Open it in a non-Adobe reader and try to copy-paste a paragraph or two into LibreOffice. If it at least gets the words in the right order, carry on.

2. Use Calibre to convert your PDF to RTF. Settings:
Look & Feel: Smarten Punctuation (optional; LibreOffice AutoCorrect will do this too). All the regex below uses curly quotes; if you use straight (typewriter) quotes, you will have to change it.
Heuristic Processing: enable it, and set the UnWrap factor between 30-50%, less for a lot of short lines, more for dense text. This is a trial-and-error process.
PDF Input: put in the same UnWrap factor as above.
Convert to RTF.

3. Open the RTF in LibreOffice and clean up some basics.
NOTE: Use the Alternative Search and Replace plug-in for all this. The native LibreOffice Writer find and replace won’t work. Google for it if you don’t have it. Make sure “Match Case” is checked for most of this, too.
Immediately save in Open Document format (.odt).
Get rid of initial spaces:
Find: ^ (that’s ^space)
Replace with nothing
Get rid of all tabs:
Find: \t
Replace with nothing

4. Optional. Highlight the whole text, change it to Default paragraph style, and run Tools/AutoCorrect. Check the AutoCorrect options first. Work on a copy until you know what you are doing. AutoCorrect can also un-wrap paragraphs; if you already did this with Calibre, don’t do it again.

5. Highlight the whole text and set the paragraph style to what you want for most of the text, usually Text Body or First Line Indent.

6. Find all the chapter headings and make them Heading 1 style. Look in the Navigator. If they don’t appear, check that Tools/Outline Numbering has level 1 set to Heading 1. When done, these chapters will become the book’s HTML files, and the headings will be your table of contents. Use the Navigator to move through the book as you work.

7. Get rid of any page headers and footers:
Find: "Title.*\d+" or "\d+.*Author" (and similar; depends on your text)
Replace with nothing
Finds page headers and footers. This will leave “holes” in the text all over the place, but the tools below will fix that. Headers and footers may span paragraphs and need several different runs; every book is different.

8. Scan the text for common problems, and use the tools below to fix them.
General warning: test what you want to do on a small section, and be very careful with Replace All. Save often, and save to new versions, especially before a Replace All.

Find: ^$
Replace with nothing
Removes empty paragraphs. If the text uses empty paragraphs for scene breaks, find and fix them first. Use paragraph styles to control spacing, not empty paragraphs.

Find: ([:lower:])$
Replace: \1 (that’s \1space)
Finds paragraphs ending in a lower case letter and joins each to the next line. This is a major workhorse and will usually improve the text tremendously.

Find: ,”$ ?”$ !”$ Mr.$ Mrs.$ Ms.$ Dr.$
Common bad paragraph split points, a frequent source of bad paragraphing in Calibre’s RTF conversion. Find them, then usually fix by hand.

Find: ^([:upper:]).*”$
Starts with an upper case letter, ends with a quote. Scan through using this, look for the problems, and fix them one at a time.

Find: ^“.*[^”]$
Starts with a quote, ends without one. Another way to look for errors.

Find: (“) (”)
Replace: \1\p\2
Fixes doubled quotes in a paragraph. Some conversions have lots, others just a few. It depends on the conversation in the text and the UnWrap factor you used. Try it also without the space: (“)(”)

Find: (“[^“]*”)
Finds quoted strings. Useful for checking sections with complex quotation marks, quotes within quotes, etc.

Find: (.*)$
Replace: \1 (that’s \1space)
Puts single lines into one paragraph. There must be empty paragraphs between the final paragraphs you want. If you have a text or section where each line is a paragraph, this will fix it. Be very careful until you have practised with this.

If you get a lot of double spaces, fix them at the end with a regular find & replace.

9. Fix punctuation as needed or desired. Common needed characters here for a quick copy-paste: En Dash – ; Em Dash — ; Ellipsis … ; § ; ⦁ ; á à â ç é è ê ï ñ õ ô ; F-ligature fi ; Quotes “ ” ‘ ’

10. If you haven’t already, format any front matter and end matter pages, usually by hand. If you do it at the end here, the above tools won’t clobber it. The same goes for quoted passages, verse, representations of signs, letters, notes, headlines, and any other special material you may have.

11. Run spell check. There are likely to be some or many words with dashes in them from the original end-of-line hyphenation.

12. When you are happy with it, save one last time, go to Calibre, Add Book, point to your .odt file, and add it. Add or fix any metadata as you wish. Then convert the .odt to EPUB. Done.

A much expanded version with more explanation is uploaded to Libgen. Look for author R. Frobnitz; both EPUB and RTF formats are there.

Last edited by retiredbiker; 11-01-2016 at 03:51 PM. Reason: clarified a point |
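For anyone who would rather script a few of these passes than click through them, here is a minimal Python sketch of some of the step 3 and step 8 cleanups: tabs, leading spaces, empty paragraphs, joining paragraphs that end in a lower case letter, and flagging the common bad split points. It is not part of the workflow above and not what the plug-in does internally. It assumes the text has been exported as plain text with one paragraph per line, and the function names `clean_paragraphs` and `find_suspicious_splits` are made up for illustration.

```python
import re

# Paragraph endings that often mark a bad split (step 8): ,” ?” !” Mr. Mrs. Ms. Dr.
# Assumes curly quotes, as in the regexes above.
SUSPICIOUS_END = re.compile(r"(?:[,?!]”|\b(?:Mr|Mrs|Ms|Dr)\.)$")


def clean_paragraphs(text: str) -> str:
    """Apply a few of the cleanup passes to text with one paragraph per line."""
    paragraphs = text.split("\n")

    # Step 3: remove all tabs, then any leading spaces.
    paragraphs = [p.replace("\t", "").lstrip(" ") for p in paragraphs]

    # Step 8: remove empty paragraphs (fix scene breaks by hand first).
    paragraphs = [p for p in paragraphs if p != ""]

    # Step 8: a paragraph ending in a lower case letter is joined to the next.
    joined = []
    for p in paragraphs:
        if joined and joined[-1][-1:].islower():
            joined[-1] = joined[-1] + " " + p
        else:
            joined.append(p)

    # Final pass: collapse runs of spaces left over from the joins.
    joined = [re.sub(r" {2,}", " ", p) for p in joined]
    return "\n".join(joined)


def find_suspicious_splits(paragraphs: list[str]) -> list[int]:
    """Return indexes of paragraphs ending at a common bad split point."""
    return [i for i, p in enumerate(paragraphs) if SUSPICIOUS_END.search(p)]


if __name__ == "__main__":
    sample = "  The quick brown\n\tfox jumped over\n\nthe lazy dog.\nNext paragraph."
    print(clean_paragraphs(sample))
    print(find_suspicious_splits(["“Well,”", "he said.", "I met Dr.", "Smith today."]))
```

As with Replace All, run something like this on a copy. The suspicious-split finder only reports positions; deciding whether to join them is still a by-hand job.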
#4 |
Resident Curmudgeon
Posts: 79,654
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
My solution is DON'T BOTHER!
|