10-27-2016, 03:23 AM   #1
orbnas
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2016
Device: iPad Mini
PDF to EPUB problem

I recently tried converting a PDF of a book into an ebook. In the result, every page of the PDF appeared in the EPUB almost like a picture, followed by the text poorly copied into choppy lines. How do I fix this?
10-27-2016, 03:25 AM   #2
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 73,886
Karma: 315126578
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
Perhaps this thread will help.

If you still have problems, ask in the conversions sub-forum.
11-01-2016, 03:49 PM   #3
retiredbiker
Evangelist
Posts: 448
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
PDF to Epub Process

It sounds like you may have a PDF with a bad text layer. Here is what I use to convert these miserable things:

PDF to EPUB Conversion Using Calibre and LibreOffice Writer (Short version)

1. Test your PDF for a reasonable text layer. Open it in a non-Adobe reader and try to copy and paste a paragraph or two into LibreOffice. If it at least gets the words in the right order, carry on.

2. Use Calibre to convert your PDF to RTF. Settings:
Look & Feel – Smarten Punctuation (optional, LibreOffice AutoCorrect will do this too). All the regex below has curly quotes – if you use straight (typewriter) quotes, you will have to change it.

Heuristic Processing – Enable it, and set UnWrap factor between 30-50%, less for a lot of short lines, more for dense text. Trial & error process.

PDF Input – put in same UnWrap factor as above.

Convert to RTF
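
If you would rather script step 2 than click through the dialog, something like the sketch below may work. It only builds the argument list for Calibre's ebook-convert command-line tool. The function name and defaults are made up for illustration, and I am assuming the command line takes the UnWrap factor as a fraction (0.4 for 40%), so check the flags against the ebook-convert documentation for your Calibre version before relying on it.

```python
def build_convert_command(pdf_path, rtf_path, unwrap_factor=0.4, smarten=True):
    """Build an ebook-convert argument list for the PDF -> RTF step.

    unwrap_factor is assumed to be a fraction between 0 and 1
    (the 30-50% range in the post would be 0.3 to 0.5).
    """
    if not 0.0 <= unwrap_factor <= 1.0:
        raise ValueError("unwrap_factor must be between 0 and 1")
    cmd = [
        "ebook-convert", pdf_path, rtf_path,
        "--enable-heuristics",                       # Heuristic Processing: enable
        "--html-unwrap-factor", str(unwrap_factor),  # Heuristic Processing: unwrap factor
        "--unwrap-factor", str(unwrap_factor),       # PDF Input: same unwrap factor
    ]
    if smarten:
        cmd.append("--smarten-punctuation")          # Look & Feel: optional
    return cmd

# To actually run it (requires Calibre on your PATH):
#   import subprocess
#   subprocess.run(build_convert_command("book.pdf", "book.rtf"), check=True)
```

Building the list separately from running it makes the trial-and-error part easier: change the factor, rerun, compare the RTF.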

3. Open the RTF in LibreOffice and clean up some basics:

NOTE: Use the Alternative Search and Replace plug-in for all this. The native LibreOffice Writer find and replace won’t work. Google for it if you don’t have it. Make sure “Match Case” is checked for most of this, too.

Immediately save in Open Document format - .odt

Get rid of initial spaces; Find ^ (that’s ^space) Replace with nothing

Get rid of all tabs: Find \t Replace with nothing
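
For anyone who wants to see what those two replacements do outside LibreOffice, here is a rough Python equivalent working on plain text. It is only a sketch: the function name is made up, it removes every leading space on a line rather than just one, and it strips tabs first so that a line starting with a tab followed by spaces also gets cleaned.

```python
import re

def strip_leading_spaces_and_tabs(text):
    """Remove all tabs, then any spaces at the start of each line."""
    text = text.replace("\t", "")                       # Find \t, replace with nothing
    return re.sub(r"^ +", "", text, flags=re.MULTILINE)  # Find ^space, replace with nothing

sample = "  Hello\n\tWorld\there\n \t x"
print(strip_leading_spaces_and_tabs(sample))
```

Note that tabs inside a line are deleted too, not turned into spaces, which is what the Find \t rule does.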

4. Optional. Highlight the whole text, change to Default paragraph style, and run Tools/AutoCorrect. Check the AutoCorrect options first. Work on a copy until you know what you are doing. AutoCorrect can also un-wrap paragraphs – if you already did this with Calibre, don’t do it again.

5. Highlight the whole text and make the paragraph style what you want for most of the text – usually Text Body or First Line Indent.

6. Find all the chapter headings and make them Heading 1 style. Look in the Navigator. If they don’t appear, check that Tools/Outline Numbering has level 1 set to Heading 1. When done, these chapters will become the book’s HTML files, and the headings will be your table of contents. Use Navigator to navigate through the book as you work.

7. Get rid of any page headers and footers:

"Title.*\d+" or "\d+.*Author" (and similar, depends on your text) Finds page headers & footers – replace with nothing. This will leave “holes” in the text all over the place, but the tools below will fix that. May span paragraphs and need several different runs...every book is different.
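
The same idea in Python, as a sketch on plain text. The function name is hypothetical, and unlike the patterns above it only matches when the header or footer sits on a line of its own, so it is more conservative. It leaves an empty line where each header or footer was, which matches the "holes" mentioned above.

```python
import re

def remove_running_heads(text, title, author):
    """Blank out lines that look like 'Title ... 123' or '123 ... Author'."""
    title_re = re.escape(title)    # escape in case the title contains regex characters
    author_re = re.escape(author)
    pattern = re.compile(
        rf"^[ \t]*(?:{title_re}.*\d+|\d+.*{author_re})[ \t]*$",
        re.MULTILINE,
    )
    return pattern.sub("", text)

# Made-up sample text for illustration only
page = "He sailed on\nMoby Dick 42\ninto the night.\n43 Herman Melville\nNext day"
print(remove_running_heads(page, "Moby Dick", "Herman Melville"))
```

Every book is different, so expect to adjust the pattern and to check the matches before deleting anything.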

8. Scan the text for common problems, and use the tools below to fix them:
General warning: test what you want to do on a small section – be very careful of Replace All. Save often, and save to new versions, especially for a Replace All.

Find: ^$ Replace with nothing Removes empty paragraphs. If the text uses empty paragraphs for scene breaks, find and fix them first. Use paragraph styles to control spacing, not empty paragraphs.

Find: ([:lower:])$ Replace with \1 (that’s \1space) Finds paragraphs ending in a lower case letter and joins them to the next line. This is a major workhorse and will usually improve the text tremendously.

Find: ,”$ ?”$ !”$ Mr.$ Mrs.$ Ms.$ Dr.$ Common paragraph bad split points. A common source of bad paragraphing in Calibre’s RTF conversion. Find, then fix by hand, usually.

Find: ^([:upper:]).*”$ Starts Upper case, end with a quote. Scan through using this, look for the problems, fix them one at a time.

Find: ^“.*[^”]$ Starts with quote, ends without one. Another way to look for errors.

Find: (”) (“) Replace: \1\p\2 Fix doubled quotes in a paragraph, where one speech ends and the next begins without a paragraph break. Some conversions have lots, others just a few. Depends on the conversation in the text and the UnWrap factor you used. Try also without the space: (”)(“)

Find: (“[^“]*”) Finds quoted strings. Useful for checking sections of complex quotations marks, quotes within quotes, etc.

Find: (.*)$ Replace: \1 (that’s \1space) Puts single lines into one paragraph. Empty paragraphs must be in between the final paragraphs you want. If you have a text or section with “each line is a paragraph”, this will fix it. Be very careful until you have practised with this. If you get a lot of double spaces, fix them at the end, regular find & replace.
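
To show the logic behind three of these tools, here is a small Python sketch that treats the book as a list of paragraphs. It is not what LibreOffice does internally, the function names are made up, and it assumes curly quotes as everywhere else in this post. It drops empty paragraphs, joins a paragraph ending in a lower case letter to the one after it, and splits a paragraph wherever a closing quote is followed directly by an opening quote.

```python
import re

def drop_empty_paragraphs(paragraphs):
    """Same effect as Find ^$ / replace with nothing."""
    return [p for p in paragraphs if p.strip()]

def join_lowercase_endings(paragraphs):
    """Join a paragraph ending in a lower case letter to the next one."""
    joined = []
    for para in paragraphs:
        if joined and joined[-1][-1:].islower():
            joined[-1] = joined[-1] + " " + para  # the "\1space" replacement
        else:
            joined.append(para)
    return joined

def split_adjacent_speeches(paragraph):
    """Split where a closing quote meets an opening quote (Python 3.7+)."""
    return re.split(r"(?<=”) ?(?=“)", paragraph)

lines = ["It was a dark and", "", "stormy night.", "The rain fell", "in torrents."]
print(join_lowercase_endings(drop_empty_paragraphs(lines)))
print(split_adjacent_speeches("“Hello.” “Hi there.”"))
```

Run the empty-paragraph step first, otherwise an empty paragraph in the middle of a broken sentence stops the join.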

9. Fix punctuation as needed or desired. Common needed characters here for a quick copy-paste:
En Dash – ; Em Dash — ; Ellipsis …; § ; ⦁ ; á à â ç é è ê ï ñ õ ô; F-ligature ﬁ; Quotes “ ” ‘ ’

10. If you haven’t already, format any front matter and end matter pages – usually by hand – if you do it at the end here, the above tools won’t clobber it. Same with quoted passages, verse, representations of signs, letters, notes, headlines – all that special stuff you may have.

11. Run spell check. There will likely be some, or many, words with hyphens in them left over from the original end-of-line hyphenation.
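
If spell check is too slow for a long book, a quick way to get a list of suspects is to count every hyphenated word in a plain-text copy. This is only a sketch with a made-up function name. It cannot tell a leftover like "conver-sion" from a legitimate "well-known", so the list still has to be reviewed by hand.

```python
import re
from collections import Counter

def hyphenated_words(text):
    """Count every word that contains one or more internal hyphens."""
    return Counter(re.findall(r"\b\w+(?:-\w+)+\b", text))

sample = "The conver-sion was well-known; the conver-sion failed."
for word, count in hyphenated_words(sample).most_common():
    print(word, count)
```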

12. When you are happy with it, save one last time, go to Calibre, Add Book, point to your .odt file, and add it. Add or fix any metadata as you wish. Then convert the .odt to EPUB. Done.

A much expanded version with more explanation is uploaded to Libgen. Look for author R. Frobnitz -- both epub and rtf formats are there.

Last edited by retiredbiker; 11-01-2016 at 03:51 PM. Reason: clarified a point
11-01-2016, 04:13 PM   #4
JSWolf
Resident Curmudgeon
Posts: 79,654
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
My solution is DON'T BOTHER!

All times are GMT -4.