|  08-19-2019, 09:42 PM | #1 | 
| null operator (he/him)            Posts: 22,006 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none | 
				
				PDF to ePub conversion
			  In the OP's last post he wrote "I converted the book from PDF with Sigil." Is that possible, and if so, how? @joebob2 - most PDFs are created from something else - I've only known one person who wrote PostScript on a clean slate. PDFs typically start life as WP or DTP files from programs such as Word, InDesign, Writer, etc. If you can get hold of such a file, that might be a better place to start. BR Last edited by BetterRed; 08-20-2019 at 01:00 AM. | 
|   |   | 
|  08-19-2019, 11:37 PM | #2 | 
| Bibliophagist            Posts: 47,971 Karma: 174315100 Join Date: Jul 2010 Location: Vancouver Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos | 
			
			I don't think it is possible to convert PDF to epub using Sigil.  I did run into one author who attempted to copy/paste pages from one of her old books into BookView, as a PDF was the only electronic format for that book she was able to obtain when the rights reverted.  The output of that was a right mess.
		 | 
|   |   | 
|  08-20-2019, 12:59 AM | #3 | |
| null operator (he/him)            Posts: 22,006 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none | Quote: 
  PDF conversion must be amongst the top 5 topics at MR. BR | |
|   |   | 
|  08-20-2019, 11:36 AM | #4 | |
| Member  Posts: 12 Karma: 10 Join Date: Jun 2019 Device: epub | 
				
				Conversion maze
			 Quote: 
 On the Smashwords site it talks about a "nuclear option," i.e. copy and paste the entire document into a Word document and re-convert it. I'm tinkering enough right now, I may go that direction. | |
|   |   | 
|  08-20-2019, 12:35 PM | #5 | 
| Wizard            Posts: 1,086 Karma: 6719822 Join Date: Jul 2012 Device: Palm Pilot M105 | 
			
			That's what I've done when I've "transcribed" a short story from an old magazine when the PDF scans are on archive.org.  Exceedingly tedious.  In that case it probably has more errors, since the magazine has faded, the paper's brown, and the typesetting can be dodgy.
		 | 
|   |   | 
|  08-20-2019, 12:59 PM | #6 | 
| Grand Sorcerer            Posts: 6,266 Karma: 16544702 Join Date: Sep 2009 Location: UK Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3 | |
|   |   | 
|  08-20-2019, 07:40 PM | #7 | ||||
| Wizard            Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook | Quote: 
 Quote: 
 Quote: 
 Do you happen to know which version of Quark it used? (And ~ when this book was published?) I only worked on one QXD file many years ago, and surprisingly, LibreOffice was able to open it. It still required a lot of elbow grease, but it was a huge step up from having to OCR from scratch. Quote: 
 You lose all the important formatting information (bold/italics/superscript), and that underneath-the-surface markup is just as important as the text itself. And depending on how the PDF was put together, the copy/paste itself might introduce a massive number of issues as well (like the hard hyphens issue you mentioned). You'll spend more time cleaning up all those errors than if you just worked from much cleaner OCR in the first place. Last edited by Tex2002ans; 08-20-2019 at 07:52 PM. | ||||
|   |   | 
|  08-20-2019, 07:45 PM | #8 | |
| null operator (he/him)            Posts: 22,006 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none | Quote: 
 You can open PDF files directly in MS Word 2016/19, and the result can be surprisingly good - but I suspect that's because the documents I'm thinking of were originally typed into Word by someone who didn't regard it as a Remington portable. An ex-QuarkXPress PDF might not fare so well. BR | |
|   |   | 
|  08-20-2019, 07:50 PM | #9 | 
| null operator (he/him)            Posts: 22,006 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none | |
|   |   | 
|  08-22-2019, 12:16 PM | #10 | ||||
| Member  Posts: 12 Karma: 10 Join Date: Jun 2019 Device: epub | 
				
				Grinding through it
			 
			
			No, I've already invested significant time in cleaning up the PDF.  The epub version is pretty close to where I want it, but it has all these technical issues that the validators don't like.   Quote: 
 Quote: 
 Quote: 
 Quote: 
 What's surprising to me is that there are all these great conversion utilities, yet nothing that addresses the validator errors. Thanks again for all the help. I'll keep plugging on this. | ||||
|   |   | 
|  08-22-2019, 02:31 PM | #11 | 
| Member  Posts: 12 Karma: 10 Join Date: Jun 2019 Device: epub | 
				
				Wading through the errors
			 
			
			I'm still awaiting moderation for another post, but in the meantime I've tried tackling the errors as a group.  Here's what I've found, plus a question:
			I was getting errors because the HTML files had the extension .html rather than .xhtml. Consolidating the files (thanks, Calibre) made that a simple matter, so those are gone.
			Next, I have many instances of this error:
			Error while parsing file: element "h3" not allowed here; expected the element end-tag, text or element "a", "abbr", "area", "audio", "b", "bdi", "bdo", "br", "button", "canvas", "cite", "code", "command", "datalist", "del", "dfn", "em", "embed", "epub:switch", "i", "iframe", "img", "input", "ins", "kbd", "keygen", "label", "link", "map", "mark", "meta", "meter", "ns1:math", "ns2:svg", "object", "output", "progress", "q", "ruby", "s", "samp", "script", "select", "small", "span", "strong", "sub", "sup", "textarea", "time", "u", "var", "video" or "wbr" (with xmlns:ns1="http://www.w3.org/1998/Math/MathML" xmlns:ns2="http://www.w3.org/2000/svg")
			In the code, I saw that the italic tags were outside the h3 tags, like this: Code: <i> <h3 id="sigil_toc_id_106">August 17, 1984 </h3> </i> | 
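The validator message in the post above says that a heading cannot sit inside an inline element such as `<i>`; the italic tags need to go inside the `<h3>` instead. A minimal sketch of that swap in Python follows, assuming every offending heading has exactly the simple shape quoted in the post. The function name `move_italics_inside_heading` and the regex are illustrative assumptions, not something from the thread or from Sigil, and anything more complex than this pattern is left untouched.

```python
import re

# Matches an <i> element whose only content is a single heading (h1-h6),
# as in the snippet quoted in the post. This is a deliberately narrow pattern.
WRAPPED_HEADING = re.compile(
    r"<i>\s*<(h[1-6])(\s[^>]*)?>(.*?)</\1>\s*</i>",
    re.DOTALL,
)

def move_italics_inside_heading(xhtml):
    """Rewrite <i><hN ...>text</hN></i> as <hN ...><i>text</i></hN>."""
    def swap(match):
        tag = match.group(1)            # e.g. "h3"
        attrs = match.group(2) or ""    # e.g. ' id="sigil_toc_id_106"'
        text = match.group(3).strip()   # heading text without stray spaces
        return f"<{tag}{attrs}><i>{text}</i></{tag}>"
    return WRAPPED_HEADING.sub(swap, xhtml)

sample = '<i> <h3 id="sigil_toc_id_106">August 17, 1984 </h3> </i>'
print(move_italics_inside_heading(sample))
# prints: <h3 id="sigil_toc_id_106"><i>August 17, 1984</i></h3>
```

The `id` attribute is carried over unchanged, so any table-of-contents entry that points at it should still resolve. Work on a copy of the epub and re-run the validator afterwards, since a regex cannot see nesting that differs from this one pattern.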
|   |   | 
|  08-22-2019, 06:44 PM | #12 | |
| Resident Curmudgeon            Posts: 80,671 Karma: 150249619 Join Date: Nov 2006 Location: Roslindale, Massachusetts Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 | Quote: 
 | |
|   |   | 
|  08-22-2019, 07:13 PM | #13 | |
| Member  Posts: 12 Karma: 10 Join Date: Jun 2019 Device: epub | 
				
				Technical cleanup?
			 Quote: 
 Ideas welcome! | |
|   |   | 
|  08-22-2019, 09:05 PM | #14 | 
| Member  Posts: 12 Karma: 10 Join Date: Jun 2019 Device: epub | 
			
			Just following up.  It appears that I've resolved many of those errors just by looking at the HTML format.  However, now I'm getting errors in files related to the TOC.  This is the only error type I have:
			Code:
			Type   File            Line  Position  Message
			ERROR  OEBPS/toc.ncx   23    63        Fragment identifier is not defined.
			Code:
			        </navLabel>
			        <content src="Text/Pt0_Intro.xhtml#sigil_toc_id_282"/>
			      </navPoint>
			Good night! | 
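"Fragment identifier is not defined" means the part after `#` in a `content src` (here `sigil_toc_id_282`) does not match any `id` attribute in the target file, which can happen when headings are edited or merged after the TOC was built. A rough sketch for listing every such dangling entry is below. It assumes the NCX is well-formed XML and that the XHTML files are supplied as a dictionary keyed by the path written in the NCX. The helper name `find_broken_fragments`, the sample NCX, and the sample file content are all hypothetical and only there to illustrate the check.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by NCX files (EPUB 2 table of contents).
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"

# Approximate scan for id="..." attributes; a regex is used so that XHTML
# with named entities does not have to be parsed as strict XML.
ID_ATTR = re.compile(r'(?<![\w:-])id\s*=\s*["\']([^"\']+)["\']')

def find_broken_fragments(ncx_text, documents):
    """Return NCX content src values whose #fragment has no matching id.

    `documents` maps a path as written in the NCX
    (e.g. "Text/Pt0_Intro.xhtml") to that file's text.
    """
    broken = []
    root = ET.fromstring(ncx_text)
    for content in root.iter(NCX_NS + "content"):
        src = content.get("src", "")
        path, _, fragment = src.partition("#")
        if not fragment:
            continue  # link to the top of a file; nothing to check
        ids = set(ID_ATTR.findall(documents.get(path, "")))
        if fragment not in ids:
            broken.append(src)
    return broken

# Hypothetical sample data: the TOC points at an id the file no longer has.
ncx = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel><text>Introduction</text></navLabel>
      <content src="Text/Pt0_Intro.xhtml#sigil_toc_id_282"/>
    </navPoint>
  </navMap>
</ncx>"""
docs = {"Text/Pt0_Intro.xhtml": '<h3 id="sigil_toc_id_106">August 17, 1984</h3>'}

print(find_broken_fragments(ncx, docs))
# prints: ['Text/Pt0_Intro.xhtml#sigil_toc_id_282']
```

Each reported entry can then be fixed either by restoring the missing `id` on the heading or by pointing the TOC entry at an `id` that does exist.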
|   |   | 
|  08-22-2019, 11:21 PM | #15 | |
| Wizard            Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook | Quote: 
 There are professionals around here... Did you try opening the QXD with LibreOffice? Does it open it? Or is the QXD created in a newer Quark? | |
|   |   | 
|  | 
| 
 | 