#1 |
null operator (he/him)
Posts: 21,625
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
PDF to ePub conversion
In the OP's last post he wrote "I converted the book from PDF with Sigil." Is that possible - if so how?

@joebob2 - most PDFs are created from something else - I've only known one person who wrote PostScript on a clean slate. They typically start life as WP or DTP files from programs such as Word, InDesign, Writer etc. If you can get hold of such a file that might be a better place to start.

BR

Last edited by BetterRed; 08-20-2019 at 01:00 AM.
#2 |
Bibliophagist
Posts: 44,802
Karma: 168802811
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
I don't think it is possible to convert PDF to epub using Sigil. I did run into one author who attempted to copy/paste pages from one of her old books into BookView as that was the only electronic format for that book she was able to obtain when the rights reverted. The output of that was a right mess.
|
#3 | |
null operator (he/him)
Posts: 21,625
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
PDF conversion must be amongst the top 5 topics at MR.

BR
|
#4 | |
Member
Posts: 12
Karma: 10
Join Date: Jun 2019
Device: epub
|
Conversion maze
Quote:
The Smashwords site talks about a "nuclear option," i.e. copy and paste the entire document into a Word document and re-convert it. I'm tinkering enough right now that I may go in that direction.
|
#5 |
Wizard
Posts: 1,086
Karma: 6719822
Join Date: Jul 2012
Device: Palm Pilot M105
|
That's what I've done when I've "transcribed" a short story from an old magazine when the PDF scans are on archive.org. Exceedingly tedious. In that case it probably has more errors, since the magazine has faded, the paper's brown, and the typesetting can be dodgy.
|
#6 |
Grand Sorcerer
Posts: 6,248
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
|
|
#7 | ||||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Quote:
Quote:
Do you happen to know which version of Quark it used? (And roughly when this book was published?)

I only worked on one QXD file many years ago, and surprisingly, LibreOffice was able to open it. It still required a lot of elbow grease, but it was a huge step up from having to OCR from scratch.

Quote:

You lose all the important formatting information (bold/italics/superscript), and what's underneath the surface is just as important as the text itself.

And depending on how the PDF was put together, the copy/paste itself might introduce a massive number of issues as well (like the hard hyphens issue you mentioned). You'll spend more time cleaning up all those errors than if you just worked from much cleaner OCR in the first place.

Last edited by Tex2002ans; 08-20-2019 at 07:52 PM.
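To give a feel for the kind of cleanup that pasted text needs, here is a minimal Python sketch that rejoins words split by a hyphen at the end of a line. It assumes the "hard hyphens" show up as a hyphen followed by a line break and a lowercase continuation, which may not match every PDF. The name `rejoin_hyphenated` is made up for illustration.

```python
import re

# Assumption: a broken word looks like "conver-\nsion", i.e. a hyphen at the
# end of a line followed by a lowercase letter on the next line.
LINE_END_HYPHEN = re.compile(r"(\w)-\n\s*([a-z])")

def rejoin_hyphenated(text: str) -> str:
    """Join words that were split by a hyphen at a line break."""
    return LINE_END_HYPHEN.sub(r"\1\2", text)

sample = "The conver-\nsion introduced hard hy-\nphens."
print(rejoin_hyphenated(sample))
```

Note that this also glues together genuine compounds that happened to break at the hyphen (for example "well-" / "known"), so the result still needs a proofread.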
||||
#8 | |
null operator (he/him)
Posts: 21,625
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
You can open PDF files directly in MS Word 2016/19, and the result can be surprisingly good - but I suspect that's because the documents I'm thinking of were originally typed into Word by someone who didn't regard it as a Remington portable. An ex-QuarkXPress PDF might not fare so well.

BR
|
#9 |
null operator (he/him)
Posts: 21,625
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|
#10 | ||||
Member
Posts: 12
Karma: 10
Join Date: Jun 2019
Device: epub
|
Grinding through it
No, I've already invested significant time in cleaning up the PDF. The epub version is pretty close to where I want it, but it has all these technical issues that the validators don't like.
Quote:
Quote:
Quote:
Quote:
What's surprising to me is that there are all these great conversion utilities, yet nothing that addresses the validator errors. Thanks again for all the help. I'll keep plugging on this. |
||||
#11 |
Member
Posts: 12
Karma: 10
Join Date: Jun 2019
Device: epub
|
Wading through the errors
I'm still awaiting moderation for another post, but in the meantime I've tried tackling the errors as a group. Here's what I've found, plus a question:
I was getting errors because the html files were named html rather than xhtml. Consolidating the files (thanks, Calibre) made that a simple matter. So those are gone.

Next, I have many instances of this error:

Error while parsing file: element "h3" not allowed here; expected the element end-tag, text or element "a", "abbr", "area", "audio", "b", "bdi", "bdo", "br", "button", "canvas", "cite", "code", "command", "datalist", "del", "dfn", "em", "embed", "epub:switch", "i", "iframe", "img", "input", "ins", "kbd", "keygen", "label", "link", "map", "mark", "meta", "meter", "ns1:math", "ns2:svg", "object", "output", "progress", "q", "ruby", "s", "samp", "script", "select", "small", "span", "strong", "sub", "sup", "textarea", "time", "u", "var", "video" or "wbr" (with xmlns:ns1="http://www.w3.org/1998/Math/MathML" xmlns:ns2="http://www.w3.org/2000/svg")

In the code, I saw that Italic tags were outside the H3 tags like this:

Code:
<i>
<h3 id="sigil_toc_id_106">August 17, 1984 </h3>
</i>
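The validator is complaining because `<i>` may only contain inline content, so the heading has to wrap the italics rather than the other way round. One way to fix every instance at once is a small script; the following is only a sketch in Python, assuming the wrapper is always a bare `<i>` around a single heading as in the snippet above. The function name `move_italics_inside_heading` is made up for illustration.

```python
import re

# Matches an <i> element whose only content is a single <h1>-<h6> heading.
# Assumption: the wrapper is a bare <i> with no attributes.
WRAPPED_HEADING = re.compile(
    r"<i>\s*<(h[1-6])([^>]*)>(.*?)</\1>\s*</i>",
    re.DOTALL,
)

def move_italics_inside_heading(xhtml: str) -> str:
    """Rewrite <i><hN ...>text</hN></i> as <hN ...><i>text</i></hN>."""
    def swap(match: re.Match) -> str:
        tag, attrs, body = match.group(1), match.group(2), match.group(3)
        # Keep the heading's attributes (e.g. its id) so TOC links still work.
        return f"<{tag}{attrs}><i>{body.strip()}</i></{tag}>"
    return WRAPPED_HEADING.sub(swap, xhtml)

sample = '<i> <h3 id="sigil_toc_id_106">August 17, 1984 </h3> </i>'
print(move_italics_inside_heading(sample))
# prints: <h3 id="sigil_toc_id_106"><i>August 17, 1984</i></h3>
```

The same pattern could probably be used in a regex search and replace inside an editor instead of a script. Either way, it is worth working on a copy of the book, since a regex has no real understanding of the markup.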
#12 | |
Resident Curmudgeon
Posts: 79,123
Karma: 144284184
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
#13 | |
Member
Posts: 12
Karma: 10
Join Date: Jun 2019
Device: epub
|
Technical cleanup?
Quote:
Ideas welcome! |
|
#14 |
Member
Posts: 12
Karma: 10
Join Date: Jun 2019
Device: epub
|
Just following up. It appears that I've resolved many of those errors just looking at the HTML format. However, now I'm getting errors in files related to the TOC. This is the only error type I have:
Code:
Type   File           Line  Position  Message
ERROR  OEBPS/toc.ncx  23    63        Fragment identifier is not defined.

Code:
</navLabel>
<content src="Text/Pt0_Intro.xhtml#sigil_toc_id_282"/>
</navPoint>

Good night!
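That message generally means the file named before the `#` has no element whose `id` matches the part after it. To list every such dangling TOC entry in one pass, something like the Python sketch below might help. It assumes the caller has already read toc.ncx and the XHTML files into strings; the function name `missing_fragments`, the `documents` mapping, and the "Introduction" label in the sample are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}
# Rough match for id="..." attributes; a regex avoids parse failures on
# XHTML that uses named entities such as &nbsp;.
ID_ATTR = re.compile(r"""\sid\s*=\s*["']([^"']+)["']""")

def missing_fragments(ncx_text: str, documents: dict[str, str]) -> list[str]:
    """Return every toc.ncx src whose #fragment has no matching id.

    `documents` maps paths as written in the NCX (relative to toc.ncx)
    to the text of that XHTML file.
    """
    root = ET.fromstring(ncx_text)
    problems = []
    for content in root.iterfind(".//ncx:content", NCX_NS):
        src = content.get("src", "")
        path, _, fragment = src.partition("#")
        if not fragment:
            continue  # link to the top of a file, nothing to check
        ids = set(ID_ATTR.findall(documents.get(path, "")))
        if fragment not in ids:
            problems.append(src)
    return problems

ncx = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel><text>Introduction</text></navLabel>
      <content src="Text/Pt0_Intro.xhtml#sigil_toc_id_282"/>
    </navPoint>
  </navMap>
</ncx>"""
docs = {"Text/Pt0_Intro.xhtml": '<h3 id="sigil_toc_id_106">August 17, 1984</h3>'}
print(missing_fragments(ncx, docs))
# prints: ['Text/Pt0_Intro.xhtml#sigil_toc_id_282']
```

Each entry it reports is a TOC link whose target id was probably removed or renamed during the HTML cleanup, so either the id needs to go back on the heading or the TOC needs to be rebuilt.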
#15 | |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
There are professionals around here... Did you try opening the QXD with LibreOffice? Does it open it? Or is the QXD created in a newer Quark? |
|