07-06-2018, 05:56 PM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2018
Device: none
|
Problems with pasting content on Sigil
Hello everyone!
I am trying to convert PDF books into EPUB. So far, the best approach I have found for my case is to copy/paste into Sigil and then fix the formatting issues. I had lots of problems in InDesign, which is why I have been doing it this way. Whenever I paste content into Sigil from a PDF, or even from InDesign, I run into two problems:
I would like to know the best way to handle both. Right now, I am copy/pasting, manually deleting newlines, and adding bold or italics wherever required, but this is too manual. Is there a better way of doing this? Thanks a lot for your suggestions! |
07-06-2018, 07:57 PM | #2 |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
The best way is to find the source that was used to make the PDF and use that. PDF is a destination. Trying to use PDF as a source document will always take lots of manual tweaking.
That said... I believe calibre has some pdf conversion utilities. But nothing is going to be the turn-key solution you're probably looking for. |
07-06-2018, 08:29 PM | #3 | |
null operator (he/him)
Posts: 20,568
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
@davidchavez - as of today**, my #1 process for dealing with PDFs is:
It's worth noting that the eBook Tools add-in can save the Word document as an EPUB. If I think a PDF would require a lot of effort to shift into EPUB, I don't bother trying. ** 'as of today' - because my process has changed a number of times over the past several years BR |
|
07-06-2018, 09:59 PM | #4 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Read all the other posts, including Red's. They all have valuable info.
No offense, but you're doing this all backwards. I don't know why on earth you'd actually be copy-pasting from a PDF--the WORST possible source format for an eBook--instead of using INDD's built-in functionality. Export the source file to ePUB and/or HTML, and work with that.
If you're trying to find a push-button way, it doesn't exist. My business has literally done thousands of "PDF conversions" to ebooks, and to this day, our process is long and laborious, and tedious, at best. However, if you have the INDD source files, you're doing make-work.
What you are NOT going to be able to do, however, is do a process that's in "bookview." Or WYSIWYG. Going from INDD->EPUB requires a knowledge of code. It's the only realistic way to go from A-->B. Or, really, from A->F, for all intents and purposes.
You're seeing that oddball line spacing because the PDF "paste" is regarding each and every line as a paragraph. That's why you're seeing it that way, and if you'd flip to code view, you'd see that, I believe. Right?
Hitch |
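To illustrate the "every line is a paragraph" problem, here is a minimal Python sketch of one way to rejoin the lines outside Sigil. It is not anyone's actual workflow from this thread. It assumes the pasted markup is a flat run of `<p>` elements, one per PDF line, and that a real paragraph ends where a line finishes with sentence-ending punctuation. The function names `ends_sentence` and `merge_line_paragraphs` are made up for this example.

```python
import re

P_TAG = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL | re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")

TERMINATORS = (".", "!", "?", ":")
CLOSERS = "\"')]\u201d\u2019"  # quotes and brackets that may follow the punctuation


def ends_sentence(fragment):
    """Heuristic (an assumption): a line closes a paragraph if its visible
    text ends with sentence punctuation, ignoring inline tags and closers."""
    plain = ANY_TAG.sub("", fragment).rstrip().rstrip(CLOSERS)
    return plain.endswith(TERMINATORS)


def merge_line_paragraphs(html):
    """Join consecutive <p> elements until one looks like a paragraph end."""
    merged = []
    buffer = ""
    for match in P_TAG.finditer(html):
        line = match.group(1).strip()
        if not line:
            continue  # skip empty paragraphs left over from the paste
        buffer = f"{buffer} {line}" if buffer else line
        if ends_sentence(line):
            merged.append(buffer)
            buffer = ""
    if buffer:  # keep any trailing text that never reached a terminator
        merged.append(buffer)
    return "\n".join(f"<p>{text}</p>" for text in merged)
```

The heuristic will split wrongly whenever a PDF line happens to end on a full stop in the middle of a paragraph, and it drops any attributes on the original `<p>` tags, so the result still needs a read-through against the PDF.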
|
07-07-2018, 04:07 AM | #5 |
Imperfect Perfectionist
Posts: 464
Karma: 724664
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none
|
If you only have your book as a PDF with a text layer, you might try a tool like Softmaker's FlexiPDF, which is able to export to various formats (including HTML and ePub), and usually (in contrast to most other tools of this kind) does a great job with the paragraphs (taking only the "real paragraphs" into account, not the pdf-induced line-by-line "paragraphs").
If the price seems a little steep, a company called Ashampoo has a version of the same tool, which is usually sold at a lower price (BTW, the free version of FlexiPDF is not able to export, so don't bother to try that). Of course, as others have said, there's no real turnkey solution for converting pdf to epub (or probably any other format to any other format). (And for those suspicious/wary: I am not in any way affiliated with either of the above companies.) Regards, Kim |
07-08-2018, 07:24 PM | #6 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2018
Device: none
|
Wow, thanks a lot to all of you for your comments. I will try them all. I am really new to all this stuff, and your comments help a lot to know what to try next and that I am not alone with this manual process.
@Hitch, yes, I know what you mean. Unfortunately I have tried the indd conversion to epub and, given the files that I have been given, the process took me a lot more time than copy/pasting from the PDF and working from there. I have seen that an indd file has to have everything in order and be very clean for the export to work well, even after doing the normal fine-tuning before export. So, I was dedicating more time to fine-tuning everything and it was not very effective. Unfortunately for me, I cannot ask the team that creates the indd to be more careful about this, so that is why I am trying to do it like this |
07-09-2018, 09:24 AM | #7 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
If you have a novel in INDD, maybe, maybe, you can get to an ePUB/MOBI in a WYSIWYG manner, kinda. But not non-fiction. And not with the results you're getting, in which EVERY line is a new paragraph. Do you see what I'm saying? Of course the file has to have "everything" in order to work. That's why you have them give you the INDD package files, not simply the .indd file itself. If you're going to go around it this way, you may as well export the INDD to faux-Word, "story by story" instead of the entire file, and then make an ebook from Word. Unless you're doing hidden work here, that we're not seeing, I worry a bit about the output you're generating. Hitch |
|
07-09-2018, 09:34 AM | #8 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2018
Device: none
|
Quote:
Yes, I get your point. I actually have the full indd package. The problem is in the declaration of styles. Each new chapter has a new style, instead of a few reusable ones being defined, and there are other problems with the packages I receive. Right now I am manually deleting the new lines, which are actually new paragraphs. I do that manually because when I tried to delete them using regular expressions in code view, I encountered what looked like a bug in Sigil: the text did not wrap at the margin and continued off to the right. I checked the code and it was fine, so it was apparently a Sigil bug. Anyway, I don't always use the WYSIWYG approach, but most of the time I do. I think that my best bet right now is to work on fixing the indd (medium term) and, in the meantime, continue to do the manual work via doc files or PDF editors. My pain points are getting a new paragraph instead of a new line, and keeping italics from my copy/paste. So, thanks to all your comments I have a quick and dirty solution, and since I realize there is no best way to solve it, I have to work with the people who create the indd package, and let's see if eventually I get a working file that does not take more time to convert than the manual work that I'm doing for now. Thanks again for your comments |
|
07-09-2018, 09:48 AM | #9 | |
Grand Sorcerer
Posts: 27,549
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
If that's not what you're talking about, I'd love to see an example of valid code/css that Sigil doesn't render properly within the margins dictated by said css. Last edited by DiapDealer; 07-09-2018 at 01:22 PM. |
|
07-09-2018, 11:38 AM | #10 | |
Bookmaker & Cat Slave
Posts: 11,462
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
We go through and fix the INDD files, first. To me, that's the simplest method, and then export to HTML/ePUB, and do regex cleaning from there. If you stopped using regex because the text was overflowing, your CSS is wrong, as Diap mentioned. OR, if you think that there's an actual bug, post it so that the guys can fix it, but I don't recall us running into this, not anytime in the past 5 years+.
If you're doing this commercially, you can't "fix" the problem. You'll simply keep getting cruft, like we do. But working in WYSIWYG, or manually deleting the additional paragraph codes--that's utter crap use of your time and worse, you're going to have junk below.
Why on EARTH anyone that's used INDD for more than 10 minutes would use different styles, from one chapter to another...only God knows. It's bad enough that we still get Book-mode files, in which I end up having to sometimes reassemble files. Just because that's "how we've always done it." (sigh).
Have you tried exporting the main content to doc or docx format, if you're copy-pasting? At least that way, you should retain the font characteristics like italics, etc. (Of course, that assumes that they're not creating faux italics, which happens ALL the freaking time, or using spans, or, or or...)
Hitch |
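As a rough idea of what "regex cleaning" on exported HTML can look like for the italics-in-spans case, here is a small Python sketch. It is not Hitch's process. The class names `italic` and `bold` are hypothetical placeholders (check the exported stylesheet for the names your files really use), and the pattern only handles spans that carry a single `class` attribute and are not nested inside one another.

```python
import re

# Hypothetical class names; replace with the ones found in the exported CSS.
CLASS_TO_TAG = {"italic": "em", "bold": "strong"}

# Matches a simple, non-nested <span class="...">...</span>.
SPAN = re.compile(r'<span\s+class="([^"]+)">(.*?)</span>', re.DOTALL)


def spans_to_semantic_tags(html, class_to_tag=CLASS_TO_TAG):
    """Replace known styling spans with semantic tags; leave others alone."""

    def replace(match):
        tag = class_to_tag.get(match.group(1))
        if tag is None:
            return match.group(0)  # unknown class: keep the span untouched
        return f"<{tag}>{match.group(2)}</{tag}>"

    return SPAN.sub(replace, html)
```

Nested spans would be mangled by the non-greedy match, so for files with heavier markup a real HTML parser is the safer choice.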
|
07-09-2018, 11:53 AM | #11 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2018
Device: none
|
Quote:
Thanks a lot for your comments, Hitch. It seems I am not the only one struggling with indd files. I'll also try your approach of fixing the indd file. I am actually doing this as a volunteer for a non-profit and I am a complete noob; I have only been helping them for 3 months. In my regular work (online marketing agency) I time all my work to calculate profit. So, I did the same here, and it took me about 3x longer to fix the indd file than to manually copy/paste into Sigil, take out the new paragraphs manually and do the italics. Imagine how bad the indd file was. But good point, I will try it again and will check again what I think was a bug. It was strange because it only happened on some paragraphs, not others. I'll check and report if it really is a bug. Thanks to everyone for your comments. I have lots of new stuff to try, and it's good for me to have a reality check that this is not a simple process but depends a lot on the source file and has multiple steps. |
|
07-22-2018, 12:48 AM | #12 |
Guru
Posts: 668
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
|
This is not a one-click solution either, but I've seen some very good books and found pdftohtml referenced in the headers.
http://pdftohtml.sourceforge.net/ -- freeware. You will still have to manually compare and adjust things it gets wrong. |
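For anyone who wants to script it, a minimal Python sketch for calling pdftohtml follows. It assumes a `pdftohtml` binary is on your PATH; the `-noframes` and `-i` flags are the ones I know from the Poppler build, so check `pdftohtml -h` for your version. The file names and both function names are placeholders for illustration.

```python
import subprocess


def build_pdftohtml_command(pdf_path, output_base):
    """Return the argument list for a single-file, text-only HTML export."""
    return [
        "pdftohtml",
        "-noframes",  # write one HTML document instead of a frameset
        "-i",         # ignore images, keep the text
        str(pdf_path),
        str(output_base),
    ]


def convert(pdf_path, output_base):
    """Run pdftohtml; raises CalledProcessError if the tool exits non-zero."""
    subprocess.run(build_pdftohtml_command(pdf_path, output_base), check=True)
```

Calling `convert("book.pdf", "book")` would run the tool on a hypothetical `book.pdf`; the HTML it produces still has to be compared against the PDF by hand, as noted above.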
07-22-2018, 12:20 PM | #13 |
Guru
Posts: 878
Karma: 2457540
Join Date: Nov 2011
Device: none
|
I used to charge less to make an EPUB from Word files than from PDF. But it ends up about the same amount of work. I have ONCE received perfectly-formatted DOC files which converted near-automatically. Just once.
|
07-22-2018, 12:40 PM | #14 |
Klak
Posts: 174
Karma: 150374
Join Date: Sep 2011
Location: Belgrade, Serbia
Device: many
|
In InDesign CC 2018's paragraph styles tab there is an Export Tagging option where you can map headings and other InDesign styles to appropriate HTML tags.
In the same tab there is also Edit All Export Tags, where you can choose which tags to include in the HTML, so when you export from InDesign to EPUB you get a pretty clean HTML file. If you have the InDesign files, there is no reason to use a PDF to make an EPUB. |
07-22-2018, 12:54 PM | #15 |
Guru
Posts: 878
Karma: 2457540
Join Date: Nov 2011
Device: none
|
The trouble with InDesign is that it is very much a print layout program. It knows where lines break. It isn't always so interested in whether it's a linefeed or a paragraph.
|
|