Old 07-06-2018, 05:56 PM   #1
davidchavez
Problems with pasting content on Sigil

Hello everyone!

I am trying to convert PDF books into EPUB. So far, the best approach I have found for my case is to copy/paste into Sigil and then fix the formatting issues. I had lots of problems with InDesign, which is why I have been doing it this way.

I have two problems whenever I paste content into Sigil from a PDF, or even from InDesign:
  1. Bold and italics are lost.
  2. Lots of new lines are added into the content. Example: http://share.epiclemon.com/smL4

I would like to know the best way to handle both. Right now I am copy/pasting, manually deleting the newlines, and adding bold or italics wherever required, but this is too manual.

Is there a better way of doing this?


Thanks a lot for your suggestions!
Old 07-06-2018, 07:57 PM   #2
DiapDealer
The best way is to find the source that was used to make the PDF and use that. PDF is a destination. Trying to use PDF as a source document will always take lots of manual tweaking.

That said... I believe calibre has some pdf conversion utilities. But nothing is going to be the turn-key solution you're probably looking for.
Old 07-06-2018, 08:29 PM   #3
BetterRed
Quote:
Originally Posted by DiapDealer View Post
<snip>
But nothing is going to be the turn-key solution you're probably looking for.


@davidchavez - as of today**, my #1 process for dealing with PDFs is:
  • open the PDF with Word 2016,
  • use Toxaris's eBook Tools MS Word add-in and a slew of macros I've created and snagged over the decades to straighten out the text,
  • save as DOCX,
  • convert the DOCX to EPUB in one of three ways: calibre conversion, import into the calibre editor, or use the Sigil DOCX Import plugin.
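
For the first of those three routes, calibre ships a command-line converter called ebook-convert. A minimal sketch of calling it from Python follows; the helper names (build_convert_command, convert_docx_to_epub) are made up for illustration, and it assumes calibre is installed and ebook-convert is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: str, target: str) -> list[str]:
    """Build the argument list for calibre's ebook-convert CLI."""
    return ["ebook-convert", source, target]


def convert_docx_to_epub(source: str) -> Path:
    """Convert a DOCX file to an EPUB next to it, using calibre if installed."""
    target = Path(source).with_suffix(".epub")
    # Fail early with a readable message if calibre is not available.
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("calibre's ebook-convert was not found on PATH")
    subprocess.run(build_convert_command(source, str(target)), check=True)
    return target
```

Calling convert_docx_to_epub("book.docx") would write book.epub beside the source file. The result will still need checking in Sigil or the calibre editor.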

It's worth noting that the eBook Tools add-in can save the Word document as an EPUB.

If I think a PDF would require a lot of effort to shift into EPUB, I don't bother trying.

** 'as of today' - because my process has changed a number of times over the past several years

BR
Old 07-06-2018, 09:59 PM   #4
Hitch
Read all the other posts, including Red's. They all have valuable info.

No offense, but you're doing this all backwards. I don't know why on earth you'd actually be copy-pasting from a PDF--the WORST possible source format for an eBook--instead of using INDD's built-in functionality. Export the source file to ePUB and/or HTML, and work with that.

If you're trying to find a push-button way, it doesn't exist. My business has literally done thousands of "PDF conversions" to ebooks, and to this day, our process is long and laborious, and tedious, at best. However, if you have the INDD source files, you're doing make-work.

What you are NOT going to be able to do, however, is do a process that's in "bookview." Or WYSIWYG. Going from INDD->EPUB requires a knowledge of code. It's the only realistic way to go from A-->B. Or, really, from A->F, for all intents and purposes.

You're seeing that oddball line spacing because the PDF "paste" is regarding each and every line as a paragraph. That's why you're seeing it that way, and if you'd flip to code view, you'd see that, I believe.

Right?
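
The line-per-paragraph effect described above can be partly undone with a script rather than by hand. This is a rough sketch only, and it rests on an assumption of mine: that a pasted line which does not end in sentence-ending punctuation belongs to the same paragraph as the next one. The function name is hypothetical, and it handles only plain <p> elements, dropping their attributes.

```python
import re

# One simple <p>...</p> element (the \s guard avoids matching <pre>, <param>, ...).
PARAGRAPH = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.DOTALL)
# Heuristic: a real paragraph usually ends with . ! or ?, optionally followed
# by a closing quote or bracket.
SENTENCE_END = re.compile(r'[.!?]["\')]?$')


def merge_line_paragraphs(xhtml: str) -> str:
    """Join <p> elements that look like single PDF lines into full paragraphs."""
    lines = [m.group(1).strip() for m in PARAGRAPH.finditer(xhtml)]
    merged, buffer = [], []
    for line in lines:
        if not line:
            continue  # skip empty paragraphs left over from the paste
        buffer.append(line)
        if SENTENCE_END.search(line):
            merged.append(" ".join(buffer))
            buffer = []
    if buffer:  # trailing text that never reached sentence-ending punctuation
        merged.append(" ".join(buffer))
    return "\n".join(f"<p>{text}</p>" for text in merged)
```

The heuristic will wrongly split a paragraph wherever a PDF line happens to end on a full stop, so the output still needs a read-through in code view.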

Hitch
Old 07-07-2018, 04:07 AM   #5
elibrarian
If you only have your book as a PDF with a text layer, you might try a tool like SoftMaker's FlexiPDF, which can export to various formats (including HTML and ePub) and, in contrast to most other tools of this kind, usually does a great job with the paragraphs (taking only the "real paragraphs" into account, not the PDF-induced line-by-line "paragraphs").

If the price seems a little steep, a company called Ashampoo has a version of the same tool, which is usually sold at a lower price. (BTW, the free version of FlexiPDF cannot export, so don't bother trying that.)

Of course, as others have said, there's no real turnkey solution for converting pdf to epub (or probably any other format to any other format).

(And for those suspicious/wary: I am not in any way affiliated with either of the above companies.)

Regards,

Kim
Old 07-08-2018, 07:24 PM   #6
davidchavez
Wow, thanks a lot, all, for your comments. I will try them all. I am really new to all this, and your comments help a lot in knowing what to try next and that I am not alone with this manual process.

@Hitch, yes, I know what you mean. Unfortunately, I have tried the INDD-to-EPUB conversion, and with the files I have been given, the process took a lot more time than copy/pasting from the PDF and working from there. I have found that an INDD file has to be very clean, with everything in order, for the export to work well, even after doing the normal fine-tuning before export. So I was spending more time on fine-tuning, and it was not very effective. Unfortunately, I cannot ask the team that creates the INDD files to be more careful about this, which is why I do it this way.
Old 07-09-2018, 09:24 AM   #7
Hitch
What concerns me a bit is you SEEM to be trying to do everything in a WYSIWYG manner. Are you? You didn't know what the line-spacing issue was, which says to me that you're trying to do this in bookview, or something similar. That's NOT going to work.

If you have a novel in INDD, maybe, maybe, you can get to an ePUB/MOBI in a WYSIWYG manner, kinda. But not non-fiction. And not with the results you're getting, in which EVERY line is a new paragraph. Do you see what I'm saying?

Of course the file has to have "everything" in order to work. That's why you have them give you the INDD package files, not simply the .indd file itself.

If you're going to go around it this way, you may as well export the INDD to faux-Word, "story by story" instead of the entire file, and then make an ebook from Word. Unless you're doing hidden work here, that we're not seeing, I worry a bit about the output you're generating.

Hitch
Old 07-09-2018, 09:34 AM   #8
davidchavez
Yes, I get your point. I actually have the full INDD package. The problem is in how the styles are declared: each new chapter has a new style instead of a few reusable ones being defined, along with other problems I am having with the packages I receive.

Right now I am manually deleting the new lines, which are actually new paragraphs. I do this manually because when I tried to delete them using regular expressions in code view, I encountered what looked like a bug in Sigil: the text did not wrap at the end of the margin and continued to the right. I checked the code and it was fine, so apparently it was a Sigil bug.

Anyway, I don't always use the WYSIWYG approach, but most of the time, yes. I think my best bet right now is to work on fixing the INDD files (medium term) while continuing to do the manual work via DOC files or PDF editors. My pain points are getting a new paragraph instead of a new line, and keeping italics from my copy/paste. So, thanks to all your comments, I have a quick-and-dirty solution. Since I realize there is no single best way to solve this, I have to work with the people who create the INDD package and see whether I eventually get a working file that does not take more time to convert than the manual work I'm doing now.

Thanks again for your comments
Old 07-09-2018, 09:48 AM   #9
DiapDealer
Quote:
Originally Posted by davidchavez View Post
and I encountered a bug in Sigil that the text did not overflow at the end of the margin and continued to the right. Checked the code and it was fine, so it was a Sigil bug apparently.
Not a bug. Sigil will happily allow you to use css that renders text out of the current viewport. It's valid css, after all.
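
The thread never identifies what actually stopped the text from wrapping. As a guess only, two common suspects after a paste-and-regex clean-up are white-space rules that disable wrapping and runs of non-breaking spaces. The sketch below counts both in a piece of XHTML or CSS; the function name is hypothetical.

```python
import re

# "white-space: nowrap" or "white-space: pre" (but not pre-wrap / pre-line,
# which do allow wrapping).
NO_WRAP_CSS = re.compile(r"white-space\s*:\s*(nowrap|pre)(?![-\w])", re.IGNORECASE)
# Non-breaking spaces as entities or as the literal U+00A0 character.
NBSP = re.compile(r"(?:&nbsp;|&#160;|\u00a0)")


def find_wrap_suspects(text: str) -> dict[str, int]:
    """Count constructs that can keep text from wrapping at the margin."""
    return {
        "nowrap_rules": len(NO_WRAP_CSS.findall(text)),
        "non_breaking_spaces": len(NBSP.findall(text)),
    }
```

A non-zero count does not prove anything by itself, but it shows where to look in the stylesheet or the pasted text before reporting a rendering bug.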

If that's not what you're talking about, I'd love to see an example of valid code/css that Sigil doesn't render properly within the margins dictated by said css.

Last edited by DiapDealer; 07-09-2018 at 01:22 PM.
Old 07-09-2018, 11:38 AM   #10
Hitch
Well...FWIW, we deal with this all day long. We not only get files from myriad different designers--and we have zero control over this--but we get a lot of them from, god help us, indie authors that have signed up for a whole MONTH of INDD, and think that they know how to use it. What we get from those folks is typically worse than the worst Word files, for the same reason--ad hoc styles, etc.

We go through and fix the INDD files, first. To me, that's the simplest method, and then export to HTML/ePUB, and do regex cleaning from there.
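
As one illustration of what regex cleaning after export might look like, the sketch below collapses per-chapter class names into a single reusable one. The naming scheme ("body-ch1", "body-ch2", ...) is purely hypothetical, not something taken from a real INDD export, and the function name is made up as well.

```python
import re

# Hypothetical naming scheme: "<base>-ch<N>", e.g. "body-ch1", "heading-ch12".
# Only attributes holding a single class name are handled.
PER_CHAPTER_CLASS = re.compile(r'class="([A-Za-z][\w-]*?)-ch\d+"')


def unify_chapter_classes(xhtml: str) -> str:
    """Rewrite class="body-ch7" style attributes to class="body"."""
    return PER_CHAPTER_CLASS.sub(r'class="\1"', xhtml)
```

The stylesheet would need a matching rule for each unified class, and the per-chapter rules could then be deleted.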

If you stopped using regex because the text was overflowing, your CSS is wrong, as Diap mentioned. OR, if you think that there's an actual bug, post it so that the guys can fix it, but I don't recall us running into this, not anytime in the past 5 years+.

If you're doing this commercially, you can't "fix" the problem. You'll simply keep getting cruft, like we do. But working in WYSIWYG, or manually deleting the additional paragraph codes--that's utter crap use of your time and worse, you're going to have junk below.

Why on EARTH anyone that's used INDD for more than 10 minutes would use different styles, from one chapter to another...only God knows. It's bad enough that we still get Book-mode files, in which I end up having to sometimes reassemble files. Just because that's "how we've always done it." (sigh).

Have you tried exporting the main content to doc or docx format, if you're copy-pasting? At least that way, you should retain the font characteristics like italics, etc. (Of course, that assumes that they're not creating faux italics, which happens ALL the freaking time, or using spans, or, or or...)
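
For the case where italics or bold arrive as styled spans, a small regex pass can turn them into plain tags. This is a sketch under assumptions: the spans carry an inline style attribute, are not nested, and do not combine italic and bold in one span. The function name is hypothetical.

```python
import re

ITALIC_SPAN = re.compile(
    r'<span\s+style="[^"]*font-style\s*:\s*italic[^"]*">(.*?)</span>',
    re.DOTALL | re.IGNORECASE,
)
BOLD_SPAN = re.compile(
    r'<span\s+style="[^"]*font-weight\s*:\s*bold[^"]*">(.*?)</span>',
    re.DOTALL | re.IGNORECASE,
)


def spans_to_tags(xhtml: str) -> str:
    """Replace inline-styled italic/bold spans with <i> and <b> elements."""
    xhtml = ITALIC_SPAN.sub(r"<i>\1</i>", xhtml)
    xhtml = BOLD_SPAN.sub(r"<b>\1</b>", xhtml)
    return xhtml
```

Spans styled through a class rather than an inline style would need the class name looked up in the stylesheet first, which this sketch does not attempt.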

Hitch
Old 07-09-2018, 11:53 AM   #11
davidchavez
Thanks a lot, Hitch, for your comments. It seems I am not the only one struggling with INDD files. I'll also try your approach of fixing the INDD file. I am actually doing this as a volunteer for a non-profit and I am a complete noob; I have only been helping them for 3 months. In my regular work (online marketing agency) I time all my work to calculate profit, so I did the same here, and it took me about 3x longer to fix the INDD file than to manually copy/paste into Sigil, take out the new paragraphs manually, and add the italics. Imagine how bad the INDD file was.

Good point, though. I will try it again and will recheck what I thought was a bug. It was strange because it only happened on some paragraphs, not others. I'll check and report back if it really is a bug.

Thanks to everyone for your comments. I have lots of new things to try, and it's good for me to have a reality check that this is not a simple process: it depends a lot on the source file and has multiple steps.