MobileRead Forums

Sigil forum thread: https://www.mobileread.com/forums/showthread.php?t=61755

mediax 11-09-2009 05:09 PM

pdf to epub conversion
 
I have a book scanned in pdf format that I'm looking to tidy up and convert to epub format. The pdf is text only and will require a fair amount of editing to correct errors.

This is my first attempt at this sort of thing, so I'd appreciate advice as to the best way of tackling the project. Should I convert to epub and do all the work in Sigil, or should I convert to text (or maybe HTML), do the donkey work of text editing and correction in OpenOffice and then convert to epub and use Sigil to sort out the formatting issues? Or is there a better way that I haven't thought of?

I'm running Linux, in case anyone has any suggestions for utilities.

Thanks in advance.

weedfreak 11-10-2009 06:39 AM

I have just completed a couple of similar projects: I converted to epub using calibre, then did all the editing and formatting in Sigil.
The main difficulty was that Calibre does not create 'perfect' code: in one case there were 10k+ lines of CSS, because every paragraph was given a separate ID and its own CSS entry even though every entry was the same. Other books were more 'normal.' Calibre also seems to scatter chapter breaks at random and does not necessarily recognize chapter headings, but you will be editing in any case, so these are not major drawbacks.
Sigil is not, yet, an ideal editor due to things like the missing find / replace tools (being worked on though), and sometimes it is easy to get lost when switching between code view and book view. But the whole process was fairly painless apart from the one book with all the IDs. With that one I cut all the code from Sigil, pasted it into OpenOffice for mass find / replace editing, then cut / pasted it back into Sigil. Afterwards I decided that just cutting and pasting the text might have been simpler.
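
For the one-ID-per-paragraph case described above, the mass find / replace could also be scripted. What follows is only a minimal sketch in Python, not anything calibre or Sigil provides: the function name collapse_id_rules and the generated class names (para1, para2, ...) are made up for illustration. It assumes the stylesheet uses simple "#id { ... }" rules, that the affected elements have no class attribute already, and that nothing links to those IDs as anchors (the id attributes are replaced, so such links would break).

```python
import re
from collections import defaultdict

# Matches simple rules of the form "#some-id { declarations }".
RULE_RE = re.compile(r'#([\w-]+)\s*\{([^}]*)\}')


def normalize(body):
    """Reduce a declaration block to a canonical string so that rules
    differing only in whitespace or declaration order compare equal."""
    decls = [d.strip() for d in body.split(';') if d.strip()]
    return '; '.join(sorted(re.sub(r'\s+', ' ', d) for d in decls))


def collapse_id_rules(css, html, prefix='para'):
    """Replace groups of identical per-ID rules with one class rule each.

    Returns (new_css, new_html). Hypothetical helper for illustration.
    """
    # Group IDs by their normalized declaration block.
    groups = defaultdict(list)
    for match in RULE_RE.finditer(css):
        groups[normalize(match.group(2))].append(match.group(1))

    # Emit one class rule per distinct block and remember which ID maps where.
    id_to_class = {}
    new_rules = []
    for n, (body, ids) in enumerate(groups.items(), start=1):
        cls = f'{prefix}{n}'
        new_rules.append(f'.{cls} {{ {body} }}')
        for ident in ids:
            id_to_class[ident] = cls

    # Swap id="..." for class="..." only where the ID had a CSS rule.
    def swap(match):
        cls = id_to_class.get(match.group(1))
        return f'class="{cls}"' if cls else match.group(0)

    new_html = re.sub(r'id="([\w-]+)"', swap, html)

    # Drop the old per-ID rules, keep everything else, append the class rules.
    remaining = RULE_RE.sub('', css).strip()
    new_css = (remaining + '\n' if remaining else '') + '\n'.join(new_rules)
    return new_css, new_html
```

Run on a copy of the files first: a regex pass like this does not understand compound selectors or comments, so the result is worth checking in Sigil's code view before keeping it.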

JohnnyD 11-10-2009 08:28 AM

Probably the best free tool for converting PDFs is Mobipocket Creator (in my opinion). It is meant to create Mobipocket files, but it will give you a set of intermediate files (HTML files) which are just perfect to use as base material for editing.

It will not only convert your pdf text to html, but will also preserve text properties like italics or bold. I'm not really sure if it will preserve font types, however. :chinscratch:

Besides that, it will preserve the images, if there are any, in their proper place in the text.

mediax 11-10-2009 08:29 AM

Quote:

Originally Posted by weedfreak (Post 652799)
Sigil is not, yet, an ideal editor due to things like the missing find / replace tools (being worked on though)

That and the ability to spellcheck in OpenOffice are the main issues that are making me consider going the two-step route. The idea of cutting and pasting between Sigil and OpenOffice as necessary could swing me to going straight to Sigil, though.

Thanks for the input :thanks:

mediax 11-10-2009 08:39 AM

Quote:

Originally Posted by JohnnyD (Post 652847)
Probably the best free tool for converting PDFs is Mobipocket Creator (in my opinion).

Is there a Linux version of Mobipocket Creator? The system requirements I found on the Mobipocket site were only for Windows.

I already use Calibre for library management and file conversion, so I was planning on using that. That said, alternatives are always good!

Valloric 11-10-2009 09:07 AM

Quote:

Originally Posted by mediax (Post 652851)
That and the ability to spellcheck in OpenOffice are the main issues that are making me consider going the two-step route.

I'm actually thinking of bumping the spellcheck feature to just after the 0.2.0 redesign and RTF import.

mediax 11-10-2009 09:27 AM

Quote:

Originally Posted by Valloric (Post 652882)
I'm actually thinking of bumping the spellcheck feature to just after the 0.2.0 redesign and RTF import.

As a totally new user, I'm not yet in any position to comment on the relative priorities of features, but Search & Replace and Spellcheck are two features I would imagine get used in virtually every editing (as opposed to format conversion) exercise.

Valloric 11-10-2009 09:51 AM

Quote:

Originally Posted by mediax (Post 652891)
As a totally new user, I'm not yet in any position to comment on the relative priorities of features, but Search & Replace and Spellcheck are two features I would imagine get used in virtually every editing (as opposed to format conversion) exercise.

I would certainly agree. Search&Replace, for instance, is being worked on right now.

Spellchecking is a new feature. It's absolutely necessary, as you've noticed. But Sigil has some things broken, and bug fixes are almost always more important than new features.

The redesign is there to fix performance problems and problems with per-flow CSS. The RTF import is important, but not as important as what will come with it: importing functions will now be loadable plugins with a (hopefully) consistent interface. Other people will then be able to write their own plugins. Designing new importers will also be easier, so future work on importing Mobi, LIT, etc. will benefit from this.

But we will see. If spellchecking turns out to be the next killer feature everyone wants (after S&R), then it could even come before the RTF import and general importing rewrite.
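
To make the "loadable plugins with a consistent interface" idea above concrete, here is a rough sketch in Python. It is purely illustrative and is not Sigil's actual design or API: Importer, register, TextImporter and import_file are all hypothetical names, and the only "format" handled is plain text, chosen so the example stays self-contained.

```python
import html
import os
from abc import ABC, abstractmethod


class Importer(ABC):
    """Hypothetical common interface that every import plugin implements."""

    # File extensions (lowercase, with leading dot) this importer handles.
    extensions = ()

    @abstractmethod
    def load(self, path):
        """Read the file at `path` and return its content as an HTML string."""


# Maps a file extension to the importer class registered for it.
_REGISTRY = {}


def register(cls):
    """Class decorator: make an importer discoverable by its extensions."""
    for ext in cls.extensions:
        _REGISTRY[ext.lower()] = cls
    return cls


@register
class TextImporter(Importer):
    """Example plugin: blank-line-separated text becomes <p> elements."""

    extensions = ('.txt',)

    def load(self, path):
        with open(path, encoding='utf-8') as f:
            paragraphs = [p.strip() for p in f.read().split('\n\n') if p.strip()]
        return ''.join(f'<p>{html.escape(p)}</p>\n' for p in paragraphs)


def import_file(path):
    """Pick an importer by file extension; the editor needs no format knowledge."""
    ext = os.path.splitext(path)[1].lower()
    try:
        importer = _REGISTRY[ext]()
    except KeyError:
        raise ValueError(f'no importer registered for {ext!r}') from None
    return importer.load(path)
```

The point of the shape is that a third party could add, say, a Mobi or LIT importer by writing one class and registering it, without touching the editing code.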

kjk 11-10-2009 04:12 PM

Quote:

Originally Posted by Valloric (Post 652906)
But we will see. If spellchecking turns out to be the next killer feature everyone wants (after S&R), then it could even come before the RTF import and general importing rewrite.

Count me in as a vote for spell check/S&R/ any editing type enhancements/performance enhancements/etc... before any importing stuff, including RTF.

For me, getting stuff to ePub is the easy part... it's everything else that I need Sigil for :D

Slite 11-11-2009 04:26 AM

Quote:

Originally Posted by kjk (Post 653256)
Count me in as a vote for spell check/S&R/ any editing type enhancements/performance enhancements/etc... before any importing stuff, including RTF.

For me, getting stuff to ePub is the easy part... it's everything else that I need Sigil for :D

I disagree, strongly :)

Not on S&R, that should be implemented ASAP, but I think it would be more productive to get an API for import plugins in place before a spellchecker.

darkmonk 11-15-2009 01:37 AM

Quote:

Originally Posted by Slite (Post 653627)
I disagree, strongly :)

Not on S&R, that should be implemented ASAP, but I think it would be more productive to get an API for import plugins in place before a spellchecker.

I don't. Calibre does an insanely good job with a million formats, and asking to reinvent the wheel seems silly... Remember the UNIX philosophy!

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Valloric 11-15-2009 04:37 PM

Quote:

Originally Posted by darkmonk (Post 657529)
I don't. Calibre does an insanely good job with a million formats, and asking to reinvent the wheel seems silly... Remember the UNIX philosophy!

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

To tell you the truth, I don't necessarily subscribe to the UNIX philosophy.

It works OK for UNIX, which ships with all these different userland tools that you can chain together in interesting and useful ways, but that's hardly the case for Windows... which at last count represents ~92% of the market and the vast majority of Sigil's users, too.

Requiring users to have calibre installed is not something I'm aiming for. It's a wonderful application, to be sure, but Sigil should be able to stand on its own as an ebook editor, which means being able to import various ebook formats.

It's just something you'd expect of an ebook editor, now wouldn't you? :)

mediax 11-16-2009 05:26 PM

Quote:

Originally Posted by Valloric (Post 658094)
It's just something you'd expect of an ebook editor, now wouldn't you? :)

I personally feel that depends on whether your intention is to create a multi-format editor, or an epub editor.

In the former case, you'd expect the editor not only to read, but also to write, multiple formats; in the latter case, there's a valid argument to the effect that the application's task is to edit a specified format.

Mind you, if you are planning to build a multi-format editor, it would be one hell of a valuable tool.

Kivgaen 11-19-2009 01:19 PM

speaking as a publisher...
 
Quote:

Originally Posted by Valloric (Post 658094)

Requiring users to have calibre installed is not something I'm aiming for. It's a wonderful application, to be sure, but Sigil should be able to stand on its own as an ebook editor, which means being able to import various ebook formats.

It's just something you'd expect of an ebook editor, now wouldn't you? :)

While I do not disagree with anything that you have said (being able to import various ebook formats IS important, and should most definitely be one of the more important items on the development list), I too would not put it at the top of the list above S/R.

But that may be because my goals might be different from yours. If your ultimate goal is for the public to be able to load up their ebook from whatever format it is that they are using, fix problems with it, and then spit it back out so that it looks prettier, then I would agree with you.

But if your goal is more geared towards publishers -- if you want your epub file editor to be more useful RIGHT NOW to publishers who are struggling to get all of their files ready for e-Readers, then those publishers will use Calibre as a secondary tool if they need to.

Publishers don't want what we can already get elsewhere (features in Calibre), we want what doesn't exist yet -- and that's a GOOD way to edit .epub files!

Valloric 11-19-2009 04:09 PM

Quote:

Originally Posted by Kivgaen (Post 662256)
I too would not put it at the top of the list above S/R.

Search&Replace is (FINALLY) nearing completion and a new version of Sigil with it should hopefully be released in a few days.

Quote:

Originally Posted by Kivgaen (Post 662256)
But if your goal is more geared towards publishers -- if you want your epub file editor to be more useful RIGHT NOW to publishers who are struggling to get all of their files ready for e-Readers, then those publishers will use Calibre as a secondary tool if they need to.

Publishers don't want what we can already get elsewhere (features in Calibre), we want what doesn't exist yet -- and that's a GOOD way to edit .epub files!

Point taken. While I do care a lot about the people doing professional work with Sigil (or at least trying to), I have to consider the people who edit ebooks for personal use.

But you're right. Improving the WYSIWYG/general editing experience should come first. And it does. The redesign is all about improved editing. I even got sidetracked with the current release (<sigh>) and implemented more than a few performance enhancements.

This is also one of the reasons why I'll be spending time on creating a plugin framework for the importers after v0.2.0: I'll be able to decouple the importing functionality from the editing, and hopefully others will want to contribute their own plugins, even independently of Sigil's main development branch.


All times are GMT -4.
