Old 07-31-2011, 11:10 AM   #1
Xyo
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2011
Device: Kindle 3
Converting .pdf to ePub - Pages jumbled.

^ Sorry, the title should read .pdf to .mobi. ^

Hey guys,

So for a while now I've had some books in .pdf format that I've wanted to put onto my Kindle 3. I've done this with normal .pdfs, but these particular files are formatted with two columns of text per page. When I converted them to .mobi, the text came out jumbled, as you would expect.

So today I found out about the BRISS software that crops pdf files, and everything seemed to be going well (I now have a .pdf file of all the text in a single column, in order), but I'm still having issues.

As soon as I convert the .pdf to .mobi (using the most recent version of Calibre), everything goes wrong. The first thing I notice is that everything is there twice, like every page is duplicated. Second, bits of the text seem jumbled together somewhat. I can't quite figure out the pattern of how it was jumbled, but it's definitely messed up.

Oh and I also tried converting to .ePub, which gave the same result.

Has anyone got any idea why this is happening?

EDIT: It seems the .pdf BRISS is producing may be the problem. Only my pdf reader (Foxit Reader) seems to be able to read it normally; everything else reads it and jumbles things up.

Last edited by Xyo; 07-31-2011 at 12:27 PM.
Old 07-31-2011, 12:15 PM   #2
DSpider
Addict
Posts: 399
Karma: 326969
Join Date: Nov 2009
Location: Romania
Device: iPod touch 2G (16 GB)
The Kindle 3 doesn't support ePub. In fact, Amazon itself doesn't support ePub.

It's been discussed before: PDF is an output format, not one meant to be converted from.

There are two types of PDF: regular and tagged. A regular PDF is what the majority of PDFs are, with objects that seem to "float" on a blank piece of paper. You can usually spot them very quickly by simply selecting the text in Adobe Reader, or whatever. If letters (or groups of letters) are selected separately from their neighbours, it means each has its own position on the page, which makes it very difficult for conversion software to work out the reading order. Sometimes it has to approximate.

Tagged PDFs, however, use tags for paragraphs, formatting, etc. They're usually much easier to convert (though still far from perfect). If you select some text and it doesn't have blank spaces between letters, then it should be OK.
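If you'd rather not eyeball the text selection, a rough programmatic check is possible. The sketch below is only a heuristic, not a real PDF parser: it scans the raw file bytes for the two markers a tagged PDF normally carries in its catalog (`/StructTreeRoot` and `/MarkInfo` with `/Marked true`). The function name `looks_tagged` is made up for this example, and the check can miss tagged files whose catalog is stored inside a compressed object stream.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes contain tagged-PDF markers.

    Assumption: the document catalog is stored uncompressed. If it lives
    in a compressed object stream, this returns False even for a tagged file.
    """
    # A tagged PDF points at its structure tree from the catalog.
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    # It also normally declares itself as marked content.
    is_marked = re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree or is_marked


# Two hand-written catalog snippets standing in for real files.
tagged = b"1 0 obj << /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >> endobj"
regular = b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj"
print(looks_tagged(tagged), looks_tagged(regular))
```

For a real file you would pass in the result of `open(path, "rb").read()`. A `False` result is only a hint that the PDF is probably a regular one.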


Anyway, it's always better to use the source document instead of converting the PDF. Especially since it went through BRISS... Cropping software always messes up the PDF, even if you don't see it on the surface.
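One plausible explanation for the doubled, interleaved text, assuming the cropping tool hides the unwanted column by setting a crop box rather than deleting the content: each cropped page still carries the text of the whole original page, and a converter that ignores the crop box extracts all of it every time. The sketch below shows the idea with made-up data. `Box`, `Fragment` and `visible_text` are hypothetical names for this example, not part of BRISS or Calibre.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    left: float
    bottom: float
    right: float
    top: float


class Fragment(NamedTuple):
    x: float  # PDF coordinates: origin at the bottom-left of the page
    y: float
    text: str


def visible_text(fragments: List[Fragment], crop: Box) -> List[str]:
    """Keep only fragments positioned inside the crop box, in reading order."""
    inside = [
        f for f in fragments
        if crop.left <= f.x < crop.right and crop.bottom <= f.y < crop.top
    ]
    # Top of the page first (largest y), then left to right.
    inside.sort(key=lambda f: (-f.y, f.x))
    return [f.text for f in inside]


# A two-column page, 612 x 792 points, with two lines per column.
page = [
    Fragment(50, 700, "Left 1"), Fragment(320, 700, "Right 1"),
    Fragment(50, 680, "Left 2"), Fragment(320, 680, "Right 2"),
]
left_half = Box(0, 0, 306, 792)
right_half = Box(306, 0, 612, 792)
print(visible_text(page, left_half))   # only the left column
print(visible_text(page, right_half))  # only the right column
```

An extractor that skips the crop-box filter would return all four fragments for both halves, which matches the "everything is there twice" symptom.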

What you could do is OCR the PDF using ABBYY FineReader or similar, proofread it - which takes some time (basically means you read the whole thing), then save it as HTML and work your way from there (using Sigil or similar) to save it as ePub. Or Mobi. Or whatever format you want... I'd go with .docx personally, or .odt if you're using LibreOffice.

Last edited by DSpider; 07-31-2011 at 12:21 PM.
Old 07-31-2011, 12:41 PM   #3
Xyo
Sorry, I meant to say .mobi in the title, not .ePub.

I see what you mean about the regular/tagged .pdfs and unfortunately all I have is a regular .pdf (no source document).

I tried putting the pdf I got out of BRISS on the Kindle and it looks better than I had expected. I need to run it again to make sure every page is cropped to the same size, but it seems workable otherwise.

Thanks for your help
Old 07-31-2011, 12:52 PM   #4
DSpider
Alternatively, you could export the PDF pages as JPG images, resized specifically for the reader's screen (most have a resolution of 800 x 600).
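To get a feel for the target image size, here is a minimal sketch of the scaling arithmetic. It assumes the 800 x 600 screen is held in portrait (600 wide, 800 tall) and that page sizes are given in PDF points; `fit_to_screen` is a name invented for this example.

```python
def fit_to_screen(page_w, page_h, screen_w=600, screen_h=800):
    """Return the largest (width, height) in pixels that fits the screen
    while keeping the page's aspect ratio."""
    scale = min(screen_w / page_w, screen_h / page_h)
    return round(page_w * scale), round(page_h * scale)


# A full two-column US Letter page versus one cropped column of it.
print(fit_to_screen(612, 792))  # full page: limited by screen width
print(fit_to_screen(306, 792))  # half-width column: limited by screen height
```

The full page has to be scaled down to fit the screen width, while a single cropped column is limited only by its height and wastes about half the width, so image export does not make the text larger unless the pages are also split vertically.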
Old 07-31-2011, 01:15 PM   #5
Xyo
That's a good idea, although I think cropping them is unavoidable anyway. I think the font is only just big enough on the BRISS-cropped .pdfs, so I'll just have to put up with the variance in size.

Last edited by Xyo; 07-31-2011 at 01:17 PM.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.