Go Back   MobileRead Forums > E-Book Readers > Amazon Kindle

Old 10-18-2011, 06:56 PM   #31
emalvick
Groupie
Posts: 166
Karma: 5358
Join Date: Aug 2010
Location: Davis, CA
Device: Kindle 3
Quote:
Originally Posted by DiapDealer View Post
Actually most people are fine with PDF's. I know I am (except on 6 inch screens). Most of the negativity comes into play when someone wants to turn a PDF into something else. A PDF just doesn't convert well... easily. I doubt it ever will.
Your last sentence is very important... The whole point of PDF was so that a document would always look the same regardless of the computer reading it or the printer printing it.

... That being said, as a former graduate student and a researcher who reads and writes technical PDF's, I can't say whether I am happy or not about what you can and can't do with them on a Kindle. I appreciate what they do for journals, printing, etc., but I do wish I could easily read them on my Kindle.

I can imagine other tablets are a decent solution, even a DX may be a decent solution, but in the technical world color is becoming more common, and I personally hate reading off a backlit monitor.

I really just wish the Kindle had a mechanism so that PDF's could be somewhat cropped to a specific size (you can do that via the zoom already) and then the page forward buttons would quickly scroll you to the bottom of the page and then the next page... similar to the way the page-up and page-down buttons work on a PC with Acrobat Reader.

It isn't ideal, but I don't think PDF's should need converting to ebooks. I just wish they could be handled a little better. I hate having to scroll around with the arrow buttons and the page forward can have me skipping parts of pages that I don't want to be skipping. The color issue will have to wait until colored e-ink becomes an option if it ever does.

By the way, I do find that turning the Kindle sideways and reading a PDF (6 in screen) is a reasonable method of reading a PDF.
Old 10-18-2011, 07:06 PM   #32
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Blossom
So it doesn't convert italics? I was going to give it a shot but I do get good results with Acrobat Pro on novel PDFs. It pulls the styles from the PDF just fine as long as the PDF is tagged.
No, I assume it just uses the OCR text layer, but I could be wrong. I use Acrobat Pro a lot too, but it's always been a bit of a toss-up between it and other programs for me. I like that Acrobat will retain a lot of the styles when exporting, but if the page numbers and such (headers and footers) are not true Adobe headers and footers (as is usually the case)... I still have to rely on external programs to strip them. And even then they're not truly "removed" from the PDF, only hidden from view (and conversion programs will add them right back in to the mobi or epub).

So I usually have to decide between HTML with italics but with pesky headers and footers to track down and remove (Acrobat), or really nice, clean HTML with no pesky headers and footers but no italics (PDFMasher). Both need regexing for paragraph fragments.

Last edited by DiapDealer; 10-18-2011 at 07:08 PM.
Old 10-18-2011, 07:29 PM   #33
JSWolf
Resident Curmudgeon
Posts: 73,896
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by jswinden View Post
This whole PDF discussion thing is getting pretty old. Adobe designed PDFs to be printed, not read on E Ink readers. They designed PDFs over 20 years ago for the purpose of being able to exchange secured documents digitally without worrying about unauthorized editing of those documents. For example, a lawyer could send a contract to a client via email. PDFs were never designed for our viewing pleasure!!! True, Adobe has tried to update PDF over the years, but it is still THE WORST form of document for reading on an electronic device.
Not quite. PDF was created to allow you to send a document to someone to be printed so you don't need to have the same program/fonts that were used to create the document. It wasn't about not being able to edit. It was about being able to duplicate the document on paper so what I send you will look the same on paper as when I print it from whatever program created it.

PDF was never designed to have the information needed to convert it to another format and it never will. Basically, if you have a PDF, the only way to convert it is to pick a program to convert it and then A/B compare every single pixel/letter/punctuation/etc. and also do any format fixing that needs to be done. Then you'll have your conversion. There is NO program that can convert a PDF of any reasonable size error free.
Old 10-18-2011, 07:42 PM   #34
JSWolf
Resident Curmudgeon
Posts: 73,896
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by DiapDealer View Post
No, I assume it just uses the OCR text layer, but I could be wrong. I use Acrobat Pro a lot too, but it's always been a bit of a toss-up between it and other programs for me. I like that Acrobat will retain a lot of the styles when exporting, but if the page numbers and such (headers and footers) are not true Adobe headers and footers (as is usually the case)... I still have to rely on external programs to strip them. And even then they're not truly "removed" from the PDF, only hidden from view (and conversion programs will add them right back in to the mobi or epub).

So I usually have to decide between HTML with italics but with pesky headers and footers to track down and remove (Acrobat), or really nice, clean HTML with no pesky headers and footers but no italics (PDFMasher). Both need regexing for paragraph fragments.
Acrobat Pro can handle the headers/footers just fine. All you need to do is crop the pages so the headers/footers don't exist and then convert. That gets rid of them very well. Better than any other method.
Old 10-18-2011, 10:58 PM   #35
Abichuela
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Kindle
Forgive me if I missed something, but if PDF isn't the best format to convert from, what is? Is it better to convert from a Word format to .mobi or .epub?
Old 10-18-2011, 11:06 PM   #36
Blossom
Treasure Seeker
Posts: 18,708
Karma: 26026435
Join Date: Mar 2010
Device: Kobo HD Glo, Kindles, Kindle Fires, Android Devices
Quote:
Originally Posted by Abichuela View Post
Forgive me if I missed something, but if PDF isn't the best format to convert from, what is? Is it better to convert from a Word format to .mobi or .epub?
LIT, EPUB, or HTML; those are easy formats to work with.
I use Word html as my source then import it into Calibre and convert to mobi and epub.
Old 10-19-2011, 07:15 AM   #37
tentimes
Junior Member
Posts: 6
Karma: 10
Join Date: Oct 2011
Device: Kindle 4
Quote:
Originally Posted by Blossom View Post
So it doesn't convert italics? I was going to give it a shot but I do get good results with Acrobat Pro on novel PDFs. It pulls the styles from the PDF just fine as long as the PDF is tagged.
Blossom, what is it you are doing with Acrobat Pro to convert, please? I have got a trial of it, but I'm really unsure of how it is going to help me. I have tried exporting to Word, but the results were pretty poor unfortunately (free.kindle.com converted better).

Maybe you are doing a few things together that are helping to make a good conversion? I would be most grateful for any advice.
Old 10-19-2011, 08:51 AM   #38
avantman42
Wizard
Posts: 1,090
Karma: 6058305
Join Date: Sep 2010
Location: UK
Device: Kindle Paperwhite
Quote:
Originally Posted by tentimes View Post
If it's a matter of a series of text boxes per page, then it's a matter of (assuming most pages don't overlap these box areas and overprint) taking the text boxes in order, getting the relative font sizes, assuming the large font sizes with the text form "Chapter XX" are start of chapter
Have you tried Calibre? The heuristic processing option does this sort of thing, but is disabled by default.

Quote:
Originally Posted by tentimes View Post
"Chapter XX" are start of chapter if there is no internal byte code to give you end of chapter (which I bet there is)
Honestly, I'd be prepared to bet there isn't, but I'd like to be wrong.

I've found pdftohtml gives good results with some PDFs. Calibre and pdftohtml are both open source, so if you do decide to try and write something better, it might be worth having a look at how they do things.
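
The font-size idea quoted above can be sketched in a few lines of Python. This is only an illustration, not how Calibre or pdftohtml actually work: the input format (a list of `(text, font_size)` pairs in reading order), the function name `find_chapter_starts`, and the 1.2 size threshold are all assumptions made up for the example.

```python
import re
from statistics import median

# Hypothetical input: (text, font_size) pairs in reading order, as some PDF
# text extractor might produce them. Nothing here is Calibre's actual code.
CHAPTER_LABEL = re.compile(r'^Chapter\s+\w+', re.IGNORECASE)


def find_chapter_starts(boxes, scale=1.2):
    """Return indexes of boxes that look like chapter headings."""
    # Treat the median font size as the body text size.
    body_size = median(size for _, size in boxes)
    return [
        i for i, (text, size) in enumerate(boxes)
        # A heading must be noticeably larger than body text AND
        # start with a "Chapter XX" style label.
        if size >= body_size * scale and CHAPTER_LABEL.match(text.strip())
    ]


boxes = [
    ("Chapter 1", 18.0),
    ("It was a quiet morning.", 11.0),
    ("See Chapter 2 for details.", 11.0),
    ("Chapter 2", 18.0),
    ("The afternoon was louder.", 11.0),
]
print(find_chapter_starts(boxes))
```

Requiring both the larger font and the "Chapter" label keeps a body sentence that merely mentions a chapter from being flagged, but real PDFs will have headings that follow neither rule, so the output would still need checking by hand.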
Old 10-19-2011, 09:45 AM   #39
Zeebra
Evangelist
Posts: 461
Karma: 956567
Join Date: Oct 2010
Location: Toronto, Canada
Device: Kindle Oasis 3
Quote:
Originally Posted by DiapDealer View Post
I've found the footnotes function to be a tad flaky with PDFMasher, but I've gotten pretty close a few times with documents that had buckets of footnotes. Close enough that I didn't mind fixing up the results by hand. And it seems to be getting better all the time. The different sorting abilities make it pretty powerful and it's by far my favorite for very simply formatted novels, but losing italics really annoys me when converting PDF's (not PDFMasher's fault, I know).
Yeah when I figured out the sorting abilities it was a big "TA-DA!" moment for me to identify the headers and footers easily. I kinda like this app, not that I had many PDFs to convert but it's pretty cool.
Old 10-19-2011, 01:04 PM   #40
Blossom
Treasure Seeker
Posts: 18,708
Karma: 26026435
Join Date: Mar 2010
Device: Kobo HD Glo, Kindles, Kindle Fires, Android Devices
Quote:
Originally Posted by tentimes View Post
Blossom, what is it you are doing with Acrobat Pro to convert, please? I have got a trial of it, but I'm really unsure of how it is going to help me. I have tried exporting to Word, but the results were pretty poor unfortunately (free.kindle.com converted better).

Maybe you are doing a few things together that are helping to make a good conversion? I would be most grateful for any advice.
I just convert the PDF to HTML 3.2 using Acrobat Pro 9. Then I open it up in Word 2003 and fix the broken sentences with regular expressions. Then I fix the chapter headers to match each other. I then run several regular expressions to check for things I missed, like page numbers, headers, footers, etc. It takes about 5 to 10 minutes to get a good readable copy once you have the method down.
Old 10-19-2011, 01:10 PM   #41
alansplace
Grand Sorcerer
Posts: 5,886
Karma: 464403178
Join Date: Feb 2010
Location: 33.9388° N, 117.2716° W
Device: Kindles K-2, K-KB, PW 1 & 2, Voyage, Fire 2, 5 & HD 8, Surface 3, iPad
Cool zip

Quote:
Originally Posted by Blossom View Post
I just convert the PDF to HTML 3.2 using Acrobat Pro 9. Then I open it up in Word 2003 and fix the broken sentences with regular expressions. Then I fix the chapter headers to match each other. I then run several regular expressions to check for things I missed, like page numbers, headers, footers, etc. It takes about 5 to 10 minutes to get a good readable copy once you have the method down.
if you've saved those regex[s] you should zip them up and share them in a post somewhere.
Old 10-19-2011, 01:39 PM   #42
Blossom
Treasure Seeker
Posts: 18,708
Karma: 26026435
Join Date: Mar 2010
Device: Kobo HD Glo, Kindles, Kindle Fires, Android Devices
Quote:
Originally Posted by alansplace View Post
if you've saved those regex[s] you should zip them up and share them in a post somewhere.
They are really more Word 2003 wildcards, but basically these are my reference notes. I hope you can make heads or tails of them.

Code:
Do a S&R for manual line breaks and replace them with paragraph marks.

MS Word uses ^13 for a return, with the wildcard box checked in the Search box.

^13([a-z]) = Finds a return followed by a lowercase letter (broken sentence)

([a-zA-Z])^13 = Finds a letter directly before a return (broken sentence)

([a-z])^13([A-Z]) = Finds a lowercase letter, a return, then a capital letter (broken sentence)

Replace Box
\1 and \2 if there is more than one bracket; add appropriate spaces as needed.

[0-9]{1,}^13 = This checks for page numbers
[0-9]{1,} = Second check for page numbers and OCR errors where numbers replace letters.

[A-Z]{3,} = Match Case checked; replace the 3 if needed for more word matches.
On chapter headers I use S&R. If they are already in bold, this makes it easier: I do a search to find bold text using the formatting button. Word has a powerful search! You can search by formatting or wildcards, special Word characters, or just the regular way. I can then do a replace only on the formatting.

I also use the Styles panel to make batch changes. A lot of the back titles I buy have inconsistent formatting, and this feature comes in handy to fix that quickly. Highlighting a chapter heading, clicking Clear Formatting, and then clicking the appropriate style will really help it take on the correct formatting you want.

I also use macros to make it a lot faster!
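
For anyone who would rather script these checks than run them in Word, the wildcard notes above translate roughly into Python's `re` module as follows. This is a sketch under assumptions, not an exact port: Word's `^13` is written as `\n`, the page-number check is tightened to lines that contain only digits, and the names `clean_text` and `sample` are made up for the example.

```python
import re

# Rough Python equivalents of the Word wildcard searches (Word's ^13 is
# written as \n here). These are assumptions about intent, not exact ports.
BREAK_BEFORE_LOWERCASE = re.compile(r'\n([a-z])')           # ^13([a-z])
PAGE_NUMBER_LINE = re.compile(r'^[0-9]+\n', re.MULTILINE)   # [0-9]{1,}^13, whole line only
ALL_CAPS_RUN = re.compile(r'[A-Z]{3,}')                     # [A-Z]{3,}


def clean_text(text: str) -> str:
    """Drop bare page-number lines, then rejoin sentences broken across lines."""
    text = PAGE_NUMBER_LINE.sub('', text)
    # Replace the line break with a space and keep the captured letter.
    return BREAK_BEFORE_LOWERCASE.sub(r' \1', text)


sample = "It was a dark and\nstormy night.\n12\nCHAPTER TWO\nThe rain fell.\n"
cleaned = clean_text(sample)
print(cleaned)
# Runs of capitals are only listed, not changed, so they can be reviewed by hand.
print(ALL_CAPS_RUN.findall(cleaned))
```

Removing the page-number lines first matters: a page number that lands in the middle of a sentence would otherwise hide the broken line from the second pattern.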

Last edited by Blossom; 10-19-2011 at 01:41 PM.
Old 10-19-2011, 02:41 PM   #43
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
For broken sentences in HTML, I use the following search regex:
Code:
([^.”":?’'!>—…)])</p>\s+<p[^>]*>
And the replace would be:
Code:
\1
(NOTE: there needs to be a "space" character following the \1 for it to work properly)

I don't trust it enough to blindly do a "Replace All" on a whole book, but I rarely have to intervene when stepping through a document one instance at a time.
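
The same search and replace can also be tried outside an editor with Python's `re` module. A minimal sketch, assuming the HTML is already loaded as a string; the helper name `join_broken_paragraphs` and the sample markup are made up for illustration.

```python
import re

# Search pattern from the post: a paragraph whose last character is NOT
# sentence-ending punctuation, a closing quote, a closing bracket, or '>',
# followed by the start of the next paragraph.
BROKEN_PARAGRAPH = re.compile(r'([^.”":?’\'!>—…)])</p>\s+<p[^>]*>')


def join_broken_paragraphs(html: str) -> str:
    """Merge paragraphs that were split mid-sentence (hypothetical helper)."""
    # The replacement is the captured character plus one space.
    return BROKEN_PARAGRAPH.sub(r'\1 ', html)


sample = (
    '<p>The rain had stopped and the</p>\n'
    '<p class="calibre1">streets were quiet.</p>\n'
    '<p>Nobody spoke.</p>'
)
print(join_broken_paragraphs(sample))
```

Because `sub` replaces every match at once, this is the scripted equivalent of "Replace All", so the caution above still applies: a heading or a line of verse with no closing punctuation would be merged into the next paragraph too.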
Old 10-19-2011, 02:45 PM   #44
Blossom
Treasure Seeker
Posts: 18,708
Karma: 26026435
Join Date: Mar 2010
Device: Kobo HD Glo, Kindles, Kindle Fires, Android Devices
Quote:
Originally Posted by DiapDealer View Post
For broken sentences in HTML, I use the following search regex:
Code:
([^.”":?’'!>—…)])</p>\s+<p[^>]*>
And the replace would be:
Code:
\1
(NOTE: there needs to be a "space" character following the \1 for it to work properly)

I don't trust it enough to blindly do a "Replace All" on a whole book, but I rarely have to intervene when stepping through a document one instance at a time.
I will have to try this when working with code.
What program does this work with? I've tried Notepad++ and Notepad2 and it can't find anything.

Last edited by Blossom; 10-19-2011 at 02:48 PM.
Old 10-19-2011, 03:04 PM   #45
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Blossom View Post
I will have to try this when working with code.
What program does this work with? I've tried Notepad++ and Notepad2 and it can't find anything.
I use it mostly with Sigil and Komodo Edit. I like Notepad++ as a code editor, but it gives me fits when trying to use more complex, multi-line, regex S&R.
All times are GMT -4.

