Mac to Kindle 2 and Other Older Readers

MarjaE · 03-13-2018, 11:15 AM

Hi,

I'm still working out how to handle pdfs. But I've had a lot of trial and error and I'd like to share.

First: Keep your originals. If you don't have enough disk space, I'd suggest storing some on an external drive, and setting Time Machine to back up the external drive as well as the main drive.

Second: Many pdfs encode images as jpeg2000. It takes less space than jpeg, but some Macs will take longer to load pages from these pdfs, and Kindle 2s and other older readers won't be able to load the images. It can be particularly bad with scanned pdfs, where every page is an image, so the older readers won't be able to load anything. I use EasyFind and search for "jpxdecode" in file contents, to identify files with jpeg2000 and other jpx images.
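For anyone who prefers the terminal, the same search can be sketched with grep. This is only a rough sketch: it assumes the filter name appears uncompressed in the file, and the function name list_jpx_pdfs is made up for illustration.

```shell
# list_jpx_pdfs DIR: print the PDFs in DIR whose raw bytes mention JPXDecode.
# Assumption: the filter name is stored uncompressed. A PDF that keeps its
# objects inside compressed object streams could be missed.
list_jpx_pdfs() (
  dir="${1:-.}"
  for f in "$dir"/*.pdf; do
    [ -f "$f" ] || continue
    # -a reads binary data as text, -i ignores case, -q only reports match or no match
    if LC_ALL=C grep -a -i -q "JPXDecode" "$f"; then
      printf '%s\n' "$f"
    fi
  done
)
```

Calling list_jpx_pdfs ~/Documents/Books would list the candidates in that one folder. It does not descend into subfolders.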

Apple changed their Quartz decoder in Sierra, so it's much more reluctant to convert jpeg2000 images in pdf files to jpeg images. You'll need other tools for that conversion.

My suggestions:

-- Willus's k2pdfopt-- http://www.willus.com/k2pdfopt/

-- Homebrew-- https://brew.sh/ unless you use MacPorts instead.

-- Ghostscript-- can be installed through Homebrew

-- rwts-pdfwriter-- https://github.com/rodyager/RWTS-PDFwriter

-- cpdf-- can be installed through Homebrew

-- ocrmypdf-- can be installed through Homebrew-- you may need to brew uninstall tesseract and then brew install --all-languages tesseract

-- Automator-- comes with your computer and can help avoid typing and retyping terminal commands.

My workflow, more or less:

First, do I need to ocr the text? That's often the case with scanned texts, and occasionally with other texts due to text encoding errors.

If I need to ocr the text, then I need to use either ocrmypdf or Elucidate. For whatever reason, the resulting files don't play well with Ghostscript, so I will need to use k2pdfopt on them.

If I don't need to ocr the text, then is it raster or vector? Is any text pixelated?

If it's raster, and I don't mind more pixelation, don't mind losing colors, and don't mind resetting fold-out pages to the same size as other pages, then I can use k2pdfopt with decent compression.

If it's raster, and I do mind, I can use k2pdfopt without compression, or Ghostscript to convert to pdf 1.4.

If it's vector, I suggest Ghostscript to convert to pdf 1.4.
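The choices above can be written down as a small shell function. This is just a restatement of the workflow, not a tool that inspects files: the name choose_tool and its yes/no arguments are invented for illustration, and you still have to answer the three questions yourself.

```shell
# choose_tool NEEDS_OCR KIND ACCEPT_LOSS
#   NEEDS_OCR:   yes | no
#   KIND:        raster | vector
#   ACCEPT_LOSS: yes | no  (more pixelation, lost colors, resized fold-outs)
choose_tool() {
  if [ "$1" = "yes" ]; then
    # OCR comes first, whatever kind of file it is
    echo "ocrmypdf or Elucidate, then k2pdfopt"
  elif [ "$2" = "vector" ]; then
    echo "ghostscript to pdf 1.4"
  elif [ "$3" = "yes" ]; then
    echo "k2pdfopt with compression"
  else
    echo "k2pdfopt without compression, or ghostscript to pdf 1.4"
  fi
}
```

For example, choose_tool no raster yes prints the k2pdfopt with compression route.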

My command-line codes:

For ocring text:

ocrmypdf -l lan --force-ocr input.pdf output.pdf

-l lan: replace lan with the 3-letter code for the language of the text, such as deu for German. If you skip this, it defaults to English.

--force-ocr overwrites existing text layers. If the file has a Google Books intro but no text layer afterwards, or the file has a bad text layer, this is useful.

input.pdf: I tend to drag and drop the file from the Finder into the terminal window.

output.pdf: It should appear in your user folder, unless you have changed folders in the terminal.
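If you would rather have the ocr'd copy land beside the original instead of in the user folder, a small wrapper can build the output name from the input path. This is a sketch under assumptions: the function name ocr_next_to_original and the -ocr.pdf suffix are made up here, and only the -l and --force-ocr options from the command above are used.

```shell
# ocr_next_to_original LANG INPUT.pdf
# Runs ocrmypdf and writes INPUT-ocr.pdf into the same folder as the input.
ocr_next_to_original() (
  lang="$1"
  input="$2"
  # Drop a trailing .pdf, then add the (hypothetical) -ocr.pdf suffix
  output="${input%.pdf}-ocr.pdf"
  ocrmypdf -l "$lang" --force-ocr "$input" "$output"
)
```

Type ocr_next_to_original deu followed by a space, then drag the file in from the Finder as before.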

For k2pdfopt with compression:

k2pdfopt -mode copy -dev dx
input.pdf

-dev dx sets it to reformat everything for the Kindle DX. There are other codes for some other devices.

I hit enter after the codes here, and then drag and drop the input file into the k2 window.

The customization tools here are handy: http://www.willus.com/k2pdfopt/help/mac.shtml

For k2pdfopt without compression:

k2pdfopt -mode copy

I hit enter after the codes here, and then drag and drop the input file into the k2 window.

The customization tools here are handy: http://www.willus.com/k2pdfopt/help/mac.shtml

For ghostscript to convert:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

Output should appear in your user folder.

Modified from instructions here: http://www.spoonylife.org/level-3/co...to-1-5-1-6-etc
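One quick way to check that a conversion took is to read the version from the start of the output file, since a pdf begins with a header such as %PDF-1.4. The sketch below assumes that header is all you want to look at, and the function name pdf_version is invented for illustration.

```shell
# pdf_version FILE: print the version from the file header, for example 1.4
pdf_version() {
  # The first 8 bytes of a pdf look like %PDF-1.4
  head -c 8 "$1" | sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}
```

Running pdf_version output.pdf after the Ghostscript command should print 1.4. It prints nothing if the file does not start with a pdf header.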

For Automator:

I haven't figured out how to use Automator with the other tools yet, but I use it to simplify that Ghostscript command.

I created an app with a single step: run shell script. "shell" is "/bin/bash" and "pass input" is "as arguments"; the actual code is:

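# Convert each dropped file to pdf 1.4.
# Output is written to the working directory, which is the user folder here.
suffix="-converted.pdf"
for f in "$@"
do
  # Strip the folder and the .pdf extension from the input path
  base=$(basename "$f" .pdf)
  outputfile="$base$suffix"
  # Path where Homebrew put gs on this Mac; adjust if yours differs
  /usr/local/bin/gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sstdout=%sstderr -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$outputfile" "$f"
done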

I can just drag files onto the app icon and Ghostscript converts them to 1.4, converting any jpeg2000 images to jpeg.

Output should appear in your user folder.
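If the user folder gets crowded, the script could write each converted file beside its original instead. A minimal sketch of the naming step, assuming the same -converted.pdf suffix; the function name converted_path is made up for illustration.

```shell
# converted_path FILE: print FILE's full path with -converted.pdf in place of .pdf
converted_path() (
  dir=$(dirname "$1")
  base=$(basename "$1" .pdf)
  printf '%s/%s-converted.pdf\n' "$dir" "$base"
)
```

Inside the Automator loop, -sOutputFile="$(converted_path "$f")" would then replace -sOutputFile="$outputfile", provided the function is defined at the top of the same script.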

Anyway, I hope this helps.

MarjaE · 03-14-2018, 01:47 PM

Unfortunately, k2pdfopt sometimes drops and/or reorders ocr'd text. I think ocrmypdf -l lan --output-type pdfa-1 --force-ocr input.pdf output.pdf may be a better option, without running through k2 afterwards.
