Old 02-25-2020, 10:28 PM   #1
ctop
Connoisseur
Posts: 53
Karma: 43710
Join Date: Jun 2008
Device: zaurus->palm->iPad->Sony PRS-T1,T2,T3->Kobo Forma&Likebook Ares
Optimize PDFs from archive.org for E-Ink devices

The Internet Archive at archive.org has a lot of interesting books for borrowing and downloading. I have some downloads of older books that are difficult to read on E-Ink devices because they include the background of the page, which has become yellow. So the contrast is low and the text becomes unclear; also, the files are quite big. So I wonder if somebody knows a good way to trim the PDFs for e-readers. I would prefer to use a command-line tool on a Linux-based system, if such a tool is available.
An example of the PDFs I am looking at is this:

https://archive.org/details/smtliche...ge/n8/mode/2up

This is the item page. The download link is here:

https://archive.org/download/smtlich...r16goet_bw.pdf

Any help appreciated, Ctop
Old 02-25-2020, 11:55 PM   #2
Tex2002ans
Wizard
Posts: 1,620
Karma: 7451779
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by ctop View Post
[...] the background of the page, which has become yellow. So the contrast is low and the text becomes unclear, [...]

I would prefer to use a commandline on a Linux based system, if such a tool is available here.
GUI-based:

Scan Tailor Advanced:

https://github.com/4lex4/scantailor-advanced

There isn't another tool like it.

If you want commandline, then there's nothing better than ImageMagick, but you'll have to come up with all the tweaks yourself.

There was also "What’s your “image rehab” routine?" from 2013 which discussed some image cleanup ideas. Although that mostly focused on cleaning up images within scans.

Side Note: Archive.org's B&W versions are usually okay. In this case, it requires lots of manual intervention. Go back to the color PDF (or like GrannyGrump mentions in the thread above, use the original JPEG2000 files), and do all your cleaning from there.

This specific file also has a lot of bleed-through from the reverse side of the pages, so that may make your job even harder when trying to darken text.

Quote:
Originally Posted by ctop View Post
also the files are quite big. So I wonder if somebody knows a good way to trim the PDFs for ereaders.
Scan Tailor Advanced should be able to do all the chopping/cropping/contrast adjustments for you. But if you need even more PDF tweaking beyond that, then there's k2pdfopt, by willus.

Last edited by Tex2002ans; 02-25-2020 at 11:57 PM.
Old 02-26-2020, 01:52 AM   #3
ctop
Connoisseur
Quote:
Originally Posted by Tex2002ans View Post
GUI-based:

Scan Tailor Advanced:

https://github.com/4lex4/scantailor-advanced

There isn't another tool like it.
.
Thanks. I was somehow hoping that I could just clean the images without disturbing the text layer. I have been using scantailor (though not the advanced version, thanks for pointing that out) for books I scanned myself, and am quite pleased with the results. So it seems what you are saying is that it is best to throw away all the post-processing already done and start from the images. Sigh, with a GUI-based program that is quite a lot of work...

All the best,
Ctop
Old 02-26-2020, 04:37 AM   #4
doubleshuffle
Unicycle Daredevil
Posts: 13,868
Karma: 182888522
Join Date: Jan 2011
Location: Planet of the Pudding Brains
Device: Aura HD (R.I.P. After six years the USB socket died.) tolino shine 3
Why not fix the epub and upload it to the MR library? Will be much nicer on your reader, and also a service to the community.
Old 02-26-2020, 05:56 AM   #5
ctop
Connoisseur
Quote:
Originally Posted by doubleshuffle View Post
Why not fix the epub and upload it to the MR library? Will be much nicer on your reader, and also a service to the community.
I had not even thought about that. I will have a look and see if it can be done in a reasonable timeframe.

Ctop
Old 02-26-2020, 12:13 PM   #6
doubleshuffle
Unicycle Daredevil
It is a lot of work, no denying that. But your pdf-fixing efforts sound pretty complicated too, so that's what gave me the idea.

I only now had a look at the book you have in mind. That's huuuge, of course, and seriously a lot of work.

BTW, there's a very nice epub edition of Goethe's works in our library, provided by pynch. But I'm not sure if the scientific writings are complete in that one.
Old 02-26-2020, 12:18 PM   #7
doubleshuffle
Unicycle Daredevil
Just had a look at the txt file of the book - a very clean OCR result with surprisingly few errors. Fixing the epub may really be the way to go here.
Old 02-26-2020, 06:41 PM   #8
Tex2002ans
Wizard
Quote:
Originally Posted by ctop View Post
I was somehow hoping that I could just clean the images without disturbing the text layer.
Yeah, that's the one disadvantage of Scan Tailor: it recreates/morphs the original text.

But if you're using it for personal copies, or a pre-processor for more accurate OCR, it's great.

The nice thing about it is you can also do page-by-page adjustments, and see how the final output will look. For example, speckle cleanup is fantastic, and you can see the diffs and adjust the strength if necessary.

Quote:
Originally Posted by ctop View Post
I have been using scantailor (though not the advanced version, thanks for pointing that out) for books I scanned myself, and am quite pleased with the results.
The original is not maintained any more, while the other forks added lots of functionality (like better multi-threading—you can see the entire enhancement list on Github).

Scan Tailor Advanced combines all the best functionality from all of them, and I believe it's the only one actively maintained.

Quote:
Originally Posted by ctop View Post
So it seems what you are saying is that it is best to throw away all the post-processing already done and start from the images.
Yes. Archive.org just does a whole host of automated conversions... and I wouldn't use them if you could help it.

I usually just stick with their:

1. B&W PDF. Usually this is decent. In the case of this specific "yellowed book", it was crap.

2. Color PDF. This matches what they show in their online reader. Helpful if working with color, drawings, or "yellowed books". (You can do your own contrast/color corrections from this, and create a better grayscale/B&W version.)

3. As a last resort, work directly from the JPEG2000 images. These are the highest resolution/quality.
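
The "do your own contrast/color corrections" idea from item 2 can be sketched on raw pixel values. This is only an illustration of a simple levels stretch, not what Archive.org, Finereader, or Scan Tailor actually do; the black/white points (60 and 190) and the sample pixel colors are made-up assumptions that would need tuning per book.

```python
def to_gray(r, g, b):
    """Approximate luma of an RGB pixel (ITU-R BT.601 weights)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def stretch_levels(gray, black_point, white_point):
    """Map black_point..white_point linearly onto 0..255, clipping outside."""
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scaled = (gray - black_point) * 255 / (white_point - black_point)
    return max(0, min(255, round(scaled)))


def clean_pixel(rgb, black_point=60, white_point=190):
    """Push yellowed paper to white and faded ink toward black.

    The default black/white points are assumptions, not measured values.
    """
    return stretch_levels(to_gray(*rgb), black_point, white_point)


# Hypothetical samples: yellowed paper, faded ink, faint bleed-through.
paper = (225, 205, 150)
ink = (70, 60, 50)
bleed = (190, 175, 130)
cleaned = [clean_pixel(p) for p in (paper, ink, bleed)]
```

With these numbers the paper clips to pure white and the ink lands near black, while the bleed-through only becomes a lighter gray. That is why heavy bleed-through usually needs a tighter white point or manual cleanup.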

Do not touch their "EPUBs" or any of their other "ebook" formats (they are just automatically run through OCR, no proofing or anything). You're better off working from the source files and recreating your own OCR/ebooks from that.

Plus, if you have access to newer tools, you may get even more accurate conversion (according to the metadata, Finereader 8 was used, whereas Finereader 12+ is probably more accurate).

PS. If you need me to run any images/PDFs (pre-processed or not) through Finereader 12, just let me know.

Quote:
Originally Posted by ctop View Post
Sigh, with a GUI based program that is quite a lot of work...
You can always automate any pre-processing steps with ImageMagick. For example, I was working on a book with scanning artifacts that ran vertically through the text:

Detecting/Removing Vertical Scanlines from Scans

So it could be used to clean up the images, then run through further corrections/tools after.

But with ImageMagick... you'll have to spend time figuring out all the commands + recreating fixes that may already exist.

For example, Scan Tailor already does a fantastic job of dewarping, detecting and cropping spines+edges-of-pages, [...].

If you go pure commandline ImageMagick... you'll have to figure out all those algorithms on your own. (Plus each book is going to have its own unique challenges.)
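
As a starting point for that kind of automation, here is a minimal sketch that only builds one ImageMagick command per page image, so the settings can be reviewed before anything runs. The file names, the `cleaned` output folder, and the 25%/75% level points are hypothetical; `-colorspace Gray` and `-level` are standard ImageMagick options, but the right values depend on the scan.

```python
from pathlib import Path


def imagemagick_command(src, dst, black="25%", white="75%"):
    """Build (not run) an ImageMagick call that grays and stretches one page."""
    return [
        "convert",  # ImageMagick 6 name; ImageMagick 7 installs it as "magick"
        str(src),
        "-colorspace", "Gray",
        "-level", f"{black},{white}",  # assumed black/white points, tune per book
        str(dst),
    ]


def batch_commands(pages, out_dir):
    """One command per page, in sorted order, writing into out_dir."""
    out_dir = Path(out_dir)
    return [imagemagick_command(p, out_dir / Path(p).name) for p in sorted(pages)]


# Hypothetical page images exported from the color PDF.
commands = batch_commands(["page_0002.png", "page_0001.png"], "cleaned")
```

Each list can then be passed to `subprocess.run(cmd, check=True)` once a test page looks right. Dewarping and page-edge detection are not covered here; those are the parts Scan Tailor already solves.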

Last edited by Tex2002ans; 02-26-2020 at 06:51 PM.
Old 02-26-2020, 07:34 PM   #9
hobnail
Running with scissors
Posts: 748
Karma: 5112464
Join Date: Nov 2019
Device: none
Quote:
Originally Posted by doubleshuffle View Post
Just had a look at the txt file of the book - a very clean OCR result with surprisingly few errors. Fixing the epub may really be the way to go here.

I've also done it using the txt file and depending on the quality of the scan and the original book it can be a painful amount of work.
Old 02-26-2020, 09:50 PM   #10
Pajamaman
Wizard
Posts: 1,948
Karma: 6879034
Join Date: May 2016
Location: Quebec, QC
Device: Nook, Onyx
I suggest you try KOReader. It can do OCR and reflow on the fly, and it also has a contrast setting.

On another note, does anyone know a pdf tool that can ocr text that curves up at the end of a line as a result of the edge of a book page not being flat when scanned?
Old 02-27-2020, 12:34 AM   #11
Tex2002ans
Wizard
Quote:
Originally Posted by Pajamaman View Post
On another note, does anyone know a pdf tool that can ocr text that curves up at the end of a line as a result of the edge of a book page not being flat when scanned?
You have to dewarp the images. Scan Tailor Advanced can do that.

Convert the PDF into PNG or TIFF images, run Scan Tailor on them, then go back to PDF.
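
The two conversion steps around Scan Tailor can be sketched as command builders. This assumes Poppler's `pdftoppm` and the separate `img2pdf` tool are installed (neither is named in this thread), and the file names, the `out/` folder, and the 300 dpi value are hypothetical. Scan Tailor itself is run by hand in between.

```python
def pdf_to_png_command(pdf, prefix, dpi=300):
    """Poppler's pdftoppm: render every page to <prefix>-NNN.png."""
    return ["pdftoppm", "-png", "-r", str(dpi), pdf, prefix]


def images_to_pdf_command(images, pdf):
    """img2pdf: wrap the cleaned images, in sorted order, into one PDF."""
    return ["img2pdf", *sorted(images), "-o", pdf]


# Step 1: PDF -> PNG pages (hypothetical names).
render = pdf_to_png_command("book.pdf", "pages/page")

# Step 2 (manual): run Scan Tailor Advanced on pages/*.png.

# Step 3: cleaned images -> PDF. The output file names are an assumption.
rebuild = images_to_pdf_command(
    ["out/page-002.tif", "out/page-001.tif"], "book_clean.pdf"
)
```

Note that a PDF rebuilt this way contains images only. Any text layer from the original is gone, so OCR has to be run again if searchable text is needed.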
Old 02-27-2020, 12:52 AM   #12
doubleshuffle
Unicycle Daredevil
Quote:
Originally Posted by Tex2002ans View Post
Yes. Archive.org just does a whole host of automated conversions... and I wouldn't use them if you could help it.

I usually just stick with their:

1. B&W PDF. Usually this is decent. In the case of this specific "yellowed book", it was crap.

2. Color PDF. This matches what they show in their online reader. Helpful if working with color, drawings, or "yellowed books". (You can do your own contrast/color corrections from this, and create a better grayscale/B&W version.)

3. As a last resort, work directly from the JPEG2000 images. These are the highest resolution/quality.

Do not touch their "EPUBs" or any of their other "ebook" formats (they are just automatically run through OCR, no proofing or anything). You're better off working from the source files and recreating your own OCR/ebooks from that.
I always use the original image files and run them through ABBYY, but not everybody has that, and then working from the text or epub files at archive.org is an option. Especially when their OCR is as clean as in this case.

Quote:
Originally Posted by hobnail View Post
I've also done it using the txt file and depending on the quality of the scan and the original book it can be a painful amount of work.
No denying this.
Old 02-27-2020, 10:16 PM   #13
willus
Fuzzball, the purple cat
Posts: 1,140
Karma: 8561592
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by ctop View Post
The Internet Archive at archive.org has a lot of interesting books for borrowing and downloading. I have some downloads of older books that are difficult to read on E-Ink devices because they include the background of the page, which has become yellow. So the contrast is low and the text becomes unclear; also, the files are quite big. So I wonder if somebody knows a good way to trim the PDFs for e-readers. I would prefer to use a command-line tool on a Linux-based system, if such a tool is available.
An example of the PDFs I am looking at is this:

https://archive.org/details/smtliche...ge/n8/mode/2up

This is the item page. The download link is here:

https://archive.org/download/smtlich...r16goet_bw.pdf

Any help appreciated, Ctop
The k2pdfopt app fits most of what you want (e.g. command-line, linux). It has a thread here in the PDF forum on MR. The command-line options below worked pretty well with your link above:

k2pdfopt -mode fitwidth -bpc 2 -n- -ls- -ac example1.pdf

If you want to try it on just a few pages first, add something like:

-p 1-40

Example conversion of pages 30-39 is attached.
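
For running this over several files, the command above can be wrapped in a small helper. This is a minimal sketch that only assembles the argument list using the exact options quoted in this post; the helper name and the idea of scripting it from Python are assumptions.

```python
def k2pdfopt_command(pdf, pages=None):
    """Build the k2pdfopt call from this post, optionally limited to a page range."""
    cmd = ["k2pdfopt", "-mode", "fitwidth", "-bpc", "2", "-n-", "-ls-", "-ac"]
    if pages:
        cmd += ["-p", pages]  # e.g. "1-40" to test on the first pages only
    cmd.append(pdf)
    return cmd


trial = k2pdfopt_command("example1.pdf", pages="1-40")
full = k2pdfopt_command("example1.pdf")
```

Trying the page-limited variant first is cheap, and the list can be handed to `subprocess.run` once the result looks good on the device.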

The only thing is that the file size of the converted PDF will be even bigger because the original is actually very well compressed (fitting 900 bitmapped pages into 30 MB is no small trick--it uses JPEG 2000 JPX compression, whereas k2pdfopt converts it to .png lossless compression, which is not as compact). I used -bpc 2 to get the converted file size down a little.
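
To put rough numbers on this: 30 MB over 900 pages is about 33 KB per page. The sketch below compares that with the raw, uncompressed size of a bitmap page at different bit depths. The 1200x1600 pixel page size is a made-up assumption, and treating `-bpc` as bits per pixel of a grayscale bitmap is a simplification. PNG compression will shrink the raw figures, but the comparison shows why a lower bit depth helps.

```python
def avg_page_kb(total_mb, pages):
    """Average size per page in KB (decimal units)."""
    return total_mb * 1000 / pages


def raw_page_kb(width_px, height_px, bits_per_pixel):
    """Uncompressed bitmap size in KB for one grayscale page."""
    return width_px * height_px * bits_per_pixel / 8 / 1000


original = avg_page_kb(30, 900)         # JPEG 2000 original, about 33 KB/page
raw_2bpc = raw_page_kb(1200, 1600, 2)   # hypothetical page at 2 bits per pixel
raw_8bpc = raw_page_kb(1200, 1600, 8)   # same page at 8 bits per pixel
```

Going from 8 to 2 bits per pixel cuts the raw data to a quarter before any compression is applied, which matches the reason given for choosing `-bpc 2`.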
Attached Files
File Type: pdf example1_k2opt.pdf (1.15 MB, 181 views)

Last edited by willus; 02-27-2020 at 10:21 PM.
Old 02-28-2020, 01:22 AM   #14
Tex2002ans
Wizard
Quote:
Originally Posted by willus View Post
The k2pdfopt app fits most of what you want (e.g. command-line, linux). It has a thread here in the PDF forum on MR.
Fantastic work as always. Yes, if you wanted to keep it in PDF form... your tool is always best. :P

Quote:
Originally Posted by ctop View Post
An example of the PDFs I am looking at is this:

https://archive.org/details/smtliche...ge/n8/mode/2up

This is the item page. The download link is here:

https://archive.org/download/smtlich...r16goet_bw.pdf

Any help appreciated
But if you want to take steps in making the PDF a proper ebook:

I grabbed this book and ran it through Scan Tailor Advanced + Finereader 12.

1. Finereader 12 did a MUCH better job with the color PDF's "yellowed pages", and had no issues creating a B&W version.

I attached it as [Finereader][BW].

(You can see how much better 12 converts compared to 8.)

Side Note: I manually erased markings from the first few pages, so they look pure white... just ignore that in your comparisons.

Note: Alternatively, you could've fed color images into Scan Tailor directly (it has 3 different methods to convert to B&W/Grayscale + you can mess with the thresholds).

2. I exported the B&W PDF into PNGs, then ran that through Scan Tailor Advanced.

I spent about an hour going through the various stages, and Scan Tailor did a FANTASTIC job at automatically picking all correct boxes. The page edges are nearly all gone.

I would say I didn't have to touch 95%+ of the pages at all.

Side Note: Despeckling + Outputting has gotten so much faster/better compared to how it used to be. And I only had to use Despeckling on a handful of pages to remove the occasional stray dots. (Being able to see the before/afters marked with red is an enormous help. This is one step where GUI beats the pants off of pure commandline.)

3. I took the Scan Tailor images, and reimported them into Finereader 12, ran OCR, and output as:

PDF = [ScanTailor][Finereader][BW].pdf. (30 MBs is too large to attach, so here's a download.)
EPUB = [Finereader].epub.

You can compare the text, and see how much more accurate 12 is compared to Archive.org's "EPUB". (Most importantly, the headers+page numbers are nearly all automatically removed and not clogging the text.)

4. I took Finereader's EPUB and ran it through my usual "Finereader cleanup Regex":

Attached it as [Finereader][CodeCleanup].epub.
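
The actual regexes are not shown in this thread, so the following is only a guess at the kind of cleanup such a pass might include, applied to plain text rather than EPUB markup. The function name and both patterns are assumptions for illustration.

```python
import re


def cleanup(text):
    """Two typical OCR fixes (illustrative, not the poster's real regexes)."""
    # Rejoin words split by a hyphen at a line break: "Far-\nbe" -> "Farbe".
    # Naive: this also joins genuine hyphenated compounds that happen to
    # break at a line end, so the results need proofreading.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text


sample = "Zur Far-\nbenlehre  von Goethe"
cleaned = cleanup(sample)
```

Real OCR output from a book like this would need many more rules (page headers, ligatures, stray punctuation), and each book tends to need its own adjustments.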

Comparison Images

Archive.org Color PDF + Finereader B&W + Scan Tailor Cleanup:

[Image: smtlichewer16goet.-.p16-17[Original.Color].jpg]
[Image: smtlichewer16goet.-.p16-17[Finereader].png]
[Image: smtlichewer16goet.-.p16-17[ScanTailor].png]

Last edited by Tex2002ans; 02-28-2020 at 03:14 AM.
Old 02-28-2020, 05:52 AM   #15
ctop
Connoisseur
Quote:
Originally Posted by willus View Post
The k2pdfopt app fits most of what you want (e.g. command-line, linux). It has a thread here in the PDF forum on MR. The command-line options below worked pretty well with your link above:

k2pdfopt -mode fitwidth -bpc 2 -n- -ls- -ac example1.pdf

If you want to try it on just a few pages first, add something like:

-p 1-40

Example conversion of pages 30-39 is attached.
Wow, this looks really great, exactly what I had in mind! Awesome! One question though, the file you created has the page breaks at different places than the original, which is astonishing. What is the reason for this?

And one more question, since I like to highlight things in my PDFs, is the text layer the same as before, or does k2pdfopt do its own OCR?

All the best,

Ctop