PDFRead 1.7 released - Page 5

Azayzel · 05-05-2007, 12:11 PM

I agree with the whole multi-threading scheme; it's more efficient (or should be) and will make fuller use of the system it runs on. I guess the only bottleneck will be if something deadlocks while waiting for a thread that has stalled or failed. Guess that's where the learning part will really come into play with your scheduling system. I think Alex_D used this with his program, so if you need pointers he might be of help.

ashkulz · 05-05-2007, 01:30 PM

Quote:

Originally Posted by kovidgoyal

Besides I don't have much experience with multi-threaded programming and I'm looking at this as a good way to learn ;-)

You must really, really want to punish yourself if you want to learn multithreading ... I try to avoid it whenever I can get away with it

kovidgoyal · 05-05-2007, 04:44 PM

Hey I'm a theoretical physicist...punishing myself is pretty much a given

alex_d · 05-06-2007, 02:28 AM

Quote:

I also do dilation and resizing differently from RasterFarian. I render the PDF/DJVU at the dilation DPI, without specifying a page size. So Ghostscript or DJVU automatically creates an image with a size appropriate for that resolution. I don't know what the size is up front at all, and in fact it varies from book to book.

That's what I was saying. The boldness intensification changes from book to book.

Quote:

Hmm, I think that I'll go the image -> pngnq -> pnmremap route.

Use pnmremap only. Pngnq is designed to try to find the best 16 colors out of your monitor's 16M. It then sets up a 16-entry mapping table whose elements are 24-bit rgb. This will of course look good on your PC, but those colors it will find (e.g. a gray that's 250,250,250) just don't exist on the 16-color Iliad. The image's mapping table can't be optimized, it must simply be the one that the Iliad/Reader/etc can natively support. Pngnq might be useful if you want a 4-color bitmap for displaying on a 16-color device, but I don't think pngnq lets you input the right settings for that (I think it always picks out of 16M rgb). Lastly, the "dithering" step (the mixing of pixels after you figure out which pixels to mix) is done just as well by pnmremap as by pngnq. Running pnmremap and pngnq will perform dithering twice, actually reducing quality. (Running pngnq and then displaying on a Reader also does double dithering).
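To make the fixed-table point concrete, here is a minimal Python sketch of remapping a grayscale image onto a fixed 16-level palette with Floyd-Steinberg error diffusion. It is not pnmremap's code. The evenly spaced gray levels (0, 17, ..., 255) and the names `PALETTE`, `nearest_level` and `remap_fixed_palette` are assumptions for illustration; a real device's levels may differ.

```python
# Sketch only: the palette is ASSUMED to be 16 evenly spaced grays.
PALETTE = [round(i * 255 / 15) for i in range(16)]  # 0, 17, 34, ..., 255


def nearest_level(value):
    """Return the palette entry closest to value."""
    return min(PALETTE, key=lambda level: abs(level - value))


def remap_fixed_palette(pixels):
    """Dither rows of 0-255 ints onto PALETTE and return the new rows."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    work = [[float(v) for v in row] for row in pixels]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = nearest_level(old)
            out[y][x] = new
            err = old - new
            # Push the quantisation error onto pixels not yet visited.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

With this table a 250 gray cannot survive as-is: a flat 250 area comes out mostly as 255 with an occasional darker pixel mixed in. Every output value is one the table contains, which is the property the paragraph above is arguing for.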

Quote:

Hmm, I think that 7 does a little too much edge-enhancement, but I'm willing to change it. Actually, what would be nice if someone could volunteer to test various parameters and report what looks the best. Any volunteers?

Yes, we need a poll. I've done extensive testing, but I'm only one opinion. Also, the Sony Reader seems to best like settings which look too harsh on an LCD. But on your 1100, different settings will probably look better (but dude, quit doing image quality testing on your pc!)

Quote:

[Bitmap autohinting is] done by all desktop PDF Readers I've seen, and I think it would be rather processor intensive, even if you do it in C.

No, I don't think anyone has tried bitmap autohinting before. PDF viewers (and operating systems) do regular autohinting where they look at the vector information itself. That approach can't be applied because by the time I increase the boldness of the font via dilation, I've lost all the vector info. I'll have to shift the pixels themselves.
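For anyone unsure what dilation does to a rasterized page, here is a minimal Python sketch. It assumes dark text on a light background stored as rows of 0-255 grays, and the name `dilate_dark` and the square neighbourhood are illustrative choices, not the filter PDFRead or RasterFarian actually uses.

```python
def dilate_dark(pixels, radius=1):
    """Thicken dark strokes: each pixel takes the darkest value nearby."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            darkest = 255
            # Scan the square neighbourhood, clipped at the image edges.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    darkest = min(darkest, pixels[ny][nx])
            row.append(darkest)
        out.append(row)
    return out
```

The function only ever sees pixel values, which is why any hinting after this step would have to work on the bitmap rather than on font outlines.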

It might be easier to just go back to the font vector-changing efforts and see if I can get better results out of that. Thing is that it's easy to take embedded fonts out of a PDF, a bit hard to deal with the various formats, and nearly impossible to modify the vectors (there is one commercial program that can do it, and this program is scriptable, but it obviously can't be distributed). What I haven't done, though, is research how to write a program that does the vector modifying itself. However, I think the math of scaling Bézier curves etc. is beyond me. The worst part, though, is I'm not sure how it'll all turn out after rasterizing. Ghostscript's autohinting engine, for example, isn't optimized for antialiasing. It treats the output as if it were higher-res and then downsamples. OpenType autohinting isn't supported. In the end, the letters seem to turn out blurry and require edge-enhancement anyway. Also, unfortunately, it probably won't be able to make any use of the font's internal hinting information (since it'll likely become meaningless after the vectors change).

Azayzel · 05-26-2007, 11:51 AM

I was just curious... since this extracts each page of the PDF as a rasterized image, is there any way you can make it use already rasterized images; e.g., PNG, JPG, GIF, etc. That way if we already have the images, only the crop, dilate, save functions need to be run. It's not often that this is the case, but I have a few ebooks that are already in image format.

Thanks!

ashkulz · 05-26-2007, 12:58 PM

Well, there is already support for such a scenario with the IMGLIST format. Create a simple text file containing the list of images in the order you want them, and select the input format as imglist (in GUI or via the command-line option -i). Almost all common image formats will be supported, see this page for all supported formats.

gdxf · 05-26-2007, 04:28 PM

ashkulz, can you give more details on how to convert multi-page tif files into IMGLIST format? I find PDFRead does not support multi-page tif(f) files. Also, any new development on PDFRead 1.8? Thanks.

Quote:

Originally Posted by ashkulz

Well, there is already support for such a scenario with the IMGLIST format. Create a simple text file containing the list of images in the order you want them, and select the input format as imglist (in GUI or via the command-line option -i). Almost all common image formats will be supported, see this page for all supported formats.

gdxf · 05-26-2007, 09:11 PM

I guess if the image file (tiff, tif) does not have to be rasterized, the final result would be much better, as the content quality won't be degraded too much. So even if the screen still displays the same size content, the content would be much more legible. The results in "Just Another Printer" testified to this. The only issue with Just Another Printer is that it does not have batch processing capability. If we can integrate some advantages of JAP into PDFRead, I believe we are going to find a final solution for reading A5 sized tif(f) image files on the Sony Reader.

ashkulz · 05-26-2007, 11:09 PM

Quote:

Originally Posted by gdxf

ashkulz, can you give more details on how to convert multi-page tif files into IMGLIST format? I find PDFRead does not support multi-page tif(f) files. Also, any new development on PDFRead 1.8? Thanks.

Do not use multi-page TIFFs in the IMGLIST format; use the TIFF support directly (Input Type TIFF in the GUI or command line option -i tiff). This will explode the multi-page TIFF and convert it directly. Note that it assumes the TIFF is at 300 dpi; if yours is not, you may want to turn off dilation (as dilation works best at higher DPIs).

Quote:

Originally Posted by gdxf

I guess if the image file (tiff, tif) does not have to be rasterized, the final result would be much better, as the content quality won't be degraded too much. So even if the screen still displays the same size content, the content would be much more legible. The results in "Just Another Printer" testified to this. The only issue with Just Another Printer is that it does not have batch processing capability. If we can integrate some advantages of JAP into PDFRead, I believe we are going to find a final solution for reading A5 sized tif(f) image files on the Sony Reader.

The solution already exists! I must not have advertised the features enough, because both you and Azayzel were asking me about things already added in 1.6.

About 1.8, I may not be able to work on it for a week or two as I am currently travelling out of the country. The features I've already added are:
- add the landscape-third and portrait-2col modes (actually can now support NxN splitting)
- add support for color processing

But I haven't had time to release it yet. I'm delaying my image reflowing to 2.0, as it will take quite a bit of time. There's a commercial implementation of it called UbiText, I am studying it and trying to come up with something.

gdxf · 05-26-2007, 11:52 PM

ashkulz, this is all good news! I'm very much looking forward to it. Is the 1/3 mode any good?

As to the multipage tiff issue, when I use the tiff input mode, it always hits the error below. Any explanation? I am pretty sure I have the correct file type. Or is it because I've used .tiff files converted from .tif files?
Thanks.

Command Line
============
"C:\Program Files\PDFRead\bin\pdfread" -p prs500-l -i tiff -t "pages.tiff" -o "C:\Tests\page.lrf" --no-crop --no-dilate --no-enhance -m "landscape-half" "C:\Tests\saved\Page.tiff"

Extracting TIFF pages ... done.

Temporary directory: c:\...\temp\pdfread-lsqvn8

Page 1/2: EXTRACT Traceback (most recent call last):
  File "pdfread.py", line 201, in <module>
  File "pdfread.py", line 84, in main
  File "pdfread.py", line 43, in convert
  File "input.pyo", line 160, in get_page
  File "Image.pyo", line 1916, in open
IOError: cannot identify image file
Press any key to continue . . .
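This is not a diagnosis of the traceback above, but one cheap thing to rule out with a "cannot identify image file" error is whether a file really carries the TIFF signature. A classic TIFF starts with `II` or `MM` followed by the number 42 in the matching byte order. The helper names below are made up for illustration, and BigTIFF (which uses 43) is deliberately not handled.

```python
import struct


def tiff_byte_order(header):
    """Return 'little', 'big', or None if the bytes are not a TIFF header."""
    if len(header) < 4:
        return None
    if header[:2] == b"II" and struct.unpack("<H", header[2:4])[0] == 42:
        return "little"
    if header[:2] == b"MM" and struct.unpack(">H", header[2:4])[0] == 42:
        return "big"
    return None


def looks_like_tiff(path):
    """Read the first four bytes of a file and check the TIFF signature."""
    with open(path, "rb") as handle:
        return tiff_byte_order(handle.read(4)) is not None
```

If `looks_like_tiff` returns False for a file, renaming it from .tif to .tiff will not help; the content itself is something else.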

Azayzel · 05-27-2007, 10:01 AM

Quote:

Originally Posted by ashkulz

Well, there is already support for such a scenario with the IMGLIST format. Create a simple text file containing the list of images in the order you want them, and select the input format as imglist (in GUI or via the command-line option -i). Almost all common image formats will be supported, see this page for all supported formats.

Thanks for the response, I'll give it a whirl once I find a quick method of creating a list with 250+ images (probably just redirect a dir to a text file, now that I think about it).

The reason I asked was that my first attempt with an older version of JEC gave some pretty buggered results; i.e., really fuzzy text with pieces missing. After reading a few of the latest responses, I think it might be the dilation filter over-fuzzing the text. I'll play around a bit more.

igorsk · 05-27-2007, 01:08 PM

dir /b >filelist.txt
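One catch with a plain directory listing is ordering: sorted alphabetically, page10 lands before page2. Below is a minimal Python sketch that writes the list in natural order and skips non-image files. The one-name-per-line layout, the extension set and the function names are assumptions; check them against what the imglist input actually expects.

```python
import os
import re

# ASSUMED set of extensions; adjust to the formats you actually have.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".bmp"}


def natural_key(name):
    """Split digits from text so 'page2' sorts before 'page10'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_images(names):
    """Keep image file names only, in natural order."""
    images = [n for n in names
              if os.path.splitext(n)[1].lower() in IMAGE_EXTENSIONS]
    return sorted(images, key=natural_key)


def write_image_list(folder, list_path):
    """Write one image file name per line (assumed imglist layout)."""
    with open(list_path, "w") as handle:
        for name in list_images(os.listdir(folder)):
            handle.write(name + "\n")
```

Filtering by extension also keeps the list file itself out of the list, which can otherwise happen when the output is written into the same folder.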

Bob Russell · 06-15-2007, 12:32 AM

Finally got this installed. (Needed to read a pdf on it!)
Works great. I am using either:

1) Layout mode = Default
Profile = prs-500

or

2) Layout mode = Landscape-half
Profile = prs-500

The only problem I have run into is that when I read the resulting .lrf book, the half page seems to sometimes cut right in the middle of a line of text, and I can't read it. Is there a way to get some overlap?

Also, I'm really not sure what profile does vs layout. Especially prs-500-l versus prs-500.

Can anyone clarify a bit, or point me to a post I may have missed with the info?

Thanks!

Bob Russell · 06-15-2007, 12:51 AM

Maybe I can sort of guess at the answer to my questions, but I'm not sure, and am also unsure about the optimal settings.

First the profiles:

When you choose prs-500, you get portrait. To see landscape, you need to hold down the size button on the Reader until it switches to landscape mode.

When choosing prs-500-l, you get landscape orientation even when the Reader is set to portrait. It's rotated to the right, presumably to allow the right thumb to change pages.

Next the Layout mode:

It seems that portrait will set the dimensions of the output to show the whole page on a single screen.

Setting it to landscape will cause it to be "wide and short", i.e. landscape dimensions that only look good when you switch the Reader to landscape mode so you can see all of it.

Landscape-half appears to cut the page in half and do landscape output for each half of the page (which the Reader then displays one-half at a time also, making for 4 screens per page on the original).

The disadvantages of landscape-half appear to be the following:
* Lines can be cut in the middle. There is no overlap at the cut, so it can be hard or impossible to read the line that was split.
* When mixed with the Reader's choice to do some overlap automatically in landscape mode, it can be confusing to read because it's not obvious what has been repeated and what is missing (e.g. cut off in the middle of the line and not repeated).

My tentative conclusion:

1) First try a few pages with Landscape/Prs-500
If you can read it at that size (with the Sony Reader in landscape mode), stick with it because that's the more natural version.

2) If you need a larger size, then use Landscape-half/Prs-500
You will have odd page breaks, but at least you can read it unless a line got split in a bad way in the half-page split that PDFRead made.

3) If you have something like a presentation (e.g. two slides per page, one over the other), then just use Portrait/Prs-500 because the slides are probably very large lettering, so you can shrink it a lot. At least that worked in the document I used it for. Actually, I didn't try it, but that sort of document is probably even readable by moving it directly to Connect from the original .pdf also.

Please take the above as the naive descriptions of someone that doesn't know what he's doing yet. Feel free to correct me and add other helpful info, or confirm parts that you folks agree with. I would really appreciate input on a better way to do this!

ashkulz · 06-15-2007, 07:41 AM

A "profile" is a collection of settings for the various command line options, one of which is the layout-mode. When you choose "Default" layout in the GUI, you are using the layout defined in the profile.

I have set it up to always use the reader's portrait mode: the reader's landscape mode is never used. If you choose to switch to that, it will not look good as the resolution targeted is for the portrait version. So avoid the reader's landscape mode in general.

As you correctly found, the prs500 profile is for portrait and prs500-l for landscape (holding the reader sideways). There is always some amount of overlap between pages in landscape mode (20 is default), so I'm surprised that you got no overlap. Can you just try using the default settings, just changing the profile to prs500-l and seeing the output?

There is also a major difference between landscape and landscape-half layout: landscape will take as many pages as necessary to show the page in correct aspect ratio (it may be anything from 2-4 pages) while landscape-half will resize the image to fit two pages then chop it up.
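The difference between the two layouts can be sketched in a few lines of Python. This is a guess at the arithmetic, not PDFRead's code: the function names are invented, and treating the default overlap of 20 as pixels is an assumption.

```python
def slice_with_overlap(page_height, screen_height, overlap=20):
    """Landscape-style: as many screen-tall bands as the page needs."""
    if screen_height <= overlap:
        raise ValueError("screen_height must exceed overlap")
    bands = []
    top = 0
    while True:
        bottom = min(top + screen_height, page_height)
        bands.append((top, bottom))
        if bottom >= page_height:
            return bands
        # Start the next band a little above the previous cut.
        top = bottom - overlap


def slice_in_half(page_height, overlap=0):
    """Landscape-half-style: always two bands, cut at the midpoint."""
    middle = page_height // 2
    return [(0, min(page_height, middle + overlap)),
            (max(0, middle - overlap), page_height)]
```

For a page scaled to 1000 rows on a 400-row screen, `slice_with_overlap(1000, 400)` gives three bands that each repeat 20 rows of the previous one, while `slice_in_half(1000)` always gives exactly two bands and repeats nothing unless an overlap is passed in.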

I've been meaning to release 1.8 for a long time now, but am travelling at the moment so no chance... will probably resume development from next weekend onwards :-)


