PDFRead 1.7 released - Page 2

ashkulz · 04-27-2007, 11:16 AM

Quote:

Originally Posted by gdxf

I followed the batch mode instructions to run batch conversion in windows, but had encountered this notice in the command line:

"Unable to determine total number of pages in document
Please enter number of pages: "

When I put in a page number, it results in a blank lrf file.

Here is what the screen says:

"Unable to determine total number of pages in document
Please enter number of pages: 1

Temporary directory: c:\docume~1........

Page 1/1: EXTRACT RASTERIZE BLANK

Creating BBeB file ... done.

That's a very weird error; it usually means your installation has not been set up correctly. Can you check the following:

Check whether you can convert PDF files normally via the GUI.
Try the attached script with the same instructions.
Check that the PDFRead location is set correctly (set LOC=).
Uncomment the commented call in the file, try it again, and send me the output.
Zip up the directory and attach it here, or send it to me.

ashkulz · 04-27-2007, 11:25 AM

Quote:

Originally Posted by kovidgoyal

Also, this is my first time rasterizing a PDF (I usually have access to the LaTeX sources). Is the font rasterization always so bad? I've attached samples to show you what I mean.

I don't have a Sony Reader, so I can't really see how the generated LRF looks. On the other hand, the converted PDF did look decent when I looked at the PNG. Were there any particular points that looked really bad? I'm always interested in knowing where I can improve things...

kovidgoyal · 04-27-2007, 12:32 PM

You can install the connect reader software and use that to see how the files look. Basically the fonts look like they've been rasterized without any antialiasing.

ashkulz · 04-27-2007, 12:42 PM

Uhm, I don't have access to a Windows PC at home ... so if you could post some screenshots I'd be grateful. But yes, the fonts do look a bit ragged ... what happens is that I render at 300dpi (anti-aliased), perform dilation at that resolution and then reduce the size. As a result, the anti-aliasing ends up in the reduced image, which is bad because when you downsample it to 4 colors you can get "gaps" where the color information is lost due to the 2-bit grayscale limitation. As far as I know, even RasterFarian has pretty much the same output. Can you try with that and see how good the result is?

BTW, can you try again with 1.7? I replaced imagemagick with pngnq, this may give better output...
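
For illustration, the "gaps" described above can be reproduced with a toy one-dimensional example. This is only a sketch, not PDFRead's actual code: the function names, the box-filter downsampling, and the nearest-level quantization are all assumptions chosen to keep the example small. It shows a faint anti-aliased stroke vanishing after size reduction and 4-level quantization, and surviving when it is dilated first.

```python
# Toy sketch, not PDFRead's code. Gray values run from 0 (black) to 255 (white).

def dilate(row):
    """Thicken dark strokes: each pixel takes the darkest value
    among itself and its immediate neighbours."""
    return [min(row[max(i - 1, 0):i + 2]) for i in range(len(row))]

def downsample_row(row, factor):
    """Reduce the size by averaging each run of `factor` pixels."""
    return [sum(row[i:i + factor]) // factor
            for i in range(0, len(row) - factor + 1, factor)]

def quantize_2bit(row):
    """Snap each pixel to the nearest of the 4 gray levels of a 2-bit display."""
    levels = (0, 85, 170, 255)
    return [min(levels, key=lambda lv: abs(lv - p)) for p in row]

# A faint (already anti-aliased) one-pixel stroke on a white background.
faint = [255] * 12
faint[5] = 128

# Without dilation the stroke is averaged away and then rounded to white.
print(quantize_2bit(downsample_row(faint, 4)))          # [255, 255, 255]

# With dilation before the reduction, the stroke survives as light gray.
print(quantize_2bit(downsample_row(dilate(faint), 4)))  # [255, 170, 255]
```

Real pages are two-dimensional and the actual filters differ, but the mechanism is the same: thin or faint strokes end up between two of the four gray levels and can be rounded to the background.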

Gravitas · 04-28-2007, 10:58 AM

1.7 fixed the problems I was having, now works like a dream. Thanks

ashkulz · 04-28-2007, 11:13 AM

Quote:

Originally Posted by Gravitas

The text is not as clear as non-pdf converted documents, but is perfectly readable so long as I up the font size to medium. I may try the same document again with the pngs optimized to see if that improves the text any, but I'm happy with how it is at the moment.

Well, that's the difference between native font rendering and putting up with something rasterized from PDFs, which target a much higher DPI. Also, PNG optimization only tries to reduce the file size; it doesn't change any of the display parameters! You may want to experiment with the DPI and/or edge enhancement level to find what looks best. I don't have a reader, so I don't know whether the default settings I've chosen are equally good for the reader.

ashkulz · 04-28-2007, 11:21 AM

Okay, I'm planning to release 1.8 in a day or two. The major feature planned would be an all-color pipeline (with option to downsample to grayscale, of course). This won't be of much use to anyone except people who own the REB 1200 (i.e. me) and those who get those newfangled color e-ink readers.

Some previews of how things look in color: raw page, dilated page, and after color reduction. Regular text pages also work as they used to: raw text page and the dilated text.

Do any of you have any feature requests for 1.8? I don't feel comfortable with such short releases where only a few new things are added ...

gdxf · 04-28-2007, 06:41 PM

I used your batch file and changed the batch file conversion directory from "My Desktop" to another drive on my computer. It works! I guess there might be some user access restriction involved, but I am not sure about that.

Some files are converted with no problem; others still hit this annoying "unable to determine total number of pages" problem. I later found that the files that cannot be converted include: 1. pdf files with OCR text underneath the image, 2. pdf files with non-alphabet file names. Hope it can be dealt with in later releases.

kovidgoyal · 04-28-2007, 08:26 PM

Quote:

Originally Posted by ashkulz

Uhm, I don't have access to a Windows PC at home ... so if you could post some screenshots I'd be grateful. But yes, the fonts do look a bit ragged ... what happens is that I render at 300dpi (anti-aliased), perform dilation at that resolution and then reduce the size. As a result, the anti-aliasing ends up in the reduced image, which is bad because when you downsample it to 4 colors you can get "gaps" where the color information is lost due to the 2-bit grayscale limitation. As far as I know, even RasterFarian has pretty much the same output. Can you try with that and see how good the result is?

BTW, can you try again with 1.7? I replaced imagemagick with pngnq, this may give better output...

I'm travelling but I'll do some experimentation when I return. I highly recommend vmware and an old windows installation disk.

ashkulz · 04-29-2007, 12:25 AM

Quote:

I used your batch file and changed the batch file conversion directory from "My Desktop" to another drive on my computer. It works! I guess there might be some user access restriction involved, but I am not sure about that.

Did you use the new batch file and if so, did you run from both Desktop and some other place? There's no logical reason I can think of why it shouldn't run from Desktop -- did you get the same error as before or something else when you ran from there?

Quote:

Some files are converted with no problem; others still hit this annoying "unable to determine total number of pages" problem. I later found that the files that cannot be converted include: 1. pdf files with OCR text underneath the image, 2. pdf files with non-alphabet file names. Hope it can be dealt with in later releases.

That happens when pdftk cannot report how many pages there are in a document. You'll have to manually open each such document and find out how many pages there are and enter it. Can you link/post a sample file? I'll have to see how to detect the page count for those files -- they look like their information dictionary is corrupt or something.
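
One possible fallback when the page count cannot be reported is to count page objects directly in the raw file. The sketch below is a crude heuristic and an assumption for illustration only; it is not how pdftk or PDFRead determine the count, and `guess_page_count` is a hypothetical helper name. It will undercount PDFs that store their page objects inside compressed object streams.

```python
import re

# Matches "/Type /Page" but not "/Type /Pages" (the page-tree node).
PAGE_OBJECT = re.compile(rb"/Type\s*/Page(?![A-Za-z])")

def guess_page_count(pdf_bytes):
    """Heuristic page count: number of page objects visible in the raw bytes.

    Returns 0 when no page objects are found, in which case the caller
    would still have to ask the user for the number of pages.
    """
    return len(PAGE_OBJECT.findall(pdf_bytes))

# Tiny hand-written fragment standing in for a real file's contents.
sample = (b"1 0 obj << /Type /Pages /Count 2 >> endobj "
          b"2 0 obj << /Type /Page /Parent 1 0 R >> endobj "
          b"3 0 obj <</Type/Page/Parent 1 0 R>> endobj")
print(guess_page_count(sample))  # 2
```

For a real file the bytes would come from reading the PDF in binary mode, and a result of 0 should be treated as "unknown" rather than as an empty document.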

gdxf · 04-29-2007, 04:23 AM

Quote:

Originally Posted by ashkulz

Did you use the new batch file and if so, did you run from both Desktop and some other place? There's no logical reason I can think of why it shouldn't run from Desktop -- did you get the same error as before or something else when you ran from there?

That happens when pdftk cannot report how many pages there are in a document. You'll have to manually open each such document and find out how many pages there are and enter it. Can you link/post a sample file? I'll have to see how to detect the page count for those files -- they look like their information dictionary is corrupt or something.

Yes, I did use the new batch file. It worked well everywhere except in Desktop directories. But that doesn't matter very much to me; the point is that it at least worked elsewhere.

I manually put in the page number and it encountered the decoding error. I've posted the command line error info below and also attached the zipped directory and problematic file. I think it is because the filename is non-unicode...

---------------------------------------------

Unable to determine total number of pages in document
Please enter number of pages: 2

Page 1/2: EXTRACT RASTERIZE CROP DILATE SPLIT SAVE DONE
Page 2/2: EXTRACT RASTERIZE CROP DILATE SPLIT SAVE DONE
Creating BBeB file ... Traceback (most recent call last):
File "pdfread.py", line 201, in <module>
File "pdfread.py", line 86, in main
File "output.pyo", line 212, in generate
File "pylrs\pylrs.pyo", line 472, in renderLrf
File "pylrs\pylrs.pyo", line 250, in toLrf
File "pylrs\pylrs.pyo", line 246, in toLrfDelegates
File "pylrs\pylrs.pyo", line 250, in toLrf
File "pylrs\pylrs.pyo", line 246, in toLrfDelegates
File "pylrs\pylrs.pyo", line 561, in toLrf
File "pylrs\elements.pyo", line 68, in toString
File "pylrs\elements.pyo", line 76, in write
File "pylrs\elements.pyo", line 51, in _write
File "pylrs\elements.pyo", line 51, in _write
File "pylrs\elements.pyo", line 42, in _write
File "pylrs\elements.pyo", line 25, in _writeAttribute
File "pylrs\elements.pyo", line 13, in _encodeCdata
File "encodings\utf_8.pyo", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte
Press any key to continue . . .

ashkulz · 04-29-2007, 11:32 AM

Quote:

Originally Posted by gdxf

I manually put in the page number and it encountered the decoding error. I've posted the command line error info below and also attached the zipped directory and problematic file. I think it is because the filename is non-unicode...

Yes, you're right -- it did fail because of the non-unicode filename (I think you have some kind of Chinese/Japanese encoding). That's a limitation of pylrs: you have to use the utf8 encoding (this can be overridden, but it is way too much trouble to implement and get right).

If you ensure that fonts are embedded in the PDF and the filename doesn't have special characters, it should convert properly.
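
The failure in the traceback above is easy to reproduce: byte 0xb1 cannot start a UTF-8 sequence, so decoding stops at position 0. A minimal sketch of a more forgiving decode follows, written for current Python 3 rather than the Python version PDFRead shipped with. The helper name `decode_name` and the list of fallback encodings are assumptions for illustration; nothing here is part of pylrs or PDFRead.

```python
def decode_name(raw, fallbacks=("gbk", "shift_jis")):
    """Decode a byte string that is hopefully UTF-8.

    The fallback encodings are a guess (assumption); a real tool would
    more likely use the system's file-name encoding instead.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never fails,
    # although the result may be meaningless.
    return raw.decode("latin-1")

raw = b"\xb1\xe0"            # starts with 0xb1, as in the traceback
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("utf-8 failed:", err.reason)

title = decode_name(raw)     # decoded via the first fallback that accepts it
print(title.encode("utf-8")) # now safe to hand on as UTF-8
```

Guessing encodings can silently pick the wrong one, which is presumably part of why renaming the files is the simpler fix.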

Jary · 04-29-2007, 05:55 PM

Hi people.

ashkulz, you did a great job ! I've been using 1.6 and I quite like it.

The install is just perfect.
The prs-500 mode is good, and prs500-l is very nice too. Maybe GUI isn't totally clear the first time, and it misses the .lrf extension on my files, but otherwise it rocks

One thing: why add "title" when there is "output" field ? When you fill output name, shouldn't it be auto copied in title ?

Good job !

Please keep it up.

gdxf · 04-29-2007, 06:09 PM

Thanks ashkulz! I'll convert the filenames to fit utf8 encoding. I've batch converted a dozen files overnight and it turned out quite well.

ashkulz · 04-30-2007, 01:06 AM

Quote:

Originally Posted by Jary

The prs-500 mode is good, and prs500-l is very nice too. Maybe GUI isn't totally clear the first time, and it misses the .lrf extension on my files, but otherwise it rocks

One thing: why add "title" when there is "output" field ? When you fill output name, shouldn't it be auto copied in title ?

You might want to upgrade to 1.7; the extension is now automatically added after processing. The output field is for the output filename which can be anything -- I might want to store books with filename "Author - Title" or any other scheme. That's why I have a separate title field. But yes, you can copy the basic filename as the title (which I do in the batch conversion script) but it's currently not very easy to implement in the GUI (which is actually based on the NSIS installer).
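
The two behaviours mentioned above (using the base filename as the title, and appending the extension automatically) can be sketched as follows. This is an illustrative guess at the logic, not the batch script or the 1.7 code; `default_title` and `with_lrf_extension` are hypothetical names.

```python
import os

def default_title(filename):
    """Use the base filename, minus a known extension, as the title."""
    base = os.path.basename(filename)
    root, ext = os.path.splitext(base)
    # Only strip extensions we recognise, so dots inside a name
    # such as "J.R.R. Tolkien - The Hobbit" are left alone.
    return root if ext.lower() in (".pdf", ".lrf") else base

def with_lrf_extension(output_name):
    """Append ".lrf" unless the output name already ends with it."""
    _root, ext = os.path.splitext(output_name)
    return output_name if ext.lower() == ".lrf" else output_name + ".lrf"

print(default_title("Author - Title.pdf"))        # Author - Title
print(with_lrf_extension("Author - Title"))       # Author - Title.lrf
print(with_lrf_extension("Author - Title.lrf"))   # Author - Title.lrf
```

Keeping the title separate from the output name still matters: the default is only a starting point, and the two can diverge as soon as a naming scheme like "Author - Title" is used for files.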
