PDFRead - reading PDFs on the 1100/1150/1200 eBook Readers


ashkulz
03-29-2007, 12:41 PM
UPDATE: PDFRead is now hosted at SourceForge (http://pdfread.sourceforge.net).

I have a REB1100 and all other things being fine, I disliked not being able to read PDFs on it. I then watched the development of PDFRasterFarian for the Sony Reader, and thought that the same logic could be applied to the REB 1100/1150/1200 family.

So, the tool has the following features:

support for PDF and DJVU documents

works on Windows, Linux and OS X

supports creating images for any ebook reader device

out of the box profiles for the 1100/1150/1200 and PRS-500.

fast and very accurate autocropping

image dilation for "thicker" text, enabled by default

automatic splitting into multiple pages for landscape mode

generated image fits the screen size exactly in portrait mode

rotation of image for devices that don't support landscape mode

option for reducing the number of colors to reduce image size

output formats supported: currently html, rb, lrf, imp1, and imp2.


See the sample PDF input (http://www.ctan.org/tex-archive/macros/latex/contrib/bibleref/sample.pdf) and the sample output generated with the REB1100 profile (http://puggy.symonds.net/~ashish/downloads/samples/).

For Windows, download the PDFRead v5 installer (https://sourceforge.net/project/showfiles.php?group_id=87679&package_id=91453). For Linux and OS X, download the PDFRead v5 source code (https://sourceforge.net/project/showfiles.php?group_id=87679&package_id=91453). Please read the README in the source or Windows installation for further details.

Changes since v4

added a README

add DJVU input support

allow specifying custom options in the Windows GUI

OS X support fixes (based on sammykrupa's input)

tweaks in the EB-1150 and REB-1200 profile

do not split page if the generated image width/height is less than the device parameters. This caused too many blank pages to be created.


Thanks to alex_d, Falstaff, Liviu_5 and sammykrupa for helping me and providing me inspiration.

Enjoy.....

nekokami
03-29-2007, 01:14 PM
Wow. :huh: This should work for the eBookwise 1150, too, right?

ashkulz
03-29-2007, 01:24 PM
Yep. Just use the eb1150 profile, and you're ready to go!

Shake
03-31-2007, 11:21 AM
Thank you very much for your work!!! This is *very* useful (I also use Ubuntu).

I have one suggestion for the UI: Could you add a crop function? This step has to be done for nearly every pdf...

ashkulz
03-31-2007, 01:16 PM
Thank you very much for your work!!! This is *very* useful (I also use Ubuntu).

I have one suggestion for the UI: Could you add a crop function? This step has to be done for nearly every pdf...

Autocropping is already built into the code. If you mean manual cropping, that's a whole new kettle of fish ... If you do manual cropping with Acrobat or set the CropBox via Ghostscript, then PDFRead should respect it.

If you have a PDF which is not being cropped properly, please post it here and I will look into it.

ashkulz
03-31-2007, 08:06 PM
Added a GUI and automatic creation of LRFs via PyLRS.

Please note that I don't have the Sony Reader, so I haven't really tested the LRF output. However, as I'm using PyLRS for the generation I'm reasonably sure it should work out of the box. Let me know if you have any problems.

sammykrupa
03-31-2007, 09:32 PM
Any word if the Sony Reader support works on OS X?

obelix
04-01-2007, 12:58 AM
I was not able to make it work on Windows XP. It crashes immediately after pushing the "Convert" button, regardless of the profile settings.

sputnik
04-01-2007, 01:01 AM
This is great. Support for the imp format would be fantastic, as I have an 1150 and I won't have to make any additional conversion.

By the way, is there any way of putting rb files on a 1150? I think I read somewhere that if you just change the extension of the file from .rb to .imp, the ebookwise 1150 will recognize it. Is it true? Has anyone tried it?

sputnik
04-01-2007, 01:30 AM
I don't know what I am doing wrong, but it does not work for me. When I right-click on any PDF, the first option is Read. I click on it and Acrobat opens it. Then I go to Program Files, launch PDFRead (i.e., pdfread-gui.exe) and fill in the required information, with the exception of Output ebook, where I don't know what I am supposed to put. Then I click Convert, a black MS-DOS window appears and quickly disappears, and nothing else happens. It must be something that I did wrong, but what?

P.S. I use eb1150

ashkulz
04-01-2007, 01:47 AM
I don't know what am I doing wrong, but it does not work for me. When I right-click on any pdf the first option is Read. I click on it and the Acrobat opens it. Then go to Program Files and I launch PDFRead (i.e., pdfread-gui.exe) and fill in the required information, with the exception of Output ebook, where I don't know what am I supposed to put. Then I click convert, a MS DOS black windows appears and quickly disapears and nothing else happens. It must be something that I did wrong, but what?

P.S. I use eb1150

Ok, I discovered the problem. There are two workarounds:


Install to a location with no spaces (eg. C:\PDFRead)
Open up the file C:\Program Files\PDFRead\pdfread-run.cmd and change %~dp0bin\pdfread %* to "%~dp0bin\pdfread" %* (i.e. add quotes around the command)

ashkulz
04-01-2007, 01:48 AM
I was not able to make it working on Windows XP. It crashes immideately after pushing the "Convert" button independent on profile settings.

Could you elaborate on that? Does it do a Windows crash? If you could attach a screenshot of that it would be helpful.

sputnik
04-01-2007, 02:02 AM
It's a legal copy of Windows XP; the machine is fine; no Windows crash, no error. It's just that nothing happens after that DOS window flickers for a millisecond.

ashkulz
04-01-2007, 02:05 AM
Any word if the Sony Reader support works on OS X?

I don't really have access to an OS X machine, but it should work. If you have Fink installed, try installing the packages below and then try the command-line version:


pdftk (http://www.pdfhacks.com/pdftk/#packages)
PIL: try this (http://pdb.finkproject.org/pdb/package.php/pil-py24), this (http://pdb.finkproject.org/pdb/package.php/pil-py25) or this (http://pdb.finkproject.org/pdb/package.php/pil).
xpdf (http://pdb.finkproject.org/pdb/package.php/xpdf)
imagemagick (http://pdb.finkproject.org/pdb/package.php/imagemagick)
ghostscript (http://pdb.finkproject.org/pdb/package.php/ghostscript) and standard fonts (http://pdb.finkproject.org/pdb/package.php/ghostscript-fonts)


Hope that helps.

ashkulz
04-01-2007, 02:07 AM
it's a windows xp legal copy; the machine is fine; no windows crash, no error; it's just that nothing happens after that dos window flickers for a milisecond

Did you try that workaround I mentioned? The crash comments were directed to obelix, not you. Scroll up to see the workaround.

I will release a new version on monday with the fix and IMP format support.

sputnik
04-01-2007, 02:23 AM
I did what you said and it is amazing. I like the landscape orientation better. I just converted a pdf image (scanned text) file and I almost prefer the quality to that of the normal imp files. This piece of soft is really great for me, since I read lots of pdf files which consist of scanned text images. Congratulations!

ashkulz
04-01-2007, 02:37 AM
I did what you said and it is amazing. I like the landscape orientation better. I just converted a pdf image (scanned text) file and I almost prefer the quality to that of the normal imp files. This piece of soft is really great for me, since I read lots of pdf files which consist of scanned text images. Congratulations!

Thanks, I added how to work around this issue in the main post. I also love the ability to read PDFs -- I have a lot of good reading material, and going the PDF -> HTML route was just too much effort (especially for books with complex layouts).

ashkulz
04-01-2007, 06:53 AM
Okay, the bug which was discovered by sputnik has been fixed and support for IMP format has been added.

For the IMP format support, please install the eBook Publisher (http://www.ebooktechnologies.com/support_publisher_download.htm) from Ebook Technologies. Also note that imp1 = Color VGA (1200) and that imp2 = Grayscale Half-VGA (1150), similar to their ETI-1 and ETI-2 designations. Also note that the IMP format support works only on Windows.

obelix
04-02-2007, 03:24 PM
Did you try that workaround I mentioned? The crash comments were directed to obelix, not you. Scroll up to see the workaround.

I will release a new version on monday with the fix and IMP format support.

The problem is gone with the fix.
Great program, thank you very much.
A4 PDFs are readable now on the PRS-500. The interface is very simple and has a GUI, which makes this program much easier to use than PDFRasterFarian, with about the same image quality.

Some notes:

When a PDF (portrait orientation) is smaller than the screen size, it looks like the right and bottom margins are not cropped. Actually, I think the PRS-500 image size is incorrect, rather than cropping being missing. This can be seen at the 3 zoom settings:

1. At small size: the portrait PDF is aligned to the top and left; the right and bottom margins are about 20% of the screen size.
2. At middle size: the image gets larger, but the right and bottom margins still exist.
3. At the highest zoom: the image gets larger than the screen size, and the right and bottom parts are cut off.

Please check if the image size is set exactly to the screen size: 584x754 (not 600x800!). The Reader is very sensitive to the proper screen size.

ashkulz
04-02-2007, 11:27 PM
PDF (portrait orientation) is smaller than screen size, it looks like the right and the bottom fields are not cropped. Actually, I think the PRS-500 image size is not correct rather than lack of cropping. It can be seen from the following, for 3 zoom settings:

1. At small size: Portrait PDF is alligned to the top and left, right and bottom margins are about 20% of the screen size.
2. At middle size. Image gets larger, but right and bottom margins still exist.
3. At the highest zoom. Image gets larger than the sceen size, right and bottom parts are out of the scope.

Please check if the image size is set exactly to the screen size: 584х754
(not 600x800 !!!) Reader is very sensitive to the proper screen size.

Well, I guess you're right. According to the info I had, I used the following settings:

Default options for the profile prs500:
rotate=none hres=565 format=lrf vres=754 nosplit=True colors=4

Default options for the profile prs500-l:
rotate=left hres=754 format=lrf vres=565 overlap=45 colors=4

Should I change that to 584 instead of 565? I simply kept what was the default in PDFRasterFarian, which sets it to 565. Can you try running the following in a command prompt and let me know if the output looks OK?

<installdir>\pdfread-run.cmd -p prs500 --hres 584 <pdf-file>

I'll be adding some form of "custom" profile via the GUI sometime soon, as time permits :-)
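How a command-line flag such as --hres overrides a profile default can be sketched in Python roughly as follows. The two option sets are copied from the post above; the PROFILES dictionary layout and the resolve_options name are assumptions made for illustration, not PDFRead's actual code.

```python
# Profile defaults as quoted in the post above; the data layout is hypothetical.
PROFILES = {
    "prs500": {"rotate": "none", "hres": 565, "vres": 754,
               "format": "lrf", "nosplit": True, "colors": 4},
    "prs500-l": {"rotate": "left", "hres": 754, "vres": 565,
                 "format": "lrf", "overlap": 45, "colors": 4},
}

def resolve_options(profile, **overrides):
    """Return the profile defaults with any explicit overrides applied."""
    options = dict(PROFILES[profile])  # copy so the defaults stay untouched
    # Overrides left as None stand for flags the user did not pass.
    options.update({k: v for k, v in overrides.items() if v is not None})
    return options

# Equivalent in spirit to: pdfread-run.cmd -p prs500 --hres 584 <pdf-file>
print(resolve_options("prs500", hres=584))
```

Keeping the defaults in one table and copying before updating means an experiment with a different hres never changes the profile itself.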

obelix
04-03-2007, 01:27 AM
Well, I guess you're right. According to the info I had, I used the following settings:

Default options for the profile prs500:
rotate=none hres=565 format=lrf vres=754 nosplit=True colors=4

Default options for the profile prs500-l:
rotate=left hres=754 format=lrf vres=565 overlap=45 colors=4

Should I change that to 584 instead of 565? I simply kept what was the default in PDFRasterFarian, which sets it to 565. Can you try running the following in a command prompt and let me know if the output looks OK?

<installdir>\pdfread-run.cmd -p prs500 --hres 584 <pdf-file>

I'll be adding some form of "custom" profile via the GUI sometime soon, as time permits :-)

Nothing has changed.

1. The PNG images for pages are 497x754 (with slight variation of width for each of the pages)
2. In the LRS file:

"<ImageBlock blockheight="768" blockwidth="600" objid="3" objlabel="ImageBlock.3" refstream="13" x0="0" x1="600" xsize="565" y0="0" y1="800" ysize="754"/>"

i.e xsize is still 565, but this is not that important.

I was wrong: you can use 600x800 (in contrast to text, it's not very important).

Now I figured out what's happening (looking at the actual image size): to keep the proportions of the page, the program fits one of the PNG image dimensions (754), and the other one is selected to maintain the aspect ratio (497). Nothing can be done; my sample has this aspect ratio. The only possible improvement (I'm not sure if anybody really cares) is that the image could be centered with "<BlockSpace xspace= yspace= ".

So there is no problem, my mistake. Thanks again for a great program.

ashkulz
04-03-2007, 05:15 AM
"<ImageBlock blockheight="768" blockwidth="600" objid="3" objlabel="ImageBlock.3" refstream="13" x0="0" x1="600" xsize="565" y0="0" y1="800" ysize="754"/>"

i.e xsize is still 565, but this is not that important.

I was not right, you can use 600x800 (in contrast to the text its not very important).

Yeah, I forgot -- I hard-coded that 565 in the code, will fix that soon.

BTW, can you really use the full 600x800? I would imagine that some pixels would be taken by the user interface... please let me know what is the biggest size which is not cropped by the reader, as I don't have a reader to actually test it out.


Now I figured out whats happening (looking at the actual image size): To keep proportionality of the page, the program adjust one of the png image sizes (754), the other one is selected to maintain aspect ratio (497). Nothing can be done. My sample has this aspect ratio. The only improvement (I'm not sure if somebody really care) the image can be centered with "<BlockSpace xspace= yspace= ".

So there is no problem, my mistake. Thanks again for great program.

Yes, I forgot to mention that in the release notes. One of the ideas that I initially took from PDFRasterFarian but that really annoyed me (and hence changed) is that if you do aggressive cropping, and your page has only one paragraph, that paragraph is stretched to fill the whole page (with the letters being really tall). So I took care to maintain the aspect ratio, as it is a slippery slope to decide when to respect it and when to forget it. I do a lot of calculations to find the optimal width and height.

Also, I'd suggest that you try the landscape mode once (prs500-l) -- I would expect it to look as good as a real PDF printout, though with less text per page.
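The aspect-ratio behaviour discussed here (one side reaches the screen limit, the other follows from the page proportions) can be illustrated with a minimal sketch. The function name fit_within and the 994x1508 source size are made up for the example; PDFRead's real calculation is described as more involved.

```python
def fit_within(src_w, src_h, max_w, max_h):
    """Scale (src_w, src_h) to fit inside (max_w, max_h), keeping aspect ratio."""
    scale = min(max_w / src_w, max_h / src_h)
    return round(src_w * scale), round(src_h * scale)

# A hypothetical cropped page of 994x1508 pixels on a 584x754 screen:
# the height is the limiting side, so the width ends up narrower than the screen.
print(fit_within(994, 1508, 584, 754))  # (497, 754)
```

Taking the smaller of the two scale factors is what guarantees that neither dimension overflows the screen, at the cost of a margin on the other side.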

sammykrupa
04-03-2007, 06:41 AM
When I tried to run PDFRead on my Mac (after installing all of the dependencies) I get this:

Traceback (most recent call last):
File "src/pdfread.py", line 51, in ?
import os, sys, re, subprocess, Image, ImageFilter, ImageChops, optparse, shutil, traceback
ImportError: No module named subprocess

Thanks for the work done so far!

But what's up here?

Sam Krupa

ashkulz
04-03-2007, 07:26 AM
When I tried to run PDFRead on my Mac (after installing all of the dependencies) I get this:

Traceback (most recent call last): File "src/pdfread.py", line 51, in ?
import os, sys, re, subprocess, Image, ImageFilter, ImageChops, optparse, shutil, traceback
ImportError: No module named subprocess

Please check that you are using Python 2.4 or later (you can find the current version by running "python -V"). If you're using 2.3 or lower, you can download the missing subprocess module (http://svn.python.org/view/*checkout*/python/branches/release25-maint/Lib/subprocess.py?content-type=text%2Fplain&rev=53647) (which was introduced in 2.4), save it as subprocess.py in the same directory as pdfread.py, and try again.
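The version requirement can also be checked from inside a script rather than with "python -V". This is only a sketch; the helper name has_stdlib_subprocess is invented here and is not part of PDFRead.

```python
import sys

def has_stdlib_subprocess(version_info=sys.version_info):
    """The subprocess module joined the standard library in Python 2.4."""
    return tuple(version_info[:2]) >= (2, 4)

if not has_stdlib_subprocess():
    sys.exit("Python 2.4 or later is required (or a local copy of subprocess.py)")
print("subprocess available")
```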

sammykrupa
04-03-2007, 05:37 PM
New error:

Traceback (most recent call last):
File "src/pdfread.py", line 51, in ?
import os, sys, re, subprocess, Image, ImageFilter, ImageChops, optparse, shutil, traceback
ImportError: No module named Image


Should I just upgrade to 2.4?

Or is it something deeper?

Sam Krupa

sammykrupa
04-03-2007, 05:48 PM
Update:
Okay, got past those Image errors. Figured those had to do with PIL. Reinstalled it and I got a little farther this time:

Unable to determine total number of pages in PDF
Please enter total page count: 3

Temporary directory: /tmp/pdfread-oDxOLM

Page 1/3: EXTRACT RASTERIZE BLANK
Page 2/3: EXTRACT RASTERIZE BLANK
Page 3/3: EXTRACT RASTERIZE BLANK
Traceback (most recent call last):
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 515, in ?
PdfConverter().main()
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 338, in main
delete = self.FORMATS[self.options.format](self)
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 279, in generate_lrf
from pylrs.pylrs import Book, PageStyle, BlockStyle, ImageStream, BlockSpace, ImageBlock
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pylrs/pylrs.py", line 11
from elementtree.ElementTree import (Element, SubElement)
^
SyntaxError: invalid syntax

-------------------------
So close!
Seems to be a problem with PDFs on Mac OS X?

Sam Krupa

ashkulz
04-03-2007, 11:24 PM
Unable to determine total number of pages in PDF
Please enter total page count: 3

Temporary directory: /tmp/pdfread-oDxOLM

Page 1/3: EXTRACT RASTERIZE BLANK
Page 2/3: EXTRACT RASTERIZE BLANK
Page 3/3: EXTRACT RASTERIZE BLANK
Traceback (most recent call last):
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 515, in ?
PdfConverter().main()
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 338, in main
delete = self.FORMATS[self.options.format](self)
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 279, in generate_lrf
from pylrs.pylrs import Book, PageStyle, BlockStyle, ImageStream, BlockSpace, ImageBlock
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pylrs/pylrs.py", line 11
from elementtree.ElementTree import (Element, SubElement)
^
SyntaxError: invalid syntax


Okay, there seem to be two errors here:


All the PDF pages were detected as blank. This is not normal, as that means GhostScript did not produce the PNG file. Go to the temporary directory mentioned, and see if any png images are present. If not, change line 122 and add "print " before exec_cmd and paste the output here. The code should look like: print exec_cmd('gs', '-q', ...
You also need the elementtree module (this is included in python 2.5) -- download the ElementTree library (http://effbot.org/downloads/elementtree-1.2.6-20050316.tar.gz), extract it and copy the folder 'elementtree' to the same location as pdfread.py
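Since ElementTree is bundled with Python 2.5 but is a separate download before that, a common pattern is to try the bundled module first and fall back to the standalone package. The snippet below is a sketch of that pattern only; pylrs as shipped imports from elementtree directly, so this is not what its code looks like.

```python
try:
    # Python 2.5 and later ship ElementTree in the standard library.
    from xml.etree.ElementTree import Element, SubElement
except ImportError:
    # Older interpreters need the separately downloaded 'elementtree' package.
    from elementtree.ElementTree import Element, SubElement

root = Element("book")
SubElement(root, "page", number="1")
print(root[0].tag, root[0].get("number"))
```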


BTW, once you get it working, could you post the steps you took to install everything? I'm sure it would be of interest to other people who use OS X.

sputnik
04-04-2007, 01:39 AM
Well, it is not really a cropping problem in itself. As one can see here (http://publish.uwo.ca/~amanafu2/pdfread/), the png's look just fine. However, when I open the imp file (either on EB-1150 or on the computer) the scroll bar of the EB-1150 at the bottom of the screen overlaps a little bit with the image and the first letters of the words in the text cannot be seen.

Also, in some pages, the last line of text in the page is being "cut" by the crop, and becomes completely illegible, as one can see here (http://publish.uwo.ca/~amanafu2/pdfread/1.png) . One possible solution that I can see is to crop the pages so that the bottom of one page would overlap with the top of the next page. In case the text is being cropped and the last line is illegible, this solution would allow one to read the line in question on the next page. Of course, this would probably generate some redundancies, but it's preferable to have the same line of text twice than not to have it at all.

sammykrupa
04-04-2007, 07:07 AM
Okay, I am messing up horribly now. I put the 'elementtree' folder in with the pdfread.py file, and that didn't make those "Element" warnings go away.

Also, there were no PNG files in the temp directory. And I couldn't figure out how to make that change in the code, I keep getting weird errors. It would probably be best for you to upload the changed code for me to try.

Sorry to bother you with this!
Sam Krupa

kgian
04-04-2007, 01:13 PM
Software seems great but has some bugs still. I am trying to convert the sample PDF file you posted in your first post and I get the following output. It creates the PNG files and stops there, not creating the actual IMP file. I have an EBW 1150 and have selected eb1150 in the options.
I have ebook publisher installed. Maybe I should uninstall it and install it again after the pdfread software?

"Temporary directory: c:\docume~1\gianno~1\locals~1\temp\pdfread-ss_ddu

Page 1/5: EXTRACT RASTERIZE CROP DILATE RESIZE SPLIT DONE
Page 2/5: EXTRACT RASTERIZE CROP DILATE RESIZE SPLIT DONE
Page 3/5: EXTRACT RASTERIZE CROP DILATE RESIZE SPLIT DONE
Page 4/5: EXTRACT RASTERIZE CROP DILATE RESIZE SPLIT DONE
Page 5/5: EXTRACT RASTERIZE CROP DILATE RESIZE SPLIT DONE

Creating IMP file ... failed, error details follow.

Traceback (most recent call last):
File "pdfread.py", line 365, in generate_imp
File "win32com\gen_py\1103EA00-3A0C-11D3-A6F6-00104B2947FBx0x1x0.pyo", line 1142, in NewUniqueID
com_error: (-2147352573, 'Member not found.', None, None)

Output directory: c:\docume~1\gianno~1\locals~1\temp\pdfread-ss_ddu

Press any key to continue . . .
"

Actually that was it, you have to reinstall Ebook Publisher AFTER you install pdfread.
Works great now!

ashkulz
04-04-2007, 01:28 PM
Well, it is not really a cropping problem in itself. As one can see here (http://publish.uwo.ca/~amanafu2/pdfread/), the png's look just fine. However, when I open the imp file (either on EB-1150 or on the computer) the scroll bar of the EB-1150 at the bottom of the screen overlaps a little bit with the image and the first letters of the words in the text cannot be seen.
Hmm, can you run it manually and try specifying a lower hres? ie. run the command <install-dir>\pdfread-run.cmd -p eb1150 --hres 454 <pdf-file>. Can you please experiment and let me know what is the optimal resolution at which it doesn't get cut? If you do so, I will update the profile in the next release.

Also, in some pages, the last line of text in the page is being "cut" by the crop, and becomes completely illegible, as one can see here (http://publish.uwo.ca/~amanafu2/pdfread/1.png) . One possible solution that I can see is to crop the pages so that the bottom of one page would overlap with the top of the next page. In case the text is being cropped and the last line is illegible, this solution would allow one to read the line in question on the next page. Of course, this would probably generate some redundancies, but it's preferable to have the same line of text twice than not to have it at all.

Well, that is already implemented! If you look closely, over 3 lines of text are overlapping between pages. The line that is cropped on page 1 already appears as the 3rd line on page 2. The amount of overlap is controlled by the "overlap" parameter, which is set to 45 pixels in the eb1150 profile.
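To make the effect of the "overlap" parameter concrete, here is a small sketch of splitting a tall image into screen-sized slices that share a strip of pixels. The function name split_spans and the 1000/440 figures are assumptions for the example; only the 45-pixel overlap comes from the post.

```python
def split_spans(total, window, overlap):
    """Return (top, bottom) pixel ranges covering `total` rows.

    Each slice is at most `window` rows tall and repeats the last
    `overlap` rows of the previous slice.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than the window")
    spans = []
    top = 0
    while True:
        bottom = min(top + window, total)
        spans.append((top, bottom))
        if bottom == total:
            return spans
        top += window - overlap  # step back by the overlap before the next slice

# Hypothetical 1000-row image on a 440-row screen with a 45-row overlap.
print(split_spans(1000, 440, 45))  # [(0, 440), (395, 835), (790, 1000)]
```

A line cut at the bottom of one slice reappears whole near the top of the next as long as the overlap is taller than one line of text, which is why a very small overlap can fail for large fonts.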

ashkulz
04-04-2007, 01:33 PM
Actually that was it, you have to reinstall Ebook Publisher AFTER you install pdfread.
Works great now!

Actually, I would guess that you have GEB Librarian installed and you installed it AFTER eBook Publisher. GEB Librarian installs an older version of the SBPubX library which is used for IMP creation. Naturally, when you reinstalled it registered the newer version, which made the problem go away :-)

ashkulz
04-04-2007, 01:36 PM
Okay, I am messing up horribly now. I put the 'elementtree' folder in with the pdfread.py file, and that didn't make those "Element" warnings go away.

Also, there were no PNG files in the temp directory. And I couldn't figure out how to make that change in the code, I keep getting weird errors. It would probably be best for you to upload the changed code for me to try.

Sorry to bother you with this!
Sam Krupa

Okay, here you go. This goes a bit overboard, and prints the result of each command, so you should get a lot of debugging output. I suspect that Ghostscript is giving you an error, can you try with the sample PDF I have posted also?
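The kind of debugging wrapper described here, one that echoes each external command and whatever it prints, might look roughly like this. The name run_verbose is invented for the sketch, and it uses the modern subprocess.run call rather than whatever exec_cmd does in pdfread.py.

```python
import subprocess
import sys

def run_verbose(*args):
    """Print the command line, run it, print its output, and return the output."""
    print("RUN:", " ".join(args))
    result = subprocess.run(list(args), stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    print("EXIT:", result.returncode)
    print(result.stdout)
    return result.stdout

# Stand-in command so the sketch runs anywhere; PDFRead would call gs, pdftops, etc.
run_verbose(sys.executable, "-c", "print('hello from child')")
```

Merging stderr into stdout matters for this purpose, because tools that fail usually write their usage or error message to stderr.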

sammykrupa
04-04-2007, 02:11 PM
Ashkulz!
I am reporting back with some juicy output from PDFRead!

The output is included in the attached text file.

sputnik
04-04-2007, 02:31 PM
Hmm, can you run it manually and try specifying a lower hres? ie. run the command <install-dir>\pdfread-run.cmd -p eb1150 --hres 454 <pdf-file>. Can you please experiment and let me know what is the optimal resolution at which it doesn't get cut? If you do so, I will update the profile in the next release.



In Windows I ran this command: C:\PDFRead\pdfread-run.cmd -p eb1150 --hres 454 C:\Documents and Settings\Owner\Desktop\tempo\cheyne.pdf and then the attached DOS window appeared. As you can see, I do not know much about how to run a program manually. The hres specified in the command does not show up in the resulting DOS window. Could you please list the steps that I have to follow so that I can run the program manually and experiment with different hres values? I tried changing the value for hres in pdfread.py, but with no result.

ashkulz
04-04-2007, 03:47 PM
I run in windows this command C:\PDFRead\pdfread-run.cmd -p eb1150 --hres 454 C:\Documents and Settings\Owner\Desktop\tempo\cheyne.pdf and then the attached dos window appears.

The problem seems to be that you haven't put the file in quotes. So try putting the file somewhere in C:\ with no spaces, or try running with quotes:

C:\PDFRead\pdfread-run.cmd -p eb1150 --hres 454 "C:\Documents and Settings\Owner\Desktop\tempo\cheyne.pdf"
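The quoting rule is easy to get wrong by hand, so when a command line is built from Python it is safer to keep the arguments in a list and let the library add the quotes. The sketch below uses subprocess.list2cmdline, a helper in the standard library that applies the Windows quoting rules; the paths are the ones from this post.

```python
import subprocess

args = [
    r"C:\PDFRead\pdfread-run.cmd", "-p", "eb1150", "--hres", "454",
    r"C:\Documents and Settings\Owner\Desktop\tempo\cheyne.pdf",
]
# list2cmdline applies the Windows quoting rules: only arguments that
# contain spaces get wrapped in double quotes.
command_line = subprocess.list2cmdline(args)
print(command_line)
```

Without the quotes, the shell splits the path at each space and the program receives "C:\Documents", "and" and "Settings\Owner\..." as three separate arguments.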

ashkulz
04-04-2007, 03:59 PM
Ashkulz!
I am reporting back with some juicy output from PDFRead!

The output is included in the attached text file.

Okay, got the problem. It seems that it is due to incorrect version of xpdf. I used 3.01, which has the option -pagecrop but which is not present in 3.00 which you have. Thus, the pdftops program printed a help message and did not create a postscript file, and ghostscript died with "Error: /undefinedfilename in (page.eps)". So you have two options:

Upgrade to 3.01, which will be a problem as I don't see a binary anywhere;
Use the file I have attached which does not use that option.


Note that I am unsure that I will make this fix in the new release, as I do not know what the effect will be if I leave out this option (it treats the CropBox as the page size, removing unnecessary whitespace and/or typesetting). In general, I would recommend updating to a manually-compiled 3.01.

Please let me know if it works, and document the steps you took to achieve it so everyone else who uses OS X can also benefit from it :-)

sammykrupa
04-04-2007, 04:36 PM
Almost there!

There is now a PNG file in the temp directory, but I get this error:

Unable to determine total number of pages in PDF
Please enter total page count: 5

Temporary directory: /tmp/pdfread-6ktemb

Page 1/5: EXTRACT RASTERIZE CROP DILATE RESIZE
Please check that ImageMagick is installed.


I can run the 'convert' command on my system, so I do not know what is up.

Bummer. I hope helping me isn't too much of a problem.

Sam Krupa

sputnik
04-04-2007, 05:44 PM
The problem seems to be that you haven't put the file in quotes. So try putting the file somewhere in C:\ with no spaces, or try running with quotes:

C:\PDFRead\pdfread-run.cmd -p eb1150 --hres 454 "C:\Documents and Settings\Owner\Desktop\tempo\cheyne.pdf"

hres 445 solved the problem and it also looks good (hres 448 also solves the problem, barely, but I prefer 445). Also, --overlap 10 works better than --overlap 45 (at least for smaller fonts), as there is almost no redundancy.

kgian
04-04-2007, 07:12 PM
I agree with sputnik for hres 445 and overlap 10. With these options everything looks fine!

So, for example, the right command should be from the command prompt:

pdfread-run.cmd -p eb1150 -o c:\02\kos.imp --hres 445 --overlap=10 c:\02\kos.pdf


for a file named kos.pdf in the c:\02 directory.

alex_d
04-04-2007, 10:11 PM
ashkulz, why don't you do what I do and bundle all supporting programs with the script? Maybe python can't be bundled, but at least bundle xpdf so that "no, don't use 3.00, use manually compiled 3.01" could be avoided.

The stuff that you did with auto-adjusting autocropping sounds cool. I haven't taken apart your stuff yet. How do you do it? Do you do the cropping directly with xpdf, or do you rasterize and then use image tools? How do you make measurements/calculations? Also, I'm confused... do you use Ghostscript or xpdf? Could you maybe post a quick summary of your toolchain either here or on the "pythonized pdfrasterfarian" thread?

ashkulz
04-04-2007, 11:40 PM
Almost there!

There is now a PNG file in the temp directory, but I get this error:

Unable to determine total number of pages in PDF
Please enter total page count: 5

Temporary directory: /tmp/pdfread-6ktemb

Page 1/5: EXTRACT RASTERIZE CROP DILATE RESIZE
Please check that ImageMagick is installed.


I can run the 'convert' command on my system, so I do not know what is up.

Bummer. I hope helping me isn't too much of a problem.

Sam Krupa

Hey, no problem -- You're the only "user" who is on OS X and interested enough to try it on that platform, so I have to keep you happy :scholar:

Okay, I discovered that there are a few bugs on Linux caused by a workaround I made for Windows. Will have to see how to fix them, but in the meanwhile can you try the attached version?

ashkulz
04-04-2007, 11:44 PM
I agree with sputnik for hres 445 and overlap 10. With these options everything looks fine!

Okay, I will change the hres to 445 in the new version I will be releasing by Saturday. I don't want to reduce the overlap to 10, as it's too small if the font is larger. I'll reduce it to 25, and will provide an option in the GUI where you can reduce it to whatever you prefer.

ashkulz
04-04-2007, 11:54 PM
ashkulz, why don't you do what I do and bundle all supporting programs with the script? Maybe python can't be bundled, but at least bundle xpdf so that "no, don't use 3.00, use manually compiled 3.01" could be avoided.
Well, I *do* bundle it for Windows (see the installer). sammykrupa is on OS X, which is why all this is required -- I don't have access to OS X, and installing private versions of tools is very hard to implement on non-Windows systems, and not recommended at all. It's much better to let the native package management system handle the installation and upgrade process for the individual tools.


The stuff that you did with auto-adjusting autocropping sounds cool. I haven't taken apart your stuff yet. How do you do it? Do you do the cropping directly with xpdf, or do you rasterize and then use image tools? How do you make measurements/calculations? Also, I'm confused... do you use Ghostscript or xpdf? Could you maybe post a quick summary of your toolchain either here or on the "pythonized pdfrasterfarian" thread?

Okay, will answer one by one:

I use xpdf to convert from pdf -> ps, and then rasterize that from Ghostscript
I don't use the Ghostscript cropbox detection at all, though you can enable it by --gscrop. I had problems with it when the PDF already had a CropBox which covered prepress marks.
I use the PIL to detect all the "white" space surrounding an image, and directly crop that. This is very fast and accurate -- unless you've got a scan (where there may be some noise) it will remove all of the whitespace (even more than what is detected by Ghostscript). I plan to add noise elimination and more aggressive cropping (similar to what curiouser did) soon.
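The whitespace detection in the last point boils down to finding the bounding box of every non-white pixel. The sketch below shows the idea in plain Python on a tiny grayscale grid so that it runs without PIL; with PIL the same box is commonly obtained by taking ImageChops.difference against an all-white image and calling getbbox() on the result. The post does not say which calls PDFRead actually uses, and the function names here are invented.

```python
WHITE = 255

def content_bbox(pixels, threshold=WHITE):
    """Return (left, top, right, bottom) around every pixel darker than threshold.

    `pixels` is a list of rows of 0-255 grayscale values; right and bottom are
    exclusive. Returns None for an all-white page.
    """
    rows = [y for y, row in enumerate(pixels) if any(p < threshold for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] < threshold for row in pixels)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1

def autocrop(pixels):
    """Cut away the white border; a blank page is returned unchanged."""
    box = content_bbox(pixels)
    if box is None:
        return pixels
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]

page = [
    [255, 255, 255, 255, 255],
    [255, 255,   0, 255, 255],
    [255,   0,   0, 255, 255],
    [255, 255, 255, 255, 255],
]
print(content_bbox(page))  # (1, 1, 3, 3)
print(autocrop(page))      # [[255, 0], [0, 0]]
```

Lowering the threshold below 255 is one simple way to ignore faint scanner noise, at the risk of trimming very light content near the edges.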


I'll mention the technical details in the other thread.

sammykrupa
04-05-2007, 06:22 AM
Ashkulz,
Your magic seems to have made the ImageMagick error disappear, but that original pesky error we were getting before still exists:

p$ python pdfread.py -p prs500 sample.pdf
Unable to determine total number of pages in PDF
Please enter total page count: 5

Temporary directory: /tmp/pdfread-tCVV7p

Page 1/5: EXTRACT RASTERIZE CROP DILATE RESIZE DONE
Page 2/5: EXTRACT RASTERIZE CROP DILATE RESIZE DONE
Page 3/5: EXTRACT RASTERIZE CROP DILATE RESIZE DONE
Page 4/5: EXTRACT RASTERIZE CROP DILATE RESIZE DONE
Page 5/5: EXTRACT RASTERIZE CROP DILATE RESIZE DONE
Traceback (most recent call last):
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 496, in ?
PdfConverter().main()
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 319, in main
delete = self.FORMATS[self.options.format](self)
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 260, in generate_lrf
from pylrs.pylrs import Book, PageStyle, BlockStyle, ImageStream, BlockSpace, ImageBlock
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pylrs/pylrs.py", line 11
from elementtree.ElementTree import (Element, SubElement)
^
SyntaxError: invalid syntax


I have the folder called "elmenttree" in with the pdfread.py file.

Sam Krupa

ashkulz
04-05-2007, 07:21 AM
Ashkulz,
Your magic seems to have made the ImageMagick error disappear, but that original pesky error we were getting before still exists:

I assume you got the PNG page images done properly? That means that there's only a little more to go ;-)


Traceback (most recent call last):
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 496, in ?
PdfConverter().main()
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 319, in main
delete = self.FORMATS[self.options.format](self)
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pdfread.py", line 260, in generate_lrf
from pylrs.pylrs import Book, PageStyle, BlockStyle, ImageStream, BlockSpace, ImageBlock
File "/Users/mikekrup/Desktop/pdfread-v4-src/src/pylrs/pylrs.py", line 11
from elementtree.ElementTree import (Element, SubElement)
^
SyntaxError: invalid syntax
Now this is really weird. That is a SyntaxError, not an ImportError as I assumed earlier. I can't see why it is so; it works perfectly well for me. Can you open that file and remove the brackets after import, e.g. make it "import Element, SubElement"? Maybe it isn't valid syntax in Python 2.3. If you get further errors, I would recommend upgrading to Python 2.4 and trying again, as pylrs is written by Falstaff (http://www.mobileread.com/forums/showthread.php?t=9768) and I don't really know the code that well.
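
For reference, the bracketed form of a from-import was only added to the language in Python 2.4 (PEP 328), so a 2.3 interpreter rejects it at parse time. The two spellings below are equivalent; the standard library's `xml.etree.ElementTree` is used here as a stand-in for the separate `elementtree` package that pylrs imports, so the snippet runs on a current Python without extra installs.

```python
# Form that only parses on Python 2.4 and later:
from xml.etree.ElementTree import (Element, SubElement)

# Equivalent form without brackets; older interpreters accept this one too:
from xml.etree.ElementTree import Element, SubElement

root = Element("book")
SubElement(root, "page").set("number", "1")
print(len(root))  # 1
```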

sammykrupa
04-05-2007, 02:30 PM
It works! PDFRead runs great on OS X!

Upgraded python to 2.4.3, and everything works!

Instructions to come eventually!

sammykrupa
04-05-2007, 03:01 PM
Instructions for running PDFRead on Mac OS X have been posted here:

http://www.mobileread.com/forums/showthread.php?p=64075#post64075

Did I miss anything?

ashkulz
04-06-2007, 01:40 AM
Instructions for running PDFRead on Mac OS X have been posted here:

http://www.mobileread.com/forums/showthread.php?p=64075#post64075

Did I miss anything?

Cool! That means now OS X users will be able to use PDFs directly.

I will be creating a new release shortly which will remove the need for a lot of the workarounds needed right now, and then update the release notes.

Thanks for providing such detailed installation instructions!

sammykrupa
04-06-2007, 08:50 AM
It's the least I could do. You gave me PDFRead!

nekokami
04-06-2007, 09:10 AM
Instructions for running PDFRead on Mac OS X have been posted here:

http://www.mobileread.com/forums/showthread.php?p=64075#post64075

Did I miss anything?

Super! I'm also an OSX user, and had been waiting for this to get sorted before trying PDFRead (though meanwhile I've been a bit distracted because I now have an iLiad as well as my eBw 1150).

Thanks!

ashkulz
04-06-2007, 10:24 AM
Okay, new version released. Changes in this version:


added a README
add DJVU input support
allow specifying custom options in the Windows GUI
OS X support fixes (based on sammykrupa's input)
tweaks in the EB-1150 and REB-1200 profile
do not split page if the generated image width/height is less than the device parameters. This caused too many blank pages to be created.


sammykrupa/nekokami: can you see if the installation instructions for OS X I have included are proper?

Azayzel
04-06-2007, 10:27 AM
Sweet DJVU support, that's awesome! I was kind of hoping for this to get the few books in that format onto my Reader. Have you any experience with the IMP format?

ashkulz
04-06-2007, 10:45 AM
Do you mean for importing IMP? That's a dead-end format from which you can never convert back to anything -- I know, I tried reverse-engineering it a while back but gave up and bought a 1100. The ProcessText people (http://www.processtext.com/abcebookwise.html) are attempting to build one, but it may take some time (possibly never).

However, using PDFRead you can convert *to* IMP as long as you have eBook Publisher installed.

nekokami
04-06-2007, 11:49 AM
I'll try to test tonight or tomorrow.

Darn, if I'd known this was coming, I might have held off on buying that iLiad... though plenty of fun things are happening there, too. ;)

Azayzel
04-06-2007, 12:04 PM
Hmmm, I was hoping it wasn't. I found a tool called Win unIMP on the Russian website where Book Designer is found, and got my hopes up. It appears to deconstruct the IMPs into a resource directory, but I can't make anything out of the results. I checked the web looking for more details, but all I found were technical specs of the directory's contents; nothing that I saw was leading me in the right direction. I guess my only alternative is to convert it to PDF using the eBook Viewer and resize to Reader specs. I'm sure there has to be a way to get the stuff out of there if a piece of software on your PC can display it for you (it even lets you select words/paragraphs/etc.).

Thanks for the input!

ashkulz
04-06-2007, 12:56 PM
Darn, if I'd known this was coming, I might have held off on buying that iLiad... though plenty of fun things are happening there, too. ;)

Well, that's one of the main reasons why I'm sticking to the REB1100. I was tempted briefly to get one of the Bookeen eReaders, but that was taking some time and the high price discouraged me. The main reason for me to upgrade (PDF support) is there in my 1100, I can live without the fantastic screen till the prices drop dramatically -- I expect that will happen when color eInk screens are released and start getting used in a year or two.

The one thing I'd buy in a second without (much) regard to price would of course be the InfoPad (http://mobileopportunity.blogspot.com/2006/05/desperately-seeking-info-pad.html), if it ever gets done :-)

ashkulz
04-06-2007, 01:00 PM
PDFRead now officially has a home at SourceForge (http://pdfread.sourceforge.net/) (I've been using their file release system for a while).

Please note that it may take some time for all the mirrors to get the new releases, so try again later if it doesn't work immediately.

ashkulz
04-07-2007, 03:12 AM
Hmm, I'm kind of running out of ideas on what new features to add. Do any of you have any suggestions/features that you'd like to see added in the next release (if any)?

sammykrupa
04-07-2007, 09:32 AM
Tested v5 with generating lrf files on Mac OS X.

Worked great!

Instructions were also proper.

alex_d
04-09-2007, 06:37 AM
Ha, I love it when someone asks "what features can i add?" And I love it even more that I know that you're the kind of guy to actually implement them.

Here are some things that I know people want given my experience gathering feedback for pdfrasterfarian:

Working with images as input (something I never got around to doing myself):
Lots of people have folders of images of scanned books or comics they'd like to collate. Part of this feature is the ability to process double-page scans. I.e., you first split the scan down the middle (down a user-adjustable line) and then again if you're converting into landscape mode. This was one of the most popular feature requests I received.
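
To make the double-page idea concrete, here is a small sketch that only computes the crop boxes. The function names and the `split_at` parameter (the adjustable line, as a fraction of the scan width) are invented for this example and are not part of PDFRead or pdfrasterfarian.

```python
def split_double_page(width, height, split_at=0.5):
    """Return crop boxes (left, top, right, bottom) for the two pages.

    `split_at` is the position of the gutter as a fraction of the width,
    so 0.5 splits exactly down the middle.
    """
    if not 0.0 < split_at < 1.0:
        raise ValueError("split_at must be between 0 and 1")
    x = int(round(width * split_at))
    return [(0, 0, x, height), (x, 0, width, height)]


def split_for_landscape(box):
    """Split one page box into a top half and a bottom half."""
    left, top, right, bottom = box
    mid = top + (bottom - top) // 2
    return [(left, top, right, mid), (left, mid, right, bottom)]


pages = split_double_page(2000, 1400)
print(pages)                          # [(0, 0, 1000, 1400), (1000, 0, 2000, 1400)]
print(split_for_landscape(pages[0]))  # [(0, 0, 1000, 700), (0, 700, 1000, 1400)]
```

The boxes use the same (left, top, right, bottom) layout that image libraries usually expect for cropping, so each one can be passed to a crop call directly.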

Some minor stuff:
When a user selects a file, the format (pdf/djvu) can be selected automatically based on the extension. Also, when a user selects a profile, the file extension (eg .lrf) MUST be added automatically. (Otherwise the device won't open the file and the user won't know why.)
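
A minimal sketch of both suggestions, assuming the output name is keyed on the output format. The two tables and the function names are hypothetical; the `.lrf` and `.imp` extensions are the ones devices in this thread expect, and `.rb` for the rb format is an assumption.

```python
import os

# Hypothetical lookup tables, not taken from PDFRead itself.
INPUT_FORMATS = {".pdf": "pdf", ".djvu": "djvu", ".djv": "djvu"}
OUTPUT_EXTENSIONS = {"lrf": ".lrf", "imp1": ".imp", "imp2": ".imp", "rb": ".rb"}


def detect_input_format(path):
    """Pick the input format from the file extension (case-insensitive)."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in INPUT_FORMATS:
        raise ValueError("unrecognised input extension: %r" % ext)
    return INPUT_FORMATS[ext]


def with_output_extension(path, output_format):
    """Append the extension the device expects unless it is already there."""
    wanted = OUTPUT_EXTENSIONS[output_format]
    if path.lower().endswith(wanted):
        return path
    return path + wanted


print(detect_input_format("Manual.PDF"))         # pdf
print(with_output_extension("manual", "lrf"))     # manual.lrf
print(with_output_extension("manual.imp", "imp2"))  # manual.imp
```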

Some more minor stuff:
Try to add explanations for your options, particularly on the "customize" page. Even I am not sure what some of the things mean, and you're really robbing your users by taking away configurability (either completely or just in practice). A good way to add explanations is a box at the bottom whose contents change depending on where the mouse points.

Some more minor stuff, continued:
When you give people configuration options, and even if you add good explanations, there should be a way to preview the result.

ashkulz
04-09-2007, 09:04 AM
Ha, I love it when someone asks "what features can i add?" And I love it even more that I know that you're the kind of guy to actually implement them.

Bring 'em on! :)

Working with images as input (something I never got around to doing myself):
Lots of people have folders of images of scanned books or comics they'd like to collate. Part of this feature is the ability to process double-page scans. I.e., you first split the scan down the middle (down a user-adjustable line) and then again if you're converting into landscape mode. This was one of the most popular feature requests I received.

Hmm, there's already a fantastic (but a tad intimidating) tool called unpaper (http://unpaper.berlios.de) which does all of that and more. As it has quite a few options to tweak the way it works, I'll have to see how to integrate it into the current pipeline (don't want to add too many options to PDFRead).

A quick and easy-to-implement way would be to write a wrapper GUI for unpaper, combine the unpaper output to a multi-page TIFF and then add TIFF support to PDFRead. That would solve most people's needs, and yet keep the complex unpaper configuration out of PDFRead.


Some minor stuff:
When a user selects a file, the format (pdf/djvu) can be selected automatically based on the extension. Also, when a user selects a profile, the file extension (eg .lrf) MUST be added automatically. (Otherwise the device won't open the file and the user won't know why.)
That's going to be a bit tough, as the Windows GUI isn't really a GUI. I've actually hijacked an installer (based on NSIS (http://nsis.sourceforge.net)) to provide GUI-like features. Main reason: small code (~60kB) which works on all Windows versions with zero dependencies. I don't really want to get into the GUI thing for now, I want to focus on features (If someone does write a GUI, I'd be happy to include it -- all the current GUI does is build a command line and then execute it).


Some more minor stuff:
Try to add explanations for your options, particularly on the "customize" page. Even I am not sure what some of the things mean, and you're really robbing your users by taking away configurability (either completely or just in practice). A good way to add explanations is a box at the bottom whose contents change depending on where the mouse points.
You're right, some of the options will not make sense unless you know what is actually happening during all the various stages. I'll make a document about it and post it on the website sometime tomorrow.


Some more minor stuff, continued:
When you give people configuration options, and even if you add good explanations, there should be a way to preview the result.

The best way I can think of is to allow a page spec for conversion and convert those pages only, i.e. that way you can convert a single page and see how it finally looks. Don't know how it will integrate with the Windows GUI, though.
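
One possible shape for such a page spec, as a sketch only. The "1-3,7" syntax and the name `parse_page_spec` are assumptions made for the example and not necessarily what a PDFRead release would accept.

```python
def parse_page_spec(spec, total_pages):
    """Turn a spec such as "1-3,7" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            first, last = int(first), int(last)
        else:
            first = last = int(part)
        if first < 1 or last > total_pages or first > last:
            raise ValueError("bad page range: %r" % part)
        pages.update(range(first, last + 1))
    return sorted(pages)


print(parse_page_spec("1-3,7", 10))  # [1, 2, 3, 7]
print(parse_page_spec("5", 10))      # [5]
```

Converting just `parse_page_spec("5", total)` is then the cheap preview: one page goes through the whole pipeline with the chosen options.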

alex_d
04-09-2007, 11:13 AM
"(don't want to add too many options to PDFRead)"

aww.... come on. Don't let your windows software get bogged down by the shortcomings of the command line. (I know this is the reason because you've mentioned it elsewhere.) The whole point of a GUI is that it allows you to effectively support and present many more features. If you don't want to add them to the linux version, that's fine.

"That's going to be a bit tough, as the Windows GUI isn't really a GUI. I've actually hijacked an installer (based on NSIS) to provide GUI-like features. Main reason: small code (~60kB) which works on all Windows versions with zero dependencies."

Oh.

"I don't really want to get into the GUI thing for now, I want to focus on features"

It's kind of hard to separate the two, seeing as it's hard to add features to the command line (and you're already claiming to have hit the ceiling).

I know the real reason you don't want to do a gui. It's because you're unfamiliar with how to do it and it does take quite a bit of work. (I know, because that's the exact same reason pdfrasterfarian stayed text-based.)

"(If someone does write a GUI, I'd be happy to include it -- all the current GUI does is build a command line and then execute it)."

Yeah, we need that person.

"You're right, some of the options will not make sense unless you know what is actually happening during all the various stages. I'll make a document about it and post it on the website sometime tomorrow."

Eh... a document isn't really the right solution. No one reads those things. A program should be its own document. I understand that your installer approach presents big limitations, but on the other hand it should have strong facilities to create a "wizard" to ask questions next to explanations similar to pdfrasterfarian.

"The best way I can think of is to allow a page spec for conversion and convert for those pages only. ie. that way, you can convert a single page and see how it looks finally. Don't know how it will integrate with the windows GUI, though."

It shouldn't be a problem if you can just call a command in the middle rather than the end.

ashkulz
04-09-2007, 12:34 PM
don't want to add too many options to PDFRead

aww.... come on. Don't let your windows software get bogged down by the shortcomings of the command line. (I know this is the reason because you've mentioned it elsewhere.) The whole point of a GUI is that it allows you to effectively support and present many more features. If you don't want to add them to the linux version, that's fine.

That's a bit out of context -- I don't want to add too many options unrelated to PDFRead (as the ones related to unpaper would be). In fact, I'd say that the command line has a better UI than any GUI, but that's something that everyone has [highly divergent] opinions on :)

I don't really want to get into the GUI thing for now, I want to focus on features

It's kind of hard to separate the two, seeing as it's hard to add features to the command line (and you're already claiming to have hit the ceiling).

Uhm .. not really. I can (and will) add many more features related to the core functionality of PDFRead. I've not reached any kind of ceiling, it's just that I want to keep the current code lean and mean -- it's much easier to maintain stuff that way. I'd like to do a mini-rewrite (like I mentioned in the other thread) but that's not really necessary at all (I'm just very finicky when it comes to code quality).

I know the real reason you don't want to do a gui. It's because you're unfamiliar with how to do it and it does take quite a bit of work. (I know, because that's the exact same reason pdfrasterfarian stayed text-based.)

Now that's right on the mark ;) I don't want to take the effort of maintaining a GUI, when it is much simpler to maintain a CUI. I have written GUIs (although now I do mostly web-based at work), but it takes much more code and effort. As a comparison -- all of PDFRead is ~620 lines, while the GUI code + configuration is ~518 lines (and that's very short, mind you).

You're right, some of the options will not make sense unless you know what is actually happening during all the various stages. I'll make a document about it and post it on the website sometime tomorrow.

Eh... a document isn't really the right solution. No one reads those things. A program should be its own document. I understand that your installer approach presents big limitations, but on the other hand it should have strong facilities to create a "wizard" to ask questions next to explanations similar to pdfrasterfarian.

Point taken, but some of the options will require you to understand the process and how that option fits in it. And yes, the installer approach does have its limitations -- but I got it up in less than a day, and it's good enough for everyday tasks. The folks who want to customize to the Nth degree can always use the command line...

The best way I can think of is to allow a page spec for conversion and convert for those pages only. ie. that way, you can convert a single page and see how it looks finally. Don't know how it will integrate with the windows GUI, though.

It shouldn't be a problem if you can just call a command in the middle rather than the end.

I'm not sure I get what you mean.

alex_d
04-09-2007, 08:04 PM
nope, command lines are worse because a) you're gonna hate typing even a measly dozen options and b) you're gonna have to keep switching to the man page just to remember/understand what to type.

It infuriates me how the *nix crowd doesn't understand the whole point of guis. They think it's just about pretty buttons and using the mouse. No, it's a fundamentally better way to present options and features.

It's also DEFINITELY not about mouse-vs-keyboard. (Well, on *nix it might be, but that's because the guis there are programmed so poorly.) A good gui will let you do everything from the keyboard. (E.g., ever notice those underlined letters?) You can use VS.net from the keyboard same as you can vi, but VS will expose 100x the features and the fraction of features you'll actually know how to use will also be far greater.

Of course none of that changes the fact that neither I nor you know how to do a good gui, but at least I don't try to esteem the limitations of my medium. The *nix crowd always does that, and it saddens me that *nix has hardly evolved in 30 years. 30 years!

On the other hand, though, I must say the command-line is very very useful when you're tying together various programs to do something new. Obviously that's the only way pdfrasterfarian was constructed, and I owe much gratitude. But if that's the rightful use of the command-line, then I think no one should have qualms with cramming in hundreds of options and parameters and a long manual that no human should ever have to use directly in the day-to-day.

This was an issue of discussion in the backend thread, and it surprised me when you said "no, the backend shouldn't have so many command-line options."

ashkulz
04-10-2007, 01:30 AM
nope, command lines are worse because a) you're gonna hate typing even a measly dozen options and b) you're gonna have to keep switching to the man page just to remember/understand what to type.

It infuriates me how the *nix crowd doesn't understand the whole point of guis. They think it's just about pretty buttons and using the mouse. No, it's a fundamentally better way to present options and features.

It's also DEFINITELY not about mouse-vs-keyboard. (Well, on *nix it might be, but that's because the guis there are programmed so poorly.) A good gui will let you do everything from the keyboard. (E.g., ever notice those underlined letters?) You can use VS.net from the keyboard same as you can vi, but VS will expose 100x the features and the fraction of features you'll actually know how to use will also be far greater.

Of course none of that changes the fact that neither I nor you know how to do a good gui, but at least I don't try to esteem the limitations of my medium. The *nix crowd always does that, and it saddens me that *nix has hardly evolved in 30 years. 30 years!

On the other hand, though, I must say the command-line is very very useful when you're tying together various programs to do something new. Obviously that's the only way pdfrasterfarian was constructed, and I owe much gratitude. But if that's the rightful use of the command-line, then I think no one should have qualms with cramming in hundreds of options and parameters and a long manual that no human should ever have to use directly in the day-to-day.

And now, that's one opinion which I won't attempt to address at all. That would start another thread altogether, so I'd really prefer not to move off the topic we're discussing here :)

This was an issue of discussion in the backend thread, and it surprised me when you said "no, the backend shouldn't have so many command-line options."

alex_d, all I said was that "I don't want to add too many options unrelated to PDFRead (as the ones related to unpaper would be)". I don't think that's being unreasonable. To put it in context, would you replicate all the options of one GUI in a separate GUI, then call the earlier GUI with the options filled in from the second one? That's what I was objecting to: adding all the unpaper command-line options (http://unpaper.berlios.de/#options) (and they are quite a lot) to satisfy maybe < 20% of the users. I have no problems in adding unpaper related functionality, but not in a way which would make it overly complex for everyone, i.e. use the principle "Make things as simple as they can be, and no simpler".

nekokami
04-10-2007, 02:38 PM
Would it be possible/feasible to have an unpaper CLI option that takes unpaper arguments and passes them straight through?

Speaking as an 11-year employee of Sun Microsystems, I think Unix has come quite a long way in the last 30 years. I can now recommend Ubuntu, for example, to my non-geek friends, with a straight face. Never mind OSX, which is unix under the hood, and is being sold by Apple, the company that thinks you shouldn't need to know what's really going on with your computer. (I'm typing this on OSX.)

ashkulz
04-11-2007, 01:10 AM
Would it be possible/feasible to have an unpaper CLI option that takes unpaper arguments and passes them straight through?

Now, that's something which is quite feasible. I'll have to study it a bit though, because if you use unpaper to split a double page layout, then more pages will have to be created. The single page layout should be very easy to implement. nekokami, do you have a few samples of such scanned pages? I don't have a scanner, so I can't really test it out.
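
A pass-through option of that kind could look roughly like this. The option name `--unpaper-args`, the program name and the file names are all hypothetical, and the unpaper flags in the usage line are only an example string that is handed over untouched.

```python
import argparse
import shlex


def build_parser():
    parser = argparse.ArgumentParser(prog="pdfread-sketch")
    # One opaque string that is handed to unpaper without interpretation.
    parser.add_argument("--unpaper-args", default="",
                        help="extra arguments passed straight to unpaper")
    parser.add_argument("input")
    return parser


def unpaper_command(options, in_image, out_image):
    """Build the unpaper command line: tool, user arguments, input, output."""
    return ["unpaper"] + shlex.split(options.unpaper_args) + [in_image, out_image]


opts = build_parser().parse_args(["--unpaper-args=--layout double", "scan.pdf"])
print(unpaper_command(opts, "page-1.pgm", "page-1-clean.pgm"))
```

Using `shlex.split` keeps quoted values together, so the wrapper never needs to know what the individual unpaper options mean.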

I'll have to think of something to do in the Windows GUI. Is anyone volunteering to create one, even after hearing the opinions of alex_d and me? :)

dstampe
04-11-2007, 12:52 PM
What kind of file vs. Reader space sizes can be expected? I tried converting a 3300-page PDF file (large print) and got a 100 MB LRF file! I thought file sizes were supposed to be comparable to the source PDF file? What kind of compression is used (if any)?

ashkulz
04-11-2007, 01:44 PM
What kind of file vs. Reader space sizes can be expected? I tried converting a 3300-page PDF file (large print) and got a 100 MB LRF file! I thought file sizes were supposed to be comparable to the source PDF file? What kind of compression is used (if any)?

That's quite a lot of pages! Must have taken ages to convert, too.

File sizes will be comparable to the source PDF if it mostly consisted of graphics; otherwise the output can be expected to be on the order of 2-3 times the size of the PDF. However, I've tried this mostly with small PDFs (300-400 pages). Some possibilities for reducing file size:

reduce the colors used to 2 (i.e. monochrome). You can do this currently by customizing the profile
use the portrait mode (should result in fewer pages)
optimize the PNG using optipng (extra processing step, not yet implemented)
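
To show what reducing the number of colors means for a grayscale page image, here is a small sketch that snaps every pixel to the nearest of N evenly spaced gray levels. It works on plain lists rather than PIL images, and `reduce_levels` is a name made up for the example, not a PDFRead function.

```python
def reduce_levels(pixels, levels=2):
    """Snap each grayscale value (0-255) to the nearest of `levels` grays."""
    if levels < 2:
        raise ValueError("need at least 2 levels")
    step = 255.0 / (levels - 1)
    return [[int(round(round(v / step) * step)) for v in row]
            for row in pixels]


row = [[0, 100, 200, 255]]
print(reduce_levels(row, 2))  # [[0, 0, 255, 255]]
print(reduce_levels(row, 4))  # [[0, 85, 170, 255]]
```

Fewer distinct values means longer identical runs for the PNG compressor, which is where the saving comes from; the price is that anti-aliased edges turn into hard steps.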


Can you try #1 and #2 above?

dstampe
04-11-2007, 03:38 PM
The file was already portrait. I tried a similar file with only 20 pages: 600K with 4 colors, 400K with 2 colors. I don't expect adding extra PNG processing is going to save more than 20% on file size.

Original PDF file size was 130K for the 20-page file, and 7500K for the 3300-page file. Both had embedded fonts. The 20-page file increased in size by a factor of 5, the 3300-page file by a factor of 13. Not sure why the increase in size was different in each case, as settings were the same.


Is there any way to disable dilation and resizing in order to keep fonts cleaner? Without using 4 grayscale levels, the resizing after dilation results in significantly messier characters. I'm not sure if the Reader does antialiasing of fonts internally utilizing the grayscale capability of the display.

alex_d
04-11-2007, 09:23 PM
30-50 kb/page is the unavoidable result of converting to images. The difference in the "increase factor" simply has to do with how much non-text data the pdf contained. Output size stays consistent, input size can vary widely. Most people, however, find that 30-50 kb/page is less than what their original pdf took up. Nevertheless, a completely text pdf might use just a few kb per page.
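
The numbers posted above bear this out. A quick check using dstampe's figures, with 100 MB taken as roughly 100,000 KB (that rounding is the only assumption):

```python
def kb_per_page(total_kb, pages):
    return total_kb / float(pages)


files = [
    # (pages, input KB, output KB) as reported earlier in the thread
    (20, 130, 600),
    (3300, 7500, 100000),
]

for pages, in_kb, out_kb in files:
    print("%4d pages: input %.1f KB/page, output %.1f KB/page, growth x%.1f"
          % (pages, kb_per_page(in_kb, pages), kb_per_page(out_kb, pages),
             out_kb / float(in_kb)))
```

Both outputs land at about 30 KB per page, while the inputs were about 6.5 and 2.3 KB per page, so the different growth factors come entirely from how compact the source PDFs were.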

These days, however, a 1GB sd card costs about $10 (with 4GB also available) so none of the above should really matter.

About the blurry text: Ashkulz chose not to implement the sharpening filter found in PDFrasterFarian. You might want to try the other program and see if you like the result. The possibilities for post-processing, however, are vast and few have so far been explored. If you have any suggestions, they would be welcome.

dstampe
04-12-2007, 08:07 AM
Yes, memory cards are a possibility. The reason I don't use them for books is that it appears you can't organize the stuff on the card by collections. I tend to keep 100 or so books on the reader (just a preference). Another disadvantage is that the battery life is considerably shorter when reading from a card.

I wasn't really concerned about blurriness, in fact this is required for antialiasing. I wanted to eliminate dilation (and possibly resizing) because the low-vision fonts I use are already fat, and the extra dilation makes them merge too much.
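
The merging effect is easy to reproduce. The sketch below assumes dilation means a 3x3 minimum filter on a dark-on-white image (every pixel takes the darkest value among its neighbours); the filter PDFRead actually applies is not spelled out in this thread, so treat this as an illustration only.

```python
def dilate_dark(pixels):
    """Thicken dark strokes: each pixel becomes the darkest of its 3x3 block."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(min(neighbours))
        out.append(row)
    return out


# Two one-pixel strokes with a one-pixel gap: the gap disappears.
print(dilate_dark([[255, 0, 255, 0, 255]]))      # [[0, 0, 0, 0, 0]]
# With a three-pixel gap, one white pixel survives.
print(dilate_dark([[0, 255, 255, 255, 0]]))      # [[0, 0, 255, 0, 0]]
```

Each stroke grows by one pixel on every side, so any gap of two pixels or less closes completely, which is what happens between the thick strokes of a large bold font.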

Don't worry too much about these items, I am still searching for a way to handle text books with large low-vision fonts on the reader. So far I've tried:

- PDF with embedded fonts (works, as long as it's less than 200 pages or page turns get ridiculously long, and there is a hard 1000-page limit)

- PDF with conversion to graphics (slow conversion and large file sizes)

- Book Designer and embedded fonts (half works, but does not properly handle boldface and does not support left-justification and hyphenation). The result is poor readability with large fonts.

I am considering these as possible solutions:

- re-flashing the reader with better fonts and standardizing on RTF (main problem is getting good fonts that work with other books, and the lack of hyphenation).

- Trying to use Word to force hyphenation, then create hard page-breaks for left-justification. Hard to see how this can work as the font layout in Word and reader will probably differ.

alex_d
04-12-2007, 01:53 PM
One way to organize on memory cards is to use the "author" field in pdfread or pdfrasterfarian as categories. You can then sort by "author" and get some semblance of organization. You can also carry multiple sd cards. Kind of fun and old school.

I'm not so sure about battery life being a lot shorter with a memory card, however. There might be an effect but nothing that would turn the reader's huge battery life into a short one.

Also, if you don't want any dilation, pdfrasterfarian has options to only do sharpening or to skip all postprocessing.

p.s. this probably isn't any of my business, but keep in mind that "going easy" on your eyes will, like not exercising any other part of your body, just make them go soft and deteriorate quicker. i mean if you can't read you can't read, but don't willfully overcompensate.

sputnik
04-20-2007, 01:07 PM
It would be nice to have an option to select a range of pages to be converted, instead of the whole document. I got some very long pdf books (almost one thousand pages each) and conversion takes too long. I know, I could use Adobe products to obtain smaller documents (book chapters or sections), but a feature that you could use to specify a page range would be great.

ashkulz
04-21-2007, 01:21 AM
It would be nice to have an option to select a range of pages to be converted, instead of the whole document. I got some very long pdf books (almost one thousand pages each) and conversion takes too long. I know, I could use Adobe products to obtain smaller documents (book chapters or sections), but a feature that you could use to specify a page range would be great.

That's already present in the new release which I'll be releasing in a day or two along with a bunch of other things :)

sputnik
04-23-2007, 07:44 PM
great

Arcee
04-25-2007, 12:47 AM
I seem not to be able to install the python imaging library. I think it can't find SDK? ... This is starting to not make sense to me. :huh:

Can someone look at the output file and see what I can do to get this installer to work?

btw... this is PIL 1.1.6 and Python version is 2.5.1

Yes and thank you Ashkulz for you work on this, I can't wait to get it working.

ashkulz
04-25-2007, 12:59 AM
I seem not to be able to install the python imaging library. I think it can't find SDK? ... This is starting to not make sense to me. :huh:

I looked at the log, and it looks like you don't have the proper Apple Developer Tools (http://developer.apple.com/tools/) installed. Please try to (re)install that and try again.

ashkulz
04-25-2007, 08:05 AM
PDFRead 1.6 has been released (http://www.mobileread.com/forums/showthread.php?t=10558). It has all the features which have been requested here, plus quite a bit more. Please see the linked thread for details :)

romsempire
04-27-2007, 02:25 AM
Thx for your fantastic tool.
I think there are two little bugs in version 1.6:
1) when I make an ebook, the generated file has no extension.
for example

lrf for prs500
imp for Reb1150, etc.

2) I have tried to make an ebook with the eb1150 profile. The ebook is created, but eBookwise Librarian recognizes it as REB1200/GEB2150, so I cannot see this ebook in the reb1150 bookshelf.

ashkulz
04-27-2007, 02:57 AM
when I make an ebook, the generated file has no extension.
for example
lrf for prs500
imp for Reb1150, etc.

This has already been implemented in subversion (http://pdfread.svn.sourceforge.net/viewvc/*checkout*/pdfread/trunk/doc/index.html?revision=16#changes) yesterday, and should be part of the next release.

I have tried to make an ebook with the eb1150 profile. The ebook is created, but eBookwise Librarian recognizes it as REB1200/GEB2150, so I cannot see this ebook in the reb1150 bookshelf.

Hmm, that's weird. Can you attach a copy of the input and output documents and/or screenshots of the process?

romsempire
04-27-2007, 02:09 PM
Hi, I have converted the first 10 pages of a PDF file.
The problem happens with every PDF file.

I haven't attached the input file because it is too large.

ashkulz
04-27-2007, 02:21 PM
Hi, I have converted the first 10 pages of a PDF file.
The problem happens with every PDF file.

I haven't attached the input file because it is too large.

Now, that is really weird. It works quite well for me, and quite a few other people. Can you try the following:

Try with PDFRead 1.7 (https://sourceforge.net/project/showfiles.php?group_id=87679&package_id=91453) instead of the 1.6 you have right now.
Uninstall and reinstall the latest eBook Publisher (http://www.ebooktechnologies.com/support_publisher_download.htm).
Try running on a computer with the English version of Windows, instead of the Italian one.


This one really has me stumped.

davidw89
04-14-2008, 09:22 AM
What are the settings to get a full portrait mode for prs-505?

nrapallo
04-14-2008, 11:55 AM
What are the settings to get a full portrait mode for prs-505?

The profile settings for the prs-505 are listed in the most recent thread for PDFRead version 1.8 here (http://www.mobileread.com/forums/showthread.php?t=21906).

wizard327
06-21-2008, 03:53 AM
I tried PDFRead to convert some of my pdf files to imp2 format. Unfortunately, I have to use the landscape mode just to be able to read it on my EBW1150. I always prefer the portrait mode but the result is hardly readable. Even in landscape mode, sometimes the text cannot be read. I also noticed that using PDFRead, the file is bloated to several megabytes. I tried experimenting using Mobipocket Creator. I first convert my pdf files (with images) to mobipocket format (prc) using Creator, then use Mobi2IMP to convert to imp. The result is outstanding (for me at least). Not only can I read in portrait mode but the final file is just a few megabytes. :bulb2:

dbh1960
06-04-2009, 02:43 PM
I tried PDFRead to convert some of my pdf files to imp2 format. Unfortunately, I have to use the landscape mode just to be able to read it on my EBW1150. I always prefer the portrait mode but the result is hardly readable. Even in landscape mode, sometimes the text cannot be read. I also noticed that using PDFRead, the file is bloated to several megabytes. I tried experimenting using Mobipocket Creator. I first convert my pdf files (with images) to mobipocket format (prc) using Creator, then use Mobi2IMP to convert to imp. The result is outstanding (for me at least). Not only can I read in portrait mode but the final file is just a few megabytes. :bulb2:

Thank you for posting this. It helped me tremendously!