View Full Version : Yet another PDF to LRF converter



cacapee
08-22-2007, 12:54 PM
Moderator's Note: I've taken down the attached programs because they appear to be in violation of the GPL. They will go back up after the issue is resolved.

Nate the great



Hi, I've taken some of the ideas of existing tools along with a few refinements of my own to code this one up. I've attached a few sample conversions to get an idea of what the tool can do.

The refinements are --runpages (which causes adjacent pdf pages to be spliced into the same image if possible) and --smartcut (which avoids the annoying splits at the edge of the image). Another feature is that landscape mode (which is the default) rotates the image but doesn't actually use the Reader's landscape mode.
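To illustrate the idea behind --smartcut as described above, here is a minimal Python sketch. It is not pdflrf's actual code: it assumes a page is a list of pixel rows (0 = black, 255 = white), and the names `is_blank_row`, `smart_cut` and the `search` window are hypothetical.

```python
def is_blank_row(row, white=255, tolerance=0):
    """A row counts as blank when every pixel is (nearly) white."""
    return all(white - p <= tolerance for p in row)


def smart_cut(rows, target, search=20):
    """Return a cut index at or above `target` that falls on a blank row.

    Looks upward from `target` for up to `search` rows so that a line of
    text is not sliced in half; falls back to `target` if none is blank.
    """
    lowest = max(target - search, 0)
    for y in range(min(target, len(rows) - 1), lowest - 1, -1):
        if is_blank_row(rows[y]):
            return y
    return target


# Tiny example: rows 0-1 hold text, row 2 is blank, rows 3-5 hold text.
page = [[0, 255], [0, 0], [255, 255], [0, 255], [255, 0], [0, 0]]
cut = smart_cut(page, target=4)  # moves the cut up to the blank row
```

Searching only upward keeps each image within the screen height; a real implementation would probably also need a tolerance for scanner noise.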

f_l.lrf outputs the page that can be viewed by rotating the reader (-rs)
f_p.lrf outputs the page in portrait mode (-prs)
f_po.lrf outputs the page without runpages and smartcut (-p)
g_2col.lrf sample of two column mode (-vrs)
comicl.pdf comicp.pdf strips in landscape and portrait mode (uses --nosplitpage -rs)

The pdf files tested are linked here

http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf
http://www.comp.nus.edu.sg/~tants/tsm/tsm.pdf


pdflrf 0.99

A program to generate lrf files for the Sony Reader
Needs Ghostscript to be installed unless you are using poppler

Usage: For 2 column portrait mode
pdflrf.exe -vrs -i file.pdf -o file.lrf
For landscape mode
pdflrf.exe --rotation=-90 -rs -i file.djvu -o file.lrf
For comics
pdflrf.exe -nrs --erode 1 -i file.cbz -o file.lrf


-h, --help Print help and exit
-V, --version Print version and exit
-i, --input=STRING Input file (PDF, DJVU, CBZ)
-o, --output=STRING Output file
-f, --firstpage=INT First page to process (default=`1')
-l, --lastpage=INT Last page to process (default=`-1')
--ghostscript Use ghostscript (instead of poppler by default)
to read pdf files (default=off)

LRF file metadata
-t, --title=STRING Title
-a, --author=STRING Author
--category=STRING Category
--publisher=STRING Publisher

LRF file generation properties
--fit=STRING Scale image (possible values="width",
"height", "2xheight" default=`width')
--rotation=STRING Rotation (possible values="-90", "0",
"90", "180" default=`-90')
--filter=STRING Resizing filter (possible values="lanczos",
"quadratic", "cubic", "catrom",
"mitchell", "sinc", "bessel"
default=`lanczos')
--stretch Stretch image to fit screen (default=off)
-v, --vsplit Vertically split the page (default=off)
-r, --runpages Run pages together (default=off)
--pad=INT Pad (in pixels) to add when concatenating pages
(default=`3')
-s, --smartcut Cut pages at blank lines (default=off)
-n, --nosplitpage Do not split pages across images (default=off)
--notoc Do not add TOC (default=off)

Image processing
--erode=INT Size of erosion kernel (2 works well too)
(default=`3')
--overlap=FLOAT Overlap % between successive pages
(default=`0.05')
-c, --colors=INT Number of colors in final image (default=`4')
--grayscale Convert image to grayscale (default=on)
--nocrop Do not crop the sides automatically
(default=off)
--outputimages Write generated images out (default=off)
--width=INT Width of final image (default=`584')
--height=INT Height of final image (default=`754')

The following options are run before any of the above processing is done
--trimleft=FLOAT Trim width*trimleft/100 pixels from the left for
all pages (default=`0')
--trimright=FLOAT Trim width*trimright/100 pixels from the right
for all pages (default=`0')
--trimtop=FLOAT Trim height*trimtop/100 pixels from the top for
all pages (default=`0')
--trimbottom=FLOAT Trim height*trimbottom/100 pixels from the bottom
for all pages (default=`0')
--eventrimleft=FLOAT Trim width*trimleft/100 pixels from the left for
even pages (default=`0')
--eventrimright=FLOAT Trim width*trimright/100 pixels from the right
for even pages (default=`0')
--eventrimtop=FLOAT Trim height*trimtop/100 pixels from the top for
even pages (default=`0')
--eventrimbottom=FLOAT Trim height*trimbottom/100 pixels from the bottom
for even pages (default=`0')
--oddtrimleft=FLOAT Trim width*trimleft/100 pixels from the left for
odd pages (default=`0')
--oddtrimright=FLOAT Trim width*trimright/100 pixels from the right
for odd pages (default=`0')
--oddtrimtop=FLOAT Trim height*trimtop/100 pixels from the top for
odd pages (default=`0')
--oddtrimbottom=FLOAT Trim height*trimbottom/100 pixels from the bottom
for odd pages (default=`0')
--fuzz=FLOAT Fuzz factor % for matching colors
(default=`0.01')
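The trim options above are percentages of the page size. A minimal Python sketch of that arithmetic follows, assuming the trimmed region is a simple crop rectangle; the function name `crop_box` and the rounding to whole pixels are assumptions, not taken from pdflrf.

```python
def crop_box(width, height, left=0.0, right=0.0, top=0.0, bottom=0.0):
    """Return (x0, y0, x1, y1) after trimming the given percentages.

    Mirrors the help text: width*trimleft/100 pixels come off the left,
    height*trimtop/100 pixels come off the top, and so on.
    """
    x0 = int(width * left / 100)
    x1 = width - int(width * right / 100)
    y0 = int(height * top / 100)
    y1 = height - int(height * bottom / 100)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("trim percentages remove the whole page")
    return x0, y0, x1, y1


# Hypothetical 1000x1400 page: trim 5% left, 10% right, 2.5% top.
box = crop_box(1000, 1400, left=5, right=10, top=2.5)
```

For the even/odd variants, the same function would be called with a different set of percentages depending on the page number.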


version 0.2 now reads in cbz and djvu files. It also reads pdf files using poppler - so you do not need to install ghostscript.

version 0.3 has a windows gui. --portrait has been removed and replaced by --rotation. Default is portrait now. Also metadata from pdf files are automatically read

version 0.4 adds drag/drop and batch processing (drag multiple files over) + other small refinements (not specifying smartcut/runpages/splitpage scales a page to fit in an image)

version 0.5 has support for pre-trimming the input pages. The trims are specified as a % of the page to trim away from the left/right/top/bottom. They can also be specified independently for even/odd pages. Also unicode metadata is now supported.

version 0.6 adds support for unicode filenames (previously only unicode metadata was supported), cbr/rar files, better error reporting and assorted bug fixes.

version 0.7 adds preview of pre-trimming and a catrom filter for resizing images; png files are now generated for embedding in lrf files. Converted over to using threads, so batch processing is much improved.

version 0.8 adds support for Table of Contents in pdf files. It is possible to preview output images to test out various settings. An experimental linux build (built on Ubuntu) has been added. Improved threading support so processing should be faster. Changed default colors to 4 to reduce frequency of file size questions. Added more filtering options and better dithering so images should look a lot better.
Fixed threading bug that causes it to hang occasionally under linux. Fixed TOC. Erosion now works sensibly with small images (like those of comic strips). Can also go down to 2 colors.

version 0.9 Added padding (in pixels) when using runpages. Fixed crash bug when generating toc. Added back ghostscript support. Added option to disable generation of TOC.

version 0.99 Fixed Librie compatibility (maybe). Adjustable image size on output. Output zip files (if extension is cbz or zip). Added RGB output. Added post stretch of image to fit page. Added options to fit image by height/width/2*height etc. Sort files in rar and zip.

Fixed bug in dos and linux commandline versions that ignored toc. Fixed metadata bug.

I've broken out the dos commandline, windows gui and linux commandline versions into separate files to better track usage.

JSWolf
08-22-2007, 05:36 PM
Nice job. Looks good in Connect on my computer. Now to download to the reader for a look.

thinredline
08-23-2007, 08:35 PM
I am not a programmer. How does this tool work? Is it a windows application? or it can only be used on linux?

thinredline
08-23-2007, 09:24 PM
Loaded the attached samples onto my reader. The result is excellent! Very sharp edges. Can you tell me how to use the tool in detail? TIA.:2thumbsup

JSWolf
08-24-2007, 12:24 AM
Hi, I've taken some of the ideas of existing tools along with a few refinements of my own to code this one up. I've attached a few sample conversions to get an idea of what the tool can do.
The attached ZIP file is just source code. Would you mind attaching actual compiled code that's ready to run? Also, what operating systems does this run under? Thanks!

cacapee
08-24-2007, 01:35 AM
I've attached the executable. It requires ghostscript to be installed (I've tested it with the latest version). More improvements - it now reads in djvu files. Also it does auto cropping of the left and right edges. Please try it out and provide feedback.

I purposefully designed it so that it is cross platform. However for now it's windows only.

Thanks!

thinredline
08-24-2007, 09:24 AM
I know it's a bit difficult, but can you write a GUI to make it easier to process files? TIA.

threetwo
08-24-2007, 02:31 PM
Your program is awesome!! Thank you for building and sharing it. I have been using the other rasterizers with varying frustration at converting hundreds of math-intensive papers. Your program worked right off the bat, with perfect results. I bought the Sony reader knowing it wasn't going to be perfect for what I wanted, and was willing to live with it ... but now it might actually be better than I expected!

JSWolf
08-24-2007, 02:40 PM
I've attached the executable. It requires ghostscript to be installed (I've tested it with the latest version). More improvements - it now reads in djvu files. Also it does auto cropping of the left and right edges. Please try it out and provide feedback.

I purposefully designed it so that it is cross platform. However for now it's windows only.

Thanks!
Thank you very much!

thinredline
08-24-2007, 03:44 PM
Sometimes the right side of the content gets cut off a bit. How do I adjust the parameters to avoid this? TIA.

Nate the great
08-24-2007, 03:56 PM
I tried it and get the following error:

Can't load DLL, LoadLibrary error code 0



I've looked at your samples, and they make me really really want to get your program working

ddavtian
08-24-2007, 04:19 PM
I tried it and get the following error:

Can't load DLL, LoadLibrary error code 0

After this error I downloaded and installed ghostscript (google it), works fine. Thanks for a good application.

David

Nate the great
08-24-2007, 04:46 PM
I tried it and get the following error:

Can't load DLL, LoadLibrary error code 0


I've looked at your samples, and they make me really really want to get your program working


It works now. Here is where you can download Ghostscript:


http://sourceforge.net/project/showfiles.php?group_id=1897&package_id=108733&release_id=529280

Nate the great
08-24-2007, 11:30 PM
I have a request. A couple of my free PDFs have tiny marks right on the top and bottom edge of the page. This prevents it from being cropped. The one PDF I have done so far is about twice the size it should be. Can you figure out a way to fix it?

This is a really nice tool.




An example should be attached.

Fallen angel
08-25-2007, 04:59 AM
I'm sorry, but I couldn't understand how is this application working ... could somebody give me some instructions? :stupid:

The only thing happening when I'm running the pdflrf.exe file, is opening a dos window that closes within a sec ... :blink: :unafraid:

thinredline
08-25-2007, 07:19 AM
I'm sorry, but I couldn't understand how is this application working ... could somebody give me some instructions? :stupid:

The only thing happening when I'm running the pdflrf.exe file, is opening a dos window that closes within a sec ... :blink: :unafraid:

I've borrowed the instructions below, written by callall and posted on a Chinese forum (http://www.hi-pda.com/forum/viewthread.php?tid=373806&extra=page%3D1):

Follow the following steps:
1. save and unzip the file 'pdflrf.zip' to a folder such as "c:\pdflrf".
2. save a copy of the pdf file you want to convert to that folder. For example a file named 'bullshit.pdf'.
3. click start-> run... and type in 'cmd'. You are in DOS mode
4. Type in 'cd c:\pdflrf'
5. Type in 'pdflrf -i bullshit.pdf -o stillbullshit.lrf -p -v -s'
6. you should have a file stillbullshit.lrf in that folder.
7. try other options and have fun...
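The steps above can also be scripted. The Python sketch below builds the same kind of command line using only options that appear in the help text (-i, -o, -v, -r, -s); note that the -p flag in step 5 was dropped in version 0.3 according to the changelog, so it is left out. The helper names `build_command` and `convert`, and the assumption that the executable is reachable as `pdflrf`, are illustrative only.

```python
import subprocess


def build_command(pdf_path, lrf_path, vsplit=False, runpages=False,
                  smartcut=False, exe="pdflrf"):
    """Assemble the argument list for one pdflrf conversion."""
    cmd = [exe, "-i", pdf_path, "-o", lrf_path]
    if vsplit:
        cmd.append("-v")   # vertically split the page (two-column mode)
    if runpages:
        cmd.append("-r")   # run adjacent pages together
    if smartcut:
        cmd.append("-s")   # cut pages at blank lines
    return cmd


def convert(pdf_path, lrf_path, **options):
    """Run the converter; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(pdf_path, lrf_path, **options), check=True)


# Example argument list (nothing is executed here).
example = build_command("book.pdf", "book.lrf", vsplit=True, smartcut=True)
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.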

phookoo
08-25-2007, 07:27 AM
I second that request - I'm not a developer or programmer, but I am reasonably tech-capable.

Any chance of some instructions for non-programmers (idiot-proof would be even better :D )

thinredline
08-25-2007, 07:56 AM
Here are the instructions, borrowed from a Chinese forum

Follow the following steps:
1. save and unzip the file 'pdflrf.zip' to a folder such as "c:\pdflrf".
2. save a copy of the pdf file you want to convert to that folder. For example a file named 'bullshit.pdf'.
3. click start-> run... and type in 'cmd'. You are in DOS mode
4. Type in 'cd c:\pdflrf'
5. Type in 'pdflrf -i bullshit.pdf -o stillbullshit.lrf -p -v -s'
6. you should have a file stillbullshit.lrf in that folder.
7. try other options and have fun...

astra
08-25-2007, 11:24 AM
Here are the instructions, borrowed from a Chinese forum

Follow the following steps:
1. save and unzip the file 'pdflrf.zip' to a folder such as "c:\pdflrf".
2. save a copy of the pdf file you want to convert to that folder. For example a file named 'bullshit.pdf'.
3. click start-> run... and type in 'cmd'. You are in DOS mode
4. Type in 'cd c:\pdflrf'
5. Type in 'pdflrf -i bullshit.pdf -o stillbullshit.lrf -p -v -s'
6. you should have a file stillbullshit.lrf in that folder.
7. try other options and have fun...


'pdflrf -i bullshit.pdf -o stillbullshit.lrf -p -v -s'

Doesn't work. The resulting file is 0 kb :(


E:\>cd pdflrf

E:\pdflrf>dir

Directory of E:\pdflrf

25/08/2007 16:22 <DIR> .
25/08/2007 16:22 <DIR> ..
30/01/2006 01:30 522,856 btp.pdf
25/08/2007 16:07 29,696 Follow the following steps.doc
24/08/2007 00:46 1,007,616 pdflrf.exe
3 File(s) 1,560,168 bytes
2 Dir(s) 30,384,414,720 bytes free

E:\pdflrf>pdflrf -i btp.pdf -o btl.lrf -p -v -s

E:\pdflrf>

thinredline
08-25-2007, 12:21 PM
try this:

pdflrf -ibullshit.pdf -obullshit.lrf

This should work. There is no space between "-i" and "bullshit.pdf"

cacapee
08-25-2007, 02:27 PM
Uploaded version 0.2. This version doesn't need ghostscript to be installed. Instead poppler (xpdf) is builtin and is used by default. Also preliminary support for cbz files (png, gif, jpeg) has been added. cbr will follow shortly.

Let me know if you have any questions, feedback etc.

mdhuang
08-25-2007, 04:05 PM
Great work! A better replacement for RasterFarian . Thanks for your contribution!

astra
08-25-2007, 04:16 PM
try this:

pdflrf -ibullshit.pdf -obullshit.lrf

This should work. There is no space between "-i" and "bullshit.pdf"

I didn't realise I needed ghostscript :(
Stupid me.

I am going to try again but with newer version which doesn't need ghostscript.

astra
08-25-2007, 06:23 PM
It works now :)

Even the first command works.

Nate the great
08-25-2007, 10:51 PM
I have a request. A couple of my free PDFs have tiny marks right on the top and bottom edge of the page. This prevents it from being cropped. The one PDF I have done so far is about twice the size it should be. Can you figure out a way to fix it?

This is a really nice tool.



Can you post the source? I want to do a couple one-off mods in order to convert some of my PDFs.

I have another request. Are you able to access the title, publisher, author, etc of the PDF? Can you set that data to be written into the LRF file by default (so it doesn't have to be done by hand)?


Again, this is a really nice program.

megacoupe
08-26-2007, 03:14 AM
Well, after messing with this program for a few conversions, I believe it to be the best PDF to LRF converter right now (I've only tried Book Designer and RasterFarian, but those are the big ones, right?).

One of my few complaints with RasterFarian was that no matter how you cropped your PDF, it would still create blank space to the right side of each page to force it into 4:3 proportions. Of course, RasterFarian had to do this so that all the text on a page would fit on your Reader screen.

The innovation that PDFLRF makes is that instead of adding blank space to the right of the page, it enlarges the text so that it fills the screen. Some lines end up being cut off at the bottom of the page because of the text enlargement but PDFLRF simply places it on the next page. You end up with half a blank page on every other page, but at least the text is legible! I don't mind that my 246 page book now shows 473 pages since it's much easier on the eyes.

In case my description above doesn't make any sense, I've taken some screenshots.

http://i131.photobucket.com/albums/p318/DrDoombot/RasterConversion.jpg

The above is a conversion done with RasterFarian. Notice the blank space on the right side, added by RasterFarian even though I cropped the page in Acrobat.

http://i131.photobucket.com/albums/p318/DrDoombot/PDFLRFConversion.jpg

The above is a conversion of the same exact page using PDFLRF. As you can see, it now takes up two pages, but the letters are much larger, making it easier to read comfortably on the Reader screen.

Despite the fact that there is no GUI, this program is pretty damn great. The only complaint I have is the file size. The book I tested is a 7mb PDF; after a RasterFarian conversion, the same file would be a 8mb LRF. After a PDFLRF conversion, the book is a whopping 18MB. That could seriously reduce the number of books you can keep on your Reader at one time.

Nate the Great: You can set the title, author, etc. Type "pdflrf -h" into the command prompt and you will see a list of the metadata available for this program.

Fallen angel
08-26-2007, 05:32 AM
Thank you really really much both for the instructions and the program! :smash: It is awesome!! :wink::grin2:

yota
08-26-2007, 06:32 AM
Uploaded version 0.2. This version doesn't need ghostscript to be installed. Instead poppler (xpdf) is builtin and is used by default. Also preliminary support for cbz files (png, gif, jpeg) has been added. cbr will follow shortly.

Let me know if you have any questions, feedback etc.

Hello cacapee,

I have been using pdfread but your smartcut feature is cool!! and also the -r option. It always annoyed me when few lines overlapped when turning pages with pdfread.

But I'm on Mac OS X. Could you post the source again please? I missed downloading it when you had it posted.

Thanks.

Paviko
08-26-2007, 06:52 AM
Hi cacapee!

Thank you for the best PDF to LRF converter.

I've noticed two things, don't know if these are bugs:

- with -p and --nocrop options the pdf page is not filling the whole screen (looks like it has not been fit to cover whole screen). There is lots of free space at the right and at the bottom margins.

- by default the color parameter is 16 shades, but the PRS can display only 4. The output looks beautiful on the computer screen with 16 shades, but the Sony PRS-500 dithers to render it, and I think that degrades the quality somewhat. When I choose 4 shades I end up with some pages that are not white but gray. It looks like the color map doesn't contain pure white and black; instead the algorithm chooses the 4 shades that best match the original page. I think the Sony Reader cannot display 4 arbitrary shades picked from e.g. a 256-level palette, but only its own fixed 4 shades, which means that dithering still applies even if the output image has only 4 colors in its map.
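One way to picture the behaviour suggested above is to quantize to a fixed, evenly spaced set of gray levels that always includes pure black and pure white, rather than an adaptive palette. The Python sketch below shows that idea only; whether the Reader's four shades really sit at these values, and how pdflrf picks its palette, are assumptions.

```python
def fixed_gray_levels(n):
    """Evenly spaced gray levels from 0 (black) to 255 (white)."""
    return [round(i * 255 / (n - 1)) for i in range(n)]


def quantize(pixels, n=4):
    """Map each 0-255 pixel to the nearest of n fixed gray levels."""
    levels = fixed_gray_levels(n)
    return [min(levels, key=lambda lv: abs(lv - p)) for p in pixels]


levels4 = fixed_gray_levels(4)
sample = quantize([0, 40, 100, 200, 250], n=4)
```

With a fixed palette, a white page background stays at the whitest level instead of drifting to a light gray.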

Regards

phookoo
08-26-2007, 06:57 AM
OK, clearly I'm a bigger dumbass than anyone here, but as I don't spend ANY time in DOS ever, using it is more than a little confusing.

No matter what I try, wherever I put the pdflrf exe, I try typing cd c:\pdflrf into the DOS prompt & hit return & it just returns the message -

C:\Documents and Settings\Administrator\My Documents>cd c:\pdflrf
The system cannot find the path specified

Bear in mind the 'C:\Documents and Settings\Administrator\My Documents>' is already there & I can't overtype it. The exe is currently residing directly in the C drive, but I have also tried putting it in a new folder by itself & using that pathname with no more luck.

IDIOT-PROOF guide anyone? :tired::pray::blink:

Nate the great
08-26-2007, 07:25 AM
Nate the Great: You can set the title, author, etc. Type "pdflrf -h" into the command prompt and you will see a list of the metadata available for this program.

I know this. I want this program to do it automatically.

yota
08-26-2007, 07:40 AM
OK, clearly I'm a bigger dumbass than anyone here, but as I don't spend ANY time in DOS ever, using it is more than a little confusing.

No matter what I try, wherever I put the pdflrf exe, I try typing cd c:\pdflrf into the DOS prompt & hit return & it just returns the message -

C:\Documents and Settings\Administrator\My Documents>cd c:\pdflrf
The system cannot find the path specified

Bear in mind the 'C:\Documents and Settings\Administrator\My Documents>' is already there & I can't overtype it. The exe is currently residing directly in the C drive, but I have also tried putting it in a new folder by itself & using that pathname with no more luck.



Maybe I can help you a little.

Your current directory or "folder" is
C:\Documents and Settings\Administrator\My Documents

The command you typed "cd c:\pdflrf" means that you want to "Change Directory" i.e. move to a folder inside your C drive named "pdflrf".

But you're getting an error that says that there is no such "folder".

Are you sure you have a folder named "pdflrf" inside your C drive?

Probably the easiest way is to move the "pdflrf.exe" file and the pdf file you want to convert into your "My Documents" folder, since you are already there.
Then you can just type

pdflrf -i foo.pdf -o foo.lrf -s

at the prompt where foo is the name of your pdf file.

or try typing

pdflrf -h

to display the help message!

mdhuang
08-26-2007, 09:01 AM
This thread should be made sticky!
The best PDF converter, period.
The only drawback I see now is that the file size is huge compared to the original. But that is not a real problem, since nowadays you can pick up a 1GB SD card for less than $10.
Thanks for the great work!

cacapee
08-26-2007, 01:25 PM
megacoupe, thanks for the screenshots.

I have another request. Are you able to access the title, publisher, author, etc of the PDF? Can you set that data to be written into the LRF file by default (so it doesn't have to be done by hand)?

Will be included in the next release

- with -p and --nocrop options the pdf page is not filling the whole screen (looks like it has not been fit to cover whole screen). There is lots of free space at the right and at the bottom margins.

Be sure to use -rs. --nocrop will not crop whitespace on the sides

IDIOT-PROOF guide anyone?

A gui is in the works.

Also be sure to use --colors 16 or --colors 4 along with -rs to further reduce file size.

Source code is still not ready to be released. As you can see there are no external dependencies so it is a huge wad of code. A list of the libraries used are gd, jpeg, png, zlib, unrar, poppler, freetype, libdjvu

cfishy
08-26-2007, 05:29 PM
Hi cacapee, Awesome job!

I just got done complaining about PDF support on Engadget and decided to come back and check things out. I tried PDFLRF and it's so awesome that I can't wait to convert all my pdf files. I cannot wait! so I rushed to create a little Windows utility to provide a drag and drop batch interface. Hope you don't mind I just whip up something quickly so we can all jump on it while waiting for your GUI to complete.

My GUI wrapper for PDFLRF is just called PDFLRF_GUI and it can be found here: http://www.carolchen.com/code/

(It's a rapid application so i will upload newer version whenever I get a chance to improve it. I know it's primitive but it gets some stuff done. comments welcome.)

cfishy
08-26-2007, 05:36 PM
A gui is in the works.


(what happened to my previous post?)
Awesome job! If you forgive me I cannot wait to start converting so i wrote a little GUI wrapper. It's primitive but it lets you drag and drop files, hit "Run" and be done with it.

It can be found here: http://www.carolchen.com/code/

thinredline
08-26-2007, 08:14 PM
Looks like this tool will become the No.1 choice of PDF converter.

I also have a request: could a batch processing function be built into the tool later? This would save users a lot of time. Thanks.

NatCh
08-27-2007, 12:22 AM
(what happened to my previous post?)

Sorry about that, cfishy -- the "spam filter" caught it until one of us editors could look it over. We get a lot of posts selling cell phones and other dreck, and the filter catches them. Sometimes it gets a bit overzealous and grabs legitimate posts too. However, it should stop happening now that you've posted a few times, since one of its parameters (as I understand it) is the number of posts a user has made. Sorry for the mix-up. :sad:

cacapee
08-27-2007, 04:42 AM
Uploaded version 0.3 which has a windows gui. --portrait has been removed and replaced by --rotation. Default is portrait now. Also metadata from pdf files are automatically read

thinredline
08-27-2007, 01:12 PM
Could someone tell me what is the "erode" about? How should I choose this parameter?

cacapee
08-27-2007, 01:15 PM
erode is similar to dilate in rasterfarian. It thickens the black regions in proportion to the value specified. http://docs.gimp.org/2.2/en/plug-in-dilate.html
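As a rough picture of what such a filter does, here is a plain-Python grayscale erosion (a minimum filter): each pixel takes the darkest value in its neighborhood, so black strokes grow. This is a generic sketch, not pdflrf's code; how pdflrf actually interprets the --erode value, including even sizes such as 2, is not known from this thread.

```python
def erode(image, size=3):
    """Grayscale erosion: each pixel becomes the minimum of the
    neighborhood reaching size // 2 pixels in every direction.

    size=1 leaves the image unchanged; in this sketch an even size
    behaves like the next odd size up.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighborhood = [
                image[yy][xx]
                for yy in range(max(y - r, 0), min(y + r + 1, h))
                for xx in range(max(x - r, 0), min(x + r + 1, w))
            ]
            row.append(min(neighborhood))
        out.append(row)
    return out


# A single black pixel on a white 5x5 background.
img = [[255] * 5 for _ in range(5)]
img[2][2] = 0
thick = erode(img, size=3)  # the dot grows into a 3x3 black square
```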

cacapee
08-27-2007, 03:52 PM
Check out http://docs.gimp.org/2.2/en/plug-in-dilate.html

Kilarney
08-27-2007, 11:09 PM
Okay... time for an ignorant newbie question.

Does this program actually help fit PDFs to the Reader screen?
I downloaded this file as a test:
http://www.targeting-the-kinome.org/images/Gasser2.pdf
I converted the first three pages.
What I saw was an exact replica of the PDF. So why convert in the first place? I was hoping that this program would extract the text and make it easier to read on the Reader screen. Am I missing something? Can you state the font size when converting?

Sorry to be such a newb, but can someone explain to me the benefits of this program? Is it just a matter of adjusting contrast?

Thanks.

Gowry
08-28-2007, 12:13 AM
All I can say is wow!

Thank you for this great piece of software. Where's your paypal button?

I never thought I'd be able to read my programming books on this device after trying to convert some of them. If I converted to word then lrf, the formatting and graphics were all messed up or it was boxed, and any attempt to convert straight to lrf just made it unreadable...but after my first conversion with this, I can jump for joy!

Now if I could just find Terry Goodkind for sale as pdf...

Thanks again to both developers. Both the converter and GUI are wonderful!

Gowry

PS. I had a hard time figuring out if it was done at the end. The progress bar was full, but it wasn't finished...I could tell by my CPU meter. I think it'd be nice if the title bar maybe said PdfLrf - XX% and then changed to PdfLrf - Done! when finished. Just a suggestion :)

cfishy
08-28-2007, 09:02 AM
Okay... time for an ignorant newbie question.
Sorry to be such a newb, but can someone explain to me the benefits of this program? Is it just a matter of adjusting contrast?


you will see the difference once you load them into your reader.

paulkbiba
08-28-2007, 10:32 AM
This is very impressive. Thank you for your work. Now I can read my Wowio books on the Reader. They were pretty much illegible in PDF format.

NatCh
08-28-2007, 11:17 AM
Where's your paypal button?

He may not have a PayPal button, but you can still say "thank you" with karma! :nice:

paulkbiba
08-28-2007, 11:24 AM
Done!

cacapee
08-28-2007, 04:37 PM
version 0.4 adds drag/drop and batch processing (drag multiple files over) + other small refinements (not specifying smartcut/runpages/splitpage scales a page to fit in an image)

thinredline
08-28-2007, 08:15 PM
cacapee, you are the best! Sales of the Sony ereader could have doubled if they had bundled it with your software.

pocketpc
08-29-2007, 12:11 AM
Great software, cacapee! :2thumbsup
When I read documents on my Sony Reader, sometimes I'd like to view a page as a single image so I can see more content, but sometimes I think it's clearer and easier to read if it is split into 2 pages.
So I have an idea: can you make a new mode, say "double portrait", where the height will be (600-46)*2=1108 or something like that, and the width will be 1108/1.29=858?
This mode will increase the size of the final LRF file, but we can use the rotate function of the Sony Reader itself to switch between view rotations.
And thanks again for the awesome work.
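The arithmetic in this proposal can be checked with a few lines of Python. The numbers 600, 46 and 1.29 come from the post; comparing against the 584x754 default image size from the help text is an added illustration, and rounding down to whole pixels is an assumption.

```python
# Proposed "double portrait" image size.
height = (600 - 46) * 2          # twice the usable landscape height
width = int(height / 1.29)       # keep roughly the same aspect ratio

# Default image size from the help text, for comparison.
default_width, default_height = 584, 754
default_ratio = default_height / default_width

# How many times more pixels the larger image would hold.
pixel_factor = (width * height) / (default_width * default_height)
```

The default 584x754 image has nearly the same 1.29 aspect ratio, so the proposal amounts to a bit more than doubling the pixel count per image.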

cacapee
08-29-2007, 12:59 AM
Try out pdflrf's landscape mode which does a similar thing. Sony's landscape mode sucks for multiple reasons.

pocketpc
08-29-2007, 01:11 AM
pdflrf's landscape mode will divide the images into 2 pages. we can switch it back to portrait mode and sometime I think switch between those 2 modes is very convenient for readers.
I mean if pdflrf can make the output file in large size so we can use a simple way to switch between portrait and landscape mode as Sony's landscape function did, it's good.
And why do you say Sony's landscape mode sucks? It seems OK to me.

cacapee
08-29-2007, 01:23 AM
Currently pdflrf generates images that map one to one with the visible pixels in the ereader. The effect you are looking for would mean that the reader would be doing the interpolation and consequently doing a bad job of it. You can test it out by viewing pdflrf's landscape output in portrait mode. Also Sony's landscape mode greys out the bottom and top in landscape mode which is very distracting.

Rasterfarian/PDFRead do what you want in landscape mode. It obviously does not work very well.

pocketpc
08-29-2007, 02:07 AM
So maybe the only way for me is to convert the documents twice, once in portrait mode and once in landscape mode.
This will cause some problems, for example with reading history and bookmarks, but it seems to be the last resort.
Anyway, thanks again Cacapee, your work is really great!

Nate the great
08-29-2007, 08:37 AM
Have you thought about adding an option to output the file in HTML? There are a number of other ebook devices with a similar sized screen that don't read LRF.

This is a very good tool. It would be great if more people could use it.

thinredline
08-29-2007, 09:46 AM
If cacapee can add TOC editing function, this software will become perfect. Are you planning to make it support unicode?

ddavtian
08-29-2007, 11:27 AM
If cacapee can add TOC editing function, this software will become perfect. Are you planning to make it support unicode?

I vote for TOC too, if it's doable.
About unicode: does this tool care for fonts? I thought it was treating pages as images, so you could have anything there.

ddavtian
08-29-2007, 11:30 AM
Done!

Done!

thinredline
08-29-2007, 12:17 PM
I vote for TOC too, if it's doable.
About unicode: does this tool care for fonts? I thought it was treating pages as images, so you could have anything there.

What I mean is whether the automatically extracted metadata, such as title, publisher, etc., supports unicode.

cacapee
08-29-2007, 12:27 PM
Can you detail out the TOC editing feature?

thinredline
08-29-2007, 01:40 PM
Can you detail out the TOC editing feature?

If the PDF file already has a TOC, can you extract the info and let the user edit it before building it into the lrf file?

RWood
08-29-2007, 02:39 PM
This software has already earned a place in my bag of tricks for the Reader. The other major items in the bag are Stingo's Word Macro, libprs500, and BookDesigner. Yes, there are areas of overlap. These tools make my enjoyment of the Reader much better than it would have been without these tools.

As someone else said in another thread (slightly changed for the season), "Karma, the perfect late summer gift." It feels good to give or get and it costs you nothing.

slayda
08-29-2007, 03:25 PM
Currently pdflrf generates images that map one to one with the visible pixels in the ereader.

cacapee, from this comment, I infer that your program is providing an output image of text, not true text that could be edited. Is this correct?

timestory
08-29-2007, 06:00 PM
If the PDF file already has TOC, can you extract the info and let user to edit it before building it into lrf file?

Yes, if pdflrf can provide this feature, that will be great; then we can navigate the book from the table of contents, which would compensate for the poor navigation offered by the percentage scale alone.

If it could be implemented, I assume that user cannot edit it too much even before building it into lrf, while after lrf file is built, it is harder to insert those link into lrf.

If the information PDF's TOC can be read, a hierachy of structure of the book could be built using <a href="#n">toc tile1</a> where the number links to page n.

Is it possible borrow the idea(or implementation) from the tool Html2lrf. Generate one link (the target is corresponding page) for each item in PDF TOC, and put those links in first pages of the book.
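
A minimal sketch of the link-page idea above, in Python. The entry list, the `toc_to_html` helper, and the `#n` page anchors are made up for illustration; this is not how html2lrf or pdflrf actually build a TOC.

```python
from html import escape

def toc_to_html(entries):
    """Build a simple indented link list from (level, title, page) tuples."""
    lines = []
    for level, title, page in entries:
        # Indent nested entries so the hierarchy stays visible on the page.
        indent = "&nbsp;" * 4 * (level - 1)
        lines.append(f'{indent}<a href="#{page}">{escape(title)}</a><br/>')
    return "\n".join(lines)

# Hypothetical TOC entries: (nesting level, title, target page).
toc = [(1, "Introduction", 1), (2, "Motivation", 3), (1, "Results", 10)]
html_page = toc_to_html(toc)
print(html_page)
```

The generated page would go at the front of the book, one link per TOC item.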

mosteo
08-29-2007, 06:36 PM
Wow, just discovered this and it's amazing. It actually allows reading scientific two-column papers in the e-reader! And, incidentally, it works in linux using wine.

volwrath
08-29-2007, 08:26 PM
I just wanted to reiterate what a great tool this is. I was actually able to get usable results from a pdf newsletter that I had tried all sorts of ways and couldn't read. Great job!

godel10
08-30-2007, 05:14 AM
Great (wonderful, amazing, etc) work. I never expected to be able to read scientific documents in the Librie (I am using the Librie, not the reader), but it works great.

I have one question. I do not understand the meaning of the "-r" (runpages) option. What is this for?

And another question. Why is the default erode option 3? Is there any reason not to use 0 (at least to me the text seems easier to read with this parameter)?

Alan_S
08-30-2007, 10:35 AM
Well, just to say, I just downloaded version 0.4 and it converted one of my PDFs.

First, it took a long, very long time to do the job (the original file is about 6MB and the resulting file is about 20MB).

But the most important thing is that the job was done with flying colors. Pictures are there, everything looks perfect. The only thing is my error: I didn't correct the overlap, so some info overlaps. Next time I'll know what to do.

Excellent program. Really, I can say that with this program there's no problem with PDFs anymore. At least, I hope it does every job like this one. Excellent job.

Thank you very much.

cszhy
08-30-2007, 08:45 PM
Buddy, it's a great tool, it actually saves the Sony Reader!!!
Before it was born, the Reader could do nothing with pdf!!!!
The only problem I met is that the book title and author fields cannot be typed in Chinese.
It would be perfect if this function were added.
Thanks a lot

JSWolf
08-30-2007, 08:54 PM
Can you detail out the TOC editing feature?
What I was thinking however is if the original PDF has a ToC, just convert that to a Sony LRF native ToC. That would be fine for me.

Fallen angel
08-31-2007, 06:41 AM
The only problem I met is that the book title and author fields cannot be typed in Chinese.
It would be perfect if this function were added.
Thanks a lot

Neither in Greek, but I'm using what we call "greeklish". :shy:

cacapee
08-31-2007, 12:30 PM
Version 0.5 has support for pre-trimming the input pages. The trims are specified as a % of the page to cut away from the left/right/top/bottom. They can also be specified independently for even/odd pages. Also, unicode metadata is now supported.
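
A rough sketch of how percentage trims with separate even/odd settings could be turned into a pixel crop box. The `trim_box` function and the dictionary layout are assumptions for illustration, not pdflrf's actual code or option names.

```python
def trim_box(width, height, page_number, odd_trim, even_trim):
    """Return (left, top, right, bottom) in pixels after pre-trimming.

    Each trim is a dict of percentages of the page to cut away,
    e.g. {"left": 10, "right": 5, "top": 5, "bottom": 5}.
    """
    trim = odd_trim if page_number % 2 == 1 else even_trim
    left = round(width * trim["left"] / 100)
    right = width - round(width * trim["right"] / 100)
    top = round(height * trim["top"] / 100)
    bottom = height - round(height * trim["bottom"] / 100)
    if left >= right or top >= bottom:
        raise ValueError("trim removes the whole page")
    return left, top, right, bottom

# Hypothetical book with a wider inner margin that flips between odd and even pages.
odd = {"left": 10, "right": 5, "top": 5, "bottom": 5}
even = {"left": 5, "right": 10, "top": 5, "bottom": 5}
box_odd = trim_box(1000, 1400, 1, odd, even)
box_even = trim_box(1000, 1400, 2, odd, even)
```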

thinredline
08-31-2007, 01:46 PM
Thanks cacapee. The unicode support for metadata is perfect. As far as the pre-trimming of input pages is concerned, I don't know how difficult it would be to put a preview of the pre-trimming into the tool. What I like about Acrobat and Nitro PDF is that they support a preview of cropping, so that I know instantly how much I need to trim. In many cases, the estimate of how much needs to be trimmed may not be accurate. Just a thought. It's not a big deal.

MatthewTheRaven
08-31-2007, 01:52 PM
Version 0.5 has support for pre-trimming the input pages. The trims are specified as a % of the page to cut away from the left/right/top/bottom. They can also be specified independently for even/odd pages. Also, unicode metadata is now supported.

This is a tremendous tool! This is the first time I've been able to make PDFs easily readable on the Reader. This really opens it up for me.

I think the pre-trimming will help a lot also, but as a suggestion, having a preview window that can show an even and an odd page will make this 100% more useful. I've been using pdcrop from pdf-tools to trim down the pages before using your tool to convert, but pdcrop is nearly instantaneous, so I can tweak and play with the sizes before worrying about converting to lrf.

Unfortunately since a full conversion takes a lot more time, setting the wrong cropping in pdflrf is a far more time consuming issue. I realize that you could just convert a single page or two and it will be faster, but a preview would make it just that much more useful.

Also, I haven't checked 0.5, but I know with 0.4 that it doesn't seem to be multithreaded. Doing that will definitely help the encoding time for a lot of users. Having two threads, perhaps one for odd pages and one for even pages, would really speed up the process. A third thread could assemble the processed pages into the final file in the correct order.
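
A minimal sketch of the worker idea, assuming a placeholder `render_page` function in place of the real rasterize/crop step. Instead of a dedicated assembler thread, it relies on `ThreadPoolExecutor.map`, which returns results in input order even when pages finish out of order.

```python
from concurrent.futures import ThreadPoolExecutor

def render_page(page_number):
    # Placeholder for the expensive per-page work (rasterize, crop, dither).
    return f"image-for-page-{page_number}"

def convert(pages, workers=2):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands pages to idle workers but yields results in input order,
        # so the output file can be assembled without any re-sorting.
        return list(pool.map(render_page, pages))

images = convert(range(1, 7))
print(images)
```

In CPython, pure-Python CPU-bound work does not run in parallel across threads, so a real speedup would need native code or processes; the sketch only shows the ordering pattern.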

Anyway, even in its early development, this is probably the best software out there for making the Reader what it should be. Great work!

Matt

cacapee
08-31-2007, 01:55 PM
Preview is in the plans. One suggestion is to limit the number of pages to two, so you can see the effect of pretrimming. Use the View button. That makes the turnaround much faster. Also you only need to get results that trim off non-whitespace borders, lines etc. pdflrf's whitespace cropping will take care of the rest.

Multithreading is also in the plans.

starship
08-31-2007, 02:38 PM
Buddy, it's a great job! I've been following your progress since 0.1. You've changed the history of Sony Reader!

niuniuniu
08-31-2007, 09:35 PM
I have some books that have two columns and read from RIGHT to LEFT. If I use the two-column option, the page numbers will be 2, 1, 4, 3, 6, 5, .... Could you help correct this?
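
A small sketch of the ordering problem, assuming each page has already been split into a left and a right column. The `order_columns` helper and its `right_to_left` switch are hypothetical, not an existing pdflrf option.

```python
def order_columns(pages, right_to_left=False):
    """pages: list of (left_column, right_column) pairs, one pair per page."""
    ordered = []
    for left, right in pages:
        # For right-to-left books the right-hand column must be emitted first.
        ordered.extend([right, left] if right_to_left else [left, right])
    return ordered

# Columns labelled by their intended reading order in a right-to-left book.
pages = [(2, 1), (4, 3), (6, 5)]
default_order = order_columns(pages)
fixed_order = order_columns(pages, right_to_left=True)
```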

JSWolf
08-31-2007, 09:57 PM
Very nice improvement. I'm trying portrait mode now with this Wowio book. Seems to look good enough in Connect's preview to be possible to read comfortably that way.

mudkiller
09-01-2007, 12:53 AM
I registered just to say thanks! Great Job cacapee!

Fallen angel
09-01-2007, 06:38 AM
The latest version doesn't work for me. It says "Failed to create LRF file" ... :tired:

thinredline
09-01-2007, 04:00 PM
Yeah, the same error message for me.

cacapee
09-01-2007, 04:33 PM
Does it happen with the commandline too? Can you post the settings and/or the file?

thinredline
09-01-2007, 08:47 PM
Does it happen with the commandline too? Can you post the settings and/or the file?

It looks like it is related to the file name in version 0.5. I am trying to convert a PDF file with a non-English file name and the error happens. If I change the file name to English, there is no error. The strange thing is that in 0.4 there is no such error.

cacapee
09-01-2007, 09:03 PM
What I need is a list of steps to reproduce this bug here. That will make it much easier.

Thanks

Edit: Are you sure you could open non-ASCII filenames in 0.4? As far as I know, Unicode support is only for metadata. Filenames are tricky because the libraries only take ASCII filenames.

coredog64
09-02-2007, 12:22 AM
I'm seeing a bug with very large PDF files (1000+ pages): I'll get a run of "blank" pages.

I tried the "first/last" page arguments and it still happens, although in a different location.


C:\Downloads\stage>pdflrf -t "MOSS 2007 Part 1" -a "Bill English" -c 4 -f 42 -l 185 -rs -i 622821eBook.pdf -o moss2007_1.lrf
Processing page: 42, image: 1, Progress: 0
Processing page: 42, image: 1, Progress: 0
Processing page: 43, image: 2, Progress: 1
Processing page: 43, image: 3, Progress: 1
Processing page: 43, image: 4, Progress: 1
Processing page: 43, image: 5, Progress: 1
...
Processing page: 43, image: 34, Progress: 1
Processing page: 43, image: 35, Progress: 1
Processing page: 43, image: 36, Progress: 1
Processing page: 43, image: 36, Progress: 1
Processing page: 44, image: 37, Progress: 2


Still, this is an outstanding tool. I was using Acrobat Pro to crop the books but this does a much better job.

cacapee
09-02-2007, 04:17 AM
Heh, there's a tiny vertical gray rectangle on one of the pages. This rectangle is being drawn across multiple images. I'll fix this by adding a minimum crop width, which I've been meaning to do anyway. In the meantime, get rid of pages like page 43.
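
A rough sketch of what a minimum crop width check might look like, assuming the cropper produces a list of bounding boxes. The `drop_slivers` name, the box format, and the 8-pixel threshold are illustrative guesses, not the actual fix.

```python
def drop_slivers(boxes, min_width=8):
    """Discard crop boxes narrower than min_width pixels.

    boxes: list of (left, top, right, bottom) tuples.
    """
    return [box for box in boxes if box[2] - box[0] >= min_width]

# A normal text block plus a 2-pixel-wide, very tall stray rectangle.
boxes = [(0, 0, 600, 800), (10, 0, 12, 5000)]
kept = drop_slivers(boxes)
```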

Fallen angel
09-02-2007, 04:57 AM
Can you post the settings and/or the file?

I'm using the comic-landscape mode, without changing anything in the settings, not even the crop option (maybe that's the problem?). Also I would like to ask if .cbr support is going to be added.

As for the file, it happens with every file I've tried.

cacapee
09-02-2007, 12:01 PM
Weird, It works here. Can you post/pm me a link to the relevant file.

balok
09-02-2007, 05:39 PM
Wow, just discovered this and it's amazing. It actually allows reading scientific two-column papers in the e-reader! And, incidentally, it works in linux using wine.

I'm looking forward to a native linux version. A tool this good should be multiplatform. Maybe the author would consider releasing the source for porting?

Fallen angel
09-03-2007, 05:11 AM
Weird, It works here. Can you post/pm me a link to the relevant file.

Sure ...

slav
09-03-2007, 07:17 AM
Hi,

The tool is great!!! I love the output :)

But it seems to crash from time to time, especially when my files have long names with dots and other things like that.

Oh, and one more thing (just a UI issue): if your filename is long, you can't scroll to the end of the text in the output filename textbox.

Thanx again for a great tool :2thumbsup

Alexander Turcic
09-03-2007, 07:56 AM
One thing I noticed: If you close the parent GUI task (pdflrfwin.exe) while converting, the child task (pdflrf.exe) remains active.

cacapee
09-04-2007, 03:46 AM
version 0.6 adds support for unicode filenames (previously only unicode metadata was supported), cbr/rar files, better error reporting and assorted bug fixes.

Fallen angel
09-04-2007, 08:04 AM
Thank you cacapee! It works great now ...!! :2thumbsup

Alan_S
09-04-2007, 08:32 AM
Preview is in the plans. One suggestion is to limit the number of pages to two, so you can see the effect of pretrimming. Use the View button. That makes the turnaround much faster. Also you only need to get results that trim off non-whitespace borders, lines etc. pdflrf's whitespace cropping will take care of the rest.

Multithreading is also in the plans.

First of all, you're one of the Great Ones.

Just to suggest, if you add preview of two pages, don't forget to include option to choose which two pages (they can be side by side, but it's always good to choose some in the middle, or specially problematic ones, rather than first two).

It's just the reminder, probably you'd do that anyway, but I felt obliged to post it.

Thank you man for your great SW. It's not small (except in size on HDD) one. Excellent.

slav
09-04-2007, 02:10 PM
First of all, you're one of the Great Ones.

Just to suggest, if you add preview of two pages, don't forget to include option to choose which two pages (they can be side by side, but it's always good to choose some in the middle, or specially problematic ones, rather than first two).

It's just the reminder, probably you'd do that anyway, but I felt obliged to post it.

Thank you man for your great SW. It's not small (except in size on HDD) one. Excellent.

Isn't it possible to choose the pages you want to convert by entering the from and to page numbers??

version 0.6 adds support for unicode filenames (previously only unicode metadata was supported), cbr/rar files, better error reporting and assorted bug fixes.

It works cool now! Even with long filenames containing non letter characters.

Thanx dude :)

JSWolf
09-04-2007, 04:53 PM
Isn't it possible to choose the pages you want to convert by entering the from and to page numbers??
Yes it is.

jaffer1979
09-05-2007, 01:55 PM
Hey cacapee, I posted those sample pages that you asked for here: http://www.mobileread.com/forums/showthread.php?t=10402&page=11 I hope you can figure out what's going wonky on my end; I surely would appreciate it.

kstoertz
09-05-2007, 01:56 PM
The converter is great - even better than RasterFarian.

cacapee,

if you are able to fix the "cropping" problem that jaffer1979 described in the RasterFarian thread then you are a real hero!

jaffer1979
09-05-2007, 02:18 PM
I agree, this would be an awesome utility. The lack of a UI drove me nuts about RasterFarian; this program is simple to use! By the way, I posted the sample pages in pdf format in the other thread. I had to zip them, though, to beat the 5MB limit.

nesagwa
09-06-2007, 02:02 PM
I have a suggestion. I converted a few magazines and comics over to try this out and they looked pretty good (even in fit to screen mode). But when I tried the landscape mode there is always a little bit of the page left so it goes to 3 landscape pages for 1 page. I think the smart cut option is supposed to help, but is there a way you can give an option to force squish a single page onto just 2 landscape pages on the reader?
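
A back-of-the-envelope sketch of the "squish" request. It assumes a landscape screen of 800x600 pixels and a page first scaled to the screen width; the `fit_to_screens` function is hypothetical and only computes how much vertical squeeze would be needed.

```python
import math

def fit_to_screens(page_w, page_h, screen_w=800, screen_h=600, max_screens=2):
    """Return (screens_used, vertical_squeeze_factor) for one page."""
    # Height of the page once it is scaled to fill the screen width.
    scaled_h = page_h * screen_w / page_w
    screens = math.ceil(scaled_h / screen_h)
    if screens <= max_screens:
        return screens, 1.0
    # Compress the page vertically so it fits exactly in max_screens screens.
    return max_screens, max_screens * screen_h / scaled_h

screens, squeeze = fit_to_screens(1000, 1600)
```

For a 1000x1600 page the scaled height is 1280 pixels, which would normally spill onto a third screen; squeezing the height to about 94% fits it on two.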

BettyE
09-06-2007, 02:35 PM
This is fantastic! Can't thank you enough, cacapee!

Betty

slav
09-06-2007, 03:30 PM
Hi,

I've noticed that some of my PDFs, when converted to LRF, have the last line on a page cut between two pages, and some letters slightly overlap.

If you have any idea I would really appreciate it (maybe one of the settings needs to be adjusted; I'm using the default ones and version 0.6).

I'm attaching pdf and lrf files.

Thanx!

Kilarney
09-06-2007, 03:32 PM
Can somebody tell a dummy like me what the ideal settings are to convert your average PDF book to LRF?

Are you converting books into landscape format? What other settings should be tweaked from the default settings?

cacapee
09-06-2007, 04:39 PM
is there a way you can give an option to force squish a single page onto just 2 landscape pages on the reader?

I'll look into adding that
I've noticed that some of my PDFs, when converted to LRF, have the last line on a page cut between two pages, and some letters slightly overlap.
Ah, the gray background is causing smartcut not to work correctly (it currently assumes a white background).
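
A simplified sketch of why a background assumption matters, not the actual smartcut code. It assumes a grayscale image stored as rows of 0-255 values and looks upward from the intended split for a row that is entirely background; `find_cut_row` and its parameters are made up for illustration.

```python
def find_cut_row(rows, target, background=255, tolerance=0, search=40):
    """Return the nearest row at or above target that is pure background."""
    for y in range(target, max(target - search, -1), -1):
        if all(abs(pixel - background) <= tolerance for pixel in rows[y]):
            return y
    return target  # no clean row found: fall back to a hard cut

# Rows 0-2 blank, 3-5 text, 6-7 blank, 8-9 text (0 = ink, 255 = white paper).
white_page = [[255] * 4] * 3 + [[0, 255, 0, 255]] * 3 + [[255] * 4] * 2 + [[0] * 4] * 2
gray_page = [[200 if p == 255 else p for p in row] for row in white_page]

cut_white = find_cut_row(white_page, 9)                    # finds the blank row
cut_gray = find_cut_row(gray_page, 9)                      # white assumed: hard cut
cut_gray_fixed = find_cut_row(gray_page, 9, background=200)
```

With a gray page and a white-background assumption no row ever qualifies, so the cut lands in the middle of the text.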

Kilarney, try the different built-in profiles. You can tweak them later.

altor
09-06-2007, 09:15 PM
thanks, cool.

Now how would one go to a desired page? Is there a way to do it on the reader itself?

slav
09-07-2007, 09:11 AM
Ah, the gray background is causing smartcut not to work correctly (it currently assumes a white background).

Thanx for reply - It's good to know what's causing this, so I can try to avoid this when possible :grin2:

jaffer1979
09-08-2007, 05:11 AM
Any idea when your next version is coming out?

cacapee
09-09-2007, 04:26 AM
Version 0.7 adds a preview of pre-trimming and a catrom filter for resizing images; png files are now generated for embedding in lrf files. Converted over to using threads, so batching is much improved.

slav
09-09-2007, 04:53 AM
Hi cacapee,

Just few small things:
- program still displays version 0.6 instead of 0.7
- clicking OK in preview window doesn't seem to do anything
- I don't quite understand the preview window since it seems to display my original pdf

And one new thing: I was wondering whether it would be useful to add a label/textbox showing the command line generated by the GUI options, since sometimes you might want to do some tests in the UI and then do a batch conversion...

Thanx :)

jaffer1979
09-09-2007, 09:20 AM
Wow! Thanks a million, cacapee! This new version works like a charm for me now. My pages split exactly where they are supposed to! I only wish the file sizes could be a little smaller, but even that isn't too bad. I just switched to a 2GB card to compensate. Great job, awesome utility! YOU ARE THE MAN!

By the way, adding the ability to use 32 and 64 colors is NICE: pictures no longer look patchy!

jaffer1979
09-09-2007, 09:35 AM
Could you tell me what the erode feature does, and how the number setting affects the final output?

cacapee
09-09-2007, 12:08 PM
Check out http://docs.gimp.org/2.2/en/plug-in-dilate.html
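
As a rough illustration only: the sketch below applies a 3x3 minimum filter, which is what image-processing texts usually call grayscale erosion. Whether pdflrf's erode setting does exactly this, and whether its number corresponds to `passes`, is an assumption.

```python
def erode(gray, passes=1):
    """Apply a 3x3 minimum filter to a list-of-lists grayscale image (0 = black)."""
    height, width = len(gray), len(gray[0])
    for _ in range(passes):
        out = [row[:] for row in gray]
        for y in range(height):
            for x in range(width):
                # Each pixel takes the darkest value in its 3x3 neighbourhood,
                # so dark strokes grow by one pixel in every direction.
                out[y][x] = min(
                    gray[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                )
        gray = out
    return gray

# A single black pixel in the middle of a white 5x5 image.
dot = [[255] * 5 for _ in range(5)]
dot[2][2] = 0
once = erode(dot, passes=1)
twice = erode(dot, passes=2)
```

On dark text over a white page, more passes would mean bolder strokes, which helps thin fonts but can fill in small letters.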

Currently preview mode displays the Trim rectangle.

File sizes are directly correlated with the number of colors (this is true for any tool that generates lossless images)
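
To put rough numbers on that, here is a sketch of the uncompressed size of one screen image at different color counts. The 600x800 screen size is an assumption, and real PNG output is compressed, so actual files will be smaller; only the trend is the point.

```python
def raw_page_bytes(width, height, colors):
    """Uncompressed size in bytes of one image with the given palette size."""
    # Bits needed to index the palette: 2 colors -> 1 bit, 64 colors -> 6 bits.
    bits_per_pixel = max(1, (colors - 1).bit_length())
    return width * height * bits_per_pixel // 8

sizes = {colors: raw_page_bytes(600, 800, colors) for colors in (2, 4, 16, 64)}
for colors, size in sizes.items():
    print(f"{colors:2d} colors: {size} bytes per screen")
```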

jaffer1979
09-09-2007, 12:49 PM
Cool thanks!

jaffer1979
09-09-2007, 01:36 PM
Well I have been whining about rasterfarian not being available for a while now and probably got on more than one nerve doing it but now that it has been re-released (http://hightech.afmag.net/rasterfarian-for-sony-reader-copyright-problems-download-the-patched-rasterfarian-version-here.html#comment-185) and I get to use it side by side with this utility I realize just how nice pdflrf is! I will still be keeping rasterfarian in my toolbox but I think pdflrf is staying as my main pdf conversion tool!

athlonkmf
09-09-2007, 02:09 PM
version 0.7 adds preview of pre trimming, catrom filter for resizing images, png files are now generated for embedding in lrf files. Converted over to using threads - so batching is a lot more improved.

Happy that you took my suggestions to heart. Are you using some PNG optimizer to squeeze that last bit of size out of the PNGs?

I've noticed that pdflrf doesn't use bookmarks yet, so...
If you use http://www.pdfhacks.com/pdftk/ you can read out the bookmarks in an original PDF and maybe integrate it with the lrf.
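
A minimal sketch of reading bookmarks from the text that `pdftk file.pdf dump_data` prints. It assumes each bookmark appears as `BookmarkTitle`, `BookmarkLevel`, and `BookmarkPageNumber` lines in that order; the sample text below is made up, and the parser is illustrative rather than anything pdflrf does.

```python
def parse_bookmarks(dump_text):
    """Extract bookmarks from pdftk dump_data output as a list of dicts."""
    bookmarks, current = [], {}
    for line in dump_text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "Key: value" line
        if key == "BookmarkTitle":
            current = {"title": value}
        elif key == "BookmarkLevel" and current:
            current["level"] = int(value)
        elif key == "BookmarkPageNumber" and current:
            current["page"] = int(value)
            bookmarks.append(current)
            current = {}
    return bookmarks

# Made-up sample in the assumed dump_data layout.
sample = """InfoKey: Title
InfoValue: Example Book
NumberOfPages: 200
BookmarkTitle: Chapter 1
BookmarkLevel: 1
BookmarkPageNumber: 5
BookmarkTitle: Section 1.1
BookmarkLevel: 2
BookmarkPageNumber: 7
"""
bookmarks = parse_bookmarks(sample)
```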

cacapee
09-09-2007, 02:17 PM
reading toc/bookmarks is easy. I have to add that capability to the makelrf source code, which doesn't handle toc.

kovidgoyal
09-09-2007, 03:20 PM
If you're messing with the makelrf source code fix its creation of the metadata block. It doesn't encode it correctly in UTF16

cacapee
09-09-2007, 04:07 PM
It needs a complete rewrite - I'm thinking of translating libprs to c++ (probably not the most productive thing I could be doing). BTW, I already fixed that: pdflrf supports unicode metadata.

kovidgoyal
09-09-2007, 06:05 PM
why not just embed a python interpreter into your c++ app and use the python directly? Or better yet rewrite pdflrf in python, that's going to be a *lot* easier than translating python into c++

KEM
09-09-2007, 11:34 PM
I just tried your tool on a 2 column pdf. The file is very readable on the PRS500 and the tool was easy to use.

Over all it worked great but the picture on the front page was converted to 4 pages and the last line of each page repeats on the following page. Is there something that I can change in my set up to fix that? :blink:

cacapee
09-10-2007, 12:03 AM
Set overlap to 0

evgen
09-12-2007, 03:25 PM
One additional advantage of linking in the python interpreter is that you will get access to some nice tools like pyPdf, something I was able to swap in to my hacked version of pdfread to eliminate the obnoxious external dependency on pdftk, which can do nice things regarding pdf manipulation (e.g. write the TOC catalog with a small bit of hacking, etc)

kovidgoyal
09-13-2007, 02:38 PM
One additional advantage of linking in the python interpreter is that you will get access to some nice tools like pyPdf, something I was able to swap in to my hacked version of pdfread to eliminate the obnoxious external dependency on pdftk, which can do nice things regarding pdf manipulation (e.g. write the TOC catalog with a small bit of hacking, etc)

You have a hacked version of pypdf that can read the toc catalog? If so can you send it to me: kovid _the usual email address separator_ kovidgoyal.net

Vienna01
09-13-2007, 06:25 PM
Confused pdflrfwin.exe vs pdflrf_gui?

ereszet
09-15-2007, 06:49 AM
I have followed all your pdflrf releases with growing amazement at what you have achieved and how quickly you have responded to new demands and challenges. Apart from all the options that pdflrf offers, it is extremely fast. Believe me, I have tried scores of different programs/utilities (DOS/Windows/Ubuntu) to process pdf/djvu photos of old books (like Google books) before OCR-ing them with Finereader, and none comes even close to your program. Thank you.
And now my humble suggestion. Can you include pdf as an output? The Sony Reader is only one of many toys for reading books, while the pdf format is universal. It would be so useful to have the readability of pdf files improved before OCR-ing them and storing them in my laptop library or reading them with an Archos 704 (I just ordered it and hope that its 7" screen will make a difference compared to Sony's 6").
For your info, my workflow before discovering pdflrf was: 1. reading page images or pdfs into Finereader, 2. recognizing blocks of text/images, 3. saving images with blocks only (no white space surrounding them), 4. reading the images back into Finereader, 5. OCR-ing, 6. saving to pdfs (text under image). Of course the original page images require a lot of cleaning before going to Finereader, because otherwise all black margins or blobs would be recognized as blocks and prevent removal of the white space surrounding the text.

ereszet
09-15-2007, 06:54 AM
BTW, as this is my first post to the forum and I am having problems sending it (I had to log in a number of times), I am sorry if this post appears more than once.

DrMoze
09-15-2007, 09:19 AM
I just tried (after finding in this thread) the pdflrfwin.exe utility. It's the first time I was able to successfully translate *anything* into lrf format! :D And the results make some pdfs (which I had given up on) quite readable. Yay!

Question: Are there any settings that can reduce the output file size, perhaps at the expense of a slight impairment in print darkness? For example, a 1.5MB pdf file was converted to 18.5MB! Another pdf of 64kB became 1.1MB. At this rate, I can only fit a few medium-size pdfs on the entire reader! (Yes, I have an SD card, but I like using groups on the internal memory...)

Just wondering. But thanks for a great and easy-to-use utility!

cacapee
09-15-2007, 06:17 PM
version 0.8 adds support for Table of Contents in pdf files. It is possible to preview output images to test out various settings. An experimental linux build (built on Ubuntu) has been added. Improved threading support so processing should be faster. Changed default colors to 4 to reduce frequency of file size questions. Added more filtering options and better dithering so images should look a lot better

DrMoze
09-15-2007, 11:19 PM
Ah, # of colors is (of course) the main reason for larger files.

BTW, v0.8 (pdflrfwin.exe) cuts out every time I try to convert a second file. After selecting the second file, the program quits. (v0.7 did not have this problem.) I'm running WinXP Pro SP2.

cacapee
09-16-2007, 12:40 AM
I cannot reproduce it here. Does it happen with any kind of file?

timestory
09-16-2007, 01:52 AM
version 0.8 adds support for Table of Contents in pdf files. It is possible to preview output images to test out various settings. An experimental linux build (built on Ubuntu) has been added. Improved threading support so processing should be faster. Changed default colors to 4 to reduce frequency of file size questions. Added more filtering options and better dithering so images should look a lot better


It is great to have the ToC feature. I just converted one PDF book with one-level bookmarks; when I imported the lrf file to the reader, I found that all those links are stored in the Table of Contents under the menu for this book.

But when I converted a PDF book with multiple-level bookmarks (one root) and imported it, I found that only the root item was stored in the Table of Contents; all the others were skipped.
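
A small sketch of how nested bookmarks could be walked so that children are not dropped. The tree layout (dicts with an optional `children` list) and the `flatten` helper are assumptions for illustration; the real cause of the bug is not known from this thread.

```python
def flatten(nodes, level=1):
    """Turn a nested bookmark tree into a flat (level, title, page) list."""
    flat = []
    for node in nodes:
        flat.append((level, node["title"], node["page"]))
        # Recurse into children; stopping at the top level would keep only the root.
        flat.extend(flatten(node.get("children", []), level + 1))
    return flat

# Hypothetical book with a single root bookmark and nested chapters.
tree = [{
    "title": "Book", "page": 1,
    "children": [
        {"title": "Chapter 1", "page": 3,
         "children": [{"title": "Section 1.1", "page": 4}]},
        {"title": "Chapter 2", "page": 20},
    ],
}]
entries = flatten(tree)
```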

cacapee
09-16-2007, 02:12 AM
Ah, this will be fixed in the next release

leha
09-16-2007, 01:24 PM
Hi, thanks for the nice converter. Being a linux user, I have a remark. I am not sure how portable your code is, or whether you are interested in writing a linux gui for your utility, but it might be easier to keep your windows version wine compatible and skip the gui for linux. Of course I would like to see a nice kde frontend, but for now it is easier to run the windows version in wine than the command line linux utility.

cacapee
09-16-2007, 03:31 PM
Thanks for the feedback. Anyone got the linux commandline working and prefer it to using wine?

kovidgoyal
09-16-2007, 03:38 PM
I certainly prefer using the commandline to wine.

ereszet
09-16-2007, 04:47 PM
Thanks for the feedback. Anyone got the linux commandline working and prefer it to using wine?

./pdflrf -h works ok (BTW the help should say pdflrf rather than pdflrf.exe)
./pdflrf -V works ok
./pdflrf -ibuba.pdf -obuba.lrf does not work (first the cursor blinks with no info on progress, then it stops blinking and the terminal does not respond anymore; it just lets me type with no response)

DrMoze
09-16-2007, 10:05 PM
I cannot reproduce it here. Does it happen with any kind of file?

Happens every time I try to convert a second pdf file in the same session (using 'browse', it quits after I select the new file to convert).

cacapee
09-16-2007, 10:17 PM
New build uploaded. Fixed threading bug that causes it to hang occasionally under linux. Fixed TOC. Erosion now works sensibly with small images (like those of comic strips). Can also go down to 2 colors.

DrMoze - try this version - I still cannot reproduce your bug. Has anyone else had a similar problem?

ereszet
09-17-2007, 05:34 AM
New build uploaded. Fixed threading bug that causes it to hang occasionally under linux. Fixed TOC. Erosion now works sensibly with small images (like those of comic strips). Can also go down to 2 colors.

DrMoze - try this version - I still cannot reproduce your bug. Has anyone else had a similar problem?

I tested pdf and djvu documents. No problems, just some warning messages when processing pdf (may be a pdf problem - Google pdf books have various problems in other programs as well).

The first warning message is about fonts (Cannot find etc. - I don't care about that as long as the end result is perfect)
The second warning is:
Processing page: 65, Progress: 31
21654 extraneous bytes after segment

But again the end result is perfect. As for speed, pdf was slow, djvu was fast (different documents though).
I will do some speed comparisons Ubuntu/DOS in the next few days.

For your consideration:
I assume that you convert pdf/djvu to images, then you process the images and convert them to lrf. Therefore, I assume again that it is not a huge problem for you to include an option to output processed images. That would be a nice thing to have since images in Sony can be scaled up (e.g. a picture of a road map improved through your algorithm). And of course, once I have the beefed up images, I can always convert them to pdfs and use them with all kind of readers, not only Sony.

adinb
09-17-2007, 10:07 PM
Since you seem to be getting linux under control, any possibility of an OS X build?

If you don't have an OS X machine handy, I'll volunteer my machine (VNC) or my time in getting it to build.

cszhy
09-17-2007, 10:56 PM
Hi buddy,
Thanks for improving so many functions in such a short time.
The problem I have now is that I can't find the ghostscript option in the new versions. Is it still supported? Thanks

ashkulz
09-18-2007, 02:01 AM
You have a hacked version of pypdf that can read the toc catalog? If so can you send it to me: kovid _the usual email address separator_ kovidgoyal.net

I've already sent a patch to the pyPDF maintainer, which was accepted with some minor changes in the official repository (http://hg.pybrary.net/pyPdf).

kovidgoyal
09-18-2007, 10:37 AM
Thanks.

ereszet
09-18-2007, 05:42 PM
Mobile Pentium 4, 2.8 GHz, 512 MB RAM

input files:
buba.djvu 1536 KB
buba.pdf 13394 KB (same book as buba.djvu - converted to pdf)

output DOS/Windows XP:
bubadjvu.lrf 8965 KB - 285 seconds
bubapdf.lrf 9141 KB - 590 seconds

output Ubuntu:
bubadjvu.lrf 8824 KB - 208 seconds
bubapdf.lrf 16029 KB - 950 seconds

Conclusions:
djvu to lrf - Ubuntu is faster
pdf to lrf - in both operating systems conversion is slower than djvu to lrf but DOS/Windows wins
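
The ratios behind these conclusions can be worked out directly from the figures above; the snippet below is just that arithmetic, with variable names chosen for illustration.

```python
# Input sizes in KB and conversion times in seconds, copied from the results above.
sizes_kb = {"djvu": 1536, "pdf": 13394}
seconds = {
    ("windows", "djvu"): 285, ("windows", "pdf"): 590,
    ("ubuntu", "djvu"): 208, ("ubuntu", "pdf"): 950,
}

# How much larger the pdf input is than the djvu input.
size_ratio = sizes_kb["pdf"] / sizes_kb["djvu"]

# Ubuntu time divided by Windows time: below 1 means Ubuntu is faster.
ubuntu_vs_windows = {
    fmt: seconds[("ubuntu", fmt)] / seconds[("windows", fmt)] for fmt in sizes_kb
}

print(f"pdf/djvu input size: {size_ratio:.1f}x")
for fmt, ratio in ubuntu_vs_windows.items():
    print(f"{fmt}: Ubuntu takes {ratio:.2f}x the Windows time")
```

So the pdf input is about 8.7 times the size of the djvu, Ubuntu needs about 0.73 times the Windows time for djvu, and about 1.61 times for pdf.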

My wild guess: the time critical step may be pdf to raster conversion; something is wrong with the pdflrf implementation in Ubuntu (warning messages - see my previous post).

The quality of all end results is perfect.

cacapee
09-18-2007, 06:13 PM
Interesting experiment. However, all this seems to show is that pdf is not as good as djvu for image-only documents. As you can see, the pdf is almost 9 times the size of the djvu document. The pdf rasterizing code should be the same on linux and windows.

I can turn ghostscript support back on. Why do you need it though?

Can one cross compile for OSX?

cszhy
09-19-2007, 12:29 AM
Actually, I need the ghostscript support for converting some kinds of Chinese text-format PDF files; I could do that very well in the older versions.
Thanks for developing the greatest tool for Sony Reader users all over the world!!!

adinb
09-19-2007, 03:56 AM
Can one cross compile for OSX?

I don't think that's an option for any linux implementations of gcc, but stick to posix level compat and give me a working project/make directory & I'll muck with it until I get the project to build.

godel10
09-19-2007, 04:52 AM
I am the owner of a Sony Librie (not a Reader), and pdflrf had been working wonderfully with the Librie until version 0.7.

The latest version, 0.8, does not work (neither the DOS one nor the Linux one), and I am wondering if anyone knows why this compatibility has been lost. If possible, I would like to ask the creator of pdflrf to keep this compatibility.

JSWolf
09-19-2007, 08:40 AM
I am the owner of a Sony Librie (not a Reader), and pdflrf had been working wonderfully with the Librie until version 0.7.

The latest version, 0.8, does not work (neither the DOS one nor the Linux one), and I am wondering if anyone knows why this compatibility has been lost. If possible, I would like to ask the creator of pdflrf to keep this compatibility.
0.8 adds ToC support and, as we know, the BBeB (LRF) version in the Librie is older than the one in the Reader. So it looks like it's a Librie problem due to the older BBeB viewer used in the Librie.

JSWolf
09-19-2007, 08:43 AM
Would it be possible to add an option to pdflrf to allow us to specify how much space to leave between the end of one page and the beginning of the next? What happens is that when you cut out headers/footers and have the pages running without breaks, you get the two pages right on top of each other with no spacing. It's literally the last line of the previous page and the first line of the next page almost joined together.
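
A minimal sketch of the requested spacing, assuming each cropped page is a list of equal-width pixel rows that get stitched into one long strip. The `stitch` helper and its `gap_rows` parameter are hypothetical, not an existing pdflrf option.

```python
def stitch(pages, gap_rows=10, background=255):
    """Concatenate page images vertically with blank rows between pages.

    pages: list of images, each a list of equal-width rows of gray values.
    """
    if not pages:
        return []
    width = len(pages[0][0])
    strip = []
    for index, page in enumerate(pages):
        if index:
            # Insert the spacer before every page except the first.
            strip.extend([[background] * width for _ in range(gap_rows)])
        strip.extend(page)
    return strip

# Two tiny all-black "pages", 3 and 2 rows tall, 4 pixels wide.
first = [[0] * 4 for _ in range(3)]
second = [[0] * 4 for _ in range(2)]
strip = stitch([first, second], gap_rows=2)
```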

godel10
09-19-2007, 10:27 AM
0.8 adds ToC support and, as we know, the BBeB (LRF) version in the Librie is older than the one in the Reader. So it looks like it's a Librie problem due to the older BBeB viewer used in the Librie.

Would it be possible to get a linux version of 0.8 (command line is fine) without the TOC feature? This would allow linux users who own a Sony Librie to use pdflrf.

Cacapee, if you do not want to "waste" your time on this issue, I could help by doing most of the work.

Tuello
09-19-2007, 07:13 PM
Thanks, cacapee, for a great tool - the TOC makes it great for reference books, and the fact that it handles djvu files too is fantastic. I can hardly wait for the 2-level TOC.

I'm running pdflrfwin version 0.8 (the latest 0.8) and with one pdf file I get an "Error" message box with the text "Last position was: 33283942". When I hit OK it crashes. I made the pdf myself by merging a bunch of chapters together - am I doing something wrong?

cacapee
09-19-2007, 07:22 PM
multi-level toc is already supported - unless there is a bug... Can you send me a test case that displays the "Last position was: 33283942" error?

blacksharpie
09-19-2007, 08:31 PM
I second this. That's exactly what I would like to have as well.

Would it be possible to add an option to pdflrf to allow us to specify how much space to leave between the end of one page and the beginning of the next? When you cut out headers/footers and have the pages running without breaks, you get the two pages right on top of each other with no spacing. The last line of the previous page and the first line of the next page are almost joined together.

Tuello
09-19-2007, 09:25 PM
Can you send me a test case that displays the "Last position was: 33283942" error?
Pages 1-125 of the full text (589 pages) convert OK, and so do pages 125-300, but doing 1-300 in one go crashes. I extracted pages 1-300 to a separate pdf so the resulting file is less than 5 MB, and verified that it crashes too. I also checked the conversion on a laptop and I get the same behavior (it crashes but it doesn't always give the error message first).

NatCh
09-20-2007, 12:39 AM
Hmmm. That PDF looks like it's copyrighted -- unless the author's given permission to re-post it, it might be better to go with a link to the originating site instead -- we're workin' real hard to stay on the right side of copyright issues around here, you understand. :nice:

cacapee
09-21-2007, 03:44 AM
version 0.9 Added padding (in pixels) when using runpages. Fixed crash bug when generating toc. Added back ghostscript support. Added option to disable generation of TOC.
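
For anyone wondering what padding between run-together pages amounts to, here is a minimal sketch. It is not pdflrf's actual code: the function name `run_pages`, the list-of-rows image representation, and the assumption that every page has the same width are all made up for illustration.

```python
WHITE = 255  # grayscale value used for the padding rows


def run_pages(pages, padding=0):
    """Stack page images vertically, inserting `padding` white rows between pages.

    Each page is a list of rows, each row a list of grayscale values (0-255).
    All pages are assumed to share the same width (an assumption of this sketch).
    """
    if not pages:
        return []
    width = len(pages[0][0])
    spliced = []
    for index, page in enumerate(pages):
        if index > 0:
            # Blank rows keep the last line of one page away from the
            # first line of the next.
            spliced.extend([WHITE] * width for _ in range(padding))
        spliced.extend(list(row) for row in page)
    return spliced
```

With `padding=0` the pages touch, which is the behavior described in the request; a few dozen pixels is usually enough to see where a page ends.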

ereszet
09-21-2007, 04:21 AM
version 0.9 Added padding (in pixels) when using runpages. Fixed crash bug when generating toc. Added back ghostscript support. Added option to disable generation of TOC.

Your program and response to suggestions is great.
Any chance for pdf/jpg output in the near future?
An alternative would be an lrftopdf or lrftojpg or lrftoprinter, but I cannot find anything like that:(

cacapee
09-21-2007, 11:54 AM
png output should be possible. I dunno about pdf output. What do you need it for? Will you want to change the output size?

ereszet
09-21-2007, 01:41 PM
png output should be possible. I dunno about pdf output. What do you need it for? Will you want to change the output size?

Png output would be fine. I can convert it to anything (imag)inable like jpg or pdf. I OCR and store hundreds if not thousands of historical books on my computer as pdfs and I index them. I can then find every word and go to the proper page. Your program does an excellent job for my pre- or post-processing, like fattening the fonts. It does other things as well, it does them fast and it does them in batch. No other program that I tried (scores of them, even commercial) can match that speed.

Apart from that, which has little to do with PRS-500, once I have png output, I can use it (after conversion to jpg or pdf) both with Sony and with all other devices that read jpg or pdf. I would not need to keep duplicates as lrf's and pdf's.

On top of that, imagine that you scanned a road map. Lrf is limited to 6" or with your program to 2x6". Png (converted to jpg) can be scaled up and panned as an image even in Sony.

BTW. I invite you to visit my thread about paper book to Sony (do-it-yourself repro v-cradle) in Sony - Accessories. You will see that I am quite serious about converting paper documents/books to digital format. I have digitized all kinds of family and other documents. The number goes into tens of thousands (I am not exaggerating). Sony Book Reader is just a name. In reality it is a document store and reader with the one disadvantage of lacking folders. I am trying the Archos 704 wifi right now. It cannot read lrf files but can read pdfs, though it is very slow. Anyway, the universal png format can be used for all kinds of purposes.

If you can do that output, please do.

ereszet
09-21-2007, 01:55 PM
png output should be possible. I dunno about pdf output. What do you need it for? Will you want to change the output size?

As for the output size, it is always nice to have an option with some parameters. I always wonder why programmers (I did some programming myself in assembler, Fortran and Algol 30 years ago) do not offer users an (advanced) option to change all kinds of parameters within allowable limits. I even went so far in the early days of PCs as to use a hex editor to change some parameters that were not available in the menus or command lines. I remember that my first successful attempt was to change lpt1 output to lpt2.

But if changing the size is a lot of work for you, do not bother too much. Png output will be sufficient. You work hard enough and everybody appreciates that.

godel10
09-21-2007, 03:56 PM
version 0.9 Added padding (in pixels) when using runpages. Fixed crash bug when generating toc. Added back ghostscript support. Added option to disable generation of TOC.

Thank you very much for adding the option of disabling TOC generation. Unfortunately, it seems that this was not the reason why compatibility with the Librie was lost (files converted with the TOC disabled still do not work on the Librie).

Can anyone guess why the compatibility with Sony Librie that was in pdflrf0.7 has been lost in the new versions?

cacapee
09-21-2007, 04:28 PM
I was afraid of that. I made sweeping changes to the lrf outputter but obviously I could only test it with my Sony Reader. Let me look into it.

Tuello
09-21-2007, 05:22 PM
cacapee, you rock! Version 0.9 does a great job on the book that was giving me trouble. Thanks for the great customer support ;)

godel10
09-22-2007, 07:03 AM
I was afraid of that. I made sweeping changes to the lrf outputter but obviously I could only test it with my Sony Reader. Let me look into it.

I do not know if this helps, but the lrf output does not work with editlrfmeta (http://editlrfmeta.peterknowles.com/) either. I mean that this tool complains about the generated lrf and does not return the metadata (title, author, etc.) of the lrf file.

If you manage to generate an lrf output that works with editlrfmeta, I think it could work well on the Librie.

ereszet
09-22-2007, 05:37 PM
png output should be possible. I dunno about pdf output. What do you need it for? Will you want to change the output size?

Attached pictures show photos of lrf and png screens for Sony and png screens for Archos 704 (for comparison).
Pictures with "max" in their name show maximum details available in respective png screens.
Sony.lrf is an output from pdflrf with maximum details available (a page split in two).

Conclusion:
Png format makes it possible to read city maps even in Sony. What's more, it can be used universally outside of Sony world and can be converted to any image or pdf/djvu format.

Shake
09-22-2007, 06:13 PM
Your tool is really great! I love it.

Unfortunately I get the following errors:

Couldn't find a font for 'Helvetica'
Couldn't find a font for 'Times-Roman'
Couldn't find a font for 'ZapfDingbats'

This error does not occur for all PDFs - just for a few. But the fonts are installed:


fc-match Helvetica
NimbusSanL-Regu.pfb: "Nimbus Sans L" "Regular"

fc-match Times-Roman
NimbusRomNo9L-Regu.pfb: "Nimbus Roman No9 L" "Regular"

fc-match ZapfDingbats
DejaVu-Sans.ttf: "DejaVu Sans" "Book"

JSWolf
09-22-2007, 09:59 PM
I was afraid of that. I made sweeping changes to the lrf outputter but obviously I could only test it with my Sony Reader. Let me look into it.
Please don't do anything that makes the LRF worse than it is now just for the sake of backwards compatibility. What you could do is add a switch for Librie compatibility, with Reader as the default LRF mode.

cszhy
09-23-2007, 09:24 AM
It's so nice of you to put the ghostscript option back!!!
thanks buddy ^_^

cacapee
09-23-2007, 01:44 PM
Hi ereszet, I noticed that the lrf thumbnail is displaying the status bar in landscape, but is not using Sony's landscape mode. Is there a hack that will enable that?

ereszet
09-23-2007, 04:11 PM
Hi ereszet, I noticed that the lrf thumbnail is displaying the status bar in landscape, but is not using Sony's landscape mode. Is there a hack that will enable that?

Hi cacapee, the hacks I installed have nothing to do (I believe) with that display.

I used the hack builder to install: small menu font, direct page control, clock, delete book, total page counter. I lost the possibility to change slideshow settings. After installing the hacks all my pictures were shown for half a second. If I wanted to see picture one out of three, the display went immediately to picture three and blinked every half a second. I couldn't set slideshow to off. Finally, I discovered that I can restore default settings, which include "slideshow off". But now, I cannot use the slideshow, which is useless anyway for poor quality gray rendering of color photos (you should see my Archos 704 color slideshow - more vivid than on my laptop).

So, the hack of status bar in landscape mode is probably - in a way - of your making. I process pdfs through your pdflrf with 90 degrees rotation, and it is displayed in Sony's portrait mode with the status bar down there.
The status bar is displayed in landscape when I push the size button to change from portrait to landscape. What happens is that the lrf page is split in two vertically and I can see scaled up half pages separately. Very useful for maps, less convenient - but sometimes necessary - for texts in small fonts.

balok
09-23-2007, 06:15 PM
png output should be possible. I dunno about pdf output. What do you need it for? Will you want to change the output size?

Output in PDF is a good idea because it will definitely outlast BBeB, and may turn up on future ebook readers.

Output in formats other than lrf would make this tool useful for other devices. We do in fact expect at least two new 6" e-ink devices in the near future, which will need a tool like this one for reading pdf. Both will be reading mobipocket format, so output in prc would surely be appreciated. The Iliad also reads mobipocket, if I'm not mistaken. Besides that, there are other electronic gadgets with small screens that could benefit from your neat little program (what formats do they read?).

Btw, cacapee, thanks for this great program. I use it practically every day. Thanks also for the linux build. I bet the OSX people are getting impatient... I hope they can run wine at least.

cszhy
09-23-2007, 09:28 PM
Hi Buddy,
when I use version 0.9 to convert some text PDFs,
I get five error boxes saying
"AFPL Ghostscript" 8.51: "unrecoverable error" "exit code 1"

What does that mean? What can I do? Thanks!!!

cacapee
09-23-2007, 10:05 PM
PM me a link to the file. I'll see what the problem is.

ereszet
09-24-2007, 08:06 AM
Here I share my experience of processing tens of thousands of paper documents into digital form using my camera, the repro v-cradle setup of my own design described earlier, and lots of free, demo, and commercial software.

The source of the original paper book / document images can be a scanner, a digital camera or internet djvu/pdf files. I am discussing image pdf files rather than text pdf files. The best pdf format is "text under image" because you get an exact image of the original page plus a text layer underneath that can be indexed and searched on your computer (alas, no search is available in the Sony Reader or other book readers that I know).

If the original images are of good quality, the only tool you need is pdflrf by cacapee. It will resize and rotate the pages, remove the white background surrounding the text (if it is a clean background), and fatten the fonts to make them stand out more when displayed by the Sony Reader. It is extremely fast in comparison to any other program I know. As far as I understand, the most time-consuming stage of pdf to lrf conversion is extracting images from the pdf. Usually djvu to lrf (by pdflrf) is much faster. I believe that the fastest process would be to use scanned or camera images as direct input to pdflrf, but I do not dare to ask cacapee to do that (I already asked him to add png images to the output for reasons that I will explain later). Pdflrf is available for DOS with a nice Windows interface and for Ubuntu Linux. Djvu conversion is faster in Ubuntu but pdf conversion in Ubuntu is much slower (same algorithm according to cacapee, but apparently the DOS image extraction implementation is better than the one in Ubuntu, plus pdflrf in Ubuntu frequently complains with warning messages about missing fonts and "blocks" while producing good results - why it complains about missing fonts if the input is image rather than text is beyond me).

Of course to convert your scanned or camera images to pdf or djvu you need a program that can do that. As there is a plethora of free programs both for Windows and Linux, I am not going to discuss that. Just note that you can print to pdf or djvu any document/image that can be browsed in a program that allows printing (you cannot print lrf though - a challenge to developers).

In a perfect world pdflrf would be all you need to get lrf files readable with your Sony Reader (a few hundred pages in your free evening time, including the time to photo scan the documents). But the input to pdflrf may be of poor quality either because you didn't take time to shoot the photos properly or you use poor quality google books pdfs or you get djvu files from digital libraries that are based on old microfilms. So you need some preprocessing.

I will concentrate now on camera images. The first stage is to get your images from the camera to the computer. You may do it with the software provided by the camera manufacturer or other developers, but I use Picasa, offered free by Google for both Windows and Linux. I also use it to automatically correct contrast and color in a batch of photos. Contrast is especially important for further processing. I once took a number of document photos just by "shooting from the hip" in a dark hotel room. In the resulting images the text was hardly discernible from the background. Picasa took care of that.

If you have a lot of time and patience, you can use Picasa to correct other aspects of your photos (like cropping to remove background, deskewing, correcting white balance, etc.) but image by image rather than in a batch. It is manageable for a dozen images or so but would be rather time-consuming for a book with some hundred pages. Therefore we need other programs that can do additional processing in batch.

The one I use for further batch processing is the commercial Finereader 8 program. Basically it is for OCRing image/pdf input (it does not take djvu) but its preprocessing abilities are very useful. With it you can split double pages, adjust the resolution (from 96 or even 72 dpi to 600 dpi and more), convert to black and white (the algorithm is quite good), deskew the lines of text (the algorithm is rather poor), clean up small spots, and remove anything which is not text or images by saving only the recognized blocks (for batch saving it requires a special trick that I will describe in another post). Finally you can save all the pages to pdf ("text under image" or a number of other formats, but not djvu). I eagerly await the next version of Finereader, possibly with a better cropping option, better deskewing and an option to save blocks of text and images in a batch process. Unfortunately, in my opinion, Finereader's marketing policies keep them from issuing new versions fast (version 8 is more than 2 years old). Their developers are ingenious, but the marketing people do not want my money for a new version (same with Canon - my Powershot Pro 1 is more than 2 years old).

That's all for now folks. My description of specialized software for processing poor images will follow soon. In the meantime, you can have a look (search the net) at BookRestorer (very expensive, comes with commercial photo scanners) or ScanKromsator by bolega (Russian interface, free and powerful) or Snapter by Atiz (demo available - it is slow and not fit for real batch processing but it is fun to experiment with). Those of you who have any experience with the free GIMP and ImageMagick or Adobe's (commercial) Photoshop/Lightroom plus various plug-ins or similar heavyweights are welcome to share their experience. But remember that we are considering here speed, batch processing, ease of use, and a special application (not just improving family photos).

ereszet
09-24-2007, 08:10 AM
Cacapee, I draw your attention to my post "Software tools to convert paper documents to lrf with thanks to cacapee for pdflrf" in Sony Accessories.

ereszet
09-24-2007, 09:24 AM
I have a request. A couple of my free PDFs have tiny marks right on the top and bottom edge of the page. This prevents it from being cropped. The one PDF I have done so far is about twice the size it should be. Can you figure out a way to fix it?

This is a really nice tool.

An example should be attached.

Nate, see my post "Software tools to convert paper documents to lrf with thanks to cacapee for pdflrf" and the ones that will follow in Sony Accessories. If you attach your original pdf file rather than lrf, I will see what is the best method to remove "white" space with marks and I can (with your permission) use it as an example in my thread.

JSWolf
09-24-2007, 09:42 AM
Cacapee, I draw your attention to my post "Software tools to convert paper documents to lrf with thanks to cacapee for pdflrf" in Sony Accessories.
See post #177 in this thread. I moved it because it did not fit where you placed it.

cacapee
09-24-2007, 11:48 AM
Fixing white space marks can be done by using the Trim% feature. Click preview and move the left, top, bottom, right markers so that you crop out what you don't need. You don't have to be very accurate since pdflrf's whitespace removal takes care of the rest.
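
To illustrate why a rough trim is enough, here is a minimal sketch of the two steps: a percentage trim that throws away the marked edges, followed by an automatic crop to the remaining ink. This is not pdflrf's actual implementation; the names `trim_percent` and `crop_whitespace`, the threshold value and the list-of-rows image format are assumptions made for the example.

```python
def trim_percent(image, left=0, top=0, right=0, bottom=0):
    """Cut the given percentage off each side of a grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    x0 = width * left // 100
    x1 = width - width * right // 100
    y0 = height * top // 100
    y1 = height - height * bottom // 100
    return [row[x0:x1] for row in image[y0:y1]]


def crop_whitespace(image, threshold=250):
    """Crop to the bounding box of all pixels darker than `threshold`."""
    rows = [y for y, row in enumerate(image) if any(p < threshold for p in row)]
    if not rows:
        return image  # blank page: nothing to crop to
    cols = [x for x in range(len(image[0]))
            if any(row[x] < threshold for row in image)]
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def clean_page(image, trim=10):
    """Rough trim on all four sides, then let the automatic crop do the rest."""
    return crop_whitespace(trim_percent(image, trim, trim, trim, trim))
```

A single dark mark at the page edge defeats `crop_whitespace` on its own, because the bounding box stretches out to include it. Once the trim removes the mark, the crop tightens onto the text, so the trim markers only need to clear the marks, not hug the text.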

ereszet
09-24-2007, 12:27 PM
JSWolf, I see that you moved my Software Tools post from Reader Accessories to this thread. It is all right with me but it breaks the logical thread from Paper to Reader, including the hardware setup. I will make visitors to Accessories aware of the move. However, I am not sure how to refer them to the new place of my post. I can only find it by searching for my own posts and not for number 177. How does one find post number 177?

ereszet
09-24-2007, 12:44 PM
Fixing white space marks can be done by using the Trim% feature. Click preview and move the left, top, bottom, right markers so that you crop out what you don't need. You don't have to be very accurate since pdflrf's whitespace removal takes care of the rest.

It is a great feature of pdflrf but it will not work with book pages whose margins change from page to page (like a lot of those available in digital libraries or in photos taken without a cradle and a tripod). In addition, white space removal (another magnificent feature of pdflrf) will not work with whitespace stained by black spots or blobs. You get a lot of them in photos of old books or if you split double pages in two (the black/gray shadow in the middle of the original photo becomes a black/gray shadow at the margins of the resulting two pages, and it shows as black lines/blobs after converting to black and white). So a number of books require preprocessing before going to pdflrf. I will post my experience with that at a later stage, if you believe that my Software Tool theme belongs in this thread.

BTW. I take my photos with the black background, so I use programs to remove black background first and whitespace after that.

JSWolf
09-24-2007, 01:27 PM
JSWolf, I see that you moved my Software Tools post from Reader Accessories to this thread. It is all right with me but it breaks the logical thread from Paper to Reader, including the hardware setup. I will make visitors to Accessories aware of the move. However, I am not sure how to refer them to the new place of my post. I can only find it by searching for my own posts and not for number 177. How does one find post number 177?
#177 in this thread. Each post in a thread is numbered. So just go to the page and post with #177.

cacapee
09-24-2007, 02:10 PM
Have you looked into unpaper?

JSWolf
09-24-2007, 03:05 PM
Have you looked into unpaper?
What is unpaper?

cacapee
09-24-2007, 03:10 PM
http://unpaper.berlios.de/

I haven't personally used it.

NatCh
09-24-2007, 03:11 PM
best Google-guess: http://unpaper.berlios.de/ :wink:

ereszet
09-24-2007, 05:11 PM
Have you looked into unpaper?

I was in touch with the author a few weeks ago to learn how to pipe jpg images to unpaper and save results as jpg as well. His advice was good for Linux but I have not managed to repeat it in DOS yet (however the DOS version packaged in pdfread works well with ppm/pbm/pgm).

Both versions - Linux and DOS - work nicely in batch mode. After experimenting with some parameters one can clean the image from black spots, lines and blobs with the result being masks (or blocks of text/image) surrounded by whitespace. His algorithm for conversion to black and white is based on the threshold method, which is not sufficient for poor quality originals. One has to keep in mind that even with cleaning parameters adjusted to clean one page, the processing may damage other pages by removing the text as well.

A nice feature of unpaper is splitting of double pages and replacing the dark shadow between the pages or at the margins with whitespace.

I asked the author to consider trimming the white space automatically once the program recognized the masks. He may do that in the future but not too soon.

For now, it is a good free preprocessing tool for pdflrf.
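
To show what I mean by the threshold method not being sufficient for poor originals, here is a minimal sketch of a fixed global threshold. It is not unpaper's code; the function name `binarize` and the cutoff of 128 are assumptions for illustration only.

```python
def binarize(image, threshold=128):
    """Convert a grayscale image (list of rows, 0-255) to pure black and white.

    Every pixel at or above `threshold` becomes white (255), everything
    darker becomes black (0). One cutoff is used for the whole page.
    """
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in image]


# A well-lit page: dark text (30) on a bright background (220) separates cleanly.
good_page = [[220, 30, 220], [220, 30, 220]]

# A dim photo: text (90) and background (120) are both below the cutoff,
# so the whole page turns black and the text is lost.
dim_page = [[120, 90, 120], [120, 90, 120]]
```

With the dim page, the only way to recover the text with this method is to pick a different cutoff for that photo, which is why one set of parameters rarely suits a whole batch of uneven originals.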

ereszet
09-24-2007, 05:15 PM
#177 in this thread. Each post in a thread is numbered. So just go to the page and post with #177.

It was not possible for me in the hybrid mode. Only now I discovered that posts are numbered sequentially in linear mode.

ereszet
09-24-2007, 05:29 PM
Fixing white space marks can be done by using the Trim% feature. Click preview and move the left, top, bottom, right markers so that you crop out what you don't need. You don't have to be very accurate since pdflrf's whitespace removal takes care of the rest.

A poor quality original jpg is attached. Pdflrf can do little to improve it unless one manually trims the black margins (different for different pages), see the attached lrf file.
After preprocessing with ClearImage (demo version) and Finereader8 (commercial) the jpg is much improved although some black lines at the margin remain (more aggressive preprocessing would remove parts of the text as well). Then, pdflrf can be used more effectively.

BTW. The image is a photoscan of a 1914 Russian calendar. I can make a better photo of it by zooming to individual rather than double pages, putting white paper to avoid effects of transparency, etc., but the photo in example is just to make my point about the need for preprocessing.

cszhy
09-24-2007, 07:37 PM
PM me a link to the file. I'll see what the problem is.

It's in Chinese , thanks :)

bookworm
09-25-2007, 03:43 AM
Cacapee.
Thanks for your good program. However, is there a way that you can use the size button (S M L) on the Sony ereader with your program. After converting to LRF file the sizing does not work.
Thanks

JSWolf
09-25-2007, 05:50 AM
Cacapee.
Thanks for your good program. However, is there a way that you can use the size button (S M L) on the Sony ereader with your program. After converting to LRF file the sizing does not work.
Thanks
The size button won't work. pdflrf converts the PDF into images in LRF format. Since the resulting file is not text, you cannot change the font size. It's also not reflowable. So what you get is what you see.

ereszet
09-25-2007, 06:48 AM
The size button won't work. pdflrf converts the PDF into images in LRF format. Since the resulting file is not text, you cannot change the font size. It's also not reflowable. So what you get is what you see.

I am under the impression that the size button sometimes works with pdf files (S and M), because the Reader tries to trim the white margins. After trimming the whitespace with pdflrf there is no longer room for expansion.

However, if you convert pdf to images, you can scale them up. Let us wait for cacapee to provide the png output.

cacapee
09-25-2007, 12:21 PM
It's in Chinese , thanks :)

That file doesn't work in ghostscript by itself. Can you help me figure out why?
The problem does not appear to be in pdflrf.

cszhy
09-25-2007, 07:49 PM
A previous version of pdflrf before 0.6 (I forget which one) could do that pretty well, but I deleted it when the new version came out. Would you mind uploading it again? I think that will save you time, thanks.

Fallen angel
09-28-2007, 03:29 PM
I'm sorry, I don't know if that would be easy to do, but it would be really great if .chm support could also be added. Also, it would be useful (but not absolutely necessary) if we could convert multiple files at a time.

ashkulz
09-29-2007, 04:00 AM
I'm sorry, I don't know if that would be easy to do, but it would be really great if .chm support could be also added.
You can easily extract CHM to HTML (http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help#Extracting_to_HTML) and then use libprs500 (https://libprs500.kovidgoyal.net/) to convert from HTML to LRF. I'm sure if you request chm support from Kovid (author of libprs500) he'll be willing to add support for it :)

Bob Russell
09-29-2007, 05:54 AM
I was thinking the same thing - chm support would be wonderful.
And that's right... Kovid's program certainly seems the right place for it.

cacapee
09-29-2007, 11:46 AM
libprs500 is the better place to have chm support. Also you can batch by dragging multiple files over to the pdflrfwin window.

ereszet
09-29-2007, 05:06 PM
Hi, I've taken some of the ideas of existing tools along with a few refinements of my own to code this one up. I've attached a few sample conversions to get an idea of what the tool can do.

The refinements are --runpages (which causes adjacent pdf pages to be spliced into the same image if possible) and --smartcut (which avoids the annoying splits at the edge of the image) Another feature is that landscape mode (which is the default) rotates the image but doesn't actually use the Reader's landscape mode.


Cacapee, can you disclose what GNU/Open Source libraries/modules do you use for fattening the fonts?
I convert all my small print documents to images that can be zoomed (for some of my documents pdf or djvu display is not big enough even with your pdflrf conversion). While I wait patiently for your pdflrf release with png export, I would be quite happy in the meantime to try my hand with direct processing of my images to fatten the print. I can do it now with ClearImage demo, using some tricks to do it in batch, but your processing is blindingly fast (taking into account a number of processing steps that pdflrf includes).

Nothing can substitute your refinements but for the time being, I need something to help me read small print without wasting my time with programs less efficient than yours.

cacapee
09-29-2007, 09:13 PM
I use CImage to do the erosion (which is fattening the lines). This is the most time consuming part of the entire process by far. I think paint.net and even gimp have this filter.
The key is to achieve a balance between the original image size, the final image size and the erosion factor. Are you finding that it is not thick enough? Can you pm me an example?
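
For readers who have not met the term, grayscale erosion replaces each pixel with the darkest value in its neighbourhood, so dark strokes on a light page grow thicker. Below is a minimal pure-Python sketch of that idea. It is not the library call pdflrf uses; the function name `erode`, the square neighbourhood and the `radius` parameter are assumptions for illustration.

```python
def erode(image, radius=1):
    """Grayscale erosion with a square (2*radius+1) neighbourhood.

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    Each output pixel is the minimum of its neighbours, so dark lines
    get `radius` pixels fatter in every direction.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(min(neighbours))
        result.append(row)
    return result
```

A one-pixel stroke becomes three pixels wide with `radius=1`. That is why the balance matters: if the page is scaled down heavily after eroding, the extra thickness may disappear again, and if the factor is too large, the counters of small letters fill in.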

cacapee
09-30-2007, 12:50 AM
This version adds lots of improvements to comic strip output.

version 0.99 Fixed Librie compatibility (maybe). Adjustable image size on output. Output zip files (if extension is cbz or zip). Added RGB output. Added post stretch of image to fit page. Added options to fit image by height/width/2*height etc. Sort files in rar and zip.

bookworm
09-30-2007, 04:43 AM
The 0.99 version is really excellent. I always log in to mobile read forums to see what developments you made on PDFLRF. Thanks a ton. This version gives excellent flexibility

godel10
09-30-2007, 08:21 AM
This version adds lots of improvements to comic strip output.

version 0.99 Fixed Librie compatibility (maybe). Adjustable image size on output. Output zip files (if extension is cbz or zip). Added RGB output. Added post stretch of image to fit page. Added options to fit image by height/width/2*height etc. Sort files in rar and zip.

Thank You. Your efforts about Librie compatibility are much appreciated. But unfortunately it is still not working.

In version 0.9, when I tried to use editlrfmeta with an lrf file generated by pdflrf, all I got was an error. In version 0.99 what I get is something like

title:
author:
bookid: PDFLRF046ff8d14
publisher:
description:
date: 2007-09-30
error displaying meta information: unexpected end of file. Current depth is 3 Line 17, position 14.

That is, first we can see the metadata, but then we get the error. What is surprising is that we always get the same error (whatever lrf file you generate), mentioning depth 3, Line 17 and position 14.

ereszet
09-30-2007, 08:59 AM
I use CImage to do the erosion (which is fattening the lines). This is the most time consuming part of the entire process by far. I think paint.net and even gimp have this filter.
The key is to achieve a balance between the original image size, the final image size and the erosion factor. Are you finding that it is not thick enough? Can you pm me an example?

I would have thought that image extraction is the most time-consuming part of the process, but you know better. The fattening is OK in lrf but you cannot increase the displayed size of small fonts beyond what the Sony landscape mode allows. That is why I asked for png output, which allows panning the display view. You have done it in version 0.99 in a marvelous way. See my next post with image examples.

ereszet
09-30-2007, 09:10 AM
Adjustable image size on output. Output zip files (if extension is cbz or zip). Added RGB output. Added post stretch of image to fit page. Added options to fit image by height/width/2*height etc. Sort files in rar and zip.


I do not exaggerate in my post title. In version 0.99 pdflrf can output zipped png images of any size, and they can be refined to be viewed on probably any portable reader (not just Sony or Archos).

For the Sony Reader it means that it can display small print and even city maps.

For the Archos it means that it becomes a reader. Pdf reading in Archos is almost useless for the long time it takes to load or zoom every page. Pdf converted to png images can be resized and displayed immediately from its folder page by page.

BTW. Cacapee, if the Archos people buy an adapted pdflrf from you (which they should do right now), do not allow them to ask for a buyer's telephone number and e-mail address before selling it (they do that with their plug-ins, which means that my Archos stays crippled because I cherish my privacy).


The images in "quick fox" photo are from left to right:
1. the original color pdf
2. lrf by pdflrf
3. like image 2 but split in two for maximum size
4. png output from pdflrf (can be panned)

The images in "city map" photo are from left to right:
1. lrf by pdflrf split in two for maximum size
2. png output from pdflrf
3 and 4. panned image 2

The images in "archos city map" photo are from left to right:
1. pdf zoomed to maximum view (takes 2 minutes to prepare the image and another two minutes to zoom)
2. double size png output from pdflrf zoomed to maximum view (loads and zooms almost immediately, can be panned)

cacapee
09-30-2007, 11:27 AM
godel10 - editlrfmetadata works for me in both .9 and .99 Can you pm me a pdf file which has that problem.
cszhy - Similarly regarding ghostscript support in 0.5, I get the same error.
Someone complained about TOC support. Again I can't reproduce the problem. Please pm me the file that shows the problem.

ereszet
09-30-2007, 12:09 PM
Cacapee, would asking for png input be too much?

You see, I photoscan the books/documents with my camera in an image format, so the conversion to pdf/djvu is a step that could be skipped when I need to process them through pdflrf.

cacapee
09-30-2007, 12:14 PM
Zip (or rar) up the png files.

Also I reuploaded new builds of dos and linux versions.
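
For anyone with a folder full of camera shots, the "zip up the png files" step can be scripted. The following is only a minimal Python sketch using the standard zipfile module; the function name and the folder and file names are made up for illustration and are not part of pdflrf. It adds the pages in sorted name order, so page numbers in the file names should be zero-padded (page_002.png, not page_2.png).

```python
import zipfile
from pathlib import Path


def zip_pages(image_dir, archive_path):
    """Pack every .png in image_dir into archive_path, in sorted name order.

    Hypothetical helper for illustration; not part of pdflrf.
    """
    pages = sorted(Path(image_dir).glob("*.png"))
    # PNG data is already compressed, so the files are stored as-is.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as archive:
        for page in pages:
            # arcname keeps only the file name, so the zip has no folders.
            archive.write(page, arcname=page.name)
    return [page.name for page in pages]
```

The resulting zip can then be given to pdflrf as the input file, as described above.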

godel10
09-30-2007, 02:58 PM
godel10 - editlrfmetadata works for me in both .9 and .99 Can you pm me a pdf file which has that problem.

For instance you can use a pdf file (http://www.iiia.csic.es/~fbou/papers/Bou02b.pdf) I created myself. When I use your linux .99 version my output is this (http://www.iiia.csic.es/~fbou/papers/Bou02b.lrf), and the output of editlrfmeta with this file is

title:
author:
bookid: PDFLRF046fff061
publisher:
description:
date: 2007-09-30
error displaying meta information: unexpected end of file. Current depth is 3 Line 17, position 14.

Note: If I try your Windows version, the output lrf file does not work with Sony's software for the Librie ("Librie for Windows") either.

kovidgoyal
09-30-2007, 03:05 PM
cacapee, there is a bug in the creation of the metadata block by pdflrf. I had to work around it in the latest libprs500. Basically there's content after the closing </Info> tag, which probably indicates that the size of the metadata block is not being correctly set in the output LRF file.
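
A quick way to check for the symptom described above is to look at what follows the closing tag. This is only a Python sketch: it assumes you already have the metadata block as decoded XML bytes (getting it out of the LRF file is not shown), and the function name is made up for illustration.

```python
def trailing_after_info(xml_bytes, closing_tag=b"</Info>"):
    """Return whatever follows the last closing tag, minus whitespace.

    Returns None if the closing tag is not present at all.
    Hypothetical helper; it only inspects bytes you pass in.
    """
    end = xml_bytes.rfind(closing_tag)
    if end == -1:
        return None
    # Anything left after stripping whitespace is unexpected content.
    return xml_bytes[end + len(closing_tag):].strip()
```

An empty result means the block ends cleanly; a non-empty result is consistent with a block size that is set too large.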

ereszet
09-30-2007, 04:04 PM
Zip (or rar) up the png files.

Also I reuploaded new builds of dos and linux versions.

Some tests:

djvu to lrf - 200 seconds
png (converted from djvu to tiff by DjVuDecode and from tiff to png by Irfanview - size over 20 Mb each) - 375 seconds (output size about 50 Kb for each png)
png (converted from djvu to tiff by DjVuDecode and from tiff to png by tiff2png - size less than 100 Kb each) - 340 seconds (output size as above)

It looks like unzipping and zipping takes its toll.

Zips created by an old pkzip version 5.1.2600 do not seem to work (I cancelled processing after more than 10 minutes).
Zips created by Total Commander are ok.

Pictures attached are from an old Polish heraldic book available in djvu format from an academic network of dLibra.

cacapee
09-30-2007, 05:17 PM
I've uploaded a new build up that fixes a couple of metadata bugs.

ereszet
09-30-2007, 05:34 PM
The pdflrf processed png picture I attached to my previous post was taken from Sony's small view. It is in practice the only view available for pdfs or image based lrfs. But the pngs can be seen (and panned) in medium and large view as well.

What a pity that Sony does not allow collections/folders of pictures. Why their developers have crippled the Linux implementation so much is beyond any reasoning. The Archos allows folders, so each collection of png book pages can be placed in its own folder and read page after page. In Sony, it is just a long list of png pages from the pictures menu. Fortunately they can be viewed in sequence by using the page buttons.

godel10
10-01-2007, 04:22 AM
I've uploaded a new build up that fixes a couple of metadata bugs.

It seems that the Librie compatibility issue in the previous release was in that small bug.

Thus, release 0.99 is the first one that has Librie compatibility and a Linux version. Thank you very much cacapee :)

ereszet
10-01-2007, 09:01 AM
Cacapee, below is a list of refinements for you to consider to make your program a comprehensive tool for "paper books to image" processing:

1. Deskew (based on lines rather than borders - like ClearImage)
2. Clean Black Borders before wiping out White Borders (again like ClearImage that removes the borders, and sets border area pixels to white). That option is reliable for B&W pictures only.
3. Straighten the curved text lines (the only program I know that does it properly is BookRestorer)
4. Remove the perspective/trapezoidal effect (as above)
5. Despeckle/ Remove noise
6. Remove black objects (lines and blobs) with a specified number and density of black pixels. No program that I know of can do it properly. Some programs allow you to remove punch holes (ClearImage) and some allow you to remove lines or blobs of certain dimensions (Scanfix), but every so often they remove parts of the text as well. That is why I suggest taking into account the number of black pixels that are close together rather than the vertical and horizontal dimensions of black objects. A line of text is a mix of black and white pixels, while a black blob will have a much higher proportion of black to white.
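
The density idea in the last item can be sketched roughly as follows. This is a plain Python illustration, not how pdflrf or any of the named programs work: it groups touching black pixels into regions and flags a region when it has enough pixels and fills most of its bounding box. The function name and both thresholds are assumptions chosen for the example.

```python
from collections import deque


def find_dense_blobs(image, min_pixels=20, min_density=0.8):
    """Return bounding boxes (top, left, bottom, right) of dense black regions.

    image is a list of rows, each a list of 0 (white) or 1 (black).
    Thresholds are illustrative guesses, not tuned values.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one 4-connected region of black pixels.
            queue = deque([(y, x)])
            seen[y][x] = True
            count = 0
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                count += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Density = share of the bounding box that is black.
            box_area = (bottom - top + 1) * (right - left + 1)
            if count >= min_pixels and count / box_area >= min_density:
                blobs.append((top, left, bottom, right))
    return blobs
```

A solid spot or a long straight line fills its bounding box and is flagged, while a thin curved or angled stroke of the same pixel count is not. Real scans would need the thresholds scaled to the image resolution.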

I do not expect you to do what professional/commercial teams of developers have not achieved yet, but it looks from what you have already done that you can improve on them. The cleaning of black borders would be the most welcome refinement for "picture perfect" photos of book pages (camera pictures have different proportions than paper documents/books, and that leaves the black background at the picture borders). The other refinements are for poorly taken photos/microfilms from digital libraries.

Once again let me tell you how much I admire what you have done, and my suggestions are just a dream for the future.

sszekacs
10-02-2007, 07:55 AM
Hi all,

I've read through the whole discussion. Cacapee, you did a wonderful job!

Just one question:

How can one output the generated pages in an image format?

I tried (using the command prompt):
pdflrf.exe --outputimages -vrs -f1 -l3 -i xxx.pdf -o xxx.lrf

The lrf file was generated without any error, however I could not find any generated images in the folder. What am I doing wrong?

Thanks guys,

Szabi

ereszet
10-02-2007, 08:45 AM
In the Windows interface you just give your output file the zip extension (in the settings window). The same goes for images as the input.

godel10
10-02-2007, 08:51 AM
Hi all,

I've read through the whole discussion. Cacapee, you did a wonderful job!

Just one question:

How can one output the generated pages in an image format?

I tried (using the command prompt):
pdflrf.exe --outputimages -vrs -f1 -l3 -i xxx.pdf -o xxx.lrf

The lrf file was generated without any error, however I could not find any generated images in the folder. What am I doing wrong?

Thanks guys,

Szabi

Try
pdflrf.exe -vrs -f1 -l3 -i xxx.pdf -o xxx.zip

MosFet
10-02-2007, 10:16 AM
I think that the PdfLrf is stable and mature enough to reach the 1.00 Version.

Many thanks to Cacapee !

:crowngrin

ereszet
10-02-2007, 02:54 PM
The first picture shows from left to right:
original image,
image after automatic black background removal with ClearImage demo,
image after black background removal, processed with pdflrf (default portrait mode, thicken=3).

The black line and the black spot at the left margin prevent pdflrf from automatic trimming that part as a white space. That line comes from taking the picture with the opposite panel of the v-cradle getting into the view.

The second picture shows the same sequence but with the original image shot with greater care in order to avoid the view of the opposite panel of the v-cradle.

The lighting for both original images was not perfect, as one can see well in the decorative frame part of the images. I can do better than that but not in provisional conditions when I am far away from home now.

The pdflrf output of both images was set at the original input size rather than adjusted for Sony in order to better see the resulting details. For the same reason the pictures I attach are quite large (still less than 1 Mb each) although I saved them at 25% size of the originals at 50% compression.

sszekacs
10-02-2007, 04:14 PM
In the Windows interface you just give your output file the zip extension (in the settings window). The same goes for images as the input.

thank you, ereszet and godel10

daiphile
10-03-2007, 10:53 PM
Thank you for a wonderful program.
But I have trouble using your program.
When I turn the ghostscript option on, I always get an error message. (Windows XP, Ghostscript 8.54) Is there anyone who doesn't have the problem?

cacapee
10-04-2007, 01:35 PM
I guess you have ghostscript installed right? Any idea about what the error message says?

ereszet
10-04-2007, 04:21 PM
Cacapee, it looks like at some stage you changed the default setting for overlap% to 0.0 (Windows version). Was there a reason for that? The lines at the end of some pages in a document I processed with default settings were broken horizontally. The overlap setting of 0.05% worked all right by repeating the last line as the first line in the next page. Can you make it the default? (I have not conducted a thorough study with different documents, so 0.05% may not be the best for all cases, but my document page was a quite typical A4 size from Word converted to pdf).
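
To illustrate why a small overlap repeats the last line at the top of the next screen, here is a rough Python sketch of the slicing arithmetic. It is not pdflrf's actual code: the thread does not say how the overlap value is interpreted, so this sketch treats it as a fraction of the screen height, and the function name and pixel sizes are made up.

```python
def slice_offsets(page_height, screen_height, overlap=0.05):
    """Return (start, end) pixel rows for each screen-sized slice of a page.

    overlap is assumed to be the fraction of screen_height that is
    repeated at the top of the next slice (0.0 means no repetition).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Each new slice starts this many rows below the previous one.
    step = max(1, round(screen_height * (1 - overlap)))
    slices = []
    start = 0
    while True:
        end = min(start + screen_height, page_height)
        slices.append((start, end))
        if end >= page_height:
            break
        start += step
    return slices
```

With a 2000 pixel page, an 800 pixel screen and overlap 0.05, the slices start at rows 0, 760 and 1520, so the last 40 rows of each screen are shown again at the top of the next one. With overlap 0.0 a line of text that straddles row 800 is cut in two.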

MatthewTheRaven
10-05-2007, 12:20 PM
I think the best way to deal with the question of defaults would simply be to allow you to save your own defaults to a file (or the registry, if you want it to persist across versions).

I would much prefer that, as I usually go for the portrait mode and I have to remember to change it each time. I know I would prefer no overlap or almost no overlap, so even with a new default, somebody like me would have to remember to change it each time.
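
The kind of per-user defaults file I mean could be as simple as the following Python sketch. It is only an illustration of the idea, not a proposal for pdflrf's actual format: the setting names, the built-in values and the JSON layout are all assumptions.

```python
import json
from pathlib import Path

# Illustrative built-in values only; the real option names may differ.
BUILT_IN_DEFAULTS = {"mode": "landscape", "overlap": 0.0}


def load_defaults(path):
    """Return the built-in defaults overridden by anything saved at path."""
    settings = dict(BUILT_IN_DEFAULTS)
    file = Path(path)
    if file.exists():
        settings.update(json.loads(file.read_text(encoding="utf-8")))
    return settings


def save_defaults(path, **overrides):
    """Merge overrides into the saved defaults and write them back."""
    settings = load_defaults(path)
    settings.update(overrides)
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return settings
```

A user who prefers portrait mode would call save_defaults once with mode="portrait", and every later load_defaults call would return that choice while leaving the other settings at their built-in values.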

This is a tremendous program that just keeps getting better...

JSWolf
10-05-2007, 05:03 PM
Would it be possible to differentiate between text and graphics and, if a graphic would either be split across pages or has a larger resolution than the screen, to start the graphic on a new page and/or resize it to fit a single screen? That way things like equations or diagrams won't be split and will still be useful.

ereszet
10-05-2007, 05:13 PM
As far as I understand pdflrf, all is graphics. The pdf/djvu input is converted to image files and the output is an image contained in lrf (or a set of png images).

JSWolf
10-05-2007, 05:14 PM
As far as I understand pdflrf, all is graphics. The pdf/djvu input is converted to image files and the output is an image contained in lrf (or a set of png images).
The output is graphic pages yes, but the input may not be just graphics.

cacapee
10-05-2007, 05:22 PM
You can get a similar effect by checking runpages but unchecking split pages.

ashkulz
10-05-2007, 11:54 PM
cacapee, now that pdflrf is stable do you have any plans of opening the source for it? I'm mainly interested in getting it to work with other devices/formats, as I want to avoid duplication of effort in enhancing pdfread (http://pdfread.sourceforge.net) to support the features pdflrf has.

cacapee
10-06-2007, 12:02 AM
Almost there. I'm presently porting over to Qt.

JSWolf
10-06-2007, 01:16 AM
Almost there. I'm presently porting over to Qt.
Quicktime? ACK!

cacapee
10-06-2007, 01:40 AM
http://trolltech.com/

JSWolf
10-06-2007, 02:05 AM
They really do need a better name for it.

ereszet
10-06-2007, 06:00 AM
Almost there. I'm presently porting over to Qt.


So now, that you are almost there with version 1, you may consider one more idea for the next version.

I am suggesting an lrf file as input and a png file as output, which means that it would be possible to print lrf files via png.

I am not suggesting printing the books, but with all kinds of documents that I store in Sony (e.g. business cards), it would be possible to print them when a need arises (via png and SD card).

And of course a user settings file to make my own defaults.

ashkulz
10-08-2007, 09:50 AM
I am suggesting an lrf file as input and a png file as output, which means that it would be possible to print lrf files via png.

I am not suggesting printing the books, but with all kinds of documents that I store in Sony (e.g. business cards), it would be possible to print them when a need arises (via png and SD card).

Creating LRF files and displaying them are two different kettles of fish altogether. It'd make more sense to use the LRF renderer in libprs500 to render PNGs. One more feature request for Kovid :-)

DrMoze
10-08-2007, 11:00 PM
OK, I give up. I appreciate the great conversion tool, but can't figure out the various settings. (The brief descriptions when mousing over the checkboxes aren't enough, alas.)

Here's what I want to do. Can anyone detail which settings I need in pdflrfwin?

I want to convert a pdf (with some images, 4-level gray is fine), rotated 90 degrees (landscape mode), with each page split in HALF, and size of the top or bottom page half adjusted to the sideways screen. (left-right centering would be nice too, but not critical.)

I keep getting lines of text split, or a third screen for each page that's blank except for a page number, etc.

Also, what is the real difference between the filter options?

TiA....

cacapee
10-09-2007, 11:47 AM
try it with --run-pages --smart-cut --rotation=90

Alternatively use one of the profiles in pdflrfwin

adinb
10-10-2007, 01:43 AM
Anyone have any settings that they're in love with for network world, computer world, or eweek?

Network World is *just* the wrong size -- the fonts are just too small to read in the normal "landscape" profile, and 2col splits the pages in the middle of columns of text.

thanks!

DrMoze
10-12-2007, 10:57 PM
try it with --run-pages --smart-cut --rotation=90

Alternatively use one of the profiles in pdflrfwin

I'm using pdflrfwin (as noted earlier). I can't get the right combination of options. Anybody?

ereszet
10-13-2007, 06:32 AM
I'm using pdflrfwin (as noted earlier). I can't get the right combination of options. Anybody?

Is this (pictures attached) what you need or something else?

The left hand photos in each of two images are original pdfs, the right hand photos are lrfs after pdflrf 0.99 for Windows processing with default settings.

mrkai
10-14-2007, 09:33 PM
Yeah hi...

Is there a particular reason why this software isn't available for mac os x? I can build it if needed...it's what I do :)

Thanks!

ashkulz
10-15-2007, 12:14 AM
Is there a particular reason why this software isn't available for mac os x? I can build it if needed...it's what I do :)

cacapee will be releasing the source quite shortly (see earlier in the thread); after that we will be able to port/add features as needed.

emkay
10-17-2007, 07:39 AM
Hi cacapee,
Thanks for this really useful tool. Could it output pdf too?

I'm an iLiad user, looking for a tool with a GUI to split my two page landscape PDFs into single page PDFs. I've been doing this in a slow roundabout way. I did a test with pdflrf and it worked really well. I can't do much with the resultant lrf though!
Happy to donate to your project if you can help.

ereszet
10-17-2007, 07:54 AM
Hi cacapee,
Thanks for this really useful tool. Could it output pdf too?

I suggested this to cacapee before, and he included (zipped) png input and output, making his tool a universal one (you can easily convert/print png to pdf). Apparently the library modules he uses for pdflrf provide for png rather than pdf output.

emkay
10-17-2007, 08:05 AM
OK, thanks for that. The output options aren't so clear in the GUI.

EDIT: That's worked really well. Good results, and quite a small pdf from the png files compared to the original, which was also image based. And very quick to use. :)

ashkulz
10-17-2007, 02:14 PM
Almost there. I'm presently porting over to Qt.

Any update? If you want, I can try to help you with it...