soPdf - Better than Yet another PDF to LRF converter


theguru
11-15-2008, 11:03 PM
I really liked the pdflrf tool from the "Yet another PDF to LRF converter" thread, but the moderator took it down for violating the GPL, and it has stayed down for quite some time because the author does not seem interested in providing the source for his tool. pdflrf also has some issues of its own:
pdflrf renders the PDF into images and then creates the LRF file.
This makes a 4 MB PDF file grow into a file of more than 40 MB.
No text information is preserved because of the image conversion.
Very slow.
No source for the tool <-- biggest disadvantage

So I decided to write a tool for myself. soPdf is a pdf formatter for sony reader. It is based on sumatrapdf's version of mupdf and fitz.

The advantages of soPdf over pdflrf
Pdf to Pdf conversion
Text and other contents of pdf are preserved
Size of the output file is very close to the size of the input file, and in some cases smaller than the input file.
Super fast conversion compared to pdflrf.
Source available to make further changes !!!!!! <-- biggest advantage

The disadvantages over pdflrf

Cannot yet convert comic books. It can still split image PDFs into two.
soPdf is in alpha stage. (ver 0.1). There may be lots of bugs to be found yet. At least all of the mupdf bugs (http://mupdf.pbwiki.com/mupdf+bugs).
???

soPdf command line options

about: soPdf
author: Navin Pai, soPdf ver 0.1 alpha
usage:
    soPdf -i file_name [options]
        -i file_name    input file name
        -p password     password for input file
        -o file_name    output file name
        -w              turn off white space cropping
                        default is on
        -m nn           mode of operation
                        0 = fit 2xWidth *
                        1 = fit 2xHeight
                        2 = fit Width
                        3 = fit Height
                        4 = smart fit Width (not yet implemented)
                        5 = smart fit Height (not yet implemented)
        -v nn           overlap percentage
                        nn = 2 percent overlap *
        -t title        set the file title
        -a author       set the file author
        -b publisher    set the publisher
        -c category     set the category
        -s subject      set the subject
        -e              proceed with errors
        -r              reverse landscape

    * = default values


The conversion algorithm is as follows

If user specified Fit2xWidth or Fit2xHeight then simply make two copies of pdf page from source into destination pdf file.
Render the page and get the actual boundary box that encompasses all of the content in the page. This step removes all the white space border of the page.
If page cannot be rendered by mupdf and error option is specified then split the page w/o rendering by setting the MediaBox of the page.
Try to split the file first by iterating all the elements that can fit in half a page and if that does not work then split the file half way with 2% overlap (this can be changed).
If FitWidth or Fit2xWidth is specified then rotate the page by -90 deg.
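
As a rough illustration of the fallback "split half way with 2% overlap" step above, here is a minimal Python sketch. It is not soPdf's actual code: the `Rect` type and the `split_with_overlap` function are hypothetical names, and it assumes the overlap is a percentage of the cropped page height, centered on the midline, with the usual PDF convention that y grows upward.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned box in PDF user space (origin at bottom-left)."""
    x0: float
    y0: float
    x1: float
    y1: float


def split_with_overlap(bbox: Rect, overlap_percent: float = 2.0):
    """Split a cropped content box into an upper and a lower half.

    Assumption: the overlap band is overlap_percent of the box height,
    shared equally on both sides of the midline.
    """
    height = bbox.y1 - bbox.y0
    overlap = height * overlap_percent / 100.0
    mid = bbox.y0 + height / 2.0
    # The upper half reaches slightly below the midline ...
    upper = Rect(bbox.x0, mid - overlap / 2.0, bbox.x1, bbox.y1)
    # ... and the lower half reaches slightly above it.
    lower = Rect(bbox.x0, bbox.y0, bbox.x1, mid + overlap / 2.0)
    return upper, lower
```

For a cropped box 600 units wide and 800 units tall, the two halves share a 16-unit band around the middle, so a line of text cut by the split shows up whole on at least one of the two output pages.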

Source code for soPdf is available from google code.
http://sopdf.googlecode.com

To compile the source code you will need Visual Studio 8.0 (Even free edition will work). Visual studio is not required if you just want to run the soPdf tool. If you are having issues running the binary then make sure you have VC runtime library. You can download the VC runtime library from Microsoft website.

Coming soon

Output to image pdf - for complex pdf that renders slowly on the reader devices.

Update 0.1 Rev 12

Added reverse landscape mode. Ever wished that you could hold your reader the other way around in landscape mode and scroll through the pages using your right thumb? Use reverse landscape mode and start reading from the last page onwards.

Update 0.1 Rev 10

Proceed with error option. With this option, soPdf can now process any pdf file, even the ones mupdf cannot handle. If mupdf cannot load the contents then it simply splits the page into two w/o any processing. The disadvantage is that the white space border in this case is not removed but you can still get a pdf output file.
Set subject of the pdf file option
Fixed stack overflow when processing complex pdf files
Better clipping algorithm

Update 0.1 Rev 7

Work around a mupdf bug where it is not able to allocate oid and gid numbers. This prevented some of the files from being split properly.

godel10
11-16-2008, 07:42 AM
Thanks for your effort.

I am not a Windows user, so I wonder if anyone could upload an example of an input file and an output file.

ProDigit
11-16-2008, 09:54 AM
I'd suggest you try recoding the PRS-505's manual again; it seems kind of buggy!

ddavtian
11-16-2008, 11:55 AM
Does this mean I need VC runtime (have no idea what it means)?


Error: .\mupdf\pdf_xref.c(459) : pdf_loadindirect() - cannot load indirect object 1586
Error: .\mupdf\pdf_xref.c(442) : pdf_loadobject() - cannot load object 1586 into cache
Error: .\mupdf\pdf_xref.c(416) : pdf_cacheobject() - found object 1636 0 instead of 1586 0

theguru
11-16-2008, 01:09 PM
I'd suggest you try recoding the PRS-505's manual again; it seems kind of buggy!

This bug has been fixed.

theguru
11-16-2008, 01:12 PM
Does this mean I need VC runtime (have no idea what it means)?


Error: .\mupdf\pdf_xref.c(459) : pdf_loadindirect() - cannot load indirect object 1586
Error: .\mupdf\pdf_xref.c(442) : pdf_loadobject() - cannot load object 1586 into cache
Error: .\mupdf\pdf_xref.c(416) : pdf_cacheobject() - found object 1636 0 instead of 1586 0
It means that there is an error in your pdf file. Check whether the pdf file can be loaded by the sumatrapdf viewer. If sumatrapdf cannot handle the file, then soPdf cannot handle it either.

=X=
11-16-2008, 03:52 PM
Quite an excellent app. This tool provides the feature I have been sorely looking for. I have some scripts that remove the margins, but none provided this level of success. I have a feeling this tool will become my new favorite PDF tool.

This tool does struggle with the more complicated PDFs, but for those there are PDFLRF/PDFRead/PaperCrop.

Thanks.


One recommendation: since the tool is written in CPP, there is no reason to tie it to one platform. There is a surprisingly large number of users on this board who use Linux/Mac OS X.


Thank you,
=X=

theguru
11-16-2008, 05:22 PM
I am working on fixing the bugs for the complicated pdf's. And yes it can be easily ported to any platform. There is no platform specific stuff in the code and since the source is available, anyone who is interested in creating a port for Linux/Mac is welcome to do so.

ProDigit
11-17-2008, 11:36 AM
So far I've only managed to get PDF to PDF working here.
So how do I convert it to LRF, or do you suggest keeping those documents in PDF?

(BTW thank you for the program, I've only had a brief look at it)

theguru
11-17-2008, 12:35 PM
So far I've only managed to get PDF to PDF working here.
So how do I convert it to LRF, or do you suggest keeping those documents in PDF?

(BTW thank you for the program, I've only had a brief look at it)

That was the original plan. I wanted to keep the files in pdf format. These reformatted pdf files can be read easily on the reader.

=X=
11-17-2008, 04:11 PM
Okay I found my first bug. It seems bookmarks on PDF are getting removed from the PDF.

These are bookmarks in the PDF used by the SONY reader for the table of contents.

=X=

DDHarriman
11-17-2008, 04:15 PM
Not bad at all.

Big files resulting from scanning still crash my Cybook or are so slow that they are useless, but text ones (with images or not) are quite good!

One more note: it does not convert PDF files with JBIG2-compressed monochrome images; CCITT Group 4 compressed ones are no problem.

Best regards,

theguru
11-17-2008, 04:39 PM
Okay I found my first bug. It seems bookmarks on PDF are getting removed from the PDF.

These are bookmarks in the PDF used by the SONY reader for the table of contents.

=X=

I am aware of the issue. I do not quite understand how the bookmarks work in PDF file. I will update the tool once I do or if anyone here does understand how bookmarks work then they are free to update the source.

DDHarriman
11-17-2008, 06:29 PM
One question: any plans on evolving into a gui with options to choose from?

theguru
11-17-2008, 09:24 PM
One question: any plans on evolving into a gui with options to choose from?

It shouldn't take too long to wrap this in gui. I will look into it after I complete the output to image option.

DDHarriman
11-18-2008, 10:19 AM
Thanks!

Great program!

ProDigit
11-19-2008, 08:49 AM
If it's working great, maybe you can contact the guys at calibre and integrate it into their software.
Calibre already has the option of converting some documents; if yours works better (faster, more accurate, etc.) then maybe a joint venture might be a good idea?

theguru
11-19-2008, 01:41 PM
If it's working great, maybe you can contact the guys at calibre and integrate it into their software.
Calibre already has the option of converting some documents; if yours works better (faster, more accurate, etc.) then maybe a joint venture might be a good idea?

I have published the source. They can integrate it w/o my permission if they wish to do so.

=X=
11-25-2008, 12:57 AM
Two points:

1) This thread is better suited to the "Format Conversions" section since this is not SONY only. This tool can be used with any eBook reader that supports PDF.

2) Can this tool implement a feature that splits 2 page PDF to a single page?

=X=

daesdaemar
12-01-2008, 08:29 PM
When I run sopdf, all I get is a quick flash of an apparent DOS window and nothing else. Using Vista 64. Any suggestions?

OK, worked it out. Opened a CMD window first and then ran the program.

daesdaemar
12-01-2008, 09:07 PM
Can someone please explain the usage of soPdf (I'm a newbie). Why would I want to take a PDF file and convert to another PDF file? Ultimately, for my Sony 505, my preference would be an LRF file, but I agree that pdflrf certainly has limitations especially in terms of its massive file size outputs.

Thanks in advance.

theguru
12-01-2008, 10:54 PM
Can someone please explain the usage of soPdf (I'm a newbie). Why would I want to take a PDF file and convert to another PDF file? Ultimately, for my Sony 505, my preference would be an LRF file, but I agree that pdflrf certainly has limitations especially in terms of its massive file size outputs.

Thanks in advance.

Because when you convert to another pdf file it becomes easier to read on your Sony 505.

daesdaemar
12-02-2008, 02:19 PM
Because when you convert to another pdf file it becomes easier to read on your Sony 505.

I hope there's more to the explanation than that. I like to understand things. Why is it easier to read?

=X=
12-03-2008, 11:56 AM
Can someone please explain the usage of soPdf (I'm a newbie). Why would I want to take a PDF file and convert to another PDF file? Ultimately, for my Sony 505, my preference would be an LRF file, but

I hope there's more to the explanation than that. I like to understand things. Why is it easier to read?
:)

There is another limitation of PDFLRF. Because the text is turned into an image, the view will be the same for S/M/L. And while PDFLRF does a good job of smoothing fonts, there are still times when font pixelation is visible.

With soPDF the text and layout are preserved. The only thing that is modified (removed, really) is the borders; PDFLRF also removes the margins. So now, instead of trying to fit a PDF page (typically 8.5x11") onto a 6" screen, you are trying to fit a 7.5x10" page (margins are usually 1"), which makes it much easier to read the text. Because the PDF is still text-based, text reflow still works and the font size can be increased if need be.

With soPDF and PDFCroper I find I rarely if ever use PDFLRF.

Hope this helps,
=X=

daesdaemar
12-03-2008, 03:04 PM
:)

There is another limitation of PDFLRF. Because the text is turned into an image, the view will be the same for S/M/L. And while PDFLRF does a good job of smoothing fonts, there are still times when font pixelation is visible.

With soPDF the text and layout are preserved. The only thing that is modified (removed, really) is the borders; PDFLRF also removes the margins. So now, instead of trying to fit a PDF page (typically 8.5x11") onto a 6" screen, you are trying to fit a 7.5x10" page (margins are usually 1"), which makes it much easier to read the text. Because the PDF is still text-based, text reflow still works and the font size can be increased if need be.

With soPDF and PDFCroper I find I rarely if ever use PDFLRF.

Hope this helps,
=X=

Thank you very much... ;-)

Boyodublin
12-04-2008, 12:12 AM
Does this program only work in Windows xp?

When I try to launch from Vista, I get an error that says "this application has failed to start because its side-by-side configuration is incorrect."

Thanks!

valiant
12-04-2008, 10:37 AM
Great program. It would be close to perfect if it maintained chapter marks (bookmarks/contents page). It's not especially important for reading some stuff but a real pain if you're dipping in and out of reference material.

ProDigit
12-04-2008, 11:50 AM
When I ran "Paulo Coelho - The Alchemist.pdf" through it, all the program did was tilt the book 90 degrees.

I prefer the original on the Sony PRS. Clicking on 'zoom' will show the book in text format like a normal LRF, which is more readable than the output version of this program.

I find the program leaves too small a border on the left, almost making it appear as if there is no border.


What does this program do to 'png' books (text scanned to png, but not run through OCR)?

=X=
12-08-2008, 01:24 PM
Is there any way this tool could add a feature to split PDF pages that each contain two pages into single pages? There is a similar algorithm right now that does this, but I'm looking for a feature that specifically converts a 2pg pdf to 1pg.

=X=

marcusgennaroz
12-09-2008, 07:31 AM
Hi! I am not able to run the .exe file. I downloaded the zip file and tried to open the .exe file. Are there any required files to be installed?
Thanks!
M.

=X=
12-16-2008, 01:52 PM
This thread should get moved to the PDF thread.


MobileRead Forums -> E-Book Formats -> PDF

Since this tool really just manipulates PDFs to PDFs

=X=

Alfy
12-16-2008, 02:55 PM
Hi!

Any chance to get the code compiled for Mac OS X? Visual Studio seems to be heavily tied to Windows...

Cheers,

Alfy.

theguru
01-03-2009, 06:53 PM
Hi!

Any chance to get the code compiled for Mac OS X? Visual Studio seems to be heavily tied to Windows...

Cheers,

Alfy.

I do not have Mac OS X, but anyone interested can compile it since the source is available. Visual Studio is not required to run the program, only to compile it.

pilotbob
01-03-2009, 08:27 PM
I do not have Mac OS X, but anyone interested can compile it since the source is available. Visual Studio is not required to run the program, only to compile it.

What language is it written in? There is NO visual studio for Mac... as I'm sure you know.

BOb

bill_syue
01-14-2009, 10:16 AM
I'm a beginner. Can someone help me on using the soPdf tool? I am using WinXP system and I put soPdf.exe and the sample pdf file "ebooktestin.pdf" under C:\. And then I opened the CMD window under the directory of C:\. I typed "soPdf.exe -i ebooktestin.pdf". I was told that "The system cannot execute the specified program".

theguru
01-14-2009, 01:21 PM
I'm a beginner. Can someone help me on using the soPdf tool? I am using WinXP system and I put soPdf.exe and the sample pdf file "ebooktestin.pdf" under C:\. And then I opened the CMD window under the directory of C:\. I typed "soPdf.exe -i ebooktestin.pdf". I was told that "The system cannot execute the specified program".

Seems like people do not read complete instruction. You have to download the runtime libraries from Microsoft. Do a search for Visual Studio 2008 runtime libraries.

theguru
01-14-2009, 01:21 PM
What language is it written in? There is NO visual studio for Mac... as I'm sure you know.

BOb

It's written in C/C++

pilotbob
01-14-2009, 01:26 PM
It's written in C/C++

so, can it be compiled with XCode tools, GNU or something?

Any idea?

BOb

=X=
01-14-2009, 02:01 PM
so, can it be compiled with XCode tools, GNU or something?

Any idea?

BOb

Bob it should with a little work. A lot of the tools theGuru used are GNU*.
=X=

bill_syue
01-14-2009, 08:05 PM
Seems like people do not read complete instruction. You have to download the runtime libraries from Microsoft. Do a search for Visual Studio 2008 runtime libraries.

Thanks a lot! Initially I thought I didn't need to install the VS 2008 runtime libraries since I have Visual Studio 2005 on my laptop. Just now I downloaded and installed a VS 2008 runtime library and now it works! Thanks again!

bill_syue
01-15-2009, 12:38 AM
One more question:

Can someone explain a bit more about the "-m" option? What do all these options mean? Is the "fit 2xWidth" option best suited for the PRS-505? Thanks in advance!

Emit
01-17-2009, 05:05 AM
First of all, I finally got an ereader (PRS-505) and I love it, except the screen is a lot smaller than I expected. I don't read novels (pft who has time for those! :p) but textbooks and papers, so I really needed something to cut out the margins and basically maximize reading area. Thankfully soPdf does exactly that... but for Windows :smack:

Attached is my port of soPdf for Linux.

I included a binary plus my sources for soPdf.c and processPdf.c and a brief readme.

the only dynamic dependencies are to common libs like libjpeg, zlib, freetype, etc so it should work out of the box on any modern distro (I tested on ubuntu 8.04..) I also compiled in the optional jbig2dec dependency.

it works for me, i've used it on multiple pdfs ~15-20MB each.
however, YMMV

thanks theguru :)

cszhy
01-18-2009, 11:43 PM
Hi, when I try to convert the attached file, it doesn't seem to work. The file I get cannot be opened; Connect Reader shows
"operation failed due to an error"

I would very much appreciate it if anyone could help

Thanks

bill_syue
01-19-2009, 03:14 AM
Hi, when I try to convert the attached file, it doesn't seem to work. The file I get cannot be opened; Connect Reader shows
"operation failed due to an error"

I would very much appreciate it if anyone could help

Thanks

No problem from my side. I attached the converted file. FYI

cszhy
01-19-2009, 08:05 AM
thanks a lot, is it a lrf file?

bill_syue
01-19-2009, 07:35 PM
thanks a lot, is it a lrf file?

No, it is still pdf format. soPdf tool is for pdf to pdf conversion.

cszhy
01-20-2009, 02:59 AM
Thanks, it seems it can convert text PDFs into image ones.
It would be great if there were some GUI interface

=X=
01-20-2009, 10:33 AM
Thanks, it seems it can convert text PDFs into image ones.
It would be great if there were some GUI interface

No, this tool keeps text as text. It just reduces the white space on the borders.

If you want a tool that turns text into images and has a GUI, try:

PaperCrop
http://www.mobileread.com/forums/showthread.php?t=31677

OR look for PDFRead
http://www.mobileread.com/forums/showthread.php?t=21906

=X=

yargoflick
01-20-2009, 01:55 PM
Attached is my port of soPdf for Linux.

I included a binary plus my sources for soPdf.c and processPdf.c and a brief readme.

the only dynamic dependencies are to common libs like libjpeg, zlib, freetype, etc so it should work out of the box on any modern distro (I tested on ubuntu 8.04..) I also compiled in the optional jbig2dec dependency.

it works for me, i've used it on multiple pdfs ~15-20MB each.
however, YMMV

thanks theguru :)

Thanks, I'll definitely check this out.

cszhy
01-20-2009, 10:35 PM
Thanks, this tool is perfect for me now!!!
No, this tool keeps text as text. It just reduces the white space on the borders.

If you want a tool that turns text into images and has a GUI, try:

PaperCrop
http://www.mobileread.com/forums/showthread.php?t=31677

OR look for PDFRead
http://www.mobileread.com/forums/showthread.php?t=21906

=X=

drogo
02-06-2009, 01:21 PM
Is there a way to make the text/image bolder or darker?

I just did a comparison between the verboten PDFLRF and soPDF. soPDF is so light as to just about be illegible. On my RS-500 at least. PDFLRF is normal darkness.

Any idea or tips?

Thanks!

=X=
02-06-2009, 02:47 PM
Is there a way to make the text/image bolder or darker?

I just did a comparison between the verboten PDFLRF and soPDF. soPDF is so light as to just about be illegible. On my RS-500 at least. PDFLRF is normal darkness.

Any idea or tips?

Thanks!

soPDF really just strips the margins off of a PDF and does not modify the font. You can change the PDF to landscape; this will make the font size bigger.

Lastly, you might want to try PDFRead or PaperCrop if you want to increase the font size, but these tools turn your PDF into an image.

Also, there is an article, "Poor boys way of editing a PDF file (http://www.mobileread.com/forums/showthread.php?t=10066&highlight=poor+boys+way+editing+pdf+files+%28mostly+linux%2C+cygwin%29)", which does change the font style to bold. I tried this and it did work, but boy, was it an effort.

drogo
02-07-2009, 07:17 PM
Hmm, so then was the end product different in pdflrf because it was changed to an image? Or because it converts it to LRF?

=X=
02-07-2009, 09:58 PM
Hmm, so then was the end product different in pdflrf because it was changed to an image? Or because it converts it to LRF?

For Both reasons

The final product was much bigger. With PDFLRF, a file of about 1/2 MB turned into about 20MB (results may vary) because each page was an image. You also lose the ability to resize the fonts because they are no longer fonts but images.

Also, PDFLRF produces an LRF as the final product, so only the SONY products could view the file.

With SoPDF the file remains a PDF, all that is modified is the margins and the layout (portrait vs landscape) so any software that reads PDF can view the file.


=X=

Staz
02-11-2009, 02:39 AM
About how long does it take to turn a page that contains diagrams etc. on the PRS-505 using this method?

=X=
02-11-2009, 01:31 PM
Text-based LRF is the fastest. However, a PDF is always faster than an image-based LRF. Files produced by PDFLRF or PDFRead will usually run slower. Their advantage is that they completely remove white space and can embolden the text to make it more readable. With soPDF you can remove the margins, but you also keep a text-based PDF, so it is smaller, turns pages quicker, and if you are in a bind reading the fonts you can increase the font size (reflow the PDF) and read it that way. Images are usually lost when you do so.

Personally I prefer soPDF; ever since SONY fixed their PDF issues, PDF is not so bad on the reader, especially if the file has been created for the reader, as FeedBook does or as you can do yourself.

=X=

Staz
02-11-2009, 03:42 PM
Text-based LRF is the fastest. However, a PDF is always faster than an image-based LRF. Files produced by PDFLRF or PDFRead will usually run slower. Their advantage is that they completely remove white space and can embolden the text to make it more readable. With soPDF you can remove the margins, but you also keep a text-based PDF, so it is smaller, turns pages quicker, and if you are in a bind reading the fonts you can increase the font size (reflow the PDF) and read it that way. Images are usually lost when you do so.

Personally I prefer soPDF; ever since SONY fixed their PDF issues, PDF is not so bad on the reader, especially if the file has been created for the reader, as FeedBook does or as you can do yourself.

=X=

soPDF sounds very nice! Is there any chance of you taking a photo of a reasonably complicated technical document (reformatted using soPDF) displaying on your Sony PRS-505? Ideally one with diagrams and tables.

Perhaps you could compare these to documents converted by pdfRead and pdflrf?

I have heard so much about these wonderful tools but have seen very few (if any) conversion results! :)

Thanks!

mita
02-17-2009, 04:46 PM
I have made a very crude tool which allows batch processing using the SoPDF application. It runs as an excel spreadsheet using simple macros (see attached). Please feel free to use / improve as required.

Regards,

Andy
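
For anyone who would rather not use a spreadsheet, the same batch idea can be sketched in a few lines of Python. This is only an illustration, not the attached macro: the function names, the `_soPDF` output suffix, and the assumption that `soPdf` is on the PATH are all made up for the example; only the `-i` and `-o` options come from the usage text in the first post.

```python
import subprocess
from pathlib import Path


def batch_commands(pdf_paths, suffix="_soPDF", exe="soPdf"):
    """Build one soPdf argument list per input PDF (nothing is executed)."""
    commands = []
    for path in map(Path, pdf_paths):
        # "book.pdf" -> "book_soPDF.pdf", written next to the input file.
        out = path.with_name(path.stem + suffix + path.suffix)
        commands.append([exe, "-i", str(path), "-o", str(out)])
    return commands


def run_batch(directory="."):
    """Run soPdf on every PDF in a directory and report the exit codes."""
    for cmd in batch_commands(sorted(Path(directory).glob("*.pdf"))):
        # Passing a list avoids quoting problems with spaces in file names.
        result = subprocess.run(cmd, check=False)
        print(cmd[2], "->", result.returncode)
```

Calling `run_batch("C:/books")` would process the whole folder; `batch_commands` is kept separate so the command lines can be inspected before anything is run.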

frabjous
02-22-2009, 01:05 AM
I really like soPdf -- it works very well, and fast too. Thanks.

The only issue I've been having is with the metadata command lines when the title or author you want has spaces in it. Sometimes it just seems not to do anything: it just retains whatever value was in the original pdf.

And other times it doesn't give you what you wanted.

For example:

sopdf -i in.pdf -o out.pdf -a Shakespeare, William -t The Tempest

seems to yield a title of just The and an author of just Shakespeare,. Putting the whole in quotation marks, i.e.:


sopdf -i in.pdf -o out.pdf -a "Shakespeare, William" -t "The Tempest"

gives the right result for the title, but makes the author "Shakespeare, William" with quotation marks.

I feel like I must be missing something obvious here, but is there a way to do this right? Of course, I can fix the metadata fields on Acrobat or Calibre or other programs, but I'd rather just do it at the same time.
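
To show why the unquoted form loses everything after the first word, here is a small Python sketch of how a command line is split into arguments. It uses `shlex.split`, which follows POSIX shell rules rather than the Windows command prompt, so treat it as an approximation; for these two lines the whitespace and double-quote handling is the same. The `option_value` helper is a hypothetical stand-in for a simple option parser, not soPdf's own code.

```python
import shlex

unquoted = 'sopdf -i in.pdf -o out.pdf -a Shakespeare, William -t The Tempest'
quoted = 'sopdf -i in.pdf -o out.pdf -a "Shakespeare, William" -t "The Tempest"'

# Split each command line into the argument list a program would receive.
unquoted_args = shlex.split(unquoted)
quoted_args = shlex.split(quoted)


def option_value(args, flag):
    """Return the single argument that follows flag."""
    return args[args.index(flag) + 1]


print(option_value(unquoted_args, "-a"))  # Shakespeare,
print(option_value(unquoted_args, "-t"))  # The
print(option_value(quoted_args, "-a"))    # Shakespeare, William
print(option_value(quoted_args, "-t"))    # The Tempest
```

With standard splitting the quotation marks are consumed before the program sees the author, so the literal quotes in the output metadata would seem to come from how soPdf itself handles the argument, though that is a guess without reading its source.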

oldsouth
02-24-2009, 11:07 PM
I am having problems getting this to work. I have VB 2008 on board, but when I double click on the soPDF.exe icon, the command prompt flashes. So I open a command prompt session..... this is where I need some help. If I type c:\soPDF.exe -i book title.pdf it starts to work, only to end with errors and no output. Can someone please show me the correct syntax for making this work? (I have soPDF and the pdf's on c: - no folders)

Do I type this: "c:\soPDF.exe -i Guns, germs, and steel_the fate of hum - Jared Diamond.pdf"

or "c:\soPDF.exe -i c:\Guns, germs, and steel_the fate of hum - Jared Diamond.pdf"

Or something else.... really lost on this one..

frabjous
02-25-2009, 11:42 PM
Try:

soPDF.exe -i "Guns, germs, and steel_the fate of hum - Jared Diamond.pdf"

...with quotations around the file name.

Or else try renaming the input .pdf so it doesn't have spaces in it.

frabjous
03-02-2009, 12:04 PM
I've been using SoPDF a lot, so I went ahead and made a very simple GUI interface for Windows for using it. Again, I mainly made it for myself, but I thought I'd share it in case anyone else might find it worthwhile. I hope theguru or anyone else won't mind.

It looks like this:

http://people.umass.edu/phil795l/sopdfwin.png

It's very barebones, and not at all idiot-proofed. (E.g., it does not check to make sure the options make sense, it will not tell you if SoPDF encountered errors, etc.)

Usage: Extract the sopdfwin.exe file from the sopdfwin.zip file into the same directory as SoPDF.exe, and run it instead. (Or create a shortcut to it.) You choose your options, click process, and it will call SoPDF for you.

The .ahk file included in the .zip file is the source code for the GUI interface written for AutoHotKey (http://www.autohotkey.com). Make changes if you wish. AutoHotKey does not need to be installed to run the .exe, however. (It would be needed to recompile any changes, of course.)

Note, I still don't have an answer to the question about author names, etc., I posted above; that would have to be fixed on the other end. Right now, however, quotation marks are automatically put around the filenames and metadata fields when they are fed into SoPDF.

(Edited to include =X='s suggestions below.)

=X=
03-02-2009, 01:48 PM
@frabjous Great job, thank you for the GUI. You know what would be a nice feature? If the tool parsed the file name when it was selected.

Example we have a PDF with the filename

Plato - Republic.pdf

When selecting the file the author selection puts in "Plato" and the Title is "Republic"

I know I can type it myself, but tools are supposed to make life much easier :).

Thank you,
=X=

frabjous
03-02-2009, 02:03 PM
I agree that that would be nice. You just mean from the file name chosen (rather than, e.g., trying to read it from the metadata in the input .PDF)?

Next question: should it parse the input filename, or the output filename?

And would everyone agree that it should be "Author - Title" rather than "Title - Author" or something else? (I can imagine some fairly sophisticated parsing algorithms, as well as some very barebones ones; the latter are of course easier to code!)

I believe that sopdf will retain the author and name from the input .PDF if these flags are not used (or left blank in my GUI), although I had meant to test that [EDIT: tested and it does]; until theguru (or someone else) fixes the issue in sopdf.exe with quotation marks in the author field I pointed to above, I typically would prefer to make use of that feature than a lot of filename parsing. I guess filename parsing could be made optional, or done with a button or something.

=X=
03-02-2009, 03:08 PM
I agree that that would be nice. You just mean from the file name chosen (rather than, e.g., trying to read it from the metadata in the input .PDF)?

Yes that is what I mean.


Next question: should it parse the input filename, or the output filename?

Defiantly the input filename. The output filename should auto fill to reflect the input name with a modification like: "inputname_soPDF.pdf".


And would everyone agree that it should be "Author - Title" rather than "Title - Author" or something else? (I can imagine some fairly sophisticated parsing algorithms, as well as some very barebones ones; the latter are of course easier to code!)

How hard would it be to add a radio button and allow both options? The ultimate solution would be to do what calibre does, but that is getting complicated. (Calibre allows the user to write some pseudo regular expression and allows for complete customization of the convention.)


I believe that sopdf will retain the author and name from the input .PDF if these flags are not used (or left blank in my GUI), although I had meant to test that [EDIT: tested and it does]; until theguru (or someone else) fixes the issue in sopdf.exe with quotation marks in the author field I pointed to above, I typically would prefer to make use of that feature than a lot of filename parsing. I guess filename parsing could be made optional, or done with a button or something.


Yes it does; however, my experience is that most PDFs, even professionally bought eBooks, don't have that information filled in. Having the ability to enable or disable this would be a good feature.
Maybe you can have a radio option

Autofill:
* None
* Author - Title
* Title - Author

=X=

frabjous
03-03-2009, 12:19 AM
All right; I've modified the GUI interface in response to =X='s suggestions, or at least something close to them. (I deleted the old version, and uploaded the new version, and it's the one attached to the newly edited message above. The image is revised too.)

Yes that is what I mean.

Defiantly the input filename. The output filename should auto fill to reflect the input name with a modification like: "inputname_soPDF.pdf".

Defiantly? :D

OK, I made it so you could do it with either the input filename or the output file name; personally, I get a lot of files with unhelpful names, and so long as I was renaming them in the process, I might as well make it easier on myself. But for others, doing it with the input file should work too.

I also made it to auto-fill the output file name to "inputname-soPDF.pdf", but only if you use one of the Browse buttons. In particular, if you click Browse for the input file name, after you finish, it'll automatically insert that into the output file name. Also, if you click Browse for the output file name, it'll use that as its default "suggestion" (even if the input filename was not filled in through the Browse button), but you can still edit it to something else if you wish.

(The particular ending -soPDF is also ignored in metadata parsing.)

If you don't use the Browse buttons for either, it won't do this, but I figure most people will use them, and it's good to have some flexibility.
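
The GUI itself is written for AutoHotKey, but the parsing described above is simple enough to sketch in Python for anyone who wants to adapt it. This is an illustration of the idea rather than a translation of the .ahk source: the `parse_metadata` name, the `order` argument, and the choice to split on the first " - " are assumptions made for the example.

```python
from pathlib import Path


def parse_metadata(filename, order="author-title", ignored_suffix="-soPDF"):
    """Guess author and title from a file name such as 'Plato - Republic.pdf'."""
    stem = Path(filename).stem
    # Drop the ending that the GUI appends to output names.
    if stem.endswith(ignored_suffix):
        stem = stem[: -len(ignored_suffix)]
    first, sep, second = stem.partition(" - ")
    if not sep:
        # No separator found: treat the whole name as the title.
        return {"author": "", "title": stem.strip()}
    first, second = first.strip(), second.strip()
    if order == "author-title":
        return {"author": first, "title": second}
    return {"author": second, "title": first}


print(parse_metadata("Plato - Republic.pdf"))
print(parse_metadata("Republic - Plato-soPDF.pdf", order="title-author"))
```

Both calls give the author "Plato" and the title "Republic". Splitting on the first " - " only is a deliberate simplification; titles that themselves contain " - " would need something closer to the configurable patterns calibre offers.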

The ultimate solution would be to do what calibre does, but that is getting complicated. (Calibre allows the user to write some pseudo regular expression and allows for complete customization of the convention.)

Yeah, that seemed over the top; especially prior to someone actually asking for it!

Maybe you can have a radio option

Autofill:
* None
* Author - Title
* Title - Author

As you can see in the new image above, I used normal buttons rather than radio buttons. It would have been just as easy with radio buttons, but since I'd probably make it default to "None" with radio options, you'd have to click something anyway. (And having it save your preferred option in some config file seemed over the top...) And a radio button would give the impression that changes made to the filename after having already done the parsing once would automatically end up reflected in the metadata even if the button was not re-clicked, and I wanted to give the flexibility of being able to do the parsing, but then change the filename afterwards while keeping the old parsed metadata values (or vice-versa). You can always change, then reclick this way.

Obviously, someone might prefer the other way of doing it, but there's no pleasing everyone, and I like this way better (...and I really wrote it for my own purposes, after all...)

Thanks for the suggestions!

=X=
03-03-2009, 03:45 PM
Works great frabjous! Thank you

Defiantly? :D

Yes that word is better suited for my spell checker who "defiantly" refuses to cooperate. :p

=X=

frabjous
03-04-2009, 11:34 AM
Works great frabjous! Thank you


No problem.

Does anyone know why this thread is in the LRF subforum rather than the PDF forum, though, considering that SoPdf is PDF > PDF?

P.S. I also have to laugh at myself for using the redundant phrase "GUI Interface" above. Does this mean I no longer have the right to laugh at people who say "ATM machine" or "PIN number"?

=X=
03-16-2009, 02:21 AM
No problem.

Does anyone know why this thread is in the LRF subforum rather than the PDF forum, though, considering that SoPdf is PDF > PDF?

P.S. I also have to laugh at myself for using the redundant phrase "GUI Interface" above. Does this mean I no longer have the right to laugh at people who say "ATM machine" or "PIN number"?

This thread was created before the "E-Book Format" section was created. When the mods were reshuffling the messages they put this in the LRF section. I did point this out to the mods (look for an earlier thread), but I was ignored, imagine that :).

Anyhow, I agree that this thread should be moved, because this tool is useful to many people outside of the SONY Reader world.

=X=

jimmyzou
03-16-2009, 11:16 AM
When I tried a very simple PDF file, an error window popped up like this:


I've been using SoPDF a lot, so I went ahead and made a very simple GUI interface for Windows for using it. Again, I mainly made it for myself, but I thought I'd share it in case anyone else might find it worthwhile. I hope theguru or anyone else won't mind.

It looks like this:

http://people.umass.edu/phil795l/sopdfwin.png

It's very barebones, and not at all idiot-proofed. (E.g., it does not check to make sure the options make sense, it will not tell you if SoPDF encountered errors, etc.)

Usage: Extract the sopdfwin.exe file from the sopdfwin.zip file into the same directory as SoPDF.exe, and run it instead. (Or create a shortcut to it.) You choose your options, click process, and it will call SoPDF for you.

The .ahk file included in the .zip file is the source code for the GUI interface written for AutoHotKey (http://www.autohotkey.com). Make changes if you wish. AutoHotKey does not need to be installed to run the .exe, however. (It would be needed to recompile any changes, of course.)

Note, I still don't have an answer to the question about author names, etc., I posted above; that would have to be fixed on the other end. Right now, however, quotation marks are put around the filenames and metadata fields automatically when fed into SoPDF.

(Edited to include =X='s suggestions below.)

frabjous
03-16-2009, 12:14 PM
My guess is that you didn't unzip the GUI executable into the same directory as SoPDF.exe, as instructed.

The GUI does not contain SoPDF. You need to download and extract it first (as per the first post in this thread). Then extract SoPDFwin.exe into the same directory.

jimmyzou
03-16-2009, 02:05 PM
My guess is that you didn't unzip the GUI executable into the same directory as SoPDF.exe, as instructed.

The GUI does not contain SoPDF. You need to download and extract it first (as per the first post in this thread). Then extract SoPDFwin.exe into the same directory.

They are in the same directory .... Any other hints? do I have to put them in C:\ ?

frabjous
03-16-2009, 06:36 PM
They are in the same directory .... Any other hints? do I have to put them in C:\ ?

By "them", do you mean the PDFs, or the SoPDF.exe and SoPDFwin.exe executables? I can tell from your error message that the PDFs are on D: Where are the executables?

It's worth a try running it from C:. Since I only wrote the shell, I don't know what the precise conditions are for running SoPDF itself, but I've only tested it from C.

What is drive D:?

Could you provide a screenshot of the complete contents of the folder containing SoPDFwin.exe and SoPDF.exe? What OS is this?

Have you tried running sopdf from a command line rather than using my GUI? Does it work then?

EDIT: I just tried running them from a flashdrive, and it worked fine. That is, however, the precise error message I get if sopdf.exe and sopdfwin.exe are not in the same folder. You did unzip both, right?

jimmyzou
03-16-2009, 07:22 PM
It's working now. Somehow the computer I was using at work has a problem with it. It works perfectly on my home laptop.

frabjous
03-16-2009, 10:37 PM
Is the one at work running Vista with UAC enabled, or does it have some other advanced security enabled? It wouldn't surprise me if that kept one executable from calling another. Hard to say, really.

Anyway, I'm glad you figured it out for home at least. :D

All4Fun
05-01-2009, 11:04 AM
My apologies for being a little "slow" today but I'm having a hard time understanding the fit options (2xWidth, 2xHeight, etc.).

All I want to do is cut off the 1" margins around the pdf document so that it's 7.5"x10" for a typical letter sized document. Which fit option do I select to do that?

EDIT: I figured it out (perhaps I should "try" all the options before asking)...like I said, I was a little slow today. Thanks.

=X=
05-01-2009, 11:33 AM
My apologies for being a little "slow" today but I'm having a hard time understanding the fit options (2xWidth, 2xHeight, etc.).

All I want to do is cut off the 1" margins around the pdf document so that it's 7.5"x10" for a typical letter sized document. Which fit option do I select to do that?

Use the "-m 3" option. This removes the margins for the top and bottom (fits the height) of the PDF document. There will still be margins on the left/right of the document.


Using the "-m 2" option trims the width and concats the height, so you will have the bottom portion of a PDF page show up on the top portion of the next PDF page. I find this confusing to read since I have to find my place on every page turn, instead of just staring at the top of the page.

=X=

Linkinsoldier
05-04-2009, 02:59 PM
I have a question:

I use many PDFs and I converted them with soPDF. It works just fine. The only problem is that my chapter marks are deleted while converting. Is there a possibility to keep them?

Takes hours to "remark" them later ;)

Thanks,
Linkin

=X=
05-04-2009, 04:32 PM
I have a question:

I use many PDFs and I converted them with soPDF. It works just fine. The only problem is that my chapter marks are deleted while converting. Is there a possibility to keep them?

Takes hours to "remark" them later ;)

Thanks,
Linkin

Yes, that is a known issue. I even reported it a while back; theguru's response was that he was aware of this but does not know how to fix it.

I did find a tool that stripped out bookmarks and restored them. But this was a $200 shareware program... needless to say I've learned to live with the bookmark limitation. I just add them with a tool called BeCyPDFMetaEdit.

It usually only takes me 5-15 min depending on the number of chapters.

=X=

Linkinsoldier
05-04-2009, 05:22 PM
Well, thanks. As I've made all of my PDFs now, there aren't going to be more soon ;)

Thank you anyway!

enarchay
05-31-2009, 08:55 PM
For some reason, when I try to open the program, it opens for a split second, then immediately closes. I tried opening it repeatedly and this keeps happening. I also tried "Run as Administrator" and that didn't help either. What can I do?

=X=
05-31-2009, 09:32 PM
For some reason, when I try to open the program, it opens for a split second, then immediately closes. I tried opening it repeatedly and this keeps happening. I also tried "Run as Administrator" and that didn't help either. What can I do?

Do you have the runtime library installed? The instructions are in the first post.

Also this is a command line tool so you have to run this from the command line.

=X=

enarchay
05-31-2009, 09:59 PM
Do you have the runtime library installed? The instructions are in the first post.

Not sure. I only saw one file attached so that's what I downloaded. Let me re-read it.


Also this is a command line tool so you have to run this from the command line.


Is the command line tool the .exe file? Because that's what will not stay opened for me.

EDIT

I just downloaded VC runtime library (this is it, right (http://www.microsoft.com/downloads/details.aspx?FamilyID=9b2da534-3e03-4391-8a4d-074b9f2bc1bf&displaylang=en)?) and it's still not working.

Shouldn't the GUI allow me to use this without the command line tool? When I try to use the GUI, and I hit "submit" to process my files, the soPdf.exe file flashes up for a split second then disappears - and it doesn't process anything.

=X=
06-01-2009, 12:08 AM
That's it. There is no GUI that comes with the tool. soPDF is a command line only tool. Another member wrote a GUI around the executable. Personally I would get the command line tool working before I would start using the GUI.
Open a CMD window and run the tool.

=X=

enarchay
06-01-2009, 12:47 PM
That's it. There is no GUI that comes with the tool. soPDF is a command line only tool. Another member wrote a GUI around the executable. Personally I would get the command line tool working before I would start using the GUI.
Open a CMD window and run the tool.

=X=

I was trying to use the GUI, but it won't work because the command line program only opens for a split second and then closes. So, for example, if I hit "process," the command line opens for a split second, but it doesn't process the file since it closes too quickly.

I don't know much about using a command line program. I tried to use it through the CMD window and it wouldn't convert either of the two PDFs I tried - maybe I did something wrong.

I'd rather use the GUI for obvious reasons. But I can't with the error I'm experiencing. What can I do?

frabjous
06-01-2009, 12:57 PM
I was trying to use the GUI, but it won't work because the command line program only opens for a split second and then closes. So, for example, if I hit "process," the command line opens for a split second, but it doesn't process the file since it closes too quickly.

I don't know much about using a command line program. I tried to use it through the CMD window and it wouldn't convert either of the two PDFs I tried - maybe I did something wrong.

Your problems are probably related.

All the GUI does is take the options you give it, and call the commandline program. It doesn't even check the output of the commandline.

I wrote for the GUI, "It's very barebones, and not at all idiot-proofed. (E.g., it does not check to make sure the options make sense, it will not tell you if SoPDF encountered errors, etc.)"

If SoPDF encounters errors when called from the GUI, the command line will open, show the error message, but then close immediately. Not very helpful, I admit, but my goals were very limited when I created the GUI. I was mainly doing it for my own convenience, but I thought I'd share just in case anyone else found it useful.

However, if you run sopdf directly from the commandline, the error message should stay visible. What does it say? What options are you using in particular?

Maybe that'll help us figure out what went wrong, though there are some PDFs I've just never been able to get it to work right with. I don't know why. I had hoped to look at sopdf's source code at some point, though I've been far too busy.
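A wrapper that holds on to the error text instead of letting the window vanish might look something like the Python sketch below. It is not the AutoHotKey GUI; the function names are made up, the flags are the ones from soPdf's usage text in the first post, and whether soPdf sets a nonzero exit code on failure is an untested assumption, which is why the printed output is kept as well.

```python
import subprocess
from pathlib import Path


def build_command(exe, input_pdf, output_pdf, mode=None, title=None, author=None):
    """Assemble the argument list using flags from soPdf's usage text."""
    cmd = [str(exe), "-i", str(input_pdf), "-o", str(output_pdf)]
    if mode is not None:
        cmd += ["-m", str(mode)]
    if title:
        cmd += ["-t", title]
    if author:
        cmd += ["-a", author]
    return cmd


def run_sopdf(cmd):
    """Run the command and return (exit code, everything it printed)."""
    exe = Path(cmd[0])
    if not exe.is_file():
        # Catches the common case of the executable not being where expected.
        return 1, f"{exe} not found; is it in the same folder as the wrapper?"
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

Passing the arguments as a list means filenames with spaces need no extra quotation marks, and the returned text can be shown in a message box rather than a console that closes immediately.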

enarchay
06-01-2009, 01:35 PM
However, if you run sopdf directly from the commandline, the error message should stay visible. What does it say? What options are you using in particular?

When I try to open the original program (soPdf.exe) by clicking on it, the window opens for a split second and then closes. If I run it through CMD.exe, the list of command functions comes up, and that's it - no error message (that is, until I try to convert a PDF).

I attached a screen cap. It doesn't look that good because I had to print screen really fast before the program closed. As you can see, there doesn't seem to be any errors - yet it closes anyway.

frabjous
06-01-2009, 01:54 PM
If I run it through CMD.exe, the list of command functions comes up, and that's it - no error message (that is, until I try to convert a PDF).

What I want to know is what error message you get when you try to convert a pdf.

E.g., if from CMD, you type:

sopdf -i input.pdf -o output.pdf

(Assuming input.pdf is the file you're trying to convert, and it's in the same directory as sopdf.)

If you just type in SoPdf.exe from CMD, without telling it what file to convert, getting the list of options is perfectly normal. Trying to run SoPdf.exe from within Windows and just having it open for a split second and then close is also perfectly normal.

When you spoke of the "GUI" earlier, did you mean the sopdfwin.exe file I posted? Or did you just mean trying to click on sopdf.exe and getting the screen to flash?

enarchay
06-01-2009, 02:15 PM
Okay, I just renamed a PDF "input.pdf." It's in the same folder as sopdf.exe (\PDF).

I tried:

Desktop\PDF\sopdf -i input.pdf -o output.pdf

(Because it starts with: C:Users\Me>)

It didn't work. It says, "Cannot open file for reading. No such file or directory."

So then I tried:

Desktop\PDF\sopdf -i C:Users\Me\Desktop\PDF\input.pdf -o C:Users\Me\Desktop\PDF\output.pdf

That didn't work either.

So then I tried:

sopdf -i input.pdf -o output.pdf

Nope. "sopdf is not recognized as an internal or external command, operable program, or batch file."

When you spoke of the "GUI" earlier, did you mean the sopdfwin.exe file I posted? Or did you just mean trying to click on sopdf.exe and getting the screen to flash?

I meant sopdfwin.exe. It doesn't process the files - and I think it's because sopdf.exe closes immediately after it opens. For example, if I process a file, sopdf.exe opens for a split second then closes, then sopdfwin.exe says "Process another file?" - when it didn't process the first file.

frabjous
06-01-2009, 04:03 PM
I tried:

Desktop\PDF\sopdf -i input.pdf -o output.pdf

(Because it starts with: C:Users\Me>)

It didn't work. It says, "Cannot open file for reading. No such file or directory."

So then I tried:

Desktop\PDF\sopdf -i C:Users\Me\Desktop\PDF\input.pdf -o C:Users\Me\Desktop\PDF\output.pdf

That didn't work either.

So then I tried:

sopdf -i input.pdf -o output.pdf

Nope. "sopdf is not recognized as an internal or external command, operable program, or batch file."

You need to actually navigate to the folder where SoPdf.exe and the input.pdf are located in CMD. So, first put in,

cd Desktop\PDF\

(The DOS prompt should change to show your new location.) And then type in:

sopdf -i input.pdf -o output.pdf

Either that, or pass the entire path to SoPdf.exe, i.e.:

Desktop\PDF\SoPdf -i "C:\Users\Me\Desktop\PDF\input.pdf" -o "C:\Users\Me\Desktop\PDF\output.pdf"

(You don't actually need to rename it as input.pdf -- just change it to whatever the name of the file is. If it has spaces in its name, you have to use quotation marks as I just have.)
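The reason the first attempt failed is that a bare filename is looked up relative to the folder the prompt is currently in, not the folder the executable lives in. The small Python sketch below only illustrates that rule with path arithmetic; the function name is hypothetical and nothing here is part of soPdf.

```python
from pathlib import PureWindowsPath


def resolve_argument(current_dir: str, argument: str) -> str:
    """Show which file a program started in current_dir would look for."""
    arg = PureWindowsPath(argument)
    if arg.is_absolute():
        # A full path is used as given, whatever the current folder is.
        return str(arg)
    # A bare or relative name is resolved against the current folder.
    return str(PureWindowsPath(current_dir) / arg)
```

With the prompt at C:\Users\Me, the argument input.pdf resolves to C:\Users\Me\input.pdf, which does not exist; after cd Desktop\PDF\ the same argument resolves to the file inside the PDF folder.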

I meant sopdfwin.exe. It doesn't process the files - and I think it's because sopdf.exe closes immediately after it opens. For example, if I process a file, sopdf.exe opens for a split second then closes, then sopdfwin.exe says "Process another file?" - when it didn't process the first file.

It just assumes it processed it. It doesn't check.

Did you put sopdf.exe and sopdfwin.exe in the same folder, as instructed?

enarchay
06-01-2009, 06:58 PM
Did you put sopdf.exe and sopdfwin.exe in the same folder, as instructed?

Yes. But as I said it doesn't work - not sure why.

enarchay
06-01-2009, 07:04 PM
You need to actually navigate to the folder where SoPdf.exe and the input.pdf are located in CMD. So, first put in,

cd Desktop\PDF\

(The DOS prompt should change to show your new location.) And then type in:

sopdf -i input.pdf -o output.pdf

Either that, or pass the entire path to SoPdf.exe, i.e.:

Desktop\PDF\SoPdf -i "C:\Users\Me\Desktop\PDF\input.pdf" -o "C:\Users\Me\Desktop\PDF\output.pdf"

(You don't actually need to rename it as input.pdf -- just change it to whatever the name of the file is. If it has spaces in its name, you have to use quotation marks as I just have.)



I get the same error. This makes no sense. The file "input.pdf" is in my PDF folder. What the hell.

=X=
06-01-2009, 07:49 PM
sopdf -i input.pdf -o output.pdf

Nope. "sopdf is not recognized as an internal or external command, operable program, or batch file."


This alone tells me either your sopdf is corrupted or your executable "soPDF.exe" is not in the path.

I know you did have SoPDF working at one point because you showed it in a previous post.

Try the following:
1) Redownload the executable.
2) Try typing "sopdf.exe" at the command line and see if that gives you a help menu.
3) If not, try ".\sopdf.exe".
If you do get the help, then type
4) .\sopdf.exe -i input.pdf -o output.pdf

Note: add the ".exe" to the command. Usually you don't have to type the extension, but if you have a file name conflict then you must type it out completely.


Last comment you've typed "C:User\me\<snip>" Is this what you are really typing because you are missing a "\" after the colon, it should be "C:\User\me\<snip>"

=X=

enarchay
06-01-2009, 08:45 PM
I know you did have SoPDF working at one point because you showed it in a previous post.

It works - i.e. I get the help menu. I just can't get it to convert. However, the GUI doesn't seem to work at all: it opens, but it doesn't actually cause sopdf to do any converting.

Note add the ".exe" to the command.

Yeah, I tried it that way as well.


Last comment you've typed "C:User\me\<snip>" Is this what you are really typing because you are missing a "\" after the colon, it should be "C:\User\me\<snip>"

I forgot to add the "\" in my post. In the actual testing I used it.

I'll try redownloading the file.

enarchay
06-01-2009, 09:18 PM
Okay. I reinstalled (from the link in the first post) and it still doesn't work. I attached a screen cap.

frabjous
06-01-2009, 10:20 PM
3) IF not try ".\sopdf.exe"
If you do get the help then type
4) .\sopdf.exe -i input.pdf -o output.pdf

That looks like Linux advice to me. Shouldn't be an issue in DOS.

Okay. I reinstalled (from the link in the first post) and it still doesn't work. I attached a screen cap.

That looks like something wrong with the PDF itself. Is the PDF encrypted, or DRMed or something like that? Could you try it with a different PDF?

You could try it with this file (http://people.umass.edu/klement/LinkReview.pdf), since I just tried it with that file, and so I know it works. (And since I made the PDF I know it's not encrypted, etc.)

I would guess you're getting the same error with the GUI; it just disappears before you can read it.

enarchay
06-01-2009, 10:28 PM
I downloaded your file and it still doesn't work. Wtf?

frabjous
06-01-2009, 10:45 PM
How doesn't it work? What's the message?

enarchay
06-01-2009, 11:01 PM
How doesn't it work? What's the message?

Same one as before.

frabjous
06-02-2009, 11:50 AM
Can you open the PDF in a normal PDF viewer?

Is the PDF still open when you try to convert it? If so, close any programs in which you're currently viewing it.

Could this be a Vista UAC thing? Try moving everything over to a directory like C:\PDF\, so it's not under C:\Users and trying from there.

=X=
06-02-2009, 01:17 PM
That looks like Linux advice to me. Shouldn't be an issue in DOS.

No this is DOS advice too. It looked like he either had a corrupted soPDF or an executable conflict. I wanted to make sure he was executing the correct soPDF.exe .

His previous problem was the executable was not working.

There is some progress: it looks like it's now the PDF that is not working.

=X=

=X=
06-02-2009, 01:21 PM
Same one as before.
Actually, you are reporting different error messages. Can you take a screenshot of the command line and the error?

Also do a content listing of the directory.

On the prompt type dir

Desktop\PDF\>dir

=X=

enarchay
06-02-2009, 01:41 PM
Can you open the PDF in a normal PDF viewer?

Yes.

Is the PDF still open when you try to convert it? If so, close any programs in which you're currently viewing it.

No. It's not open when I try to convert.

Could this be a Vista UAC thing? Try moving everything over to a directory like C:\PDF\, so it's not under C:\Users and trying from there.

I'll try it.

enarchay
06-02-2009, 01:47 PM
Actually, you are reporting different error messages. Can you take a screenshot of the command line and the error?

Also do a content listing of the directory.

On the prompt type dir

Desktop\PDF\>dir

=X=

Here are the screencaps.

=X=
06-02-2009, 02:44 PM
There is your problem. When you renamed your file you called it "Input.pdf.pdf"

type this command instead

sopdf.exe -i input.pdf.pdf -o output.pdf

or rename the input file


Prompt> move input.pdf.pdf input.pdf
Prompt>sopdf.exe -i input.pdf -o output.pdf



What happens is that Windows Explorer has this "feature" that hides the extension. When you rename a file and think you are typing the extension, you are not: you're only changing the name, and the hidden ".pdf" is still appended.

The command line however does not do this and requires the exact file name.

I hate that feature and disable it immediately.
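A quick way to spot files that picked up a doubled extension this way is sketched below in Python. The helper names are hypothetical, and the check only looks at the name itself, so it can be pointed at whatever a directory listing returns.

```python
from pathlib import PureWindowsPath


def has_doubled_extension(filename: str) -> bool:
    """True for names like 'input.pdf.pdf' where the extension repeats."""
    suffixes = [s.lower() for s in PureWindowsPath(filename).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] == suffixes[-2]


def fixed_name(filename: str) -> str:
    """Drop one copy of a repeated extension; leave other names alone."""
    if has_doubled_extension(filename):
        return str(PureWindowsPath(filename).with_suffix(""))
    return filename
```

For "Input.pdf.pdf" this suggests "Input.pdf", while a name such as "notes.v2.pdf" is left untouched because its two suffixes differ.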

enarchay
06-02-2009, 02:57 PM
There is your problem. When you renamed your file you called it "Input.pdf.pdf"

Ouch. Can't believe I didn't notice that.

I tried another file and this one didn't work. (See attachment.) What can I do when a file doesn't work? It looks fine in PDF reader.

Also, how can I make it so the output is in portrait rather than landscape?

frabjous
06-04-2009, 03:30 PM
What happens is that Windows Explorer has this "feature" that hides the extension. When you rename a file and think you are typing the extension, you are not: you're only changing the name, and the hidden ".pdf" is still appended.

I like that you put "feature" in scare quotes. "Annoyance" would be more like it. (Luckily you can turn it off.)

Ouch. Can't believe I didn't notice that.

I tried another file and this one didn't work. (See attachment.) What can I do when a file doesn't work? It looks fine in PDF reader.

I think there are some PDF versions soPDF just can't handle. Not sure if that's what you're seeing or what.

Also, how can I make it so the output is in portrait rather than landscape?

Try experimenting with the "mode of operations" options. That is, instead of what you had before, do:

("Fit Height x2")
sopdf -i input.pdf -m1 -o output.pdf

or

("Fit Height")
sopdf -i input.pdf -m3 -o output.pdf
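Since the quickest way to understand the modes is to try each one, a small Python helper can generate one command per implemented mode so the results can be compared side by side. This is only a sketch: the mode names come from the usage text in the first post (the * there is taken to mark the default), the output naming scheme is an assumption, and modes 4 and 5 are left out because the usage text lists them as not yet implemented.

```python
# Mode numbers and names as listed in soPdf's usage text.
MODES = {
    0: "fit 2xWidth",
    1: "fit 2xHeight",
    2: "fit Width",
    3: "fit Height",
}


def commands_for_all_modes(input_pdf: str) -> list:
    """One sopdf command per implemented mode, each with its own output file."""
    stem = input_pdf[:-4] if input_pdf.lower().endswith(".pdf") else input_pdf
    return [
        ["sopdf", "-i", input_pdf, "-m", str(mode), "-o", f"{stem}-m{mode}.pdf"]
        for mode in MODES
    ]
```

Each inner list can be handed to subprocess.run, or joined with spaces and pasted into CMD when the filename contains no spaces.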

Actually now that you've got it working for at least some files, you could also try the GUI again, and see if it works at all.

enarchay
06-05-2009, 08:19 PM
I got the GUI working. Some files don't work because of "errors." I tried it with proceed with errors but some still do not work. Others that do work still look just as bad on my ereader - and I think it's because they are A4 size.

skyfish
06-08-2009, 01:18 AM
I use both the GUI and the command line tool. worked great. fast and gives small(er) files.

One question: is there an option to thicken the font? The font looks too thin on the 500. I used pdflrfwin.exe. I liked its ability to thicken the font, and it looks pretty good on the PRS-500, but it is slow and the converted files are much too big.

Or is there any other tool that can thicken/change the font in a PDF file?

frabjous
06-08-2009, 12:21 PM
There aren't any options besides those provided in the GUI, currently. The source code is there, however, if you want to play with it. Too bad theguru hasn't been around lately, though.

I don't really understand what you're looking for, though. Does the PRS-500 even support PDFs? I thought it didn't.

PDFLRF works by converting the files to images, which is why the files are so big, but that's what makes it possible to darken the image. (It isn't changing the font.) If you want a different font but still a text-based file, you'd be better off, I think, trying to convert the PDF to a different format and playing with it from there. How well that works depends a lot on the specifics of the PDF.

whatnopaper
06-08-2009, 12:38 PM
Hi theguru,

Thanks for an excellent tool. After trying a bunch of tools to convert pdfs for my sony reader, this one works best for me. :thumbsup:

The conversion from pdf to pdf is very fast.

For further development, is it possible to slightly nudge text that is hanging off too far to the left or right as is the case with chapter headers? That way the white space crop will be able to cut more and produce a better result.

eg (original)
Chapter 1.

The quick brown fox jumps over the lazy dog.

Should be changed to :
Chapter 1.
The quick brown fox jumps over the lazy dog.

skyfish
06-10-2009, 12:03 AM
There aren't any options besides those provided in the GUI, currently. The source code is there, however, if you want to play with it. Too bad theguru hasn't been around lately, though.

I don't really understand what you're looking for, though. Does the PRS-500 even support PDFs? I thought it didn't.

PDFLRF works by converting the files to images, which is why the files are so big, but that's what makes it possible to darken the image. (It isn't changing the font.) If you want a different font but still a text-based file, you'd be better off, I think, trying to convert the PDF to a different format and playing with it from there. How well that works depends a lot on the specifics of the PDF.

The 500 does display PDF, but not well at all. I like soPDF because it keeps the PDF format. But because of the 500's poor support for PDF, the font still looks faint. PDFLRF works fine, but it takes much longer to convert, and the files are much bigger because of the images. Also, I could not get the TOC to work, while soPDF keeps the TOC.

I have tried converting PDFs to other formats. For books that have relatively simple text, it is OK. But most of my PDFs contain pictures, tables, indented code, etc., and I have not yet found any program that can handle those well. soPDF is the closest I can get. If I can change the font it would be almost perfect.

dracodoc
06-12-2009, 01:39 PM
I have a question: what do 2xWidth and 2xHeight mean? I tried them and saw the results, but I don't know why they are called that. Does it mean the page is rotated and split in half, so that each page is turned into exactly 2 pages?

Besides, soPDF is much, much quicker than pdflrf, but I found I still need pdflrf to process scanned PDFs (not just comics; I have lots of scanned PDFs, and I think many of the Google public ebooks available to Sony users are scanned too). The feature of pdflrf I value most is that it crops the edges automatically. I can crop pages in Acrobat, but scanned PDFs can have all kinds of page layouts, so you can't get a good result with a fixed crop box.

I found a program called instacropper which can crop the edges of pictures automatically, but exporting a scanned PDF to JPG increases the file size significantly (the data in a scanned PDF are already images, but converting them to JPG still makes the files a lot bigger), so it is not a good idea.

mianwo
06-19-2009, 12:18 PM
I love this tool!

Before this, I was always painstakingly cropping the PDF pages manually, which inevitably created incomplete pages from time to time, because dimensions that fit one page don't always fit another, even if they are from the same book.
Now I can use this tool to crop those PDFs for me, and won't have to worry about a wrong cropping size. That's so great!

However, soPDF doesn't seem to work very well with PDFs that have different dimensions for different pages. I have some PDFs that use landscape for some pages but portrait for others. By default, soPDF will cut the page in half horizontally through the middle of the page. This works well with portrait pages, but for landscape pages, it's totally wrong.

So I'm thinking if you can add another feature of detecting the page dimension first and then decide how to cut the page based on the real dimension, this tool will be perfect!
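The per-page decision suggested here could be as simple as comparing each page's width and height before choosing the cut. The Python sketch below is only an illustration of that idea, not anything taken from soPdf's source; the function name and the returned labels are made up.

```python
def choose_cut(width: float, height: float) -> str:
    """Decide how to halve a page based on its own dimensions."""
    if width > height:
        return "vertical"    # landscape: split into left and right halves
    return "horizontal"      # portrait or square: split into top and bottom halves
```

A letter page of 612 x 792 points would get the horizontal cut soPDF already uses, while the same page in landscape (792 x 612) would be cut vertically instead.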

olin
06-22-2009, 05:22 PM
I tried to run it on a Windows XP x64 machine and got "... side-by-side configuration is incorrect ..." error. Is it a problem with x64? Thanks.

joedevon
06-27-2009, 01:47 PM
I have a question: what do 2xWidth and 2xHeight mean? I tried them and saw the results, but I don't know why they are called that. Does it mean the page is rotated and split in half, so that each page is turned into exactly 2 pages?

I'm totally baffled by that too. Hopefully someone will answer.

soPDF is the closest I can get. if I can change the font it would be almost perfect.
I agree 100%!

frabjous
06-27-2009, 01:51 PM
Yes, 2xHeight and 2xWidth both cut the page in half and rotate it, in different ways. Just try them both and see the results you get to compare. I'm not exactly sure why they're called that myself. I would have called them something else.

Changing the font is not in the works for this kind of converter. You'd be better off trying to convert the PDF to some other format, and then change the font, and if need be, convert back to PDF.

joedevon
06-28-2009, 03:04 PM
Yes, 2xHeight and 2xWidth both cut the page in half and rotate it, in different ways. Just try them both and see the results you get to compare. I'm not exactly sure why they're called that myself. I would have called them something else.

Changing the font is not in the works for this kind of converter. You'd be better off trying to convert the PDF to some other format, and then change the font, and if need be, convert back to PDF.

I've tried so many things, kinda giving up on being able to change the font. Now the only thing I'm hoping to find is a way to add bookmarks to chapters from outside the Reader. Doesn't seem like Calibre will do it & no luck w/ the other tools I've tried. BookDesigner has been way too frustrating to deal w/ ...at least w/ the PDFs I'm working on...shame because it looked to be the easiest way to deal w/ bookmarks.

frabjous
06-29-2009, 06:27 PM
I really only use SoPDF for personal viewing, and then I don't find adding bookmarks to be worthwhile.

Since I'm a LaTeX user, if I wanted to do this, I would do it with pdflatex and the pdfpages and/or hyperref packages.

However, what is your input file format? Already PDF? Do the PDFs already have bookmarks, and you're just trying to preserve them through the conversion process, or are you adding new ones?

=X=
06-29-2009, 07:11 PM
I've tried so many things, kinda giving up on being able to change the font. Now the only thing I'm hoping to find is a way to add bookmarks to chapters from outside the Reader. Doesn't seem like Calibre will do it & no luck w/ the other tools I've tried. BookDesigner has been way too frustrating to deal w/ ...at least w/ the PDFs I'm working on...shame because it looked to be the easiest way to deal w/ bookmarks.

To change the font (warning: not for the faint of heart), see the link to "Poor mans way to edit PDF files." (http://www.mobileread.com/forums/showthread.php?t=10066&highlight=poor+boys+way+editing+pdf+files+%28mostly+linux%2C+cygwin%29)

To add/edit Meta data on PDF I use BeCyPDFMetaEdit (http://www.mobileread.com/forums/showthread.php?t=27627&highlight=BeCyPDFmetaedit).

=X=

joedevon
06-29-2009, 08:33 PM
Since I'm a LaTeX user, if I wanted to do this, I do such things with pdflatex and the pdfpages and/or hyperref packages.

Never heard of LaTeX.

However, what is your input file format? Already PDF?
Yes.

Do the PDFs already have bookmarks, and you're just trying to preserve them through the conversion process, or are you adding new ones?

If I can preserve them that would be dandy. There are some instances where editing them would be nice, but far from needed. soPDF is stripping them.

To change font(warning not for the faint of heart) see link to "Poor mans way to edit PDF files."

Got scared by that one already :)

To add/edit Meta data on PDF I use BeCyPDFMetaEdit.

Interesting! Are Bookmarks included in the definition of metadata?

It sounds like it has problems depending on what version the source file is though...

=X=
06-30-2009, 10:02 AM
If I can preserve them that would be dandy. There are some instances where editing them would be nice, but far from needed. soPDF is stripping them.

Look for a tool called PDFBookmark

It's a command line shareware tool which can extract bookmarks to an XML file and also creates bookmarks from the XML file.

The trial version adds the random string "[ TRIAL * ]" to a few bookmarks, but with BeCyPDFMetaEdit you can remove them.



Interesting! Are Bookmarks included in the definition of metadata?

yes

=X=

joedevon
07-03-2009, 02:29 AM
Look for a tool called PDFBookmarks

It's a command line shareware tool which can extract bookmarks to an XML file and also creates bookmarks from the XML file.

The trial version adds the random string "[ TRIAL * ]" to a few bookmarks, but with BeCyPDFMetaEdit you can remove them.



yes

=X=

Great! :thanks:

moonlit
07-08-2009, 09:38 AM
Tons of thanks to theguru for this program.

HarperCollins books were mangled on my Reader so I couldn't see chapter breaks, but when I put the files I bought through sopdf they display beautifully!

Pictures here: http://www.mobileread.com/forums/showthread.php?t=50408

hansl
08-03-2009, 04:45 AM
Look for a tool called PDFBookmarks

It's a command line shareware tool which can extract bookmarks to an XML file and also creates bookmarks from the XML file.

The trial version adds the random string "[ TRIAL * ]" to a few bookmarks, but with BeCyPDFMetaEdit you can remove them.

=X=

thanks for the hint. Searching for PDFBookmark and "command line" yields even better results. Note the singular of PDFBookmark.

hansl

inew
09-25-2009, 01:38 AM
I am wondering whether this effort can be merged with the multi-column converter?
http://www.mobileread.com/forums/forumdisplay.php?f=184

I see a strong use case for a program with combined functions.

=X=
09-26-2009, 03:23 PM
@inew It would be nice, but the functionality of the two programs is quite different. One program converts the PDF to images and manipulates the images. The other program crops the PDF.

kergoth
09-27-2009, 08:27 PM
I just had to post a reply saying that soPdf is awesome. Technical PDFs were nigh unreadable without it. Either everything is so small I'd get a headache from reading it, or I'd lose formatting and not be able to read diagrams and code snippets. Thanks to soPdf, I can actually enjoy reading coding books on my PRS-505. Now this should hold me over till the Plastic Logic device shows up.. Thanks, soPdf author! :D

kergoth
09-28-2009, 01:22 AM
First of all, I finally got a ereader (PRS-505) and I love it, except the screen is a lot smaller than I expected. I don't read novels (pft who has time for those! :p) but textbooks and papers, so I really needed something to cut out the margins and basically maximize reading area. Thankfully soPdf does exactly that... but for windows :smack:

Attached is my port of soPdf for Linux.

I included a binary plus my sources for soPdf.c and processPdf.c and a brief readme.

the only dynamic dependencies are to common libs like libjpeg, zlib, freetype, etc so it should work out of the box on any modern distro (I tested on ubuntu 8.04..) I also compiled in the optional jbig2dec dependency.

it works for me, i've used it on multiple pdfs ~15-20MB each.
however, YMMV

thanks theguru :)

What version of fitz/mupdf is appropriate for building this? The july 09 versions don't seem suitable, as there have been some changes. Will look into what those changes are, but I'd like to avoid learning that much about pdfs ;)

kergoth
10-29-2009, 02:02 PM
It seems that a good chunk of the PDFs I have run sopdf against end up losing their metadata. These are shown on the sony reader (and in Preview) with no title or author. Prior to running sopdf, the title and author were just fine. The sopdf execution went successfully, and viewing it one can see that it worked, cropped and split as they should be, just the metadata was removed.

=X=
10-30-2009, 12:46 PM
If what you mean by missing metadata is "bookmarks", yes, that is a known issue; see post #123 on this thread.

grimborg
11-02-2009, 09:06 AM
I'm getting this error: This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem.

Any ideas?

I never use Windows so I am (happily) illiterate on it. That was on a (vmwared) wxp.

(Yep, tried wine, doesn't work on it either)

frabjous
11-02-2009, 09:27 AM
grimborg,

There's a linux port of sopdf on page 2 of the thread. There's no need to try Wine or fire up a VM.

grimborg
11-02-2009, 09:42 AM
GNU/Linux users, check unpnup:

http://www.mobileread.com/forums/showthread.php?t=14340&highlight=unpnup

It uses pdftk and poster to split pdfs.

EDIT: The port in #43 also removes margins, thus it's probably a better alternative.

http://www.mobileread.com/forums/showpost.php?p=328780&postcount=43

grimborg
11-02-2009, 09:42 AM
grimborg,

There's a linux port of sopdf on page 2 of the thread. There's no need to try Wine or fire up a VM.


Thanks! Didn't see it! :D

0plus1
11-06-2009, 10:09 AM
Hi, I'm new on the ebook scene, so I don't really know if this has been forgotten in favor of something newer, but since it gave me the best results I hacked together a small Windows GUI in 5 minutes. This is more than enough for my needs; if enough people are interested I will expand it. Either way, it works just as it is.

Currently it forces -e, since many PDFs would return an error even while producing a good final result.

sopdf.exe not included, you have to put it in the same directory.

frabjous
11-06-2009, 10:59 AM
Hi, I'm new on the ebook scene, so I don't really know if this has been forgotten in favor of something newer, but since it gave me the best results I hacked together a small Windows GUI in 5 minutes. This is more than enough for my needs; if enough people are interested I will expand it. Either way, it works just as it is.

Currently it forces -e, since many PDFs would return an error even while producing a good final result.

sopdf.exe not included, you have to put it in the same directory.

If you look at page 2 (http://www.mobileread.com/forums/showthread.php?t=32066&page=2) of the thread (post #63), you'll see that I had also made a Windows GUI for it. I haven't tried yours yet, so I can't compare. (Indeed, I've pretty much stopped using Windows entirely in the meantime. I still use sopdf, though... only the linux version.)

0plus1
11-06-2009, 02:00 PM
If you look at page 2 (http://www.mobileread.com/forums/showthread.php?t=32066&page=2) of the thread (post #63), you'll see that I had also made a Windows GUI for it. I haven't tried yours yet, so I can't compare. (Indeed, I've pretty much stopped using Windows entirely in the meantime. I still use sopdf, though... only the linux version.)

Damn, I totally missed this :-(
I wouldn't have done it myself if I had seen yours, especially since yours is better than mine.

I will use yours :D

osama_jamal
11-12-2009, 03:50 PM
Man first of all my deepest respect and gratitude for your program.

I was so frustrated by SONY's inability to view large PDFs.
Now, I have hope thanks to your tool =)

I have a suggestion: I tried to use the tool on a scanned book in Arabic (my mother tongue) and faced a problem with white space. The program didn't detect it, and it remained in the book.

Best of luck!

grimborg
11-13-2009, 05:52 AM
I'm using Adobe Acrobat to remove the white space. You can use the trial version, it's available on their web site.

It runs on Wine too.

Is there a Free-as-in-speech tool to crop pdfs?

frabjous
11-13-2009, 07:35 AM
SoPDF is designed to crop the whitespace. If it's not working, it's some kind of bug. Also try PaperCrop. (There's a thread on it here. Too lazy to find the URL to post.)

There are lots of other such tools. Most LaTeX distributions (which are usually considered Free Software by people who track such things) come with a pdfcrop tool (command line).

If I'm not entirely mistaken, I think even calibre comes with a similar tool.

vietchovui
12-09-2009, 01:56 AM
soPdf is quite good. I've tried many conversion programs and found that Solid PDF Converter and InDesign CS4 are the best tools! I convert PDF books with many tables and graphics to RTF with Solid PDF Converter, and then use InDesign CS4 to convert the RTF back to PDF. The result is excellent!

gdsense
12-13-2009, 07:51 PM
Hello, people...

I've spent my entire weekend to build a simple GUI for SoPDF...(Still learning C#)

And then I've found there are already a bunch of nice GUI tools (including Excel) presented in this thread.

Anyway, I'm posting mine...

It's written in C#, so .NET Framework 3.5 needs to be installed first.

Also, the GUI executable (SoPDF GUI 091215.exe) should be in the same location as "soPdf.exe".

Compared to frabjous's work, mine is quite immature; just try it for fun...

Update on 09/12/15: GUI can be run with Options containing Space Characters.

http://mycyclopedia.tistory.com/207

frabjous
12-13-2009, 08:05 PM
Thanks for the plug, but really, mine could use a lot of work -- especially with regard to catching and reporting errors to the user.

Since I've switched to linux, however, I don't have the inclination myself...

gdsense
12-13-2009, 08:13 PM
Thanks for the plug, but really, mine could use a lot of work -- especially with regard to catching and reporting errors to the user.

Since I've switched to linux, however, I don't have the inclination myself...

Thank you for the reply...

Maybe I should recompile it using mono.net framework...:) for linux users...

frabjous
12-13-2009, 08:22 PM
Not a bad idea, but don't do it on my account. I've gotten quite used to doing it through the command-line on linux (or actually with a custom bash script hard-wired to my favorite settings).

gdsense
12-13-2009, 08:47 PM
Not a bad idea, but don't do it on my account. I've gotten quite used to doing it through the command-line on linux (or actually with a custom bash script hard-wired to my favorite settings).

Actually, I have only minimal experience with Linux.
I just used SUSE for numerical analysis purposes (on X Windows)...

I think improving the Windows version of the GUI and then porting it to other OSes is the right strategy...

Thank you for the advice...

tofuman
12-14-2009, 04:28 PM
It's been 2 weeks I'm trying to use this program, to no avail.

I installed framework but I still get error messages. (I also put exe files and the source file in the same directory)

soPdf ver 0.1 alpha Rev 12
A program to reformat pdf file for sony reader

Input : purgatorio.pdf
Output: purg2.pdf


Processing input page : Error: .\processPdf.cpp(403) : processErrorPage() - Cannot process page 1
Error: .\mupdf\pdf_page.c(241) : pdf_loadpage() - cannot load page resources
Error: .\mupdf\pdf_resources.c(430) : pdf_loadresources() - cannot load xobject resource
Error: .\mupdf\pdf_resources.c(165) : preloadxobject() - cannot load image resource 4
Error: .\mupdf\pdf_stream.c(505) : pdf_loadstream() - cannot open stream (4)
Error: .\mupdf\pdf_stream.c(454) : pdf_openstream() - cannot create filter
Error: .\mupdf\pdf_stream.c(306) : pdf_buildfilter() - cannot create filter
Error: .\mupdf\pdf_stream.c(127) : buildonefilter() - unknown filter name (JBIG2Decode)
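The last line of that log says the mupdf code inside this soPdf build does not know the JBIG2Decode filter, which scanned PDFs often use for their page images. A rough way to spot such files before converting is sketched below. It assumes the filter name is stored uncompressed in the file; it can also sit inside a compressed object stream, so a "no" from this check is not conclusive. `uses_jbig2` is a made-up helper name, not part of soPdf.

```shell
#!/bin/sh
# Heuristic sketch: does this PDF mention the JBIG2Decode filter?
# uses_jbig2 is a hypothetical helper, not something soPdf provides.
uses_jbig2() {
    # -a makes grep treat the binary PDF as text; -q only sets the exit status.
    grep -a -q '/JBIG2Decode' "$1"
}

# Example: list the PDFs in the current folder that would likely hit the error.
# for f in *.pdf; do uses_jbig2 "$f" && echo "JBIG2 images: $f"; done
```

Files flagged this way would need a build with jbig2dec support (the Linux port mentions compiling it in) or a different tool.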

ambertape
12-16-2009, 10:04 PM
My Sony PRS-500 was modified by Sony to work with ePub and Adobe Acrobat formats. Will your program be able to do the following? Many PDF files I download are from a magazine called maximumpc, and their articles are of the scanned type; by this I mean they scan each page and then save it as a PDF file. Will your program convert such a PDF into a version that can be read on my Sony PRS-500 reader? Will I be able to increase the font size in either your program or another PDF-creating program? Which is the latest soPDF version with a GUI that I should use? Thanks a lot.

Ambertape

frabjous
12-16-2009, 11:24 PM
Ambertape,

It doesn't look like the author is still checking this thread. AFAIK, the version posted in the first post of the thread is the only version that exists. You can use either my or gdsense's GUI with it.

If the PDF started off as a scan, there'll be no way to change the font size. This program does not convert images to text. You'll need an OCR program to do that. Whether or not soPDF will work well enough to make it readable on your screen depends a lot on the dimensions and make-up of the magazine. Personally, I tend to have better luck with PDFLRF (http://www.mobileread.com/forums/showthread.php?t=13135) for scanned PDFs. If the magazine organizes things into columns, you might also try PaperCrop (http://www.mobileread.com/forums/showthread.php?t=31677).

tricos
12-25-2009, 04:37 AM
Hi, I just got a kindle and spent the whole day researching on best way to get my journal articles on this thing... Kinda disappointed with the options, but sopdf is about the best way to get screwed since you're going to be any way you go...

I don't even think he did anything else with it after first creating it... You can't download the exe or see his code on his google site...

Anyways, I just spent a couple hours at 2 am writing something to get all my journal articles from my folders converted so I could copy and paste them on the kindle so I figured I'd drop the code here so someone more motivated than me can maybe run with it, and make it worth a dang other than just a one time thing for myself late at night...


Imports System.Diagnostics
Imports System.IO

Public Class Form1

    ' Converted files are written to a mirror of the G:\ tree under OutputRoot.
    Private Const SourceRoot As String = "G:\"
    Private Const OutputRoot As String = "G:\KindleOutput\"
    Private Const SoPdfExe As String = "c:\sopdf.exe"

    Private Sub btnFolder_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnFolder.Click

        Using objFolderDialog As New FolderBrowserDialog()
            ' Only take over the path if the dialog was not cancelled.
            If objFolderDialog.ShowDialog() = Windows.Forms.DialogResult.OK Then
                Me.txtFolder.Text = objFolderDialog.SelectedPath
            End If
        End Using

    End Sub

    Private Sub btnExit_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnExit.Click

        Me.Close()

    End Sub

    Private Sub btnRun_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnRun.Click

        Try

            Me.btnExit.Enabled = False
            Dim folder As String = Me.txtFolder.Text

            If Not String.IsNullOrEmpty(folder) Then
                ProcessDirectory(New DirectoryInfo(folder))
            End If

        Catch ex As Exception

            ' Report the problem instead of silently swallowing it.
            MessageBox.Show(ex.Message)

        Finally

            Me.btnExit.Enabled = True

        End Try

    End Sub

    ' Maps G:\some\file.pdf to G:\KindleOutput\some\file.pdf.
    Private Function ToOutputPath(ByVal sourcePath As String) As String
        Return sourcePath.Replace(SourceRoot, OutputRoot)
    End Function

    Private Sub ProcessDirectory(ByVal objDir As DirectoryInfo)

        ' Never descend into the output tree, or it would be mirrored into itself.
        If (objDir.FullName & "\").StartsWith(OutputRoot, StringComparison.OrdinalIgnoreCase) Then Return

        ' CreateDirectory does nothing if the folder already exists.
        Directory.CreateDirectory(ToOutputPath(objDir.FullName))

        For Each objSubDir As DirectoryInfo In objDir.GetDirectories()
            ProcessDirectory(objSubDir)
        Next

        ProcessFiles(objDir)

    End Sub

    Private Sub ProcessFiles(ByVal objDir As DirectoryInfo)

        For Each objFile As FileInfo In objDir.GetFiles("*.pdf")

            Dim outputPath As String = ToOutputPath(objFile.FullName)

            ' Skip files that were already converted on an earlier run.
            If File.Exists(outputPath) Then Continue For

            ' Quote both paths so that names containing spaces survive.
            Dim args As String = String.Format("-i ""{0}"" -o ""{1}""", objFile.FullName, outputPath)

            ' Wait for each conversion to finish before starting the next one,
            ' so it never opens 50 files at once (this replaces the fixed
            ' 2 second sleep after each file).
            Using objProcess As Process = Process.Start(SoPdfExe, args)
                objProcess.WaitForExit()
            End Using

        Next

    End Sub

End Class
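For anyone on Linux, the same batch idea can be sketched in a few lines of shell. This is only an illustration: `convert_tree` and the `SOPDF` variable are names invented here, the destination folder is assumed to lie outside the source folder, and only the -i and -o options from soPdf's usage text are used.

```shell
#!/bin/sh
# Sketch: mirror a folder tree of PDFs through soPdf, one file at a time.
# convert_tree and SOPDF are illustrative names, not part of soPdf.
SOPDF=${SOPDF:-sopdf}

convert_tree() {
    src=$1      # source folder, given without a trailing slash
    dst=$2      # output folder, assumed to be outside $src
    find "$src" -type f -name '*.pdf' | while IFS= read -r in; do
        rel=${in#"$src"/}                 # path relative to the source folder
        out=$dst/$rel
        [ -e "$out" ] && continue         # already converted on an earlier run
        mkdir -p "$(dirname "$out")"
        # The shell waits for each run to finish, so no sleep is needed.
        "$SOPDF" -i "$in" -o "$out" < /dev/null
    done
}

# Example: convert_tree "$HOME/articles" "$HOME/KindleOutput"
```

Running the conversions strictly one after another is the design choice here: it keeps the load predictable and makes it obvious which file a given error message belongs to.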

Nathan Campos
12-27-2009, 09:12 AM
Thanks very much for this useful software.

I'm also making a GUI frontend for it, to make it easier for other users ;)

mypolar
12-27-2009, 05:41 PM
Finally :thumbsup: I am able to read reports and journals that I had pretty much given up on reading with my Sony. I just set the soPdf to Fit 2x Height and it converts perfectly for my Sony 505. I can enlarge it to medium and still keep ALL of my graphs and diagrams intact. Yippee.

Special Thank You to theguru for developing this wonderful help & frabjous for making it easier to understand for a non techie-non cmd user like me!


:thanks:

Nathan Campos
12-29-2009, 05:33 PM
Finally I've finished the GUI front-end for soPDF, take a look soPDF GUI Front-End (http://www.mobileread.com/forums/showthread.php?t=67739) ;)

MrKyle
12-31-2009, 06:25 PM
Thanks to you all. I was using pdflrfwin before. I see now that there's little point in coding a GUI for it unless I merge tools.

Nathan Campos
12-31-2009, 11:30 PM
Hmmm....
Good to know ;)

greenapple
02-08-2010, 07:02 PM
soPdf and the windows gui (by frabjous) are the MOST USEFUL tools in my chest of converters. Thanks very much to the creators.

By the way, is there a way to change the page size so that all pages are the same? After conversion, some pages are larger than others (when viewed with a regular PDF viewer, e.g. PDF X-Change). When I view them on my reader device, using auto full-screen resize (which is the only viable way to read the converted PDF on my jetBook), some pages 'jump' at me, because the font size is unexpectedly enlarged many times. This happens when some pages are of a smaller size (after processing), and are enlarged on the reader.

Edit: I forgot to mention that this happens only with landscape processing

radamo
02-13-2010, 10:46 AM
Wow... Just found sopdf. "So" useful, "so" much better than every other converter. Many thanks to the author.
RA

damag
04-13-2010, 08:59 AM
Hi, has anyone been able to compile the Linux version of sopdf? I can't seem to find a version of the mupdf library that works with it. The README contains some broken links and some vague instructions on updating the mupdf code to work with sopdf.

All I'm trying to do is compile a 64-bit version, since I can't run the 32-bit version provided in this thread. Any help would be appreciated, thanks.

frabjous
04-13-2010, 09:27 AM
The linux version posted earlier in the thread is already compiled and works just fine for me on 64 bit Ubuntu. I don't know whether there are any compatibility libraries needed which I just happened to have anyway, but I'll investigate when I get a chance.

damag
04-13-2010, 10:23 AM
Hi Frabjous, the GUI program seems to be 64-bit but the command line 'sopdf' tool (which I am trying to use) is only 32-bit. I verified this with the 'file' command:

$ file sopdf
sopdf: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, stripped
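Where `file` is not at hand, the same check can be approximated by reading the ELF header directly. A minimal sketch follows; `elf_bits` is a hypothetical helper name, and it relies only on the fifth byte of an ELF file being 1 for 32-bit and 2 for 64-bit binaries.

```shell
#!/bin/sh
# Sketch: print 32 or 64 depending on the ELF class byte of a binary.
# elf_bits is a made-up helper; `file` reports the same information.
elf_bits() {
    # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    [ "$magic" = 7f454c46 ] || { echo "not an ELF file" >&2; return 1; }
    # Byte at offset 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d '[:space:]')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown ELF class" >&2; return 1 ;;
    esac
}

# Example: elf_bits ./sopdf
```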

If the code can't be compiled, do you know what compatibility libraries I can install to run 32-bit binaries?

Thanks

frabjous
04-13-2010, 11:10 AM
No doubt the executable was compiled on 32 bit, but it runs for me on 64 bit.

Trust me. I know I helped compile the GUI, but I don't use it. (It's just a wrapper for the command-line version anyway.) I use the commandline, and I'm sure I've never compiled it.

When I get a chance, I'll fire up a liveCD to investigate what other than default stuff it needs, if anything.

damag
04-13-2010, 02:08 PM
Thanks, I forgot to mention that I am using Fedora 12. When I try to run the 32-bit compiled executable from this thread, I get:

$ ./sopdf
bash: ./sopdf: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

Which makes sense, since those libraries are all under /lib64 on my system, not /lib. I'd prefer to be able to compile the code myself, but if I need to install some legacy 32-bit libraries then so be it. =) I'm not even sure if Fedora provides a package for that in their repositories... everything in Fedora runs as native 64-bit.

Maybe when I get some time, I'll try to change the code so that it compiles against the standard mupdf library. Do you have any idea why it needed patching in the first place?

frabjous
04-13-2010, 03:36 PM
Yeah, after investigating, it seems that to run the 32-bit executable in Ubuntu 64 bit, you need to install the ia32-libs package, which is something I install routinely anyway.

(I just tested that with a LiveCD--actually it was a Linux Mint live CD, since that's the only 64 bit distro live CD I had available--but installing that and no additional packages I was able to run sopdf. I doubt it's any different for Ubuntu, since LM is based on Ubuntu.)

I don't know whether or not there's anything equivalent to ia32-libs for Fedora. I found this (http://www.metztli-it.com/blog/blog12.php/2009/05/09/providing-32-bit-application-support-und) when Googling; it looks like it might be worth a shot if you can't figure out a way to compile the source.

No, I don't know anything about the sopdf source; I don't even know C.

damag
04-13-2010, 03:55 PM
Thanks for your help, I just figured out how to get the binary running with 32-bit libraries after reading the comments in this post (http://beginlinux.com/blog/2009/09/installing-32-bit-support-into-64-bit-fedora-11/comment-page-1/#comment-1385). It's just a matter of installing the libraries but specifying the i686 architecture for each:

$ sudo yum install glibc.i686 libjpeg.i686 zlib.i686 freetype.i686 fontconfig.i686

It works now, thanks for all your help. =)

frabjous
04-13-2010, 04:01 PM
Glad to hear it!

MrKyle
04-16-2010, 12:30 PM
Thanks for your help, I just figured out how to get the binary running with 32-bit libraries after reading the comments in this post (http://beginlinux.com/blog/2009/09/installing-32-bit-support-into-64-bit-fedora-11/comment-page-1/#comment-1385). It's just a matter of installing the libraries but specifying the i686 architecture for each:

$ sudo yum install glibc.i686 libjpeg.i686 zlib.i686 freetype.i686 fontconfig.i686

It works now, thanks for all your help. =)

Thanks for commenting as to what fixed it. The problem you were having is a much bigger issue than this particular app I imagine. Now all us non Fedora users know how to ensure compatibility with 32-bit binaries.

roger64
04-29-2010, 02:46 AM
Using Frabjous advice (better late than never), I downloaded and used the soPDF "bundle" on Windows XP.*

I find it amazingly efficient and fast to process a file.

This way I got a new PDF that I could easily read on my PRS-505 (landscape mode). I particularly appreciate the continuous flow of text (no white space between pages). Turning the pages is also twice as quick as with the Acrobat Pro output (spend your money wisely :o ) :thumbsup:

A little sluggish for inserting bookmarks or looking for a new page but the end result is mostly pleasant.

For some pages, the last line (or last two lines) is not so crisp (a little faded, I would say). I do not know what to attribute this small display defect to.

As I am mainly an Ubuntu 32 bits user (today is a festive LTS day!!), is there also a GUI available on the Linux version?

* I also installed today (yes, again better late than never) the great PRS+ software on my PRS-505. :thumbsup:

frabjous
04-29-2010, 08:48 AM
There's a 64-bit linux GUI which I compiled in Nathan's SoPDF Frontend (http://www.mobileread.com/forums/showthread.php?t=67739) thread. It looks like no one ever got around to compiling a 32 bit linux version. You could do it yourself following Nathan's advice there. You could then share it with the community.

If you don't have time, or are too intimidated (which would be understandable; Nathan's instructions are not as clear as they could be...), I'm hoping to get around to doing this myself. I don't have a 32 bit linux system running right now, but I've been thinking about trying Lubuntu on my Notebook, which is only available 32 bit if I'm not mistaken. I was going to wait until the semi-official Lucid version is out, which I think should be in the next few days (Ubuntu/Kubuntu were released today, as you know.) If I end up doing that, I'll be happy to compile the GUI, but you'll have to give me a week or so.

Once you figure out what your favorite options are, it would also be trivial to write a bash script which you could call from within the right-click context menu when browsing folders in nautilus. I do that on my machine. I'd be happy to share mine if you tell me what your favorites are.
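A wrapper of that kind might look roughly like the sketch below. It is not the script mentioned above: `sony_name`, `sopdf_fav`, the "-sony" suffix and the chosen option values are all placeholders, and only -i, -o, -m and -v are taken from soPdf's usage text.

```shell
#!/bin/sh
# Sketch of a "favorite settings" wrapper around soPdf.
# sony_name, sopdf_fav and the -sony suffix are illustrative choices.
SOPDF=${SOPDF:-sopdf}

sony_name() {
    # book.pdf -> book-sony.pdf, in the same folder as the input
    printf '%s\n' "${1%.pdf}-sony.pdf"
}

sopdf_fav() {
    for f in "$@"; do
        # -m 0 (fit 2xWidth) and -v 2 are the defaults listed in the usage
        # text; swap in whatever settings you prefer.
        "$SOPDF" -i "$f" -o "$(sony_name "$f")" -m 0 -v 2
    done
}

# Example: sopdf_fav paper1.pdf "lecture notes.pdf"
```

Saved as an executable script that ends with `sopdf_fav "$@"`, this could be dropped into the file manager's scripts folder, assuming the file manager hands the selected files to the script as arguments.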

For some pages, the last line (or last two lines) is not so crisp (a little faded, I would say). I do not know what to attribute this small display defect to.

This is just a hunch, but I suspect this has nothing to do with soPDF. The Sony Reader, or at least the 505, when viewing a PDF in landscape mode, so that you only see half the page at once, will fade out the edge bordering the other half of the page. It's its way of reminding you that there's more to see.

Personally, I find it very annoying.

Actually, one great way around this is to use the "Fit2xWidth" (or -m0) option in soPDF, which will, in addition to cropping, split the pages in half and rotate them 90 degrees. That way, when you load them on the Sony, it'll look just like it does in landscape mode while the device is still in portrait mode, so you won't get these faded areas. (You'll also get portrait menus, which I prefer to the landscape ones, because the numbers line up with the number buttons...)
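Since the mode names are easier to remember than the numbers, a small lookup can translate between them. The sketch below uses the name/number pairs from the usage text in the first post; `mode_number` itself is a made-up helper, and the two "smart fit" modes are left out because the usage text marks them as not yet implemented.

```shell
#!/bin/sh
# Sketch: translate soPdf mode names into the numbers expected by -m.
# mode_number is a hypothetical helper, not part of soPdf.
mode_number() {
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
        fit2xwidth)  echo 0 ;;   # the default
        fit2xheight) echo 1 ;;
        fitwidth)    echo 2 ;;
        fitheight)   echo 3 ;;
        *) echo "unknown or unimplemented mode: $1" >&2; return 1 ;;
    esac
}

# Example: sopdf -i book.pdf -o book-sony.pdf -m "$(mode_number Fit2xWidth)"
```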

roger64
04-29-2010, 09:56 AM
Thanks for your compilation proposal, but I am afraid I'm too thick for that. I've just used sudo make install a couple of times, and that's it. I once proposed a 32-bit x86 deb for Sigil this way, but it had no future.

If this fading is intentional, it's a weird and dumb idea... So thanks also for explaining the Fit2xWidth workaround. The name looked so obscure that I dared not try this option.

OK. I tried it: it's better, quicker and no fading. Deviously nice. :)

:thanks:

NB: all Ubuntu versions not equally ready now but the servers are already jammed.
http://iso.qa.ubuntu.com/qatracker/build/all/all

Well, even this site is down now...

frabjous
04-29-2010, 10:59 PM
Yeah, there's a reason the Fit2xWidth option is default, though I don't really understand the naming conventions myself. Too bad theguru isn't around to ask...

I went ahead and compiled a 32 bit linux binary for Nathan's GUI. You'll find it, packaged with the sopdf binary, in Nathan's thread:

http://www.mobileread.com/forums/showpost.php?p=890654&postcount=36

I'd appreciate it if you let me know if it works for you, since I've only tested it on one system.

I downloaded Kubuntu 10.04 earlier today, which actually wasn't too bad to download, though I haven't gotten around to installing it. I also installed Lubuntu 10.04 (though it's unofficial) on my Notebook. Nice and fast in general, though updates are taking forever today, yeah.

roger64
04-30-2010, 02:58 AM
Your Linux binary port is working lightning fast on Ubuntu Karmic 32 bits. I also could read the output window (text ending with: saved), while on XP, I could only read it if it was reporting a failure for some reason (for example, failed to process because it's an image PDF).

To be sure, I did it twice. :p

PS: I am going to try to pay you with an unrelated tip that maybe will complete your signature. :)
If you use Grub2, have available a USB key, FAT32 formatted, you can install MultibootV3 . It's free, GPL and has an Ubuntu deb. It's intended to allow booting from any iso file placed on a USB key. It's incredibly easy to use.

You just drop nearly any iso in a small window (see photo) and have it copied onto the key. You can then try it purely Live. If you like it, you can use this iso in a persistent way (it's just a matter of settings).
You can add and use as many isos on your single partition as there is space available on it. Ideal for testing or demonstration purposes or for using LiveCD like Clonezilla Live, RescueCD, Acronis boot iso, whatever.
I find it amazing: http://liveusb.info/dotclear/

flyash
06-14-2010, 04:23 PM
I have a pdf (image scan), and I tried using soPdf (w/ Windows GUI) to crop the margins. The program runs through all the pages (I can see this happening in the cmd window), but the resulting file is only 1kb and contains no pages.

Any suggestions appreciated.

frabjous
06-14-2010, 04:28 PM
I have a pdf (image scan), and I tried using soPdf (w/ Windows GUI) to crop the margins. The program runs through all the pages (I can see this happening in the cmd window), but the resulting file is only 1kb and contains no pages.

Any suggestions appreciated.
Does the cmd window show any errors at any point? Or just the page numbers?

SoPDF is rather temperamental, and in general I find that it's not the most effective tool when it comes to scanned PDFs anyway. Depending on your goals, I would suggest another tool, such as BRISS. (http://www.mobileread.com/forums/showthread.php?t=83055)

flyash
06-14-2010, 04:36 PM
Does the cmd window show any errors at any point? Or just the page numbers?

SoPDF is rather temperamental, and in general I find that it's not the most effective tool when it comes to scanned PDFs anyway. Depending on your goals, I would suggest another tool, such as BRISS. (http://www.mobileread.com/forums/showthread.php?t=83055)
I just ran sopdf from the command prompt. It processes all the input pages (runs through all 200+ pages), but doesn't copy output pages, then it saves the file. No error messages.
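Because soPdf printed no error in this case, a size check on the result is one way to catch such silent failures when converting many files. A minimal sketch, where `looks_empty` is an invented helper and the 2048-byte default threshold is an arbitrary guess rather than anything soPdf defines:

```shell
#!/bin/sh
# Sketch: flag output files that are too small to contain real pages.
# looks_empty is a hypothetical helper; the threshold is a guess.
looks_empty() {
    [ -f "$1" ] || return 2                  # no output file at all
    size=$(( $(wc -c < "$1") ))              # size in bytes
    [ "$size" -lt "${2:-2048}" ]
}

# Example:
# sopdf -i in.pdf -o out.pdf
# looks_empty out.pdf && echo "out.pdf has almost no content" >&2
```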

Thanks, I'll take a look at Briss.

Update: Briss did a good job cropping this pdf.

humore
07-16-2010, 05:57 AM
I sincerely hope theguru will come back and make it even better. The cropping is not consistent, with some of the pages left uncropped. Has anyone else had similar problems?

stjoe
08-16-2010, 10:03 PM
Hi All,

Newbie here with Kindle DX.

I tried to use sopdf to split PDF 2 column using fit 2xWidth but the output PDF pages are rotated (upside down).

I don't know if this rotated page is specifically for Sony PRS ereader, but for Kindle, we don't need the page rotated.

Can anyone help me here?

Thanks

frabjous
08-17-2010, 08:16 AM
SoPDF is not designed to handle multi-column PDFs. The 2x options were designed for splitting a single page vertically (and then rotating for viewing on the reader in landscape), not for splitting up columns. You might have better luck with something like BRISS. (http://sourceforge.net/projects/briss/)

However, if SoPDF manages to cut your multicolumn PDFs in the right place (this would be luck), I suppose you could use jPDFTweak (http://jpdftweak.sourceforge.net/) or pdftk (http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/) to rotate them back upwards, but I bet it would take another run through SoPDF to get rid of the middle margins.

SunLight
09-30-2010, 09:38 AM
Any update on working with comic books?

"Cannot yet convert the comic book. It can still split the image pdfs into two." When I tried it on comic book images combined into a PDF, it was able to crop it but not remove any of the dead white space/margins.

And is there a way to change the resolution/size of the cropping so it works for the Kindle 3?

frabjous
09-30-2010, 01:19 PM
Could you be a little clearer? Removing the dead white space/margins is what cropping means, so I don't know what you could mean by saying that it was able to crop but not do that. Do you mean that it split but didn't crop? Are these comics scanned? Scanned whitespace is not real whitespace so that won't get cropped. BRISS (http://www.mobileread.com/forums/showthread.php?t=83053) might be a better tool for that if so.

aquawater
10-04-2010, 02:11 AM
Processing input page : 486warning: masks can not have colorspace, proceeding anyway.

Copying output page : 964Error: .\stream\obj_dict.c(96) : fz_deepcopydict() - assert: not a dict (<nil>)

I get this error when converting.
No parameters set, except -i and -o.
Any clue?

SunLight
10-06-2010, 05:10 PM
Could you be a little clearer? Removing the dead white space/margins is what cropping means, so I don't know what you could mean by saying that it was able to crop but not do that. Do you mean that it split but didn't crop? Are these comics scanned? Scanned whitespace is not real whitespace so that won't get cropped. BRISS (http://www.mobileread.com/forums/showthread.php?t=83053) might be a better tool for that if so.

BRISS will not be a good program for this as the white space margins are not the same on each page.

frabjous
10-06-2010, 10:37 PM
BRISS will not be a good program for this as the white space margins are not the same on each page.

Well, BRISS does sort pages into groups with similar margins and lets you crop each group differently, but if every page really is different, then yeah, it might not work. Have you considered trying something like the script I give at the bottom of this thread (http://www.mobileread.com/forums/showthread.php?p=1139595#post1139595) using Ghostscript/calibre? (Though I doubt that'll work too well if it's a scan either.)

EbokJunkie
12-23-2010, 05:22 PM
Could someone explain how page size figures into the conversion? I have a Kindle 3 and a Sony 950, and their screen dimensions, especially the screen heights, are apparently different.
Does soPdf default to 6 inches?

Also, I don't see a soPdf source download on code.google.com.
Is source code still somehow available?

pherodeon
01-29-2011, 11:31 AM
Would it be possible to crop each page of a .pdf into 4 parts?

I would like to read on my Kindle 3 in four steps per page, as shown in this video at 2:30:

http://www.youtube.com/watch?v=co5bPVY95DY&feature=channel

SoPdf works very well, but the letters are still too small for me.

Thank you very much

frabjous
01-29-2011, 01:40 PM
Would it be possible to crop each page of a .pdf into 4 parts?

BRISS (http://sourceforge.net/projects/briss/) can do that. It would be tricky to do with soPDF alone.

luma
02-11-2011, 10:53 PM
Yeah, there's a reason the Fit2xWidth option is default, though I don't really understand the naming conventions myself. Too bad theguru isn't around to ask...

I went ahead and compiled a 32 bit linux binary for Nathan's GUI. You'll find it, packaged with the sopdf binary, in Nathan's thread:

http://www.mobileread.com/forums/showpost.php?p=890654&postcount=36

I'd appreciate it if you let me know if it works for you, since I've only tested it on one system.

I downloaded Kubuntu 10.04 earlier today, which actually wasn't too bad to download, though I haven't gotten around to installing it. I also installed Lubuntu 10.04 (though it's unofficial) on my Notebook. Nice and fast in general, though updates are taking forever today, yeah.

Thanks for this. Works great here.

Any chance you can get this in the OP so others don't have to wade through the thread?

:thanks:

penartur
09-27-2011, 03:15 AM
It seems that there is some sort of memory leak in soPdf.
I'm trying to process this PDF file (http://narod.ru/disk/26480166001/361810.pdf.html) with it (I believe this is a prepress copy of this book (http://www.abebooks.com/9780321415547/Macroeconomics-Abel-Andrew-Bernanke-Ben-032141554X/plp); it contains text, not just scanned images). The file is not that huge, just 30MB/632pp.
However, when I process it with soPdf using the default parameters, soPdf eats about 20MB of RAM per page, up to 2GB (which I believe is the Windows limit for 32-bit processes), at which point it dies with the following message:

penartur@X220 D:\penartur\Downloads\software\soPdf# soPdf.exe -i 361810.pdf -o "Abel, Bernanke. Macroeconomy.pdf"

soPdf ver 0.1 alpha Rev 12
A program to reformat pdf file for sony reader

Input : 361810.pdf
Output: Abel, Bernanke. Macroeconomy.pdf


Processing input page : 102warning: cannot realloc 16777216 bytes
Error: .\processPdf.cpp(403) : processErrorPage() - Cannot process page 102
Error: .\mupdf\pdf_page.c(241) : pdf_loadpage() - cannot load page resources
Error: .\mupdf\pdf_resources.c(430) : pdf_loadresources() - cannot load xobject resource
Error: .\mupdf\pdf_resources.c(165) : preloadxobject() - cannot load image resource 471
Error: .\mupdf\pdf_stream.c(510) : pdf_loadstream() - cannot load stream into buffer (471)
Error: .\stream\stm_misc.c(103) : fz_readall() - cannot resize scratch buffer
Error: .\stream\stm_buffer.c(83) : fz_growbuffer() - outofmem: resize buffer memory

As you can see, it only processed 1/6th of the book before dying; so, in order to process the entire book, one would either need x64 binaries and 12GB or more of free RAM, which seems ridiculous, or have to fix that memory leak somehow.
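
The arithmetic behind those figures, as a quick sketch. It uses only the numbers from this post and assumes that memory grows linearly at about 20MB per page and is never released, which is a simplification.

```python
MB_PER_PAGE = 20        # observed growth per processed page
PAGES = 632             # pages in the book
LIMIT_MB = 2 * 1024     # 2GB address space of a 32-bit process

# Page at which the process runs out of address space
pages_before_limit = LIMIT_MB // MB_PER_PAGE

# Memory needed to get through the whole book at the same rate
total_gb = PAGES * MB_PER_PAGE / 1024

print(pages_before_limit)               # 102
print(round(total_gb, 1))               # 12.3
print(round(pages_before_limit / PAGES, 2))  # 0.16, roughly 1/6th
```

The estimate of 102 pages matches the page number at which the log above stops.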

Nathan Campos
11-28-2011, 10:59 AM
Is there a Mac OS X port of this? I've tried compiling it on mine, but I'm very bad at C/C++.

naxa
01-31-2012, 05:57 AM
All hails!

You've just put an end to my years of struggling with pdf margins!

May your name be celebrated for years to come and even after the day pdf was forgotten!

earthbear
02-15-2012, 09:10 PM
This is awesome!!!! Thank you! :)

mikeww
05-24-2012, 12:59 PM
Hi, I'm trying to use soPDF but I'm getting errors:


C:\soPdf>sopdf -i "Larry Harris - Trading and Exchanges 2003.pdf"

soPdf ver 0.1 alpha Rev 12
A program to reformat pdf file for sony reader

Input : Larry Harris - Trading and Exchanges 2003.pdf
Output: Larry Harris - Trading and Exchanges 2003.pdfout.pdf


Processing input page : 2Error: .\processPdf.cpp(403) : processErrorPage() - Cannot process page 2
Error: .\mupdf\pdf_page.c(241) : pdf_loadpage() - cannot load page resources
Error: .\mupdf\pdf_resources.c(430) : pdf_loadresources() - cannot load xobject resource
Error: .\mupdf\pdf_resources.c(165) : preloadxobject() - cannot load image resource 1984
Error: .\mupdf\pdf_stream.c(505) : pdf_loadstream() - cannot open stream (1984)
Error: .\mupdf\pdf_stream.c(454) : pdf_openstream() - cannot create filter
Error: .\mupdf\pdf_stream.c(306) : pdf_buildfilter() - cannot create filter
Error: .\mupdf\pdf_stream.c(127) : buildonefilter() - unknown filter name (JPXDecode)


Can someone help me?

EDIT:
I guess I should mention that this PDF had security on it, and I used http://www.freemypdf.com/ to unlock it. Perhaps that corrupted the file.
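
The last line of that log names JPXDecode, which is the PDF filter for JPEG 2000 image data, so the failure looks more like an image format this build cannot decode than like a file damaged by unlocking. A rough way to check a file before running soPdf is to look for that filter name in the raw bytes. This is only a sketch with made-up function names, and it can miss cases: a filter name stored inside a compressed object stream, or written with escaped characters, will not be found this way.

```python
def mentions_jpx(pdf_bytes):
    """True if the raw PDF bytes name the JPXDecode (JPEG 2000) filter."""
    return b"/JPXDecode" in pdf_bytes


def file_mentions_jpx(path):
    """Same check for a file on disk."""
    with open(path, "rb") as handle:
        return mentions_jpx(handle.read())


# Tiny hand-written fragments standing in for real image dictionaries
jpx_image = b"<< /Type /XObject /Subtype /Image /Filter /JPXDecode >>"
jpeg_image = b"<< /Type /XObject /Subtype /Image /Filter /DCTDecode >>"
print(mentions_jpx(jpx_image))
print(mentions_jpx(jpeg_image))
```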

ectoplasm
10-13-2012, 09:13 PM
This is actually pretty sweet for automatically cropping text based PDF page margins. This is the first tool I found that does this automatically. If there are others, please comment. I'm not interested in the programs where you have to select a region by hand.

willus
10-14-2012, 10:27 AM
This is actually pretty sweet for automatically cropping text based PDF page margins. This is the first tool I found that does this automatically. If there are others, please comment. I'm not interested in the programs where you have to select a region by hand.
Try the PDF forum thread listing (http://www.mobileread.com/forums/forumdisplay.php?f=184)--the top half dozen threads which are listed with a "sticky" icon next to them all discuss PDF tools for mobile reading--almost all of them do some form of auto-cropping.

markom
10-14-2012, 01:19 PM
This is actually pretty sweet for automatically cropping text based PDF page margins. This is the first tool I found that does this automatically. If there are others, please comment. I'm not interested in the programs where you have to select a region by hand.

But if our PDF is an image with a text layer in the background, we should be very much interested, because often we should first crop such a PDF in Briss, PdfScissors, A-PDF Page Crop, etc., and only then use soPdf or k2pdfopt for a much better result.

So it is a 2- or 3-step process for an image PDF.

1. Quick OCR with ABBYY, Acrobat, etc., because there is usually no need for great OCR behind the image.
2. Rough cropping in Briss, eliminating headers/footers if needed (soPdf removes headers/footers like page numbers automatically).
3. Cropping in soPdf or k2pdfopt.

Often k2pdfopt is enough on its own (i.e. a 1-step process), though, even for a pure image PDF (without OCR).

With soPdf the OCR layer stays in place after cropping and the PDF stays about the same size, i.e. there is no rasterization involved that makes the PDF bigger, as there is with k2pdfopt.

Example:

1st picture is the original, 8 pages of a scanned, OCR-ed PDF.
2nd picture is that original cropped in Briss (just roughly, i.e. not getting very close to the text proper, but with the headers cropped).
3rd picture is the original cropped in Briss and then cropped additionally in soPdf (to fit height).
4th picture is the original cropped in soPdf directly.

1. http://s16.postimage.org/a6x850zu9/original.jpg (http://postimage.org/image/a6x850zu9/)
2. http://s16.postimage.org/p45p61d2p/original_croppedin_Briss.jpg (http://postimage.org/image/p45p61d2p/)
3. http://s16.postimage.org/sca6j2zch/original_croppedin_Briss_pdfout.jpg (http://postimage.org/image/sca6j2zch/)
4. http://s16.postimage.org/jvf9lwk1d/original_pdfout.jpg (http://postimage.org/image/jvf9lwk1d/)
(Click on a picture to enlarge the view.)

As we can see, soPdf didn't cut the left margins on two pages (4th picture) when applied directly, whereas after cropping in Briss soPdf cropped those two margins correctly, and we also eliminated the headers/footers with Briss.

Briss and soPdf or k2pdfopt are complementary: in Briss there are usually pages that stick out (by an inch or half an inch from the stacked majority of odd or even pages), and we can freely include them all in the crop if we are going to use soPdf or k2pdfopt after Briss for very precise cropping.
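
For step 3 of the workflow above, the soPdf call can be put together from the options listed in the first post (-i, -o, -m, -v, -w). A minimal sketch follows; the helper name and the file names are made up, and only the four modes that the option list describes as implemented are included.

```python
# Mode numbers as given in soPdf's own option list (modes 4 and 5 are
# listed there as not yet implemented, so they are left out).
MODES = {
    "fit 2xWidth": 0,
    "fit 2xHeight": 1,
    "fit Width": 2,
    "fit Height": 3,
}


def sopdf_command(src, dst, mode="fit Height", overlap_percent=None,
                  crop_whitespace=True):
    """Build a soPdf argument list (hypothetical helper)."""
    cmd = ["soPdf", "-i", src, "-o", dst, "-m", str(MODES[mode])]
    if overlap_percent is not None:
        cmd += ["-v", str(overlap_percent)]
    if not crop_whitespace:
        cmd.append("-w")  # -w turns white space cropping off
    return cmd


# The fit-height run on a file that was already rough-cropped in Briss
print(" ".join(sopdf_command("scan_briss.pdf", "scan_briss_sopdf.pdf")))
```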

markom
11-20-2012, 04:54 PM
This is actually pretty sweet for automatically cropping text based PDF page margins. This is the first tool I found that does this automatically. If there are others, please comment. I'm not interested in the programs where you have to select a region by hand.

There is VeryDOC PDF-Margin-Crop from a few years ago.

http://www.verydoc.com/pdf-margin-crop.html

It is not freeware, but we can have about 40 trial croppings.

Just enter some margin value like 5x5x5x5 points or 10x10x10x10 and it will crop a text-based PDF nicely.

For a scanned PDF (with or without an OCR layer) it is again advisable (as in the case of soPdf) to crop the PDF roughly beforehand with Pdf-Scissors or Briss and then apply PDF Margin Crop. Results were always good for me that way.


I also used to crop the margins of text-based PDFs by printing them to a virtual printer from Adobe Reader, Acrobat, etc.

We should first check the exact dimensions of the text (usually under comments/measure/distance) and then print with the auto-center box checked and the corresponding Width x Height.
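
As a worked example of picking that Width x Height, here is a small sketch that turns a measured text block into a custom paper size with a few points of breathing room on each side. The measurements (4.5 x 7.2 inches) and the 5 point margin are made-up sample values, and the function name is only for illustration; the one fixed fact is that 1 inch is 72 points, or 25.4 mm.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def paper_size(text_width_in, text_height_in, margin_pt=5.0):
    """Return (width, height) in inches: text block plus margin on every side."""
    margin_in = margin_pt / POINTS_PER_INCH
    width = text_width_in + 2 * margin_in
    height = text_height_in + 2 * margin_in
    return width, height


# Sample measurement of the text block, in inches
w, h = paper_size(4.5, 7.2)
print(f"{w:.2f} x {h:.2f} in")
print(f"{w * MM_PER_INCH:.1f} x {h * MM_PER_INCH:.1f} mm")
```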