Old 11-15-2008, 11:03 PM   #1
theguru
Member
Posts: 19
Karma: 139
Join Date: Nov 2008
Device: Sony PRS-505
soPdf - Better than Yet another PDF to LRF converter

I really liked the pdflrf tool from the "Yet another PDF to LRF converter" thread, but the moderator took it down for violating the GPL, and it has stayed down for quite some time because the author does not seem interested in providing the source for his tool. The pdflrf tool also has some issues:
  1. pdflrf renders the PDF into images and then creates the LRF file.
    This makes a 4 MB PDF file grow into a file of more than 40 MB.
  2. No text information is preserved because of the image conversion
  3. Very slow
  4. No source for the tool <-- biggest disadvantage
So I decided to write a tool for myself. soPdf is a PDF formatter for the Sony Reader. It is based on SumatraPDF's version of mupdf and fitz.

The advantages of soPdf over pdflrf
  1. Pdf to Pdf conversion
  2. Text and other contents of pdf are preserved
  3. Size of the output file is very close to size of input file
    and in some cases smaller than input file.
  4. Super fast conversion compared to pdflrf.
  5. Source available to make further changes !!!!!! <-- biggest advantage
The disadvantages over pdflrf
  1. Cannot yet convert comic books, although it can still split image PDFs into two.
  2. soPdf is in alpha stage (ver 0.1). There may be lots of bugs still to be found, including at least all of the mupdf bugs.
  3. ???
soPdf command line options
Code:
about: soPdf
   author: Navin Pai, soPdf ver 0.1 alpha
usage:
   soPdf -i file_name [options]
   -i file_name   input file name
   -p password    password for input file
   -o file_name   output file name
   -w             turn off white space cropping
                  default is on
   -m nn          mode of operation
                     0 = fit 2xWidth *
                     1 = fit 2xHeight
                     2 = fit Width
                     3 = fit Height
                     4 = smart fit Width (not yet implemented)
                     5 = smart fit Height (not yet implemented)
   -v nn          overlap percentage
                     nn = 2 percent overlap *
   -t title       set the file title
   -a author      set the file author
   -b publisher   set the publisher
   -c category    set the category
   -s subject     set the subject
   -e             proceed with errors
   -r             reverse landscape

   * = default values
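As a rough illustration of how these options combine, here is a small Python sketch that assembles a soPdf command line. It uses only the flags listed in the usage text above. The helper name `build_sopdf_args`, the mode names in `MODES`, and the assumption that the executable is called `soPdf` and sits on the PATH are hypothetical, not part of the tool.

```python
# Mode numbers copied from the usage text; the string keys are made up
# here for readability and are not something soPdf itself understands.
MODES = {
    "fit2xwidth": 0,
    "fit2xheight": 1,
    "fitwidth": 2,
    "fitheight": 3,
}


def build_sopdf_args(input_pdf, output_pdf, mode="fit2xwidth", overlap=2,
                     crop=True, proceed_with_errors=False,
                     reverse_landscape=False, exe="soPdf"):
    """Build an argument list for soPdf using only the documented flags."""
    args = [exe, "-i", input_pdf, "-o", output_pdf,
            "-m", str(MODES[mode]), "-v", str(overlap)]
    if not crop:
        args.append("-w")  # -w turns white space cropping off
    if proceed_with_errors:
        args.append("-e")  # keep going when a page cannot be processed
    if reverse_landscape:
        args.append("-r")
    return args


if __name__ == "__main__":
    cmd = build_sopdf_args("ebooktestin.pdf", "ebooktestout.pdf",
                           mode="fitwidth", proceed_with_errors=True)
    print(" ".join(cmd))
    # To actually run the converter (assumes soPdf is on the PATH):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one string avoids quoting problems with file names that contain spaces.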
The conversion algorithm is as follows
  1. If the user specified Fit2xWidth or Fit2xHeight, simply make two copies of each PDF page from the source in the destination PDF file.
  2. Render the page and get the actual bounding box that encompasses all of the content on the page. This step removes the white space border of the page.
  3. If the page cannot be rendered by mupdf and the proceed-with-errors option (-e) is specified, split the page without rendering by setting the MediaBox of the page.
  4. Try to split the page first by iterating over all the elements that can fit in half a page; if that does not work, split the page halfway with 2% overlap (this can be changed with -v).
  5. If FitWidth or Fit2xWidth is specified, rotate the page by -90 deg.
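The cropping and halfway-split steps can be sketched in a few lines of Python. This is not the soPdf source (which is C code built on mupdf), just a minimal illustration under stated assumptions: the rendered page is a list of rows of grayscale values with 255 meaning white, boxes are `(x0, y0, x1, y1)` tuples with the origin at the top left, and the overlap percentage is taken relative to the cropped height and shared equally between the two halves. The element-by-element split from step 4 is not shown.

```python
def content_bbox(pixels, white=255):
    """Return (x0, y0, x1, y1) enclosing every non-white pixel, or None.

    `pixels` is a list of equally long rows; x1 and y1 are exclusive.
    """
    rows = [y for y, row in enumerate(pixels)
            if any(p != white for p in row)]
    if not rows:
        return None  # completely blank page
    cols = [x for x in range(len(pixels[0]))
            if any(row[x] != white for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def split_with_overlap(box, overlap_pct=2.0):
    """Split a box into top and bottom halves that share an overlap band."""
    x0, y0, x1, y1 = box
    height = y1 - y0
    mid = y0 + height / 2.0
    half_band = height * overlap_pct / 100.0 / 2.0
    top = (x0, y0, x1, mid + half_band)
    bottom = (x0, mid - half_band, x1, y1)
    return top, bottom


if __name__ == "__main__":
    # A 10x20 "page" that is white except for two dark pixels.
    page = [[255] * 10 for _ in range(20)]
    page[5][3] = 0
    page[12][7] = 0
    box = content_bbox(page)
    print(box)
    print(split_with_overlap(box))
```

The overlap band means a line of text that falls on the split boundary shows up at the bottom of the first half and again at the top of the second half, so nothing is cut in two without being readable somewhere.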
Source code for soPdf is available from google code.
http://sopdf.googlecode.com

To compile the source code you will need Visual Studio 8.0 (even the free edition will work). Visual Studio is not required if you just want to run the soPdf tool. If you have issues running the binary, make sure you have the VC runtime library installed. You can download it from the Microsoft website.

Coming soon
  • Output to image pdf - for complex pdf that renders slowly on the reader devices.
Update 0.1 Rev 12
  • Added reverse landscape mode. Ever wished that you could hold your reader the other way around in landscape mode and scroll through the pages using your right thumb? Use reverse landscape mode and start reading from the last page onwards.
Update 0.1 Rev 10
  • Proceed with errors option. With this option, soPdf can now process any PDF file, even the ones mupdf cannot handle. If mupdf cannot load the contents, soPdf simply splits the page into two without any processing. The disadvantage is that the white space border is not removed in this case, but you still get a PDF output file.
  • Option to set the subject of the PDF file
  • Fixed stack overflow when processing complex PDF files
  • Better clipping algorithm
Update 0.1 Rev 7
  • Worked around a mupdf bug where it is not able to allocate oid and gid numbers. This prevented some files from being split properly.
Attached Files
File Type: pdf ebooktestin.pdf (867.6 KB, 5845 views)
File Type: pdf ebooktestout.pdf (904.2 KB, 5897 views)
File Type: pdf ebooktestreverseout.pdf (904.2 KB, 3320 views)
File Type: zip soPdf.zip (895.1 KB, 16833 views)

Last edited by theguru; 11-23-2008 at 10:07 PM. Reason: Bug fixes
Old 11-16-2008, 07:42 AM   #2
godel10
Connoisseur
Posts: 80
Karma: 204
Join Date: Jun 2007
Device: Sony Librie, Irex DR1000S
Thanks for your effort.

I am not a Windows user, so I wonder if anyone could upload an example of an input file and an output file.
Old 11-16-2008, 09:54 AM   #3
ProDigit
Karmaniac
Posts: 2,157
Karma: 9023682
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, Jetbook Mini, Jetbook Color, Astak Ez Reader Pro
I'd suggest you try recoding the PRS-505's manual again; it seems kind of buggy!
Old 11-16-2008, 11:55 AM   #4
ddavtian
Addict
Posts: 262
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sony 505 & 900, Kindle DX, Samsung Galaxy Tab, EVO
Does this mean I need the VC runtime (I have no idea what that means)?


Error: .\mupdf\pdf_xref.c(459) : pdf_loadindirect() - cannot load indirect object 1586
Error: .\mupdf\pdf_xref.c(442) : pdf_loadobject() - cannot load object 1586 into cache
Error: .\mupdf\pdf_xref.c(416) : pdf_cacheobject() - found object 1636 0 instead of 1586 0
Old 11-16-2008, 01:09 PM   #5
theguru
Quote:
Originally Posted by ProDigit
I'd suggest you try recoding the PRS-505's manual again; it seems kind of buggy!
This bug has been fixed.

Last edited by theguru; 11-17-2008 at 01:30 AM.
Old 11-16-2008, 01:12 PM   #6
theguru
Quote:
Originally Posted by ddavtian
Does this mean I need the VC runtime (I have no idea what that means)?

Error: .\mupdf\pdf_xref.c(459) : pdf_loadindirect() - cannot load indirect object 1586
Error: .\mupdf\pdf_xref.c(442) : pdf_loadobject() - cannot load object 1586 into cache
Error: .\mupdf\pdf_xref.c(416) : pdf_cacheobject() - found object 1636 0 instead of 1586 0
It means that there is an error in your PDF file. Check whether the file can be loaded by the SumatraPDF viewer. If SumatraPDF cannot handle the file, then soPdf cannot handle it either.
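For anyone curious what a message like "found object 1636 0 instead of 1586 0" points to: a PDF's cross-reference table maps each object number to a byte offset, and at that offset the file is expected to contain a header of the form `N G obj`. When the header found there carries a different number, the table and the file contents are out of sync. The Python sketch below illustrates that kind of check on a raw byte buffer. It is not mupdf's code; the function name `check_xref_entry` and the tiny sample data are made up for illustration.

```python
import re

# An indirect object starts with "<number> <generation> obj".
OBJ_HEADER = re.compile(rb"\s*(\d+)\s+(\d+)\s+obj\b")


def check_xref_entry(data, offset, num, gen=0):
    """Check that the object header at `offset` matches the expected ids.

    Returns (ok, found), where `found` is the (num, gen) pair read from
    the data, or None if no object header starts at that offset.
    """
    m = OBJ_HEADER.match(data, offset)
    if m is None:
        return False, None
    found = (int(m.group(1)), int(m.group(2)))
    return found == (num, gen), found


if __name__ == "__main__":
    data = (b"%PDF-1.4\n"
            b"1586 0 obj\n<< >>\nendobj\n"
            b"1636 0 obj\n<< >>\nendobj\n")
    good_offset = data.index(b"1586 0 obj")
    stale_offset = data.index(b"1636 0 obj")
    print(check_xref_entry(data, good_offset, 1586))
    # A stale table entry that points at the wrong object:
    print(check_xref_entry(data, stale_offset, 1586))
```

Some PDF tools can rebuild a damaged cross-reference table by scanning the whole file for object headers, which is why a file may open in one viewer and fail in another.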
Old 11-16-2008, 03:52 PM   #7
=X=
Wizard
Posts: 3,672
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
Quite an excellent app. This tool provides the feature I have been sorely looking for. I have some scripts that remove the margins, but none provided this level of success. I have a feeling this tool will become my new favorite PDF tool.

This tool does struggle with the more complicated PDFs, but for those there are PDFLRF/PDFRead/PaperCrop.

Thanks.


One recommendation: since the tool is written in CPP, there is no reason to tie it to one platform. There is a surprisingly large number of users on this board who use Linux/Mac OSX.


Thank you,
=X=

Last edited by =X=; 11-16-2008 at 06:53 PM.
Old 11-16-2008, 05:22 PM   #8
theguru
I am working on fixing the bugs with complicated PDFs. And yes, it can easily be ported to any platform. There is no platform-specific code, and since the source is available, anyone who is interested in creating a port for Linux/Mac is welcome to do so.
Old 11-17-2008, 11:36 AM   #9
ProDigit
So far I've only managed to get PDF to PDF working here.
So how do I convert it to LRF, or do you suggest keeping those documents in PDF?

(BTW thank you for the program, I've only had a brief look at it)
Old 11-17-2008, 12:35 PM   #11
theguru
Quote:
Originally Posted by ProDigit
So far I've only managed to get PDF to PDF working here.
So how do I convert it to LRF, or do you suggest keeping those documents in PDF?

(BTW thank you for the program, I've only had a brief look at it)
Keeping the files in PDF format was the original plan. The reformatted PDF files can be read easily on the reader.
Old 11-17-2008, 04:11 PM   #12
=X=
Okay, I found my first bug. It seems the PDF's bookmarks are removed from the output file.

These are bookmarks in the PDF used by the SONY reader for the table of contents.

=X=
Old 11-17-2008, 04:15 PM   #13
DDHarriman
Guru
Posts: 851
Karma: 1200
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Not bad at all.

Big files resulting from scanning still crash my Cybook or are so slow that they are useless, but text ones (with images or not) are quite good!

One more note: it does not convert PDF files with JBIG2-compressed monochrome images; CCITT Group 4 compressed ones are no problem.

Best regards,
Old 11-17-2008, 04:39 PM   #14
theguru
Quote:
Originally Posted by =X=
Okay, I found my first bug. It seems the PDF's bookmarks are removed from the output file.

These are bookmarks in the PDF used by the SONY reader for the table of contents.

=X=
I am aware of the issue. I do not quite understand how bookmarks work in PDF files. I will update the tool once I do; if anyone here understands how bookmarks work, they are free to update the source.
Old 11-17-2008, 06:29 PM   #15
DDHarriman
One question: any plans to evolve this into a GUI with options to choose from?

Last edited by DDHarriman; 11-17-2008 at 06:45 PM.