Old 05-10-2010, 12:39 AM   #1
Pranananda
Connoisseur
Posts: 97
Karma: 115862
Join Date: Apr 2010
Location: Humboldt County, California
Device: ipad, iPod touch, JetBook Lite
pdfreflow: reflow text PDFs

PDFReflow is a utility that reflows PDF text. It takes a PDF document as input, reflows the text, removes page numbers, headers, footers, and hyphenation, and generates an HTML file as output.

The reflow logic for PDFReflow is in the command line utility pdfreflow. The graphical user interface also uses the pdftohtml command to generate the XML input for pdfreflow.

Graphical User Interface

There is now a graphical user interface for Windows, Mac, and Ubuntu (actually, any platform that runs Java).

Windows: Download the attached PDFReflow-0.8.6.1-Setup.zip. Extract PDFReflow-0.8.6.1-Setup.exe from the zip file, and run it. It contains all the necessary binaries (i.e. pdfreflow.exe and pdftohtml.exe) to run. Requires Windows XP or newer. You must have Java installed. You can download Java at http://java.com/download.

Mac: Download the attached PDFReflow-0.8.6.1.dmg.zip. Extract PDFReflow-0.8.6.1.dmg from the zip file, open it, and copy PDFReflow.app into your /Applications folder. You don't need to install Java, as the Mac comes with Java installed. Requires Mac OS X Leopard or Snow Leopard.

Ubuntu, Linux: Download PDFReflow-0.8.6.1.jar.zip. Extract PDFReflow-0.8.6.1.jar from the zip file. You must separately download the pdftohtml and pdfreflow command line utilities, and they must be in your path. Java must be installed. Download Java at http://java.com/download. To run:
Quote:
java -jar PDFReflow-0.8.6.1.jar
The Command Line Interface
In the attached pdfreflow-0.8.6.zip file is a version that will run under Windows XP, Mac OSX 10.5 Leopard, and Ubuntu 8.04 (and later). There is also pdfreflow.html, which is documentation of how to use the command, and how to find a prebuilt version of pdftohtml from Poppler.

Pdfreflow is open source, licensed under the GNU GPL, and the source is available at SourceForge.
Synopsis
pdfreflow [options] [filename]
Description
Pdfreflow, in conjunction with pdftohtml, will convert a PDF into a reflowed HTML file. Pdfreflow operates on the XML output from pdftohtml (from the Poppler utilities), converting it into an HTML file. To get the XML input for pdfreflow, use pdftohtml as follows:
Quote:
pdftohtml -xml mybook.pdf
The output of pdftohtml is in the file mybook.xml.

General Usage

Pdfreflow is designed for ebook PDFs that are text only, with minimal formatting, the kind of formatting you would find in a novel. By default pdfreflow expects justified text, but you can specify that the input is rag right with the following option:
Quote:
pdfreflow --ragright mybook.xml
The output of pdfreflow is in the file mybook.html.

You might not want to reflow every page in your ebook. To specify which pages are NOT to be reflowed, use the following option:
Quote:
pdfreflow --dontreflow="1-6,10,198-201" mybook.xml
The --dontreflow option takes a comma separated list of page ranges. The first page in a book is page 1. Also, the page number is not the printed page number, but the page number that shows in the thumbnail view of PDF viewers like Acrobat, Preview, Evince, etc.
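To make the page range format concrete, here is a minimal Python sketch that expands such a list into individual page numbers. It is only an illustration of the syntax described above, not pdfreflow's own parser; the function name parse_page_ranges is made up for this example.

```python
def parse_page_ranges(spec: str) -> set[int]:
    """Expand a spec such as "1-6,10,198-201" into a set of 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"bad page range: {part!r}")
        pages.update(range(first, last + 1))
    return pages


pages = parse_page_ranges("1-6,10,198-201")
print(sorted(pages))
```

With the spec from the example command, this yields pages 1 through 6, page 10, and pages 198 through 201.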

Cropping

While pdfreflow does its best to remove page numbers, headers and footers, you may have to assist by specifying the cropping options, --top=TOP_Y and --bottom=BOTTOM_Y. To find the Y values of a header or footer, you need to look inside the .xml file and find the line of text that contains the header or footer. A sample entry looks as follows:
Quote:
<text top="36" left="203" width="209" height="11" font="0">Self Knowledge</text>

<text top="506" left="506" width="209" height="11" font="0">Self Realization</text>

pdfreflow --top=36 --bottom=506 mybook.xml
In this example, every text line that has a "top" value less than or equal to 36 will be cropped, and every text line that has a "top" value that is greater than or equal to 506 will be cropped.
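The cropping rule can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not pdfreflow's code. The two header and footer entries are the ones from the sample; the enclosing page element and the two body lines are invented for the example.

```python
import xml.etree.ElementTree as ET

# Header and footer lines come from the sample entry above.
# The <page> wrapper and the two body lines are made up for illustration.
SAMPLE = """<page number="1">
<text top="36" left="203" width="209" height="11" font="0">Self Knowledge</text>
<text top="60" left="90" width="430" height="11" font="0">Body text line one</text>
<text top="74" left="90" width="430" height="11" font="0">Body text line two</text>
<text top="506" left="506" width="209" height="11" font="0">Self Realization</text>
</page>"""


def crop(page: ET.Element, top: int, bottom: int) -> list[str]:
    """Keep only text lines whose "top" lies strictly between the two limits."""
    kept = []
    for text in page.iter("text"):
        y = int(text.get("top"))
        if top < y < bottom:  # top <= 36 and top >= 506 are cropped
            kept.append("".join(text.itertext()))
    return kept


page = ET.fromstring(SAMPLE)
print(crop(page, top=36, bottom=506))
```

Only the two body lines survive; both the header at 36 and the footer at 506 are dropped because the limits are inclusive.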

Centered Text

Pdfreflow does its best to detect centered text. Sometimes, especially with rag right text, it is hard to detect the center point. To improve the center detection, you can point pdfreflow at a line in your document that is centered by giving its page number and line number. For example, if the 2nd line on page 3 is a centered line, you specify this with a page:line argument to the --center option as follows (page numbers and line numbers both start at 1).
Quote:
pdfreflow --center=3:2 mybook.xml
To discover the line number to specify for the --center option, you can use the --print option to print out the contents of a page, with line numbers, to standard error.
Quote:
pdfreflow --print=3 mybook.xml
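Why one known centered line helps can be sketched as follows. This is a guess at the idea, not pdfreflow's actual logic: take the midpoint of the reference line (left plus half the width, using the attributes from the XML entries) and compare other lines against it. The function names and the tolerance value are assumptions for this example.

```python
def midpoint(left: int, width: int) -> float:
    """Horizontal midpoint of a text line, from its left and width attributes."""
    return left + width / 2


def is_centered(left: int, width: int, center_x: float, tolerance: float = 2.0) -> bool:
    """True if the line's midpoint is within `tolerance` of the reference center."""
    return abs(midpoint(left, width) - center_x) <= tolerance


# Reference line: left="203" width="209", as in the sample entry in the Cropping section.
center_x = midpoint(left=203, width=209)

print(is_centered(left=257, width=101, center_x=center_x))  # shorter line, same midpoint
print(is_centered(left=90, width=200, center_x=center_x))   # left-aligned short line
```

Note that a full-width justified line shares the same midpoint, so a real detector presumably needs more than this single test.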
Reflow Specified Pages

It is also possible to reflow only a subset of the ebook by specifying the --first=FIRSTPAGE and --last=LASTPAGE options. This is useful if a book has sections with vastly different formatting. Create a separate HTML file for each differently formatted section, then concatenate the files together. If you are creating an e-book, the concatenation step is not necessary, as it is possible to specify multiple HTML files as input to ebook creation software.
Quote:
pdfreflow --first=1 --last=100 mybook.xml
cp mybook.html section1.html
pdfreflow --first=101 --last=200 mybook.xml
cp mybook.html section2.html
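If a book has many sections, typing these pairs of commands gets tedious. The Python sketch below only generates the command lines for a list of page ranges; it does not run anything. The function name section_commands and the sectionN.html naming are assumptions that simply follow the example above.

```python
import shlex


def section_commands(xml_file: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Return the shell commands for reflowing each (first, last) page range."""
    # pdfreflow writes to the input name with the suffix replaced by .html
    html_file = xml_file.rsplit(".", 1)[0] + ".html"
    commands = []
    for index, (first, last) in enumerate(ranges, start=1):
        commands.append(
            f"pdfreflow --first={first} --last={last} {shlex.quote(xml_file)}"
        )
        commands.append(f"cp {shlex.quote(html_file)} section{index}.html")
    return commands


for line in section_commands("mybook.xml", [(1, 100), (101, 200)]):
    print(line)
```

For the two ranges shown, this prints the same four commands as in the quote above.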
Files
If the filename command line argument is specified, the file suffix is replaced with .html and the output is written to that file, i.e. an input file of mybook.xml has an output file mybook.html. If no input file is specified, standard in is used as the input, and standard out is the output.
Quote:
pdfreflow < mybook.xml > out.html
Options
Here is the usage output for pdfreflow.
Code:
usage: pdfreflow [options] [inputfile]
Options:
      --absolute          font sizes are the same as the original document
                          (not the default)
  -b, --bottom=MAXTOP     crop text whose top is greater than or equal to maxtop
  -c, --center=SPEC       argument is page:line, ie 2:1 is line 1 on page 2
                          is a centered line (sometimes this hint is needed)
  -d, --dontreflow=PAGES  don't reflow comma separated page ranges,
                          i.e. "1,2,4-9,100"
  -f, --first=FIRSTPAGE   starting page (default is 1)
  -l, --last=LASTPAGE     ending page (default is last page of the document)
      --nonfiction        for books that use block quoting at the same
                          inset as the paragraph indent
  -r, --ragright          text is rag-right, NOT justify (default is justify)
  -t, --top=MINTOP        crop text whose top is less than or equal to mintop
      --shortlines        paragraphs end with short lines (only necessary
                          for rag right documents with no paragraph
                          indent and no after paragraph spacing.
      --showdebug         print debugging options
  -v, --version           print current version
  -?, --help              print this help
Example
Options can be combined. An example using a combination of the options in the description section is:
Quote:
pdfreflow --dontreflow="1-6,10,198-201" --top=36 --bottom=506 mybook.xml
Troubleshooting
While pdfreflow tries its best, sometimes it cannot correctly reflow a document. Here are some tips to get a better output document.

Paragraphs are too large

If your book does not have paragraph indenting or vertical spacing after every paragraph, too much text may be reflowed into each paragraph. You might try the --shortlines option. The argument is a percentage between 1 and 100. If 0 is specified, you get the default value (currently 80). This percentage is applied to the longest line width in the document, and lines that are shorter than the result are considered the end of a paragraph.
Quote:
pdfreflow --shortlines=0 mybook.xml
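The short-line rule is easy to picture with numbers. Here is a minimal Python sketch of the rule as described above; it is not taken from the pdfreflow source, and the line widths are made-up sample values.

```python
def paragraph_ends(widths: list[int], percent: int = 0) -> list[bool]:
    """Flag each line that is shorter than `percent` of the longest line."""
    if percent == 0:
        percent = 80  # 0 selects the default, which the post gives as 80
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100 (or 0 for the default)")
    threshold = max(widths) * percent / 100
    return [w < threshold for w in widths]


# Invented line widths, in the same units as the "width" attribute in the XML.
widths = [430, 428, 430, 212, 430, 429, 350]
print(paragraph_ends(widths))      # threshold is 80% of 430 = 344
print(paragraph_ends(widths, 90))  # threshold is 90% of 430 = 387
```

With the default, only the 212-wide line ends a paragraph. At 90 percent the 350-wide line is also treated as a paragraph end, which shows how raising the percentage splits paragraphs more aggressively.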
Paragraphs are incorrectly reflowed

If your input document is not justified, make sure you specified the --ragright option.

Pdfreflow is configured to deal with fiction, which often has indented paragraphs and/or vertical spacing after a paragraph. If your book has indenting, but is not fiction with dialog, try using the --nonfiction option.
Quote:
pdfreflow --nonfiction mybook.xml
If your book has vastly differently formatted sections, take a look at the Reflow Specified Pages section above.
Limitations
  • Only simple book formats are supported. This is not a general purpose reflower for a MS Word or desktop publishing document. Pictures are not supported.
  • Multiple columns are not supported.
  • Footnotes will cause problems. At this point they just show up wherever they are in the paragraph, potentially splitting a paragraph into two pieces.
Getting pdfreflow
There are binaries for Windows XP, Ubuntu 8.04, and Mac OSX 10.5 (and later) attached to this post. Pdfreflow is open source, licensed under the GNU GPL, and the source is available at SourceForge.
Getting pdftohtml

To get a copy of pdftohtml, without building it from source, here are some options:

Ubuntu: Use Synaptic Package Manager to fetch poppler-utils

Macintosh: Download Calibre for Mac. There is a copy of pdftohtml inside of Calibre.app under /Applications/calibre.app/Contents/Frameworks/. Add that folder to your path, i.e.:
Quote:
PATH=$PATH:/Applications/calibre.app/Contents/Frameworks
pdftohtml -xml mybook.pdf
Windows: Download Calibre for Windows. There is a copy of pdftohtml inside of Calibre under C:\Program Files\Calibre2. Make sure to add C:\Program Files\Calibre2 and C:\Program Files\Calibre2\DLLs to your path, i.e.:
Quote:
PATH=%PATH%;C:\Program Files\Calibre2;C:\Program Files\Calibre2\DLLs
pdftohtml -xml mybook.pdf
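If you are not sure whether your path is set up correctly, a quick check is possible from Python, which uses the same PATH lookup as the shell. This is just a convenience sketch; the function name missing_tools is made up here.

```python
import shutil


def missing_tools(tools=("pdftohtml", "pdfreflow")) -> list[str]:
    """Return the names of the tools that cannot be found on the current PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("pdftohtml and pdfreflow are both on PATH")
```
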
Prana
Attached Files
File Type: zip pdfreflow-0.8.6.zip (102.6 KB, 1241 views)
File Type: zip PDFReflow-0.8.6.1-Setup.zip (2.31 MB, 2484 views)
File Type: zip PDFReflow-0.8.6.1.dmg.zip (1.98 MB, 1072 views)
File Type: zip PDFReflow-0.8.6.1.jar.zip (62.3 KB, 908 views)

Last edited by Pranananda; 05-29-2010 at 04:06 AM. Reason: see release notes for 0.8.6.1 user interface
Old 05-10-2010, 05:13 PM   #2
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Thanks for uploading this.

Poppler's pdftohtml is what calibre uses when converting PDFs, isn't it? So the output should be a lot like calibre's, except it'll give you HTML, which is nice. (You'd have to resort to some workarounds to save it as HTML using calibre alone...)

Or does someone know better than I?

I assume the Ubuntu executable will work under 10.04 too?
Old 05-10-2010, 11:01 PM   #3
Pranananda
frabjous,

Yes, calibre uses pdftohtml. People who don't want to build pdftohtml from source can use the copy found inside of calibre. But if you are on Ubuntu, you can use Synaptic to install poppler-utils.

The output of pdfreflow is going to have multiline paragraphs rather than the 1 line paragraphs of the default pdftohtml.
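To picture the difference, here is a deliberately naive Python sketch of turning several one-line fragments into a single paragraph while undoing end-of-line hyphenation. It is not how pdfreflow is implemented; in particular, the lowercase test for deciding when to drop a hyphen is an assumption made for this example.

```python
def join_lines(lines: list[str]) -> str:
    """Join the lines of one paragraph, undoing end-of-line hyphenation."""
    paragraph = ""
    for line in lines:
        line = line.strip()
        if paragraph.endswith("-") and line[:1].islower():
            # "multi-" followed by "line ..." becomes "multiline ..."
            paragraph = paragraph[:-1] + line
        elif paragraph:
            paragraph += " " + line
        else:
            paragraph = line
    return paragraph


print(join_lines(["The output has multi-", "line paragraphs rather", "than one per line."]))
```

A rule this simple will also merge words that are genuinely hyphenated and happen to break at the hyphen, so treat it as an illustration only.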

The Ubuntu executable will also run on 10.04 (I just tried it).
Old 05-13-2010, 12:19 AM   #4
frabjous
Nice tool.

It doesn't seem to work with double spaced PDFs, though you're aiming towards PDF ebooks which are unlikely to come double spaced. (This problem may be on the pdftohtml end... not sure.)
Old 05-13-2010, 02:17 AM   #5
roger64
Wizard
Posts: 1,432
Karma: 846401
Join Date: Jan 2009
Device: KoboGlo
It did process something, but I am not sure how to use it. I put it in the same folder as my working file to begin with. I thought it could not hurt.

roger@roger-laptop:~$ cd /home/roger/Bureau/pdfreflow-0.8.3/ubuntu/
roger@roger-laptop:~/Bureau/pdfreflow-0.8.3/ubuntu$ pdftohtml -xml Bowden\,\ Mark\ -\ Killing\ Pablo.pdf
Page-1
Page-2
.../...
Page-183
Page-184
roger@roger-laptop:~/Bureau/pdfreflow-0.8.3/ubuntu$

But I do not know what to do with the resulting xml file. Sorry for that.

Last edited by roger64; 05-13-2010 at 06:01 AM.
Old 05-13-2010, 03:09 AM   #6
Pranananda
Frabjous,

Yes, pdfreflow is not going to like double spaced text, as it will see the double spaced lines as new paragraphs. I can add an option to make this work though.

--update

I think I can make this work without adding a new option, by just detecting that the lines are double spaced. I'll put this in the next update.

Last edited by Pranananda; 05-13-2010 at 03:32 AM.
Old 05-13-2010, 03:15 AM   #7
Pranananda
Roger64

Try pdfreflow Bowden\,\ Mark\ -\ Killing\ Pablo.xml

Or perhaps ./pdfreflow Bowden\,\ Mark\ -\ Killing\ Pablo.xml

Also, read the pdfreflow.html to see the command line options.
Old 05-13-2010, 06:04 AM   #8
roger64
Thank you

pdfreflow is amazingly quick and efficient at reflowing text-based PDFs via XML!
A great tool. Congratulations and thanks.

PS: With Ubuntu I needed to use: ./pdfreflow ...

Last edited by roger64; 05-13-2010 at 06:07 AM.
Old 05-13-2010, 12:04 PM   #9
frabjous
Quote:
Originally Posted by Pranananda View Post
I think I can make this work without adding a new option, by just detecting that the lines are double spaced. I'll put this in the next update.
That would be excellent. Don't rush on my part, though...
Old 05-13-2010, 04:36 PM   #10
Pranananda
I've posted pdfreflow-0.8.4.zip to the original post. Here are the release notes:
  • now building for Ubuntu 8.04 Hardy Heron and Mac OSX 10.5 Leopard (and later)
  • documents using double spaced lines are supported
  • don't print all debug options in --help, but added --showdebug option instead
  • documents with only large fonts would not reflow correctly
  • added --lineheight debugging option to print line height frequency
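For anyone curious how double spacing might be told apart from paragraph gaps, one plausible approach is sketched below in Python. This is a guess for illustration, not the logic in pdfreflow: it treats the most common gap between consecutive "top" values as the normal line pitch, so a double spaced document simply has a larger pitch, and only gaps clearly bigger than that count as paragraph breaks. The function name and the slack factor of 1.5 are assumptions.

```python
from collections import Counter


def paragraph_breaks(tops: list[int], slack: float = 1.5) -> list[int]:
    """Return indexes of lines that start a new paragraph.

    The most common gap between consecutive lines is taken as the normal
    line pitch, whether the text is single or double spaced. A gap larger
    than slack times that pitch is treated as a paragraph break.
    """
    gaps = [b - a for a, b in zip(tops, tops[1:])]
    if not gaps:
        return []
    pitch = Counter(gaps).most_common(1)[0][0]
    return [i + 1 for i, gap in enumerate(gaps) if gap > pitch * slack]


# Invented "top" values for five lines on one page.
single = [100, 114, 128, 156, 170]  # pitch 14, one extra-large gap
double = [100, 128, 156, 212, 240]  # pitch 28, one extra-large gap
print(paragraph_breaks(single))
print(paragraph_breaks(double))
```

Both samples report a break before the fourth line (index 3), even though every gap in the double spaced sample is at least as large as the paragraph gap in the single spaced one.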
Old 05-13-2010, 09:51 PM   #11
frabjous
Seems to work well now with double-spaced PDFs... or at least the one I tried on. Thanks a lot!

My dream tool for something like this would be able to recognize footnotes and treat them appropriately, but knowing how PDFs work (and the fact that they don't semantically mark footnotes as such), this is probably a pipe dream.

A more reasonably accomplished feature would break up typographical ligatures, though I could script this myself easily enough with sed or similar.
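For reference, the same thing is only a few lines in Python. This sketch replaces the Latin ligature characters from the Unicode Alphabetic Presentation Forms block with their plain letters; the function name split_ligatures is made up, and the table covers only these seven characters.

```python
# Latin ligatures U+FB00 to U+FB06 and their plain-letter equivalents.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t
    "\ufb06": "st",
})


def split_ligatures(text: str) -> str:
    """Replace typographic ligature characters with separate letters."""
    return text.translate(LIGATURES)


print(split_ligatures("The \ufb01rst of\ufb01ce \ufb02oor"))
```

Run over the HTML output, this leaves all other characters untouched, so accented letters and punctuation are not affected.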

roger, if you want to be able to use it without using ./ before it, just copy the executable into your PATH, such as into the ~/bin/ folder (restart bash if need be).

Last edited by frabjous; 05-13-2010 at 09:56 PM.
Old 05-14-2010, 10:26 AM   #12
roger64
Quote:
Originally Posted by frabjous View Post

roger, if you want to be able to use it without using ./ before it, just copy the executable into your PATH, such as into the ~/bin/ folder (restart bash if need be).
Thanks for the tip.
Old 05-22-2010, 07:44 PM   #13
Pranananda
I've posted pdfreflow-0.8.5.zip to the original post. Here are the release notes:
  • No more small fonts! The HTML output now uses relative font sizes, i.e. font-size=120% versus font-size=12px. It's possible to get the previous behavior, with absolute font sizes, using the --absolute flag. See pdfreflow.html for more info.
  • Sometimes pdfreflow can't find the center X position of the page. This happens with rag right documents. Added a way to specify the center X position to use for centered paragraphs, with the --center=line_spec option. See pdfreflow.html for more info.
  • The HTML output and embedded CSS styles are simpler because the default font is used more often.
  • Fixed lots of bugs, updated pdfreflow.html
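To illustrate the relative font size idea from the first item, here is a small Python sketch. It assumes, purely for the example, that the most common font size in the document is the body text and should map to 100%; how pdfreflow actually chooses its baseline is not stated in this thread.

```python
from collections import Counter


def relative_sizes(line_sizes_px: list[int]) -> dict[int, str]:
    """Map each pixel size to a percentage of the most common (body) size."""
    body = Counter(line_sizes_px).most_common(1)[0][0]
    return {
        size: f"{round(size * 100 / body)}%"
        for size in sorted(set(line_sizes_px))
    }


# Invented per-line font sizes: mostly 12px body text, one heading, one small line.
sizes = [12, 12, 12, 12, 18, 12, 10, 12]
print(relative_sizes(sizes))
```

Here the 12px body text becomes 100%, the 18px heading 150%, and the 10px line 83%, so the reader's own default font size decides how large the text actually appears.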

Last edited by Pranananda; 05-24-2010 at 03:00 AM.
Old 05-24-2010, 02:50 AM   #14
Pranananda
I've posted pdfreflow-0.8.6.zip to the original post. Here are the release notes:
  • Added --shortlines=PERCENT option, to help with documents that don't use indented paragraphs and don't have vertical spacing after paragraphs. The argument is a value between 1 and 100. If 0 is specified, the default value is used (80%). If a paragraph has a line that is less than the specified percentage of the longest line, it will be considered the end of a paragraph. This option is only necessary for poorly formatted fiction books, or perhaps for ebooks that are oriented for very tiny screens and don't want to waste any vertical spacing or lose the space from the paragraph indent.
  • Added --nonfiction option, to specify that short lines don't necessarily mean end of paragraph. This is not necessary for typical fiction books that use either indented paragraphs or have vertical spacing after a paragraph. It is necessary for books that use block quoting that has an inset margin that is the same as the paragraph indent.
  • Added --print option, to print out the contents of a single page with line numbers, to standard error. This is useful for determining the line number argument for the --center option.
  • And bug fixes, of course!
Old 05-24-2010, 09:15 PM   #15
greenapple
Evangelist
Posts: 400
Karma: 664
Join Date: Dec 2009
Device: Kindle Paperwhite, Kindle DX, Kobo Aura HD
This sounds like a very useful tool. Could you also make a front-end, windows GUI for this? I'm not very good with DOS stuff. Thanks.
Tags
pdf, reflow, utility

All times are GMT -4.