Getting your PDF document to reflow could be as simple as:
The name of the input PDF Document. You can use the Open... button to choose a PDF document from your computer.
This field specifies a Y position near the top of the page: every line whose Y position is less than or equal to this value will be eliminated. The top of the page is Y position 0, and Y values grow toward the bottom of the page. To discover the Y position of the lines you want to crop, for now you must open the XML file (same directory and root name as your PDF document, with a .xml extension) in an editor or browser, and find the line of text that corresponds to a header you want to eliminate. The Y position of that line is in its top="xx" attribute. A sample entry in the XML file looks as follows:
<text top="36" left="203" width="209" height="11" font="0">Self Knowledge</text>
⋮
<text top="506" left="506" width="209" height="11" font="0">Self Realization</text>
In this example, you would use 36 as the Top Crop: value.
This field specifies a Y position near the bottom of the page: every line whose "top" value is greater than or equal to this value will be eliminated. In the above example, you would use 506 as the Bottom Crop: value.
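The lookup and the two crop rules can be sketched in Python. This is only an illustration, not PDFReflow's own code: the <page> wrapper, the middle body line, and the function names list_lines and crop are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first and last <text> entries come from the help
# text above; the <page> wrapper and the middle line are assumed.
SAMPLE = """<page>
<text top="36" left="203" width="209" height="11" font="0">Self Knowledge</text>
<text top="120" left="90" width="430" height="11" font="0">Body text line.</text>
<text top="506" left="506" width="209" height="11" font="0">Self Realization</text>
</page>"""


def list_lines(xml_text):
    """Return (top, text) pairs for every <text> element, in document order."""
    root = ET.fromstring(xml_text)
    return [(int(el.get("top")), "".join(el.itertext())) for el in root.iter("text")]


def crop(lines, top_crop, bottom_crop):
    """Keep only lines strictly between the two crop values.

    Lines with top <= top_crop or top >= bottom_crop are eliminated.
    """
    return [(top, text) for top, text in lines if top_crop < top < bottom_crop]


lines = list_lines(SAMPLE)
for top, text in lines:
    print(top, text)  # scan this listing to find header and footer positions

kept = crop(lines, top_crop=36, bottom_crop=506)
```

Printing every line with its top value is a quick way to spot the header and footer positions without reading the raw XML by hand.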
You can specify a list of pages whose text will not be reflowed into paragraphs. The value is a comma-separated list of page ranges, where a range is either a single page number or two page numbers separated by a hyphen. The first page in a book is page 1. Note that the page number is not the printed page number, but the page number that shows in the thumbnail view of PDF viewers like Acrobat, Preview, Evince, etc. An example might be 1-6,10,198-201 where pages 1, 2, 3, 4, 5, 6, 10, 198, 199, 200, and 201 are not reflowed.
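To make the range syntax concrete, here is a minimal Python sketch of how such a value could be expanded into individual page numbers. The function name parse_page_ranges and its error handling are assumptions for illustration; PDFReflow's actual parser may behave differently.

```python
def parse_page_ranges(spec):
    """Expand a value such as "1-6,10,198-201" into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(p) for p in part.split("-", 1))
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)


print(parse_page_ranges("1-6,10,198-201"))
# prints [1, 2, 3, 4, 5, 6, 10, 198, 199, 200, 201]
```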
Setting First Page and Last Page will produce an output document that contains a subset of the original document. The first page is page 1, and the last page is the number of pages in the document. This feature is useful if a book has sections with vastly different formatting. Create a separate HTML file for each differently formatted section, then concatenate the files together. If you are creating an e-book, the concatenation step is not necessary, as it is possible to specify multiple HTML files as input to e-book creation software. If you use this feature, rename the HTML document after you complete each section, because the output HTML document is overwritten the next time you do a reflow.
Sometimes PDFReflow can't correctly determine where the center X position of the document lies, so centered text is not correctly detected. You can point PDFReflow at a centered line by specifying its page number and line number. Use the Show Page button to print the contents of a page, with line numbers, so you can get the line number of a centered line of text.
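As a rough illustration of why one known centered line is enough, the following Python sketch derives a center X from a line's left and width values (the attributes shown in the XML sample) and tests other lines against it. The formula, the tolerance, and the function names are assumptions for this example, not PDFReflow's documented algorithm.

```python
def line_center(left, width):
    """Horizontal midpoint of a line, from its left edge and width."""
    return left + width / 2


def is_centered(left, width, center_x, tolerance=2.0):
    """True if the line's midpoint lies within `tolerance` of center_x.

    The tolerance value is an assumed illustration, not a PDFReflow setting.
    """
    return abs(line_center(left, width) - center_x) <= tolerance


# Take the sample line "Self Knowledge" (left=203, width=209) as the
# user-chosen centered line.
center_x = line_center(203, 209)

print(center_x)                         # 307.5
print(is_centered(150, 315, center_x))  # True: midpoint is also 307.5
print(is_centered(90, 300, center_x))   # False: midpoint is 240.0
```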
Show Page will show the contents of a page in the document. You enter the page number in the text field to the right of the Show Page button, and the output is shown in the text area below.
This option is necessary when there is no paragraph indentation and no vertical spacing after paragraphs. The value for this field is a number between 0 and 100, which represents a percentage of the longest line; it marks the difference between a line that ends a paragraph and a line that is part of the paragraph. If you use the value 0, then a default value is supplied for this option.
This field is not necessary for most fiction books with paragraph indentation or vertical spacing after paragraphs. But for some nonfiction books, this might make your output look better.
Select this option if your document does not have justified text, that is, the right margin has a ragged edge instead of a straight edge. If your text is ragged rather than justified and you don't choose this option, your paragraphs may break in the wrong place.
Shows this help.
This button reflows your PDF document into an HTML document that resides in the same directory as your original document, and has the same root name. For example, mybook.pdf results in mybook.html.
While PDFReflow tries its best, it cannot correctly reflow every document. Here are some tips to get a better output document.
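The naming rule can be expressed in a few lines of Python. This sketch only computes the expected file names (the HTML output, plus the .xml file with the same root name mentioned under the crop fields); the function name related_paths is an assumption, and nothing here invokes PDFReflow itself.

```python
from pathlib import Path


def related_paths(pdf_path):
    """Return the (html, xml) paths that share the PDF's directory and root name."""
    pdf = Path(pdf_path)
    return pdf.with_suffix(".html"), pdf.with_suffix(".xml")


html_path, xml_path = related_paths("books/mybook.pdf")
print(html_path.name)  # mybook.html
print(xml_path.name)   # mybook.xml
```

Because each reflow overwrites the same HTML file, a script built on this rule could rename the file after each partial run before starting the next one.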
If your book does not have paragraph indenting or vertical spacing after every paragraph, too much text may be reflowed into each paragraph. You might try the Short Line option. The argument is a percentage between 1 and 100. If 0 is specified, you get the default value (currently 80). This percentage is applied to the longest line width in the document, and lines that are shorter than the resulting width are considered the end of a paragraph.
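The Short Line rule can be sketched as follows. This is a minimal Python illustration built only from the description above (percentage of the longest line, 0 meaning the default of 80); the function name and the sample widths are hypothetical, and the real implementation may apply additional checks.

```python
DEFAULT_SHORT_LINE_PERCENT = 80  # default quoted in the help text


def paragraph_ends(line_widths, short_line_percent=0):
    """Return one boolean per line; True marks a line that ends a paragraph."""
    if not 0 <= short_line_percent <= 100:
        raise ValueError("short_line_percent must be between 0 and 100")
    if not line_widths:
        return []
    # A value of 0 selects the default percentage.
    percent = short_line_percent or DEFAULT_SHORT_LINE_PERCENT
    threshold = max(line_widths) * percent / 100
    return [width < threshold for width in line_widths]


# Hypothetical line widths for two paragraphs of justified text.
widths = [430, 428, 431, 210, 429, 430, 150]
print(paragraph_ends(widths))
# prints [False, False, False, True, False, False, True]
```

Raising the percentage makes more lines count as paragraph endings; lowering it merges more lines into each paragraph.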
If your input document is not justified, make sure you check the Rag Right option.
PDFReflow is configured to deal with fiction, which often has indented paragraphs and/or vertical spacing after a paragraph. If your book has indenting, but is not fiction with dialog, try using the Nonfiction option.
If your book has vastly differently formatted sections, you might take a look at the Don't Reflow option described above.
There are binaries for Windows XP, Ubuntu 8.04, and Mac OS X 10.5 (and later) at Mobile Read. pdfreflow is open source, licensed under the GNU GPL, and the source is available at SourceForge.