
View Full Version : pielrf beta - Text to LRF with Easy TOC, autoflow, etc.


EatingPie
04-28-2007, 01:38 AM
Out of beta, now officially released on the Reader Content forum. All new versions will be posted there, and questions will be answered there too.

http://www.mobileread.com/forums/showthread.php?t=10752

Announcing pielrf, a Python command line tool to convert text to Reader lrf format, utilizing Falstaff's pylrs.

I've only tested this using Python 2.5 on Mac OS X 10.4.9 (Intel). The main features are easy Table of Contents creation, curly quotes, a top-of-page header, and paragraph autoflow.
pielrf -i flatland.txt -o flatland.lrf -t "Flatland" -a "Edwin A. Abbott"
Please feel free to give this a run. I've been using it primarily for Gutenberg and OCR books. I've included a few short examples with the executable.

Chapters.

To create chapters, simply add "<chapter>" before the chapter name.
<chapter>Chapter One
This will add "Chapter One" to the main Table of Contents, along with a button on the Table of Contents page at the beginning of the book.
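
To make the idea concrete, here is a minimal sketch of how a "<chapter>" line could be turned into a Table of Contents entry. This is not pielrf's actual code: the function name split_chapters and the (kind, text) tuples are hypothetical, and it is written for a current Python 3 rather than the Python 2.5 the tool targets.

```python
def split_chapters(lines):
    """Collect chapter titles for a TOC and tag each line as chapter or text.

    Hypothetical helper for illustration only; pielrf's real parser may differ.
    """
    tag = "<chapter>"
    toc = []
    body = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(tag):
            # Everything after the tag is the chapter name.
            title = stripped[len(tag):].strip()
            toc.append(title)
            body.append(("chapter", title))
        else:
            body.append(("text", line))
    return toc, body


toc, body = split_chapters([
    "<chapter>Chapter One",
    "It was a dark night.",
    "<chapter>Chapter Two",
])
print(toc)
```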

Paragraphs

There are several ways to delimit paragraphs. Gutenberg uses vertical whitespace (CRs), other files use a tab on the first line of each paragraph, and still others use spaces. By default, the program automatically detects the method actually used. You can also force it using the "-b" flag: "-b tab" or "-b cr" for example.
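
As a rough illustration of what such auto-detection might look like, the sketch below counts which delimiter shows up most often. It is only a guess at the approach, not pielrf's implementation: the function name is made up, the return values "tab" and "cr" are borrowed from the "-b" examples above, and "space" is an assumed name for the third style.

```python
def guess_paragraph_style(text):
    """Guess how paragraphs are delimited by counting candidate markers.

    Hypothetical heuristic: whichever marker appears on the most lines wins.
    Ties (including an empty input) fall back to the first key, "tab".
    """
    lines = text.splitlines()
    counts = {
        "tab": sum(1 for line in lines if line.startswith("\t")),
        "space": sum(1 for line in lines if line.startswith("  ")),
        "cr": sum(1 for line in lines if not line.strip()),
    }
    return max(counts, key=counts.get)


gutenberg_like = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
tabbed = "\tFirst paragraph.\n\tSecond paragraph."
print(guess_paragraph_style(gutenberg_like), guess_paragraph_style(tabbed))
```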

Features.

+ Curly (typographic) quotes.
+ Top-of-page header like those in books from the Sony Connect Store.
+ Paragraph auto-flow.
+ Table of Contents and Chapterization if you use the <chapter> tag.
+ Understands HTML tags <i></i>, <b></b>, <center></center>, <sub></sub>, <sup></sup>, <p></p>.
+ Understands ALL HTML Ampersand tags (entities) - &amp;, &pound;, &uuml;, etc.
+ Paragraphs can be delimited by tabs, spaces, vertical whitespace.
+ Font size / weight (bold) can be controlled from command line.
+ Ability to control almost everything else from the command line too!
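
For the Ampersand tags item, this is roughly what decoding looks like in a current Python 3, using the standard library's html.unescape. That function did not exist in Python 2.5, so this is a sketch of the behavior and not how pielrf does it internally.

```python
import html

# Sample text with three named entities (ampersand, pound sign, u-umlaut).
sample = "Fish &amp; chips, &pound;5, Z&uuml;rich"

# html.unescape replaces named and numeric entities with the characters
# they stand for.
decoded = html.unescape(sample)
print(decoded)
```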

Requirements

Requires install of Python 2.5 from the Developer Tools / Tiger Install CD. Also requires pylrs-1.0.0, along with ElementTree 1.2.6.


http://www.falstaffshouse.com
http://effbot.org/zone/element-index.htm


-Pie

Versions provided here, see subsequent posts for discussion of changes.

fireproof
04-28-2007, 06:33 PM
This looks like a great tool! I've run into a little trouble, though, and while I google around looking for an answer I figured I'd run it by you, too (I'm no Python expert).

When I run pielrf -h I get the following error:

Traceback (most recent call last):
File "/usr/local/bin/pielrf", line 43, in ?
from pylrs.pylrs import *
ImportError: No module named pylrs.pylrs


Background info -- this is running on an intel Mac mini, OS X 10.4.9, Python 2.5.1 from python.org, and pylrs 1.0.0 and elementtree-1.2.6-20050316 installed per your recommendation.

It seems to be having trouble finding the pylrs library or module, right? Any suggestions?

EatingPie
04-28-2007, 07:44 PM
EDIT: Version 1.2 solves this issue, so no need for links/edits as suggested below.

This is probably not a good time to admit that I knew absolutely NO Python whatsoever a week ago (that's no exaggeration, I'd never even seen the syntax). :)

I suspect that python.org's version went into the "wrong" place.

Pielrf depends on python being installed at "/usr/bin/python". If you type "/usr/bin/python -V", that will tell you whether 2.5.1 is installed in the right place. I'm betting that's still Python 2.3.

Next just type "python -V" to see if your default version of Python is 2.5. If it is, you can run pielrf by forcing it to use the right python:


python pielrf -h
(millions of options follow)


That assumes you're in the same directory as pielrf, otherwise you need the full path.

At this point you have a couple of options.

(1) Always run it by starting with python.


python pielrf -i infile -o outfile [etc.]


(2) Make /usr/bin/python a symbolic link to python2.5 (after moving the old one aside)


sudo mv /usr/bin/python /usr/bin/python2.3
sudo ln -s /path/to/python2.5 /usr/bin/python


(3) Edit that first line in pielrf to point to your python install.

Finally, to see where pylrs actually ended up, you can use the following (slow) command:

sudo find / -name pylrs\*
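
A quicker check is to ask the interpreter itself which Python is running and whether it can see the module. The sketch below is written for a current Python 3 (importlib.util is not available in 2.5), and describe_environment is a made-up helper name. Run it with the same python you use to launch pielrf.

```python
import importlib.util
import sys


def describe_environment(module_name="pylrs"):
    """Report the running interpreter and whether a top-level module is importable."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "version": sys.version.split()[0],
        "module_found": spec is not None,
        # origin is the file the module would be loaded from, when known.
        "module_location": spec.origin if spec is not None else None,
    }


info = describe_environment("pylrs")
print(info)
```

If "module_found" is False, pylrs was installed for a different Python than the one you are running.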

-Pie

fireproof
04-28-2007, 10:40 PM
Quite Right -- that fixed it.

You know, I've got to put a post-it on my monitor reminding me to use "find" instead of "locate", too.

Thanks!

fireproof
04-29-2007, 12:30 AM
Thanks for the help -- I'm using pielrf at manybooks.net now (look for "Sony (beta)" at the bottom of the download menu), and it seems to be working alright... but I no longer have a Reader to test with!

If anyone would care to try one or two I'd be very grateful.

The "headerfontsize" switch is very good news -- will that work with <h1> type header tags?

EatingPie
04-29-2007, 11:44 AM
The "headerfontsize" switch is very good news -- will that work with <h1> type header tags?
Now it will. :)

New version, pielrf-1.1 (see first post in thread to download).

+ Added support for <h1>|<h2>|<h3> tags.

These tags provide only one level of header size (same as chapter header). These are uncentered, but you can use them with the <center> tag. See "h1-test.txt" in the examples directory for usage.

+ Added "--without-toc" switch

This switch keeps pielrf from generating the Table of Contents Menu for the Reader. The separate Table of Contents Page is still generated.

EDIT: Issue discussed below is FIXED as of version 1.7, and now TOC Menus load INSTANTLY! I still left in the "--without-toc" flag.

I added this flag because I had extremely slow performance with one book I generated using about 30 TOC entries. I don't know why this happened, because I have several other books generated with at least as many TOC entries, and they work fine. (I have a suspicion as to the cause, and it's not related to the lrf itself, but I need to explore this further.)

Bugs

There is an outstanding bug with Unicode in the input file. This causes pielrf to crash with a message like the following:


UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0:
ordinal not in range(128)


The workaround is to remove Unicode characters from your input text file. I hope to fix this in my next update.
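
For anyone who wants to script that workaround, here is a small sketch in current Python 3 (not part of pielrf; strip_non_ascii is a hypothetical helper). It first reproduces the same kind of failure on byte 0xc3, then drops every byte outside the ASCII range. Note that this throws the accented characters away and does not convert them.

```python
# "café au lait" encoded as UTF-8; the é becomes the two bytes 0xc3 0xa9.
raw = "caf\u00e9 au lait".encode("utf-8")

# Decoding as ASCII fails on byte 0xc3, just like the traceback above.
try:
    raw.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True


def strip_non_ascii(data):
    """Return the input bytes with every byte >= 128 removed."""
    return bytes(b for b in data if b < 128)


cleaned = strip_non_ascii(raw).decode("ascii")
print(failed, repr(cleaned))
```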

-Pie

JSWolf
04-29-2007, 12:17 PM
Very nice job!

EatingPie
05-05-2007, 02:59 PM
New version, pielrf-1.2 -- "feature-complete" at this point, and my next release should be "official." I will, of course, still take feature requests. As always, see first post in thread to download.

Added header support, à la the title that appears top/right in Sony Connect books. The book title is automatically taken as the top-of-page header, but you can use chapter names, or book title plus chapter name, or no header if you so choose. You can even specify the text from the command line via the "--headertext" switch.

The List


+ ADDED HEADER AT TOP OF EACH PAGE
+ Added "--headerstyle" to set how to determine the header.

'title': (default) use the book title
'chapter': use name of each chapter
'titlechapter': combine book title and chapter name
'none': use no header at all


+ Added "--headertext" to force header to specified text

When using this option, "--headerstyle" setting is ignored.

+ Added "--chapterfontsize" and "--chapterfontweight"

These correctly name the options that control <chapter>-tagged fonts. They were originally called "header" options, so the clash with the new top-of-page header is pretty obvious.

+ Options "--headerfontsize" and "--headerfontweight" now refer to top-of-page headers.
+ Added "--headerheight" to set header height
+ Updated page and margin defaults to fit a header
+ Now uses default python interpreter, which is more likely to be the "right one."
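
The way "--headerstyle" and "--headertext" interact can be sketched as below. This is an illustration in Python 3 and not pielrf's source: choose_header is a made-up name, and the " - " separator used for 'titlechapter' is an assumption, since the actual combined format is not stated here.

```python
def choose_header(style="title", title="", chapter="", headertext=None):
    """Pick the top-of-page header text for the current page.

    Hypothetical sketch: an explicit headertext wins and the style is ignored.
    """
    if headertext is not None:
        return headertext
    if style == "title":
        return title
    if style == "chapter":
        return chapter
    if style == "titlechapter":
        # Assumed separator; the real tool may combine the two differently.
        return "%s - %s" % (title, chapter)
    if style == "none":
        return ""
    raise ValueError("unknown header style: %r" % style)


default_header = choose_header(title="Flatland", chapter="Chapter One")
forced_header = choose_header("chapter", "Flatland", "Chapter One", headertext="My Header")
print(default_header, "|", forced_header)
```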


Bugs

Still not handling unicode in input files. :(

-Pie

utrost
05-11-2007, 04:42 AM
Bugs

Still not handling unicode in input files. :(

-Pie

Hi EatingPie,

for my web-based conversion engine I use pylrs, too.
Maybe instead of opening the input file like
f = open(infile, 'rb')
you should try
import codecs

f = codecs.open(infile, 'r', 'utf-8')
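
Here is a self-contained sketch of the codecs.open approach, written for current Python 3, with a throwaway temp file standing in for the input. The helper name read_utf8 is made up. Note the read mode 'r': a 'w' mode would truncate the input file instead of reading it. On recent Python versions the built-in open(path, encoding='utf-8') does the same job and is generally preferred.

```python
import codecs
import os
import tempfile


def read_utf8(path):
    """Read a whole file and return it as text decoded from UTF-8."""
    f = codecs.open(path, "r", "utf-8")
    try:
        return f.read()
    finally:
        f.close()


# Write a small UTF-8 file containing a non-ASCII character, then read it back.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "wb") as out:
        out.write("Z\u00fcrich".encode("utf-8"))
    text = read_utf8(path)
finally:
    os.remove(path)

print(repr(text))
```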

EatingPie
05-11-2007, 09:51 PM
Thanks utrost, I'll give that a shot!

I dabbled very briefly in the unicode string conversion, but it made everything else in my program totally fail. I was so busy with preparing a release version of the program, I just bailed. With your suggestion, I'll see if I can make some progress.

-Pie