View Full Version : Create LRS and LRF files from Python with pylrs
Falstaff 02-18-2007, 11:44 PM I have just uploaded a Python package that I created called pylrs. It is available from the main page of http://www.falstaffshouse.com.
The package can create both lrs and lrf files. The lrf files are created directly, without the use of a compiler. An example of creating an lrf file using the package is:
from pylrs.pylrs import *
b = Book()
b.Page().TextBlock().Paragraph("Hello, World!")
b.renderLrf("hello.lrf")
This will create an lrf file with a single page containing the line "Hello, World!"
There are several example programs included in the distribution that illustrate the various features. I created the package in order to format the plays of Shakespeare, which are also available from the same web page.
The source code is included in the distribution.
The package is optimistically versioned 1.0, but it's never been used by anyone but me, so assume it's an early alpha version at best.
The software has only been tested on Windows XP, Python 2.4 and 2.5, and the PRS-500.
kovidgoyal 02-19-2007, 12:51 AM At last! I am really, really happy you did this. Could I ask you to upload the Python module to the Python package index at http://www.python.org/pypi ?
That will make it easy for other projects to depend on it. If that's too much trouble, I'd be willing to do it for you.
Also, can you add some platform detection to set the default encoding? Something like
import sys
if 'linux' in sys.platform.lower() or 'darwin' in sys.platform.lower():
    default_encoding = 'utf-8'
Thanks and great work.
EDIT: Browsing through the code it doesn't look like you have support for embedded images, or did I miss it?
Falstaff 02-19-2007, 08:23 AM You can have embedded images in an ImageBlock, so they can be cover pages or illustrations. You can't use them by themselves in a Header or on a Canvas yet.
Yes, I forgot the default source encoding was set to the windows character set. Is "utf-8" or "latin-1" a better default for non-Windows 8 bit strings?
For any real projects, you should probably set the encoding explicitly with Book(sourceencoding="<codec>"). See the russanbook.py example program for an example of this.
The code can be uploaded anywhere -- there are no restrictions on its use. But I only plan to maintain the latest version on the web page.
kovidgoyal 02-19-2007, 11:32 AM On most modern Linux systems, utf-8 is a good choice for default encoding. Not so sure about OSX.
On most modern Linux systems, utf-8 is a good choice for default encoding. Not so sure about OSX.
Firstly, the standard forum newbie "hi all".
Yeah, OSX has its own version of UTF-8 (known as UTF-8-MAC), and it can cause headaches (with Samba, for one).
And thanks for the python scripts, my readers should hopefully be here tomorrow, as an OS X user it looks as if I'm in for a bit of fun.
utrost 03-23-2007, 09:59 AM Hi Mike,
I've played around with your scripts, and I am really, really happy about them.
The only problem is that I don't know Python at all :rolleyes5 But hey - it can't be that difficult :scholar:
Uwe, tryin' to build a HTML2LRF-script the very moment.
Falstaff 03-27-2007, 07:25 PM Sorry, I was out of town and didn't see your message. I think it would be fairly hard to use these scripts without knowing python pretty well, but if you have any specific questions, let me know and I'll try to answer them.
ashkulz 03-31-2007, 06:42 PM Falstaff: I made some changes in your code, in file pylrs.py:587
if not isinstance(textBlock, TextBlock) and not isinstance(textBlock, ImageBlock):
I needed support for referring to ImageBlocks from a TOC for my PDFRead project (http://www.mobileread.com/forums/showthread.php?t=10184).
Thanks for the very wonderful library!
Falstaff 04-01-2007, 06:32 PM Thanks for the fix -- I'll incorporate it in the next version. I'm glad the code is of use to people.
EatingPie 04-06-2007, 03:33 PM Falstaff.
Would you mind providing the code for the conversion project you did?
The samples are handy, but a more complete example (one that opens files, for example) would be great for a Python newb like myself.
-Pie
EatingPie 04-19-2007, 12:21 AM BUMP.
Still hoping for sample code that reads a file for conversion!
-Pie
ashkulz 04-19-2007, 07:35 AM Still hoping for sample code that reads a file for conversion! -Pie
Don't know if it will help you, but you can look at the code for generating LRF files (http://pdfread.svn.sourceforge.net/viewvc/pdfread/trunk/src/output.py?revision=7&view=markup) for PDFRead (http://pdfread.sourceforge.net) (look at the LrfOutput class, line 182).
kovidgoyal 04-22-2007, 12:48 AM @Falstaff
It may be a good idea to change the default blockStyle for an ImageBlock to
BlockStyle(blockrule='block-fixed') that way the reader doesn't scale images off the page when going from S->M->L
EatingPie 04-23-2007, 01:48 PM Don't know if it will help you, but you can look at the code for generating LRF files (http://pdfread.svn.sourceforge.net/viewvc/pdfread/trunk/src/output.py?revision=7&view=markup) for PDFRead (http://pdfread.sourceforge.net) (look at the LrfOutput class, line 182).
Thanks for the pointers, I'll give it a go.
Now another question...
I have code to generate UTF-16 smart quotes, which I would want to embed into pylrs. I noted that just pre-pending a "u" to a string flags it as UTF-16, from the (super ultra crappy) Python reference I used. Problem is, this foregoes any algorithmic conversion. EG...
text = u"This is text";
But I need to scan the text and look for preceding and following characters to decide which curly-quote to apply (open or close). Once a string has the "u" prepended, I have no idea what I'm dealing with.
My lack of python knowledge really shows here. So let me break this down into two basic questions.
(1) Where would I parse text and convert it to UTF-16?
(2) Is there a GOOD Python reference on the Internet? The one I used was the Python Library Reference, which is literally the worst documentation I've seen, ever! (http://docs.python.org/lib/lib.html).
I really prefer makelrf3, but it has several bugs (and a lack of features) that make pylrs look far more viable... if I can just figure out how to parse a string to apply my curly-quotes algorithm.
-Pie
kovidgoyal 04-23-2007, 03:23 PM pydoc codecs.decode
pydoc codecs.encode
and u'my string' is a python unicode object not a utf-16 encoded string.
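To illustrate the distinction, here is a minimal sketch written for current Python 3, where every str is what this thread calls a unicode object. The function name curl_quotes and the rule for choosing an opening quote are assumptions for illustration, not EatingPie's actual algorithm. The text stays an ordinary string while it is scanned, and only becomes UTF-16 bytes when it is explicitly encoded.

```python
def curl_quotes(text):
    """Replace straight double quotes with curly ones (hypothetical helper).

    The choice depends on the preceding character: a quote at the start of
    the text, after whitespace, or after an opening bracket is treated as an
    opening quote; any other quote is treated as a closing quote.
    """
    out = []
    prev = ''
    for ch in text:
        if ch == '"':
            if prev == '' or prev.isspace() or prev in '([{':
                out.append('\u201c')  # left double quotation mark
            else:
                out.append('\u201d')  # right double quotation mark
        else:
            out.append(ch)
        prev = ch
    return ''.join(out)


# The result is still a sequence of characters, not an encoded byte string.
sample = curl_quotes('He said "hello" to me.')

# Encoding is a separate, explicit step that produces bytes.
encoded = sample.encode('utf-16-le')
```

The scan never needs to know about any byte encoding; that only matters at the point where the text is written out.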
EatingPie 04-23-2007, 11:20 PM Thanks Kov.
Turns out there's a quick reference online that is far better than the one I linked above. I figured out a lot of stuff, including how to implement my curly quote algorithm. Now I can move on to questions more directly related to pylrs.
I have two questions off the bat...
(1) How do you set the BOTTOM MARGIN in pylrs?
Using the default Text and Page objects, the text flows beyond the end of the screen, leaving the last sentence on the page partially obscured. I've been able to set top and side margin, but not bottom. If you add more text to the formatting.py example, you will see this behaviour.
(2) Anyone know what font size matches that of makelrf3?
For me, makelrf3 had the font size bullseyed... even better than the Sony Connect books. Pylrs uses a font that's too big for my tastes.
-Pie
kovidgoyal 04-23-2007, 11:24 PM I've added support for inline graphics to pylrs. Attached is a demo with the python code to create it.
@Falstaff The patch is available at
pylrs.py
https://libprs500.kovidgoyal.net/changeset?old_path=%2Ftrunk%2Fsrc%2Flibprs500%2Flrf%2Fpylrs%2Fpylrs.py&old=234&new_path=%2Ftrunk%2Fsrc%2Flibprs500%2Flrf%2Fpylrs%2Fpylrs.py&new=235
pylrf.py
https://libprs500.kovidgoyal.net/changeset?old_path=%2Ftrunk%2Fsrc%2Flibprs500%2Flrf%2Fpylrs%2Fpylrf.py&old=234&new_path=%2Ftrunk%2Fsrc%2Flibprs500%2Flrf%2Fpylrs%2Fpylrf.py&new=235
Here's the python code to create helloworld.lrf
from libprs500.lrf.pylrs.pylrs import *
"""
TextBlocks can be added to any page at any time, since the entire book is
kept in memory. Text can also be added to any paragraph at any time.
"""
def helloworld():
    # create the book
    book = Book()
    # add a page to the book using the default page style
    page = book.Page()
    page.TextBlock().Paragraph('Demonstration of embedded graphics').CR()
    # add a textblock to the page using defaults for TextStyle and BlockStyle
    textBlock = page.TextBlock()
    textBlock.blockStyle = BlockStyle(blockrule='horz-fixed', blockwidth='575')
    # add a paragraph to the text block
    istr = ImageStream('/tmp/try/cherubs.jpg')
    im = Image(istr, x0=0, y0=0, x1=300, y1=300, xsize=100, ysize=100)
    p = Paragraph('This line of text is before the graphic. ')
    dc = DrawChar(lines=3)
    dc.append(Plot(im, xsize=450, ysize=450))
    p.append(dc)
    textBlock.append(p)
    textBlock.Paragraph('This line of text is after the graphic. It needs to be long enough to ensure that the graphic is not cut-off.')
    # generate the lrf file
    book.renderLrs("/tmp/helloworld.lrs")
    book.renderLrf("/home/kovid/helloworld.lrf")
if __name__ == "__main__":
    helloworld()
kovidgoyal 04-24-2007, 12:02 PM (1) How do you set the BOTTOM MARGIN in pylrs?
Using the default Text and Page objects, the text flows beyond the end of the screen, leaving the last sentence on the page partially obscured. I've been able to set top and side margin, but not bottom. If you add more text to the formatting.py example, you will see this behaviour.
I use the following code to keep the text from creeping off the edges.
Book(pagestyledefault=dict(textwidth=575, textheight=747))
Alternatively, if you use the pylrs from the libprs500 svn the defaults have been fixed to prevent this.
kovidgoyal 04-24-2007, 12:04 PM Here's a more complete example of using inline images. I had to refactor the inheritance rules in pylrs a bit.
from libprs500.lrf.pylrs.pylrs import *
"""
TextBlocks can be added to any page at any time, since the entire book is
kept in memory. Text can also be added to any paragraph at any time.
"""
def helloworld():
    book = Book()
    istr = ImageStream('/tmp/try/cherubs.jpg')
    im = Image(istr, x0=0, y0=0, x1=300, y1=300, xsize=100, ysize=100)
    page = book.Page()
    page.TextBlock().Paragraph(Span('Demonstration of embedded graphics', fontsize='180')).CR()
    p = page.TextBlock().Paragraph('The image: ')
    p.append(Plot(im, xsize=550, ysize=550))
    page.TextBlock(TextStyle(), BlockStyle(blockrule='vert-fixed'))
    page.TextBlock().Paragraph(Bold('Drop caps')).CR()
    textBlock = page.TextBlock()
    textBlock.blockStyle = BlockStyle(blockrule='horz-fixed', blockwidth='575')
    p = Paragraph('This line of text is before the graphic. ')
    dc = DropCaps(lines=3)
    dc.append(Plot(im, xsize=450, ysize=450))
    p.append(dc)
    textBlock.append(p)
    textBlock.Paragraph('This line of text is after the graphic. It needs to be long enough to ensure that the graphic is not cut-off.')
    page.TextBlock().Paragraph(Bold('Inline graphic')).CR()
    page.TextBlock(TextStyle(), BlockStyle(blockrule='vert-fixed'))
    tb = page.TextBlock()
    p = tb.Paragraph('This line of text is before the graphic.')
    p.append(Plot(im, xsize=350, ysize=350))
    p.append('This line of text is after the graphic.')
    book.renderLrs("/tmp/helloworld.lrs")
    book.renderLrf("/home/kovid/helloworld.lrf")
if __name__ == "__main__":
    helloworld()
Falstaff 04-24-2007, 08:04 PM Nice work, kovidgoyal. I didn't look too closely yet, but I think you might have replaced the LrsDrawChar class -- this class is used to control what can nest inside of other classes (following the LRS specification). As I see you found out, the LRS spec uses the word DrawChar for two different things. I think there needs to be both the DrawChar class that you wrote and the empty LrsDrawChar that is used to enforce the nesting rules from the LRS spec.
LrsDrawChar and LrsSimpleChar1 are used just to identify the different types of things so that the code can complain if you try to put a LrsDrawChar where only a LrsSimpleChar1 is allowed. I forget why, but I didn't need LrsSimpleChar2...
I should have put in some comments to explain that strange usage of the empty classes. But I didn't. Sorry.
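For readers unfamiliar with the empty-class trick described here, the following is a toy sketch of the general pattern, not the pylrs source. All names in it (SimpleMarker, DrawMarker, Container, SimpleOnlyContainer, NestingError, Italic, DropCap) are hypothetical. The marker classes carry no behaviour at all; they exist only so that a container can check with isinstance what it is being asked to hold.

```python
class SimpleMarker(object):
    """Empty marker: tags elements that are allowed in 'simple' containers."""


class DrawMarker(object):
    """Empty marker: tags elements that are allowed only in richer containers."""


class NestingError(Exception):
    """Raised when an element is placed where the nesting rules forbid it."""


class Container(object):
    # Marker types this container accepts (hypothetical rule set).
    allowed = (SimpleMarker, DrawMarker)

    def __init__(self):
        self.contents = []

    def append(self, item):
        # The markers have no methods; they are used purely for this check.
        if not isinstance(item, self.allowed):
            raise NestingError("%s cannot be nested inside %s"
                               % (type(item).__name__, type(self).__name__))
        self.contents.append(item)
        return self


class SimpleOnlyContainer(Container):
    allowed = (SimpleMarker,)


class Italic(SimpleMarker):
    pass


class DropCap(DrawMarker):
    pass
```

With these definitions, Container().append(DropCap()) succeeds, while SimpleOnlyContainer().append(DropCap()) raises NestingError.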
kovidgoyal 04-24-2007, 08:33 PM Thanks. Yeah in svn I've renamed DrawChar to DropCaps. I didn't put back LrsDrawChar though. As far as I can tell things work with just LrsSimpleChar1. At some point I should go through all the inheritance rules carefully and fix things.
ashkulz 04-25-2007, 12:07 AM Falstaff, can you create some sort of "official" development area where patches could be sent? You have the original release, I have the same (with a few minor changes) and looks like kovidgoyal has a more different version ...
kovidgoyal 04-25-2007, 07:35 PM Falstaff, can you create some sort of "official" development area where patches could be sent? You have the original release, I have the same (with a few minor changes) and looks like kovidgoyal has a more different version ...
I can volunteer my SVN server.
kovidgoyal 04-26-2007, 12:50 PM @Falstaff
I made some more changes to pylrs for functionality I needed in html2lrf
Here's the combined diff
https://libprs500.kovidgoyal.net/changeset/221/trunk/src/libprs500/lrf/pylrs?old=241&old_path=trunk%2Fsrc%2Flibprs500%2Flrf%2Fpylrs
EatingPie 04-28-2007, 01:45 AM Does pylrs have a function to add a per-page heading, like books from the CONNECT store? The CONNECT store books have the book title at the top/right of the page in small text. Very handy.
-Pie
EatingPie 04-28-2007, 01:01 PM Another Python question.
Unicode characters (binary) kill my app. Is there a way to convert unicode to text? Or to convert unicode to Python's u"\u0020" format?
-Pie
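One possible approach to the question above, sketched in current Python 3 rather than the Python 2 of this thread: the standard unicode_escape codec turns non-ASCII characters into backslash escapes, and the ascii codec with the 'replace' error handler gives a lossy plain-text fallback. The sample string is made up for illustration.

```python
text = 'caf\u00e9 \u201cquoted\u201d'

# Escape every non-ASCII character as a backslash sequence.
# The result is a plain ASCII byte string.
escaped = text.encode('unicode_escape')

# Decoding with the same codec reverses the escaping.
restored = escaped.decode('unicode_escape')

# Lossy alternative: characters that ASCII cannot represent become '?'.
lossy = text.encode('ascii', 'replace')
```

The escaped form can be stored or printed anywhere that only handles ASCII and converted back later without loss; the lossy form cannot be converted back.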
kovidgoyal 04-29-2007, 05:50 PM yes it does have support for headers. See html2lrf
EatingPie 04-29-2007, 07:52 PM I think I'm missing something here...
I have HTMLtoLRF-1.0 and it's a C++ program that requires a Windows DLL. Is there another version?
Also, I think I need some explanation of libprs500 vs. pylrf-1.0.0. You've mentioned the former several times in this thread. What's the difference between these two libraries? Is libprs500 an extension of pylrf-1.0.0?
-Pie
kovidgoyal 04-30-2007, 01:04 PM I think I'm missing something here...
I have HTMLtoLRF-1.0 and it's a C++ program that requires a Windows DLL. Is there another version?
Also, I think I need some explanation of libprs500 vs. pylrf-1.0.0. You've mentioned the former several times in this thread. What's the difference between these two libraries? Is libprs500 an extension of pylrf-1.0.0?
-Pie
libprs500 is my set of python tools for working with the SONY reader. It includes converters called html2lrf and txt2lrf. The converters are based on an improved version of pylrs.
EatingPie 05-03-2007, 01:56 PM In figuring out the header stuff, I did some diffs between libprs500's pylrs and Falstaff's pylrs-1.0.0. Are there any plans to merge the codebases?
I didn't do extensive diffing, but it looks like Kovidgoyal added to pylrs-1.0.0 -- for example Image support -- rather than making changes to existing code. I think it would benefit the community extensively to have an update to the pylrs code base with Kovidgoyal's changes.
-Pie
kovidgoyal 05-03-2007, 04:34 PM At the moment it looks like I'm going to fork pylrs as I need to make too many changes for html2lrf. Nonetheless, the patches are easily viewable via the source code browser on my website, so if Falstaff wants to backport some of them he is welcome to do so.
athlonkmf 06-04-2007, 05:35 PM ok, might be stupid, but I'm not familiar with python or python programs although I do have a linux server with python. So... how do you use this tool?
EatingPie 06-04-2007, 06:03 PM Python is an interpreted language, similar to Perl in that respect. Or you can think of it as a super duper shell script.
If you have a program called "helloworld.py" you can run it like so:
python helloworld.py
And that's that.
In terms of pylrs, it's a library for converting e-books. It's not a final product, so pylrs itself will not convert books for you. You'd need to write a program that utilizes the pylrs library.
The pylrs library itself comes with some usage examples, so if you want to write a whole program, that's where to start. From a programmer's perspective, it's pretty straight-forward. But if you're not a programmer, this isn't what you're looking for.
And I get the impression that you are, indeed, looking for a finished product for conversion. I am aware of two which utilize pylrs for conversion, and they are fully finished products.
pielrf - A text to LRF (Sony EBook Format) converter. It mimics the look of books you download from the Sony Connect Store, and it's really easy to add chapters, headers, and Tables of Contents. Give you one guess why I know about this one! :)
http://www.mobileread.com/forums/showthread.php?t=10752
libprs500 - Don't let the name deceive you, it's not ONLY a library -- it has completed programs to do conversion for you as well, plus a number of other features, like transferring LRFs to the Reader via USB. Here's the link to the html2lrf (part of libprs500) discussion:
http://www.mobileread.com/forums/showthread.php?t=10582
If you want a finished product, those are the way to go.
-Pie
swr2408018 08-04-2007, 11:34 AM When I specify a blockstyle attribute in a TextBlock(), it doesn't appear to have any effect. (I can work around this by creating a new blockstyle with the change.)
It doesn't look like the code does anything with those "overriding" values. E.g.:
self.textSettings = {}
self.blockSettings = {}
for name, value in settings.items():
    if name in TextStyle.validSettings:
        self.textSettings[name] = value
    elif name in BlockStyle.validSettings:
        self.blockSettings[name] = value
    else:
        raise LrsError, "%s not a valid setting for TextBlock" % name
captures the override values in either self.textSettings or self.blockSettings as appropriate, and then
self.textStyle = textStyle
self.blockStyle = blockStyle
# create a textStyle with our current text settings (for Span to find)
self.currentTextStyle = textStyle.copy()
self.currentTextStyle.attrs.update(self.textSettings)
copies overriding textStyle values into the currentTextStyle, but there is nothing similar for the overriding blockStyle values.
Am I missing something here?
Thanks!
Steve
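The copy-and-update pattern described in this post can be shown with a toy model. This is not pylrs code: the Style class, the two setting-name sets, and effective_styles are hypothetical stand-ins, and the sketch simply applies the same treatment to both kinds of override. Whether pylrs itself needs such a step for block settings is a separate question.

```python
# Hypothetical subsets of setting names, for illustration only.
TEXT_SETTINGS = {'fontsize'}
BLOCK_SETTINGS = {'blockwidth', 'blockrule'}


class Style(object):
    """Toy stand-in for a style object holding an attrs dict."""

    def __init__(self, **attrs):
        self.attrs = dict(attrs)

    def copy(self):
        return Style(**self.attrs)


def effective_styles(text_style, block_style, **settings):
    """Sort overrides by kind, then apply each group to a copy of its style."""
    text_over, block_over = {}, {}
    for name, value in settings.items():
        if name in TEXT_SETTINGS:
            text_over[name] = value
        elif name in BLOCK_SETTINGS:
            block_over[name] = value
        else:
            raise ValueError("%s not a valid setting" % name)
    # Copies keep the shared base styles untouched.
    current_text = text_style.copy()
    current_text.attrs.update(text_over)
    current_block = block_style.copy()
    current_block.attrs.update(block_over)
    return current_text, current_block
```

Working on copies means that an override given for one block does not leak into other blocks that share the same base style.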
swr2408018 08-04-2007, 11:42 AM If you want to reproduce "SmallCaps" support, e.g., SMALLCAPS, here's a class you can add to a project.
I threw this together with pretty limited understanding of python, but it's working well for me in an ebook of a Schiller play where all the names of the parts are in smallcaps. Let me know if I've done anything crazy here?
Thanks!
Steve
class SmallCap(Span):
    tagname = "SmallCap"
    def __init__(self, text=None):
        Span.__init__(self, "")
        if text is not None:
            SmallCap.append(self, text)
    def append(self, content):
        # Render lowercase letters as smaller uppercase letters, one Span per character.
        for ch in content:
            if ch.islower():
                Span.append(self, Span(ch.upper(), fontsize=70))
            else:
                Span.append(self, Span(ch, fontsize=100))
kovidgoyal 08-04-2007, 12:32 PM There's a much updated version of pylrs in the libprs500 svn that includes a lot of capabilities that the original doesn't have. You should use that.
EDIT: With regard to small caps I think that belongs in application code not pylrs, since it doesn't use any features of LRS at all. It is a neat idea though and I've added small caps to my custom Span class in html2lrf :-) Though with a more efficient implementation (w.r.t. the number of Span classes needed)
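The html2lrf implementation is not shown in this thread, so the following is only a guess at one way to cut down the number of spans: group consecutive characters of the same case into runs, so that each run needs a single span instead of one per character. The function name small_cap_runs is hypothetical; the sizes 70 and 100 are taken from the SmallCap class posted above.

```python
from itertools import groupby


def small_cap_runs(text, small=70, normal=100):
    """Split text into (text, fontsize) runs, one per stretch of same-case characters.

    Lowercase stretches are uppercased and given the smaller size; everything
    else keeps its text and the normal size.
    """
    runs = []
    for is_lower, chars in groupby(text, key=str.islower):
        chunk = ''.join(chars)
        if is_lower:
            runs.append((chunk.upper(), small))
        else:
            runs.append((chunk, normal))
    return runs
```

Each (text, fontsize) pair could then be wrapped in one Span(text, fontsize=size), so a ten-character name such as 'Don Carlos' needs four spans rather than ten.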
swr2408018 08-04-2007, 06:43 PM Digging a little more on my own, I verified that the LRS output does contain the overriding TextBlock attrs, so this is very unlikely to be a pylrs issue. I'll post again if I figure out anything more.
Steve
swr2408018 08-04-2007, 06:49 PM Thanks, I'd already picked up your changes for making Plot() work properly with embedded images; I'll look at the rest of your updates.
Agreed that the SmallCap implementation is a hog. I was under the impression that pylrs was filtering out unnecessary tags, but I just realized (while looking through the LRS output to debug the TextBlock/BlockStyle issue on the other thread) that that optimization is still in the "experimental" stage and off by default. I'll have to tighten up my version, given that the ebook I'm working on has hundreds of SmallCap strings.
(And also agreed this is project or sample code, not pylrs code.)
Steve