Old 05-06-2010, 11:28 PM   #1
FatDog
Witless protection Agent
 
Posts: 290
Karma: 1002898
Join Date: Nov 2009
Location: Los Angeles
Device: Kindle
Ahhhhh - Utility overload: BookDesigner, BookCreator, Textify, txt2lrf...too much

(I'm feeling a bit overwhelmed)

What software (other than Calibre) would you guys use to reformat .txt files and put them into a 'universal' format for ebook readers?

Let me explain...

Nerd cred: I am a software engineer with years of Perl/Unix experience. I'm currently using a Mac to develop Java (and struggling with all the supporting technologies that have sprung up, but I digress).

Since the days when 9600 baud modems were $500 I have been randomly collecting text-based stories from BBSes, Usenet, fan-fiction sites, etc. Think lots of 5-25 meg .txt files.

As a weekend project I have created some Perl modules that clean the files up, remove spam, and try to tease out titles, authors, chapters, keycodes, etc.

Then I use Multi-Edit (a programmer's editor with a powerful C-like macro language) to run through the files and reflow the paragraphs, find lines that end in hyphens and join them, remove more random crap, and basically strip the formatting.
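
A rough sketch of the line-joining step described above, written in Python rather than Multi-Edit macros. It assumes paragraphs are separated by blank lines and that a single trailing hyphen always marks a split word; both are assumptions, and the second one will wrongly glue real compounds such as "well-" / "known", so treat it as a starting point only. The function name `unwrap` is made up for illustration.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines into one line per paragraph.

    Assumes paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for para in paragraphs:
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-") and not joined.endswith("--"):
                # Word split across lines: drop the hyphen and glue the halves.
                joined = joined[:-1] + line
            elif joined:
                joined += " " + line
            else:
                joined = line
        result.append(joined)
    return "\n\n".join(result)
```

Lines ending in a double hyphen are left alone, on the guess that "--" is a typed dash rather than a split word.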

Then more Perl scripts to sort the stories by Title/Author and perhaps de-dup the stories.
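
For illustration, here is one way the sort and de-dup pass could look in Python. This is a sketch, not the Perl scripts mentioned above: the `Story` record, the `fingerprint` helper, and the choice to compare stories by a hash of their normalized text are all assumptions.

```python
import hashlib
import re
from typing import NamedTuple


class Story(NamedTuple):
    title: str
    author: str
    body: str


def fingerprint(body: str) -> str:
    """Hash the text after lowercasing and collapsing punctuation and whitespace."""
    normalized = re.sub(r"[^a-z0-9]+", " ", body.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def dedup_and_sort(stories):
    """Keep the first copy of each story, then sort by title and author."""
    seen = set()
    unique = []
    for story in stories:
        key = fingerprint(story.body)
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return sorted(unique, key=lambda s: (s.title.lower(), s.author.lower()))
```

Hashing the normalized body only catches copies that differ in case, spacing, or punctuation. Copies with different spam headers or edits would need a fuzzier comparison.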

For a while I studied XHTML and tried to find a standard DOCTYPE to use tags to mark things up. I looked at Project Gutenberg's styles and liked the Epic-Book-Chapter concepts but it was a very fat schema. I created a few of my own tags and used CSS to control the formatting. It worked on a story-by-story basis but load a 5 meg file with 150 stories into a browser to check things and it gets very, very slow. And I did not realize how important a table of contents was.

(But I had a lot of fun learning).

So my collection builds with no organization or focus.

Last week - (thanks to the enthusiasm in the Sony section of this forum) I bid on several and finally won a Sony 505 from eBay.

In anticipation of the new arrival, I decided to try and re-format some of the 33 megs of .txt files on my hard drive into some semblance of order.


I LOVE all the utilities that people here have written. But many of the postings are 3 years old so this means they are either perfect/tried-and-true, or other tools have supplanted them.

I LOVE Calibre and it will probably be my main management software.

So what is the best way to convert my .txt files into .lrf?

I downloaded Book Designer 4.0 and pasted in a few stories. It seemed to give me total control over titles, subtitles, authors, but I had to multi-select lines to create paragraphs. This could take hours.

I tried textify which seemed to join lines into single rows which would solve this problem, but it also put things into a single line that did not belong.

I opened a 7 meg text file in Book Designer and to my surprise it seemed to correctly join rows into paragraphs. But when I started to run through the file to mark Title/Author, Author-Text, it crashed out.

Am I frustrated? Hell no. This is fun.

Ok - let's assume I take my ... Buffy fan-fiction and break the stories into .txt files by author. There are multiple chapters to one story, and a bunch of short stories. Can I put these all into one ebook/.lrf file, or am I fighting the model with multiple books in one? What tools should I use to pre-process the file before Book Designer?
Old 05-07-2010, 01:15 AM   #2
pepak
Guru
 
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Note: I don't like TXT as a source format, because it loses too much information (e.g. italics).

What works for me: Take a source format, whatever that is, and convert it to bare-bones (X)HTML: chapters, paragraphs, italics (with classes denoting the type of italics, e.g. thought, foreign word, real emphasis...), blockquotes (again, with classes for letters, signs, poems...). That's about it as far as HTML tags go. Add a standardized header and footer. When done, convert all books at once using my H2LRF "preprocessor" into any target format I like.
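
To make the idea concrete, here is a minimal Python sketch of generating that kind of bare-bones XHTML from plain text. It is not H2LRF and does not reproduce its header or footer. The `HEADER`/`FOOTER` strings, the `book.css` file name, the `chapter` class, and the `render_book` function are all assumptions for illustration.

```python
from html import escape

# Hypothetical standardized header and footer; adjust to taste.
HEADER = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    "<head>\n"
    "<title>{title}</title>\n"
    '<link rel="stylesheet" type="text/css" href="book.css"/>\n'
    "</head>\n"
    "<body>"
)
FOOTER = "</body>\n</html>\n"


def render_book(title, chapters):
    """chapters: list of (chapter_title, [paragraph, ...]) tuples of plain text."""
    out = [HEADER.format(title=escape(title))]
    for chapter_title, paragraphs in chapters:
        out.append(f'<h2 class="chapter">{escape(chapter_title)}</h2>')
        # Escape each paragraph so stray & and < in the source stay valid XHTML.
        out.extend(f"<p>{escape(p)}</p>" for p in paragraphs)
    out.append(FOOTER)
    return "\n".join(out)
```

Italics and blockquotes with classes (e.g. `<em class="thought">`) would be layered on top of this once the source marks them somehow, which plain TXT usually does not.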
Old 05-07-2010, 01:17 AM   #3
pepak
Guru
 
As far as collections of stories are concerned, I treat them as a single book with stories being its chapters.

Here you go, an example of such a book along with its formatting. (It's not complete, as I have only left the PD stuff in it.)
http://www.pepak.net/files/e-books/u...ble_people.zip
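
As a sketch of the "stories as chapters" layout (not the structure of the zip linked above, which is not reproduced here), the Python below stitches several stories into one body and puts a linked table of contents in front. The `story1`, `story2`... anchor names, the `toc` class, and the `build_collection` function are assumptions.

```python
from html import escape


def build_collection(stories):
    """stories: list of (title, author, xhtml_body) tuples; returns body markup."""
    toc = ['<ul class="toc">']
    chapters = []
    for n, (title, author, body) in enumerate(stories, start=1):
        anchor = f"story{n}"
        label = f"{escape(title)} ({escape(author)})"
        # One TOC entry per story, pointing at the matching chapter heading.
        toc.append(f'<li><a href="#{anchor}">{label}</a></li>')
        chapters.append(f'<h2 id="{anchor}">{label}</h2>\n{body}')
    toc.append("</ul>")
    return "\n".join(toc + chapters)
```

Each story body is passed through unchanged, so it is expected to be valid XHTML already.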
Old 05-07-2010, 10:51 PM   #4
FatDog
Witless protection Agent
 
Ah pepak - that is very nice. I had a similar idea and you showed me some tricks with those tags. Does the resulting ebook match the style spelled out in the .css file?

So you use xhtml as a basic markup. Cool.
Old 05-08-2010, 12:48 PM   #5
=X=
Wizard
 
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
BookCreator (BC) is ideal for converting text to "Your format". I initially wrote the tool because I was doing a lot of converting Project Gutenberg text to LRF. Text is the best format for BC because it has no hidden metadata like HTML.

BC is just a glorified Word template that adds menus, styles, and macros to MS Word, all targeted toward making eBook creation easy. Some added features include title formatting, word-wrap fixes, auto chapter detection, text italics (this requires the _word_ text convention), a table of contents creator, etc.
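
For readers without MS Word, the two text conventions mentioned above can be approximated in a few lines of Python. This is only a sketch and not how BookCreator's macros work: the regular expressions, the "Chapter N" / "Part N" heading pattern, and the function names are assumptions.

```python
import re

# _word_ or _several words_ marked as italic; underscores inside identifiers
# such as my_file_name are left alone.
ITALIC = re.compile(r"(?<!\w)_([^_\n]+)_(?!\w)")

# Lines like "Chapter 12" or "PART IV: The Return" (arabic or roman numerals).
CHAPTER = re.compile(r"^\s*(chapter|part)\s+(\d+|[ivxlc]+)\b.*$", re.IGNORECASE)


def mark_italics(line):
    """Replace the _word_ convention with <i> tags."""
    return ITALIC.sub(r"<i>\1</i>", line)


def find_chapters(lines):
    """Return (line_index, heading_text) for every line that looks like a heading."""
    return [(i, ln.strip()) for i, ln in enumerate(lines) if CHAPTER.match(ln)]
```

The heading pattern will produce false positives on sentences such as "Part I played...", so the detected list is best reviewed by hand before building a table of contents from it.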

The nice thing about the tool is that it is built within MS Word, so you get the rich editing experience MS Word provides, with macros to help create an ebook. The bad thing is that you need MS Word.


The tool uses calibre to make 7 different book types; LRF is one of them.

=X=
Old 05-08-2010, 06:10 PM   #6
december
Nameless Being
 
I use a combination of Sigil and Calibre for editing and creating ebooks - I work on my source text in Notepad, port it into Sigil for editing in .epub format (I like the fact that I can break the text into distinct chapters and work on shorter 'files' rather than bogging my slow laptop down with the entire text all at once), then export the .epub to Calibre and convert to whatever file format I need, as necessary.
Old 05-10-2010, 12:00 AM   #7
pepak
Guru
 
Quote:
Originally Posted by FatDog
Ah pepak - that is very nice. I had a similar idea and you showed me some tricks with those tags. Does the resulting ebook match the style spelled out in the .css file?
To the limits of what the device and the converter allow. You can't get a fully-justified text on PRS-505 with EPUB format, but it will work with LRF or on PRS-900. You can't get emphasis within emphasis with LRF format (because Calibre's LRF renderer can't handle nested tags, unfortunately), but you can with EPUB. The beauty is that you have one source which you can easily convert - using one simple command - to any format Calibre (or other command-line tool!) supports while retaining e.g. metadata or specific needs of a given format...
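
As an illustration of the "one source, one command" idea, the Python sketch below builds calibre `ebook-convert` command lines for every HTML file in a folder. It is not the H2LRF preprocessor mentioned earlier in the thread. The function names, the `*.html` glob, and the dry-run default are assumptions; it relies only on `ebook-convert input output` plus the `--title` and `--authors` metadata options.

```python
import subprocess
from pathlib import Path


def convert_command(source, fmt, title=None, authors=None):
    """Build the ebook-convert argument list for one source file."""
    source = Path(source)
    # ebook-convert picks the output format from the output file's extension.
    cmd = ["ebook-convert", str(source), str(source.with_suffix("." + fmt))]
    if title:
        cmd += ["--title", title]
    if authors:
        cmd += ["--authors", authors]
    return cmd


def convert_all(folder, fmt, dry_run=True):
    """Convert every .html file in folder; with dry_run, only return the commands."""
    commands = [convert_command(p, fmt) for p in sorted(Path(folder).glob("*.html"))]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # requires calibre on the PATH
    return commands
```

Calling `convert_all("books", "lrf")` only lists what would be run. Passing `dry_run=False` actually invokes calibre, and the same folder can then be converted again with `"epub"` to get a second format from the same source.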