MobileRead Forums > E-Book Readers > Sony Reader > Sony Reader Dev Corner
12-29-2006, 05:59 PM   #16
cmumford
Open issues

BTW, to see my list of open issues, please check out http://code.google.com/p/bbebinder/issues/list. If you want to add any new issues, you can either post them here in this forum or, I believe, submit your own issue on the Google Code page.
12-30-2006, 04:49 AM   #17
AndyQ
Are you happy for people to contribute code changes as well? I've got another week of not having to go to work.
12-30-2006, 08:55 AM   #18
cmumford
Quote:
Originally Posted by AndyQ
Are you happy for people to contribute code changes as well? I've got another week of not having to go to work.
Sure, I'd love some help. I'm still learning a lot about the LRF format, which causes me to refactor my code a lot to keep things clean. I was hoping to let things settle down a bit, for fear of frustrating any other developers working with me on the project.

However, I think that if I just communicate with you beforehand about the changes I intend to make, we should be OK. Why don't you email me at cjmumford@gmail.com so we can coordinate? Take a look at the open issues on the project page to see if any of them pique your interest. If there are other fixes or enhancements you'd rather work on, please feel free to submit an issue.
01-01-2007, 01:35 PM   #19
Vienna01
Great tool. Do you think orphan control should be added? Here is a reply I added to threads on other tools.

Orphan control is a feature that I think would be helpful in ALL the programs that generate file formats for the SONY Reader.

I find reading a bit awkward when one part of a sentence is on one page and the rest is on the next page. I think the orphan control feature would be helpful. Maybe it needs to be more complex than IF SENTENCE DOESN'T FIT GO TO NEW PAGE, because with a large font size (TR 16-TR18) and a small display, long sentences would leave some pages very short as they moved to the next page. {such as this sentence} Maybe for sentences of less than ## characters the rule would hold, but for others a break in the sentence across pages would be "a necessary evil" <grin>
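Vienna01's rule can be sketched in Python, under heavy assumptions. A converter cannot know where the Reader will actually break pages, so this models a page as a fixed number of characters (`page_chars`), and the hypothetical `max_keep_len` parameter stands in for the unspecified "##" threshold. Long sentences are split at character positions, ignoring word boundaries.

```python
def paginate(sentences, page_chars, max_keep_len):
    """Greedy pagination of sentences into fixed-size character pages.

    Hypothetical model: a sentence that would straddle a page break is moved
    whole to the next page if it is shorter than max_keep_len; longer
    sentences are split across the break ("a necessary evil").
    """
    pages = [[]]
    used = 0  # characters already placed on the current page
    for s in sentences:
        room = page_chars - used
        if used > 0 and room < len(s) < max_keep_len:
            # Short sentence that does not fit: start a new page for it.
            pages.append([])
            used = 0
        # Place the sentence, splitting it wherever a page fills up.
        while s:
            if used == page_chars:
                pages.append([])
                used = 0
            chunk, s = s[: page_chars - used], s[page_chars - used :]
            pages[-1].append(chunk)
            used += len(chunk)
    return pages
```

With `page_chars=8`, the sentences `"aaaa"`, `"bbbbbb"`, `"cc"` give `[["aaaa"], ["bbbbbb", "cc"]]` when `max_keep_len=7` (the six-character sentence is moved whole), but `[["aaaa", "bbbb"], ["bb", "cc"]]` when `max_keep_len=5` (it is split). A real implementation would also need sentence detection and actual text measurement, which this sketch leaves out.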
01-01-2007, 02:17 PM   #20
cmumford
Quote:
Originally Posted by Vienna01
Great tool. Do you think orphan control should be added? Here is a reply I added to threads on other tools.

Orphan control is a feature that I think would be helpful in ALL the programs that generate file formats for the SONY Reader.

I find reading a bit awkward when one part of a sentence is on one page and the rest is on the next page. I think the orphan control feature would be helpful. Maybe it needs to be more complex than IF SENTENCE DOESN'T FIT GO TO NEW PAGE, because with a large font size (TR 16-TR18) and a small display, long sentences would leave some pages very short as they moved to the next page. {such as this sentence} Maybe for sentences of less than ## characters the rule would hold, but for others a break in the sentence across pages would be "a necessary evil" <grin>
By "page" are you referring to the BBeB Page object, or just a displayed page on the Reader? It's my understanding that BBeB Page objects map most closely to chapters, and I agree completely that words/sentences/paragraphs shouldn't be split across multiple page objects.

If instead you meant a viewable page, I'm not sure the BBeB format allows me to control this, since text can be resized and, frankly, a new reader with 768x1024 resolution could be released. I have no way of knowing how the Sony Reader is going to lay out the eBook. I think what you would want here is something like the MS Word paragraph styles "keep with next" and "page break before", and I'm not sure that BBeB supports this.

BTW, thanks for taking the time to give the program a try. We're working on table of contents and image support, and assuming the general consensus is that the quality is high enough, I'll announce this program to the rest of the forum readers.
01-01-2007, 03:31 PM   #21
FangornUK
Nice work! Looks very promising.

You seem to be doing some Gutenberg-specific detection. A simple clean-up for the HTML versions is stripping page numbers, which I do in gutlrf.pl like so:
$_ =~ s#<span class='pagenum'>.*?</span>##g;
$_ =~ s#<span class=\"pagenum\">.*?</span>##g;

I'll post more bug reports to the Google Code site.
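For comparison, the same page-number clean-up can be sketched with Python's `re` module. It uses a non-greedy `.*?` so that, when one line holds two page-number spans, the text between them survives, and a backreference so a single pattern covers both quoting styles. The function name is illustrative only.

```python
import re

# Matches <span class='pagenum'>...</span> or <span class="pagenum">...</span>;
# \1 requires the closing quote to be the same character as the opening one.
PAGENUM_SPAN = re.compile(r"""<span class=(['"])pagenum\1>.*?</span>""")


def strip_page_numbers(html):
    """Remove Gutenberg-style page-number spans from an HTML string."""
    return PAGENUM_SPAN.sub("", html)
```

Like the Perl lines, this assumes each span opens and closes on the same line, since `.` does not match a newline unless `re.DOTALL` is set.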
01-01-2007, 04:15 PM   #22
cmumford
Quote:
Originally Posted by FangornUK
Nice work! Looks very promising.

You seem to be doing some Gutenberg-specific detection. A simple clean-up for the HTML versions is stripping page numbers, which I do in gutlrf.pl like so:
$_ =~ s#<span class='pagenum'>.*?</span>##g;
$_ =~ s#<span class=\"pagenum\">.*?</span>##g;

I'll post more bug reports to the Google Code site.
Can you point me to a book on Gutenberg that has these page-number spans? I was using The Adventures of Sherlock Holmes, but it doesn't have any.

BTW, does Gutenberg have a recommended HTML format that you're aware of, or are they at the mercy of every submitter's idea of what good HTML is?

If I wind up doing a bunch of HTML cleanup, I'll probably implement it so that it reads various cleanup parameters (maybe like the two you posted above) from a data file that users can add their own values to.
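One way the data-file idea could look, sketched in Python under an invented file format: each non-blank line that does not start with `#` holds a regular expression and its replacement separated by a tab, and rules run in file order. `load_rules` and `apply_rules` are hypothetical names, not BBeB Binder code.

```python
import re


def load_rules(text):
    """Parse hypothetical cleanup rules: one 'pattern<TAB>replacement' per line."""
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        pattern, _, replacement = line.partition("\t")
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(html, rules):
    """Apply every rule, in file order, to the whole document."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html
```

A user-editable file would be loaded with `load_rules(open(path, encoding="utf-8").read())`. In this format a pattern cannot begin with `#`, and replacements follow Python's `re.sub` template syntax (`\1` and so on).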
01-01-2007, 08:08 PM   #23
FangornUK
Try 19337-h; most HTML ones have this page number span.

Yes, there is an HTML standard for Gutenberg that most people follow; alas, some don't. http://www.gutenberg.org/wiki/Gutenberg:HTML_FAQ and http://gutenberg.hwg.org/index.html
01-01-2007, 08:48 PM   #24
RWood
Vienna01, I'm confused. I have always heard that widow/orphan refers to the last and first few lines on a physical page. Usually two lines are required on each. Thus if the page breaks with only one line at the bottom of the page, that line is moved to the following page. If only one line would be at the start of the following page, the page break is moved back one line, unless that would leave only one line on the first page, in which case both lines are moved to the following page. I have never seen it applied to whole sentences or paragraphs.
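The two-line rule described above can be sketched as a small Python function. It assumes a layout engine that knows how many lines a paragraph occupies and how many lines remain on the page, which, as noted elsewhere in this thread, a converter does not; the function and parameter names are hypothetical.

```python
def lines_on_current_page(para_lines, remaining, min_lines=2):
    """How many lines of a paragraph to place on the current page.

    Never leave fewer than min_lines lines of a paragraph at the bottom of
    a page (orphan) or at the top of the next page (widow).
    """
    k = min(para_lines, remaining)
    if k == para_lines:
        return k  # the whole paragraph fits
    if k < min_lines:
        return 0  # orphan: move the lone line(s) to the next page
    if para_lines - k < min_lines:
        k = para_lines - min_lines  # widow: pull the page break back
        if k < min_lines:
            return 0  # that would leave an orphan instead: move them all
    return k
```

For example, a five-line paragraph with four lines of room left puts three lines on this page and two on the next, while a three-line paragraph with two lines of room moves to the next page entirely.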

More critical from my view would be keeping headers together with the following text. Few things are more jarring than seeing a header as the last line of a page and having to turn the page to start the associated paragraph. Just my two cents.
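Keeping a header with its text amounts to a "keep with next" check. A Python sketch, assuming a hypothetical layout model that knows line counts: the heading stays on the current page only if it and the first couple of lines of the following paragraph both fit there.

```python
def heading_stays(heading_lines, para_lines, remaining, min_lines=2):
    """Decide whether a heading may stay at the bottom of the current page.

    The heading stays only if at least min_lines lines of the paragraph
    that follows (or all of it, if shorter) fit beneath it; otherwise the
    heading should move to the next page together with its text.
    """
    needed = heading_lines + min(min_lines, para_lines)
    return needed <= remaining
```

A one-line heading with only one or two lines of room left therefore moves to the next page, which is exactly the stranded-header case described above.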
01-01-2007, 09:17 PM   #25
cmumford
Quote:
Originally Posted by FangornUK
Try 19337-h; most HTML ones have this page number span.

Yes, there is an HTML standard for Gutenberg that most people follow; alas, some don't. http://www.gutenberg.org/wiki/Gutenberg:HTML_FAQ and http://gutenberg.hwg.org/index.html
Ah yes, I see now. Those definitely need to be stripped out. Thanks for the pointer.
01-01-2007, 09:22 PM   #26
cmumford
Quote:
Originally Posted by RWood
... <snip>

More critical from my view would be keeping headers together with the following text. Few things are more jarring than seeing a header as the last line of a page and having to turn the page to start the associated paragraph. Just my two cents.
Yes, I agree with you on this one. Unfortunately, I am again faced with the problem of having no idea where a heading will be placed, and I don't believe I can attach the heading to the first paragraph of text so that the two are not separated (but I will continue to look for a way to do just that). Another solution is to insert a page break before the heading, but then you wind up with more half-filled pages and more page turns (er, button presses) to read the book. That may just have to be an option that users can set to their preference.
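The user-selectable "page break before heading" option could be modeled as below. `Block` and `group_pages` are hypothetical; each returned group stands for content that starts on a fresh page, and how that maps onto BBeB objects is left open here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # "heading" or "paragraph" in this hypothetical model
    text: str


def group_pages(blocks, break_before_heading):
    """Split blocks into groups that each start on a new page.

    When break_before_heading is set, every heading (except one that is
    already first on its page) begins a new group.
    """
    pages = [[]]
    for block in blocks:
        if break_before_heading and block.kind == "heading" and pages[-1]:
            pages.append([])
        pages[-1].append(block)
    return pages
```

Turning the option off keeps everything in one flowing group, trading the occasional stranded heading for fewer half-filled pages.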
01-03-2007, 02:24 PM   #27
airlik
I've been using this wonderful tool a lot over the last few days - thanks again! I had a suggestion I thought I'd make about text handling.

I was recently re-reading the Marlowe plays and thought I'd use the Reader this time around. I downloaded the plain text versions from Gutenberg (most are only available in plain text). Most Gutenberg formatting apps, including this one, strip the single returns so the text flows on the display. However, with plays and poetry, you don't want that. I just used WordPad to save as RTF (I need that metadata so the book list looks nice), but it made me think: there are any number of operations that might need to be tweaked from book to book (like the page number stripping someone mentioned). I believe you said you might put things like that into a config file so people could modify or add to them. Perhaps it could be a little more dynamic: a small "always on top" window of checkboxes listing each operation that can be done on the text. You could check or uncheck various operations and hit "apply" to see how it would look.

It would also be nice, when TOC generation is working, to be able to change the pattern you use to find headings worthy of a TOC entry from within the program, rather than having to exit, change the config file, and try again.
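The checkbox idea boils down to a registry of named text operations, of which only the ticked ones run. A minimal Python sketch; the labels, the registry, and `apply_checked` are all hypothetical, and the two operations are simplified versions of ones mentioned in this thread.

```python
import re


def unwrap_single_returns(text):
    """Join hard-wrapped lines; blank lines (paragraph breaks) are kept."""
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


def strip_pagenum_spans(text):
    """Remove Gutenberg-style page-number spans."""
    return re.sub(r"""<span class=(['"])pagenum\1>.*?</span>""", "", text)


# Hypothetical registry: checkbox label -> text operation, in run order.
OPERATIONS = {
    "Unwrap single returns": unwrap_single_returns,
    "Strip page numbers": strip_pagenum_spans,
}


def apply_checked(text, checked):
    """Run only the operations whose boxes are ticked, in registry order."""
    for label, operation in OPERATIONS.items():
        if label in checked:
            text = operation(text)
    return text
```

Pressing "apply" would re-run `apply_checked` on the original text with the current set of ticked labels, so unticking "Unwrap single returns" for a play leaves its line breaks intact.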
01-03-2007, 07:29 PM   #28
cmumford
Quote:
Originally Posted by airlik
I've been using this wonderful tool a lot over the last few days - thanks again!
You're very welcome. I'm glad that it's useful and we hope to improve it substantially in the coming weeks.

Quote:
Originally Posted by airlik
I had a suggestion I thought I'd make about text handling.

I was recently re-reading the Marlowe plays and thought I'd use the Reader this time around. I downloaded the plain text versions from Gutenberg (most are only available in plain text). Most Gutenberg formatting apps, including this one, strip the single returns so the text flows on the display. However, with plays and poetry, you don't want that.
Yeah, I was concerned about plays, and about how to algorithmically differentiate them from text that should reflow. I looked at Dido Queene of Carthage, and it has commas at the end of most of its lines. I'm sure that over the centuries a ton of different styles have been used, and it's going to be quite a challenge to get this right.
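The comma observation suggests a crude verse detector. A Python sketch of one possible heuristic, not a tested classifier: it counts the share of non-blank lines ending in a comma, and the 0.5 threshold is an arbitrary, hypothetical default.

```python
def looks_like_verse(text, threshold=0.5):
    """Guess whether a text's line breaks are meaningful (verse) or not.

    Heuristic only: the share of non-blank lines that end with a comma.
    The default threshold is a hypothetical starting point.
    """
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    comma_lines = sum(1 for line in lines if line.endswith(","))
    return comma_lines / len(lines) >= threshold
```

Given how many styles exist, a guess like this would probably work best as the default state of a user-overridable option rather than a final decision.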

I noticed that at the beginning of each paragraph there are names like _Cloan._ and _Iar._. Do you know what these mean?

Quote:
Originally Posted by airlik
I just used WordPad to save as RTF (I need that metadata so the book list looks nice), but it made me think: there are any number of operations that might need to be tweaked from book to book (like the page number stripping someone mentioned). I believe you said you might put things like that into a config file so people could modify or add to them. Perhaps it could be a little more dynamic: a small "always on top" window of checkboxes listing each operation that can be done on the text. You could check or uncheck various operations and hit "apply" to see how it would look.

It would also be nice, when TOC generation is working, to be able to change the pattern you use to find headings worthy of a TOC entry from within the program, rather than having to exit, change the config file, and try again.
Excellent suggestion. It shouldn't be that difficult to make it automatically reload whatever configuration file it reads, and maybe to have an editor for the config file contents.
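Automatic reloading could be as simple as checking the file's modification time before each use. A Python sketch; `ConfigWatcher` is a hypothetical helper, and `parse` is whatever function turns the file's text into settings.

```python
import os


class ConfigWatcher:
    """Reload a configuration file whenever its modification time changes.

    Hypothetical helper, not BBeB Binder code: `parse` turns the file's
    text into settings (for example, a list of cleanup rules).
    """

    def __init__(self, path, parse):
        self.path = path
        self.parse = parse
        self._mtime = None  # modification time seen at the last load
        self._value = None  # parsed settings from the last load

    def get(self):
        """Return current settings, re-reading the file only if it changed."""
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:
            with open(self.path, encoding="utf-8") as f:
                self._value = self.parse(f.read())
            self._mtime = mtime
        return self._value
```

The program would call `get()` before each conversion or preview, so an edited file takes effect without restarting. Polling is the simplest approach; on file systems with coarse timestamps, two edits within the same tick could go unnoticed.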

<FingersCrossed>BTW I'm hoping that TOC and images will be coming in the next two weeks.</FingersCrossed>
01-04-2007, 12:29 AM   #29
airlik
Quote:
Originally Posted by cmumford
Yeah, I was concerned about plays, and about how to algorithmically differentiate them from text that should reflow. I looked at Dido Queene of Carthage, and it has commas at the end of most of its lines. I'm sure that over the centuries a ton of different styles have been used, and it's going to be quite a challenge to get this right.

I noticed that at the beginning of each paragraph there are names like _Cloan._ and _Iar._. Do you know what these mean?
Gutenberg is tough. I've written several macros to treat different kinds of text; that's what made me think of a list of checkboxes for "do this" and "do that" where you could hit "apply". One could be a "turn single line feeds into two and leave two as two" kind of thing. That would fix most novels, but you could just uncheck it for plays.
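The "turn single line feeds into two and leave two as two" operation is a one-line regular expression. This Python sketch assumes a text where each paragraph sits on one line separated by single line feeds; on hard-wrapped text it would make every line its own paragraph, which is why it belongs behind a checkbox.

```python
import re


def single_to_double_newlines(text):
    """Turn each lone line feed into two; runs of two or more are unchanged."""
    # A newline with no newline directly before or after it is "single".
    return re.sub(r"(?<!\n)\n(?!\n)", "\n\n", text)
```

Running it twice gives the same result as running it once, so re-applying it after other toggles does no further damage.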

MOST lines in plays look something like one of the following, which makes them hard to detect (and hence a good candidate for a toggle):

HAMLET.
Oh, what shall I do?

POLONIUS. Oh! I am slain.

Joe.Blow. Heya

So they're hard to recognize. There are also books that break in mid-sentence due to scanning badness. I detect those in my lame-o search-and-replace macros by looking for a lowercase letter followed by no punctuation mark, possibly a space, then a line break or two, followed by a lowercase letter. So it picks up stuff like:

And then the man jumped

off the cliff.

Shakespeare and others often break lines on purpose, but almost always start the next line with a capital letter, e.g.:
NORTHUMBERLAND.
What news, Lord Bardolph? every minute now
Should be the father of some stratagem:

QUEEN.
No, be assur'd you shall not find me, daughter,
After the slander of most stepmothers,
Evil-ey'd unto you. You're my prisoner, but
Your gaoler shall deliver you the keys
That lock up your restraint. For you, Posthumus,

etc etc.
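The mid-sentence pattern described above translates to a Python regular expression like the one below. It is a sketch, not airlik's actual macro: `[a-z]` covers ASCII lowercase only, and the lookahead lets consecutive broken lines all be joined in a single pass.

```python
import re

# Lowercase letter, optional trailing spaces, one or two line breaks, then
# (checked by lookahead, not consumed) another lowercase letter.
BROKEN_SENTENCE = re.compile(r"([a-z]) *\n{1,2}(?=[a-z])")


def join_broken_sentences(text):
    """Rejoin sentences that a bad scan split across lines."""
    return BROKEN_SENTENCE.sub(r"\1 ", text)
```

Because Shakespeare's deliberate line breaks are followed by a capital letter, lines such as "every minute now" / "Should be the father of some stratagem:" are left alone.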

Last edited by airlik; 01-04-2007 at 01:22 AM.
01-04-2007, 04:43 AM   #30
FangornUK
Not sure if you know about GutenMark, but I'd suggest you look at it for Gutenberg text files; it really is the best for converting them into HTML, and Gutenberg themselves recommend it. The easiest way would be to simply call it instead of spending your time implementing its functionality in BBeBinder, but at the very least its source code would really help.

Just a suggestion on formatting: I've noticed you create BBeBs with paragraph formatting that follows the web page version, i.e. with a blank line between paragraphs. Most ebooks skip this and simply start each paragraph on a new line, which does look better.
All times are GMT -4.