MobileRead Forums > E-Book Formats > Workshop

Old 11-06-2007, 08:24 PM   #46
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by sartori View Post
I've been playing around with representing print versions online as faithfully as possible (see sample). Unfortunately I can't see any way this would translate into a reflowable page size.

(This is just a sample and was more of an experiment to see how it could be done)
Really nice work at making it look like a book. Very close to a PDF. It would translate to a smaller page just fine but, of course, would not look the same. The text would all wrap differently and the TOC would have to be formatted a little differently. There is nothing magic about a particular page size except that we get used to looking at it in that size. If you first saw this document formatted for a 6x9 paperback book then you would likely think that was how it would always look.

Dale
Old 11-06-2007, 08:26 PM   #47
sartori
Connoisseur
Posts: 54
Karma: 29
Join Date: Oct 2006
Those pages I added were time consuming but mainly because I was figuring out the layout. I do plan on working through the whole book but I haven't found a plain text version available so I am ocr'ing the pdf from archive.org. This is currently the slowest part as I am proofing and converting quotes and dashes over.

Right now it's more the challenge of seeing how it could be done and figuring out any of the quirks that may crop up.

For example, if you increase the display font size in your browser, the pages expand lengthwise to accommodate it. It just runs into problems with items that are specifically positioned, such as the table of contents. I think I'll continue playing with this and see what I can come up with.

Last edited by sartori; 11-06-2007 at 08:32 PM.
Old 11-06-2007, 08:48 PM   #48
kovidgoyal
creator of calibre
Posts: 45,229
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by sartori View Post
Ok, been playing around with adding paragraph markers to my sample as suggested earlier in this thread. Just a quick question - do any of the current html->lrf converters respect css hidden properties? If so it wouldn't be too hard to create a library of books that display paged as in my example but then you could easily convert them to lrf and ignore page numbers, etc. (It would be time consuming but not difficult).

This could almost become a master library that looks good online for people doing research and referencing certain sections/pages but also great for those who want to just read them on their portable device.
html2lrf will ignore tags that have "display: none" set
Old 11-06-2007, 08:49 PM   #49
jbenny
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
Quote:
Originally Posted by sartori View Post
Those pages I added were time consuming but mainly because I was figuring out the layout. I do plan on working through the whole book but I haven't found a plain text version available so I am ocr'ing the pdf from archive.org. This is currently the slowest part as I am proofing and converting quotes and dashes over.

Right now it's more the challenge of seeing how it could be done and figuring out any of the quirks that may crop up.

For example, if you increase the display font size in your browser, the pages expand lengthwise to accommodate it. It just runs into problems with items that are specifically positioned, such as the table of contents. I think I'll continue playing with this and see what I can come up with.
There is also a PDF copy at Google Books:
http://books.google.com/books?id=j-s...est+literature

They have apparently OCRed the text, as you can "view text" for each individual page. Sadly, the downloadable PDF doesn't include the OCRed text. That would have saved you some effort.
Old 11-06-2007, 08:53 PM   #50
jbenny
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
Quote:
Originally Posted by kovidgoyal View Post
html2lrf will ignore tags that have "display: none" set
That's good to know. Being based on XHTML, epub should also respect the "display: none" CSS property. I'll have to see if Digital Editions honors this. The Lector plugin most certainly should.
Old 11-06-2007, 08:54 PM   #51
sartori
Connoisseur
Posts: 54
Karma: 29
Join Date: Oct 2006
kovidgoyal,

So if I were to create a secondary CSS file that hides all the page breaks and page numbers and just displays the text with simple formatting (i.e., justified, centered, different sizes), html2lrf would be able to create a decent-looking LRF from the file?
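
To make the idea in this question concrete, here is a minimal Python sketch of a pre-processing pass that drops everything a "plain" stylesheet would hide before the HTML goes to a converter. It says nothing about how html2lrf works internally. The class names `pagenum` and `pagebreak` are assumptions made up for illustration, and the sketch only checks class names and inline `display: none` styles; it does not parse a real stylesheet.

```python
from html.parser import HTMLParser

# Hypothetical class names; the thread does not say how the sample marks page furniture.
HIDDEN_CLASSES = {"pagenum", "pagebreak"}
# Elements that never have a closing tag in HTML.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class HiddenStripper(HTMLParser):
    """Re-emit HTML, skipping any element that a 'plain' stylesheet would hide."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a hidden element

    def _is_hidden(self, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        style = (attrs.get("style") or "").replace(" ", "").lower()
        return bool(classes & HIDDEN_CLASSES) or "display:none" in style

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # Void elements do not change nesting depth.
            if self.skip_depth == 0 and not self._is_hidden(attrs):
                self.out.append(self.get_starttag_text())
            return
        if self.skip_depth or self._is_hidden(attrs):
            self.skip_depth += 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # XHTML-style self-closing tags such as <br/>.
        if self.skip_depth == 0 and not self._is_hidden(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")


def strip_hidden(html):
    """Return html with hidden elements (and their contents) removed."""
    parser = HiddenStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `strip_hidden(page_html)` on a paged sample would leave only the running text and its simple formatting, so a single master file could serve both the paged online view and the reflowable conversion. The sketch assumes well-nested markup; badly nested HTML would need a real parser library.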
Old 11-06-2007, 08:56 PM   #52
jbenny
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
Hey, did you check Gutenberg? I just saw that they have six volumes.

http://www.gutenberg.org/browse/authors/w#a993
Old 11-06-2007, 09:00 PM   #53
sartori
Connoisseur
Posts: 54
Karma: 29
Join Date: Oct 2006
Quote:
Originally Posted by jbenny View Post
Hey, did you check Gutenberg? I just saw that they have six volumes.

http://www.gutenberg.org/browse/authors/w#a993
Thanks for that - I just checked those out and they appear to be from a slightly different version than the ones on archive.org (and they have all 31 volumes). As my goal is to represent the printed version, the differences may become a problem with page numbers being different.
Old 11-06-2007, 09:04 PM   #54
jbenny
Addict
Posts: 323
Karma: 358
Join Date: May 2007
Device: Tablet PC and Nokia N800
Quote:
Originally Posted by sartori View Post
Thanks for that - I just checked those out and they appear to be from a slightly different version than the ones on archive.org (and they have all 31 volumes). As my goal is to represent the printed version, the differences may become a problem with page numbers being different.
Too bad it is a different version. It would have saved you a lot of work with the OCR part on at least those six volumes.

Well, good luck with the project. What you have so far looks very nice.
Old 11-06-2007, 10:16 PM   #55
Panurge
Enthusiast
Posts: 34
Karma: 336
Join Date: Dec 2006
Location: Texas
Device: Sony Reader
I'm rather surprised that my (admittedly minor) point has generated such a discussion, so allow me to make one or two more:
Scholarly citation is meant to serve two main purposes:
1. establish the authority for a reference so that if someone cares to check your accuracy or honesty, the location of the quotation or reference can be pinpointed and verified;
2. provide a context for a quotation or reference so that the reader can understand the total argument or occasion to which it belongs.
I am convinced that electronic forms of delivery will ultimately prevail; if future readers can locate the exact source with ease (perhaps even greater ease than was possible in the print world--hyperlinks, search engines, whatever works), then we don't need page numbers. We do need to know how closely the electronic version resembles its print source.
However, there is sometimes more information in a print or handwritten source than can be easily captured in its digitized version. Medieval manuscripts, an English scholar realized recently, can sometimes be dated and associated more precisely by using DNA information from their parchment (a.k.a. sheepskin) and ink media. Yet, as the digitization of the Beowulf manuscript also showed, high-resolution and other scanning techniques can also reveal aspects of the original that would otherwise be impossible to recognize. When you've got only one copy (like the Beowulf manuscript), you need all the help you can get.
So the original is irreplaceable for the scholar, in many cases, because its verbal content is only part of the information it contains.
Perhaps in the future we will find a way to capture all the information we are likely to need for the foreseeable future, but then there are always surprises, as the identification of parchment provenance using DNA analysis illustrates. At some point we'll simply have to draw the line and admit that we can't do everything; some information will have to be lost. The goal of the user of a particular document will determine if that loss is critical, incidental, or trivial.
For most of us, it won't matter. But for archeologists of the text, it will.
Old 11-06-2007, 10:26 PM   #56
bowerbird
Banned
Posts: 269
Karma: -273
Join Date: Sep 2006
Location: los angeles
panurge said:
> then we don't need page numbers.

we still need them, because prior aspects of the record
use them. we cannot forfeit all those earlier pointers...


> We do need to know how closely
> the electronic version resembles its print source.

and, for that, we need to sync the two. by page number.
(because, realistically, what else are we going to use?)
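
One way to picture syncing by page number in a reflowable text is to record print page boundaries as inline markers, then build an index from them. The following is a minimal Python sketch, not a description of any tool mentioned in this thread. The `[[p.N]]` marker syntax and the function names are assumptions invented for illustration.

```python
import bisect
import re

# Hypothetical marker syntax for print page boundaries; not taken from the thread.
PAGE_MARK = re.compile(r"\[\[p\.(\d+)\]\]")


def build_page_index(marked_text):
    """Return (clean_text, starts); starts is a list of (offset, page) pairs."""
    clean = []
    starts = []
    pos = 0      # read position in marked_text
    length = 0   # length of the clean text accumulated so far
    for m in PAGE_MARK.finditer(marked_text):
        chunk = marked_text[pos:m.start()]
        clean.append(chunk)
        length += len(chunk)
        starts.append((length, int(m.group(1))))
        pos = m.end()
    clean.append(marked_text[pos:])
    return "".join(clean), starts


def page_at(starts, offset):
    """Print page containing the given clean-text offset, or None before the first marker."""
    offsets = [o for o, _ in starts]
    i = bisect.bisect_right(offsets, offset) - 1
    return starts[i][1] if i >= 0 else None
```

With an index like this, the displayed text carries no page numbers, yet a citation such as "p. 2" can still be resolved to a position in the text, and any position can be mapped back to its print page.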


> there is sometimes more information
> in a print or handwritten source
> than can be easily captured in its digitized version.

that's a different problem. but we always had that one.
there's no substitute for access to the original, at least
for some things. still, for a good many _other_ things,
access to a digital copy is better than nothing, _much_
better than we used to have (i.e., which was nothing...)

if you have feedback on the numerous examples i gave,
i'd love to hear it. if not, that's fine too...

-bowerbird
Old 11-06-2007, 10:36 PM   #57
kovidgoyal
creator of calibre
Posts: 45,229
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by sartori View Post
kovidgoyal,

So if I were to create a secondary CSS file that hides all the page breaks and page numbers and just displays the text with simple formatting (i.e., justified, centered, different sizes), html2lrf would be able to create a decent-looking LRF from the file?
It won't display the hidden elements. Whether the resulting LRF will look good or not depends on the kind of HTML you use. But I'm always willing to add support for more esoteric HTML to html2lrf, within reason :-)
Old 11-06-2007, 10:48 PM   #58
sartori
Connoisseur
Posts: 54
Karma: 29
Join Date: Oct 2006
Quote:
Originally Posted by kovidgoyal View Post
It won't display the hidden elements. Whether the resulting LRF will look good or not depends on the kind of HTML you use. But I'm always willing to add support for more esoteric HTML to html2lrf, within reason :-)
Ok, thanks. I think I'll play around with this tomorrow and see if I can come up with a 'plain' css version of the same page.
Old 11-07-2007, 12:06 AM   #59
Panurge
Enthusiast
Posts: 34
Karma: 336
Join Date: Dec 2006
Location: Texas
Device: Sony Reader
[> then we don't need page numbers.

we still need them, because prior aspects of the record
use them. we cannot forfeit all those earlier pointers...


> We do need to know how closely
> the electronic version resembles its print source.

and, for that, we need to sync the two. by page number.
(because, realistically, what else are we going to use?)]

Page numbers are simply a way of keeping track of pages. The earliest printed books don't have them. For incunabula, the books published in the second half of the 15th century, there were numbers, not of pages but of groups of pages, so that when the book was put together for binding the sections would not be out of order. Manuscripts may or may not have page numbers. Sometimes the first word of the following page was printed (or written) at the bottom of the preceding page to establish sequence.
What really counts, for the most part, is textual accuracy--that is, identity of the two texts. For routine purposes, one wouldn't have to refer to the original if the electronic copy were certifiably accurate. But there's the rub, perhaps. When I edit an older text, say an unprinted manuscript, I'm not usually obliged to give its original page numbers. I just need to identify the original source and signal each time I depart from its authority (for example, to correct an obvious error in spelling or printing).
The scholarly world has had many ways of ensuring synchronization between two texts; page numbers are one but not the only one. Of course they are helpful, but historically printers have sometimes ignored them. In the case of Greek and Latin texts, individual passages were identified by paragraph and sentence numbering, and that is still used among classicists today, as was observed above.
So, yes, I agree that page numbers are useful for synchronizing two versions of a text; in the case of verse, however, we go by line numbers and larger divisions or sections of the poem. So the physical page isn't always what matters.
My only intention in bringing up this matter was to point out that digitization of books in the future may not be as simple a matter as we would like and that there is no one solution that will fit some of these odd cases. Nor will past practice always be a reliable guide to what will work in the future. At some point electronic texts will be recognized as the accepted authority, and page numbers will no longer matter; for us, in a time of transition, they still do on occasion, depending on our relationship to what we're reading.

Let me say that as someone who guards, keeps track of, and preserves books from harm, I'm delighted to see such a vigorous discussion about how to address the problem and find solutions. We are in a time of tremendous change that will have at least as much impact on the distribution of information as resulted from the invention of moveable type, and groups like this one are at the forefront because they include not simply programmers and designers but regular readers and enthusiasts who understand the users' needs. More power and glory to them.
Old 11-07-2007, 01:16 AM   #60
Panurge
Enthusiast
Posts: 34
Karma: 336
Join Date: Dec 2006
Location: Texas
Device: Sony Reader
Perhaps I should have also said "because they include not simply regular readers and enthusiasts but also programmers and designers." I'm looking forward to examining all the examples that have been posted in this thread as soon as I can get the time to do so.
All times are GMT -4.

