Go Back   MobileRead Forums > E-Book Formats > Workshop

Old 05-17-2009, 03:39 PM   #31
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Some of the comments, and particularly the special focus on quotation marks, make me wonder if my original "dream" of a single script to take a plaintext document from A to Z is misguided. (Even though, as I stated earlier, it seems to me GutenMark does a reasonable job of that.)

Perhaps a better way would be to write small utilities, each of which focuses on just one aspect of the document cleanup/conversion/fix-up process. I myself might play around with a quotation mark fixing utility when I get a chance over the next week or so.

Some other utilities I can think of:

- metadata recognizer (i.e., figures out title, author, chapter titles, etc.)
- paragraph normalizer (remove manual linebreaks between lines, keep only one between paragraphs)
- emphasis normalizer (convert the myriad ways of indicating emphasis into a single standard [and ideally simple to accurately parse] markup)

All of these utilities I think of as being command line tools that do as much as possible without human intervention but ask for human/manual arbitration when up against a case that requires a judgement call (or, rather, actually understanding the text).
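
As a rough illustration of what one of these small tools could look like, here is a minimal sketch of the paragraph normalizer. It assumes that paragraphs in the input are separated by at least one blank line and that every other line break is a manual wrap; the function name `normalize_paragraphs` is hypothetical, and the sketch does not attempt the "ask a human" step for ambiguous cases (indented paragraphs, verse, words hyphenated across lines).

```python
import re


def normalize_paragraphs(text: str) -> str:
    """Join hard-wrapped lines and keep exactly one blank line between paragraphs.

    Assumption: a blank (or whitespace-only) line marks a paragraph break.
    """
    # Normalise line endings so the blank-line split works on any input.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on runs of one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, collapse every run of whitespace (including
    # the manual line breaks) into a single space.
    paragraphs = [" ".join(block.split()) for block in blocks]
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    sample = "It was a dark\nand stormy night.\n\n\nThe rain fell\nin torrents.\n"
    print(normalize_paragraphs(sample))
```

A real tool would probably read from a file or standard input and flag the lines it is unsure about instead of silently joining them.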

Does anything like this exist? Is the idea kind of crazy, or kind of sensible?

Sincerely,

AHI
Old 05-17-2009, 04:01 PM   #32
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
It is sensible, I am just a bit doubtful about it being realistic :-)

I was thinking along these lines, but really, even with same-source files there are so many variations that writing a fully automatic utility is too much effort. I just keep a collection of useful regexps (or normal replaces) and in each case decide which ones to use, and in which order.
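
A collection like that could be sketched as a table of named rules plus a function that applies a hand-picked selection in a hand-picked order. The rule names and patterns below are hypothetical examples, not the actual regexps described in the post.

```python
import re

# Hypothetical rule collection: name -> (compiled pattern, replacement).
RULES = {
    "trailing-space": (re.compile(r"[ \t]+$", re.MULTILINE), ""),
    "multi-space": (re.compile(r" {2,}"), " "),
    "ellipsis": (re.compile(r"\.{3}"), "\u2026"),
}


def apply_rules(text: str, names: list[str]) -> str:
    """Apply the named rules in the given order; the caller chooses both."""
    for name in names:
        pattern, replacement = RULES[name]
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(apply_rules("Wait...  what?  \n", ["trailing-space", "multi-space", "ellipsis"]))
```

Keeping the order explicit matters because rules can interfere with each other; the person running the tool still decides per book which rules are safe.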
Old 05-17-2009, 04:07 PM   #33
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by pepak View Post
It is sensible, I am just a bit doubtful about it being realistic :-)

I was thinking along these lines, but really, even with same-source files there are so many variations that writing a fully automatic utility is too much effort. I just keep a collection of useful regexps (or normal replaces) and in each case decide which ones to use, and in which order.
Well, in a way, a utility could be as simple as a set of regexes, with command-line switches (or heuristics at start-up time) determining which specific ones to use. I do think, though, that some of the fix-ups that can be addressed using regexes might be addressed more reliably through simpler sequential processing. (Though, as with all things, I could be wrong.)

It definitely is a tall order... but breaking it down into its components like this might make it more achievable. Like I said, I might try my hand at the quotation stuff in the near future.
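
To make "sequential processing" concrete, here is a minimal sketch of a quote fixer that walks the text one character at a time and decides each quote from the character before it. This is only an illustration of the approach, not a finished utility: the function name `curl_quotes` and the rule "a quote after whitespace or an opening bracket is an opening quote" are assumptions, and the rule gets leading apostrophes (as in 'tis or '09) wrong, which is exactly the kind of case that would need manual arbitration.

```python
def curl_quotes(text: str) -> str:
    """Replace straight quotes with curly ones using a single left-to-right pass.

    Heuristic (an assumption, not a complete rule): a quote at the start of
    the text, or right after whitespace or an opening bracket, opens;
    any other quote closes (or is an apostrophe).
    """
    out = []
    prev = ""  # previous input character; empty at the start of the text
    for ch in text:
        if ch == '"':
            opening = prev == "" or prev.isspace() or prev in "([{"
            out.append("\u201c" if opening else "\u201d")
        elif ch == "'":
            # A single quote directly after a straight double quote is
            # treated as opening a nested quotation.
            opening = prev == "" or prev.isspace() or prev in '([{"'
            out.append("\u2018" if opening else "\u2019")
        else:
            out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    print(curl_quotes('"Don\'t," she said.'))
```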
Old 05-18-2009, 05:11 AM   #34
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by rogue_ronin View Post
Is &apos; well-supported now?
Unfortunately (as far as I know) there is no separate character for a curly apostrophe; we have to use a single right quote (&apos; is a straight apostrophe, just as &quot; is a straight double quote). That's why I use &rsquo; and &#8217; for single right quotes and apostrophes, respectively: they are exactly the same character, with two different names, but at least I can keep them distinct in the source file.
Old 05-18-2009, 05:20 AM   #35
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
Hmmm, but that's for presentation, right? Is there a way to hack that with CSS? Calling an apostrophe a quote makes for inaccurate meta.

(Okay, now that I've begun thinking about it, CSS is becoming like a magical pony, bringing gifts and hopes to all the children.)

m a r
Old 05-18-2009, 06:01 AM   #36
Sweetpea
Grand Sorcerer
Posts: 9,707
Karma: 32763414
Join Date: Dec 2008
Location: Krewerd
Device: Pocketbook Inkpad 4 Color; Samsung Galaxy Tab S6
Quote:
Originally Posted by rogue_ronin View Post
I think I need to study up on it. I'm using a sort of bastardized HTML mix. I've been using clips (macros) in NoteTab for a long time now, so I re-checked what I'm doing. I actually use both <a name="chapter_ChapterNumber"> and <h3 id="chapter_ChapterNumber" class="chapter" align="center"> in my files to mark a chapter. A combination of overkill and ignorance.
I used to use that too, until I started making my source files ePUB-compliant. The <a name=""> isn't valid there, so I only use the id now.
Old 05-18-2009, 06:11 AM   #37
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by rogue_ronin View Post
Hmmm, but that's for presentation, right? Is there a way to hack that with CSS? Calling an apostrophe a quote makes for inaccurate meta.
I agree that's somewhat undesirable... maybe another solution would be keeping the &apos; in the source and postprocessing it to a curly apostrophe when creating a particular end format, but I don't think that can be done with CSS (at least not with the CSS2 used in ePUB).
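
A postprocessing step of that kind could be as small as the sketch below. It assumes the source marks every apostrophe with the entity &apos; and that the end format wants the numeric reference &#8217;; the function name `curl_apostrophes` is hypothetical.

```python
def curl_apostrophes(source: str, replacement: str = "&#8217;") -> str:
    """Swap the straight-apostrophe entity for a curly one at build time.

    The source file keeps &apos; (so apostrophes stay distinguishable from
    closing single quotes); only the generated output is changed.
    """
    return source.replace("&apos;", replacement)


if __name__ == "__main__":
    print(curl_apostrophes("<p>I don&apos;t think so,&rsquo; she said.</p>"))
```

Because the replacement is a parameter, a different end format could ask for a literal character or leave the entity untouched.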
Old 05-18-2009, 08:02 AM   #38
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
Can you name your own entities with CSS2?

@sweetpea: yeah, I'm starting to get that...

m a r
Old 05-18-2009, 10:41 AM   #39
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by rogue_ronin View Post
Can you name your own entities with CSS2?
Hm... good question.

I don't think you can with CSS2, but maybe there are other means. See an example here (view the source). Now the question is whether this can be used with ePUB.
Old 05-18-2009, 12:53 PM   #40
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by HarryT View Post
I gave up on "curly quotes" a long time ago, for that very reason. If the source has them, fine. If it doesn't, it's a virtually impossible task to programmatically get them right; you certainly can't assume that they always occur in pairs!
I find that Word does a pretty good job of handling curly quotes. It may not be perfect, but it does the right thing the majority of the time. You just turn on smart quotes and then replace " with " and the magic happens. Then do the same again, replacing ' with '.

Dale
Old 05-18-2009, 01:00 PM   #41
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Jellby View Post
Hm... good question.

I don't think you can with CSS2, but maybe there are other means. See an example here (view the source). Now the question is whether this can be used with ePUB.
I have seen this done in ePUB, but not all readers seem to support it. Generally it is added at the top of the XHTML file, not in the CSS file.

Dale
Old 05-18-2009, 03:36 PM   #42
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Quote:
Originally Posted by rogue_ronin View Post
Can you name your own entities with CSS2?
No, but you can name your own entities with XHTML :-)
Old 05-18-2009, 11:56 PM   #43
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
So that means you could re-name them as well, right?

Such that for problems like &apos; -- you could use the common entity name, and then anyone who wanted to could redefine it as &rsquo; or whatever they like.

Still, it'd be better if it could be called externally to the ebook file, since it's really a presentation issue, no?

m a r
Old 05-19-2009, 06:08 AM   #44
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
I don't know if you can actually redefine &apos;, since it's one of the predefined XML entities, but it looks like you can redefine &mdash; and others. In any case, it's possible to create a new &ap; entity and make it expand to whatever you like. I'll do some tests with this...
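
For anyone who wants to try this outside a reader, here is a minimal sketch of a custom &ap; entity declared in the internal DTD subset at the top of an XML document, expanded with Python's standard-library parser. The sample document and the helper name `expand_entities` are made up for the test, and a desktop XML parser expanding the entity says nothing about whether a given ePUB reader will.

```python
import xml.etree.ElementTree as ET

# A tiny document that declares its own entity, &ap;, in the internal
# DTD subset and maps it to U+2019 (right single quotation mark).
DOC = """<!DOCTYPE p [
  <!ENTITY ap "&#8217;">
]>
<p>Don&ap;t panic.</p>"""


def expand_entities(doc: str) -> str:
    """Parse the document and return the root element's text.

    The underlying expat parser expands entities declared in the
    internal subset while parsing.
    """
    root = ET.fromstring(doc)
    return root.text or ""


if __name__ == "__main__":
    print(expand_entities(DOC))
```

Changing the one ENTITY line (for example to "&#39;") changes every apostrophe in the file, which is the kind of single switch being discussed here.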
Old 05-19-2009, 07:43 AM   #45
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
I was just looking at the CSS tutorial at w3schools.

I'm wondering if a different <link> tag in the head might allow you to call some customizations? I'm just spit-balling here -- I have no idea how the <link> tag works; but if the CSS file is read in the head, it makes me wonder if you could use one of the other rel="" attribute/values to call a set of entity redefinitions? Like maybe a "section" or a "bookmark"

Anyway, I'll leave it to you experts.

m a r

Last edited by rogue_ronin; 05-19-2009 at 07:44 AM. Reason: Add Link
Tags
conversion, typography

