#16
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Code:
This is an example of code.

Quote:
It's a giant step above Word's horrible HTML, but there are better and more reliable conversion methods out there.

Quote:
Not to get too technical, but Sigil uses a large framework called Qt to display the GUI and code (similar to a web browser). Qt has been updating/changing its own guts, which has made it incompatible with Sigil's Book View. Diap and Kevin have been bandaiding over these Qt<->Book View bugs for a long time, but it has reached a point where it is HELL, so they finally decided to cut the cord. Doing this will make maintaining Sigil much easier, as they won't have to waste time on all those bandaids, and can spend more time making Sigil better and more stable than it already is.

Quote:
And learn how to use Styles! And teach that author how to use Styles too! Even if they use Pages, it has similar functionality:
Pages for Mac: Intro to paragraph styles
Pages for Mac: Create, rename, or delete paragraph styles in a Pages document
All the Styles logic should be pretty similar across Word/LibreOffice/Pages; it's just that the steps are slightly different.
Last edited by Tex2002ans; 07-27-2019 at 04:00 PM.

#17
null operator (he/him)
Posts: 21,612
Karma: 29710338
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
That set of styles is the default template from MS Word 2016 - it's a good place to start.
BR
Last edited by BetterRed; 07-27-2019 at 08:22 PM.

#18
mostly an observer
Posts: 1,518
Karma: 987654
Join Date: Dec 2012
Device: Kindle
@Lizard: you might try Word2CleanHtml dot Com online. I've used it for years with never a hiccup.

#19
Sigil Developer
Posts: 8,450
Karma: 5703586
Join Date: Nov 2009
Device: many
We have also added code to PageEdit to clean up pastes from Word that use lots of long HTML comments, style tags in the body, and o:-prefixed p tags.
This leaves pretty clean XHTML at the end. This is still in the testing stage, and I hope to make it a user preference setting for the next version of PageEdit.
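
For anyone curious what that kind of cleanup looks like, here is a rough Python sketch. It is not PageEdit's actual code (which isn't shown in this thread); the function name `clean_word_paste` and the sample paste are made up for illustration, and it only targets the three things mentioned above: HTML comments, style blocks, and `<o:p>` tags.

```python
import re


def clean_word_paste(html: str) -> str:
    """Hypothetical helper: strip common Word paste debris from an HTML string."""
    # Remove HTML comments, including Word's conditional comments,
    # which can span many lines (hence DOTALL).
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Remove <style> blocks that Word drops into the pasted body.
    html = re.sub(r"<style\b[^>]*>.*?</style>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Remove opening and closing <o:p> tags, keeping any text between them.
    html = re.sub(r"</?o:p\b[^>]*>", "", html, flags=re.IGNORECASE)
    return html


# Made-up sample of what a Word paste can look like.
pasted = (
    "<!--[if gte mso 9]><xml>junk</xml><![endif]-->"
    "<style>p.MsoNormal {margin:0in;}</style>"
    "<p class=MsoNormal>Hello<o:p></o:p></p>"
)
cleaned = clean_word_paste(pasted)
print(cleaned)
```

Regex is fine for a quick pass like this, but a real HTML parser is the more robust choice for anything messier.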

#20
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Secondly--if you're cleaning up the file in Word and, presumably, using Styles and headings in so doing, why aren't you simply exporting the file as filtered HTML and then putting it into Sigil? Rather than pasting plain text and having to add in italics, bold, etc.? That's...well.

If you use Styles and Headings, the HTML output file will be squeaky clean. If your output file isn't squeaky clean, you should--as a commercial formatter--be cleaning it, before putting it into Sigil (or when it's in Sigil) so that the crap coding doesn't cause problems in the book, right?

(I just read--you don't know Styles? How on earth are you working with CSS, if you don't? They're the same exact thing! If you're not using Styles, to format this guy's print and eBooks, how the hell ARE you doing this work????)

So, if you don't know how to use Sigil to do that, and you don't know how to use Word's built-in Styles and Headings (which are actually Word's superpower, and if you're not using them, you're wasting a crapload of your time), then clean the exported HTML file in a text editor that works on your Mac, and then dump it into Sigil.

Presumably, you're not simply pasting crap Word code into Sigil and then splitting chapters and calling it an ePUB, so, the easy way is to simply export the STYLED Word doc into HTML and then put that into Sigil. It's the right way to be working on ePUBs, anyway, if you're not already doing that. I guess you're what, creating a Stylesheet, post-facto and assigning the paragraph styles, etc., manually? Geeeze, that's the long way around...

And while Toxaris' plugin is for Windows only, unfortunately, there are other ways of cleaning HTML auto-magically if you're not doing it yourself. I mean... a long, long time ago, I did a book the way you're saying, having to manually add all those italics back, and I swore I'd certainly never do THAT again! That way lies madness.

Hitch

#21
Resident Curmudgeon
Posts: 79,022
Karma: 144284074
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
I agree that not many people use styles, and for making an eBook, that is important. You write your book and send it off to be converted to an eBook. The person making the eBook spends a long time fixing and converting it and gets it looking good. Then you go and make a number of changes in different places and send the file over to be converted again. You are then making it a real hardship for the person making your eBook, as that person has to (once again) recreate the eBook from your mess. That's not being nice at all.

#22
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
I mean, if not...daaaayaaam.
Hitch

#23
just an egg
Posts: 1,793
Karma: 6758980
Join Date: Mar 2015
Device: Kindle, iOS
I have an atrocious Word Doc that resulted from scanning a paper book, and this thread encouraged me to finally attempt to convert it into an epub.
The DocXImport plugin was perfect (thank you DiapDealer!), but I also tried saving the Word Doc as filtered html, which resulted in a lot of crazy in-line styles. I figured I could do a nuclear search/replace to clean it up, but my regex skills are weak. <p.*?> picked up this:
Code:
<p class=MsoNormal style='margin-left:.15in;line-height:13.1pt;background:white'>
but not this:
Code:
<p class=MsoNormal style='margin-top:26.3pt;margin-right:.25pt;margin-bottom:
0in;margin-left:1.7pt;margin-bottom:.0001pt;text-align:justify;text-justify:
inter-ideograph;text-indent:11.15pt;line-height:12.95pt;background:white'>
Help, please, thank you!

#24
Interested in the matter
Posts: 421
Karma: 426094
Join Date: Dec 2011
Location: Spain, south coast
Device: Pocketbook InkPad 3
The only thing that occurs to me is that the "rebel" text is not in a single line.

#25
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Alternatively, you should be able to use: Code:
<p[^>]+>
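
To see the difference outside Sigil, here is a small Python sketch. It assumes Python's `re` module behaves like the editor's search with "dot matches newline" switched off, which is Python's default; the sample HTML is a shortened, made-up version of the tags posted above.

```python
import re

# Word's "filtered HTML" often wraps long style attributes across lines.
html = (
    "<p class=MsoNormal style='margin-left:.15in;background:white'>one</p>\n"
    "<p class=MsoNormal style='margin-top:26.3pt;margin-bottom:\n"
    "0in;text-align:justify'>two</p>\n"
)

lazy_dot = re.compile(r"<p.*?>")   # '.' does not match a newline by default
negated = re.compile(r"<p[^>]+>")  # [^>] matches anything but '>', newlines included

lazy_hits = lazy_dot.findall(html)
negated_hits = negated.findall(html)
print(len(lazy_hits), len(negated_hits))

# The "nuclear" cleanup: collapse every attribute-laden <p ...> to a bare <p>.
cleaned = negated.sub("<p>", html)
print(cleaned)
```

One caveat: `<p[^>]+>` also matches other tags that start with `<p`, such as `<pre class="x">`. If the file contains any, `<p\s[^>]*>` is a tighter variant.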

#26
just an egg
Posts: 1,793
Karma: 6758980
Join Date: Mar 2015
Device: Kindle, iOS
Ahhhh!
Thank you Doitsu and jbacelar for supporting the "oda is trying to understand regex even though it makes her brain curl up in a fetal ball and whimper" project. I am making slow but steady progress.

#27
Bibliophagist
Posts: 44,648
Karma: 168431851
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:

#28
just an egg
Posts: 1,793
Karma: 6758980
Join Date: Mar 2015
Device: Kindle, iOS
Quote:
Trying to grasp the difference between using .*? and [^>]+>. I get that .*? is limited to a single line, whereas [^>]+> isn't, and that's what messed me up here. But are there other advantages to using the [^x]+x construction over the .*? one?

#29
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
#1: [^>]+ = Any character that's NOT a '>', 1 or more times.
#2: .* = ANYTHING (except a line break, by default) 0 or more times.
Note: The '?' at the end makes it "less greedy", so it tries to match the least amount.

So, what #1 is saying is: Is the next character a '>'? No? Keep going until you hit a '>'. If you hit the end of a line, keep going.
What #2 is saying is: Is this anything? Yes? Keep going until you hit a '>'. If you hit the end of a line, stop.

There are some advantages to either method, but if you don't know enough about Regex, it'll probably just confuse you. Just know that Regex #1 would be "less dangerous", and #2 could potentially be much more dangerous. Sometimes you accidentally Replace All and big chunks of your text get deleted because of some weird edge case. :P

Edit: Actually mixed up some explanations.
Last edited by Tex2002ans; 08-04-2019 at 12:04 AM.
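
Here is one such "weird edge case" as a Python sketch. The markup and the `class="x"` search are invented for illustration. The point is that a lazy `.*?` still keeps expanding until the rest of the pattern fits, so it can run across several tags, while `[^>]*` cannot leave the tag it started in.

```python
import re

# Invented example: we only want to touch the paragraph with class="x".
html = '<p>Keep this paragraph.</p><p class="x">target</p>'

lazy_pattern = r'<p.*?class="x">'      # lazy, but free to cross '>' characters
bounded_pattern = r'<p[^>]*class="x">'  # cannot leave the tag it started in

lazy = re.search(lazy_pattern, html).group(0)
bounded = re.search(bounded_pattern, html).group(0)
print(lazy)     # starts at the FIRST <p> and swallows the whole first paragraph
print(bounded)  # just the one opening tag

# A Replace All with each pattern:
after_lazy = re.sub(lazy_pattern, "<p>", html)
after_bounded = re.sub(bounded_pattern, "<p>", html)
print(after_lazy)     # the first paragraph is gone
print(after_bounded)  # both paragraphs survive
```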

#30
Grand Sorcerer
Posts: 28,352
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
The main difference is that <p[^>]+> will never get "greedy" no matter what kind of regex engine or settings might be in use. Whereas the .* in <p.*?> is inherently greedy and relies on the trailing ? (and the engine's settings) to rein it in, so it can include more than you want.
It's generally considered safer when trying to limit your match to within a single HTML element. It will never cross the single tag boundary. Try your original regex <p.*?> on the following markup, for example: Code:
<p><span class="rule_24">The Everyman’s</span> description</p>
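
Running that markup through Python's `re` module gives a feel for it. This is a sketch under assumptions: Python has no switch that flips greediness, so the greedy form `<p.*>` stands in for what `<p.*?>` turns into when the `?` is dropped or when an engine option inverts greediness (PCRE has an "ungreedy" modifier that does this; whether such an option is on in any given search is an assumption here).

```python
import re

markup = '<p><span class="rule_24">The Everyman’s</span> description</p>'

lazy = re.findall(r"<p.*?>", markup)      # lazy: stops at the first '>'
greedy = re.findall(r"<p.*>", markup)     # greedy: runs to the LAST '>' on the line
bounded = re.findall(r"<p[^>]*>", markup)  # cannot cross a '>'
strict = re.findall(r"<p[^>]+>", markup)   # needs at least one character after '<p'

print(lazy)
print(greedy)
print(bounded)
print(strict)
```

Here the greedy form swallows the whole line, span and all, while the negated class stays inside the opening tag. Also note that `<p[^>]+>` finds nothing, because the `+` needs at least one character between `<p` and `>`; use `<p[^>]*>` if bare `<p>` tags should match too.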