MobileRead Forums > E-Book Software > Calibre > Plugins

Old 11-17-2017, 02:54 AM   #2536
DarkMagda
Junior Member
DarkMagda began at the beginning.
 
Posts: 5
Karma: 10
Join Date: May 2015
Device: Kindle Fire HD8
Thank you! I used Edge instead of Firefox and it worked just fine.
Old 11-17-2017, 06:30 PM   #2537
mehetabelo
e-Bibliophile
mehetabelo began at the beginning.
 
Posts: 57
Karma: 10
Join Date: Jun 2009
Location: California
Device: Paperwhite 1-3, Kobo AuraHD, Boox Afterglow2
I know that this is probably not an easy thing to do - but I have a suggestion for adding an option into the interface.

Somewhere, it would be *really* helpful to have a regex option for find/replace that directly applies to the html. This suggestion is based on spending several hours (maybe even days) working through some of the newer sites and fixing the html, particularly the paragraph setup, so that the stories show properly on my ereaders. It's not the fault of FFF, but rather the shitty HTML on the sites.

I've been running into multiple <br> lines, or an extra <p> </p> between paragraphs, or even <p> </p> used as the separator while the paragraphs themselves aren't marked up as paragraphs at all. An example of the last is:
<p> </p>
Paragraph X.
<p> </p>
Paragraph Y.
<p> </p>
On a Kindle this leads to a massive cluster, because it doesn't read these as intended (correct or not). There are other things too: a literal &quot; shows up fairly commonly in place of the " symbol in some older html from webnovel. It's weird, though, as it only happens sometimes; I've literally had one chapter that's good, the next bad, and the one after good again.

Different sites do different things and I've worked out some regex replacements that I use in the convert dialog, or (mostly used) in the editor dialog.

Generally these regexes are specific enough that they should not affect each other or screw up good text. My problem is that any time I get an update, I have to fix the book again for the new chapters. That's why I'm asking for this option.

Since the needed regex would likely vary a little from site to site, I'm not sure exactly how it should be done, maybe within the preferences (probably best) or maybe separately, but it'd be good to have it available by site.

This is just a suggestion, but I believe it's a good one.

Last edited by mehetabelo; 11-17-2017 at 06:35 PM.
Old 11-17-2017, 09:12 PM   #2538
JimmXinu
Plugin Developer
JimmXinu ought to be getting tired of karma fortunes by now.
Posts: 3,142
Karma: 587516
Join Date: Dec 2011
Location: Midwest USA
Device: Kindle Voyage
Quote:
Originally Posted by mehetabelo View Post
Somewhere, it would be *really* helpful to have a regex option for find/replace that directly applies to the html.
Have you tried the replace_br_with_p option? That might help with some of it.

This idea has been raised before. I'm reluctant to implement such a feature in FanFicFare for a couple reasons:

First, it would be a very dangerous feature that could very, very easily be misconfigured to destroy the text it's trying to fix. It's a shotgun equipped with a stirrup on the end--you inevitably will shoot yourself in the foot with it. Plus you'll also end up hitting sites over and over again while fine tuning the settings.

Second, IMO, it's really outside of scope for FFF to be doing that level of editing on the story text. I know it can be done in Calibre's Edit Book, and while I haven't experimented with it much, I know that Calibre's Convert has a 'Search & replace' regexp feature that can load and save lists of replacements.

You may then wonder about the replace_br_with_p option in FFF I just mentioned? That feature was contributed by another developer (Asbjørn Grandt). I was a bit hesitant about it, but I've since come to use it myself for some sites.

But earlier this year I discovered that it was accidentally corrupting the story text when run more than once on the same HTML during some updates. It has since been fixed to only run once, but it was a reminder that such processing is dangerous.
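The run-more-than-once hazard can be shown with a toy example. This is only a sketch: `naive_br_to_p`, `guarded_br_to_p`, and `is_idempotent` are hypothetical names, and none of this is FanFicFare's actual replace_br_with_p code.

```python
import re

BR = re.compile(r"<br\s*/?>")


def naive_br_to_p(html):
    # Toy transform (not FFF's code): split on <br> and wrap each piece in <p>.
    parts = [part.strip() for part in BR.split(html) if part.strip()]
    return "\n".join(f"<p>{part}</p>" for part in parts)


def guarded_br_to_p(html):
    # Skip the transform when there is nothing left for it to convert.
    if not BR.search(html):
        return html
    return naive_br_to_p(html)


def is_idempotent(transform, html):
    # A transform is safe to re-run if a second pass changes nothing.
    once = transform(html)
    return transform(once) == once


sample = "One.<br>Two."
print(is_idempotent(naive_br_to_p, sample))
print(is_idempotent(guarded_br_to_p, sample))
```

The naive version wraps its own output in a second layer of <p> tags on the second pass, which is the kind of damage that only shows up when a story is updated.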
Old 11-18-2017, 03:25 PM   #2539
mehetabelo
e-Bibliophile
mehetabelo began at the beginning.
 
Posts: 57
Karma: 10
Join Date: Jun 2009
Location: California
Device: Paperwhite 1-3, Kobo AuraHD, Boox Afterglow2
First, I don't want you to feel pressured to make the option. I want you to really consider it, and this post is for that specifically. Yes, I want the option but I'm not a programmer, so I'm not going to be the one doing any of the work (should you actually decide to do it).

I'll address the first comment by saying yes, I have tried the replace_br_with_p option. It doesn't always work, but it did cut down on the issues I had with several sites, and it works well enough most of the time for them. It's not perfect, and I'd have to test it to see why, but I know it doesn't fix all the <br> problems I had.

Next, the dangerous part. Honestly you're right, it *could* be a dangerous feature.

As a note, I'd try implementing it myself if I knew how to program so that others wouldn't have to deal with the possibility, and I'd be content instead of trying to fix each book every time I try and read a new chapter. I'm not exactly OCD, but when I am reading and a paragraph is spaced far apart on my kindle (with a limited screen size to start with), it makes my mind go into fits. I HATE it, and thus have to fix it.

Back to it being dangerous.

First, honestly, how dangerous is it really? With a little time and effort the find/replace setup isn't that hard to figure out, and if it's screwed up, a story can be re-downloaded. If it's a chapter or two, it's easy to delete the chapter and re-download it after disabling (or fixing) the regex. As long as the format is the same as, or similar to, a standard regex format (as used within the editor), I've already got 8-9 set up that haven't had any real problems when I've used them within the editor.

Second, you could place a note somewhere (on the boards, in the INI, wherever) that this is unsupported beyond a basic rundown of the setup and maybe an example regex. If the regex syntax is non-standard, point out the differences. People that want to use it can use it, and people that don't needn't bother with it. From past history on this thread, there are a minimum of a dozen (likely many more) people who have more than a little programming experience and have contributed in some way to the project. These people would likely find the regex feature nice when addressing formatting issues from certain sites. I have a fairly limited list of sites I download from, and of those the only one that never has problems is fanfiction.net, and maybe AO3 (though I don't download much from that site). Many of these site issues are fixed with the replace_br_with_p option, but again, it doesn't always work for the BR tags, and even then it doesn't address other problems.

Next is your mention of being outside the scope of editing the story text. Yes, this could truly edit story text if needed, but the need called for is more of a formatting issue. Let me give you a couple examples of my regex find/replace statements I use. I don't have them all with me, as I'm traveling again, but I have some of the more common ones I use just for cleaning up spacing and major issues.

First is for empty paragraphs. I started with:
Find:<p>[^\S\r\n]{1,}</p>
But after many, many hours of use and modification, I believe (not 100% sure) the final one came to:
Find:([\r\n])?<p[^>]*>(\s)?(<span[^>]*>)?(\s)?(<br/>|<br>)?(\s)?(</span>)?(\s)?</p>([\r\n])?
Replace:
Yes, replace with nothing. This specific function removes any empty paragraphs I have found, so that I'm not faced with large spaces between paragraphs. I've yet to see it fail or screw anything up, and I've used it in dozens, if not hundreds, of different downloaded books. RoyalRoad, Wuxiaworld, and Webnovel are generally the targets of this specific F&R.
There are also some odder <p> setups I've seen used like:
<p class="p_line_space"> </p> that I believe this covers. Just to be clear, I'm pretty sure the <p class="p_line_space"> was in a fff download, but I can't be 100% sure as I was not keeping track of everything I've had to fix. Most of them (easily 95%+) have been FFF, but some were not.
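For anyone wanting to try the empty-paragraph pattern outside the editor, here is a minimal Python sketch of it. The pattern is the "final" Find statement from this post (with its opening parenthesis in place); the constant and function names are made up for illustration, and Python's `re` dialect is assumed to be close enough to the editor's for this pattern.

```python
import re

# The empty-paragraph Find pattern from the post, compiled with Python's re.
EMPTY_PARAGRAPH = re.compile(
    r"([\r\n])?<p[^>]*>(\s)?(<span[^>]*>)?(\s)?(<br/>|<br>)?(\s)?(</span>)?(\s)?</p>([\r\n])?"
)


def remove_empty_paragraphs(html):
    # "Replace with nothing": delete every match, including one adjacent
    # line break on each side when present.
    return EMPTY_PARAGRAPH.sub("", html)


sample = (
    "<p>Paragraph X.</p>\n"
    "<p> </p>\n"
    '<p class="p_line_space"> </p>\n'
    "<p><span><br/></span></p>\n"
    "<p>Paragraph Y.</p>"
)
print(remove_empty_paragraphs(sample))
```

One thing worth knowing before relying on it: `<p[^>]*>` accepts any opening tag whose name starts with "p", so it is not strictly limited to <p> tags.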

Next
Find:&amp;quot
Replace:"
I previously mentioned this issue and it's a weird one on webnovel. I can find a handful of chapters with this problem, followed by good chapters, then bad chapters again. It's in no way consistent but is annoying as all get out when you run into it while reading. There was another that was an apostrophe problem that occurred as well, but it was rarer, and I don't have the current statement in my laptop.
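The same quote fix as a small Python sketch. The function name is hypothetical, and the optional trailing semicolon in the pattern is an assumption added here (the Find statement above stops at "quot"), so that a stray ";" isn't left behind.

```python
import re

# Find:&amp;quot  with an optional trailing semicolon (an assumption, see above).
DOUBLE_ESCAPED_QUOTE = re.compile(r"&amp;quot;?")


def fix_double_escaped_quotes(html):
    # Replace:"  as in the post.
    return DOUBLE_ESCAPED_QUOTE.sub('"', html)


print(fix_double_escaped_quotes("<p>&amp;quot;Hello,&amp;quot; she said.</p>"))
```

This is only safe in story text; inside an attribute value a bare " would break the tag.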

Another, this one for BR stuff that pops up. These are specifically for places where I found the replace_br_with_p option had not worked right (possibly because it was the older code you mentioned). Sometimes I download stories and put them on my kindle but don't get to them for months.
Find:(\s)?([\r\n])?(<br/>|<br>)(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?([\r\n])?
Replace:</p>\r\n<p>
this is also used in conjunction with:

Find:chapter-content">(<p>|<br/>|<br>)?(\s)?
Replace:chapter-content">\r\n<p>

Of note, I don't think that was the actual finished F&R statement, as I made some modifications to it, but it cleaned up the paragraphs a little more so that the first one (after the chapter header) had a starting paragraph mark. I had also made one for the ending paragraph mark. However, I do not have it on this laptop, as I was just using cleanup HTML as a quick fix for a while.
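Here is how the two BR statements combine, as a Python sketch. Both patterns are the Find/Replace pairs from this post; the function and constant names are invented for the example, and the sample markup is a guess at what a chapter might look like. As noted above, the closing paragraph mark is not handled, so the last paragraph stays unclosed.

```python
import re

# Runs of one to four <br> tags, with optional whitespace around them.
BR_RUN = re.compile(
    r"(\s)?([\r\n])?(<br/>|<br>)(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?([\r\n])?"
)
# The start of the chapter body, so the first paragraph gets an opening <p>.
CHAPTER_START = re.compile(r'chapter-content">(<p>|<br/>|<br>)?(\s)?')


def br_runs_to_paragraphs(html):
    html = BR_RUN.sub("</p>\r\n<p>", html)
    return CHAPTER_START.sub('chapter-content">\r\n<p>', html)


sample = '<div class="chapter-content">First.<br><br>Second.<br/>\n<br/>Third.</div>'
print(br_runs_to_paragraphs(sample))
```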

These were three of my most used ones and are therefore on my laptop for quick fixes when I was on the road. I don't have the others with me.

Now, looking at these, there is nothing here modifying any of the content, just spacing and fixing the quote mark. We're not doing any actual editing of the story, we're just cleaning the crap HTML that's downloaded. I have found that of all the stories I've had to edit to fix what they look like, probably less than 2% are actual published fiction. Generally published stories have had a good editing job done and are clean without fixes. They're not perfect, and I have had to edit a few, but the vast majority of content requiring editing is specifically FFF downloaded stuff. Since this is the case, should it not be a part of the program to allow for something to edit and fix the formatting as the stories download?

You yourself said:
Quote:
Originally Posted by JimmXinu View Post
You may then wonder about the replace_br_with_p option in FFF I just mentioned? That feature was contributed by another developer (Asbjørn Grandt). I was a bit hesitant about it, but I've since come to use it myself for some sites.
Think about this for a little while, and consider again. Would this not likely be something similar? I don't know what sites you use and whether you have problems on them that are not addressed by the replace_br_with_p function, but if you do, then after spending a little while making a correct F&R statement you'd never have to worry about them again. I have had a few small issues with the F&R, such as the br statement generating too many empty paragraphs (people using a combination of br and p in their html causes this), but I generally use both, with the br statement first, and it fixes everything.

Now, I know some of this can be done through heuristic processing or through the search and replace function, both in the convert box, but neither option works the way intended at all times. I very rarely find that the heuristic 'Delete Blank lines between paragraphs' works well. That may be because it works a lot of the time, and I only see the times when it doesn't (and am forced to edit it myself). The second option is the search/replace function built into convert. I have set these up in the past, and it works pretty well. However, I find that I often have to load the list again when I convert, for some reason, though not every time. This means that if I don't remember, I have to go back and reconvert the document. If I could automate it, it'd be easier to use this function.

I know I rambled on a lot, but I'd like you to really consider the good and bad this could do. Yes, people could really screw up their downloads, but realistically, if they do some of their own research, regex is not hard to learn. I've asked questions in the past, but mostly I learned from researching through different boards. Most people will make a few statements, make sure they're working, and just let them run in the background and never have to bother with them. Those people that do use it (and need it) will love the feature. I was not kidding when I stated how long I've spent manually editing files to get them to a place where they were readable on my kindle. The overall time is dozens and dozens of hours between all the different stories I've edited. I did use the built-in S&R function, even replacing my original epub at times, but I've got to do it over and over again, and I think this would be a better way of doing it.

I don't know how much more work it would be, but if you're that worried about people screwing up downloads, you could even add a secondary button in the personal.ini tab that allows enabling/disabling, with a specific warning about how messed up it could make your document. I honestly don't think it would require that level of caution, but it would make it very clear you're not responsible, and it would make things easier to turn off/on for someone to check and see if that's the problem, should one occur.

Anyway, those are my thoughts. I spent a good week thinking this over before I even posted the suggestion. I hope you truly consider it, and maybe other people may want to chime in and say if they think it's a good idea. I know the board isn't terribly busy, and I don't know how many people keep up to date (I generally check every couple of days), but I really do think it would be useful for lots of people.
Old 11-19-2017, 11:43 AM   #2540
cryzed
Evangelist
cryzed ought to be getting tired of karma fortunes by now.
Posts: 403
Karma: 1050547
Join Date: Mar 2011
Device: Kindle Oasis 2
I admit I haven't read the post in its full detail, but modifying HTML with regular expressions is a bad idea for anything that exceeds trivial changes -- and this is quite clearly complex and handles many corner cases.

This would be much easier and more sane by simply using the parsed DOM tree (BeautifulSoup object) to make the desired changes from within FFF.
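To illustrate the difference, here is a rough sketch of removing empty paragraphs by looking at tags rather than raw text. BeautifulSoup is a third-party package, so this sketch substitutes Python's standard-library `html.parser`; it is event-based rather than a full DOM tree, and it is not how FFF does anything. All class and function names are made up for the example.

```python
from html.parser import HTMLParser


class EmptyParagraphStripper(HTMLParser):
    """Re-emit HTML, dropping <p> elements that hold no text or image."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []             # finished output pieces
        self.buf = None           # pieces of the <p> being read, or None
        self.has_content = False  # has the current <p> shown text or an image?

    def _emit(self, raw):
        (self.out if self.buf is None else self.buf).append(raw)

    def _flush_paragraph(self):
        # Keep the buffered paragraph only if something visible was in it.
        if self.buf is not None and self.has_content:
            self.out.extend(self.buf)
        self.buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush_paragraph()  # a new <p> implicitly closes an open one
            self.buf, self.has_content = [], False
        elif tag == "img":
            self.has_content = True
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag == "img":
            self.has_content = True
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "p":
            self._flush_paragraph()

    def handle_data(self, data):
        if data.strip():
            self.has_content = True
        self._emit(data)

    def handle_entityref(self, name):
        if name != "nbsp":  # a lone &nbsp; still counts as empty
            self.has_content = True
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self.has_content = True
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")


def strip_empty_paragraphs(html):
    parser = EmptyParagraphStripper()
    parser.feed(html)
    parser.close()
    parser._flush_paragraph()  # keep a trailing unclosed <p> if it has content
    return "".join(parser.out)


sample = (
    "<p>Paragraph X.</p>\n<p> </p>\n"
    '<p class="p_line_space">&nbsp;</p>\n'
    "<p><span><br/></span></p>\n<p>Paragraph <i>Y</i>.</p>"
)
print(strip_empty_paragraphs(sample))
```

The decision of what counts as "empty" lives in a few readable handler methods instead of one long pattern, which is the main appeal of the parsed approach.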
Old 11-19-2017, 12:17 PM   #2541
JimmXinu
Plugin Developer
JimmXinu ought to be getting tired of karma fortunes by now.
Posts: 3,142
Karma: 587516
Join Date: Dec 2011
Location: Midwest USA
Device: Kindle Voyage
As requested by mehetabelo, I have been thinking about this.

But I stand by my previous statement: It's a shotgun equipped with a stirrup on the end--you inevitably will shoot yourself in the foot with it.

It's just not something I want to have in software I support.
Old Today, 03:09 PM   #2542
Tanjamuse
Guru
Tanjamuse , Klaatu Barada Niktu!
 
Posts: 910
Karma: 5258
Join Date: Jan 2014
Device: Samsung Galaxy Tab 3
I tried searching to see if this has been answered before, but couldn't find anything.

Is there any way to collect the url for a series on AO3 when downloading metadata?

Example story url: https://archiveofourown.org/works/7695883

Example Series URL: https://archiveofourown.org/series/524566
Old Today, 09:20 PM   #2543
JimmXinu
Plugin Developer
JimmXinu ought to be getting tired of karma fortunes by now.
Posts: 3,142
Karma: 587516
Join Date: Dec 2011
Location: Midwest USA
Device: Kindle Voyage
Yes. The series URL on its own is available in seriesUrl. Because AO3 allows multiple series, it also has series00Url (same value as seriesUrl), plus series01Url through series03Url.
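As a rough illustration of how those entries relate, the sketch below gathers them from a plain dictionary. The dictionary is a stand-in invented for this example (it is not how FFF exposes metadata), and `collect_series_urls` is a hypothetical helper; only the entry names come from the answer above.

```python
def collect_series_urls(metadata):
    # seriesUrl first, then series00Url through series03Url, skipping
    # missing entries and the duplicate between seriesUrl and series00Url.
    keys = ["seriesUrl"] + [f"series{i:02d}Url" for i in range(4)]
    urls = []
    for key in keys:
        url = metadata.get(key)
        if url and url not in urls:
            urls.append(url)
    return urls


# Stand-in metadata for a story that belongs to a single series.
example = {
    "seriesUrl": "https://archiveofourown.org/series/524566",
    "series00Url": "https://archiveofourown.org/series/524566",
}
print(collect_series_urls(example))
```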

All times are GMT -4.