First, I don't want you to feel pressured to make the option. I want you to really consider it, and this post is for that specifically. Yes, I want the option but I'm not a programmer, so I'm not going to be the one doing any of the work (should you actually decide to do it).
I'll address the first comment by saying yes, I have tried the replace_br_with_p option. It doesn't always work, but it did cut down on the issues I had with several sites, and it works well enough most of the time for them. It's not perfect, and I'd have to test it to see why, but I know it doesn't fix all the <br> problems I did have.
Next, the dangerous part. Honestly you're right, it *could* be a dangerous feature.
As a note, I'd try implementing it myself if I knew how to program so that others wouldn't have to deal with the possibility, and I'd be content instead of trying to fix each book every time I try and read a new chapter. I'm not exactly OCD, but when I am reading and a paragraph is spaced far apart on my kindle (with a limited screen size to start with), it makes my mind go into fits. I HATE it, and thus have to fix it.
Back to being dangerous.
First, honestly how dangerous is it really? With a little time and effort the find/replace setup isn't that hard to figure out, and if it's screwed up, a story can be re-downloaded. If it's a chapter or two, it's easy to delete the chapter and re-download it after disabling (or fixing) the regex. As long as the syntax is the same as, or similar to, the standard regex format used within the editor, I've already got 8-9 set up that haven't caused any real problems when I've used them there.
Second, you could specifically place a statement/note somewhere (on boards, in the INI, whatever) that this is unsupported. Beyond a basic rundown of the setup and maybe an example regex, you'd only need to point out any differences from standard regex syntax. People that want to use it can use it, and people that don't, don't need to bother with it. From past history on this thread, there are a minimum of a dozen (likely many more) people that have more than a little programming experience and have contributed in some way to the project. These people would likely find the regex feature nice when addressing formatting issues from certain sites. I have a fairly limited list I download from, and of those the only one that never has problems is fanfiction.net, maybe AOOO (though I don't download much from that site). Many of these site issues are fixed with the replace_br_with_p option - but again it doesn't always work for the BR tags, and even then, it doesn't address other problems.
Next is your mention of being outside the scope of editing the story text. Yes, this could truly edit story text if needed, but the need called for is more of a formatting issue. Let me give you a couple examples of my regex find/replace statements I use. I don't have them all with me, as I'm traveling again, but I have some of the more common ones I use just for cleaning up spacing and major issues.
First is for empty paragraphs. I started with:
Find:<p>[^\S\r\n]{1,}<p>
But after many, many hours of use and modification, I believe (not 100% sure) the final one came to:
Find:([\r\n])?<p[^>]*>(\s)?(<span[^>]*>)?(\s)?(<br/>|<br>)?(\s)?(</span>)?(\s)?</p>([\r\n])?
Replace:
Yes, replace with nothing. This specific function removes any empty paragraphs I have found so that I'm not faced with large spaces between paragraphs. I've yet to see it fail or screw anything up, and I've used it in dozens, if not hundreds of different downloaded books. RoyalRoad, Wuxiaworld, and Webnovel are generally the targets of this specific F&R.
There are also some odder <p> setups I've seen used like:
<p class="p_line_space"> </p> that I believe this covers. Just to be clear, I'm pretty sure the <p class="p_line_space"> was in a fff download, but I can't be 100% sure as I was not keeping track of everything I've had to fix. Most of them (easily 95%+) have been FFF, but some were not.
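For anyone who wants to test the empty-paragraph statement outside the editor, here is a minimal Python sketch. It assumes Python's re module treats this pattern the same way the editor does, and the function name and sample text are made up for illustration.

```python
import re

# The empty-paragraph Find pattern from the post, compiled with Python's re module.
EMPTY_P = re.compile(
    r"([\r\n])?<p[^>]*>(\s)?(<span[^>]*>)?(\s)?(<br/>|<br>)?(\s)?(</span>)?(\s)?</p>([\r\n])?"
)

def strip_empty_paragraphs(html: str) -> str:
    """Replace every empty paragraph (and one adjacent line break on each side) with nothing."""
    return EMPTY_P.sub("", html)

# Hypothetical sample: two real paragraphs with two empty ones between them.
sample = (
    '<p>First.</p>\n'
    '<p class="p_line_space"> </p>\n'
    '<p><br/></p>\n'
    '<p>Second.</p>'
)
cleaned = strip_empty_paragraphs(sample)
print(cleaned)  # <p>First.</p><p>Second.</p>
```

Because the pattern also swallows a line break on each side, the surviving paragraphs end up on one line in the source. That makes no difference to how the HTML is displayed.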
Next
Find:&quot
Replace:"
I previously mentioned this issue and it's a weird one on webnovel. I can find a handful of chapters with this problem, followed by good chapters, then bad chapters again. It's in no way consistent but is annoying as all get out when you run into it while reading. There was another that was an apostrophe problem that occurred as well, but it was rarer, and I don't have the current statement in my laptop.
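As a rough illustration, the same quote fix could be sketched in Python like this. The optional semicolon in the pattern is an assumption on my part (the Find statement above searches for &quot only), and the function name is hypothetical.

```python
import re

def fix_quot(text: str) -> str:
    # Replace a literal "&quot" with a plain double quote.
    # Assumption: a trailing ";" is swallowed too, so no stray semicolon is left behind.
    return re.sub(r"&quot;?", '"', text)

print(fix_quot("He said &quot;hello&quot; and left."))  # He said "hello" and left.
```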
Another, this one for BR stuff that pops up - these are specifically for places where I found the replace_br_with_p had not worked right (possibly because it was the older code you mentioned). Sometimes I download stories and put them on my kindle but don't get to them for months.
Find:(\s)?([\r\n])?(<br/>|<br>)(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?([\r\n])?
Replace:</p>\r\n<p>
this is also used in conjunction with:
Find:chapter-content">(<p>|<br/>|<br>)?(\s)?
Replace:chapter-content">\r\n<p>
Of note, I don't think that was the actual finished F&R statement, I had some modifications I made to it, but it cleaned up the paragraphs a little more so the first one (after chapter header) had a starting paragraph mark. I also had made one for the ending paragraph mark as well. However, I do not have it on this laptop - I was just using cleanup HTML as a quick fix for a while.
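Here is a minimal Python sketch of how the BR statement and the chapter-content statement work together. It assumes Python's re module handles these patterns like the editor does; the function name and sample chapter are invented for illustration.

```python
import re

# Runs of one to four <br> tags (with optional whitespace) become a paragraph break.
BR_RUN = re.compile(
    r"(\s)?([\r\n])?(<br/>|<br>)(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?([\r\n])?"
)
# Make sure the chapter body starts with an opening paragraph mark.
CHAPTER_OPEN = re.compile(r'chapter-content">(<p>|<br/>|<br>)?(\s)?')

def br_to_paragraphs(html: str) -> str:
    html = BR_RUN.sub("</p>\r\n<p>", html)
    return CHAPTER_OPEN.sub('chapter-content">\r\n<p>', html)

# Hypothetical sample chapter that separates paragraphs with <br> tags only.
sample = (
    '<div class="chapter-content">Line one.<br/><br/>'
    'Line two.<br>\n<br>Line three.</div>'
)
result = br_to_paragraphs(sample)
print(result)
```

In this sample the last paragraph is still left without a closing </p>, which is the gap a separate ending-paragraph statement would need to cover.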
These were three of my most used ones and are therefore on my laptop for quick fixes when I was on the road. I don't have the others with me.
Now, looking at these, there is nothing here modifying any of the content, just spacing and fixing the quote mark. We're not doing any actual editing of the story, we're just cleaning the crap HTML that's downloaded. I have found that of all the stories I've had to edit to fix what they look like, probably less than 2% are actual fiction. Generally published stories have had a good editing job done and are clean without fixes. They're not perfect, and I have had to edit a few, but the vast majority of content requiring editing is specifically FFF downloaded stuff. Since this is the case, should it not be a part of the program to allow for something to edit and fix the formatting as they download?
You yourself said:
Quote:
Originally Posted by JimmXinu
You may then wonder about the replace_br_with_p option in FFF I just mentioned? That feature was contributed by another developer (Asbjørn Grandt). I was a bit hesitant about it, but I've since come to use it myself for some sites.
Think about this for a little while, and consider again. Would this not be likely something similar? I don't know what sites you do use and if you have problems on them that are not addressed by the replace_br function, but if you do have some after spending a little while to make a correct F&R statement, you'd never have to worry about them again. I have had a few small issues with the F&R, such as the br statement generating too many empty paragraphs, (people using a combination of br and p in their html causes this) but I generally use both, with the br statement first and it fixes everything.
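To show why the order matters, here is a small Python sketch that runs the BR statement and the empty-paragraph statement as an ordered list. The rule list and function name are hypothetical, not anything that exists in FFF, and it assumes Python's re module behaves like the editor for these patterns.

```python
import re

# Hypothetical ordered rule list: (find pattern, replacement), applied top to bottom.
RULES = [
    # BR statement first: runs of <br> tags become a paragraph break.
    (r"(\s)?([\r\n])?(<br/>|<br>)(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?(<br/>|<br>)?(\s)?([\r\n])?",
     "</p>\r\n<p>"),
    # Empty-paragraph statement second: cleans up whatever the BR rule left behind.
    (r"([\r\n])?<p[^>]*>(\s)?(<span[^>]*>)?(\s)?(<br/>|<br>)?(\s)?(</span>)?(\s)?</p>([\r\n])?",
     ""),
]

def apply_rules(html, rules):
    for pattern, replacement in rules:
        html = re.sub(pattern, replacement, html)
    return html

# Hypothetical sample mixing <br> and <p>, the combination that creates empty paragraphs.
sample = "<p>One.<br/><br/></p>\n<p>Two.</p>"

br_first = apply_rules(sample, RULES)
p_first = apply_rules(sample, list(reversed(RULES)))

print("<p></p>" in br_first)  # False: the empty paragraph was cleaned up
print("<p></p>" in p_first)   # True: the BR rule ran last and left one behind
```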
Now, I know some of this can be done through other functions, either heuristic processing as an option in the convert box or the search and replace function in the convert box, but neither option works the way intended at all times. I very rarely find the heuristic 'Delete Blank lines between paragraphs' works well. That may be because it works a lot of the time, but I only see the times when it doesn't (and am forced to edit it myself). The second option is using the search/replace function built into convert. I have set these up in the past, and it works pretty well. However, I find that I often have to load the saved searches again when I convert for some reason, though not every time. This means that if I don't remember, I have to go back and reconvert the document. If I could automate it, it'd be easier to use this function.
I know I rambled on a lot, but I'd like you to really consider the good and bad this could do - yes, people could really screw up their downloads, but realistically if they do some of their own research regex is not hard to learn. I've asked questions in the past, but mostly I learned from researching through different boards. Most people will make a few statements, make sure they're working and just let them run in the background and never have to bother with them. Those people that do use it (and need it) will love the feature. I was not kidding when I stated how long I've spent manually editing files to get them to a place where they were readable on my kindle. The overall time is dozens and dozens of hours between all the different stories I've edited. I did use the built-in S&R function, even replacing my original epub at times, but I've got to do it over and over again, and I think this would be a better way of doing it.
I don't know how much more work it would be, but if you're that worried about people screwing up downloads, you could even add a secondary button in the personal.ini tab that allows enabling/disabling with a specific warning about how messed up it could make your document. I honestly don't think it would require that level of caution, but it would make it very clear you're not responsible and make things easier to turn off/on for someone to check and see if that's the problem - should one occur.
Anyway, those are my thoughts. I spent a good week thinking this over before I even posted the suggestion. I hope you truly consider it, and maybe other people may want to chime in and say if they think it's a good idea. I know the board isn't terribly busy, and I don't know how many people keep up to date (I generally check every couple days) but I really do think it would be useful for lots of people.