#1
Evangelist
Posts: 435
Karma: 572984
Join Date: Jan 2010
Location: Long Island
Device: Kobo Libra 2, Kindle 4, Nook Gl4, Nook STR, REB 1100, Ebookwise 1500,
I'm working on a project in which I'm harvesting story links from downloaded webpages to use with the FFDL plugin for Calibre to import the actual story into Calibre. The result I'm trying to get is:
Code:
/works/1064569
In Notepad++ I'm searching with:
Code:
/works/.+\d
which gives me:
Code:
Line 72895: href="/works/1064569"</a><o:p></o:p></span></p>
Line 72904: href="/works/1064569?show_comments=true&view_full_work=true#comments">270</a><o:p></o:p></span></p>
Line 72911: href="/works/1064569?view_full_work=true#comments">229</a><o:p></o:p></span></p>
Line 72917: "Times New Roman";mso-ansi-language:EN'><a href="/works/1064569/bookmarks">21</a><o:p></o:p></span></p>
In OpenOffice I'm using:
Code:
/works/.+[0-9]
which gives me:
Code:
/works/1064569
/works/1064569?show_comments=true&view_full_work=true#comments">270
/works/1064569?view_full_work=true#comments">229
/works/1064569/bookmarks">21
Is there a way to get the results to end at the first string of numbers? I don't care if I get this as a result:
Code:
/works/1064569
/works/1064569
/works/1064569
/works/1064569
Notepad++ and OpenOffice are the latest versions as of 6/30, and the OS is either Win 7 Pro or Win XP Pro, depending on which computer I'm on at the moment. I tried jEdit and EditPad Lite, but neither seems to let you do Find All searches.
Thanks
#2
Grand Sorcerer
Posts: 28,572
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I think you may be a little misguided about what you can actually accomplish with text editors and their Search & Replace features (regex or otherwise).
You can find a pattern in a document (just like Notepad++ is dutifully doing for you) and step through the matches one at a time. Once found, you can replace that pattern with something else (sometimes using complicated captures from the original pattern match). Then you can move on to the next occurrence of the pattern and replace IT with something (or skip it and move on to the next). Or you can replace all occurrences of the matched pattern in one fell swoop.
What you cannot do is extract all occurrences of a particular pattern, effectively getting rid of everything else. For that you would probably need to script a solution to extract the info you want (quite possibly using that scripting language's regex capabilities).
As for your original regular expression to find the pieces you want: judging by the limited amount of data I can see, I would think something like this would do it:
Code:
/works/\d+
I would change your second expression (the one you were using in OpenOffice) to:
Code:
/works/[0-9]+
You were getting bitten by the greediness of ".+" in your expressions: ".+" matches as much of the line as it can and gives back only enough for the final digit to match, so each match ran on to the last digit on the line instead of stopping at the end of the story number. "\d+" and "[0-9]+" can only match digits, so the match ends at the first non-digit. I'm not sure how (or if) you can control the greedy/non-greedy behavior in Notepad++ or OpenOffice. Wherever possible, I tend to build expressions that don't rely on manipulating the greedy/non-greedy behavior.
Last edited by DiapDealer; 07-01-2014 at 02:40 PM.
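The greediness problem can be reproduced in a few lines of Python. This is a minimal sketch, not something from the thread: Python's re module stands in for the Notepad++ and OpenOffice search engines (these two simple patterns should behave the same way in all three), and SAMPLE holds the lines from the first post with the "Line NNNNN:" prefixes dropped.

```python
import re

# Lines from the first post's Notepad++ results, without the "Line NNNNN:" prefixes.
SAMPLE = '''\
href="/works/1064569"</a><o:p></o:p></span></p>
href="/works/1064569?show_comments=true&view_full_work=true#comments">270</a><o:p></o:p></span></p>
href="/works/1064569?view_full_work=true#comments">229</a><o:p></o:p></span></p>
"Times New Roman";mso-ansi-language:EN'><a href="/works/1064569/bookmarks">21</a><o:p></o:p></span></p>
'''

# Greedy: ".+" runs to the end of the line, then backs off only far enough
# for "\d" to match, so the match ends at the LAST digit on the line.
GREEDY = re.compile(r"/works/.+\d")

# Suggested fix: "\d+" can only match digits, so the match stops at the
# first character after the story number that is not a digit.
DIGITS_ONLY = re.compile(r"/works/\d+")

if __name__ == "__main__":
    for label, pattern in (("greedy", GREEDY), ("digits only", DIGITS_ONLY)):
        print(f"{label}:")
        for match in pattern.findall(SAMPLE):
            print("  " + match)
```

Running it lists the four greedy matches (the same strings OpenOffice returned in the first post), then /works/1064569 four times for the digits-only pattern.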
#3
Evangelist
Posts: 435
Karma: 572984
Join Date: Jan 2010
Location: Long Island
Device: Kobo Libra 2, Kindle 4, Nook Gl4, Nook STR, REB 1100, Ebookwise 1500,
Thank you! That gave me exactly what I was looking for.
Re-reading what I wrote, I realized that I wasn't specific about the whole process I use. Notepad++ drops the results of the search into a separate window, which I can then copy so I only have the lines with the links I'm looking for and not all the miscellaneous code that's in the file. I paste those results into an OpenOffice document, where I run the slightly modified search string using Find All, and it highlights the results. With everything still highlighted, I can then copy and paste just that data into another OpenOffice document. And, voila, I have the list of links that I was looking for. It would be awesome if OpenOffice had a Select Inverse like the one I use in Photoshop, but this gets the job done.
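Following the earlier suggestion to script the extraction, the Notepad++ Find All, OpenOffice Find All, and copy/paste steps above could be collapsed into one pass. This is a minimal Python sketch, assuming the pages are saved as HTML files in UTF-8 (change the encoding if your saved pages use something else); the script name extract_works.py is hypothetical. It also drops duplicate links, which the first post said were acceptable either way.

```python
import re
import sys
from pathlib import Path

# The pattern suggested in the thread: "/works/" followed by the story number only.
WORK_LINK = re.compile(r"/works/\d+")


def extract_work_links(html: str) -> list[str]:
    """Return each /works/<number> link once, in the order it first appears."""
    # dict.fromkeys() keeps only the first copy of each link and preserves order.
    return list(dict.fromkeys(WORK_LINK.findall(html)))


def extract_from_files(paths: list[str]) -> list[str]:
    """Collect links from several saved pages, without duplicates across pages."""
    found: list[str] = []
    for path in paths:
        # Assumption: pages are UTF-8; errors="replace" keeps going past bad bytes.
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        found.extend(extract_work_links(text))
    return list(dict.fromkeys(found))


# Hypothetical usage: python extract_works.py page1.html page2.html > links.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    for link in extract_from_files(sys.argv[1:]):
        print(link)
```

Redirecting the output to a text file gives one link per line, so there is nothing left to deselect.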
#4
Grand Sorcerer
Posts: 28,572
Karma: 204127028
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Glad to help.
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Need help for a regex | wobohohoho | Sigil | 4 | 01-02-2013 04:42 AM |
| Regex help | paulfiera | Sigil | 4 | 06-14-2012 07:55 AM |
| RegEx Help | ghostyjack | Workshop | 4 | 03-22-2012 09:24 AM |
| regex help please | thevoiceofcheese | Calibre | 2 | 08-01-2011 11:27 PM |
| Help with a regex | A.T.E. | Calibre | 1 | 04-05-2010 07:50 AM |