08-18-2017 3:48 PM
Hi,
I'm very new to Regex, in fact I'm very new to ABAP altogether 🙂
I'm trying to figure out why I have a difference in my result when I use this piece of code vs when I use the Regex Toy.
What I'm trying to do is replace the digits after the last "."
My Regex is [^.]*$ for the text 'DV-102.1.1' replace with 2
Regex toy gives me the correct answer.... DV-102.1.2
with this snippet of ABAP code however it doesn't... it gives me DV-2
REPORT znw_regex_play.
DATA lv_count TYPE i value 1.
DATA(lv_new_wbs_no) = lv_count + 1.
DATA(lv_val) = 'DV-102.1.1'.
SPLIT lv_val AT match( val = lv_val
regex = '[^.]*$' ) INTO DATA(lv_wbs_part1) DATA(lv_wbs_part2).
lv_val = lv_wbs_part1 && lv_new_wbs_no .
write: lv_val.
Can anyone tell me where I've gone wrong?
thanks.
08-19-2017 9:57 AM
DEMO_REGEX_TOY (find regex) and match work identically. With regex [^.]*$ applied to 'DV-102.1.1', they both return "1".
Your issue is only with the SPLIT, because SPLIT 'DV-102.1.1' AT '1' INTO part1 part2 gives the 2 segments 'DV-' and '02.1.1'.
There are many ways to do what you want. I would opt for
REPLACE REGEX '[^.]*$' IN lv_val WITH lv_new_wbs_no.
or
lv_val = replace( val = lv_val regex = '[^.]*$' with = lv_new_wbs_no ).
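To make the difference concrete, here is a minimal sketch of the three steps side by side. The report name and the variables lv_hit, lv_part1, lv_part2 and lv_fixed are made up for illustration, and the replacement value is hard-coded as `2` rather than computed from a counter.

```abap
REPORT zdemo_split_vs_replace.

DATA(lv_val) = `DV-102.1.1`.

" match( ) returns only the matched text, not where it was found.
DATA(lv_hit) = match( val = lv_val regex = `[^.]*$` ).

" SPLIT searches for that text again and cuts at its FIRST occurrence.
SPLIT lv_val AT lv_hit INTO DATA(lv_part1) DATA(lv_part2).

" replace( ) with regex replaces the section where the pattern matched.
DATA(lv_fixed) = replace( val = lv_val regex = `[^.]*$` with = `2` ).

WRITE: / lv_hit,    " 1
       / lv_part1,  " DV-
       / lv_part2,  " 02.1.1
       / lv_fixed.  " DV-102.1.2
```

The SPLIT result explains the DV-2 from the original report: the first segment `DV-` is concatenated with the new number.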
08-18-2017 5:23 PM
Two problems:
The "." means match a single character. So to find a real "." you have to escape it using the backslash:
[^\.]*$
Secondly, I think your code won't replace the last one. It looks like it will find the last "1" using the regex and then match the first "1" on the replace portion.
A straight regex alternative is to do two matches up to and after the last "." and increment the second match by 1:
DATA(mystr) = `1.1.1.1.1`.
mystr = |{ match( val = mystr regex = `.*\.` ) }| &&
        |{ CONV i( match( val = mystr regex = `[^\.]+$` ) ) + 1 }|.
"Result: 1.1.1.1.2
08-19-2017 10:02 AM
It's a good habit to always escape special characters, but for value sets (inside [...]) only the following characters \ [ ] (as far as I know) are considered to be special characters and need to be escaped.
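A small sketch of that point, using the built-in count( ) function on a made-up sample string `a.b`. The report and variable names are only illustrative.

```abap
REPORT zdemo_regex_class_escape.

" Outside a value set, an unescaped dot matches any single character.
DATA(lv_any)         = count( val = `a.b` regex = `.` ).

" Escaped, it matches only a literal dot.
DATA(lv_escaped)     = count( val = `a.b` regex = `\.` ).

" Inside [...], the dot is literal with or without the backslash.
DATA(lv_set)         = count( val = `a.b` regex = `[.]` ).
DATA(lv_set_escaped) = count( val = `a.b` regex = `[\.]` ).

WRITE: / lv_any,          " 3
       / lv_escaped,      " 1
       / lv_set,          " 1
       / lv_set_escaped.  " 1
```

So [^.]*$ and [^\.]*$ should behave the same; the backslash inside the brackets is harmless but not required.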
08-19-2017 10:24 PM
Very interesting, I learnt something.
So, the replace command is position-aware when using regex. I thought regex behaved as a match criterion. So the first two were equivalent in my understanding up to now:
DATA(mystr) = `10.10.10`.
WRITE : / replace( val = mystr
regex = `[^.]*$` "finds `10`
with = `99` ). "10.10.99 << Position aware
WRITE : / replace( val = mystr
sub = `10`
with = `99` ). "99.10.10 << Simple match
WRITE : / replace( val = mystr
sub = match( val = mystr regex = `[^.]*$` ) "finds `10`
with = `99` ). "99.10.10 <<<< OP's issue
"And just for fun:
WRITE : / replace( val = `10.10.10`
regex = `([0-9]+)\.([0-9]+)\.([0-9]+)`
with = `$1.99.$3` ). "10.99.10
So the OP's scenario was the equivalent of my third example.
But I wasn't aware that the 'with' clause will replace the string when using sub =, but will replace implicit group $0, not the match string, when using regex =. I don't think I'll be the only one to be caught out by this dual nature, the doco is a little vague.
08-21-2017 7:03 AM
I do not agree.
As DEMO_REGEX_TOY shows, the regex [^.]*$ matches the last occurrence of "10" because the $ sign denotes the end of a line (as documented); leave out the $ and it will match the first "10".
Therefore,
Your conclusion about a double nature is not valid, and there is no vagueness in the docu. I assume you missed the meaning of $ in the pattern and misinterpreted the function match and its regex behind sub.
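The effect of the $ anchor can be checked with a short sketch like the following. The report and variable names are illustrative only; find( ) is the built-in function that returns the offset of the match.

```abap
REPORT zdemo_regex_anchor.

DATA(lv_text) = `10.10.10`.

" With $, the match has to end at the end of the string: the last `10`.
DATA(lv_off_anchored)   = find( val = lv_text regex = `[^.]*$` ).
DATA(lv_anchored)       = replace( val = lv_text regex = `[^.]*$` with = `99` ).

" Without $, the first run of non-dot characters matches: the first `10`.
DATA(lv_off_unanchored) = find( val = lv_text regex = `[^.]*` ).
DATA(lv_unanchored)     = replace( val = lv_text regex = `[^.]*` with = `99` ).

WRITE: / lv_off_anchored,   lv_anchored,    " 6  10.10.99
       / lv_off_unanchored, lv_unanchored.  " 0  99.10.10
```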
08-21-2017 7:09 AM
Horst,
I searched for the documentation listing all the possible combinations of keywords like $ * [ ] ^ etc and their meaning but so far I got only the below mentioned one.
https://help.sap.com/doc/abapdocu_750_index_htm/7.50/en-US/abenregex_syntax_specials.htm
Kindly share if you are referring to any other documentation in addition to this. Thanks.
K.Kiran.
08-21-2017 7:44 AM
08-21-2017 9:27 AM
Very confusing to understand the keywords associated with REGEX. It seems a few trial-and-error iterations are needed before we come to a conclusion on each keyword's usage.
K.Kiran.
08-21-2017 11:45 AM
Hi Horst,
I think you either misunderstood or we are coming from different points of view. You know the REPLACE command very well with its full regex incarnation, so to you it may not be that obvious.
I used REPLACE and regex well before ABAP did regex. In days of old, REPLACE did the searching and WITH was what you put in place of the string it found. The addition of regex was just another fancy way to specify the string to search for, i.e. look for the string provided by regex. I know now this statement is incorrect, but that was the assumption I made, and the doco for REPLACE is not totally unambiguous on that.
This is the semantic difference I tried to explain when I said that WITH is position-aware when used with regex. The double nature is that with a search string REPLACE is actively searching for a string, whereas with regex, the search is being done by the regex expression engine... if that makes sense.
And thanks for the regex syntax links, the latest docs are very useful. From what I remember they were a bit lacking when regex was first introduced so I tended to stick with non-SAP info.
08-21-2017 12:48 PM
Hi Mike,
Why is the statement "The addition of regex was just another fancy way to specify the string to search for, i.e. look for the string provided by regex." incorrect? It is perfectly correct. Therefore, I still don't see any ambiguity in the documentation.
The ABAP docu for the REPLACE statement as well as for the replace function say, that there is a search for a match with the substring specified in substring or with the regular expression specified in regex and that the occurrence is replaced.
Therefore, if you want to speak about "position awareness", well both variants are position aware, because in both cases the found occurrences are replaced at their position. Most primitive example:
DATA(text) = `abcdef`.
REPLACE SUBSTRING `de` IN text WITH `xx`.
same as
DATA(text) = `abcdef`.
REPLACE REGEX `de` IN text WITH `xx`.
No difference in substrings and regexes. Both find "de" at offset 3. As long as no special characters are used, they do the same. If special characters are used, regexes find other things in other positions of course. But that is no surprise, or?
Please point out why you think that regexes are differently "position aware" than substrings. I will happily correct the documentation, but up to now, I don't see the point.
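The claim that both variants find the same position can be verified with FIND and its MATCH OFFSET addition. This is only a sketch; the report name and the variables off_sub and off_regex are invented for the example.

```abap
REPORT zdemo_find_offsets.

DATA(text) = `abcdef`.

" Search once as a plain substring and once as a regular expression.
FIND SUBSTRING `de` IN text MATCH OFFSET DATA(off_sub).
FIND REGEX     `de` IN text MATCH OFFSET DATA(off_regex).

" Offsets count from 0, so `de` starts at offset 3 in both cases.
WRITE: / off_sub, / off_regex.
```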
08-21-2017 4:13 PM
Hi Horst,
I'm talking about the resulting string versus $0. By 'string provided by regex' I meant the result string of a regex search. I had understood the addition of REGEX to ABAP's REPLACE back in 4.x to act as a short notation for:
REPLACE SUBSTRING match( regex = ... ) IN text WITH `xx` .
By position-aware, I mean that a regex result string (`de`) and a regex result ($0) are two different concepts, one having a position in the source (internally anyway) and the other not.
So again in a shorter example:
data(text) = `abcde-de`.
REPLACE SUBSTRING match( val = text regex = `de$` ) IN text WITH `xx`.
REPLACE REGEX `de$` IN text WITH `xx`.
These are very different, I get that. I had assumed the first variant applied when regex initially came to ABAP. Personally I found that a casual read of the doc hints more at equivalence to SUBSTRING, as both search methods are described in the same ORed phrase:
"a match with the substring specified in substring or with the regular expression specified in regex."
Maybe I'm the only one who misunderstood it... maybe not.
As an aside, UK Postal Codes are a nice example where a single expression can encode a whole lot of seemingly arbitrary rules.
08-21-2017 4:37 PM
"Maybe I'm the only one who misunderstood it... maybe not."
I hope that you are the only one 😉
Your last example: they are very different, yes of course. Because REPLACE REGEX simply isn't a short form of
REPLACE SUBSTRING match(...) ...
That would make no sense, or? In fact it would involve two searches, one for the pattern and a second for the match. But finally I see that you assumed exactly that and therefore all the confusion.
In ABAP, any pattern based REPLACE is rather a short form of
FIND SUBSTRING|REGEX ...
MATCH OFFSET off
MATCH LENGTH len.
REPLACE SECTION OFFSET off LENGTH len ...
So, there is first a search for a substring or a pattern and then a replacement of the section found. And this also explains, why they are handled similarly in the documentation. For FIND you would never expect that after finding a match, another search for the match itself will take place.
Feel free to suggest how that can be made more clear.
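Written out for the `abcde-de` example from this thread, the expansion described above might look like the sketch below. The second half shows the "search again for the matched text" reading for comparison. The report name and the variables off, len, text2 and hit are illustrative.

```abap
REPORT zdemo_replace_section.

DATA(text) = `abcde-de`.

" Step 1: search once and remember where the match is.
FIND REGEX `de$` IN text MATCH OFFSET DATA(off) MATCH LENGTH DATA(len).

" Step 2: replace exactly that section, without searching again.
IF sy-subrc = 0.
  REPLACE SECTION OFFSET off LENGTH len OF text WITH `xx`.
ENDIF.

" For comparison: take the matched text and search for it a second time.
" The second search hits the first `de`, not the one the regex matched.
DATA(text2) = `abcde-de`.
DATA(hit) = match( val = text2 regex = `de$` ).
REPLACE SUBSTRING hit IN text2 WITH `xx`.

WRITE: / text,   " abcde-xx
       / text2.  " abcxx-de
```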
08-21-2017 4:58 PM
"That would make no sense."
A lot of things in ABAP make no sense 🙂
I can't remember the reasons why, but nevertheless that's how I understood ABAP regex to work since I first used it.
Sometimes things don't make sense and the easiest is to shrug shoulders, accept the way it is and move on. bool vs xsdbool comes to mind...
08-21-2017 6:20 AM
Neil,
For additional info and ready reference:
https://help.sap.com/doc/abapdocu_750_index_htm/7.50/en-US/abenregex_syntax_specials.htm
K.Kiran.
08-21-2017 6:30 PM
REGEX is used to validate specific formats, such as telephone numbers or zip codes, but not for replacing a value.
For your requirement, use SPLIT into a table at ".", so that the data after the last dot ends up in the last row of the table. Increment it by 1, then concatenate the rows of the table again, separated by dots.
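A minimal sketch of that SPLIT approach, assuming a 7.40 or later system and a purely numeric last segment. The report name and the variables lt_parts and lv_last are made up for illustration.

```abap
REPORT zdemo_split_into_table.

DATA lt_parts TYPE STANDARD TABLE OF string WITH EMPTY KEY.
DATA(lv_val) = `DV-102.1.1`.

" Break the number into its dot-separated segments.
SPLIT lv_val AT `.` INTO TABLE lt_parts.

" Increment the last segment (assumes it contains only digits).
DATA(lv_last) = lines( lt_parts ).
lt_parts[ lv_last ] = |{ CONV i( lt_parts[ lv_last ] ) + 1 }|.

" Join the segments back together with dots.
lv_val = concat_lines_of( table = lt_parts sep = `.` ).

WRITE / lv_val.  " DV-102.1.2
```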
08-22-2017 4:48 AM
Hi,
There is a long discussion going on within this thread. Did you bother to read it before answering?
K.Kiran.
08-22-2017 7:00 AM
08-29-2017 1:27 PM
Wow, this generated a lot of discussion, which is great!
I debugged what the regex toy was doing and opted for the REPLACE solution instead of the SPLIT.
In the end my code is pretty much identical to what Sandra suggested.
Thanks everyone for the input. Very much appreciated.