Solved: Help with Regex please

smith_john · ‎08-18-2017

Hi,

I'm very new to Regex, in fact I'm very new to ABAP altogether 🙂

I'm trying to figure out why I have a difference in my result when I use this piece of code vs when I use the Regex Toy.

What I'm trying to do is replace the digits after the last "."

My Regex is [^.]*$ for the text 'DV-102.1.1' replace with 2

Regex toy gives me the correct answer.... DV-102.1.2

with this snippet of ABAP code however it doesn't... it gives me DV-2

REPORT znw_regex_play.

DATA lv_count TYPE i VALUE 1.
DATA(lv_new_wbs_no) = lv_count + 1.
DATA(lv_val) = 'DV-102.1.1'.

SPLIT lv_val AT match( val   = lv_val
                       regex = '[^.]*$' ) INTO DATA(lv_wbs_part1) DATA(lv_wbs_part2).

lv_val = lv_wbs_part1 && lv_new_wbs_no.

WRITE: lv_val.

Can anyone tell me where I've gone wrong?

thanks.

Sandra_Rossi · ‎08-19-2017

DEMO_REGEX_TOY (find regex) and match, work identically. With regex [^.]*$ applied to 'DV-102.1.1' they both return "1".

Your issue is only with the SPLIT, because SPLIT 'DV-102.1.1' AT '1' INTO part1 part2 gives the 2 segments 'DV-' and '02.1.1'.

There are many ways to do what you want. I would opt for

REPLACE REGEX '[^.]*$' IN lv_val WITH lv_new_wbs_no.

or

lv_val = replace( val = lv_val regex = '[^.]*$' with = lv_new_wbs_no ).
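As a minimal sketch of the second variant in a complete report (the report name and the variable lv_new_no are made up for illustration, and release 7.40 or later is assumed for the inline declarations):

```abap
REPORT zregex_replace_sketch. " hypothetical report name

DATA(lv_val)    = `DV-102.1.1`.
DATA(lv_new_no) = 2.

" replace( ) with regex replaces the section matched by the pattern,
" here the run of non-dot characters directly before the end of the string.
" The number is converted to text with a string template.
lv_val = replace( val   = lv_val
                  regex = `[^.]*$`
                  with  = |{ lv_new_no }| ).

WRITE / lv_val. " DV-102.1.2, the same result the Regex Toy shows
```

No SPLIT is involved, so the first "1" in the string is never touched.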

pokrakam · ‎08-18-2017

Two problems:

The "." means match any single character. So to find a real "." you have to escape it using the backslash:

[^\.]*$

Secondly, I think your code won't replace the last one. It looks like it will find the last "1" using the regex and then match the first "1" on the replace portion.

A straight regex alternative is to do two matches up to and after the last "." and increment the second match by 1:

mystr = `1.1.1.1.1`.
mystr = |{ match( val = mystr regex = `.*\.` ) }| &&
        |{ CONV i( match( val = mystr regex = `[^\.]+$` ) ) + 1 }|.
"Result: 1.1.1.1.2

Sandra_Rossi · ‎08-19-2017

It's a good habit to always escape special characters, but for value sets (inside [...]) only the following characters \ [ ] (as far as I know) are considered to be special characters and need to be escaped.
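A quick way to check this for the dot is to compare the escaped and unescaped forms on the same input. This is only a sketch; the variable names are made up for illustration:

```abap
DATA(lv_text) = `1.1.1.1.1`.

" Inside a value set the dot is taken literally, with or without the backslash.
DATA(lv_plain)   = match( val = lv_text regex = `[^.]+$` ).
DATA(lv_escaped) = match( val = lv_text regex = `[^\.]+$` ).

" Both calls return the digits after the last dot, here `1`.
ASSERT lv_plain = lv_escaped.
```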

pokrakam · ‎08-19-2017

Very interesting, I learnt something.

So, the replace command is position-aware when using regex. I thought regex behaved as a match criterion. So the first two were equivalent in my understanding up to now:

    DATA(mystr) = `10.10.10`.

    WRITE : / replace( val   = mystr
                       regex = `[^.]*$`  "finds `10`
                       with  = `99` ).   "10.10.99  << Position aware

    WRITE : / replace( val  = mystr
                       sub  = `10`
                       with = `99` ).    "99.10.10  << Simple match

    WRITE : / replace( val  = mystr
                       sub  = match( val = mystr regex = `[^.]*$` ) "finds `10`
                       with = `99` ).    "99.10.10   <<<< OP's issue

    "And just for fun:
    WRITE : / replace( val   = `10.10.10` 
                       regex = `([0-9]+)\.([0-9]+)\.([0-9]+)` 
                       with  = `$1.99.$3` ). "10.99.10

So the OP's scenario was the equivalent of my third example.

But I wasn't aware that the 'with' clause will replace the string when using sub =, but will replace implicit group $0, not the match string, when using regex =. I don't think I'll be the only one to be caught out by this dual nature, the doco is a little vague.

horst_keller · ‎08-21-2017

I do not agree.

As DEMO_REGEX_TOY shows, the regex [^.]*$ matches the last occurrence of "10" because the $ sign denotes the end of a line (as documented); leave out the $ and it will match the first "10".

Therefore,

Your first replace function is not "position aware"; it simply replaces the "10" directly before the end of the line, because that is what the regex matches.
Your second replace function has nothing to do with regexes; it replaces the first substring found.
Your third replace function does exactly the same as the second; you just use another way of writing the value "10" behind sub. The function match returns the value "10", which is used for sub. The regex inside match finds the last "10" but has no influence on the replacement at all.

Your conclusion about a double nature is not valid, and there is no vagueness in the docu. I assume you missed the meaning of $ in the pattern and misinterpreted the function match and its regex behind sub.
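The difference can be made visible with the find function, which returns the offset of a hit. The following is a small sketch with made-up variable names, assuming release 7.40 or later:

```abap
DATA(lv_text) = `10.10.10`.

" match( ) returns only the matched value; its position is not part of the result.
DATA(lv_last) = match( val = lv_text regex = `[^.]*$` ).      " `10`

" Searching with the regex finds the hit directly before the end of the string.
DATA(lv_off_regex) = find( val = lv_text regex = `[^.]*$` ).  " 6

" Searching for the returned value as a substring finds its first occurrence.
DATA(lv_off_sub) = find( val = lv_text sub = lv_last ).       " 0

WRITE: / lv_last, lv_off_regex, lv_off_sub.
```

A replacement driven by sub = match( ... ) therefore acts at offset 0, while one driven by regex acts at offset 6.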

kiran_k8 · ‎08-21-2017

Horst,

I searched for the documentation listing all the possible combinations of keywords like $ * [ ] ^ etc and their meaning but so far I got only the below mentioned one.

https://help.sap.com/doc/abapdocu_750_index_htm/7.50/en-US/abenregex_syntax_specials.htm

Kindly share if you are referring to any other documentation in addition to this. Thanks.

K.Kiran.

horst_keller · ‎08-21-2017

https://help.sap.com/http.svc/rc/abapdocu_751_index_htm/7.51/en-US/index.htm?file=abenregex_syntax.h...

kiran_k8 · ‎08-21-2017

It is very confusing to understand the keywords associated with REGEX. It seems a few trial and error iterations are needed before we come to a conclusion on the usage of each keyword.

K.Kiran.

pokrakam · ‎08-21-2017

Hi Horst,

I think you either misunderstood or we are coming from different points of view. You know the REPLACE command very well with its full regex incarnation, so to you it may not be that obvious.

I used REPLACE and regex well before ABAP did regex. In days of old, REPLACE did the searching and WITH was what you put in place of the string it found. The addition of regex was just another fancy way to specify the string to search for, i.e. look for the string provided by regex. I know now this statement is incorrect, but that was the assumption I made, and the doco for REPLACE is not totally unambiguous on that.

This is the semantic difference I tried to explain when I said that WITH is position-aware when used with regex. The double nature is that with a search string REPLACE is actively searching for a string, whereas with regex, the search is being done by the regex expression engine... if that makes sense.

And thanks for the regex syntax links, the latest docs are very useful. From what I remember they were a bit lacking when regex was first introduced so I tended to stick with non-SAP info.

horst_keller · ‎08-21-2017

Hi Mike,

Why is the statement "The addition of regex was just another fancy way to specify the string to search for, i.e. look for the string provided by regex." incorrect? It is perfectly correct. Therefore, I still don't see any ambiguity in the documentation.

The ABAP docu for the REPLACE statement as well as for the replace function say, that there is a search for a match with the substring specified in substring or with the regular expression specified in regex and that the occurrence is replaced.

Therefore, if you want to speak about "position awareness", well both variants are position aware, because in both cases the found occurrences are replaced at their position. Most primitive example:

DATA(text) = `abcdef`.
REPLACE SUBSTRING `de` IN text WITH `xx`.

same as

DATA(text) = `abcdef`.
REPLACE REGEX `de` IN text WITH `xx`.

No difference in substrings and regexes. Both find "de" at offset 3. As long as no special characters are used, they do the same. If special characters are used, regexes of course find other things in other positions. But that is no surprise, is it?

Please point out why you think that regexes are differently "position aware" than substrings. I will happily correct the documentation, but up to now, I don't see the point.

pokrakam · ‎08-21-2017

Hi Horst,

I'm talking about the resulting string versus $0. By 'string provided by regex' I meant the result string of a regex search. I had understood the addition of REGEX to ABAP's REPLACE back in 4.x to act as a short notation for:

REPLACE SUBSTRING match( regex = ... ) IN text WITH `xx` .

By position-aware, I mean that a regex result string (`de`) and a regex result ($0) are two different concepts, one having a position in the source (internally anyway) and the other not.

So again in a shorter example:

data(text) = `abcde-de`.
" Each statement below is meant to run against the original value of text:
REPLACE SUBSTRING match( val = text regex = `de$` ) IN text WITH `xx`. "abcxx-de
REPLACE REGEX `de$` IN text WITH `xx`.                                 "abcde-xx

These are very different, I get that. I had assumed the first variant applied when regex initially came to ABAP. Personally I found that a casual read of the doc hints more at equivalence to SUBSTRING, as both search methods are described in the same ORed phrase:

"a match with the substring specified in substring or with the regular expression specified in regex."

Maybe I'm the only one who misunderstood it... maybe not.

As an aside, UK Postal Codes are a nice example where a single expression can encode a whole lot of seemingly arbitrary rules.

horst_keller · ‎08-21-2017

"Maybe I'm the only one who misunderstood it... maybe not."

I hope that you are the only one 😉

Your last example: they are very different, yes of course. Because REPLACE REGEX simply isn't a short form of

REPLACE SUBSTRING match(...) ...

That would make no sense, or? In fact it would involve two searches, one for the pattern and a second for the match. But finally I see that you assumed exactly that and therefore all the confusion.

In ABAP, any pattern based REPLACE is rather a short form of

FIND SUBSTRING|REGEX ... 
  MATCH OFFSET off
  MATCH LENGTH len.

REPLACE SECTION OFFSET off LENGTH len ...

So, there is first a search for a substring or a pattern and then a replacement of the section found. And this also explains, why they are handled similarly in the documentation. For FIND you would never expect that after finding a match, another search for the match itself will take place.
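Spelled out for a concrete value, the two-step form might look like the sketch below. The variable names are made up for illustration, and release 7.40 or later is assumed for the inline declarations:

```abap
DATA(lv_text) = `abcde-de`.

" Step 1: search for the pattern and remember where the hit is.
FIND REGEX `de$` IN lv_text
  MATCH OFFSET DATA(lv_off)
  MATCH LENGTH DATA(lv_len).

" Step 2: replace exactly the section that was found (offset 6, length 2).
IF sy-subrc = 0.
  REPLACE SECTION OFFSET lv_off LENGTH lv_len OF lv_text WITH `xx`.
ENDIF.

WRITE / lv_text. " abcde-xx
```

There is no second search for the matched value, which is why the first "de" stays untouched.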

Feel free to suggest how that can be made more clear.

pokrakam · ‎08-21-2017

"That would make no sense."

A lot of things in ABAP make no sense 🙂

I can't remember the reasons why, but nevertheless that's how I understood ABAP regex to work since I first used it.

Sometimes things don't make sense and the easiest is to shrug shoulders, accept the way it is and move on. bool vs xsdbool comes to mind...

kiran_k8 · ‎08-21-2017

Neil,

For additional info and ready reference.

https://help.sap.com/doc/abapdocu_750_index_htm/7.50/en-US/abenregex_syntax_specials.htm

K.Kiran.

Former Member · ‎08-21-2017

REGEX is used to validate specific formats, such as a telephone number format or a zip code format, but not to replace a value.

For your requirement, use SPLIT ... INTO TABLE at ".", so that the data after the last dot ends up in the last record of the table. Increment it by 1, then concatenate the same set of records from the table again, separated by dots.
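The table-based approach described above could be sketched as follows. The variable names are made up for illustration, release 7.40 or later is assumed, and the last segment is assumed to be numeric:

```abap
DATA(lv_val) = `DV-102.1.1`.

" Split at every dot; the part after the last dot becomes the last table row.
SPLIT lv_val AT `.` INTO TABLE DATA(lt_parts).

" Increment the last row (assumes it contains only digits).
DATA(lv_last) = lines( lt_parts ).
lt_parts[ lv_last ] = |{ CONV i( lt_parts[ lv_last ] ) + 1 }|.

" Put the rows back together, separated by dots.
CONCATENATE LINES OF lt_parts INTO lv_val SEPARATED BY `.`.

WRITE / lv_val. " DV-102.1.2
```

This produces the same result as the regex-based REPLACE, with more statements.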

kiran_k8 · ‎08-22-2017

Hi,

There is a long discussion going on within this thread. Did you bother to read it before answering?

K.Kiran.

horst_keller · ‎08-22-2017

This answer is wrong.

What do you think is the purpose of the special replacement operators?

smith_john · ‎08-29-2017

wow this generated a lot of discussion which is great!

I debugged what regex toy was doing and opted for the REPLACE solution instead of the SPLIT.

In the end my code is pretty much identical to what Sandra suggested.

thanks everyone for the input. very much appreciated.