Solved: correcting chars

marcin_cholewczuk · ‎02-19-2007

Hi all

I'm getting a field of type char(12) which may contain characters that are not allowed. I would like to correct them by changing them into _. Allowed characters are !"%&'()*+,-./:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ. Performance is crucial, because this is done for each data record, up to millions each day.

I've thought about converting this field into an internal table and using MODIFY with a lot of WHERE conditions, but I'm not sure it is a good method, because each time I would create a little table (12 rows), which may take a lot of time.

Does anybody know how to solve this problem?

Former Member · ‎02-19-2007

Look at the CN option of the IF statement, and the Translate statement.

MattG.

marcin_cholewczuk · ‎02-19-2007

But if I do it this way, then I have to use a string in between:


data chr(12) type c.
data str type string.
...

str = chr.
if str CN '!"%&''()*+,-./:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
  " corrections go here
endif.

because I can't use if chr(12) CN, since I'm not sure chr will have all 12 characters (it can have 8 characters, with the other 4 empty). Will that have any impact on performance (I mean using a string like this)?
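The trailing-blank concern raised here can be illustrated with a minimal sketch. The report name zdemo_trailing and the sample value are made up for illustration, and the sketch assumes the documented behaviour that CN respects trailing blanks in a fixed-length field, while assigning a char field to a string drops them:

```abap
REPORT zdemo_trailing.

DATA chr(12) TYPE c VALUE 'ABCDEFGH'.   " 8 characters + 4 trailing blanks
DATA str     TYPE string.
DATA allowed TYPE string.

allowed = '!"%&''()*+,-./:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

* Fixed-length field: the padding blanks are compared as well.
IF chr CN allowed.
  WRITE: / 'chr: character at offset', sy-fdpos, 'is not allowed'.
ENDIF.

* String: the assignment drops the trailing blanks first.
str = chr.
IF str CN allowed.
  WRITE: / 'str: character at offset', sy-fdpos, 'is not allowed'.
ELSE.
  WRITE: / 'str: only allowed characters'.
ENDIF.
```

Whether the extra assignment costs measurable time for millions of records is something only a runtime measurement on the target system can answer.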

Former Member · ‎02-21-2007

hi,

Try

data chr(12) type c.
data str type string.
...
 
str = chr.
if str CA '!"%&''()*+,-./:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
  " corrections go here
endif.

If your string contains any of these specified values, it will work for corrections.

I have explained this to you in an earlier post.

Please close all the existing posts opened by you by rewarding points.

Former Member · ‎02-21-2007

Well, yes, that will work, but then there is still the issue of finding the quickest way to replace invalid characters by the underscore.

For example, a string like "ABC$EF" would need to be changed to "ABC_EF".

You could do that with a DO loop that runs for as long as there is a next character and checks whether each character is valid. If not, replace it with '_'.

When there are many values to be checked, that will be very bad for performance.

CN (Contains Not only) will correctly determine that there is an invalid character in the input string, but what then?

I would build a string of the invalid characters, each one followed by '_'.

Then you can use that string to simply and quickly replace the invalid characters with the underscore character by using the following statement:

TRANSLATE input_string USING invalid_character_string.

See the example in the post dated 02-20-2007.

It can be copied and executed locally in a test environment.
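As a minimal sketch of the TRANSLATE ... USING idea described in this post: the mask is read as pairs, each character followed by its replacement. The report name zdemo_translate, the sample value and the short three-character mask are assumptions for illustration only; a real mask would list every character that is not allowed.

```abap
REPORT zdemo_translate.

DATA input_string(12)         TYPE c VALUE 'ABC$EF@1'.
DATA invalid_character_string TYPE string.

* Pairs: '$' -> '_', '@' -> '_', '#' -> '_'
invalid_character_string = '$_@_#_'.

TRANSLATE input_string USING invalid_character_string.

WRITE / input_string.   " ABC_EF_1
```

The mask only needs to be built once, so the per-record cost is a single TRANSLATE statement.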

Former Member · ‎02-21-2007

Remember that CN sets the SY-FDPOS value, so you get the offset of the bad character and, from it, the start of the remaining unchecked string.

Part of the performance issue is that the best solution for a narrow field may differ from the one required for a long field. The number of expected error values may also change the solution.

The fastest solution depends on your data and system setup, both of which will change. Have you thought about correcting the data generator so that you do not have these invalid characters?

The best solution may be the easiest to maintain.

marcin_cholewczuk · ‎02-22-2007

Well, I must admit that I didn't give you all the information, so sorry about that. This is not a standard report. It's a procedure which is executed while transporting data between cubes in BW. Gifford, I can't correct the data earlier because it is a transfer of data from a Swedish system (in which those characters are valid) to a German system. Edwin, I've recently learned that these bad characters occur rarely, so I think the performance of changing bad characters to underscores is not so important. However, it would be good to have it working properly. Also, since this is a procedure, I think there is no sense in collecting all the bad characters for every record I'm transporting. I guess I'll just do a loop.

By the way, I found out that it is better to do the check this way

data str type string.
data chars(12) type c.
data len type i.

str = chars.
if str CN...

than this way

len = strlen( chars ).
if chars(len) CN...

It looks like creating the string takes less time than counting the characters with the strlen function.

Former Member · ‎02-22-2007

The point I was trying to get across was that the solution depends on the data. In your case, there are only a few invalid characters, they occur rarely, and your string is only 12 characters long.

If your situation were a few invalid characters that occur frequently, then I would just use TRANSLATE and let SAP work out the best algorithm.

If your situation were invalid characters that occur infrequently but in a long string (at least 20 characters), then I would look at a DO loop with an IF ... CN to find the next invalid character, narrowing the search area each time.

MattG.
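The loop approach discussed in this thread could look roughly like the following sketch. It uses WHILE instead of DO, relies on CN setting SY-FDPOS to the offset of the first offending character, and rechecks the whole value on each pass rather than managing the search area. The report name zdemo_cn_loop, the variable names and the sample value are hypothetical.

```abap
REPORT zdemo_cn_loop.

DATA chr(12) TYPE c VALUE 'ABC$EF#1'.
DATA str     TYPE string.
DATA allowed TYPE string.
DATA off     TYPE i.

allowed = '!"%&''()*+,-./:;<=>?_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'.

* Check a string copy so that the trailing blanks of chr are ignored.
str = chr.
WHILE str CN allowed.
* sy-fdpos is the offset of the first character not in allowed.
  off = sy-fdpos.
  chr+off(1) = '_'.
  str = chr.
ENDWHILE.

WRITE / chr.
```

Because '_' is itself an allowed character, each pass fixes one position and the loop ends after at most 12 passes. For rarely occurring bad characters, most records never enter the loop body at all.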

Former Member · ‎02-20-2007

First build a string of all allowed characters. You will need to use hexadecimal numbers to add the unusable characters like ' and ".

I've written a short program in which I only added the ' this way.

Then build a string containing ALL characters that are NOT allowed. And have each of these characters followed by an underscore. This string will be used to translate the incoming value.

Basically, using this method, the incoming strings can be quickly translated using the TRANSLATE statement. That's just a very small piece of code at the end.

Preparation (of the check strings) will only need to be done once, in the beginning.

I think this is the quickest way... Took some puzzling though.

P.S. The counter values in the example are actually decimal, not hex. Strange but true: moving the integer 32 into a type x field gives hex 20. So if you look at a character code table, be sure to look at the correct values!

EXAMPLE:

REPORT zbc_temp.

* Data declarations:
DATA: string1(12) VALUE '1 3$567$@0AB'. "Incorrect value!

* Hexadecimal code to be able to add it to the string:
DATA: z_hex TYPE x.
DATA: z_hyphen TYPE x VALUE '27'.       "Hex 27 is the apostrophe (')

DATA: allowed_chars TYPE string.
DATA: not_allowed_chars TYPE string.
DATA counter TYPE i VALUE 32.           "Decimal code for SPACE (hex 20)

* Build string with allowed characters:
CONCATENATE z_hyphen '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ' INTO
        allowed_chars.

* Build string with characters that are not allowed (followed by '_').
* Characters that are evaluated range from decimal 32 (= SPACE) up to
* and including decimal 126 (= '~').
DO.
* Last character from the code page that is checked is decimal 126
  IF counter > 126.
    EXIT.            "No more checks need to be done... Exit DO loop.
  ENDIF.
  MOVE counter TO z_hex.
* Check if the character is allowed
  IF allowed_chars NS z_hex.
    IF counter = 32. "SPACE
*     Special treatment. SPACE gets removed during the other
*     concatenate.
      CONCATENATE not_allowed_chars ' _' INTO not_allowed_chars.
    ELSE.
*   This character is NOT allowed, add to 'not_allowed_chars'
      CONCATENATE not_allowed_chars z_hex '_' INTO not_allowed_chars.
    ENDIF.
  ENDIF.
  counter = counter + 1.
ENDDO.

* Now all allowed and non-allowed characters are stored in two strings.
WRITE: / 'Allowed characters: ', allowed_chars.
WRITE: / 'NOT Allowed characters: ', not_allowed_chars.

* Here the actual values of the incoming parameters are checked and
* corrected:
IF string1 CO allowed_chars.
  WRITE: / string1, ' is okay'.
ELSE.
  WRITE: / string1, ' is not okay'.
  TRANSLATE string1 USING not_allowed_chars.
  WRITE: / string1, ' but corrected'.
ENDIF.