
Regex - to identify non printable characters

An online regex tester shows that the regex [^[:print:]] correctly identifies TAB as a non-printable character.

But when I use the same regex in ABAP, it does not treat TAB as a non-printable character. I wrote this simple program to loop through all code points from 0000 to FFFF and see how many of them ABAP identifies as "printable".

constants: hex type char16 value '0123456789ABCDEF'.

data: p   type i,
      q   type i,
      r   type i,
      s   type i,
      str type char4,
      val type string,
      rpl type string.

" Build every four-digit hex code point from 0000 to FFFF.
do 16 times.
  p = sy-index - 1.
  do 16 times.
    q = sy-index - 1.
    do 16 times.
      r = sy-index - 1.
      do 16 times.
        s = sy-index - 1.
        str = |{ hex+p(1) }{ hex+q(1) }{ hex+r(1) }{ hex+s(1) }|.
        " Convert the code point to its character and keep a copy.
        rpl = val = cl_abap_conv_in_ce=>uccp( str ).
        replace all occurrences of regex '[^[:print:]]' in rpl with ` `.
        " An unchanged copy means the regex treated the character as printable.
        if rpl = val.
          write: / str, val.
        endif.
      enddo.
    enddo.
  enddo.
enddo.

The output listed many characters that ABAP did not replace, meaning it treats them all as printable. TAB was among them, along with many other non-printable characters.

I understand that some of these characters may appear as [] on my system because of a missing font. Am I right?


But what about the TAB character? Is this a bug? ABAP correctly identifies newline characters as non-printable.


I also tried the regex [:cntrl:], and the result was worse: it caught neither TAB nor NEWLINE.




1 Answer

  • Best Answer
    Apr 14, 2016 at 02:03 PM

    This works for me:

    DATA: l_text TYPE string.
    l_text = 'A' && cl_abap_char_utilities=>horizontal_tab && 'B'.
    REPLACE ALL OCCURRENCES OF REGEX '[^[:print:]]' IN l_text WITH space.

    Hexadecimal value of l_text before the replace: 410942
    Hexadecimal value of l_text after the replace: 4142


    • Naimesh Patel Juwin Pallipat Thomas

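      Actually, [:print:] is the union of [:graph:] and [:blank:]. If you run the two individually, as in the snippets further down, you will notice that they behave as opposites: TAB is matched by [:blank:] but not by [:graph:]. Because TAB belongs to [:blank:], it also belongs to [:print:], so [^[:print:]] doesn't replace anything and the string is displayed as is.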

      • [:print:]
        Set of all displayable characters (union of [:graph:] and [:blank:])

      DATA: l_text TYPE string.
      l_text = 'A' && cl_abap_char_utilities=>horizontal_tab && 'B'.
      WRITE: l_text.
      REPLACE ALL OCCURRENCES OF REGEX '[^[:blank:]]' IN l_text WITH ` `.
      WRITE: / l_text.


      output

      A#B

      #

      l_text = 'A' && cl_abap_char_utilities=>horizontal_tab && 'B'.
      WRITE: / l_text.
      REPLACE ALL OCCURRENCES OF REGEX '[^[:graph:]]' IN l_text WITH ` `.
      WRITE: / l_text.


      output

      A#B

      A B

      Also, the help says that the behavior of these classes can depend on the language and local platform. That could be the reason it is not working for both of us but works for @Tomas Buryanek.

      Within sets for single characters defined using [ ], predefined character classes can be specified for certain sets for single characters whose behavior can, however, depend on the language and platform.

      Regards,
      Naimesh Patel
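
Since the classes can behave differently from system to system, one way to see what a given system does is to test TAB against each class separately. The following is only a minimal sketch and not code from this thread: the report name zdemo_regex_classes and all variable names are hypothetical, and the result depends on the system, so no expected output is given.

```abap
REPORT zdemo_regex_classes. " hypothetical report name

" Diagnostic sketch: which predefined classes match TAB on this system?
DATA: l_tab    TYPE string,
      l_class  TYPE string,
      l_regex  TYPE string,
      lt_class TYPE STANDARD TABLE OF string.

l_tab = cl_abap_char_utilities=>horizontal_tab.

" Class names to probe; the backquoted blank is the separator.
SPLIT 'print graph blank cntrl space' AT ` ` INTO TABLE lt_class.

LOOP AT lt_class INTO l_class.
  " A class name is only valid inside a set, hence the double brackets.
  CONCATENATE '[[:' l_class ':]]' INTO l_regex.
  FIND REGEX l_regex IN l_tab.
  " sy-subrc = 0 means the pattern was found in the one-character string.
  IF sy-subrc = 0.
    WRITE: / l_class, 'matches TAB'.
  ELSE.
    WRITE: / l_class, 'does not match TAB'.
  ENDIF.
ENDLOOP.
```

If print and blank both report a match while graph does not, that is in line with the union described above, and [^[:print:]] cannot be expected to catch TAB on that system.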