Remove HTML characters

Former Member
0 Kudos

Hi Experts,

I want to remove the HTML characters (tags) that arrive in one of the fields of an XML file. For example:

Input field:

<Text><p align="center">Input Contains<font color="Green" face="arial" size="6">Test-</font>HTMl characters- <u>which needs to be removed- </u> Thanks </p></Text>

Output:

<Text>I want to remove HTML characters which are coming in one of the fields in XML file</Text>

Is there a standard function that can remove the HTML characters, or do I need to write a UDF for this? Please help.

Regards

Vasant

Accepted Solutions (1)

Former Member
0 Kudos

Hello,

I don't think there is a regular expression that removes these HTML characters (at least I wasn't able to find one), so use the old-fashioned way of removing them.

Input value is: input

If a given field can have multiple values, add one more for loop at the top.

for (int i = 0; i < input.length(); ++i)
{
    if (!intag && input.charAt(i) == '<')
    {
        intag = true;
        continue;
    }
    if (intag && input.charAt(i) == '>')
    {
        intag = false;
        continue;
    }
    if (!intag)
    {
        output = output + input.charAt(i);
    }
}

result.addValue(output);

Thanks

Amit Srivastava

Former Member
0 Kudos

Hi Amit,

The UDF code is throwing a "cannot find symbol" error.

Could you please check your code and provide a corrected version?

Regards,

Vasant

Former Member
0 Kudos

Hello,

My mistake... I hadn't tested the code above.

Please check this (I have now tested it and it should work fine):

Input will be var1

Execution type: All values of a context

Under import statements, add one more entry and paste: java.text.*

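String output = "";
boolean intag = false;      // true while the scan is between '<' and '>'
String input = var1[0];     // first value of the context

for (int i = 0; i < input.length(); ++i)
{
    // '<' outside a tag opens a tag; skip the character
    if (!intag && input.charAt(i) == '<')
    {
        intag = true;
        continue;
    }
    // '>' inside a tag closes it; skip the character
    if (intag && input.charAt(i) == '>')
    {
        intag = false;
        continue;
    }
    // keep only characters that are outside any tag
    if (!intag)
    {
        output = output + input.charAt(i);
    }
}

result.addValue(output);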
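
For anyone who wants to try the same scan outside PI first, here is a minimal standalone sketch. The class and method names (TagStripper, stripTags, stripAll) are made up for illustration and are not part of any SAP API. It uses a StringBuilder instead of string concatenation, and stripAll shows the "one more for loop" idea for a context with several values. It assumes every '<' starts a tag, so a literal '<' in the text would swallow everything up to the next '>'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only (not an SAP API).
final class TagStripper {

    // Drops every character between '<' and '>' (inclusive) and keeps the rest.
    static String stripTags(String input) {
        StringBuilder output = new StringBuilder(input.length());
        boolean inTag = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!inTag && c == '<') {
                inTag = true;          // tag opens, skip the character
            } else if (inTag && c == '>') {
                inTag = false;         // tag closes, skip the character
            } else if (!inTag) {
                output.append(c);      // plain text outside any tag
            }
        }
        return output.toString();
    }

    // Applies stripTags to each value, as the extra outer loop would in a UDF.
    static List<String> stripAll(String[] values) {
        List<String> cleaned = new ArrayList<>();
        for (String value : values) {
            cleaned.add(stripTags(value));
        }
        return cleaned;
    }
}
```

Inside the UDF, the body of stripTags would replace the loop above, with result.addValue(...) called once per cleaned value.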

Thanks

Amit Srivastava

Answers (5)

Former Member
0 Kudos

Hi Vasant,

Use the standard replace function on each of the strings below, replacing it with an empty string.

<p align="center">

<font color="Green" face="arial" size="6">

</font>

<u>

</u>

</p>

This is a workaround that avoids a UDF.
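
To make the behaviour of this workaround concrete, here is a rough Java sketch of what chaining the replacements amounts to. The names (LiteralTagRemover, removeLiterals) are hypothetical, and this is plain Java rather than the PI standard function itself. Note that only the exact strings listed are removed, so any tag with different attributes or spacing stays in the output.

```java
// Hypothetical helper class, for illustration only (not an SAP API).
final class LiteralTagRemover {

    // Removes each listed literal string from the input, one after another.
    static String removeLiterals(String input, String... literals) {
        String output = input;
        for (String literal : literals) {
            // String.replace works on literal text, not on regular expressions
            output = output.replace(literal, "");
        }
        return output;
    }
}
```

For the example in the question, the literals would be the six strings listed above. This only works while the sender always uses exactly those tags.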

Regards,

Pranav

Former Member
0 Kudos

Hi All,

Thanks for the reply

Can you please tell me how to create a UDF?

Regards,

Vasant

praveen_sutra
Active Contributor
0 Kudos

hi Vasant,

I agree with the experts. There doesn't seem to be any standard way to remove it, but you can probably handle it with a UDF.

Please let us know if you need help regarding UDF.

thanks and regards,

Praveen T

ambrish_mishra
Active Contributor
0 Kudos

hi Vasant,

AFAIK, this is to be done using a UDF.

Ambrish

Former Member
0 Kudos

Hello,

IMO, you should have asked the sender system to remove these characters.

Having said that, PI has no standard function to remove HTML characters, so a UDF is the only option.

Maybe you can search Google for regular expressions (I am not sure if there are any)?
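
On the regex question: plain Java's java.util.regex can handle the simple case. The sketch below is an assumption-laden illustration, not a tested PI UDF, and the class and method names (RegexTagStripper, stripTags) are made up. The pattern <[^>]*> matches a '<', then any run of characters that are not '>', then a '>'. It is not a real HTML parser: it does not decode entities such as &amp;amp; and it will misbehave if an attribute value itself contains '>'.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only (not an SAP API).
final class RegexTagStripper {

    // '<', then any characters except '>', then '>'
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Removes everything that looks like a tag and keeps the remaining text.
    static String stripTags(String input) {
        return TAG.matcher(input).replaceAll("");
    }
}
```

Within a UDF the same thing can be written in one line with String.replaceAll("<[^>]*>", ""), at the cost of recompiling the pattern on every call.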

Thanks

Amit Srivastava