Skip to Content
0

Check a value in a column for capitals and lower case

Jul 17, 2017 at 10:39 PM

53

avatar image
Former Member

Hello,

I'm working with Data Services to check the Data Quality.

We want to check if every word in a column starts with a capital, except for the prefixes of names (in Dutch we have a lot of prefixes. For instance: van, van den, der, etc.)

I've thought of one possibility: Create a new column with the function init_cap() and compare the value with the original value. However, this function will capatilize every first letter, including the prefixes.

Can anyone think of a way to check if every word starts with a capital, except for some specific words (prefixes). Hard-Coding is too much work and have to be adapted manually, which is not an option.

Kind regards,

Jeroen

10 |10000 characters needed characters left characters exceeded
* Please Login or Register to Answer, Follow or Comment.

2 Answers

Dirk Venken
Jul 20, 2017 at 07:20 AM
0

Move to Belgium. We are capitalising lastname prefixes, too ;).

If that's not possible, hardcoding is an option, indeed. But why don't you do that in combination with an "exception table" containing all prefixes? Build a custom validation function that "initcaps" the input paramter unless its fiirst word is in the table, then compares.

If you later come accross additional prefixes, you don't have to adapt the code, just add a row to the table.

Share
10 |10000 characters needed characters left characters exceeded
Ravi kiran Pagidi Jul 18, 2017 at 02:28 PM
0

Hi Jeroen,

how many distinct prefixes of names will be there?

if we can remove those prefixes then we can use MATCH_PATTERN function like below

MATCH_PATTERN(substr(Name,1,1), 'X')

write this function in a new column (integer data type) , if this returns 1 then it is matched, 0 is not matched.

How this function works?

sysntax :- match_pattern(<input_string>,<pattern_string>)

X represents upper case characters and x represents lower case.

suppose if your input is 'Henrick' then output is 'Xxxxxxx' and if your input is 'DAVID' then output is 'XXXXX'

So based on your Distinct prefix of names we can try filter prefixes and have only names in the column and implement your requirement.

Thanks,

Ravi kiran

Share
10 |10000 characters needed characters left characters exceeded