Skip to Content

Best way to parse US addresses when street/city/state/zip are all in one field?

Jan 25 at 10:26 PM


avatar image


I have some really ugly data with records that look like this:

1;BIG SLICE OF NEW YORK LLC, THE;146 5TH AVE;NEW YORK, NY 10011;liquor match;143649
2;373 PARK LLC;373 PARK AVENUE SOUTH;NEW YORK, NY 10016;liquor match;117318
3;Pac Food;6005 Eliot Avenue Flushing;bud non match;13436
4;JAY'S LIQUOR INC;1621 ST NICHOLAS AVE;NEW YORK, NY 10040;liquor match;133418
5;CMR CORP;249 251 COLUMBUS AVENUE;NEW YORK, NY 10023;liquor match;114919
6;121 CITY ISLAND REALTY CORP;101 CITY ISLAND AVE BRONX , NY 10464;liquor match;17789
7;SONIKA FOOD MART INC;902 SOUNDVIEW AVENUE BRONX, NY 10473;liquor match;14320

In some cases, there is a field separator (semicolon) between the primary address and the city/state/zip. In other cases, there are no separators.

When I try to put this through the USA Regulatory transform (CASS mode), it is unable to parse out the city, state and zip when they aren't split out separately.

I am using "hybrid" mode, where I pass secondary address as Multiline1, primary address as Multiline2, city as Locality, state as Region and zip as PostalCode1.

When I can split them out, everything is fine. When I can't split them out, then the Regulatory transform rejects them because the city, state and zip are empty.

What I want it to do is to detect city, state and zip are provided as part of Multiline1 or Multiline2 and break them out itself.

Is there ANY way to make that happen? An option I am missing or another transform before USA Regulatory that I should add?

Thanks in advance...

Eric Raskin
eraskin <at> paslists <dot> com

10 |10000 characters needed characters left characters exceeded
* Please Login or Register to Answer, Follow or Comment.

0 Answers