Data Reviewer regular expression not working

10-12-2015 06:42 AM
NeilWrobel
New Contributor II

Hi

I would like some help with the following:

I have a feature class with the field REF_ID. I would like to constrain what is entered by setting up a regular expression check in Data Reviewer to pick up any mismatches.

Each REF_ID entry is a semicolon-delimited list of IDs.

Each ID can be anything from a single digit (starting at 1) up to a maximum of five digits.

The number of IDs in the list is limited only by the field's character length.

Here are some examples:

12345                      OK
1234                       OK
123                        OK
12                         OK
1                          OK
123456                     BAD
12345; 54321; 1234; 123    OK
123456; 654321             BAD
12345;54321;               BAD
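The rules above can also be written out without a regex, which gives a reference to test candidate patterns against. The sketch below is plain Python; the function name is_valid_ref_id is hypothetical (it is not part of Data Reviewer or arcpy), and it assumes the rules as inferred from the examples: 1 to 5 digits per ID, no leading zero, ';' optionally followed by one space, and no trailing semicolon.

```python
def is_valid_ref_id(value: str) -> bool:
    """Check one REF_ID field value against the rules described above.

    Assumed rules (inferred from the examples): each ID is 1 to 5 ASCII
    digits with no leading zero, IDs are separated by ';' optionally
    followed by one space, and there is no trailing semicolon.
    """
    if not value:
        return False
    for index, part in enumerate(value.split(";")):
        # IDs after the first may be preceded by one optional space.
        if index > 0 and part.startswith(" "):
            part = part[1:]
        # An empty part comes from a trailing ';' or from ';;'.
        if not 1 <= len(part) <= 5:
            return False
        # Only ASCII digits are allowed, and the ID must not start with 0.
        if any(ch not in "0123456789" for ch in part) or part[0] == "0":
            return False
    return True


# The examples from the question, mapped to the expected result.
examples = {
    "12345": True,
    "1234": True,
    "123": True,
    "12": True,
    "1": True,
    "123456": False,
    "12345; 54321; 1234; 123": True,
    "123456; 654321": False,
    "12345;54321;": False,
}

for value, expected in examples.items():
    label = "OK" if is_valid_ref_id(value) else "BAD"
    wanted = "OK" if expected else "BAD"
    print(f"{value!r:28} {label:4} expected {wanted}")
```

A reference function like this makes it easy to run any candidate pattern against the same inputs and see exactly where the two disagree.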

I have written the expression a few different ways, changing it each time because Data Reviewer seems not to like something in it.

I have checked all of the expressions in Debuggex and they all worked correctly there, but they just don't work in Data Reviewer.

Every record was being returned.

Here is the expression I ended up with. It appears to work on records with a single REF_ID but not on a delimited list, i.e. it works on '12345' but not on '12345; 54321':

^([1-9](\d)?(\d)?(\d)?(\d)?)(;\s?([1-9](\d)?(\d)?(\d)?(\d)?))*$
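To rule out the pattern itself, it can be run outside Data Reviewer. The sketch below uses Python's re module, which is an assumption: it is not the engine Data Reviewer uses, so agreement here only shows that the pattern is logically sound. Under re, every example in the list above is classified as expected, including '12345; 54321', which points towards a difference in how Data Reviewer evaluates the expression rather than a mistake in the pattern.

```python
import re

# Neil's pattern, exactly as posted.
POSTED = r"^([1-9](\d)?(\d)?(\d)?(\d)?)(;\s?([1-9](\d)?(\d)?(\d)?(\d)?))*$"

# The examples from the question, mapped to the expected result.
examples = {
    "12345": True,
    "1234": True,
    "123": True,
    "12": True,
    "1": True,
    "123456": False,
    "12345; 54321; 1234; 123": True,
    "123456; 654321": False,
    "12345;54321;": False,
}

compiled = re.compile(POSTED)
for value, expected in examples.items():
    # fullmatch requires the entire value to match, mirroring the ^...$ anchors.
    matched = compiled.fullmatch(value) is not None
    print(f"{value!r:28} matched={matched} expected={expected}")
```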

Some help with this would be appreciated.

Thanks.
Neil

ChrisSmith7
Frequent Contributor

Neil,

I believe this may work for you:

^((\d{1,5};*)(\d{1,5};\s*\d{1,5})*)$

I tried it for the following scenarios:

12345                      OK
1234                       OK
123                        OK
12                         OK
1                          OK
123456                     BAD
12345; 54321; 1234; 123    OK
123456; 654321             BAD
12345;54321;               BAD

ChrisSmith7
Frequent Contributor

It's no good when it's just something like:

12345;

Trying to rework a bit...
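The trailing-semicolon case is easy to reproduce with Python's re (used here only as a stand-in engine, not Data Reviewer's): the ';*' after the first ID means zero or more semicolons, so the pattern accepts a value that ends in one or more semicolons, such as 12345;, which the requirements mark as bad.

```python
import re

# Chris's first suggestion, exactly as posted.
FIRST = r"^((\d{1,5};*)(\d{1,5};\s*\d{1,5})*)$"

# ';*' allows zero or more semicolons after the first ID, so a trailing
# separator (or several) is accepted when nothing else follows.
for value in ("12345", "12345;", "12345;;"):
    print(f"{value!r:10} matched={re.fullmatch(FIRST, value) is not None}")
```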

NeilWrobel
New Contributor II

But that's good, because 12345; would be incorrect: I don't want a trailing semicolon.

ChrisSmith7
Frequent Contributor

A bit inelegant, but this should cover all bases:

(^\d{1,9}$)|(^((\d{1,9};\s*\d{1,9})(\d{1,9};\s*\d{1,9})*)$)

Here's how I tested:

Matched:

1
12
123
1234
12345
123;12345
123; 12345
123;12345;12345
123; 12345; 12345
123; 12345;12345

Not matched:

123456
12345;
123;123456
123; 12345; 12345;
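Because each repeated unit in this pattern is 'digits ; digits', the engine has to split IDs between units to match a longer list. Running it through Python's re (again only a stand-in engine, not Data Reviewer's) shows that, as posted, the '\d{1,9}' quantifiers accept 123456 and 123;123456, both of which appear in the Not matched list. The hypothetical UPDATED_5 variant swaps in '{1,5}', an assumption about the intended ID length. With either quantifier, a list of three single-digit IDs such as 1;2;3 is rejected, because single digits cannot be split between units.

```python
import re

# Chris's updated suggestion, exactly as posted (note the {1,9} quantifiers).
UPDATED = r"(^\d{1,9}$)|(^((\d{1,9};\s*\d{1,9})(\d{1,9};\s*\d{1,9})*)$)"
# Hypothetical variant with {1,5}, assuming five digits was the intent.
UPDATED_5 = UPDATED.replace("{1,9}", "{1,5}")

for value in ("123456", "123;123456", "1;2;3", "12;34;56"):
    as_posted = re.fullmatch(UPDATED, value) is not None
    with_five = re.fullmatch(UPDATED_5, value) is not None
    print(f"{value!r:14} as_posted={as_posted} with_1_5={with_five}")
```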

NeilWrobel
New Contributor II

Thanks Chris, but the solution you provided is the one we tried in the first place, and it doesn't work.

Data Reviewer doesn't like something.

I have had three very experienced developers look at this and they don't know why it won't work.

ChrisSmith7
Frequent Contributor

You tried it with the updated suggestion?

(^\d{1,9}$)|(^((\d{1,9};\s*\d{1,9})(\d{1,9};\s*\d{1,9})*)$)

It's a pretty standard regex. If it doesn't work, there may be something specific to Data Reviewer's regex implementation that makes it behave differently.
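If Data Reviewer's engine does differ from the online testers, one low-cost experiment (a debugging idea only; nothing here is documented Data Reviewer behavior) is to rewrite Neil's pattern using only bracket expressions and literal characters, avoiding the '\d' and '\s' shorthands. The sketch checks with Python's re that the hypothetical BASIC rewrite agrees with the original on the thread's examples. The one intended difference is that ' ?' allows only a literal space after the semicolon, whereas '\s?' would also allow a tab.

```python
import re

# Hypothetical "basic constructs only" rewrite: [0-9] instead of \d,
# and a literal optional space instead of \s?.
BASIC = r"^[1-9][0-9]{0,4}(; ?[1-9][0-9]{0,4})*$"
# Neil's original pattern, for comparison.
POSTED = r"^([1-9](\d)?(\d)?(\d)?(\d)?)(;\s?([1-9](\d)?(\d)?(\d)?(\d)?))*$"

samples = [
    "12345", "1", "123456", "12345; 54321; 1234; 123",
    "123456; 654321", "12345;54321;",
    "12345;\t54321",  # tab after the semicolon: the only expected difference
]

for value in samples:
    basic = re.fullmatch(BASIC, value) is not None
    posted = re.fullmatch(POSTED, value) is not None
    print(f"{value!r:28} basic={basic} posted={posted}")
```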

NeilWrobel
New Contributor II

Yes. I tried the updated one and it doesn't work.

Not great!!

There is nothing particularly complicated with the data.

NeilWrobel
New Contributor II

Did you test it in Data Reviewer or in a regex tester?

ChrisSmith7
Frequent Contributor

Yep, a regex tester. I used Regex Hero:

Online .NET Regular Expression Tester and Reference

It's geared towards the .NET implementation of regex, but I didn't use anything too complicated; I've used similar expressions in Java in the past.
