Data Reviewer Regular Expression Check

10-11-2016 01:27 PM
by Anonymous User
Not applicable

Hello,

I am very new to the Data Reviewer extension, so I'm a little confused about how to format the expression parameters. I would like to use the extension to check a number of fields, but I'll start with one in particular that is somewhat troublesome in terms of how to structure the regular expression.

We have a field that holds the ID of the project the feature is associated with. The ID is structured as follows:

XX-YY-ZZZZZZZZ-ABC

"XX" and "YY" are always 2 characters each. 

"ABC" is always 3 characters.

"ZZZZZZZZ" can be anywhere from 4 to 8 characters long.

Is there a way to format the regular expression so that it returns anything that doesn't adhere to this structure?
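A minimal sketch of such a pattern, tested with Python's `re` module rather than inside Data Reviewer. It assumes any non-dash character is allowed in each part (the post doesn't say whether the parts are digits, letters or both). Which regex flavor the Data Reviewer Regular Expression check uses, and whether it flags values that match or values that fail the pattern, should be confirmed in its documentation.

```python
import re

# Assumed rule: 2 chars - 2 chars - 4 to 8 chars - 3 chars, where a "char"
# is anything except a dash (an assumption; tighten to \d or [A-Z0-9] if the
# real IDs are restricted). The ^ and $ anchors are kept so the same pattern
# string also works in tools that search rather than match the whole value.
ID_PATTERN = re.compile(r"^[^-]{2}-[^-]{2}-[^-]{4,8}-[^-]{3}$")

def nonconforming(values):
    """Return the values that do NOT fit the XX-YY-ZZZZ(ZZZZ)-ABC structure."""
    return [v for v in values if not ID_PATTERN.fullmatch(v)]

ids = ["12-12-12345-123", "12-12-123-123", "1-12-12345-123", "12-12-123456789-123"]
print(nonconforming(ids))
# ['12-12-123-123', '1-12-12345-123', '12-12-123456789-123']
```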

DanPatterson_Retired
MVP Emeritus
  • check for dashes: a.count("-") == 3? If not, bail
  • split the string on "-" and count the parts
  • get the length of each part (r), then compare the number of parts and the lengths of the first two and the last part
  • is it true? Bail otherwise

    >>> a = "XX-YY-ZZZZZZZZ-ABC"
    >>> r = [len(i) for i in a.split("-")]
    >>> tf = [len(r), *r[:2], r[-1]] == [4, 2, 2, 3]
    >>> tf
    True
    

Implement it in your environment if you can, using the above as a demo.

by Anonymous User
Not applicable

Hi Dan,


Thanks for the suggested code. Just to clarify, this is something I would implement in a script rather than within the Data Reviewer tool, right?

DanPatterson_Retired
MVP Emeritus

I was using scripting logic since your conditions are too convoluted for a regular expression, unless you can guarantee that there will always be 3 dashes (hence 4 bits) and that bits 1, 2 and 4 will be of size 2, 2 and 3. That gives the final logic test tf (true/false): the split string has 4 parts, the first 2 parts are of size 2 and the last one is of size 3, i.e. [4, 2, 2, 3]. If any of that fails, then one of the conditions isn't met.
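To see which of those conditions failed, rather than a single True/False, the same comparison can be broken apart. A sketch with a hypothetical helper, `id_problems`, that returns a list of messages (empty when the value passes the [4, 2, 2, 3] test):

```python
def id_problems(value):
    """List which of the [4, 2, 2, 3] conditions a value fails."""
    r = [len(part) for part in value.split("-")]
    problems = []
    if len(r) != 4:
        problems.append(f"expected 4 parts, got {len(r)}")
        return problems  # part positions are meaningless if the count is wrong
    for name, idx, size in (("first", 0, 2), ("second", 1, 2), ("last", 3, 3)):
        if r[idx] != size:
            problems.append(f"{name} part has {r[idx]} characters, expected {size}")
    return problems

print(id_problems("12-12-12345-123"))  # []
print(id_problems("123-12-12345-12"))
# ['first part has 3 characters, expected 2', 'last part has 2 characters, expected 3']
print(id_problems("12-12-12345"))      # ['expected 4 parts, got 3']
```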

by Anonymous User
Not applicable

Yeah, there will always be 4 bits, with the first two being 2 characters each and the last bit being 3 characters. The 3rd bit is the one that can differ, anywhere from 4 to 8 characters.

12-12-12345-123

12-12-123456-123

12-12-12345-123

12-12-12345678-123

That's basically what the list of IDs looks like (broken into character counts).

DanPatterson_Retired
MVP Emeritus

Then I don't follow... if the string can be split on "-" into 4 pieces, and the first 2 will always be 2 characters long and the last one 3, then there is nothing to check and you can decide what to do with the pieces. For example:

>>> a = "12-12-alsdjf-2016"
>>> bit = a.split("-")[2]
>>> bit
'alsdjf'

or 

>>> z = "-".join([a.split("-")[i] for i in [0, 1, -1]])
>>> z
'12-12-2016'
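Another way to write the split, offered as a sketch rather than as Dan's code: unpacking into four names raises a `ValueError` whenever the string does not split into exactly 4 pieces, so it doubles as the dash-count check. `split_id` is a hypothetical name.

```python
def split_id(value):
    """Return (first, second, middle, last), or None if not exactly 4 pieces."""
    try:
        first, second, middle, last = value.split("-")
    except ValueError:  # too many or too few values to unpack
        return None
    return first, second, middle, last

print(split_id("12-12-alsdjf-2016"))  # ('12', '12', 'alsdjf', '2016')
print(split_id("12-12-2016"))         # None
```

The joined string from the `z` example above is then just `"-".join((first, second, last))`.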
by Anonymous User
Not applicable

I should clarify that 12-12-12345678-123 is the format the data should be in, but the point of the check is to ensure that the data actually is in that format. I want to use the Data Reviewer extension to continuously monitor all newly added features and ensure that the inputs adhere to the input standard we have in place. Anything that doesn't adhere to it would be flagged and sent to the Data Reviewer log table.

There are instances where users have entered completely wrong input formats, and other times where users were careless during a load process and inadvertently mapped fields to the wrong target fields. I want those instances called to our attention, and the Data Reviewer extension seemed to be the solution.
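Outside Data Reviewer, the same idea can be sketched as a pass over (ObjectID, value) rows that collects the failures. In ArcGIS the rows could come from `arcpy.da.SearchCursor(fc, ["OID@", "PROJECT_ID"])`; the field name `PROJECT_ID`, the helper `flag_bad_ids` and the sample rows below are hypothetical, the pattern carries the same any-non-dash-character assumption as earlier, and writing results to the Reviewer table is not shown.

```python
import re

# Assumed rule: 2-2-(4 to 8)-3 characters, no extra dashes.
ID_PATTERN = re.compile(r"^[^-]{2}-[^-]{2}-[^-]{4,8}-[^-]{3}$")

def flag_bad_ids(rows):
    """Yield (oid, value, reason) for rows whose ID does not conform.

    rows: iterable of (oid, value) pairs, e.g. from a search cursor.
    """
    for oid, value in rows:
        if value is None or not str(value).strip():
            yield oid, value, "missing ID"
        elif not ID_PATTERN.fullmatch(str(value)):
            yield oid, value, "does not match XX-YY-ZZZZ(ZZZZ)-ABC"

# Hypothetical rows: a good ID, a null, and a value from a mis-mapped field.
rows = [(1, "12-12-12345-123"), (2, None), (3, "Main Street")]
for flagged in flag_bad_ids(rows):
    print(flagged)
```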

DanPatterson_Retired
MVP Emeritus

Then my original suggestions apply, since the 3rd term can be anything as long as it is something: 4 bits, of sizes [2, 2, anything, 3].
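The [2, 2, anything, 3] rule and the question's 4 to 8 rule differ only in the third part. Written as regular expressions (again assuming any non-dash character is allowed in each part), "anything as long as it is something" is `+` (one or more), while the question's rule is `{4,8}`. A short comparison sketch:

```python
import re

LOOSE = re.compile(r"^[^-]{2}-[^-]{2}-[^-]+-[^-]{3}$")       # [2, 2, anything, 3]
STRICT = re.compile(r"^[^-]{2}-[^-]{2}-[^-]{4,8}-[^-]{3}$")  # [2, 2, 4-8, 3]

for value in ["12-12-12345-123", "12-12-1-123", "12--12345-123"]:
    print(value, bool(LOOSE.fullmatch(value)), bool(STRICT.fullmatch(value)))
# 12-12-12345-123 True True
# 12-12-1-123 True False
# 12--12345-123 False False
```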
