Thursday, March 21, 2013

Pattern Matching in Excel/VBA/Regex

So posting here for posterity and in case I need to come back to this at a later date:

Originally posted here: http://stackoverflow.com/questions/15554132/pattern-match-count-in-excel-regex-vba

I was posed with a problem yesterday and could find no solution anywhere, I was about to post a help request here when I figured out a way to meet the needs of my customer. It isn't pretty, it isn't fast, but it works, customer is happy, and I've suggested we visit how they generate their data rather than indefinitely use this fix. Since I could find no solution anywhere, I will post it here in hopes it will help someone else out.
Office 2007 XLSX file containing over 5000 records like the below (a single cell with multiple rows of text). The issue, on adjacent cell, put count of incidents from the cell. Looking at the cell data for A1 you can see 3 incidents.

Cell A1

1/15/2013 1:30:11 AM Userx Had to reboot system  
1/15/2013 1:32:11 AM Userx System running finished rebooting and appears to be working  
11/15/2013 12:30:11 AM Userx System hung again

The problem is that I'm no regular expression guru and well the date value isn't consistent. The day month and hour can be single or double digit, but their always noted on a new line.
My code solution was to take the cell, split it on line break, trim everything 5 characters after the last ':' and evaluate the the result against my regular expression. After that, some basic tally and text insert to adjacent cell. I am sure this can be improved and that there maybe plugins/addons for Excel that would have made this a lot easier, but solution needed to use the built in formulas or vba of Excel
The below is an example of how the function would be called.
'calling function from another source:
...
thecount = CountOfDateValues(Range("a1").Value) 'get count
Range("b1").Value = thecount 'put count to adjacent cell
...
Below is the VBA code of a function that takes in a string value and returns count of matches to the regular expression. I hope it is of use for someone.
Function CountOfDateValues(thetext)

Dim data() As String 
Dim yourInput As String
yourInput = thetext 
Dim TheSplitter As String
TheSplitter = Chr(10) 'the character that represents a line break

data = Split(yourInput, TheSplitter ) ' creates an array of strings for each line in the cell
Dim re
Set re = CreateObject("VBscript.regexp")
'regular expression that matches ##/##/#### ##:##:## ##
re.Pattern = "(?=\d)^(?:(?!(?:10\D(?:0?[5-9]|1[0-4])\D(?:1582))|(?:0?9\D(?:0?[3-9]|1[0-3])\D(?:1752)))((?:0?[13578]|1[02])|(?:0?[469]|11)(?!\/31)(?!-31)(?!\.31)|(?:0?2(?=.?(?:(?:29.(?!000[04]|(?:(?:1[^0-6]|[2468][^048]|[3579][^26])00))(?:(?:(?:\d\d)(?:[02468][048]|[13579][26])(?!\x20BC))|(?:00(?:42|3[0369]|2[147]|1[258]|09)\x20BC))))))|(?:0?2(?=.(?:(?:\d\D)|(?:[01]\d)|(?:2[0-8])))))([-.\/])(0?[1-9]|[12]\d|3[01])\2(?!0000)((?=(?:00(?:4[0-5]|[0-3]?\d)\x20BC)|(?:\d{4}(?!\x20BC)))\d{4}(?:\x20BC)?)(?:$|(?=\x20\d)\x20))?((?:(?:0?[1-9]|1[012])(?::[0-5]\d){0,2}(?:\x20[aApP][mM]))|(?:[01]\d|2[0-3])(?::[0-5]\d){1,2})?$"
re.Global = True

Dim t As String
Dim theCount As Integer
theCount = 0
For i = LBound(data) To UBound(data) 'from first item in array to last item in array

        For Each Match In re.Execute(Left(data(i), InStrRev(data(i), ":") + 5))
            'from start of string to 5 characters past the last ':' of string
            theCount = theCount + 1
        Next
    Next

CountOfDateValues = theCount 

End Function
Referencing urls:
MS Access 2003 VBA String Split on Line Break
http://sourceforge.net/projects/regexbuilder/files/regexbuilder/1.4.0/
This tool made testing my regular expression against various date formats remarkably easy.
http://regexlib.com/DisplayPatterns.aspx?cattabindex=4&categoryid=5&p=2
I was able to save a lot of time crafting a regular expression by using a precrafted one from here. Sadly did not learn much by do1ing so, but I believe I saved a lot of time on this 'we need it done now' request.
*Note: There is a window for a false positives if someone starts their worklog note with a timetamp, I noted this to the customer and they were fine with it.

No comments:

Post a Comment