Tuesday, March 02, 2010

multiline matching

As in my previous post on adjacent nontrivial searches, I recently needed a search along the same lines, except that this time I need to find values that are the same on adjacent lines.

For instance, given the following:

1003065,PARKBT,99.00,0.00,0.00,,2009,1089.00
1003133,PARKBT,210.00,0.00,0.00,,2009,1470.00
1003234,PARKBT,180.00,0.00,0.00,,2009,1980.00
1003316,PARKAT,45.00,0.00,0.00,,2009,45.00
1003316,PARKBT,230.00,0.00,0.00,,2009,2705.00
1003360,PARKBT,210.00,0.00,0.00,,2009,1260.00
1003377,PARKBT,110.00,0.00,0.00,,2009,1210.00
1003404,PARKBT,180.00,0.00,0.00,,2009,1980.00


In this case I'm interested in finding matches where the number at the beginning of a line is repeated at the beginning of the next line (the lines starting with 1003316). This is the resulting search: /^\v(\d+),.*$\_.^\1,

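The trick here is to use \_., which matches any character including the end of line, to consume the line break, and then match the zero-width beginning of the next line with ^. The \1 back-reference then requires the number captured by (\d+) to appear again, followed by a comma, at the start of that next line.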
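
For comparison outside of Vim, here is a rough sketch of the same idea using Python's re module. This is not the Vim search itself, just an illustration of the capture-then-back-reference technique; the function name find_adjacent_duplicates and the SAMPLE constant are made up for this example, and \n stands in for Vim's \_. as the thing that crosses the line break.

```python
import re

# Sample data from the post.
SAMPLE = """\
1003065,PARKBT,99.00,0.00,0.00,,2009,1089.00
1003133,PARKBT,210.00,0.00,0.00,,2009,1470.00
1003234,PARKBT,180.00,0.00,0.00,,2009,1980.00
1003316,PARKAT,45.00,0.00,0.00,,2009,45.00
1003316,PARKBT,230.00,0.00,0.00,,2009,2705.00
1003360,PARKBT,210.00,0.00,0.00,,2009,1260.00
1003377,PARKBT,110.00,0.00,0.00,,2009,1210.00
1003404,PARKBT,180.00,0.00,0.00,,2009,1980.00
"""

# ^(\d+),   capture the leading number (re.MULTILINE lets ^ match at each line start)
# [^\n]*    the rest of the current line
# \n        the line break itself
# \1,       the same number, followed by a comma, at the start of the next line
ADJACENT_DUPLICATE = re.compile(r"^(\d+),[^\n]*\n\1,", re.MULTILINE)


def find_adjacent_duplicates(text):
    """Return the leading numbers that repeat on the immediately following line."""
    return [match.group(1) for match in ADJACENT_DUPLICATE.finditer(text)]


if __name__ == "__main__":
    print(find_adjacent_duplicates(SAMPLE))
```

The trailing comma after \1 matters in both versions: without it, a number such as 100 would also "match" a following line that starts with 1003.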