Friday, July 15, 2011

Using regular expressions on iOS

Recently I ran into a problem while trying to extract information from a web page. My first option was to tidy the HTML and use DOM traversal to obtain what I needed. The problem was the overhead: first I would need to clean up the HTML, then build the DOM tree, and only then extract the information. Of course I wouldn't write all that myself; I was planning to use Element Parser, but the amount of resources it would consume concerned me.

After analyzing the other available options, I thought about using regular expressions. I had worked with them before when I developed for Goby, and this looked like a good opportunity to use them again.

Before iOS 4.0 there was no dedicated regular expression class in the SDK, and the usual route was an external library. With iOS 4.0 Apple introduced the NSRegularExpression class, and with it a built-in way to use regular expressions in your iOS apps.

If you are not sure what regular expressions are, or you just want to know a little more about them, check http://www.regular-expressions.info/. That site covers almost everything you need to know about regexes.

Creating a regular expression

Let's suppose our test text is the following:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Now, if we wanted to find all words that start with an "L" (case insensitive), we would use something like:

(?si)((?<!\w)L\w+)


How do we do this in our app? To start working with regular expressions you need to create an NSRegularExpression instance.

NSRegularExpression *regex = [NSRegularExpression 
               regularExpressionWithPattern:@"((?<!\\w)L\\w+)" 
               options:NSRegularExpressionCaseInsensitive|NSRegularExpressionDotMatchesLineSeparators 
               error:nil];

You will note a couple of things here:


  • \w was changed to \\w. The pattern is written as an Objective-C string literal, where the backslash is itself an escape character, so every backslash in the regex has to be doubled
  • No more (?si) at the beginning of the regex. Those flags are passed through the options parameter when creating the NSRegularExpression object
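
The creation call above passes nil for the error parameter, so a malformed pattern would go unnoticed. Here is a minimal sketch of the same call with the error checked. LWordRegex is a hypothetical helper name used only for this illustration; it is not part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): compiles the pattern used in
// this post and logs the reason if the pattern cannot be compiled.
static NSRegularExpression *LWordRegex(void) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"((?<!\\w)L\\w+)"
                             options:NSRegularExpressionCaseInsensitive |
                                     NSRegularExpressionDotMatchesLineSeparators
                               error:&error];
    if (regex == nil) {
        // A nil result means the pattern was rejected; error describes why.
        NSLog(@"Could not compile pattern: %@", error);
    }
    return regex;
}
```

Checking the return value for nil, rather than inspecting the error object first, follows the usual Cocoa convention for methods that take an NSError ** parameter.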

Once you have your NSRegularExpression instance you have two options:


  • You can get the first match, which in our case would be "Lorem"
  • You can get an array of matches, which would give you every match in the text
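
The first option can be sketched with firstMatchInString:options:range:, which returns a single NSTextCheckingResult, or nil when nothing matches. FirstLWord is a hypothetical helper name chosen for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first word that starts with "L" or "l",
// or nil when the text contains no such word.
static NSString *FirstLWord(NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"((?<!\\w)L\\w+)"
                             options:NSRegularExpressionCaseInsensitive
                               error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:text
                          options:0
                            range:NSMakeRange(0, [text length])];
    if (match == nil) {
        return nil; // nothing matched
    }
    // Index 1 is the first capture group of the pattern.
    return [text substringWithRange:[match rangeAtIndex:1]];
}
```

For example, FirstLWord(@"Lorem ipsum dolor") returns @"Lorem", and FirstLWord(@"sed do eiusmod") returns nil.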
Let's get the full array of matches:


NSString *testString = @"Lorem ..."; // The full test string

NSArray *matches = [regex matchesInString:testString 
               options:0 range:NSMakeRange(0, [testString length])];
Now we have an array of NSTextCheckingResult objects, and we can iterate over it with a for loop. Bear in mind that capture group indexes start at 1; index 0 is the range of the whole text that matched the regex.


for (NSTextCheckingResult *textMatch in matches) {
  // NSRange is a plain C struct, so it is held by value
  NSRange textMatchRange = [textMatch rangeAtIndex:1];
  NSString *theLWord = [testString substringWithRange:textMatchRange];

  // Do your processing here

}
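
Putting the steps together, here is a minimal sketch that wraps the creation, matching and iteration into one function. LWordsInString is a hypothetical helper name, and the short sample sentence in the usage note is an assumption made to keep the example small.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects every word that starts with "L" or "l".
static NSArray *LWordsInString(NSString *text) {
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"((?<!\\w)L\\w+)"
                             options:NSRegularExpressionCaseInsensitive
                               error:NULL];
    NSArray *matches = [regex matchesInString:text
                                      options:0
                                        range:NSMakeRange(0, [text length])];
    NSMutableArray *words = [NSMutableArray array];
    for (NSTextCheckingResult *match in matches) {
        // Index 0 is the whole match; index 1 is the first capture group.
        NSRange wordRange = [match rangeAtIndex:1];
        [words addObject:[text substringWithRange:wordRange]];
    }
    return words;
}
```

Calling LWordsInString(@"Lorem ipsum ut labore et dolore") returns an array with "Lorem" and "labore". "dolore" is skipped because its "l" is preceded by a word character, which the (?<!\w) lookbehind rejects.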