Example: Unique Names using Regular Expressions

Never underestimate the power of regular expressions.  We have a table with a text field that contains a list of names, separated by commas.  Some of the names are repeated.   We would like to transform the list of names into a similar list, but without any repetitions of names.    This topic shows how with an insightful use of a regular expression that was contributed by Manifold expert Chris Sloots to the georeference.org forum as a solution to a real-life task.  

 

It shows the power and speed of a concise regular expression, and the flexibility with which regular expressions can be used in SQL queries as well as in the Transform pane.

 

We have a table, called Table,  with a text field called names that contains a list of names, separated by commas.  Some of the names are repeated.   

 

 

We would like to transform the list of names into a similar list, but without any repetitions of names.    A list like

 

Marty, Joe, Michael, Marty, Jean, Joe, Jean, Jane

 

without duplicates could be

 

Michael, Marty, Joe, Jean, Jane

 

That is easy to do with the regular expression below, which uses capture groups to match any repeated names:

 

\b(\w+),\s(?=.*\b\\1,?\s?)

 

We use that regular expression within an SQL function, the StringRegexpReplace function.  The syntax of the function is:

 

StringRegexpReplace(<string>, <regexp>, <target>, <flags>) : <string>

 

Given a string, a  regular expression pattern, a target replacement string and a flag to use case or to not use case, returns a string where all substrings of characters that match the regular expression have been replaced by the target replacement string.  If no matches are found returns the original string.  The <flag> is not optional and must be either 'i' to ignore case or 'c' to use case.  

 

To use the regular expression within the function we write the following SQL expression:

 

StringRegexpReplace([names], '\\b(\\w+),\\s(?=.*\\b\\1,?\\s?)', '', 'i')

 

Any substring that matches (case insensitive) the regular expression, that is, any repeated name, will be replaced by an empty string, that is, deleted.   Since the backslash \ character is a special character in SQL, we escape it by using two backslash \\ characters, so the regular expression we want to use will be passed correctly as an argument to the SQL function.

 

We can use the above expression either within the Transform pane or within an SQL query written in a Command Window.   

Transform Pane

With the focus on the open Table window,  in the Transform pane we choose the Name field, and then we choose the Expression template.

 

 

In the Expression template we press the Edit Expression button and we enter

 

StringRegexpReplace([names], '\\b(\\w+),\\s(?=.*\\b\\1,?\\s?)', '', 'i')

 

For the Result destination we chose New Field and enter unique names as the name for the new field.   

 

Press Preview to see a preview of what the expression will do.

 

 

The template will create a new field and will populate it with a list from the names field, but without any duplicates.   

 

Press Transform to create a new unique names field and fill it with the results

Command Window / Query

We can also achieve our task using a simple query in the Command Window.  

 

 

We use the Edit - Schema dialog to add a text field called unique names to the table.  We could do that in SQL as well, but to keep the example simple we use the Schema dialog.

 

 

We launch a Command Window for SQL and enter the following query:

 

UPDATE [Table] SET [unique names] =  

   StringRegexpReplace([names], '\\b(\\w+),\\s(?=.*\\b\\1,?\\s?)', '', 'i');

 

Press ! in the main toolbar to run the query.

 

 

Immediately, the unique names field is filled with lists that repeat the list in the names field but with only unique names.

 

See Also

Transform Pane

 

Queries

 

Command Window

 

SQL Statements

 

Regular Expressions

 

Edit - Schema

 

String SQL Functions

 

Example: Edit Tables with the Transform Pane - This topic shows how fast and easy it is to edit tables using the Transform pane.  Repetitive edits applying similar changes to different fields can be done rapidly with highly efficient workflow.  Best of all, on the fly previews make it easy to use sophisticated editing techniques like regular expressions without fear of getting something wrong.