Never underestimate the power of regular expressions. We have a table with a text field that contains a list of names, separated by commas. Some of the names are repeated. We would like to transform the list of names into a similar list, but without any repetitions of names. This topic shows how with an insightful use of a regular expression that was contributed by Manifold expert Chris Sloots to the georeference.org forum as a solution to a real-life task.
It shows the power and speed of a concise regular expression, and the flexibility with which regular expressions can be used in SQL queries as well as in the Transform pane.
We have a table, called Table, with a text field called names that contains a list of names, separated by commas. Some of the names are repeated.
We would like to transform the list of names into a similar list, but without any repetitions of names. A list like
Marty, Joe, Michael, Marty, Jean, Joe, Jean, Jane
without duplicates could be
Michael, Marty, Joe, Jean, Jane
That is easy to do with the regular expression below, which uses capture groups to match any repeated names:
\b(\w+),\s(?=.*\b\\1,?\s?)
We use that regular expression within an SQL function, the StringRegexpReplace function. The syntax of the function is:
StringRegexpReplace(<string>, <regexp>, <target>, <flags>) : <string>
Given a string, a regular expression pattern, a target replacement string and a flag to use case or to not use case, returns a string where all substrings of characters that match the regular expression have been replaced by the target replacement string. If no matches are found returns the original string. The <flag> is not optional and must be either 'i' to ignore case or 'c' to use case.
To use the regular expression within the function we write the following SQL expression:
StringRegexpReplace([names], '\\b(\\w+),\\s(?=.*\\b\\1,?\\s?)', '', 'i')
Any substring that matches (case insensitive) the regular expression, that is, any repeated name, will be replaced by an empty string, that is, deleted. Since the backslash \ character is a special character in SQL, we escape it by using two backslash \\ characters, so the regular expression we want to use will be passed correctly as an argument to the SQL function.
We can use the above expression either within the Transform pane or within an SQL query written in a Command Window.
With the focus on the open Table window, in the Transform pane we choose the Name field, and then we choose the Expression template.
In the Expression template we press the Edit Expression button and we enter
StringRegexpReplace([names], '\\b(\\w+),\\s(?=.*\\b\\1,?\\s?)', '', 'i')
For the Result destination we chose New Field and enter unique names as the name for the new field.
Press Preview to see a preview of what the expression will do.
The template will create a new field and will populate it with a list from the names field, but without any duplicates.
Press Transform to create a new unique names field and fill it with the results
We can also achieve our task using a simple query in the Command Window.
We use the Edit - Schema dialog to add a text field called unique names to the table. We could do that in SQL as well, but to keep the example simple we use the Schema dialog.
We launch a Command Window for SQL and enter the following query:
UPDATE [Table] SET [unique names] =
StringRegexpReplace([names], '\\b(\\w+),\\s(?=.*\\b\\1,?\\s?)', '', 'i');
Press ! in the main toolbar to run the query.
Immediately, the unique names field is filled with lists that repeat the list in the names field but with only unique names.
Example: Edit Tables with the Transform Pane - This topic shows how fast and easy it is to edit tables using the Transform pane. Repetitive edits applying similar changes to different fields can be done rapidly with highly efficient workflow. Best of all, on the fly previews make it easy to use sophisticated editing techniques like regular expressions without fear of getting something wrong.