This topic shows how fast and easy it is to edit tables using the Transform pane. Repetitive edits applying similar changes to different fields can be done rapidly with highly efficient workflow. Best of all, on the fly previews make it easy to use sophisticated editing techniques like regular expressions without fear of getting something wrong. This topic is the user manual version of the 10 Minute Tutorial - Fast Table Editing with Transform video.
Our task is to take a table with poor organization of data and to restructure the table into better form.
We start with a table scraped from a NASA web site. The table lists solar eclipses from 2021 to 2040. The main problem with the table is that for six of the fields the HTML for the table incorporates not one but two very different data sets into the same field. For example, the Calendar Date field gives the date of the eclipse, but the link within that field is a link to an overview page that provides a global map along with tables of various eclipse parameters. The TD field gives the time of the eclipse, but the time is linked to a URL that shows an animation of the eclipse shadow traversing the Earth.
When we scrape the table into a CSV file (using an online web tool), we can see how the Date field contains two different types of data, a URL and a date, both jammed together into the same text field. As people with good database skills know, that's a really terrible way to organize data. A similar problem exists for the other five fields, Time, Type, Class, Saros, and Gamma, that had links on them.
When we import that table into Manifold, for those fields that jam different data together into one field, we will split out the two different types of data into their own fields.
We begin by importing the CSV file containing the table into Manifold. To do that, we simply drag the CSV file from Windows Explorer and drop it into the Project pane in Manifold. That creates a new table in the Project pane. Double-click the table to open it.
The table appears with gray background color because, not having a key, identity field and index, it is read-only. We can make it read-write by adding an identity field and index.
With the focus on the opened table, choose Edit - Schema to launch the Schema dialog.
In the Schema dialog's toolbar, click the Add Identity button.
That automatically adds the standard Manifold identity field, mfd_id, and an index on that field. Press Save Changes.
The table's background now switches to white, to show it is read-write. In the illustration above we have also taken a moment to rename the table, and we have widened the first column so we can see the complete contents of the Date field.
Our strategy for splitting the data in the Date field into two fields is simple. We will first make a copy of the field in a new field in the table. That will give us two fields that both have the two types of data jammed into them. In one of the fields we will remove the date info, leaving just the URL, and in the other field we will remove the URL, leaving just the date. The end result will be to have two fields, one which has just the URL data and the other which has just the date data.
We begin by making a copy of the Date field into a new field in the table.
With the focus on the table window, in the Transform pane we choose the Date field. We click the Copy template to launch that template in the Transform pane.
In the Copy template we choose text as the Use method, and in the Result box we enter the name URL_Overview for the new field to be created, using the nvarchar type.
This is such a simple thing we do not bother seeing a preview. Press Transform.
A new field of type nvarchar appears in the table, named URL_Overview. It is populated with a copy of the data from the Date field. The Copy transform template has with one click created a new field and has also copied the contents of the Date field into the new field.
We will now copy the data from the Time field into a new field in the table.
The Transform pane is still set to the Copy template. In the Field box we choose the Time field, and for the Result we enter URL_Animation as the name of the new field. That takes just a few seconds.
That creates a new field, called URL_Animation in the table, and it populates it with data copied from the Time field.
We continue in this way quickly changing the Field box to Type, and then Class, and then Saros, and then Gamma, in each case specifying a name for the Result and pressing Transform.
In a matter of seconds we can create six new fields in the table, as seen in the illustration above, that are copies of the Date, Time, Type, Class, Saros, and Gamma fields, with names that indicate what URL data they will end up containing.
We can use the Layers pane to adjust the widths of the fields, and their display order in the table window, to suit our taste.
Our task now is to delete the non-URL portion of the various URL fields we have just created and populated.
We begin with the URL_Overview field, as shown above. We will delete the date portion of the text data in that field, leaving only the URL part of the data.
To delete the date part, we will use a regular expression that picks out the date part and we will replace it with nothing. That is the same as deleting it.
With the focus on the table window, we press the Up button to go up from the Copy template to the main Transform pane menus. We choose URL_Overview as the field and then in the list of templates we click on Replace to launch that template in the Transform pane.
In the Replace template we choose regular expression as the method in the Replace box. In the Search for box we enter the regular expression \s.+$ and the Replace with box we leave empty. The Result we will put into the same field, that is, editing the field in place.
The regular expression looks cryptic if we have not yet learned about regular expressions, but it is a very simple expression. \s matches a space character. The dot . character matches any character, and the dot character followed by a plus + sign means any sequence of one or more characters. The dollar $ character matches the end of the field. The regular expression we have written therefore matches any sequence of characters starting with a space character up through the end of the field.
That is exactly the part of the field we want to eliminate, that is everything from the first space character onward (the URL string itself has no space characters).
We can press the Preview button to see what the Replace operation using the regular expression will do.
Previews are shown in blue preview color. We can see the result of the operation will be to eliminate the date portion of the text that is in the field, leaving only the URL portion of the field.
When learning regular expressions, it is a good idea to use Preview to avoid surprises. Now that we know the expression will do exactly what we want, we can with confidence press the Transform button to apply the transform.
The result is that the URL_Overview field is edited in place to remove the date portion of the field, leaving only the URL portion.
We now will apply exactly the same regular expression in the Replace template to the other URL fields. What they all have in common is that they begin with a URL portion, followed by one or more space characters, followed by some other data. So getting rid of everything after the first space character encountered will leave only the URL portion of the field.
The Transform pane is still loaded with the Replace template and all parameters that we used. In the Field box we choose the URL_Animation field and all the other parameters remain the same.
We press Transform to apply the transform to the URL_Animation field. That edits the URL_Animation field to eliminate everything except the URL portion of the field.
It takes just a few seconds to in turn choose the various other URL fields in the Field box and to press Transform to edit each of them.
The result is that all of the URL fields are edited to remove everything other than the URL portion of the field. We can see how that worked by looking at the last of the URL fields, the URL_Besselian field, to see that we have removed everything but the URL portion of the field.
So far, so good. Now we will go back to the original fields and remove the URL portions of those fields.
We begin with the Date field, as seen above. In this case, instead of removing everything from the first space character to the end, we want to remove everything from the beginning to the first space character. In fact, with the Date field, we want to remove everything from the beginning to the first sequence of two space characters, since this particular field has two space characters between the URL part of the text and the date part of the text.
The Transform pane is still set to the Replace template, which is the template we will use. We change the Field box to the Date field. We will change the regular expression in the Search for box to ^.+\s\s while leaving all other settings as they were.
The regular expression is also very simple. The caret ^ character matches the beginning of the field. The dot . character and plus + character combination as before matches any sequence of zero or more characters, and the \s sequence matches a space character. So the regular expression will match any sequence of characters from the beginning of the field to the first occurrence of two space characters. That is exactly the URL sequence we want to match, and to replace with nothing.
We press Preview to see what will happen.
The preview shows the result is exactly what we want. The field will be edited to remove everything except the date portion of the text.
We press Transform. Instantly the Date field is modified in place.
We will now apply the same template to the other fields where we want to get rid of the URL portion of the field.
The next such field is the Time field. In the Replace template we choose Time in the Field box. We edit the Search for regular expression to remove the last \s sequence, since in the Time field and other fields everything after the first space character encountered is to be removed.
We press Preview.
The preview shows how the URL portion of the text will be removed, leaving only the time portion.
That edits the Time field in place to remove the URL portion, as seen above, with the field width reduced to a sensible size.
In just a few seconds, we can edit the other fields where we want to remove the URL portion. We choose the field in the Field box and then press Transform.
For example, we can choose the Type field and press Transform. In turn we choose the Class, Saros, and Gamma fields and press Transform. That ends removing the URL portion of each of those fields.
Our final result is seen above. We now have a table where each field contains only the data relevant to that field, without fields that have two different kinds of data jammed together into the same field. The table is now far more useful for downstream workflow.
10 Minute Tutorial - Fast Table Editing with Transform - See how fast and easy it is to edit tables using the Transform pane. Repetitive edits applying similar changes to different fields can be done fast - really fast! - with highly efficient workflow. Best of all, on the fly previews make it easy to use sophisticated editing techniques like regular expressions without fear of getting something wrong. The video shows a real life example of restructuring a table scraped from a NASA website that shows total and annular eclipses of the Sun from 2021 to 2100, converting the table from awkward form as published by NASA into more usable form.
Edit - Schema