Join Example: Converting between Different Data Types

For ease of use, Manifold can automatically convert data types when possible, for example, from floating point numbers to integers.   When using the Add button in the Edit - Join dialog to add new fields to an original table, there is no need for conversion since the data type of the newly-added field will be the same as the data type as the field from the joined table that was chosen to populate the new field.

 

However, when using the Join dialog to load data from the joined table into existing fields in the original table, the data types may be different, but close enough that automatic conversion can occur.  In such cases we should be aware that automatic conversions from floating point numbers to integers always round down.   We can alter that to a mix of rounding up and down with a simple edit of the saved upgrade query.

 

To follow along with table examples in this topic, download from the Examples page on the Manifold website the books.mxb project.  That project contains the publishers and titles tables used in examples.

 

 

Consider a version of our publishers table that has a field called average price, using data type int32.    A tooltip shows the data type for a field in the table when we hover the mouse cursor above the field's column header.

 

We will use the Join dialog to populate that existing field with the average of the price field in the titles table.

 

 

The price field in the titles table is a float64, which Manifold will automatically convert into an int32 during the join operation.

 

With the focus on the open publishers table window, we choose Edit - Join in the main menu.

 

 

We choose the titles table as the joined table, and we double-click into the joined field cell for the average price existing field to choose price as the joined field that will be used to populate the average price existing field.

 

The dialog automatically loads the join method cell with convert as the method, because it knows the floating point numbers in the price field in the titles table must be converted into integer numbers for use in the average price field in the publishers table.   The presence of convert also warns us the data types are different (otherwise copy would have appeared as the method) between the joined table and the original table.

 

We double-click the convert cell to change the method to average.

 

 

Press Join.

 

 

The publishers table is instantly populated with the average of prices for titles carried by each publisher.   However, if we look closely at the average price value for publisher P02, indicated with a magenta arrow, we might see something unexpected.

 

Taking a look at the titles table we see that publisher P02 has only one title, with a price of 39.95.   However, the average price converted to an integer is only 39, and not 40 as we might expect.

 

 

We can see what is going on by taking a look at the SQL query used.   To do that, we make ready to repeat the Join operation to update the publishers table.

 

With the focus on the publishers table window, we launch the Edit - Join dialog again.

 

 

We set the dialog up as before.   We then click the Setup Join and Edit Query button to see the SQL query Manifold will use to do the join.   That will allow us to run the query as is, or to modify it if we would like it to operate differently.   It will also allow us to save the query for future use, without having to set up the Join dialog again.

 

 

Pressing the Setup Join and Edit Query button launches the Command Window with the query used, an UPDATE query.    The text of the query is:

 

-- $manifold$

--

-- Auto-generated

-- Join

--

 

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[average price] AS t0,

    s.sjoinkey, s.s0

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      Avg([price]) AS s0

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0;

 

Prices are averaged in the above query in the line:

 

      Avg([price]) AS s0

 

Given a single floating point number of 39.95  the result of the Avg aggregate function will be 39.95.   However, automatic conversion from a floating point number to an integer always rounds down, so the conversion to an integer results in a value of 39.

 

We can change that line to round the average both up and down, using the Round function:

 

      Round(Avg([price])) AS s0

 

The Round function will take the 39.95 value that is the result of Avg and will round it up to 40.   Our new update query, as modified, will be:

 

-- $manifold$

--

-- Auto-generated

-- Join

--

 

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[average price] AS t0,

    s.sjoinkey, s.s0

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      Round(Avg([price])) AS s0

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0;

 

We run that query in the Command Window by pressing the ! Run button in the main toolbar.  

 

When we run the query we update the publishers table with more expected values based on both rounding up and down:

 

 

Making small changes to update queries allows us to take advantage of automatic conversion between types, while controlling how that conversion works to match our preferences when using aggregate methods.

 

Notes

Terminology - The original table also may be called the target table, and the table providing data also may be called the source table.

 

Quick reference - With the Join dialog open, press F1 to open a quick reference guide in a browser window.

 

Guessing the key field - When we launch the dialog with an original field and when we choose a joined field, the dialog will try to guess what fields we might want to use as key fields and will load those first by default.  If it guesses wrong, we can choose the field we want.   For the original table, the dialog tries to use a field (other than mfd_id) that has a BTREE / BTREENULL index, with a data type preference for numbers, then text, then everything else, and a name preference for field names with id in them, such as ... id or ..._id, then ...id and then everything else. For the joined table, the dialog uses similar logic, but first tries to use a field with the same name as in the original table.

 

Added fields are static - Added fields are static, that is, if data in the joined table changes the fields in the original table that received data in the Join operation from the joined table will not automatically be updated.  We can easily update the table any time we want using a saved update query.  

 

Spatial joins are automatically parallel - Joining data between drawings uses parallel GeomOverlay function variants.  Joins from images to drawings are parallelized through a nested SELECT using THREADS.

 

Buy and read a Fehily book on SQL - Manifold recommends Chris Fehily's excellent books on SQL.  Chris makes learning SQL really easy.  To encourage users to buy a Fehily book and to read it, Manifold examples often use examples similar to those in the Fehily SQL books.

 

Example data - To follow along with table examples in this topic, download from the Examples page on the Manifold website the books.mxb project.  That project contains the publishers and titles tables used in examples.

 

Videos

Join Videos

 

Find Percentages of Open Space in ZIP Code Area - Find the percentage of open space in each ZIP code area given a layer of polygons representing ZIP codes and a layer of polygons showing open spaces like parks and green spaces. This video shows how to do that start to finish in a few simple steps, from initial importing of shape files to final results, in just five minutes, with an additional few minutes of explanation what each step does. Works in Manifold Release 9 or using the free Manifold Viewer.

 

See Also

Tables

 

Queries

 

Editing and Combining Data

 

Join

 

Command Window

 

JOIN Statements

 

Editable Results Tables

 

Example: Use Edit - Join to Map a Pandemic - Creating a data source for a CSV web server, we acquire current data on cases and deaths worldwide for the Covid-19 pandemic.  We then use the point and click Edit - Join dialog to rapidly join that data, automatically aggregated as we desire, into a world map for visualization.

 

Example: Create a Map Showing OSM Use by Country - A start-to-finish real life example of map creation that combines various Manifold capabilities, including use of Edit - Join.  Copying a table of numbers from a web site, we create a map that is thematically colored to show usage of OpenStreetMap by country in proportion to the population of that country.

 

Example: Trace Vector Areas from Raster Pixels - This example follows the Trace Vector Areas from Raster Pixels video on the Gallery page.   We use the Trace Areas template in the Transform pane for images to create a drawing with vector areas covering regions of similarly-colored pixels.  Next, we use a simple query to add classification codes from a USGS table of classes to the resulting drawing, using a simple INNER JOIN SQL statement.

 

Example: Import E00 and Rearrange - An intricate example showing how to use Manifold tools to adapt legacy data imported from E00 format into useful, modern form.