Join Example: Converting between Different Data Types

For ease of use, Manifold can automatically convert data types when possible, for example, from floating point numbers to integers.   When using the Add button in the Edit - Join dialog to add new fields to an original table, there is no need for conversion since the data type of the newly-added field will be the same as the data type as the field from the joined table that was chosen to populate the new field.

 

Pending updates: The former Save update query check box in the Join dialog has been replaced with a new Setup Join and Edit Query button, which provides slightly different, but more convenient, operation.  The Join Component button has been renamed Join.  Illustrations and commentary in this topic will be updated soon.

 

However, when using the Join dialog to load data from the joined table into existing fields in the original table, the data types may be different, but close enough that automatic conversion can occur.  In such cases we should be aware that automatic conversions from floating point numbers to integers always round down.   We can alter that to a mix of rounding up and down with a simple edit of the saved upgrade query.

 

To follow along with table examples in this topic, download from the Examples page on the Manifold website the books.mxb project.  That project contains the publishers and titles tables used in examples.

 

 

Consider a version of our publishers table that has a field called average price, using data type int32.    A tooltip shows the data type for a field in the table when we hover the mouse cursor above the field's column header.

 

We will use the Join dialog to populate that existing field with the average of the price field in the titles table.

 

 

The price field in the titles table is a float64, which Manifold will automatically convert into an int32 during the join operation.

 

With the focus on the open publishers table window, we choose Edit - Join in the main menu.

 

 

We choose the titles table as the joined table, and we double-click into the joined field cell for the average price existing field to choose price as the joined field that will be used to populate the average price existing field.

 

The dialog automatically loads the join method cell with convert as the method, because it knows the floating point numbers in the price field in the titles table must be converted into integer numbers for use in the average price field in the publishers table.   The presence of convert also warns us the data types are different (otherwise copy would have appeared as the method) between the joined table and the original table.

 

We double-click the convert cell to change the method to average.

 

 

We check the Save update query box and then press Join Component.

 

 

The publishers table is instantly populated with the average of prices for titles carried by each publisher.   However, if we look closely at the average price value for publisher P02, indicated with a magenta arrow, we might see something unexpected.

 

Taking a look at the titles table we see that publisher P02 has only one title, with a price of 39.95.   However, the average price converted to an integer is only 39, and not 40 as we might expect.

 

 

In the Project pane we can double-click the publishers Update query that we saved to see how it works:

 

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[average price] AS t0,

    s.sjoinkey, s.s0

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      Avg([price]) AS s0

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0;

 

Prices are averaged in the above query in the line:

 

      Avg([price]) AS s0

 

Given a single floating point number of 39.95  the result of the Avg aggregate function will be 39.95.   However, automatic conversion from a floating point number to an integer always rounds down, so the conversion to an integer results in a value of 39.

 

We can change that line to round the average both up and down, using the Round function:

 

      Round(Avg([price])) AS s0

 

The Round function will take the 39.95 value that is the result of Avg and will round it up to 40.   Our new update query, as modified, will be:

 

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[average price] AS t0,

    s.sjoinkey, s.s0

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      Round(Avg([price])) AS s0

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0;

 

When we run that query we update the publishers table with more expected values based on both rounding up and down:

 

 

Making small changes to update queries allows us to take advantage of automatic conversion between types, while controlling how that conversion works to match our preferences when using aggregate methods.

 

Notes

Terminology - The original table also may be called the target table, and the table providing data also may be called the source table.

 

Quick reference - With the Join dialog open, press F1 to open a quick reference guide in a browser window.

 

Guessing the key field - When we launch the dialog with an original field and when we choose a joined field, the dialog will try to guess what fields we might want to use as key fields and will load those first by default.  If it guesses wrong, we can choose the field we want.   For the original table, the dialog tries to use a field (other than mfd_id) that has a BTREE / BTREENULL index, with a data type preference for numbers, then text, then everything else, and a name preference for field names with id in them, such as ... id or ..._id, then ...id and then everything else. For the joined table, the dialog uses similar logic, but first tries to use a field with the same name as in the original table.

 

Added fields are static - Added fields are static, that is, if data in the joined table changes the fields in the original table that received data in the Join operation from the joined table will not automatically be updated.  We can easily update the table any time we want using a saved update query.  

 

Spatial joins are automatically parallel - Joining data between drawings uses parallel GeomOverlay function variants.  Joins from images to drawings are parallelized through a nested SELECT using THREADS.

 

Why only in the same data source - The Join dialog works only with tables and queries within the same data source to keep the dialog simple and fast.   An important part of keeping the dialog simple is to show only those tables or queries in the pull down list for the joined table that can be used in a join.   Scanning all tables and queries in the same data source is reasonable, but if there are many data sources or many levels of nesting (data sources within data sources) there could be hundreds of thousands, if not millions, of tables and queries within the hierarchy to scan.  For people working in map projects without such elaborate data sources, most of the time what they do will be within the .map project data source so everything works by default.  People who link in data sources can quickly add a simple aliasing query, as illustrated in this topic, to use tables and queries from other data sources.   This method also makes it possible for the same Join dialog to work within other data sources, when aliasing queries refer back either to the .map project data source or to other data sources.

 

Buy and read a Fehily book on SQL - Manifold recommends Chris Fehily's excellent books on SQL.  Chris makes learning SQL really easy.  To encourage Manifold readers to get a Fehily book and to read it, Manifold examples are often integrated with examples in the Fehily SQL books, such as the tables used in this topic.    

 

Example data - To follow along with table examples in this topic, download from the Examples page on the Manifold website the books.mxb project.  That project contains the publishers and titles tables used in examples.

 

Videos

Join Videos

 

Find Percentages of Open Space in ZIP Code Area - Find the percentage of open space in each ZIP code area given a layer of polygons representing ZIP codes and a layer of polygons showing open spaces like parks and green spaces. This video shows how to do that start to finish in a few simple steps, from initial importing of shape files to final results, in just five minutes, with an additional few minutes of explanation what each step does. Works in Manifold Release 9 or using the free Manifold Viewer.

 

See Also

Tables

 

Queries

 

Editing and Combining Data

 

Join

 

Command Window

 

JOIN Statements

 

Editable Results Tables

 

Example: Use Edit - Join to Map a Pandemic - Creating a data source for a CSV web server, we acquire current data on cases and deaths worldwide for the Covid-19 pandemic.  We then use the point and click Edit - Join dialog to rapidly join that data, automatically aggregated as we desire, into a world map for visualization.

 

Example: Create a Map Showing OSM Use by Country - A start-to-finish real life example of map creation that combines various Manifold capabilities, including use of Edit - Join.  Copying a table of numbers from a web site, we create a map that is thematically colored to show usage of OpenStreetMap by country in proportion to the population of that country.

 

Example: Trace Vector Areas from Raster Pixels - This example follows the Trace Vector Areas from Raster Pixels video on the Gallery page.   We use the Trace Areas template in the Transform pane for images to create a drawing with vector areas covering regions of similarly-colored pixels.  Next, we use a simple query to add classification codes from a USGS table of classes to the resulting drawing, using a simple INNER JOIN SQL statement.

 

Example: Import E00 and Rearrange - An intricate example showing how to use Manifold tools to adapt legacy data imported from E00 format into useful, modern form.