Join Example: Join Data from Many Records into One Record

A table to table join:  We use the Edit - Join dialog to take data from many records and add that into a single record, as a list, as an average and as a maximum value from the many records.

 

Pending updates: The former Save update query check box in the Join dialog has been replaced with a new Setup Join and Edit Query button, which provides slightly different, but more convenient, operation.  The Join Component button has been renamed Join.  Illustrations and commentary in this topic will be updated soon.

 

To follow along with table examples in this topic, download from the Examples page on the Manifold website the books.mxb project.  That project contains the publishers and titles tables used in examples.

 

One of the most powerful aspects of Joins is how they can look up data from multiple records and then combine that, using transfer methods such as averaging, into a single record.  For example, if a publisher has many book titles and each such book has a record of sales, the publisher may be interested in finding the average sales considering all of the books.   That is called a many to one join, easy to do in the Join dialog.

 

See the video version of this example in the Join Dialog Part 2 - Joins between Tables video.  

 

 

For this example our original table will be the publishers table.   Using the titles table as the joined table, we will take data from the titles table to populate new fields in the publishers table that list all the titles for each publisher, that provide the average sales across all titles, and that provide the most recent publication date of a title.

 

 

The fields we will use from the titles table are the title_name field, the sales field and the pubdate field.  Note that the titles table we are using does not have the new line added to the bottom of the table in the prior example, so it has no titles for the publisher with a pub_id of P04.

 

With the focus on the open publishers table window, we choose Edit - Join in the main menu.

 

 

In the Join dialog we choose the titles table as the joined table.  Manifold automatically guesses we want to use the pub_id field in that table as the key field. If it guesses wrong, we can choose some other field from the pull down list of fields that can be used as key fields.

 

Press the Add button and choose Fields to launch the Add Fields dialog.

 

 

In the Add Fields dialog we check the title_name, sales, and pubdate fields.  Press OK.

 

 

Back in the Join dialog, that adds the three fields we checked as new fields to the original table, that is, to the publishers table.  

 

We double-click into the title_name new field in the original table and change the name to all titles.  

 

 

Next, we double-click into the transfer method and choose join tokens.   This will harvest the title names from the various titles for that publisher and join them into a single string, using a comma , character to separate the titles within the string.   That single string will be loaded into the new all titles field in the original table.

 

 

We double-click into the sales new field in the original table and change it to average sales.  Next, we double-click into the transfer method and choose average.   This will harvest the sales numbers from the various titles for that publisher and average them into a single number, which will be loaded into the new average sales field in the original table.

 

 

We double-click into the pubdate new field in the original table and change it to last publication.  Next, we double-click into the transfer method and choose max.   This will harvest the pubdate from the various titles for that publisher and choose the maximum, that is, the latest date, which will be loaded into the new last publication field in the original table.

 

 

We check the Save update query box and then press Join Component.

 

 

The system immediately alters the publishers table to add the new fields we have specified, populating them with values derived from fields in the titles table as we have specified.  The new all titles field contains a list of all titles for that publisher in the titles table, with a comma , character separating each title.   The average sales field has been loaded with the average of sales for all titles for that publisher in the titles table, and the last publication field has been loaded with the maximum, that is, the latest date, for all titles for that publisher in the titles table

 

In addition, a new query appears in the Project pane, called publishers Update.  That query contains the SQL that the dialog used to alter the titles table.

Using the Update Query

Just before we pressed Join Component we checked the Save update query box, to create a new query in the project called publishers Update.   That query contains the SQL magic the dialog used to accomplish the join that we ordered.   We can Run the update query any time we want to update the publishers table to incorporate any changes.

 

 

For example, suppose a new record, for a new title, appears in the titles table.  The new title is from a publisher with a pub_id of P04.  

 

 

When we Run the publishers Update query, the publishers table is automatically updated to populate the all titles, average sales, and last publication fields.

Modifying the Update Query

See the video version of this example in the Join Dialog Part 2 - Joins between Tables video.  

 

The update query works great, but suppose we do not like the lack of a space between the comma and the next title, for example, as in

 

200 Years of German Humor,I Blame My Mother,What Are The Civilian Applications?

 

Suppose also titles have commas in them?  We would prefer that the list of titles used a semicolon ; character as a separator, and also followed each separator with a space character.   That is easy to arrange.

 

In the Project pane, we double-click the publishers Update query to open it, to see the following SQL:

 

-- $manifold$

--

-- Auto-generated

-- Join

--

UPDATE (

  SELECT

    t.[pub_id] AS tkey0,

    t.[all titles] AS t0, t.[average sales] AS t1, t.[last publication] AS t2,

    s.sjoinkey, s.s0, s.s1, s.s2

  FROM [publishers] AS t LEFT JOIN (

    SELECT

      [pub_id] AS sjoinkey,

      StringJoinTokens([title_name], ',') AS s0,

      Avg([sales]) AS s1,

      Max([pubdate]) AS s2

    FROM [titles]

    GROUP BY [pub_id]

  ) AS s ON t.[pub_id] = s.sjoinkey

) SET

  t0 = s0, t1 = s1, t2 = s2;

 

 

We will change the StringJoinTokens function so it uses a semicolon ; character followed by a space character as a separator:

 

      StringJoinTokens([title_name], '; ') AS s0,

 

We press the ! Run button in the main toolbar to run the update query.

 

 

Instantly, the publishers table is updated using the new formatting, for example:

 

200 Years of German Humor; I Blame My Mother; What Are The Civilian Applications?

 

The all titles values now use a semicolon and a space character between titles.   That is better, but suppose we would prefer to construct the all titles strings as JSON objects, so we can re-cycle them into other applications?    That also is easy to do.   

 

We will change the StringJoinTokens line to use the string concatenation SQL operator, &, to prepend a bit of text that starts the JSON expression before the StringJoinTokens function, we change the function to use a double quote, a comma, a space and another double quote as the separator between titles, and then we use & again to append a bit of text to close the JSON expression:

 

      '{"Title":["' & StringJoinTokens([title_name], '", "') & '"]}' AS s0,

 

Lucky for us, the StringJoinTokens function does not add on a comma at the end of the string it builds, so we have no issues with a dangling comma after the last item in our JSON array.   Because & operators come ahead of AS in order of precedence, we do not have to encase the expression within parentheses.

 

We press the ! Run button in the main toolbar to run the update query.

 

 

The all_titles values now contain a JSON object of the title strings, for example:

 

{"Title":["200 Years of German Humor", "I Blame My Mother", "What Are The Civilian Applications?"]}

 

In the real world, book titles might contain special characters, like quotes or backslashes, so we might add one more refinement to our query in case book titles contain special characters.   We can take advantage of Manifold's ever-popular StringEscapeJson SQL function, which adds character escaping to JSON strings so they safely can be used within functions or queries without special characters in them causing errors:

 

      '{"Title":["' & StringJoinTokens(StringEscapeJson([title_name]), '", "') & '"]}' AS s0,

 

If all we want is a JSON array, we do not need { } curly brackets and we do not need to name the array.   We can simply use:

 

      '["' & StringJoinTokens(StringEscapeJson([title_name]), '", "') & '"]' AS s0,

 

... which will produce all_titles values such as:

 

["200 Years of German Humor", "I Blame My Mother", "What Are The Civilian Applications?"]

 

Modifying the update query written by the Join dialog is a great way to use a point and click dialog to quickly build a join, which for many people can be confusing, and then to customize what the join does in a more compartmentalized way that is easy to understand.

 

Notes

Terminology - The original table also may be called the target table, and the table providing data also may be called the source table.

 

Quick reference - With the Join dialog open, press F1 to open a quick reference guide in a browser window.

 

Guessing the key field - When we launch the dialog with an original field and when we choose a joined field, the dialog will try to guess what fields we might want to use as key fields and will load those first by default.  If it guesses wrong, we can choose the field we want.   For the original table, the dialog tries to use a field (other than mfd_id) that has a BTREE / BTREENULL index, with a data type preference for numbers, then text, then everything else, and a name preference for field names with id in them, such as ... id or ..._id, then ...id and then everything else. For the joined table, the dialog uses similar logic, but first tries to use a field with the same name as in the original table.

 

Added fields are static - Added fields are static, that is, if data in the joined table changes the fields in the original table that received data in the Join operation from the joined table will not automatically be updated.  We can easily update the table any time we want using a saved update query.  

 

Spatial joins are automatically parallel - Joining data between drawings uses parallel GeomOverlay function variants.  Joins from images to drawings are parallelized through a nested SELECT using THREADS.

 

Why only in the same data source - The Join dialog works only with tables and queries within the same data source to keep the dialog simple and fast.   An important part of keeping the dialog simple is to show only those tables or queries in the pull down list for the joined table that can be used in a join.   Scanning all tables and queries in the same data source is reasonable, but if there are many data sources or many levels of nesting (data sources within data sources) there could be hundreds of thousands, if not millions, of tables and queries within the hierarchy to scan.  For people working in map projects without such elaborate data sources, most of the time what they do will be within the .map project data source so everything works by default.  People who link in data sources can quickly add a simple aliasing query, as illustrated in this topic, to use tables and queries from other data sources.   This method also makes it possible for the same Join dialog to work within other data sources, when aliasing queries refer back either to the .map project data source or to other data sources.

 

Buy and read a Fehily book on SQL - Manifold recommends Chris Fehily's excellent books on SQL.  Chris makes learning SQL really easy.  To encourage Manifold readers to get a Fehily book and to read it, Manifold examples are often integrated with examples in the Fehily SQL books, such as the tables used in this topic.    

 

Example data - To follow along with table examples in this topic, download from the Examples page on the Manifold website the books.mxb project.  That project contains the publishers and titles tables used in examples.

 

Videos

Join Videos

 

Find Percentages of Open Space in ZIP Code Area - Find the percentage of open space in each ZIP code area given a layer of polygons representing ZIP codes and a layer of polygons showing open spaces like parks and green spaces. This video shows how to do that start to finish in a few simple steps, from initial importing of shape files to final results, in just five minutes, with an additional few minutes of explanation what each step does. Works in Manifold Release 9 or using the free Manifold Viewer.

See Also

Maps

 

Tables

 

Queries

 

Drawings

 

Images

 

Editing and Combining Data

 

Join

 

Join Videos

 

Join Examples

 

Command Window

 

JOIN Statements

 

Editable Results Tables