JOIN Statements

A JOIN statement is a SELECT that combines records from two tables into a results table based on some condition, called a join condition.   The condition tested might be whether two field values are the same, so that records having the same values in that essential field can be combined from the two tables into one table.    Joins can be nested to combine records from more than two tables into the results table.

 

 

A typical way of using a join statement would be to add columns to tables by looking up matching values in other tables.   For example, if we have a table of states with the name of each state and various other attributes but we do not have the name of the capital city for each state, we could add it using a JOIN if we had another table with a list of states and their capital cities.

 

This topic discusses joins using SQL.  To take advantage of the power of joins without using SQL we can launch the point and click Edit - Join dialog.

 

 For faster performance, the fields used in the ON condition in the join must have an index on them.  This can make the difference between seconds or hours to do a join with bigger tables.   See the Joins and Indexes section below.

Joins are Easy

Joins are a lot easier than they seem.   It helps to realize that strange terms like INNER, LEFT, RIGHT, FULL and OUTER describe almost absurdly simple situations, as the diagrams below will make clear.   

 

The general form of a join is:

 

SELECT <fields from both tables> FROM <left table> JOIN <right table> ON <condition>;

 

JOIN in the above is any of the various join statement key words, such as INNER JOIN, LEFT JOIN and so on.   Different joins provide different ways to combine the two tables.  The "left" and the "right" in the names of different joins refer to nothing more complicated than whether they return the left table or the right table.   

Example

Let us consider an example of a very simple JOIN statement.   Suppose we have two tables:

 

   

 

The States table on the left lists US states and the approximate population of each state.  The Capitals table on the right lists US states and the capital city of each state.    We would like to create a table that lists each state with both the population and the capital of that state.   That is easy to do with JOIN:

 

 

The text of the query is:

 

SELECT States.State, States.Population, Capitals.Capital

  FROM States JOIN Capitals

  ON States.State = Capitals.State;

 

Because we are referring to fields with the same name in two different tables we take advantage of dot nomenclature to unambiguously name fields within different tables; for example,  States.State refers to the State field within the States table and Capitals.State refers to the State field within the Capitals table.

 

The query is straightforward:   

 

  1. The SELECT statement says what fields we want in the results table.    

  2. FROM States JOIN Capitals announces the two tables to be combined in the results table.  

  3. The ON condition tells us what records to combine: only those where the state name is the same in both the States table and the Capitals table.

 

When Manifold builds the results table the system looks at the first record in the States table and finds a state of Alabama.  Manifold then scans the Capitals table to see if it can find a State with a value of Alabama.  When it finds a record for Alabama, Manifold smiles, thinking to itself, "Cool! I found a match..." and builds the first record in the results table, using the States.State value of Alabama, the States.Population value of 4858979 and the Capitals.Capital value of Montgomery.   If Manifold finds no other records in Capitals with a State of Alabama, the system moves on to the next state.

 

See a "real life" example using JOIN in the Find Percentages of Open Space in ZIP Code Areas  video.

Joins and Indexes

Important:  Fields used in the ON condition should have an index on the field, for both tables involved in the JOIN.    Indexes make a big difference in performance, seconds instead of hours to do a join with bigger tables.  

 

Consider the example JOIN query:

 

SELECT States.State, States.Population, Capitals.Capital

  FROM States JOIN Capitals

  ON States.State = Capitals.State;

 

Both the States table and the Capitals table should have indexes for the State field.  It takes but a few easy clicks in the Schema dialog to add an index.  We can use whatever btree index type works with that field.  Allowing duplicates or NULLs in the index is OK.   The index should be on the field used in the ON condition, without including any other fields as part of the index.

Combining More Tables using Additional Joins

What if in addition to adding the state capital for each state to our results table we also wanted to combine a third table?   That is easy to do.   Suppose we have a table that gives the official state tree for each state:

 

 

To add the official state tree to each state in our results table we can use an additional JOIN:

 

 

The text of the query is:

 

SELECT States.State, States.Population, Capitals.Capital, Trees.Tree

  FROM (States JOIN Capitals ON States.State = Capitals.State)

    JOIN Trees ON States.State = Trees.State;

 

The first two lines of the query are the same as before, and would result in the same results table:

 

SELECT States.State, States.Population, Capitals.Capital, Trees.Tree

  FROM (States JOIN Capitals ON States.State = Capitals.State)

 

By adding one more JOIN we are, in effect, taking the results from the first JOIN and using that as the <left table> for the second JOIN

 

(States JOIN Capitals ON States.State = Capitals.State) JOIN Trees

 

...and the Trees table for the <right table>.   The ON condition:

 

ON States.State = Trees.State;

 

Specifies that only those records where the State name is the same should be matched.

 

Manifold finds Alabama in the first results table and then dutifully looks through the Trees table until it finds a State called Alabama.   It takes the Tree for that record and creates the first record in the final results table, and so on.

Example Tables

In the examples of JOIN statements that follow we will use two example tables, both of which are short enough so we can show all records at once in small illustrations:

 

   

 

The Products table at right has a list of products where each Product has a product number in the ProdNum field.   The Parts table at left has a list of parts where each Part also has a product number in a ProdNum field, giving the number of the product in which that part is used.   For example, we can see that both Wire and Magnets parts have a ProdNum of 10.   A product number of 10 in the Products table is the ProdNum for the Generator product.    The tables are intended to show that the Wire and Magnets parts are both used in the Generator product.

 

Note that although most rows in both tables have a ProdNum number in common, there is a record in the Parts table, for Oil, which has a product number that does not appear in the Products table and there is a record in the Products table, for a Screwdriver product, that has a product number which does not appear in the Parts table.

 

To illustrate the results of different JOIN statements, we will use Venn diagrams where records in one table are represented by a reddish circle, records in the other table by a blue circle and the "overlap" of records for which the join condition is true is colored violet.  (See the Note below on Venn diagrams.)

 

Using the above example tables in the example queries that follow below, the Parts table is the Left Table, the table on the left of the JOIN, and the Products table is the Right Table, the table on the right of the JOIN.

 

All of the different JOIN statements return those "overlap" records for which the <condition> is true.   In addition, the LEFT JOIN, RIGHT JOIN and FULL JOIN statements will also add unmatched records from the left table, from the right table or from both tables (the "full" Monty...).  

Inner Join

The JOIN example above at the beginning of this topic is an example of an inner join.   "JOIN" is just a shorthand way of referring to the full name of the statement, INNER JOIN.   An INNER JOIN returns a table that combines only those records from the left and the right table where the join condition is true.  It is called an inner join because it returns only the inner, overlapping part of the circles representing the two tables.

 

 

Using our Parts and Products example tables, consider the query text:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts INNER JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

 

 

If we recall the two tables, the Parts table on the left and the Products table on the right, the INNER JOIN using a join condition returned the four records where the ProdNum number is the same in both tables, that is, the four records for which the join condition is true.

 

   

 

The INNER JOIN excluded the one record in the Parts table, for Oil, that has a product number of 160 which does not occur in the other table.  It also excluded the one record in the Products table, for a Screwdriver, that has a product number of 505 which does not occur in the other table.  Only those records that have the same ProdNum appear in the results of the INNER JOIN.

 

If we prefer, we can use a shorter form of INNER JOIN, writing simply JOIN.

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

The above query is exactly the same as using INNER JOIN.    If only the JOIN key word is used that means we intend an INNER JOIN.

 

If we desire, we can use more complicated join conditions:  

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts INNER JOIN Products

  ON Parts.ProdNum = Products.ProdNum

  AND Supplier NOT LIKE 'A%';

 

This query eliminates suppliers with names that begin with the letter A.

Outer Joins

Outer joins return more than just the inner, overlapping portion of records between the two tables.  Outer joins are called that because they include some of the outer parts of the Venn diagram.   Outer joins can be left joins, right joins or full joins.    The full names of these are LEFT OUTER JOIN, RIGHT OUTER JOIN and FULL OUTER JOIN.  Manifold allows us to use either the full names or the shorter names LEFT JOIN, RIGHT JOIN and FULL JOIN.

Left Outer Join

A LEFT OUTER JOIN, or just LEFT JOIN for short,  returns all records from the left table even if there are no matching records for the <condition> in the right table.   It also returns all matching records just like an INNER JOIN does.  The left outer part of what it returns is the reddish region of the Venn diagram below, in addition to the overlap portion.

 

Using our Parts table as the left table and the Products table as the right table consider the query text:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts LEFT JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

 

The Parts table is the left table and the Products table is the right table for no more complicated a reason than...

 

  FROM Parts LEFT JOIN Products

 

... they occur to the left and to the right of the LEFT JOIN statement.   The nomenclature really is that simple.  

 

 

Looking at the results table above and comparing it to the Parts and Products tables reproduced below, we see that the LEFT JOIN query returned all the INNER JOIN rows plus in addition the one record from the left, Parts table, for Oil, that has a ProdNum which is not matched by a record in the right, Products table.  

 

   

Since there is no Product in the right table with the same product number as Oil the value in that cell is NULL.

 

If we prefer, we can use a longer, verbose form of LEFT JOIN, writing LEFT OUTER JOIN:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts LEFT OUTER JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

The above query is exactly the same as using LEFT JOIN.  

Right Outer Join

A RIGHT OUTER JOIN, or just RIGHT JOIN for short, is the flip side of a LEFT JOIN.   The RIGHT JOIN returns all records from the right table even if there are no matching records for the <condition> in the left table.   It also returns all matching records just like an INNER JOIN does.  The right outer part of what it returns is the blue region of the Venn diagram below, in addition to the overlap portion.

 

Using our Parts table as the left table and the Products table as the right table consider the query text:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts RIGHT JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

 

As before, the Parts table is the left table and the Products table is the right table for no more complicated a reason than...

 

  FROM Parts RIGHT JOIN Products

 

... they occur to the left and to the right of the RIGHT JOIN statement.  

 

 

Looking at the results table above and comparing it to the Parts and Products tables reproduced below, we see that the RIGHT JOIN query returned all the INNER JOIN rows plus in addition the one record from the right, Products table, for a Screwdriver, that has a ProdNum which is not matched by a record in the left, Parts table.  

 

   

Since there is no Part in the left table with the same product number as the Screwdriver the values in the cells for the Part, Supplier and Parts.ProdNum fields for that record are NULLs.

 

If we prefer, we can use a longer, verbose form of RIGHT JOIN, writing RIGHT OUTER JOIN:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts RIGHT OUTER JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

The above query is exactly the same as using RIGHT JOIN.  

Full Outer Join

For those who want it all, a FULL OUTER JOIN, or just FULL JOIN for short, is like doing both a LEFT JOIN and a RIGHT JOIN.     The FULL JOIN returns all records from the right table even if there are no matching records for the <condition> in the left table and it also returns all records from the left table even if there are no matching records for the <condition> in the right table.  It also returns all matching records just like an INNER JOIN does.  The full outer part of what it returns is both the reddish region and the blue region of the Venn diagram below, in addition to the overlap portion.  The FULL JOIN is the full Venn diagram.

 

Using our Parts table as the left table and the Products table as the right table consider the query text:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts FULL JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

As before, the Parts table is the left table and the Products table is the right table to either side of the FULL JOIN statement.

 

 

Looking at the results table above and comparing it to the Parts and Products tables reproduced below, we see that the FULL JOIN returned the same results as the RIGHT JOIN query returned: all the INNER JOIN rows plus in addition the one record from the right, Products table, for a Screwdriver, that has a ProdNum which is not matched by a record in the left, Parts table.   We can see that it also returned the one unmatched record also returned by  the LEFT JOIN query from the left, Parts table, for Oil, that has a ProdNum which is not matched by a record in the right, Products table.  

 

   

If we prefer, we can use a longer, verbose form of FULL JOIN, writing FULL OUTER JOIN:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts FULL OUTER JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

The above query is exactly the same as using FULL JOIN.  

Interesting Join Conditions

Join conditions can be more complex than the examples shown.   A join condition can be almost any expression that does not contain a subquery.  It does not need to use the same field in both tables nor is it even required to reference the left or right tables.   For example, a query such as:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts INNER JOIN Products

  ON Parts.Supplier LIKE 'A%' AND Products.Product = 'Generator';

 

Uses different columns in the ON condition.

 

 

The join condition finds records with suppliers beginning with the letter A and also products called Generator.

 

Join conditions can even create results where the values that participate in the join condition are not seen.    Consider the query:

 

SELECT Part, Supplier, Parts.ProdNum, Product, Price

  FROM Parts LEFT JOIN Products

  ON Parts.ProdNum = Products.ProdNum;

 

When executed it provides the expected LEFT JOIN results table, with all records from both tables that match the join condition plus in addition the one remaining record from the left table that has no match in the ProdNum fields with a record in the other table:

 

 

Now we will make the join condition more complex by including an AND test that tests a value, Price, found only in the right table:

 

SELECT Part, Supplier, Parts.ProdNum, Product, Price

  FROM Parts LEFT JOIN Products

  ON Parts.ProdNum = Products.ProdNum

  AND Products.Price > 10;

 

The result at first glance may seem unexpected:

 

 

There is no surprise about the NULLs in the record for Oil, since that record is in the left table but unmatched by a product number in the right table.    But why the nulls in the record for Plastic?    That record in the left table has a product number that matches a product number in the right table, but it fails the AND test of having a Price greater than 10.   Since it fails the join condition it cannot appear with fields from both tables as do the records that meet the join condition.  Instead, it is added to the results table as part of adding all records from the left table (what the LEFT JOIN does) whether or not they meet the join condition.   But since it does not meet the join condition but is just added as a catch-all addition of all left table records, only the values from the left table are added and none of the values from the right table are added.  There is no Price field or Product field in the left table so both the Price and the Product fields for Plastic have values of NULL.

Cross Joins

A CROSS JOIN combines two tables by creating records that form all possible combinations between records in the two tables.  If the left table has 5 records and the right table has 4 records the results table will have 5 * 4 = 20 records.   CROSS JOIN statements do not have an ON clause like inner or outer joins do.   Cross joins generate a results table that has more records than the sum of records in the left and right tables, so a cross join cannot be represented by a Venn diagram like inner and outer joins.  

 

Using our two example Parts and Products tables we can write a simple cross join:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts CROSS JOIN Products;

 

The results table contains 20 records with all combinations of the left and right tables:

 

 

Taking a closer look at the results table, we can see that the first four records in the results table, outlined in green in the illustration above, are all the combinations between the first record in the Parts table and each record in turn in the Products table:

 

 

The permutations between the first record in the Parts table and records in the Products table generate Wire with Screwdriver, Wire with Relay, Wire with Saw, and Wire with Generator combinations.  The other records in the results table are similar combinations.

 

A simplified nomenclature for a CROSS JOIN that is accepted by most databases, including Manifold, is to use a comma , character instead of the key words CROSS JOIN

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts, Products;

 

When a comma is used as a super short form of CROSS JOIN the join is often called a comma join.   Comma joins are just a shortened way of writing a CROSS JOIN.  

Cross Joins with WHERE Clauses

Cross joins are most frequently used as raw material for subsequent steps in a query.  They often are accompanied by WHERE clauses in which a <condition> reduces the many records generated by a cross join between two large tables to only those records of interest.    When accompanied by a WHERE clause a CROSS JOIN can produce a results table identical to what an INNER JOIN with a similar join condition would produce.

 

Consider the CROSS JOIN query:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts CROSS JOIN Products

  WHERE Parts.ProdNum = Products.ProdNum;

 

We can compare the results of that query to the equivalent INNER JOIN query illustrated earlier in this topic:

 

 

 

The results tables are the same.    In the CROSS JOIN query the WHERE condition is the same as the ON condition in the INNER JOIN query used earlier.   

 

Consider the same CROSS JOIN query written using the shortened, comma form:

 

SELECT Part, Supplier, Parts.ProdNum, Product

  FROM Parts, Products

  WHERE Parts.ProdNum = Products.ProdNum;

 

The result, of course, is the same as spelling out CROSS JOIN in the query:

 

 

Because CROSS JOIN with a WHERE condition the same as the ON condition in an INNER JOIN will create similar results, some writers will describe a CROSS JOIN, especially in the shortened, comma join form, as just a different form of INNER JOIN that uses a WHERE clause instead of an ON clause.

 

In cases where a cross join using a WHERE condition produces the same results as a JOIN with an equivalent ON condition, it is better to use the JOIN because that is more efficient than the process a cross join must follow.  The JOIN can process the ON condition as part of the JOIN while the cross join first creates a Cartesian product (all combinations of records in both rows) and then applies the WHERE condition.  Manifold does a good job of optimizing cross joins to avoid wasted effort but it can do an even better job with inner or outer joins.

Self Joins

A self join is a CROSS JOIN between a table and itself.    The result is all possible combinations of records in the table with each other.

 

If we use the Parts table:

 

 

The CROSS JOIN of the Parts table with itself will be a table of all combinations of records in the table with each other:

 

 

In the table above the first five records are combinations of records for Wire with Wire, Wire with Oil, Wire with Magnets, Wire with Plastic and Wire with Blades.  In a table of parts that is every combination of the Wire part with all the other parts.  

 

It can be confusing to use the same table with the same name in different parts of a query, so self joins use a naming trick, of using aliases, to try to make it less confusing:  in the FROM part of a self join query we must use ALIAS to assign two different names to the same table.  We then use the aliased names in the rest of the query.  For example, if we want to do a self join using the Parts table we would use ALIAS to give the Parts table two different aliases:

 

FROM Parts AS A CROSS JOIN Parts AS B

 

We could then refer to the Parts table using the aliased table names of A and B:

 

SELECT A.Part, A.ProdNum, A.Supplier, B.Part, B.ProdNum, B.Supplier

  FROM Parts AS A CROSS JOIN Parts AS B;

 

When doing self joins many people prefer the shortened, comma join form and not spelling out CROSS JOIN.  Using the comma is so popular that some people do not know that a self join is a cross join.  Therefore,  we often will see the above query written as:

 

SELECT A.Part, A.ProdNum, A.Supplier, B.Part, B.ProdNum, B.Supplier

  FROM Parts AS A, Parts AS B;

 

Some databases (not Manifold) allow an even shorter form for people who prefer to assign aliases without spelling out AS.  Manifold requires use of AS when assigning aliases, so the following is incorrect:

 

SELECT A.Part, A.ProdNum, A.Supplier, B.Part, B.ProdNum, B.Supplier  

  FROM Parts A, Parts B;     -- incorrect

 

Let us take a look at the result of the above query, written using the shorthand comma form of CROSS JOIN:

 

SELECT A.Part, A.ProdNum, A.Supplier, B.Part, B.ProdNum, B.Supplier

  FROM Parts AS A, Parts AS B;

 

As expected for any CROSS JOIN the result is simply all combinations of records in two tables, each of which has five records in the table:

 

 

There are five records in the Parts table so all combinations are 5 * 5 = 25 rows in the results table.

 

Suppose now we add a join condition using WHERE:

 

SELECT A.Part, A.ProdNum, A.Supplier, B.Part, B.ProdNum, B.Supplier

  FROM Parts AS A, Parts AS B

  WHERE A.ProdNum = B.ProdNum;

 

That restricts the records in the results table to only those combinations sharing the same product number value:

 

Self joins can be useful to show relationships between rows.  In the above example, ordering using ORDER BY can help make relationships clearer:

 

SELECT A.Part, A.ProdNum, A.Supplier, B.Part, B.ProdNum, B.Supplier

  FROM Parts AS A, Parts AS B

  WHERE A.ProdNum = B.ProdNum

  ORDER BY A.Part;

 

The very simple query above is nonetheless useful: it allows us to scan the table of parts and at a glance see records where more than one part is used in the same product.   We review the table looking for Magnets and instantly see that Wire is used in the same product as well, or if we started our scan with Wire we can see that Magnets are used in the same product as well.

Results Tables and Indexes

As much as possible, the results tables from JOIN statements will try to preserve and expose for use any indexes from the left and right tables:

 

 

The results table from a join that contains an index is editable.   Changing values in the cells will change values in those records in the originating tables.  See the Editable Results Tables topic.   

Use Aliases

Tech Tip:  Get in the habit of using aliases.  JOINs require us to list the fields we want in the results table, using <table name>.<field name> notation, which can be tedious to read when tables have long names.    We can reduce the wall of text by using the AS statement to alias a long table name into a short name.

 

Consider a query similar to that used in the DLG, USGS .DLG, .DO, .OPT  topic:

 

SELECT rd.ElementID, rd.Geom, att.MajorCode, att.MinorCode, att.Description

  INTO [Joined Table]

  FROM rd_keokuk_ia_24k AS rd

  JOIN [rd_keokuk_ia_24k Attributes] AS att

  ON rd.ElementID = att.ElementID;

 

The query uses AS aliasing to replace rd_keokuk_ia_24k with rd, and to replace [rd_keokuk_ia_24k Attributes] with att wherever possible.  We could use whatever alias we like that help us understand what we are doing:  the query uses rd as a mnemonic for "roads" and att as a mnemonic for "attributes."   That makes the query a lot shorter and easier to understand than the version without using aliasing:

 

SELECT rd_keokuk_ia_24k.ElementID, rd_keokuk_ia_24k.Geom,  

    [rd_keokuk_ia_24k Attributes].MajorCode, [rd_keokuk_ia_24k Attributes].MinorCode,

    [rd_keokuk_ia_24k Attributes].Description

  INTO [Joined Table]

  FROM rd_keokuk_ia_24k

  JOIN [rd_keokuk_ia_24k Attributes]

  ON rd_keokuk_ia_24k.ElementID = [rd_keokuk_ia_24k Attributes].ElementID;

 

The queries shown above use a mixed style for square [ ] brackets, using them only when they are required, for example, when a table name includes a space.   Some users prefer to use square brackets to enclose all table names, for greater consistency.

 

Notes

No brackets - In this topic to reduce visual clutter we do not use square [ ] brackets around simple field and table names.    Square brackets are optional if the name is unambiguous.

 

EQUI JOIN - An EQUIJOIN or EQUI JOIN  is a different name for an INNER JOIN used in some databases.  Manifold SQL does not use or accept the EQUIJOIN or EQUI JOIN key word.   Use INNER JOIN or JOIN instead.   

 

Cartesian product - Mathematically, a CROSS join is also known as a Cartesian product.

 

SELECT ... INTO - The examples in this topic use the Command Window to execute queries and show the results.   To create tables with the results of a join we use a SELECT ... INTO statement.   

 

 

For example, the very first illustration at the top of this topic shows a [States with Capitals] table created with:

 

SELECT States.State, States.Population, Capitals.Capital

  INTO [States with Capitals]

  FROM States JOIN Capitals

  ON States.State = Capitals.State;

 

 

JOIN does not stop at the first match - Consider the very first example in this topic, for JOIN that creates a table of States, Populations and State capitals.   Nothing about that JOIN tells Manifold to stop with the first record it finds in the Capitals table that has a State value of Alabama.    We expect that to be the case because we are used to the idea of a state having just one capital city, but nothing about the operation of JOIN limits it to forming only one match.   

 

Suppose instead of an accurate list of state capitals we downloaded a WikiCapitals table from Internet, and we failed to notice that some merry prankster added a second capital, named Frank Zappa, for the state of Alabama.  

 

 

In that case, if we were to write the query using the WikiCapitals table we would get a results table with two records for the state of Alabama:

 

 

The text of the query is:

 

SELECT States.State, States.Population, WikiCapitals.Capital

  FROM States JOIN WikiCapitals

  ON States.State = WikiCapitals.State;

 

Manifold processes the query as before, but this time after finding the first match to Alabama in the WikiCapitals table, with Montgomery as the capital, the system also finds a second match to Alabama and constructs a second record result with Frank Zappa as the capital.

 

Venn diagrams are a simplification - Once we get to know joins beyond the basics we realize that Venn diagrams can logically represent only fairly simple joins with simple results.  An SQL master could immediately quote examples where Venn diagrams are not rigorously accurate.   That is OK, because by the time we know enough about joins to realize that Venn diagrams are a beginner's simplification we will no longer be beginners.  The job of Venn diagrams is to help us get past the initial brain freeze that prevents beginners from realizing the basic meanings of INNER, LEFT, RIGHT and FULL.

 

Venn diagrams are not usually used to illustrate what a CROSS JOIN does because that is simply just all combinations from both tables.   If a cross join has a WHERE clause, that WHERE condition applies after the JOIN and works just like a WHERE clause for SELECT or any other statement: records for which the WHERE condition is not true get eliminated.   But in the case of inner and outer joins Venn diagrams are useful for many even though they are rightfully criticized by experts.

 

An appealing and logically rigorous alternative to the use of Venn diagrams to explain inner and outer joins was posted here by Martin Smith in stackoverflow.com.  As Martin explains in his post (with great examples and illustrations), we can understand everything that inner and outer joins do by imagining a simple, three step process, which can be paraphrased as follows:

 

  1. Imagine a CROSS JOIN between the left table and the right table.

  2. For each row in that results table, evaluate the ON condition.   Remove rows for which the ON condition is false.

  3. If this is an outer join, put back the corresponding rows (from the left table, the right table, or both of them)  that were taken out.  

 

To follow that three step process in the case of a LEFT JOIN, step one is to create a CROSS JOIN results table.  Step two is to evaluate the ON condition and remove all the rows for which that condition is not true.  Step three is to put back any rows from the left table that were removed in the prior step (the left table since this is a LEFT JOIN).   Done.

 

Credits - The Parts and Products tables in this example along with sample queries are based on the excellent discussions and examples provided by IBM in public tutorials for the free edition of DB2.  They appear in this topic to demonstrate Manifold interoperability with IBM's free DB2 Express-C product as well as with other IBM DB2 products.  The examples as written can be used either with Manifold standalone or with Manifold in a connection to a DB2 data source.

 

IBM's DB2 sometimes does not get the visibility that other DBMS products may receive, but DB2 is indisputably, easily, hands down one of the world's finest implementations of SQL, from the company that invented SQL.  It is a credit to Manifold that queries written for such a fine and sophisticated SQL will usually work unmodified in Manifold.  Manifold is agnostic about databases, and part of that agnosticism means not overlooking any opportunities.  To broaden the horizon a bit by recommending fine products like DB2, topics like this one therefore will at times use DB2 example data and queries.

 

Videos

Find Percentages of Open Space in ZIP Code Areas  -  Uses JOIN:  Given a layer of polygons representing ZIP codes and a layer of polygons showing open spaces like parks and green spaces, find the percentage of open space in each ZIP code area.   This video shows how to do that start to finish in a few simple steps, from initial importing of shape files to final results, in just five minutes, with an additional six minutes of explanation what each step does.  Works in Manifold Release 9 or using the free Manifold Viewer.

 

See Also

Tables

 

Queries

 

Edit - Join

 

Command Window

 

Command Window - Query Builder

 

SQL

 

SQL Statements

 

SQL Functions

 

SQL Operators

 

Editable Results Tables

 

Example: Use Edit - Join to Map a Pandemic - Creating a data source for a CSV web server, we acquire current data on cases and deaths worldwide for the Covid-19 pandemic.  We then use the point and click Edit - Join dialog to rapidly join that data, automatically aggregated as we desire, into a world map for visualization.

 

Example: Create a Map Showing OSM Use by Country - A start-to-finish real life example of map creation that combines various Manifold capabilities, including use of Edit - Join.  Copying a table of numbers from a web site, we create a map that is thematically colored to show usage of OpenStreetMap by country in proportion to the population of that country.

 

Example: Trace Vector Areas from Raster Pixels - This example follows the Trace Vector Areas from Raster Pixels video on the Gallery page.   We use the Trace Areas template in the Transform pane for images to create a drawing with vector areas covering regions of similarly-colored pixels.  Next, we use a simple query to add classification codes from a USGS table of classes to the resulting drawing, using a simple INNER JOIN SQL statement.

 

Example: Import E00 and Rearrange - An intricate example showing how to use Manifold tools to adapt legacy data imported from E00 format into useful, modern form.

 

SQL Example: Extract Airport Runways from an OpenStreetMap PBF - We write a simple SQL query using INNER JOIN to extract runway lines from an OpenStreetMap PBF of Cyprus, and to save those lines to a new drawing and table.