This topic provides notes on various SQL clauses, typically those used with SELECT, EXECUTE and VALUES.
EXCEPT - Applies to SELECT, EXECUTE and VALUES and includes an ALL option. Providing an ALL option is non-standard but ALL is used by some databases such as PostgreSQL and can be convenient. With ALL, EXCEPT returns all records found on the left side of the operator but not on the right side of the operator, however many times they might appear on the left side. Without ALL, EXCEPT does the same but filters out duplicate records.
HAVING - Supported by SELECT with GROUP. The condition in HAVING is evaluated in the same context as fields in the FROM list. It can (and normally does) use aggregates. Using HAVING with a constant filter only evaluates the filter once. "Constant" means "not changing for the duration of the corresponding GROUP" so therefore a "constant" filter can be a significantly complex expression in the case of nested queries, frequently involving fields from outer queries as well as fields from the inner query and just not those by which the GROUP corresponding to the HAVING iterates. If the filter includes both constant and non-constant parts HAVING will evaluate the constant parts only once.
INTERSECT- Applies to SELECT, EXECUTE and VALUES and includes an ALL option. Providing an ALL option is non-standard but ALL is used by some databases. With ALL, INTERSECT returns all records found on the left side of the operator which are also found on the right side of the operator, however many times they might appear on the left side. Without ALL, INTERSECT does the same but filters out duplicate records.
OFFSET - Applies to SELECT, EXECUTE and VALUES.
ORDER - Applies to SELECT, EXECUTE and VALUES.
UNION - Applies to SELECT, EXECUTE and VALUES and includes an ALL option. With ALL, UNION returns all records found on either side of the operator. Without ALL, UNION does the same, but filters out duplicate records in a way similar to DISTINCT but utilizing a more efficient algorithm.
Data structures - DISTINCT, UNION, EXCEPT and INTERSECT utilize specialized data structures instead of regular indexes that are optimized for separate write and read phases. This enables reading to be done without any locking whatsoever. As a result, any subquery which uses any of the involved constructs can perform better if it is read multiple times and from more than one thread.
Priority - INTERSECT has priority over UNION and EXCEPT. This the priority specified by the SQL standard even though many database systems do not honor the standard but currently treat all operators as having the same priority.
DISTINCT is unordered - Some database systems (including Manifold 8) implement DISTINCT so the original order of records is preserved taking the first duplicate out of a set. Some implement DISTINCT so output records are sorted as they would be if we applied ORDER on all fields, left to right. However, the standard does not require output records from DISTINCT to be ordered in any way. It only requires that the output records contain no duplicates. Manifold removes duplicates without enforcing any further ordering. Therefore, the output of DISTINCT is unordered. This applies to all cases where DISTINCT is used - SELECT, UNION / EXCEPT / INTERSECT, COLLECT.
DISTINCT or ORDER work with large objects - An LOB in some database systems is a large object such as a long value. Many database systems do not allow using DISTINCT or ORDER with long values such as LOBs or with values of complex types or with user-defined types such as geometry. Such limitations often carry over to UNION, EXCEPT, and INTERSECT, including EXCEPT ALL and INTERSECT ALL, if they are implemented in the database system. Manifold does not have such limitations. We can use any type we want, using binary order if there is no sensible type-specific order.
ALL vs DISTINCT - ALL and DISTINCT are opposites. Generally, where one is applicable the other is applicable as well. The primary clauses where we could use ALL or DISTINCT are SELECT, UNION, EXCEPT, INTERSECT and aggregates plus in Manifold usage, COLLECT. Usage of ALL and DISTINCT is not always consistent within different database systems.
Some databases always allow using either in all places. That conflicts, however, with different default expectations: for example, a plain SELECT implies ALL and a plain aggregate implies all but a plain UNION or EXCEPT or INTERSECT implies DISTINCT.
Other databases, most notably Oracle, do not allow using ALL with EXCEPT (The Oracle nomenclature for EXCEPT is "MINUS") and INTERSECT but only allow use of ALL with UNION.
Manifold's approach to sorting out such possible inconsistencies is:
(a) Implement both ALL and DISTINCT logic for each of the above cases in code.
(b) Preserve uneven defaults according to the standard and to accepted practice so that SELECT and COLLECT default to ALL while UNION, EXCEPT and INTERSECT default to DISTINCT.
(c) Disallow repeating the default explicitly, that is, only allow writing a change from the default. For example, Manifold accepts SELECT and SELECT DISTINCT but to fails on SELECT ALL, thus making it clear that SELECT ALL is the default.