A qualifier that filters SELECT results to omit records that contain duplicate data in the selected fields, leaving only one of each instance.
SELECT DISTINCT <field>/SPLIT <query>, ... FROM <table>
DISTINCT is supported by SELECT, UNION, EXCEPT, INTERSECT and COLLECT. All NULL values are treated as equal to each other. That is logically a departure from Manifold's normal treatment of NULLs as unknown, but it is the behavior specified by the SQL standard and implemented by other database systems. The output of DISTINCT is unordered.
Suppose we have a table called Expenses. For each record the table has three fields: Vendor, Job and Payment. The table records the amount of payments made to different vendors for various categories of jobs in a construction project, such as carpentry (wood work) or masonry (work with brick and stone). Categories typically have more than one vendor who provides that job.
SELECT [Job] FROM [Expenses];
A basic SELECT with no qualifier returns the Job value for all records, returning many duplicates.
SELECT DISTINCT [Job] FROM [Expenses];
Adding the DISTINCT qualifier to the query returns one record for each unique value that is found within the Job field, returning no rows with duplicated values in the Job field that was selected.
Suppose we SELECT more than one field?
SELECT [Vendor], [Job] FROM [Expenses];
In our example Expenses table none of the Vendor names are duplicated. A SELECT query that returns records with two fields, the Vendor and Job field, will return a table with no duplicated records.
SELECT DISTINCT [Vendor], [Job] FROM [Expenses];
If we add the DISTINCT qualifier to the query the same set of records is returned. There are no duplicates to begin with so using DISTINCT has no duplicates to filter out. Note that the order of records returned may be different whether DISTINCT is used or not, since sets of records in SQL are unordered by default.
If we want to see records in order we use ORDER BY to order the results table:
SELECT [Vendor], [Job] FROM [Expenses]
ORDER BY [VENDOR];
... and ...
SELECT DISTINCT [Vendor], [Job] FROM [Expenses]
ORDER BY [VENDOR];
Using ORDER BY to order the results table we can see that both with and without DISTINCT the results are the same, because all records returned by the SELECT are unique with no duplicated records.
NULLs - All NULL values are treated as equal to each other (the behavior prescribed by the standard and implemented in other databases). DISTINCT is one of those places where the notion of a NULL as a generic unknown value meets established practice: instead of treating all NULL values as different from all other values, including other NULL values, DISTINCT treats them as the same value. For example, using DISTINCT on a table with a single field will combine all NULL values into a single record.
Internal optimization - Instead of regular indexes, when processing DISTINCT and UNION / EXCEPT / INTERSECT Manifold utilizes specialized internal data structures which are optimized for separate write and read phases, with reading done without any locking whatsoever.
Order - Some databases implement DISTINCT so the original order of records is preserved, taking the first duplicate of a set, or such that the results table records are sorted as they would be if ORDER were applied on all fields, left to right. Manifold Release 8, for example, does the former. This is purely an implementation detail since DISTINCT does not require the output records to be ordered in any way; DISTINCT merely requires that the results table records contain no duplicates. The Manifold query engine makes use of that and, for maximum performance, simply removes duplicates without enforcing any further ordering for performance. The bottom line is that the output of DISTINCT is unordered. This applies to all cases where DISTINCT is used such as within SELECT, UNION / EXCEPT / INTERSECT, and COLLECT.
Long values are OK - Many databases do not allow using DISTINCT or ORDER with long values such as LOBs ("LOB" = "large object") and / or values of complex / user-defined types such as geometry. In databases with such limits that carries over to UNION / EXCEPT / INTERSECT, including EXCEPT ALL and INTERSECT ALL, if they are implemented. The Manifold query engine does not have such limitations. We can use any type we want, using binary order if there is no sensible type-specific order that otherwise could be used.
Exclude primary keys to avoid surprises - If we know that some of the rows in a table contain duplicate entries it is easy to be surprised when SELECT * FROM <table> returns the same results table as SELECT DISTINCT * FROM <table>. That will always happen if the primary key is an mfd_id field, which is guaranteed unique for all records in the table. To get distinct rows when duplicates occur in all of the fields in the SELECT list, we should not include the mfd_id field in the SELECT list.
ALL and DISTINCT - Manifold does not use the ALL keyword with SELECT. ALL is used in some SQL implementations within SELECT to get all result records. But that is what SELECT does by default anyway, so there would be no difference between SELECT <field> and SELECT ALL <field>. If we do not explicitly use the DISTINCT keyword, we get all result records anyway with no need to explicitly write ALL.
In database packages that utilize ALL and DISTINCT keywords, ALL and DISTINCT are opposites, and generally, where one is applicable, the other is applicable as well. With most databases ALL or DISTINCT can be used within SELECT, UNION / EXCEPT / INTERSECT, and aggregates. In the case of Manifold, within the COLLECT aggregate.
Some databases always allow using either ALL or DISTINCT in all places. That can lead to conceptual difficulties in that the defaults are different: a plain SELECT implies ALL, a plain UNION / EXCEPT / INTERSECT implies DISTINCT, and a plain aggregate implies ALL again. Adding to complexities that must be kept in mind, some databases, most notably Oracle, do not allow using ALL with EXCEPT and INTERSECT but only with UNION. What other databases refer to as EXCEPT, Oracle calls MINUS.
Manifold's approach to the above is to (a) implement both ALL and DISTINCT logic for each of the above cases internally, (b) to preserve the uneven defaults, according to the standard and to accepted practice:
By default, SELECT implies ALL records so it defaults to ALL and there is no ALL keyword for SELECT. If we want something other than the default ALL behavior with SELECT there is a DISTINCT keyword for SELECT.
UNION / EXCEPT / INTERSECT default to DISTINCT so there is no DISTINCT keyword for UNION / EXCEPT / INTERSECT. If we want ALL records instead of the default DISTINCT behavior there is an ALL keyword for UNION / EXCEPT / INTERSECT.
The COLLECT aggregate defaults to ALL, so there is no ALL keyword for COLLECT. If we want distinct records within a COLLECT instead of the default ALL behavior, there is a DISTINCT keyword for COLLECT.
Trying to use an explicit ALL or DISTINCT keyword where that is the default behavior, for example, trying to SELECT ALL, causes the query to fail. That makes it clear that SELECT ALL is the default.
UNION / EXCEPT / INTERSECT