Queries are written in SQL. They are
used to manipulate data and projects, for example, creating new components,
altering the structure of databases and tables and editing, analyzing
and extracting subsets of data. In Manifold we write and edit
queries using the Command
Window for SQL.
Queries are often used to create results tables that on the fly extract, transform, recombine and otherwise use data from tables in our project or from data sources outside of the project. For example, a query might take tiles from one or more tables and manipulate them to create a results table with transformed tiles. We can use such results tables just as if they were tables in our project, for example, creating an image from a results table of transformed tiles, or creating a drawing from a results table of geoms.
Running a query:
Run the query.
Choose Edit - Export Results.
Browse to the folder desired, specify a file name and the file type.
Press Save.
See the Example: Export the Results of a Query topic for an example
Important: Exports to DBF, HTML, MDB and XLS types require launching Manifold in 32 bit mode due to Microsoft limitations.
Saving a query:
SQL is case-insensitive for names and commands. SELECT [NaMe] is the same as select [name]. Using all capitals for SQL words helps legibility.
Square brackets [ ] or reverse quotes ` ` enclose names.
Names which are unambiguous (do not contain spaces or key words, etc.) do not need brackets. SELECT name FROM people; but SELECT name FROM [people 2]; If there is a field called Collect in the People table: SELECT name, [collect] from people; It is legal, but not a good idea, to name fields the same as SQL key words.
The semicolon ; at the end of an SQL command is optional, but is good form for legibility.
The prefix mfd_ is reserved for system use. Do not name any fields, tables, indexes, constraints, etc., using a name that begins with mfd_.
SQL ignores white space such as newlines, tabs, etc., so we can format our SQL text as desired for beauty and legibility.
( ) parentheses are used for grouping language constructs, not for forcing order of precedence. Order of precedence is set by the basic syntax and rules of SQL. For example, in a construction such as SPLIT (COLLECT ...) the parentheses are part of the SPLIT construct and operate to cast the COLLECT table expression as a value, not to specify whether SPLIT or COLLECT have precedence.
Refer to fields in a table using dot . nomenclature: [States].[Population] means the Population field within the States table. [States].[Capital] means the Capital field within the States table.
Basic SQL is very simple yet very powerful. Read an introductory SQL text to learn the basics. SQL in Manifold products, like most professional SQLs, complies with the usual SQL standards but also provides additional capabilities. Review the SQL Statements and SQL Operators topics for specific additions.
A big part of the power of SQL in Manifold is the availability of very many SQL Functions that can be called to accomplish specific tasks. Review the SQL Functions topic for a list of useful and powerful functions that can be invoked within queries.
Study Examples topics in this documentation that use queries or snippets of SQL in dialogs such as the Expression tab of the Transform dialog.
A great way to learn SQL within Manifold is to set up a task using a transform within the Transform dialog and then pressing the Edit Query button to see what SQL the dialog writes to accomplish the task. See the SQL Example: Learning to Union Areas in SQL from Edit Query topic for an example.
The Manifold query engine has some distinctive strengths, including:
CPU Parallelization - Manifold automatically runs parallel for internal Manifold tasks and for transforms and similar operations, using all of the CPU cores available in your system in parallel. When writing queries manually using the Command Window make sure to add a THREADS SystemCpuCount() command to the query to automatically parallelize the query to use all CPU cores in your system.
Simultaneous use of multiple query engines - Most database systems that allow queries using SQL have their own query engine that understands and executes the specific SQL syntax supported by that system. For example, Microsoft's SQL Server's query engine will execute SQL written in SQL Server's T-SQL syntax. Manifold also includes its own full-featured query engine to understand, optimize and execute queries written in Manifold SQL. But in addition, unlike most database systems, Manifold also allows projects to contain queries that Manifold will dispatch for execution within the query engines of other systems.
Queries are OK wherever Tables are OK - Manifold can utilize a query component in all places where one could use a table component. However, recursive queries are not allowed. We can write a Query A that references Query B and at the same time write Query B so that it references Query A, but running either A or B will fail with an error message. See examples such as the SQL Example: GeomOverlayTopologyUnion Function topic for examples of how queries can be used as tables for the dynamic creation of drawings.
Editable Results Tables - In most other database systems we normally think of results tables produced by an SQL query as read only, but in Manifold results tables are often editable. Manifold exposes indexes from source tables in results tables, so if the results table includes an indexed field, as is often the case with a SELECT or a JOIN statement, we can edit that results table and the edits made to field values automatically will be written back into the source tables. See the Editable Results Tables topic for details.
Multiple statements allowed - A query can contain multiple SQL statements, each terminated by a semicolon ; character. The result of a query with multiple statements is the result of the last statement.
Call any query with EXECUTE - Any query, including action queries, can be called using the EXECUTE statement.
Parameter queries - Write a query that uses parameters that will be passed to it, and then call that query from another query and pass the desired parameters. Even tables can be passed as parameters. See the EXECUTE topic for examples.
Includes - Queries can include other queries via the $include$ preprocessor directive.
Functions - The Manifold query engine supports declaring and calling functions, including script functions, functions in external files and compiled functions as .NET assemblies.
Automatically written SQL - A great way to learn SQL with Manifold is to use the Transform dialog's Edit Query button. Transform templates are implemented by the dialog using SQL. Pressing the Edit Query button launches a Command Window loaded with the SQL that would be used for that template with the options and values specified. That is a great way to learn all the details of how a particular operation can be accomplished in real life, by seeing what the Transform template would do for a similar task and then adapting the SQL Manifold writes for our specific requirements.
Threading without locks - The query engine uses a new threading model which allows most nodes to perform without taking any locks whatsoever.
Compile Once, Run Many Times - Inline queries run by EXECUTE and script functions are compiled once per initialization of the query and ran as many times as necessary thereafter.
Multithreaded SELECT with GROUP BY and Aggregates - The SELECT statement supports use of the GROUP BY clause and aggregates. The body of SELECT can be multithreaded with threading controlled as usual by THREADS and BATCH options. GROUP BY criteria can be either fields or expressions.
When Manifold writes SQL, for example,
as generated by the Transform
template's Edit Query button,
by default the query automatically will include a THREADS
SystemCpuCount() command that parallelizes the query. The
SystemCpuCount() returns the number
of CPU cores in the system, as reported by Windows. That provides
a total number of cores whether they are all on a single CPU or whether
the number of cores reported is the sum of cores for more than one CPU
if we are using a multiple-CPU motherboard. The number of
hypercores will be counted by Windows as cores if hyperthreading is enabled
(the default in most systems).
Important: When manually writing queries in the Command Window we must add the THREADS SystemCpuCount() command to the queries we write to parallelize them. If we do not add a THREADS command specifying the number of threads we want, the query will run single-threaded, that is, non-parallel, resulting in significantly slower performance in many cases.
A typical situation is running an Intel Core i7 with eight hypercores. SystemCpuCount() will return a value of 8 so the command becomes THREADS 8 and the query will be automatically parallelized to run in eight threads. If we do not want to use all cores, we can specify the number of threads, and thus cores, to use. For example, we could write THREADS 4 to use only four threads.
If a query includes multiple statements, all of them are run, and the result of the query is that of last statement. Some examples:
If a query includes several SELECTs the query engine will compute all of them one by one (or may optimize to not compute any of the SELECTs except the last one) without fetching any records from the resulting tables and will discard all but the last table.
If a query includes a SELECT followed by an action query the query engine will run the SELECT, will ignore the resulting table and will finally run the action query, showing the result returned by the action.
If a query includes an action query followed by a SELECT the query engine will run the action, which might alter some data, will ignore the result and then finally will run the SELECT and report the result.
Manifold can utilize non-Manifold SQL engines to execute a query. If a data source in a project has its own query engine, for example, a data source created using Oracle DBMS, Manifold can execute queries using that data source's query engine.
There are two ways Manifold knows which query engine to use:
The context of the query or command window - If a query is created within the hierarchy of a data source that has its own SQL engine, when that query is run the data source's native SQL engine will be used. We can launch a Command Window using a data source's native SQL engine by right-clicking on the data source and choosing New Command Window from the context menu. If a Command Window is launched or a query is created outside the hierarchy of a data source that has its own SQL engine then the Manifold SQL engine will be used.
The use of a specific directive - Including the directive $manifold$ in comments prior to the first statement instructs Manifold to use the Manifold SQL engine regardless of the context of the query or command window. Comments in Manifold begin with two hyphens in sequence, -- . We can also use the EXECUTE statement with the ON option to specify execution by a particular data source's query engine.
Example: We right-click within a SQL Server data source in the Project pane and create the query:
SELECT * FROM dbo.t ;
When executed, that query will run in SQL Server using the SQL Server query engine with the results passed to Manifold. If we change the query to be:
-- $manifold$
SELECT * FROM dbo.t ;
That query will run in Manifold using the Manifold engine, pulling data into Manifold from SQL Server as necessary from the dbo.t table.
When a query uses a data source's engine the entire query is sent to the data source for execution entirely within the data source along with a request from Manifold for the results. That is a good strategy in cases where the data source may not be local, the connection to the data source is not fast, or the data source hosts a lot of data and the results from the query are expected to be small. For example, using a data source's query engine is a good idea if many tables involving millions of records are involved with the results expected to be a few hundred records. The data source can do the work of going through a lot of data and then send through the slower connection link back to Manifold a relatively small amount of data.
There are some limitations on queries run within data sources on their native, non-Manifold query engines. Such engines are not Manifold so they can not use features specific to Manifold such as functions and data types specific to Manifold, enhance analytic performance via GPGPU, use virtual tables added by dataports (such as Manifold Enterprise Storage), virtual system tables like mfd_root and mfd_meta and so on.
When a query uses the Manifold engine but involves data from a remote data source the Manifold engine compiles the query into Manifold structures and then lets these structures fetch data from the involved data sources doing analysis within Manifold. In some cases of data sources well-known to Manifold, the internal Manifold optimizer may send parts of the query out to the data source for execution on the data source if Manifold believes that will be more efficient. Using the Manifold engine allows using functions and data types specific to Manifold, combining data from multiple data sources of different types and other Manifold-specific features.
The EXECUTE statement allows queries to call other queries, including specification of parameter values using WITH either using queries stored as query components or as inline queries within the EXECUTE statement set off by double square [[ ]] brackets. Using EXECUTE, queries can pass table parameters to each other. See the EXECUTE topic.
Queries can include other queries via the $include$ preprocessor directive. The format of the directive is
$include$ <query>
Includes have the following characteristics:
All preprocessor directives must appear before the first statement of the query.
The included query can be in a different data source. This allows creating libraries of functions.
Includes can be nested. The query engine detects circular includes automatically: If query A includes query B, and query B includes query A, running either query will fail.
"Running" a comment with the $include$ directive in a command window works.
In the Project pane create a query called Lib, double-click the query to open it in the Command Window and enter the following text into the query:
FUNCTION Rad2Deg(d FLOAT64) FLOAT64 AS
d * 180 / Pi
END;
FUNCTION Deg2Rad(d FLOAT64) FLOAT64 AS
d * Pi / 180
END;
Close the Command Window to save the query. We can now use the functions defined in the Lib query in other queries after including Lib via the $include$ directive. For example, we could create a query in the Command Window such as:
-- $include$ [Lib]
SELECT value, Deg2Rad(value), Rad2Deg(Deg2Rad(value))
FROM CALL ValueSequence(0, 360, 10);
The above could be run directly from the Command Window or within a query component. Once we run the $include$ preprocessor directive in the Command Window to include a query such as Lib we can thereafter launch additional queries in the same Command Window session that reference functions in Lib without the need to each time re-run the $include$ directive, since each session in the Command Window preserves state.
If we would like to use included queries when working within a data source that has its own query engine, such as SQL Server, we must also add the $manifold$ directive to queries that use $include$ so that such queries will be processed by the Manifold engine.
There is nothing significant about the name Lib in the example above. We could name the included query whatever we want. "Lib" is just a more descriptive name than "Ringo" or "George."
The same as EXECUTE <query> or SELECT * FROM <query> will run the referenced query, -- $include$ <query> in essence will also run the query as well. There are two important differences how the include runs the query compared to how EXECUTE or SELECT run the query:
First, EXECUTE and SELECT run the query as a single statement while $include$ runs the query statements individually. Because EXECUTE and SELECT run the query as a single statement the query cannot create components and then refer to them.
Example: Suppose we have a query named q and the text of the query is
CREATE TABLE t <...>; SELECT * FROM t;
If we now use query q in a SELECT statement such as
SELECT * FROM [q];
The SELECT must determine what fields will result from the query q. But the result of q is produced by the second statement within the query q, a SELECT statement, and that statement cannot know what the result is going to be before the first statement in q, a CREATE TABLE statement, is executed. Running the statement...
SELECT * FROM [q];
...requires knowing what results q produces before either of the two statements in q are executed. Since that is not possible statements of the form SELECT * FROM [q] will fail.
In contrast, $include$ does not have that limitation because $include$ runs the query statements individually. Included queries can create components and refer to them, for example, as q does in the example above. The first statement in q is run individually to create table t and then the second statement in q is run individually to select fields from the newly-created table t.
Second Difference
Second, EXECUTE and SELECT run the referenced query in a separate scope while $include$ does not. The separate scope used by EXECUTE and SELECT creates a barrier which protects the referencing query and the referenced query from each other. For example, if the referenced query defines a function that function will only be visible within the referenced query. It will not leak into the referencing query and thus cannot potentially overwrite a different function with the same name in the referencing query.
$include$ is different in that included statements do not run in a separate scope. The entire point of $include$ is to interpret the statements of the included query as if they were part of the including query. Therefore, if the included query defines a function that function will be visible to the including query.
The Manifold query engine uses a new threading model which allows most nodes to perform without taking any locks whatsoever. Whenever the query engine decides to create a new thread, either because it sees a THREADS clause or for other reasons, it performs a short setup phase and then lets the thread go without competing with other threads for resources.
If the query nodes used by the thread do not use any external resources, for example, if they do not run queries on any external data sources via EXECUTE or if they do not run scripts and so on, the setup phase itself also does not take any locks. Otherwise, the setup phase will take a lock but only for the duration of the setup. The duration of the setup phase is usually only a short period at the beginning of the statement.
Tech Tip: In queries use THREADS to make full use of processor cores available.
The THREADS command takes a value for the number of threads to use. For example, if we know we have six CPU cores but we only want to use four threads we could write...
THREADS 4
To automatically use however many CPUs we have available we can add...
THREADS SystemCpuCount() BATCH 1
...to the end of a query we are telling Manifold to see how many CPUs are available, the result of SystemCpuCount(), and to use that many threads. The result can be dramatic, literally running a query several times faster than without launching multiple threads.
Inline queries run by EXECUTE are compiled once during initialization of the query and then are run as many times as necessary thereafter without re-compilation, even when run with different parameter values.
EXECUTE called in the context of a threaded statement will create and compile multiple external commands which will run in parallel without interfering with each other.
Either an external server that ultimately runs these commands or a client-side driver that is used to connect the server might do their own locking, but that is an obvious performance bottleneck so most modern servers and drivers minimize such locking, especially in the case of read-only access.
Script functions also are compiled once during initialization of the query and then are run as many times as necessary thereafter without re-compilation.
A script function called in the context of a threaded statement will create and compile multiple instances of itself which will run in parallel to each other. If desired, we can turn off compilation and execution of such multiple instances using THREADS 1.
One example scenario when we might want to use THREADS 1 to prevent a script function from compiling multiple instances of itself to run in parallel is when we want to maintain a modifiable state, for example, to return a new value in every consecutive call to a function, and we do not want to synchronize access to that state from multiple instances of the script.
Square brackets or reverse quotes are OK - Manifold standard style is to enclose identifiers in square brackets, as in SELECT [name], [address] FROM [contacts]; To provide better compatibility with ODBC the query engine also allows enclosing identifiers in reverse quote ` characters. ODBC as a technology wants to be able to enclose identifiers using the same character on both sides. A construction such as [name] uses two different characters, a left square bracket [ and a right square bracket ] while `name` uses the same character, a reverse quote ` character on both sides.
Brackets and Parentheses - [ ] square brackets or ` ` reverse quotes are used for names. ( ) parentheses are used for grouping language constructs, not as a general purpose way of indicating execution precedence. { } curly brackets are not used in Manifold SQL but often appear in various other usages such as JSON. [[ ]] double square brackets act as fences to isolate what is within. For example, when inline queries are enclosed within double square brackets [[ ]] the use of double square brackets removes the need to escape single quotes within the inline query. @ @ brackets are used to enclose parameter names inside inline queries that are going to be run on a non-Manifold query engine.
Changing queries - When changing a query called by another query, run the changed query at least once or save the project so that the new form gets propagated to all uses. Otherwise other queries that call the changed query may be using an older compiled form from cache.
Character Limitation - Queries and other text windows such as comments are limited to 32 MB of characters, a limitation that also applies to ad-hoc queries or scripts in the Command window. Queries that utilize anywhere near thirty two million characters should probably be re-written in more modular form in any event.
Indexes in projections - Table projections in queries preserve indexes built on output fields, which greatly helps performance. For example, the mfd_meta table contains three indexes:
mfd_id_x - Built on mfd_id
Name_Property_x - Built on Name + Property
Property_Name_x - Built on Property + Name
Reducing mfd_meta to just Name and Property using mfd_meta (Name, Property) will return a table with both the second and the third indexes listed above as well as the two fields.
Indexes with duplicates - Data sources for remote databases support indexes with duplicates (with or without NULLs).
NULLs and DISTINCT - The DISTINCT clause is one of those places where the notion of a NULL as a generic unknown value meets established practice: instead of treating all NULL values as different from all other values, including other NULL values, DISTINCT treats them as the same value. For example, using DISTINCT on a table with a single field will combine all NULL values into a single record.
AS is not Optional - Silent renaming, that is implicit renaming, of fields is not allowed. We must use AS to rename a field. For example, not allowed is
SELECT a b from c;
Instead, we must write...
SELECT a AS b from c;
Using AS explicitly avoids situations where silent renaming is ambiguous.
Tables and Nested SELECT - The Manifold query engine allows replacing a table with a nested SELECT anywhere, without restrictions.
OFFSET and FETCH - SELECT supports the use of OFFSET and FETCH clauses to specify records returned by a SELECT query. If OFFSET is negative or zero it is ignored. If FETCH is negative it is ignored.
OFFSET returns those records after the given number of records For example,
SELECT * FROM [TableName] OFFSET 5;
...returns all records in the table after the first five records, similar to how the more or less equivalent SKIP quantifier found in some other query engines (such as that in Manifold 8) works.
FETCH returns only the given number of records For example,
SELECT * FROM [TableName] FETCH 5;
...returns the first five records in the table, similar to how the more or less equivalent TOP quantifier found in some other query engines (such as that in Manifold 8) works.
OFFSET and FETCH are typically used with an ORDER BY clause, appearing after the ORDER BY clause, to choose which records to return. Using a combination of OFFSET and FETCH provides a way to "page" through query records. We can specify either or both. For example,
SELECT * FROM [TableName] FETCH 5 OFFSET 5;
...returns five records, starting with the sixth record that is returned.
Progress Display - Running a query from the Project pane or any other window or dialog runs the query in a background thread and displays a progress dialog. INSERT / DELETE / UPDATE and similar query statements track the number of processed records / size of processed values and will update progress status. If the number of records is known beforehand, the statements will also advance the progress bar. The progress display can be controlled with pragmas.
Cancel is not an Undo - A running query may be cancelled from the progress display. This allows termination of a long-running operation if we change our minds about allowing it to run to completion. Important: cancelling a running query does not undo what the query has done up to the moment of cancellation. For example, if we execute an UPDATE query that alters a table in a remote data source to which we have connected over an insufferably slow data link, we may lose patience with the slowness of the link and choose to cancel the operation. If we do that then any records which have been updated in the table before we cancel will not be "unupdated".
Date Literals - Date literals are taken in 24 hour time, where 8:00 PM is 20:00 and 8:00 AM is 08:00. AM or PM modifiers are understood within literals. Time portions are hours, minutes, seconds with milliseconds appended after a dot. In the example below the time specified is 1 minute 15 seconds and 126 milliseconds after 12 noon.
? DateTimeHour(#01/21/2017 12:05:15#)
? DateTimeHour(#01/21/2017 08:05:15 PM#)
? DateTimeHour(#01/21/2017 08:05:15 AM#)
? DateTimeHour(#01/21/2017 20:05:15#)
? DateTimeMillisecond(#01/21/2017 12:01:15.126#)
Command Window - Query Builder
SQL Example: Force an Anomaly in Constraints - Constraints are only evaluated when we insert or update records. If a constraint refers to external data, such as the values in a different table, we can force an anomalous condition where the table with the constraint may contain data that no longer meets the requirements of the constraint.
Example: Create and Run a Query - See how the different parts of a command window operate when creating and running SQL queries. Includes use of the Log tab as well as the ?expression and !fullfetch commands.
Example: Export the Results of a Query - How to export the results table of a query for use in other applications.
Example: Refer to a Table within a Data Source - Within a query, how to refer to a table that is in a data source.
Example: Switching between Manifold and Native Query Engines - How to use the !manifold and !native commands to switch a query in the Command window from use the Manifold query engine to whatever query engine is provided by a data source.
Example: Automatically Generating CREATE Queries - How to use the Command window to automatically generate SQL in the form of CREATE queries that create a desired component.
Example: Run JavaScript in the Command Window - How to run a simple V8 JavaScript script in the Command window.
SQL Example: Learning to Union Areas in SQL from Edit Query - We learn how to write an SQL query that does a custom Union Areas operation by cutting and pasting from what the Edit Query button automatically generates.
Example: Transform Templates, Expressions and Queries - We learn to use a function by clicking on a template in the Transform dialog, seeing what it does in a preview, looking at the query Manifold creates and then trying out the function in the Expression tab.