Queries

icon_query.pngQueries are written in SQL.   They are used to manipulate data and projects, for example, creating new components, altering the structure of databases and tables and editing, analyzing and extracting subsets of data.   In Manifold we write and edit queries using the Command Window for SQL.    

 

Queries are often used to create results tables that on the fly extract, transform, recombine and otherwise use data from tables in our project or from data sources outside of the project.   For example, a query might take tiles from one or more tables and manipulate them to create a results table with transformed tiles.   We can use such results tables just as if they were tables in our project, for example, creating an image from a results table of transformed tiles, or creating a drawing from a results table of geoms.  

 

i_cmdwind01_01.png

 

To create a query:

 

 

To write a temporary query:

 

 

Running a query:

 

 

Export the results of a query:

 

 

Saving a query:

 

 

Using the Query Builder:

 

 

Literals:

 

 

Writing Queries:

 

 

Learning Resources:

 

Highlights

The Manifold query engine has some distinctive strengths, including:

 

CPU Parallelization - Manifold automatically runs parallel for internal Manifold tasks and for transforms and similar operations, using all of the CPU cores available in your system in parallel.   When writing queries manually using the Command Window make sure to add a THREADS SystemCpuCount() command to the query to automatically parallelize the query to use all CPU cores in your system.

 

Simultaneous use of multiple query engines - Most database systems that allow queries using SQL have their own query engine that understands and executes the specific SQL syntax supported by that system.  For example, Microsoft's SQL Server's query engine will execute SQL written in SQL Server's T-SQL syntax.   Manifold also includes its own full-featured query engine to understand, optimize and execute queries written in Manifold SQL.   But in addition, unlike most database systems, Manifold also allows projects to contain queries that Manifold will dispatch for execution within the query engines of other systems.

 

Queries are OK wherever Tables are OK - Manifold can utilize a query component in all places where one could use a table component.  However, recursive queries are not allowed.  We can write a Query A that references Query B and at the same time write Query B so that it references Query A, but running either A or B will fail with an error message.   See examples such as the SQL Example: GeomOverlayTopologyUnion Function topic for examples of how queries can be used as tables for the dynamic creation of drawings.

 

Editable Results Tables - In most other database systems we normally think of results tables produced by an SQL query as read only, but in Manifold results tables are often editable.    Manifold exposes indexes from source tables in results tables, so if the results table includes an indexed field, as is often the case with a SELECT or a JOIN statement, we can edit that results table and the edits made to field values automatically will be written back into the source tables.   See the Editable Results Tables  topic for details.

 

Multiple statements allowed - A query can contain multiple SQL statements, each terminated by a semicolon ; character.   The result of a query with multiple statements is the result of the last statement.

 

Call any query with EXECUTE - Any query, including action queries, can be called using the EXECUTE statement.

 

Parameter queries - Write a query that uses parameters that will be passed to it, and then call that query from another query and pass the desired parameters. Even tables can be passed as parameters.  See the EXECUTE topic for examples.

 

Includes - Queries can include other queries via the $include$ preprocessor directive.

 

Functions -  The Manifold query engine supports declaring and calling functions, including script functions, functions in external files and compiled functions as .NET assemblies.

 

Automatically written SQL - A great way to learn SQL with Manifold is to use the Transform dialog's Edit Query button.   Transform templates are implemented by the dialog using SQL.  Pressing the Edit Query button launches a Command Window loaded with the SQL that would be used for that template with the options and values specified.  That is a great way to learn all the details of how a particular operation can be accomplished in real life, by seeing what the Transform template would do for a similar task and then adapting the SQL Manifold writes for our specific requirements.

 

Threading without locks - The query engine uses a new threading model which allows most nodes to perform without taking any locks whatsoever.

 

Compile Once, Run Many Times -  Inline queries run by EXECUTE and script functions are compiled once per initialization of the query and ran as many times as necessary thereafter.  

 

Multithreaded SELECT with GROUP BY and Aggregates - The SELECT statement supports use of the GROUP BY clause and aggregates. The body of SELECT can be multithreaded with threading controlled as usual by THREADS and BATCH options.  GROUP BY criteria can be either fields or expressions.

 

More Details

THREADS and CPU Parallelization

When Manifold writes SQL, for example, as generated by the Transform template's Edit Query button, by default the query automatically will include a THREADS SystemCpuCount() command that parallelizes the query.  The SystemCpuCount() returns the number of CPU cores in the system, as reported by Windows.   That provides a total number of cores whether they are all on a single CPU or whether the number of cores reported is the sum of cores for more than one CPU if we are using a multiple-CPU motherboard.   The number of hypercores will be counted by Windows as cores if hyperthreading is enabled (the default in most systems).  tech_ravi_sm.png

 

Important: When manually writing queries in the Command Window we must add the THREADS SystemCpuCount() command to the queries we write to parallelize them.  If we do not add a THREADS command specifying the number of threads we want, the query will run single-threaded, that is, non-parallel, resulting in significantly slower performance in many cases.  

 

A typical situation is running an Intel Core i7 with eight hypercores.   SystemCpuCount() will return a value of 8 so the command becomes THREADS 8 and the query will be automatically parallelized to run in eight threads.   If we do not want to use all cores, we can specify the number of threads, and thus cores, to use.   For example, we could write THREADS 4 to use only four threads.

Multiple Statements in Queries

If a query includes multiple statements, all of them are run, and the result of the query is that of last statement.  Some examples:

 

Running Queries using Different SQL Engines

Manifold can utilize non-Manifold SQL engines to execute a query.   If a data source in a project has its own query engine, for example, a data source created using Oracle DBMS, Manifold can execute queries using that data source's query engine.

 

There are two ways Manifold knows which query engine to use:

 

 

 

Example:  We right-click within a SQL Server data source in the Project pane and create the query:

 

SELECT * FROM dbo.t ;

 

When executed, that query will run in SQL Server using the SQL Server query engine with the results passed to Manifold.   If we change the query to be:

 

-- $manifold$

SELECT * FROM dbo.t ;

 

That query will run in Manifold using the Manifold engine, pulling data into Manifold from SQL Server as necessary from the dbo.t table.

 

When a query uses a data source's engine the entire query is sent to the data source for execution entirely within the data source along with a request from Manifold for the results.  That is a good strategy in cases where the data source may not be local, the connection to the data source is not fast, or the data source hosts a lot of data and the results from the query are expected to be small.  For example, using a data source's query engine is a good idea if many tables involving millions of records are involved with the results expected to be a few hundred records.   The data source can do the work of going through a lot of data and then send through the slower connection link back to Manifold a relatively small amount of data.

 

There are some limitations on queries run within data sources on their native, non-Manifold query engines.   Such engines are not Manifold so they can not use features specific to Manifold such as functions and data types specific to Manifold, enhance analytic performance via GPGPU, use virtual tables added by dataports (such as Manifold Enterprise Storage), virtual system tables like mfd_root and mfd_meta and so on.

 

When a query uses the Manifold engine but involves data from a remote data source the Manifold engine compiles the query into Manifold structures and then lets these structures fetch data from the involved data sources doing analysis within Manifold.  In some cases of data sources well-known to Manifold, the internal Manifold optimizer may send parts of the query out to the data source for execution on the data source if Manifold believes that will be more efficient.  Using the Manifold engine allows using functions and data types specific to Manifold, combining data from multiple data sources of different types and other Manifold-specific features.

EXECUTE Statement

The EXECUTE statement allows queries to call other queries, including specification of parameter values using WITH either using queries stored as query components or as inline queries within the EXECUTE statement set off by double square [[ ]] brackets.  Using EXECUTE, queries can pass table parameters to each other.   See the EXECUTE  topic.

Includes

Queries can include other queries via the $include$ preprocessor directive.  The format of the directive is

 

$include$ <query>

 

Includes have the following characteristics:

 

 

Example

In the Project pane create a query called Lib, double-click the query to open it in the Command Window and enter the following text into the query:

 

FUNCTION Rad2Deg(d FLOAT64) FLOAT64 AS

  d * 180 / Pi

END;

FUNCTION Deg2Rad(d FLOAT64) FLOAT64 AS

  d * Pi / 180

END;

 

Close the Command Window to save the query.    We can now use the functions defined in the Lib query in other queries after including Lib via the $include$ directive.   For example, we could create a query in the Command Window such as:

 

-- $include$ [Lib]

 

SELECT value, Deg2Rad(value), Rad2Deg(Deg2Rad(value))

FROM CALL ValueSequence(0, 360, 10);

 

The above could be run directly from the Command Window or within a query component.   Once we run the $include$  preprocessor directive in the Command Window to include a query such as Lib we can thereafter launch additional queries in the same Command Window session that reference functions in Lib without the need to each time re-run the $include$ directive, since each session in the Command Window preserves state.

 

Tips:

 

 

$include$ Compared to EXECUTE and SELECT.

The same as EXECUTE <query>  or  SELECT * FROM <query> will run the referenced query, -- $include$ <query> in essence will also run the query as well.  There are two important differences how the include runs the query compared to how EXECUTE or SELECT run the query:

 

First Difference

 

First, EXECUTE and SELECT run the query as a single statement while $include$ runs the query statements individually. Because EXECUTE and SELECT run the query as a single statement the query cannot create components and then refer to them.

 

Example: Suppose we have a query named q and the text of the query is

 

CREATE TABLE t <...>; SELECT * FROM t;

 

If we now use query q in a SELECT statement such as

 

SELECT * FROM [q];

 

The SELECT must determine what fields will result from the query q.   But the result of q is produced by the second statement within the query q, a SELECT statement,  and that statement cannot know what the result is going to be before the first statement in q, a CREATE TABLE statement, is executed.   Running the statement...

 

SELECT * FROM [q];

 

...requires knowing what results q produces before either of the two statements in q are executed.   Since that is not possible statements of the form SELECT * FROM [q] will fail.

 

In contrast, $include$ does not have that limitation because  $include$ runs the query statements individually.  Included queries can create components and refer to them, for example, as q does in the example above.  The first statement in q is run individually to create table t and then the second statement in q is run individually to select fields from the newly-created table t.  

 

Second Difference

 

Second, EXECUTE and SELECT run the referenced query in a separate scope while $include$ does not.   The separate scope used by EXECUTE and SELECT creates a barrier which protects the referencing query and the referenced query from each other.   For example, if the referenced query defines a function that function will only be visible within the referenced query. It will not leak into the referencing query and thus cannot potentially overwrite a different function with the same name in the referencing query.

 

$include$ is different in that included statements do not run in a separate scope.   The entire point of $include$ is to interpret the statements of the included query as if they were part of the including query.  Therefore, if the included query defines a function that function will be visible to the including query.

Threading without Locks

The Manifold query engine uses a new threading model which allows most nodes to perform without taking any locks whatsoever. Whenever the query engine decides to create a new thread, either because it sees a THREADS clause or for other reasons, it performs a short setup phase and then lets the thread go without competing with other threads for resources.

 

If the query nodes used by the thread do not use any external resources, for example, if they do not run queries on any external data sources via EXECUTE or if they do not run scripts and so on,  the setup phase itself also does not take any locks. Otherwise, the setup phase will take a lock but only for the duration of the setup. The duration of the setup phase is usually only a short period at the beginning of the statement.

 

tech_ravi_sm.png

Tech Tip: In queries use THREADS to make full use of processor cores available.  

 

The THREADS command takes a value for the number of threads to use.  For example, if we know we have six CPU cores but we only want to use four threads we could write...

 

THREADS 4

 

To automatically use however many CPUs we have available we can add...

 

THREADS SystemCpuCount() BATCH 1

 

...to the end of a query we are telling Manifold to see how many CPUs are available, the result of SystemCpuCount(), and to use that many threads.   The result can be dramatic, literally running a query several times faster than without launching multiple threads.

 

Compile Once, Run Many Times

Inline queries run by EXECUTE are compiled once during initialization of the query and then are run as many times as necessary thereafter without re-compilation,  even when run with different  parameter values.

 

EXECUTE called in the context of a threaded statement will create and compile multiple external commands which will run in parallel without interfering with each other.

 

Either an external server that ultimately runs these commands or a client-side driver that is used to connect the server might do their own locking, but that is an obvious performance bottleneck so most modern servers and drivers minimize such locking, especially in the case of read-only access.

 

Script functions also are compiled once during initialization of the query and then are run  as many times as necessary thereafter without re-compilation.

 

A script function called in the context of a threaded statement will create and compile multiple instances of itself which will run in parallel to each other. If desired, we can turn off compilation and execution of such multiple instances using THREADS 1.

 

One example scenario when we might want to use THREADS 1 to prevent a script function from compiling multiple instances of itself to run in parallel is when we want to  maintain a modifiable state, for example, to return a new value in every consecutive call to a function, and we do not want to synchronize access to that state from multiple instances of the script.

 

Notes

Square brackets or reverse quotes are OK  -  Manifold standard style is to enclose identifiers in square brackets, as in SELECT [name], [address] FROM [contacts];  To provide better compatibility with ODBC the query engine also allows enclosing identifiers in reverse quote ` characters.  ODBC as a technology wants to be able to enclose identifiers using the same character on both sides.   A construction such as [name] uses two different characters, a left square bracket [ and a right square bracket ] while `name` uses the same character, a reverse quote ` character on both sides.

 

Brackets and Parentheses - [ ] square brackets or ` ` reverse quotes are used for names.   ( ) parentheses are used for grouping language constructs, not as a general purpose way of indicating execution precedence.  { } curly brackets are not used in Manifold SQL but often appear in various other usages such as JSON.   [[ ]] double square brackets act as fences to isolate what is within.  For example, when inline queries are enclosed within double square brackets [[  ]] the use of double square brackets removes the need to escape single quotes within the inline query.  @ @ brackets are used to enclose parameter names inside inline queries that are going to be run on a non-Manifold query engine.

 

 

Changing queries - When changing a query called by another query, run the changed query at least once or save the project so that the new form gets propagated to all uses. Otherwise other queries that call the changed query may be using an older compiled form from cache.

 

Character Limitation - Queries and other text windows such as comments are limited to 32 MB of characters, a limitation that also applies to ad-hoc queries or scripts in the Command window.   Queries that utilize anywhere near thirty two million characters should probably be re-written in more modular form in any event.  

 

Indexes in projections - Table projections in queries preserve indexes built on output fields, which greatly helps performance.  For example, the mfd_meta table contains three indexes:

 

 

Reducing mfd_meta to just Name and Property using mfd_meta (Name, Property) will return a table with both the second and the third indexes listed above as well as the two fields.

 

Indexes with duplicates - Data sources for remote databases support indexes with duplicates (with or without NULLs).

 

NULLs and DISTINCT - The DISTINCT clause is one of those places where the notion of a NULL as a generic unknown value meets established practice: instead of treating all NULL values as different from all other values, including other NULL values, DISTINCT treats them as the same value.  For example, using DISTINCT on a table with a single field will combine all NULL values into a single record.

 

AS is not Optional - Silent renaming, that is implicit renaming, of fields is not allowed.  We must use AS to rename a field.   For example, not allowed is

 

SELECT a b from c;

 

Instead, we must write...

 

SELECT a AS b from c;

 

Using AS explicitly avoids situations where silent renaming is ambiguous.

 

Tables and Nested SELECT - The Manifold query engine allows replacing a table with a nested SELECT anywhere, without restrictions.

 

OFFSET and FETCH - SELECT supports the use of OFFSET and FETCH clauses to specify records returned by a SELECT query. If OFFSET is negative or zero it is ignored. If FETCH is negative it is ignored.

 

OFFSET returns those records after the given number of records   For example,

 

SELECT * FROM [TableName] OFFSET 5;

 

...returns all records in the table after the first five records, similar to how the more or less equivalent SKIP quantifier found in some other query engines (such as that in Manifold 8) works.

 

FETCH returns only the given number of records   For example,

 

SELECT * FROM [TableName] FETCH 5;

 

...returns the first five records in the table, similar to how the more or less equivalent TOP quantifier found in some other query engines (such as that in Manifold 8) works.

 

OFFSET and FETCH are typically used with an ORDER BY clause, appearing after the ORDER BY clause, to choose which records to return.   Using a combination of OFFSET and FETCH provides a way to "page" through query records.   We can specify either or both.  For example,

 

SELECT * FROM [TableName] FETCH 5 OFFSET 5;

 

...returns five records, starting with the sixth record that is returned.

 

Progress Display - Running a query from the Project pane or any other window or dialog runs the query in a background thread and displays a progress dialog.   INSERT / DELETE / UPDATE and similar query statements track the number of processed records / size of processed values and will update progress status.    If the number of records is known beforehand, the statements will also advance the progress bar.  The progress display can be controlled with pragmas.

 

Cancel is not an Undo - A running query may be cancelled from the progress display.   This allows termination of a long-running operation if we change our minds about allowing it to run to completion.   Important: cancelling a running query does not undo what the query has done up to the moment of cancellation.  For example, if we execute an UPDATE query that alters a table in a remote data source to which we have connected over an insufferably slow data link, we may lose patience with the slowness of the link and choose to cancel the operation.   If we do that then any records which have been updated in the table before we cancel will not be "unupdated".  

 

Date Literals  - Date literals are taken in 24 hour time, where 8:00 PM is 20:00 and 8:00 AM is 08:00.  AM or PM modifiers are understood within literals.    Time portions are hours, minutes, seconds with milliseconds appended after a dot.  In the example below the time specified is 1 minute 15 seconds and 126 milliseconds after 12 noon.

 

? DateTimeHour(#01/21/2017 12:05:15#)

? DateTimeHour(#01/21/2017 08:05:15 PM#)

? DateTimeHour(#01/21/2017 08:05:15 AM#)

? DateTimeHour(#01/21/2017 20:05:15#)

? DateTimeMillisecond(#01/21/2017 12:01:15.126#)

 

See Also

Tables

 

Data Types

 

Indexes

 

Functions

 

Regular Expressions

 

Command Window

 

Command Window - Query Builder

 

SQL

 

Aggregates

 

SQL Statements

 

SQL Functions

 

SQL Operators

 

Temporary Databases

 

EXECUTE

 

Editable Results Tables

 

SQL Example: Force an Anomaly in Constraints - Constraints are only evaluated when we insert or update records.   If a constraint refers to external data, such as the values in a different table, we can force an anomalous condition where the table with the constraint may contain data that no longer meets the requirements of the constraint.

 

Example: Create and Run a Query -  See how the different parts of a command window operate when creating and running SQL queries.   Includes use of the Log tab as well as the ?expression and !fullfetch commands.

 

Example: Export the Results of a Query - How to export the results table of a query for use in other applications.

 

Example: Refer to a Table within a Data Source -  Within a query, how to refer to a table that is in a data source.

 

Example: Switching between Manifold and Native Query Engines - How to use the !manifold and !native commands to switch a query in the Command window from use the Manifold query engine to whatever query engine is provided by a data source.

 

Example: Automatically Generating CREATE Queries - How to use the Command window to automatically generate SQL in the form of CREATE queries that create a desired component.

 

Example: Run JavaScript in the Command Window - How to run a simple V8 JavaScript script in the Command window.

 

SQL Example: Learning to Union Areas in SQL from Edit Query - We learn how to write an SQL query that does a custom Union Areas operation by cutting and pasting from what the Edit Query button automatically generates.

 

Example: Transform Templates, Expressions and Queries - We learn to use a function by clicking on a template in the Transform dialog, seeing what it does in a preview, looking at the query Manifold creates and then trying out the function in the Expression tab.