Queries

Queries are written in a language called SQL, the same SQL language that has become universally popular through many years of use with databases.  Different SQL implementations in various products will have slightly different syntax, but all major SQL implementations are so similar that anyone familiar with one of them will be able to quickly adapt to any of them.    

 

SQL is pronounced in English either by saying the letters, as in "ess cue ell", or as the word "sequel".   Both pronunciations are perfectly OK.  

Manifold SQL

This documentation covers SQL as implemented in Manifold's Radian spatial database engine.  Radian is built into Manifold technology products such as Manifold Release 9 GIS, the free Manifold Viewer, add-ins for other GIS packages like ESRI's ArcGIS Pro, and various OEM products.   Packages using Manifold technology share a common code base and a common set of documentation topics.  Improvements in Manifold infrastructure automatically provide updated functionality to all packages and products using that technology.

 

Manifold SQL is a very large, full-featured spatial SQL, like SQL implementations in major DBMS packages.  Manifold SQL is automatically CPU and GPU parallel and includes hundreds of functions of interest in spatial data work.  Manifold DBMS and SQL technology was designed from the ground up to support the unique nature of GIS operations, which often involve vastly greater data per record than is typical of classic DBMS products like Oracle or SQL Server.   

 

Users familiar with SQL in Oracle, MySQL, SQL Server, PostgreSQL, or DB2 will be immediately productive in Manifold SQL after noting the usual slight syntax variations as with any SQL.  For example, use square brackets [ ], double quote " characters, or back ticks ` ` to delimit identifiers like field and table names, and enclose text literals in single quote ' characters:

 

SELECT * FROM [Orders] WHERE [City Name] = 'London' ;

 

See the Identifiers, Constants and Literals topic.

Queries

As with many databases, the word query in Manifold SQL simply means a component that contains SQL.  SQL uses many English words for commands and statements, which make it much easier to learn.    Basic SQL is really easy.  We can get started with five minutes of learning to write simple, but productive, SQL.  

 

The word "query" as a verb means "to ask questions of," and asking questions of our data is one of the biggest uses for queries.  This documentation uses the word query to refer to any SQL text even though what is in a query often does much more than simply ask questions of the data.  

 

Queries can create or delete data, transform the structure of databases, analyze data, and can transform data within tables and databases.   For example, we can write a query that creates polygonal areas in drawings based on relationships with other drawings, such as creating new polygonal areas that show the regions of intersection between objects in other drawings.  

Uses for SQL:

 

An item in the project pane marked with a query icon is a component that contains SQL text in it.  

 

If we double-click a query, it opens up in a  Command Window, which is a special window that makes writing SQL queries easier.

 

 

 

 

 

 

Example:

A simple query:

 

SELECT * FROM [Employees];

 

The asterisk * character is a wildcard, meaning "all of the fields".   The above query says to select everything from the Employees table.

 

Names of fields and tables are usually bracketed by square [ ] brackets, which are called delimiters.   We can use double quote " characters as delimiters if we prefer.

 

SELECT [Last Name], [First Name] FROM [Employees];

 

Bracketing names with delimiter characters like quare [ ] brackets is optional if the names are unambiguous, for example, not containing spaces or key words that are commands or operators or other special words within SQL.  In the query above, Employees is the name of a table and Last Name and First Name are names of two fields in that table.

 

Using square brackets [ ] or double quote " characters to delimit the names of fields and tables eliminates ambiguities.  That allows us to use names like Employees in 2021 as a name for a table or a field, without the spaces in the name or the in word in that name causing confusion with possible use of the IN operator in SQL.   Many people therefore get in the habit of using delimiters, even if there is no ambiguity about a given name and thus delimiters are optional for that name.  

 

Other people prefer to use names for fields and tables that contain no spaces and are not key words, so they do not need to be delimited by brackets:

 

SELECT * FROM Employees;

 

The books.mxb example project includes a table called Employees. If open that project, we can pop open a  Command Window, enter the query above into the Command Window, and press the ! run button on the main menu.  This is what we will see:  

 

 

The results table displayed from a query is called a database view in some DBMS products.  It is a virtual view into the results that are assembled on-the-fly when we run a query.   Although the results table is generated on the fly, we can manipulate it as though it was a regular table in our project.  

 

We can use the Layers pane to move columns about in the results table, to hide or show them, or to change their widths.  We can sort by column and use Filters.     We can export the results shown in a results table to a file by choosing Edit - Export Results in the main menu, to export to popular formats such as CSV, GDB, GPKG, and others.  

 

If the results table contains a field with an index, like the mfd_id field, that would make a table selectable and editable, then the results table is also selectable and we will be able to edit field values shown in the results table and have those edits propagate into the tables from which the records and fields shown in the results table were drawn.   If any records in a results table are selected, the Export Selected command in the Edit main menu will be enable, allowing us to export the selected records to a file.

 

Because a results table is a virtual table created on-the-fly when a query is run, when the results table is closed those changes go away.  If we would like a permanent snapshot of the results table in our project, we can use SELECT ... INTO to create a table that is a permanent copy of the results table, using a slightly more sophisticated query:

 

SELECT * INTO [Sales Reps] FROM [Employees]

   WHERE [Title] = 'Sales Representative';

 

The above query creates a new table, called Sales Reps, in the project and fills it with all records from the Employees table that have a Title of Sales Representative.    Queries work fine with multiple lines.  SQL does not care about white space.   We can use white space, like tabs or blank lines,  however we like to provide greater readability.   

 

The above query also shows the use of string delimiters, single quote ' characters that are used to set off text that we intend as a literal string value.  See the Identifiers, Constants and Literals topic for how to set off literal values of other types, such as dates and times.

 

Simple queries are not rocket science.  They are easy to write and easy to change if we want something different:

 

SELECT * INTO [Sales Reps] FROM [Employees]

   WHERE [Title] = 'Sales Representative'

   AND [Last Name] <> 'King';

 

In the above query, <> means "not equal to".   Guess what the above query does?   Without knowing anything about SQL, we can guess right away the query creates a new table filled will all the sales representatives except Bob King.

 

SQL is like that: learning each additional bit of SQL is very easy, and we do not have to learn very much to start creating very useful and very powerful queries.   Every bit we learn adds to the power at our fingertips.  

 

Although every SQL implementation has specific quirks, since the general principles of SQL are very widely used by thousands of applications, there are many web sites and other resources that can help us learn.  What we learn about SQL in this application will help us with SQL in Oracle or PostgreSQL and vice versa.  

 

We have two ways to write and to use SQL:

 

 

 

The Command Window

Our primary interface for editing queries is the Command Window for SQL.  Launch it with View - New Command Window - SQL.  The Command Window is a console that makes it easier to write SQL.   It provides point-and-click commands that reduce the need for manual keyboarding.  We can use Notepad or any other text editor to write SQL, but most people prefer the extra help the Command Window provides.

 

 

To create and run a query, we enter whatever SQL text we want into the Command and Queries section and then press the ! run button on the main toolbar.   The system will run the query and report the results.  

 

All of the rest of the Command Window is optional: it is there to help us write queries more easily and to help us avoid manual keyboarding.  For example, we can double-click something in the Tables pane to add that text automatically to the SQL text we are writing.  That is quicker and less error-prone than manually keyboarding the names of tables and fields.  Same with the Templates, where it is quicker to double-click the name of a function than to keyboard it manually,   Press F1 when a Command Window is open for a web-based quick reference guide to keyboard shortcuts.

 

To create a query:

 

 

To write a temporary query:

 

 

New queries created within the Command Window disappear when the Command Window is closed.   To save the query, choose Edit - Save As Query to save the query text as a query in the project.

 

Running a query:

 

 

Export the results of a query:

 

 

Saving a query:

 

 

Using the Query Builder:

 

 

Literals:

 

 

Writing Queries:

 

 

SELECT * FROM [mysql]::[Products];

 

 

 

Learning Resources:

 

 

Creating a Query

There are two ways to create a query component in a project:

 

 

Results Tables

Query results tables are virtual tables in that they are constructed on the fly to show the results of a query.  To make them permanent we would use SELECT ... INTO to create a real table.

 

When browsing results tables we can use many display features such as the Layers pane and Filters to control the display.   For example, in the illustration below we have adjusted widths of columns.  Such settings go away with the next run of the query but they are so quick to specify they can be very convenient when browsing a complex results table.  

 

 

 

 

If we are only interested in two fields out of twenty in a results table, we Ctrl-click on those two fields in the Layers pane to select them, we Ctrl-I to invert the selection, and we double-click any of those selected fields OFF to hide all eighteen of the selected fields.  That is a very quick way to hide all but two fields in a results table, taking but three seconds for an experienced user.

 

If the fields in the results table include an indexed key field, like mfd_id, the results table will be both selectable and we edit "through" the table into the tables from which the data is pulled.  

 

Note that the Counts pane in the status bar reports Selected: 0 of ?, with a question mark for the total number of records.  Queries stream their results as the query computes, so the total number of records can be unknown until the query finishes, which could be beyond the system's control in the case of queries run against external data sources.

 

To see the total number of records returned, right-click the Counts pane and choose Refresh.   The Refresh command is enabled any time the number of selected records or the total number of records is unknown.

 

 We also will be able to use the Selected button in the main toolbar to toggle the results table between showing only selected records, or showing all records.  See the Selection topic and the Tables topic for examples of how that works.

 

Tech Tip:  To fit into this documentation, the illustration above shows all windows and panes docked within a small Manifold desktop window.  Undock windows and panes by Shift-clicking their name tabs, so they could be resized to larger size and moved around a larger Windows desktop.   To view a results table we could also drag the dividing line in the Command Window between the results table and the query text  to expand the size of the results table.

Indexes in Results Tables

Results tables automatically include any indexes built in the source tables on the fields reported in the results table.  For example, if we have a btree index built on the mfd_id field in a table or a spatial rtree index built on a Geom field in a table, if a results table reports the mfd_id and Geom fields the results table will also have within it the btree and rtree indexes built on those fields.

Example: Create a New Query in the Project Pane

Right-click into the Project pane and choose Create - New Query.

 

That pops open a dialog that allows us to choose a name for the query.

 

 

Choose a name for the query (the default is Query) and press the Create Query button.  Pressing the Edit Query button will launch a Command Window.

 

 

Pressing the Create Query button creates a new, blank query, called Query by default.   Double-click the query to open it in a Command Window, and then enter the desired text into the query.

 

In the illustration below, we have also dragged and dropped the Employees table into the lower right pane of the Command Window, so we have the name of the table and the fields available for quick use, with no need to keyboard them manually.

 

 

When we write SQL text into the query as above, that text automatically goes into the Query component that we created.

 

SELECT [First Name], [Last Name] FROM [Employees]

 WHERE [Title] = 'Sales Representative';

 

There is no need to "save" the text into the Query component.  The next time we double-click on Query to open it in the Command Window, the text we entered above will be there.

 

 

Pressing the ! run button we can see the results table the query generates.    The table is shown with gray read-only background because the results do not include an index.    To make the results table read/write (and also, to make it selectable) we must have at least one field in the results table with an index.

 

We can arrange that by adding the [mfd_id] field to the results:

 

SELECT [mfd_id], [First Name], [Last Name] FROM [Employees]

 WHERE [Title] = 'Sales Representative';

 

 

Running the query now provides columns with white backgrounds, indicating they are writable.    For example, we could double-click into the First Name cell for the first record and change Nancy to Kate and that change would alter the record within the database.  

Example: Count Duplicates in a Field

We can use small SQL queries for routine work.  For example, suppose we would like to know if there are any duplicate values in a field, or if all records are unique?

 

 

Consider the table above, from the nwind sample database.   We would like to know if there are any duplicates in the City field, and, if so, how many duplicates there are.  With such a small table we could find the answer just by looking carefully at the table, but if the table had many thousands or millions of records, we should use a query.

 

 

In the Command Window we enter the following query:

 

SELECT [City], Count(*) FROM [Employees]

  GROUP BY [City]

  HAVING Count(*) > 1;

 

The Count function is an aggregate function, which works together with a GROUP BY clause.    GROUP sorts records in the table into groups based on whatever grouping criteria it is told to use.  In this case it sorts the table into groups where in each group the City field contains the same value.   An aggregate works with groups to create a single record for each group based on what the aggregate does.  In this case, from reading the Aggregates topic we know the Count(*) function using a * argument counts up all the records in the group.   

 

If every record in a table has a unique City value, then the result of GROUP will be very many groups each with just one record in them.  Counting the number of records in each group will return a record with a value of 1 in the result for Count(*).   But if any of the records have the same value for City, then those will be grouped together in a group that has more than one record in the group.  Counting those will result in a value greater than 1 in the result for Count(*).   Those are the ones we want to see, and the HAVING clause gives us only those.

 

 

When we run the query we get the results seen above.  The column giving the result of the Count(*) aggregate is named "result" since we did not specify a name for it.    If we would like to specify a name, we can do that using the AS aliasing key word to give it a name:

 

SELECT [City], Count(*) AS Duplicates FROM [Employees]

  GROUP BY [City]

  HAVING Count(*) > 1;

 

Running the above query provides a more self-documenting caption for the results table.

 

 

We can see that in the Employees table there are four instances of London and two instances of Seattle.  If the results table was empty, there would be no cases where the City was duplicated in more than one record.

 

GROUP BY is one of the cases where NULLs are considered equivalent to each other, as opposed to the usual SQL rigor of considering them unknown values that cannot be compared.  

 

 

Suppose we copy our Employees table and we paste it to create an Employees 2 table, and then we modify that Employees 2 table to put a few NULL values into the City field, as seen above.

 

SELECT [City], Count(*) AS Duplicates FROM [Employees 2]

  GROUP BY [City]

  HAVING Count(*) > 1;

 

We modify our query to use the Employees 2 table, and then we run it.

 

 

The result is that the query reports duplicates for NULL values as well as for London and Seattle.   An entry for NULLs appears not because aggregate functions reckon NULLs, which they do not do.  The NULL entry appears because GROUP BY will group by NULLs.   The Count(*) function simply counts the number of records in a group, without looking at the value of any records.

Schemas and Queries

To see the schema of a results table in the Command Window, as in the example above, we can choose Edit - Schema:

 

 

We cannot change the schema because the results table is generated by a query.   Opening the schema shows what data types are used for the various fields and it also shows what indexes are in the results table.

 

 

We can see the schema of most queries, without opening the query in a Command Window, by Right-clicking on the query in the project pane and then choosing Schema.    Queries can report their schemas in cases where the schema of the result table can be discovered from the query text without actually running the query.    

 

Consider a query such as:

 

SELECT [mfd_id], [Last Name], [First Name], [Title] FROM [Employees]

WHERE (...more stuff here...)

 

In the above the Employees table exists in the project and there is no mystery about what fields are in the SELECT list and thus will appear in the results table.   The system can report the schema for the table based on those fields without needing to know how many records and which records will be in the results table.

 

In contrast, consider a query such as:

 

CREATE TABLE [Prices] (

  [mfd_id] INT64,

  [Name] NVARCHAR,

  [Cost] INT32,

  [Retail] INT32 AS [Cost]*2,

  INDEX [mfd_id_x] BTREE ([mfd_id])

);

SELECT * FROM [Prices]

WHERE (...more stuff here...)

 

The result table arises from the SELECT, but that depends on a table, Prices, which does not exist if the prior part of the query has not run.   In this case the schema of the result table cannot be discovered without running the query.

Schemas and Database Views

Database views are just queries within the native data sources.   We can see the schema for such a query by right-clicking the view and choosing Schema.

Create a Drawing from a Query

If the results table for a query contains a geom field and the query reports a schema without having to run the query, we can create a drawing from that query.    We must know the coordinate system used within the geom in the table, which we can easily find in the Properties of the table in the FieldCoordSystem.Geom property as a human-readable JSON string.   The usual rules which apply to creating drawings from tables also apply to creating drawings from queries.  For example, we cannot create a drawing when the query resides within a read-only data source, such as a read-only .map project or a read-only DBMS.

 

Drawings created from queries can be styled just like other drawings, including the use of thematic formatting based on a field the query reports in the results table.   To update a drawing to show any changes in data in tables which the query uses, we must choose View - Refresh to refresh the drawing.    

 

We can change the query text within the query from which a drawing is created.  After changing the query text we must run the query at least once to update the system, before choosing View - Refresh for the drawing to update the drawing.

 

Use only in maps - In current builds of Release 9, drawings created from queries are visible only when they participate as layers in maps that have at least one other layer.  Drawings created from queries will not be visible when opened in their own drawing window.  If we want to see only the drawing, turn off the other layer in the Layers pane.

 

No spatial index?  - When creating a drawing from a query, If the table from which the query takes the geom field does not contain a spatial index on the geom field, we will have to use a temporary spatial index every time we open the drawing.  However, that would be very unusual as normally tables that contain geom fields will have a spatial index built on that geom field as well, and thus a query which includes the geom field in the results table will also include the spatial index on the geom field as well.

 

See the Example: Create a Drawing from a Query topic, which duplicates the Drawings from Queries video using the Mexico_queries.mxb sample project that may be downloaded from the Examples page on the Manifold web site.  

Query and Function Parameters

Queries can include functions as well as other queries.  The function or query can use parameters.   Parameters use the @ character as a prefix.   The names of query and function parameters can be reserved words, because the @ prefix removes the ambiguity.  All the same, to avoid confusion it is unwise to use reserved words as parameter names.  

 

A query parameter example:

 

EXECUTE WITH (@p INT32 = 1) [[ SELECT * FROM mfd_root WHERE mfd_id=@p; ]]

 

A function parameter example:

 

FUNCTION f(@p INT32) INT32 AS @p+1 END; VALUES (f(10)), (f(11));

 

Copy and paste the above into a Command Window and press ! Run to see what they do.

 

 

Additional examples:

 

-- Query parameter example

EXECUTE WITH (@p NVARCHAR = 'folder') [[

  SELECT * FROM mfd_meta WHERE

    StringCompareCollate(property, @p, CollateCode('en-US, nocase, noaccent, nosymbols') ) = 0;

]];

 

-- Function parameter example

FUNCTION combine(@p NVARCHAR, @q NVARCHAR) NVARCHAR AS

 @p & ': ' & @q

END;

SELECT combine(name, type) FROM mfd_root;

 

Copy and paste the above into a Command Window and press ! Run to see what they do.

Inline Scripts

Queries can include inline scripts using the SCRIPT statement.  Inline scripts provide a way to use script functions in a single self-contained query component.

 

Example: Add a computed field that uses an inline script to insert today's date for each record:

 

ALTER TABLE t (

  ADD insertdate DATETIME

    WITH

[[

 

SCRIPT funcs ENGINE 'c#' [[

  class Script

  {

    static System.DateTime F() { return System.DateTime.Now; }

  }

]];

FUNCTION currentdate() DATETIME AS SCRIPT INLINE funcs ENTRY 'Script.F';

 

]]

    AS [[ currentdate() ]]

);

 

After running the above query, existing records in the table will get the current datetime.   If we wait several seconds or a minute so to allow the datetime visibly to change, and then we insert a new record,  either manually or using INSERT, we might expect it will get the new, current datetime value; however, the optimizer used to evaluate computed fields using the above construction will use the same datetime as when the computed field was first added to the table.  

 

To  force the optimizer to use a new datetime for a newly inserted record, we put a reference into the script and function to a field, such as mfd_id, which changes with each insert:

 

ALTER TABLE t (

  ADD insertdate DATETIME

    WITH

[[

 

SCRIPT funcs ENGINE 'c#' [[

  class Script

  {

    static System.DateTime F(int unused) { return System.DateTime.Now; }

  }

]];

FUNCTION currentdate(@unused INT32) DATETIME AS SCRIPT INLINE funcs ENTRY 'Script.F';

 

]]

    AS [[ currentdate(mfd_id) ]]

);

 

Query Engine Highlights

The Manifold query engine has some distinctive strengths, including:

 

CPU Parallelization - SQL automatically runs parallel for internal tasks, for SQL functions and for Transform pane templates and similar operations, using all of the CPU cores available in your system in parallel.   When writing queries manually using the Command Window make sure to add a THREADS SystemCpuCount() command to the query to automatically parallelize the query to use all CPU cores in your system.

 

GPU Parallelization - If our system has a GPGPU capable GPU installed, the query engine will automatically use it for various query engine internal tasks as well as for SQL functions that benefit from GPU.

 

Simultaneous use of multiple query engines - Most database systems that allow queries using SQL have their own query engine that understands and executes the specific SQL syntax supported by that system.  For example, Microsoft's SQL Server's query engine will execute SQL written in SQL Server's T-SQL syntax.   Manifold also includes its own full-featured query engine to understand, optimize and execute queries written in Manifold SQL.   But in addition, unlike most database systems, Manifold also allows projects to contain queries that Manifold will dispatch for execution within the query engines of other systems.   Queries can even mix SQL for execution server side in the query engine of a database with SQL for execution within the Manifold query engine.

 

Queries are OK wherever Tables are OK - Manifold can utilize a query component in all places where one could use a table component.  However, recursive queries are not allowed.  We can write a Query A that references Query B and at the same time write Query B so that it references Query A, but running either A or B will fail with an error message.   See examples such as the SQL Example: GeomOverlayTopologyUnion Function topic for examples of how queries can be used as tables for the dynamic creation of drawings.

 

Editable Results Tables - In most other database systems we normally think of results tables produced by an SQL query as read only, but in Manifold results tables are often editable.    Manifold exposes indexes from source tables in results tables, so if the results table includes an indexed field, as is often the case with a SELECT or a JOIN statement, we can edit that results table and the edits made to field values automatically will be written back into the source tables.   See the Editable Results Tables  topic for details.

 

Multiple statements allowed - A query can contain multiple SQL statements, each terminated by a semicolon ; character.   The result of a query with multiple statements is the result of the last statement.

 

Call any query with EXECUTE - Any query, including action queries, can be called using the EXECUTE statement.   Suppose we have three queries named Create List, Update data and Copy Into that contain some SQL we frequently use.   We can write a query called Get Data that could contain the text

 

EXECUTE [Create List];

EXECUTE [Update data];

EXECUTE [Copy Into];

 

Running that single Get Data query would execute all three of the other queries in the given order.

 

Parameter queries - Write a query that uses parameters that will be passed to it, and then call that query from another query and pass the desired parameters. Even tables can be passed as parameters.  Query parameters use the @ character as a prefix.  See the EXECUTE topic for examples.

 

Includes - Queries can include other queries via the $include$ preprocessor directive.

 

Functions -  The Manifold query engine supports declaring and calling functions, including script functions, functions in external files and compiled functions as .NET assemblies.

 

Automatically written SQL - A great way to learn SQL is to use the Transform pane's Edit Query button.   Transform templates are implemented by the dialog using SQL.  Pressing the Edit Query button launches a Command Window loaded with the SQL that would be used for that template with the options and values specified.  That is a great way to learn all the details of how a particular operation can be accomplished in real life, by seeing what the Transform template would do for a similar task and then adapting for our specific requirements the SQL written by the Transform pane.   Other panes and dialogs like the Select pane and the Join dialog also have Edit Query buttons.

 

Threading without locks - The query engine uses a new threading model which allows most nodes to perform without taking any locks whatsoever.

 

Compile Once, Run Many Times -  Inline queries run by EXECUTE and script functions are compiled once per initialization of the query and ran as many times as necessary thereafter.  

 

Multithreaded SELECT with GROUP BY and Aggregates - The SELECT statement supports use of the GROUP BY clause and aggregates. The body of SELECT can be multithreaded with threading controlled as usual by THREADS and BATCH options.  GROUP BY criteria can be either fields or expressions.

THREADS and CPU Parallelization

When Manifold writes SQL, for example, as generated by the Transform pane's Edit Query button, by default the query automatically will include a THREADS SystemCpuCount() command that parallelizes the query.  The SystemCpuCount() returns the number of CPU cores in the system, as reported by Windows.   That provides a total number of cores whether they are all on a single CPU or whether the number of cores reported is the sum of cores for more than one CPU if we are using a multiple-CPU motherboard.   The number of hypercores will be counted by Windows as cores if hyperthreading is enabled (the default in most systems).  

 

Important: When manually writing queries in the Command Window we must add the THREADS SystemCpuCount() command to the queries we write to parallelize them.  If we do not add a THREADS command specifying the number of threads we want, the query will run single-threaded, that is, non-parallel, resulting in significantly slower performance in many cases.  

 

A typical situation is running an Intel Core i7 with eight hypercores.   SystemCpuCount() will return a value of 8 so the command becomes THREADS 8 and the query will be automatically parallelized to run in eight threads.   If we do not want to use all cores, we can specify the number of threads, and thus cores, to use.   For example, we could write THREADS 4 to use only four threads.

Multiple Statements

If a query includes multiple statements, all of them are run, and the result of the query is that of last statement.  Some examples:

 

 

Using multiple statements to do something big in multiple steps is often a great idea.   Many short statements are frequently much easier to understand than a single big statement.   Pieces of complex logic can be separated into FUNCTIONs or even into separate query components (which can be included).   We can put intermediate results, including tables, into VALUEs.   We can freely create temporary tables, and we can even create temporary databases with CREATE ROOT.

 

Sometimes there are things we can do with many short statements that we cannot do as well with a single big statement.  For example, in a big query that does a series of nested SELECTs shuffling the data and performing computations in between, we do not have much control over what indexes any intermediate tables will have. But if we separate the single big query into multiple SELECTs and use temporary tables, we can add whatever index you want at whatever step that we want.

 

Multiple short statements also allow us to check intermediate results.   Breaking up a single big statement into a sequence of multiple short statements is often a very good idea and highly recommended.

Running Queries using Different SQL Engines

SQL can utilize other SQL engines to execute a query.   If a data source in a project has its own query engine, for example, a data source created using Oracle DBMS, SQL can execute queries using that data source's query engine.

 

There are two ways SQL knows which query engine to use:

 

 

 

Example:  We right-click within a SQL Server data source in the Project pane and create the query:

 

SELECT * FROM dbo.t ;

 

When executed, that query will run in SQL Server using the SQL Server query engine with the results passed to Manifold.   If we change the query to be:

 

-- $manifold$

SELECT * FROM dbo.t ;

 

That query will run using the native SQL engine, pulling data into the system from SQL Server as necessary from the dbo.t table.

 

When a query uses a data source's engine the entire query is sent to the data source for execution entirely within the data source along with a request from SQL for the results.  That is a good strategy in cases where the data source may not be local, the connection to the data source is not fast, or the data source hosts a lot of data and the results from the query are expected to be small.  For example, using a data source's query engine is a good idea if many tables involving millions of records are involved with the results expected to be a few hundred records.   The data source can do the work of going through a lot of data and then send through the slower connection link back to SQL a relatively small amount of data.

 

There are some limitations on queries run within data sources on their native query engines.   Such engines are not Manifold SQL so they can not use features specific to Manifold SQL such as functions and data types specific to Manifold SQL, enhance analytic performance via GPGPU, use virtual tables added by dataports, virtual system tables like mfd_root and mfd_meta and so on.

 

When a query uses the SQL engine but involves data from a remote data source, the SQL engine compiles the query into native structures and then lets these structures fetch data from the involved data sources for doing analysis within SQL.  In some cases of well-known data sources, the Manifold SQL optimizer may send parts of the query out to the data source for execution on the data source if the optimizer believes that will be more efficient.  Using the native Manifold engine allows using functions and data types specific to Manifold, combining data from multiple data sources of different types and other Manifold-specific features.

EXECUTE Statement

The EXECUTE statement allows queries to call other queries, including specification of parameter values using WITH either using queries stored as query components or as inline queries within the EXECUTE statement set off by double square [[ ]] brackets.  Using EXECUTE, queries can pass table parameters to each other.   See the EXECUTE  topic.

Includes

Queries can include other queries via the $include$ preprocessor directive.  See the $include$ Directive topic.

 

Threading without Locks

The Manifold query engine uses a new threading model which allows most nodes to perform without taking any locks whatsoever. Whenever the query engine decides to create a new thread, either because it sees a THREADS clause or for other reasons, it performs a short setup phase and then lets the thread go without competing with other threads for resources.

 

If the query nodes used by the thread do not use any external resources, for example, if they do not run queries on any external data sources via EXECUTE or if they do not run scripts and so on,  the setup phase itself also does not take any locks. Otherwise, the setup phase will take a lock but only for the duration of the setup. The duration of the setup phase is usually only a short period at the beginning of the statement.

 

The THREADS command takes a value for the number of threads to use.  For example, if we know we have six CPU cores but we only want to use four threads we could write...

 

THREADS 4

 

To automatically use however many CPUs we have available we can add...

 

THREADS SystemCpuCount() BATCH 1

 

...to the end of a query we are telling the system to see how many CPUs are available, the result of SystemCpuCount(), and to use that many threads.   The result can be dramatic, literally running a query several times faster than without launching multiple threads.

Compile Once, Run Many Times

In-line queries run by EXECUTE are compiled once during initialization of the query and then are run as many times as necessary thereafter without re-compilation,  even when run with different  parameter values.

 

EXECUTE called in the context of a threaded statement will create and compile multiple external commands which will run in parallel without interfering with each other.

 

Either an external server that ultimately runs these commands or a client-side driver that is used to connect the server might do their own locking, but that is an obvious performance bottleneck so most modern servers and drivers minimize such locking, especially in the case of read-only access.

 

Script functions also are compiled once during initialization of the query and then are run  as many times as necessary thereafter without re-compilation.

 

A script function called in the context of a threaded statement will create and compile multiple instances of itself which will run in parallel to each other. If desired, we can turn off compilation and execution of such multiple instances using THREADS 1.

 

One example scenario when we might want to use THREADS 1 to prevent a script function from compiling multiple instances of itself to run in parallel is when we want to  maintain a modifiable state, for example, to return a new value in every consecutive call to a function, and we do not want to synchronize access to that state from multiple instances of the script.

Notes

A query by any other name -  Manifold uses the word query to refer to any SQL text.   Some writers, like Fehily, use query to refer to only to SELECT statements within SQL text, a choice of terminology they feel better reflects the original meaning of query as "asking something."   No need to stress over pure terminology, as everybody understands what is meant in both cases.

 

SELECT * is not the best way - We all do it, but using the SELECT * FROM table construction is a bit lazy.  It is better to list out explicitly all the field names in the table.  In the Command Window's query builder, we can get a list by simply right-clicking the table in the Tables pane and choosing Insert Field List.   That provides a self-documenting query that does not depend upon someone knowing all the columns in the table to know what they'll get.  When working with larger databases, using the asterisk * character instead of explicitly enumerating all the fields we want can be lower performance as well.

 

Square brackets or reverse quotes are OK  -  Manifold SQL standard syntax is to enclose identifiers in square brackets, as in SELECT [name], [address] FROM [contacts];  To provide better compatibility with ODBC and other DBMS SQL implementations , the query engine also allows enclosing identifiers in reverse quote ` characters.  ODBC as a technology wants to be able to enclose identifiers using the same character on both sides.   A construction such as [name] uses two different characters, a left square bracket [ and a right square bracket ] while `name` uses the same character, a reverse quote ` character on both sides. Future Manifold SQL builds will also add support for double quote " characters as delimiters for identifiers, for maximum compatibility with very many DBMS products.   

 

Brackets and Parentheses - [ ] square brackets or ` ` reverse quotes are used as delimiters for identifiers such as field and table names.   ( ) parentheses are used for grouping language constructs, not as a general purpose way of indicating execution precedence.  { } curly brackets are not used in Manifold SQL but often appear in various other usages such as JSON.   [[ ]] double square brackets act as fences to isolate what is within.  For example, when in-line queries are enclosed within double square brackets [[  ]] the use of double square brackets removes the need to escape single quotes within the in-line query.  @ @ brackets are used to enclose parameter names inside in-line queries that are going to be run on a non-Manifold query engine.

 

SELECT requires a FROM, or use VALUES instead - SELECT statements in Manifold SQL require a FROM with a reference to a table even if the table is not used.   In many databases, SELECT can be used to create derived columns based on system functions such as the time or mathematical expressions, but that is a fairly rare and unusual thing.  For example SELECT 3 + 5 AS [SUM RESULT]; is a valid construction in many DBMS packages that generates a result table with a single row and a single column named "SUM RESULT" and the numeric value 8.  That works in Manifold SQL as well, but only if we write SELECT 3 + 5 AS [SUM RESULT] FROM [Table] where Table is the name of a table in our project.  The query does not use the table for anything, but the query parser wants to see a FROM clause in the query so any table will make do as a placeholder to keep the parser happy.   If we want the same effect as a derived columns expression in other databases such as SELECT 3 + 5; which does not use a FROM, simply use the VALUES statement, as in VALUES (3 + 5);

 

A Third Way to Create a Query - In the example above, when we created a new query in the Project pane we pressed the Create Query button.   But right next to that button is an Edit Query button.    Pressing that button opens the Command Window and fills it with text automatically generated by Manifold that is the SQL query which creates a query component of the specified name.   For example, suppose the name we use is MyQuery and we press the Edit Query button.  The query generated would be:

 

-- $manifold$

--

-- Auto-generated

-- New Query

--

CREATE QUERY [MyQuery] (

  PROPERTY 'Text' '-- $manifold$\u000D\u000A\u000D\u000ASELECT * FROM [mfd_root];'

);

 

The above is the SQL required to generate the default query text which appears in new queries.   The \u000D and similar are escaped newline character sequences.   We can edit the text to automatically write whatever queries we want.    It may seem odd to use a query to write another query, but sometimes we might want to do that when creating queries programmatically.

 

No spatial index?  - When creating a drawing from a query, If the table from which the query takes the geom field does not contain a spatial index on the geom field, we will have to use a temporary spatial index every time we open the drawing.  However, that would be very unusual as normally tables that contain geom fields will have a spatial index built on that geom field as well, and thus a query which includes the geom field in the results table will also include the spatial index on the geom field as well.   If we want to automatically add a spatial index to a drawing created from a query, we use the technique described in the Example: Create a Drawing Dynamically from a Geocoded Table topic.

 

Changing queries - When changing a query called by another query, run the changed query at least once or save the project so that the new form gets propagated to all uses. Otherwise other queries that call the changed query may be using an older compiled form from cache.

 

Character Limitation - Queries and other text windows such as comments are limited to 32 MB of characters, a limitation that also applies to ad-hoc queries or scripts in the Command window.   Queries that utilize anywhere near thirty two million characters should probably be re-written in more modular form in any event.  

 

Indexes in projections - Table projections in queries preserve indexes built on output fields, which greatly helps performance.  For example, the mfd_meta table contains three indexes:

 

 

Reducing mfd_meta to just Name and Property using mfd_meta (Name, Property) will return a table with both the second and the third indexes listed above as well as the two fields.

 

Indexes with duplicates - Data sources for remote databases support indexes with duplicates (with or without NULLs).

 

NULLs and DISTINCT - The DISTINCT clause is one of those places where the notion of a NULL as an unknown value meets established practice: instead of treating all NULL values as different from all other values, including other NULL values, DISTINCT treats them as the same value.  For example, using DISTINCT on a table with a single field will combine all NULL values into a single record.

 

NULLs and ORDER BY - The ORDER BY clause is also one of those places where the notion of a NULL as an unknown value meets established practice.  The SQL standard allow each DBMS to decide whether it considers NULLs as coming before, that is, the lowest value, or coming after, that is, the highest value, all other values in sort order.  Manifold SQL is like Oracle, DB2 and PostgreSQL in treating NULLs as the highest value. Using ORDER BY in default, ascending order to sort a text column that contains names like Ann, Bob and Zorro plus a NULL record will result in Ann, Bob, Zorro, NULL order.   Using ORDER BY with the DESC (descending) keyword will result in NULL, Zorro, Bob, Ann order.   Note that Microsoft Access, Microsoft SQL Server and MySQL treat NULLs as the lowest value for sort order.  

 

AS is not Optional - Silent renaming, that is implicit renaming, of fields is not allowed.  We must use AS to rename a field.   For example, not allowed is

 

SELECT a b from c;

 

Instead, we must write...

 

SELECT a AS b from c;

 

Using AS explicitly avoids situations where silent renaming is ambiguous.

 

Tables and Nested SELECT - The Manifold query engine allows replacing a table with a nested SELECT anywhere, without restrictions.

 

OFFSET and FETCH - SELECT supports the use of OFFSET and FETCH clauses to specify records returned by a SELECT query. If OFFSET is negative or zero it is ignored. If FETCH is negative it is ignored.

 

OFFSET returns those records after the given number of records   For example, using [mfd_root] as an example table, which will have plenty of records in a project with many components:

 

SELECT * FROM [mfd_root] OFFSET 5;

 

...returns all records in the table after the first five records, similar to how the more or less equivalent SKIP quantifier found in some other query engines works.

 

FETCH returns only the given number of records   For example,

 

SELECT * FROM [mfd_root] FETCH 5;

 

...returns the first five records in the table, similar to how the more or less equivalent TOP quantifier found in some other query engines works.

 

OFFSET and FETCH are typically used with an ORDER BY clause, appearing after the ORDER BY clause, to choose which records to return.   Using a combination of OFFSET and FETCH provides a way to "page" through query records.   We can specify either or both.  For example,

 

SELECT * FROM [mfd_root] FETCH 5 OFFSET 5;

 

...returns five records, starting with the sixth record that is returned.

 

Progress Display - Running a query from the Project pane or any other window or dialog runs the query in a background thread and displays a progress dialog.   INSERT / DELETE / UPDATE and similar query statements track the number of processed records / size of processed values and will update progress status.    If the number of records is known beforehand, the statements will also advance the progress bar.  The progress display can be controlled with pragmas.

 

Cancel is not an Undo - A running query may be cancelled from the progress display.   This allows termination of a long-running operation if we change our minds about allowing it to run to completion.

 

Important: cancelling a running query does not undo what the query has done up to the moment of cancellation.  For example, if we execute an UPDATE query that alters a table in a remote data source to which we have connected over an insufferably slow data link, we may lose patience with the slowness of the link and choose to cancel the operation.   If we do that then any records which have been updated in the table before we cancel will not be "un-updated".  

 

Date Literals  - Date literals are taken in 24 hour time, where 8:00 PM is 20:00 and 8:00 AM is 08:00.  AM or PM modifiers are understood within literals.    Time portions are hours, minutes, seconds with milliseconds appended after a dot.  In the example below the time specified is 1 minute 15 seconds and 126 milliseconds after 12 noon.

 

? DateTimeHour(#01/21/2017 12:05:15#)

? DateTimeHour(#01/21/2017 08:05:15 PM#)

? DateTimeHour(#01/21/2017 08:05:15 AM#)

? DateTimeHour(#01/21/2017 20:05:15#)

? DateTimeMillisecond(#01/21/2017 12:01:15.126#)

 

Everything is a table - Spatial data in Manifold products is stored in tables.  Spatial data in tables is also self-describing, with the system using indexes on that spatial data to describe the structure of that spatial data.  Drawings and images that show spatial data in visual form are used only for visual display purposes.  When topics in this documentation use illustrations that show visual settings like drawings, those visual displays are just a convenient user interface for seeing and for  interacting with the actual spatial data, which remains, as always, stored in the associated table.

 

Virtual drawings - Operations on geometry can use any table with a geometry field. The geometry field does not have to be part of a spatial index, although having such an index will frequently help performance. A query function that operates on geometry can accept either a physical drawing component stored in a database, or a virtual drawing created on a geometry field (using the ComponentFieldDrawing function) of a table or of a query component.  A virtual drawing supports all functions available for regular components:  for example, it can report its coordinate system or the name of the underlying component.  Both physical and virtual drawings can be limited to using only selected records with the result being accepted as a drawing by query functions.

 

Virtual images - Operations on tiles use a table with a tile field, a pair of X and Y fields, and a spatial index connecting these fields together. Images on some data sources, for example, web images, use a slightly different structure with a non-spatial index on X and Y fields, sometimes also including a level field, instead of the spatial index. Such images have to be converted to images with a spatial index prior to the operations.  A query function that operates on tiles can accept either a physical image component stored in a database, or a virtual image created on a tile field (using the ComponentFieldImage function) of a table or of a query component . A virtual image supports all functions available for regular components. Both physical and virtual images can be limited to using only selected records but the result will not be accepted as an image by query functions and will only work as a plain table with no spatial context.  Planes are to extend selections in images to be per-pixel and after that is accomplished, the result of limiting an image to only selected pixels also will work as an image.

 

SQL is not an acronym -  A common misconception repeated far and wide on Internet is that SQL is an acronym for "Structured Query Language."  That is not true.  According to IBM, the inventors of the language,  SQL means, simply, SQL.    

 

A predecessor version of the language was, indeed, a quasi-acronym: SEQUEL, for Structured English QUEry Language.   To avoid conflict with the "SEQUEL" trademark of a different company, IBM renamed the language SQL.  It is now officially "Database Language SQL" or simply SQL.

 

That may seem like merely a minor change, but to DBMS gods, that SQL is not an acronym avoids the heresy of claiming structure where there is none.  The original acronym referred to the use of structured English in the syntax of the language.  Take out the "English" part of it and the word structured no longer applies.   As Allen Taylor, the author of "Database Development for Dummies" and "SQL for Dummies," has so concisely commented online:  "SQL is an example of structured English, and it is a query language, but it is not a structured query language."  

 

Chris Fehily, well-known author of many SQL books, points out that insiders feel "Structured Query Language" is the worst possible description because SQL:

 

 

Trademarks - Radian, Manifold, and Polygon are registered trademarks of Manifold Software Limited.    流形,  多边形, and 弧度 (Manifold, Polygon, and Radian in Chinese) are trademarks of Manifold Software Limited.

 

See Also

Tables

 

Data Types

 

Indexes

 

Computed Fields

 

Editing Queries, Scripts and Comments

 

Functions

 

Regular Expressions

 

Manifold Commander

 

File - Create - New Query

 

Edit - Join

 

Command Window

 

Command Window - Query Builder

 

SQL

 

Aggregates

 

SQL Statements

 

SQL Functions

 

SQL Operators

 

Temporary Databases

 

$include$ Directive

 

$manifold$ Directive

 

EXECUTE

 

Editable Results Tables

 

Example: Transfer DEM Terrain Heights to Areas in a Drawing - Given a map with an image layer that shows terrain heights taken from a DEM, and a drawing layer that contains areas, using a small SQL query we transfer the average terrain height within each area to that area as a Height attribute for the area. Easy!

 

Example: Unique Names using Regular Expressions - We have a table with a text field that contains a list of names, separated by commas.  Some of the names are repeated.   We would like to transform the list of names into a similar list, but without any repetitions of names.   This topic shows how using a regular expression.   It shows the power and speed of a concise regular expression, and the flexibility with which regular expressions can be used in SQL queries as well as in the Transform pane.

 

Example: Create a Drawing from a Query - Everybody knows we can create a drawing from a table, but we can also create a drawing from a query.  When the query reports different results the drawing changes too.   This example show step by step how to create a query and then how to create a drawing from that query.   We show how to command Manifold to write a query for us that grabs a selection, and then how to create a drawing based on that new query.   This example duplicates the Drawings from Queries video using the Mexico_queries.mxb sample project that may be downloaded from the Examples page on the Manifold web site.  

 

SQL Example: Force an Anomaly in Constraints - Constraints are only evaluated when we insert or update records.   If a constraint refers to external data, such as the values in a different table, we can force an anomalous condition where the table with the constraint may contain data that no longer meets the requirements of the constraint.

 

Example: Create and Run a Query -  See how the different parts of a command window operate when creating and running SQL queries.   Includes use of the Log tab as well as the ?expression and !fullfetch commands.

 

Example: Export the Results of a Query - How to export the results table of a query for use in other applications.

 

Example: Refer to a Table within a Data Source -  Within a query, how to refer to a table that is in a data source.

 

Example: Switching between Manifold and Native Query Engines - How to use the !manifold and !native commands to switch a query in the Command window from use the Manifold query engine to whatever query engine is provided by a data source.

 

Example: Automatically Generating CREATE Queries - How to use the Command window to automatically generate SQL in the form of CREATE queries that create a desired component.

 

SQL Example: Miscellaneous SQL Functions - A variety of small examples using SQL functions.

 

SQL Example: Create Lines from a Table of Points - A classic task in GIS is to rebuild objects from data in file formats, such as CSV, which decompose GIS objects like lines or areas into individual points.  In this example we recover the original lines from a table of points that has been imported from a CSV file.   The points mark the positions of coordinates (vertices) that originally represented lines marking the boundaries of ice shelves in Antarctica.  Each point has a field giving the line of which it was a part, and also a field giving the order of the point within that line.   We use SQL and Manifold SQL functions to build lines from that table, automatically grouping points from each line in the correct order to reconstruct each line.

 

SQL Example: GeomOverlayAdjacent Function - Using the GeomOverlayAdjacent function, an example that shows how this function and similar functions such as GeomOverlayContained, GeomOverlayContaining, GeomOverlayIntersecting and GeomOverlayTouching operate.

 

SQL Example: GeomOverlayTopologyUnion Function - A continuation of the SQL Example: GeomOverlayAdjacent Function example, using the GeomOverlayTopologyUnion function, an example that shows how this function and similar functions such as GeomOverlayTopologyIdentity, GeomOverlayTopologyIntersect and GeomOverlayTopologyUpdate operate.

 

SQL Example: Learn SQL from Edit Query - Merging Areas - We learn how to write an SQL query that does a Merge : area (dissolve) operation by cutting and pasting from what the Edit Query button automatically generates.

 

Example: Transform Templates, Expressions and Queries - We learn to use a function by clicking on a template in the Transform pane, seeing what it does in a preview, looking at the query Manifold creates and then trying out the function in the Expression tab.

 

SQL Example: Process Images with 3x3 Filters -  Shows a step-by-step example of developing an SQL query that takes a query written by the Edit Query button and then modifies that query into a general purpose query that can apply any 3x3 filter.   This makes it easy to use matrix filters we find on the web for custom image processing.   We extend the query by using parameters and adding a function, and then show how it can be adapted to use a 5x5 filter.

 

SQL Example: List Transformation Grids - In this example we use a few snippets of easy SQL to list NTv2 and NADCON transformation grids that are available within the grids.dat compressed collection of transformation grid files.   Grid transformation files can be used when creating custom base coordinate systems, for NADCON / HARN / HPGN and NTv2 high accuracy transformations that use grid files.