Like most database systems, Manifold uses indexes to improve performance, both within Manifold native tables and also taking advantage of indexes that may be available within any external data sources that might be linked into a Manifold project. An index is a data structure that provides greater efficiency and speed for many database operations. An index on a field helps the database system find specific values in that field faster.
The performance gain from using an index is so great that Manifold requires tables to have an index (normally a btree or btreenull index) to be editable and to support Manifold's interactive selection capabilities (a different thing than using SELECT in SQL, which does not require an index). We can create indexes on fields in tables using the Schema dialog.
In general, if a field plays an important role in a query it should have an index on it. For example, key fields for JOINs or in the Join dialog should have an index on them. If we write a query where for a billion records we want to check if a particular field has a value within a given range, that field should have an index on it. If we write a query involving larger data, and the query runs unusually slowly, processing only a few records per second, despite appearing to be a simple query, that often indicates a lack of an index on one or more fields that play important roles in the query.
Must read: Please make sure to read the Add an Index to a Table topic as well. That topic covers the mfd_id field used within Manifold for indexes as well as other important info. See also the Collations topic to learn how mechanisms for managing order in different languages, called collations, are managed by Manifold.
The default index created by Manifold is a btree index, a classic balanced tree, B-tree, data structure which utilizes a single field in which there can be no NULLs and no duplicate values. Indexes can also be created using more than one field. We can even create an index on a computed field, so long as the table for the computed field is stored in the .map project (and not in an external database).
There are four forms of the basic btree index, referred to collectively as BTREExxx indexes, which cover the four possibilities for whether NULLs are allowed or not allowed and whether duplicates are allowed or not allowed.
Manifold uses rtree spatial indexes for spatial work with geometry fields in drawings and tile fields in rasters. Rtree indexes are balanced tree structures utilizing bounding rectangles, the R in rtree. In general, if a drawing's table does not have an rtree index on the geometry field the drawing will not be able to display the objects.
btree |
A classic B-tree, that is a balanced tree, index. A btree index allows no duplicates and no NULLs. |
btreedup |
A B-tree index that allows duplicates but does not allow NULLs. |
btreedupnull |
A B-tree index that allows duplicates and also allows NULLs. |
btreenull |
A B-tree index that allows NULLs but does not allow duplicates; that is, it requires all non-NULLs to be unique. |
rtree |
A spatial index that is a balanced tree structure utilizing bounding rectangles or boxes. |
A BTREExxx index sorts values from lowest to highest, is generally applied to numbers, strings, and dates, and helps optimize searches for operators such as = as well as operators such as IN, or BETWEEN AND, ome other operators.
An RTREE index sorts values spatially and can only be applied to geometry values. An RTREE index helps optimize searches for functions specific to the database, for example, functions such as GeomAdjacent, GeomContains, GeomIntersects, GeomTouches, GeomWithin, and other functions.
Whenever possible, indexes from input tables are exposed, that is, available for use in subsequent steps, in the results tables of various SQL statements. This makes writeback possible, that is, the editing of records by editing results tables and it improves performance in many ways:
SELECT exposes indexes on output fields from the underlying table. This provides much better performance in GROUP and ORDER using these indexes.
Replicating an indexed field in the SELECT list exposes multiple copies of the same index.
WHERE optimizes searches via indexes exposed by SELECT, by combining search criteria with the WHERE condition.
JOIN exposes indexes on output fields from the underlying tables. A LEFT outer join loses BTREExxx indexes on the right side. A RIGHT outer join loses BTREExxx indexes on the left side. A FULL outer join loses them on both sides.
The types of BTREExxx indexes exposed by a join can change depending on the join condition: For example, a BTREE index can become a BTREEDUP index if the joined table can include multiple copies of a record from the indexed table. RTREE indexes survive intact in all cases.
ORDER retains the index it uses to order whenever possible.
SELECT allows writing to the underlying table. The table exposed by SELECT must have a BTREE or BTREENULL index. If a particular field from the underlying table participates in the SELECT list twice, it becomes read-only.
JOIN allows writing to the underlying tables when the table exposed by JOIN has a BTREE or BTREENULL index. Attempting to edit at the same time fields within both joined tables will fail.
In most other database systems we normally think of results tables produced by an SQL query as read only, but as can be seen from the list above, in Manifold results tables are often editable. If the results table includes an indexed field, as is often the case with a SELECT or a JOIN statement, we can edit that results table and the edits made to field values automatically will be written back into the source tables. See the Editable Results Tables topic for details.
When is it useful to add an index? Consider a query such as:
SELECT t.value1, u.value2
FROM t, u
WHERE t.id=u.id:
We assume all fields are numbers so we consider only BTREExxx indexes. We can then say:
Adding an index on u.value3 is not going to make the query faster because that field does not participate in the query at all.
Adding an index on t.value1 is not going to help the query run faster, because when the database executes the query it does not search t for specific values of value1, it searches t (or u) for specific values of t.id or u.id and whatever is in t.value1 just gets copied into the output.
Adding an index on t.id tells the query engine "Just in case, if you happen to have a value of t.id, the table t can return other values for that record quickly." The query engine will use that tip to optimize its work and the query will run faster than without the index.
Adding an index on u.id in addition to the index on t.id will make the query even faster, and doubly so if the indexes are BTREE indexes, which disallow nulls and duplicates.
The answer to the general question, "Which field needs which index?" depends on the queries we are running and the nature of our data. That is a big topic on which books can be written (and have been written), but a quick and imprecise answer is that if we have a field which we primarily use to identify records and to refer to records from other tables, for example, an ID field or a RECORDNUM field or something similar, and if we are using that field in constructs such as WHERE t.ID = <another value possibly from another table>, or a field involved in an ON condition for a JOIN, we will probably want to add a BTREExxx index on that field.
If the values in the field cannot contain NULLs, we can use that to guide the index type we choose, for example, using a BTREE or a BTREEDUP index. If the field can contain NULLs we instead would use a BTREENULL or BTREEDUPNULL index. If values in the field cannot be duplicates, we would choose either BTREE or BTREENULL, and if they can contain duplicates we would use either BTREEDUP or BTREEDUPNULL.
If we are doing spatial searches on a particular field, then probably we would build an RTREE index on that field.
More than one field in an index - BTREExxx indexes can be built on more than one field. All BTREExxx indexes built on more than one field support searches by partial key.
Partial keys, indexes and WHERE - WHERE optimizes for conditions which engage BTREExxx indexes via a partial key. For example, SELECT ... WHERE a = ... runs faster if the source table has a BTREExxx index with a composite key built on a and other fields, with a coming first in the composite key.
RTREE indexes and spatial searches - RTREE indexes can use a secondary box as a hint for spatial searches, when the query engine exposes an RTREE index on a join that already filters by a spatial criteria.
Spatial indexes and tiles - Spatial indexes for tiles support a number of operations which help the user interface, such as computing bounds limited to visible pixels, computing approximate statistics, and so forth. Spatial indexes for tiles are supported in ,map files and in file dataports that store data in MAPCACHE files.
Selection or SELECT? Manifold allows interactive selection of records in a table using point and click commands or using the Select pane. That is a different thing than the use of the SELECT statement in SQL. See the Selection topic. SELECT works with any table whether or not there are indexes in the table. Selection requires the table to have an index.
Indexes or indices? Both are correct in English language as the plural form of index, with indexes more popular in the US. Computer people tend to use indexes, presumably due to the influence of Americanisms within computing. Given this documentation was written by many hands, some under the influence of American English and others influenced by UK English, both forms may appear.
Command Window - Query Builder
Example: Create a Table with a Constraint - Create a simple table that includes a simple constraint upon one of its fields.
Adding an Index to a Table - A basic topic on adding an index using the built-in capabilities of the mfd_id field.
Example: Add a Spatial Index to a Table - A typical use of an index is to provide a spatial index on a geom field in a table, so the geom data can be visualized in a drawing. This topic provides the step by step procedure for adding a spatial index.
Example: Add a UUID-based Index to a Table - Create a new computed field that is filled with UUID values on creation and then create an index on that field. This technique creates an indexed field that has guaranteed unique values for all records and thus the indexed field and record values may be used in other projects.