This topic that captures technical notes from engineering on using DBMS packages such as Oracle, PostgreSQL, SQL Server, and DB2, including ESRI SDE utilized within databases. Notes may also refer to other, non-server DBMS packages, such as GPKG. Notes provided here may be difficult to find in other topics or not yet incorporated into other topics. They are provided here so search engines can find them.
For the current levels of DBMS products that are supported, see the Third Party Release Levels topic.
When connecting to a DBMS we can specify the port to use either in the connection string or within the name of the server after a colon. For example, when connecting to a PostgreSQL server we can write for the name of server:
localhost:5433
The dialog will know we are connecting to PostgreSQL, a database that allows explicit specification of a port to use for the connection, and it will split out the port number from the name automatically. Alternatively we could explicitly specify the port as part of the connection string, in the form:
{ "Source": "host=localhost port=5433 user=postgres password=xxxx dbname=xxxx" }
The best option for secure database connections is to use integrated security, where the Windows login through which a connection is made is mapped into user accounts within the DBMS, to which all the power and sophistication of roles and privileges can apply. When using integrated security, no passwords or other credentials appear within connection strings. Most DBMS packages support the use of integrated security. To keep the presentation as simple as possible, except for the SQL Server example, integrated security is not used in DBMS topics.
Other options leverage the extensive security facilities already built into Window and databases. If we do not use integrated security, but instead create a data source that uses login and password credentials we should take advantage of Windows security to protect those credentials. It should go without saying that if we have access to a database in a particular role, we must know the password for the database user login that we will use. We therefore should not fear using that password in plain text in the connection string if we have protected our credentials using the usual Windows security features.
For example, if we do not want other users on our computer to open the project and to use our access privileges to reach into the database, then we should Windows security features to ensure other users on the machine do not have permission to use the .map file. If they have permissions to launch the .map file, it does not matter whether the password is stored in the connection string as plain text or in encrypted form: they still get the same level of access to the database when they connect through the data source using our credentials.
Another way to enforce security is to save the connection to the database in a Windows DSN file as a user DSN. User DSNs are specific to a particular user. We then create a data source in Manifold by using File: dsn as the type in the New Data Source dialog, so that no connection strings are stored in the .map file. Using a DSN has the advantage that we need only specify desired access to that one file to protect credentials, instead of to every .map project file we create. In a well-administered and well-organized Windows installation, access permissions and other security features will automatically be applied based on the role of our login.
Views on PostgreSQL and other databases allow specifying a field to use as an identity field. Doing this allows records in the view table to be selected, and, if the view supports that, edited or deleted. To specify a field to use as an identity field, right-click the field header and choose Use as Identity. The system will scan all values in the field to make sure there are no duplicates or NULLs and if the values are clean, the table will expose a BTREE index on the field. The information about the field used as an identity field is saved into the database, so attempting to use the table in future Manifold sessions will automatically expose a BTREE index on that field without further scans. The identity option can be cleared or moved to a different field. The view can have only one identity field specified in this way.
Using a view with the identity field specified via the Use as Identity command in the table window monitors reads via the index. If the system detects that some of the values in the identity field are duplicates or NULLs, it automatically removes the identity option from the field and reports the removal in the log window.
Dataports for databases expose all text fields as NVARCHAR (Unicode) even if they are stored as VARCHAR. Attempting to create a VARCHAR field will create it as VARCHAR on the database, but the field will look like an NVARCHAR data type in Manifold. What VARCHAR means varies between data sources and converting between different meanings frequently loses data, so this convention by Manifold helps to preserve data.
Consider an example: In Manifold, VARCHAR means 'characters in the currently active codepage, whatever that currently active codepage may be.' However, in databases VARCHAR usually means 'characters in the codepage associated with the field / table / database'. If we connect to a database, we can ask it to return data as VARCHAR, and the database might return data in a German codepage if a German code page was used for that field. If we then try to use that data as VARCHAR in Manifold on an English system, the characters will be interpreted wrongly. That will affect both correct display of the characters as well as comparisons and orderings.
Different client systems use different ways of dealing with such issues. Manifold 8, for example, allows setting a codepage for the field. However, the end result of such adaptations is to convert the data in all codepages different from the default code page into Unicode and then handle data as Unicode. If Unicode ends up being the intermediate form, it is better, as Manifold now does, to simply do the conversion as close to the data as possible.
Reading geometry collection values automatically merges individual values of the same underlying type used in Manifold geometry, such as area, line or point, with differences between subtypes such as line and multiline being ignored. The result of the merge is returned. This applies to all data which support geometry collection values, including WKB, GeoJSON, JSON, native geometry in database-specific formats, and so on.
Reading geometry collection values with individual values of mixed underlying types automatically converts areas to lines and lines to points in order to return all coordinates. Example: reading a geometry collection with an area and several points will return a multipoint containing all coordinates of all individual values.
Manifold supports collations used within ICU, a library for all things Unicode developed and maintained by IBM. ICU is used by many, if not all, significant applications that support Unicode, including some DBMS products. Manifold implements custom collations currently used by the more common and more popular DBMS products, such as:
The SQL Server dataport supports binary collations and collations based on Windows. This covers the vast majority of collations available on SQL Server.
The PostgreSQL dataport supports binary collations, collations based on Windows and collations based on ICU. This covers the vast majority of collations available on PostgreSQL.
Creating a primary key on an existing text field in PostgreSQL always uses the collation of the field. This became a requirement in PostgreSQL 13. Secondary indexes are able to use collations different from those of fields.
The MySQL dataport supports binary collations, collations in the 'latin' family and collations based on ICU. This covers a sizeable portion of the most commonly used collations.
Creating an index on an existing text field in MySQL or SQL Server always uses the collation of the field and does not attempt to change it. When adding an index on a field that is not part of any other index the system shows, for a likely edit, the collation of the field.
The Oracle dataport supports collations for text fields. Recognized collations are listed in the system mfd_collate table.
The DB2 dataport supports collations for text fields with some limits: We can use collations when reading, but we cannot create indexes with collations, due to the current limitations of the latest versions of DB2. DB2 is in the process of migrating to doing everything in Unicode and that process is not yet complete. Recognized collations are listed in the system mfd_collate table.
When creating a data source on an external database, for example, in PostgreSQL or Oracle or some other DBMS, Manifold creates a virtual System Data folder that hosts two to four virtual tables, created on the fly for housekeeping. These tables appear to be part of the data source, but do not exist within the data source and instead are created on the fly for our convenience as if they were in the data source. They allow treating various external databases as if they were native Manifold components.
An mfd_meta and mfd_root table are always in such a virtual System Data folder. Usually an mfd_srid table listing coordinate systems as spatial reference identifiers also appears, if the database uses SRIDs. In addition, if the database supports collations an mfd_collate table appears, listing all the collations available in the database.
When connecting to a DBMS, the database dataport will list available external collations in an mfd_collate table appears within the System Data folder in the data source. Each collation has a unique name and a definition that can be used in a query. Unsupported collations have NULL definitions.
For example, creating a new data source from a new PostgreSQL installation we will see the System Data folder as above. In addition to the usual mfd_meta and mfd_root tables, it also contains an mfd_collate table. These are virtual tables maintained by the dataport and not actually created within the PostgreSQL database, even though they appear within the PostgreSQL data source.
If we open the mfd_collate table we see it lists three available external collations, named pg_catalog.POSIX, pg_catalog.C, and pg_catalog.default. The collation definitions we can use in Manifold queries are given in the Value field for each collation. The table uses gray background to indicate it is read-only.
See the Collations topic for details on collations.
When connecting to a DBMS that uses spatial reference identifiers also appears, an mfd_srid table listing coordinate systems as spatial reference identifiers will appear within the System Data folder in the data source.
For example, creating a new data source from a new PostgreSQL installation that has had PostGIS extension installed, we will see an mfd_srid table within the System Data folder as shown above. This is a virtual table maintained by the dataport and not actually created within the PostgreSQL database, even though it appears within the PostgreSQL data source.
If we open the mfd_srid table we see it lists SRIDs within the database, many of which have been derived from EPSG codes and definitions. The table uses gray background to indicate it is read-only.
See the Projections topic to start drilling into coordinate systems / projections.
Creating a new component on a data source which adjusts names of created components (commonly seen with databases which prepend a schema name) captures the final created name and correctly selects the created component in the Project pane.
Queries representing database views report their schemas. The Project pane allows viewing the schema of such a query by right-clicking it and selecting the Schema command.
Refreshing a table in a database data source automatically reloads table metadata from the database to catch changes done in other sessions, for example, in a different instance of Manifold connected to the same database.
The DB2 dataport reports schemas for materialized views.
The Oracle dataport treats materialized views as queries, not as tables.
The PostgreSQL dataport allows accessing materialized views and reports their schemas.
The GPKG dataport returns schemas for views: that allows using views as a base for drawings or for images.
Dataports for PostgreSQL and other databases detect writable views and allow writes to them.
The SQL Server dataport tries to resolve the SRID of geometry values in a table to the system MFD_SRID table, which is created based on a SQL Server view. If an SRID cannot be resolved, it is interpreted as an EPSG code, according to Microsoft guidelines and similar to how such codes are used in practice by Manifold 8 and other products.
The SQL Server dataport supports DATETIMEOFFSET and DATETIME2 data types. DATETIMEOFFSET values are automatically converted to the timezone for the current user on both read and write. Reading a date with a timezone converts data to the timezone for the current user. Entering a date without a timezone will use the timezone for the current user.
Dataports for SQL Server and other databases map unrecognized field types to NVARCHAR. This allows working with partial data in tables that contain fields which the database driver cannot work with, which may happen due to a bug, misconfiguration, or version skew.
SQLITE / GPKG dataports keep binary values in non-binary fields as binary. This is specific to SQLITE, which allows values of different underlying types within the same field.
Jet XLS databases support boolean fields.
When creating a new table on DB2, Jet, MySQL, Oracle, PostgreSQL, and SQL Server, names that are longer than allowed by the server automatically will be reduced to the name length allowed by the DBMS.
PostgreSQL, Jet DB, Jet WKx, Jet XLS, and ADO.NET databases support UUID fields as a native UUID type. MySQL, SQLite, Oracle, and DB/2 databases support UUID fields as a fixed-length string type.
MySQL databases support unsigned integer types.
MySQL databases support both utf8mb4 and utf8 / utf8mb3 Unicode types.
Dataports for SQL Server, Oracle, DB2, PostgreSQL, GPKG, MySQL, MDB and other DBMS packages that support transactions wrap batch insert, batch delete, and batch update operations into transactions for performance and robustness, for example, for safe rollback in case an operation fails mid-way.
Dataports for all databases automatically break big transactions for batch operations into manageable portions. Without this feature, most databases will fail with an overflow error when the amount of data to be committed exceeds a certain limit. The limit usually is too small for GIS work, often being in the range of 15,000 to 20,000 records or 50 MB to 100 MB of record data. Those values are far too low for GIS work so Manifold protects against overflows.
When writing to MDB, the MDB dataport automatically adjusts the names of newly inserted tables to fit into the limits imposed by MDB, for example, no leading spaces, no double quotations, no periods, and so on.
OLE DB, ODBC, and ADO.NET databases expose Manifold system tables.
Many databases make a distinction between NULL binary values and empty binary values. Manifold does not make such a distinction and this may be a problem in some cases. For example, copying a table with empty binary values and a non-NULL constraint on the relevant field from PostgreSQL into a MAP file will currently fail on the first empty value encountered. These issues will be addressed in future builds.
When writing queries, we can refer to tables within a data source using the syntax [datasource]::[table]. For example, we could write a query that selects all records from the Products table within data source called mysql as:
SELECT * FROM [mysql]::[Products];
COPY in general takes much less time than INSERT. Eight to ten times faster record inserts are common.
COPY is much friendlier to slow connections, where it saves even more time due to significantly reduced round trips.
--SQL9
SELECT [area], [code], [name], [geom]
INTO [gdb]::[states] (
PROPERTY 'FieldGeomType.geom' 'area' -- the default would be 'point'
) FROM [states];
For databases that limit geometry types to a single type, values can be area, line, point, or pointmulti). The any type is also allowed for other databases, but is the default with no need explicitly to specify any.
'Database' vs 'data source' - In Manifold documentation and build notes, these two terms are frequently more or less the same thing. Data source is a collective name for files, databases, or web servers which provide data. When we start Manifold and create a new .map file, we set that .map file to be an unnamed root data source, 'root' because it is a root of a potential data source tree. When we link a SHP file, we do so by creating a new data source component, that is, a new data source. That data source behaves like a database in that it exposes data through tables, and it has system tables with reserved names which describe its contents. The object representing an opened data source for a script is usually called Database, and in other ways data sources are treated as databases within Manifold.
Compacting Data Sources - If a data source can be compacted, right-clicking on that data source in the Project pane will show a Compact context menu command. If the data source does not support compacting, the command will not appear. In addition to the context menu Compact command that can be used manually, the Manifold object model includes means to compact a data source, to check whether a data source supports compacting, to check whether a data source has data to compact or has unsaved changes.
Constraints - The Constraint dialog in the Add - Constraint choice of the Schema dialog allows choosing a not-NULL template for a field, as long as the database hosting the table supports not-NULL constraints, as most databases do.
Oracle Spatial Index - Creating a spatial index for a table hosted on an Oracle database uses SPATIAL_INDEX_V2, if available, instead of SPATIAL_INDEX. See the Schema topic.
Unique Index Names - Manifold allows the same name for an index to be used in different tables, while Oracle and DB2 require the name of an index to be unique throughout the entire DBMS. If we have linked an Oracle database into our project and we are adding indexes to tables within that database, we cannot add an index named cities_x to a table if some other table in that Oracle database already uses that name. Instead, the Schema dialog will automatically generate a unique name for that index using a GUID.
BTREE Uniqueness - Btree indexes can imply uniqueness requirements, for example, with only a given value stored once for a primary key. The system implements such uniqueness requirements for BTREExxx indexes for all databases using UNIQUE indexes instead of UNIQUE constraints, to better match semantics used by the system.
Mixed Geometry Types - Copying and pasting a table with geometry data from a .map file into a data source that does not allow mixing geometry types within a single table, for example, GDB, automatically sets the geometry type of pasted data to that used by most records. This is a way of ensuring that tables which use only one type (points, lines, areas) have the type that they use correctly specified in the destination data source.
File - Create - New Data Source
Microsoft Access and Manifold ODBC
Big List of Formats and Data Sources
Example: Switching between Manifold and Native Query Engines