DBMS Data Sources - Notes

This topic that captures technical notes from engineering on using DBMS packages such as Oracle, PostgreSQL, SQL Server, and DB2, including ESRI SDE utilized within databases.  Notes may also refer to other, non-server DBMS packages, such as GPKG.  Notes provided here may be difficult to find in other topics or not yet incorporated into other topics.  They are provided here so search engines can find them.

 

For the current levels of DBMS products that are supported, see the Third Party Release Levels  topic.

Specifying ports when connecting to DBMS data sources

When connecting to a DBMS we can specify the port to use either in the connection string or within the name of the server after a colon.   For example, when connecting to a PostgreSQL server we can write for the name of server:

 

localhost:5433

 

The dialog will know we are connecting to PostgreSQL, a database that allows explicit specification of a port to use for the connection, and it will split out the port number from the name automatically.   Alternatively we could explicitly specify the port as part of the connection string, in the form:

 

{ "Source": "host=localhost port=5433 user=postgres password=xxxx dbname=xxxx" }

Security

The best option for secure database connections is to use integrated security, where the Windows login through which a connection is made is mapped into user accounts within the DBMS, to which all the power and sophistication of roles and privileges can apply.  When using integrated security, no passwords or other credentials appear within connection strings.  Most DBMS packages support the use of integrated security.  To keep the presentation as simple as possible, except for the SQL Server example, integrated security is not used in DBMS topics.

 

Other options leverage the extensive security facilities already built into Window and databases.   If we do not use integrated security, but instead create a data source that uses login and password credentials we should take advantage of Windows security to protect those credentials.   It should go without saying that if we have access to a database in a particular role, we must know the password for the database user login that we will use.   We therefore should not fear using that password in plain text in the connection string if we have protected our credentials using the usual Windows security features.

 

For example, if we do not want other users on our computer to open the project and to use our access privileges to reach into the database, then we should Windows security features to ensure other users on the machine do not have permission to use the .map file.   If they have permissions to launch the .map file, it does not matter whether the password is stored in the connection string as plain text or in encrypted form: they still get the same level of access to the database when they connect through the data source using our credentials.  

 

Another way to enforce security is to save the connection to the database in a Windows DSN file as a user DSN.  User DSNs are specific to a particular user.   We then create a data source in Manifold by using File: dsn as the type in the New Data Source dialog, so that no connection strings are stored in the .map file.  Using a DSN has the advantage that we need only specify desired access to that one file to protect credentials, instead of to every .map project file we create.  In a well-administered and well-organized Windows installation, access permissions and other security features will automatically be applied based on the role of our login.  

Identity Fields

Views on PostgreSQL and other databases allow specifying a field to use as an identity field. Doing this allows records in the view table to be selected, and, if the view supports that, edited or deleted. To specify a field to use as an identity field, right-click the field header and choose Use as Identity. The system will scan all values in the field to make sure there are no duplicates or NULLs and if the values are clean, the table will expose a BTREE index on the field. The information about the field used as an identity field is saved into the database, so attempting to use the table in future Manifold sessions will automatically expose a BTREE index on that field without further scans. The identity option can be cleared or moved to a different field. The view can have only one identity field specified in this way.

 

Using a view with the identity field specified via the Use as Identity command in the table window monitors reads via the index. If the system detects that some of the values in the identity field are duplicates or NULLs, it automatically removes the identity option from the field and reports the removal in the log window.

VARCHAR and NVARCHAR Data Types for Databases

Dataports for databases expose all text fields as NVARCHAR (Unicode) even if they are stored as VARCHAR.   Attempting to create a VARCHAR field will create it as VARCHAR on the database, but the field will look like an NVARCHAR data type in Manifold.   What VARCHAR means varies between data sources and converting between different meanings frequently loses data, so this convention by Manifold helps to preserve data.  

 

Consider an example:  In Manifold, VARCHAR means 'characters in the currently active codepage, whatever that currently active codepage may be.'   However, in databases VARCHAR usually means 'characters in the codepage associated with the field / table / database'.    If we connect to a database, we can ask it to return data as VARCHAR, and the database might return data in a German codepage if a German code page was used for that field.  If we then try to use that data as VARCHAR in Manifold on an English system, the characters will be interpreted wrongly.  That will affect both correct display of the characters as well as comparisons and orderings.   

 

Different client systems use different ways of dealing with such issues.   Manifold 8, for example, allows setting a codepage for the field.  However, the end result of such adaptations is to convert the data in all codepages different from the default code page into Unicode and then handle data as Unicode.  If Unicode ends up being the intermediate form, it is better, as Manifold now does, to simply do the conversion as close to the data as possible.  

Geometry Collections

Reading geometry collection values automatically merges individual values of the same underlying type used in Manifold geometry, such as area, line or point, with differences between subtypes such as line and multiline being ignored.   The result of the merge is returned.   This applies to all data which support geometry collection values, including WKB, GeoJSON, JSON, native geometry in database-specific formats, and so on.

 

Reading geometry collection values with individual values of mixed underlying types automatically converts areas to lines and lines to points in order to return all coordinates. Example: reading a geometry collection with an area and several points will return a multipoint containing all coordinates of all individual values.

Collations

Manifold supports collations used within ICU, a library for all things Unicode developed and maintained by IBM.  ICU is used by many, if not all, significant applications that support Unicode, including some DBMS products.  Manifold implements custom collations currently used by the more common and more popular DBMS products, such as: 

 

 

System Data Table within Data Sources

When creating a data source on an external database, for example, in PostgreSQL or Oracle or some other DBMS, Manifold creates a virtual System Data folder that hosts two to four virtual tables, created on the fly for housekeeping.  These tables appear to be part of the data source, but do not exist within the data source and instead are created on the fly for our convenience as if they were in the data source.  They allow treating various external databases as if they were native Manifold components.  

 

An mfd_meta and mfd_root table are always in such a virtual System Data folder. Usually an mfd_srid table listing coordinate systems as spatial reference identifiers also appears, if the database uses SRIDs.    In addition, if the database supports collations an mfd_collate table appears, listing all the collations available in the database.

The mfd_collate System Data Table

When connecting to a DBMS, the database dataport will list available external collations in an mfd_collate table appears within the System Data folder in the data source.  Each collation has a unique name and a definition that can be used in a query. Unsupported collations have NULL definitions.

 

 

For example, creating a new data source from a new PostgreSQL installation we will see the System Data folder as above. In addition to the usual mfd_meta and mfd_root tables, it also contains an mfd_collate table.  These are virtual tables maintained by the dataport and not actually created within the PostgreSQL database, even though they appear within the PostgreSQL data source.

 

 

If we open the mfd_collate table we see it lists three available external collations, named pg_catalog.POSIX, pg_catalog.C, and pg_catalog.default.   The collation definitions we can use in Manifold queries are given in the Value field for each collation.  The table uses gray background to indicate it is read-only.

 

See the Collations topic for details on collations.

The mfd_srid System Data Table

When connecting to a DBMS that uses spatial reference identifiers also appears, an mfd_srid table listing coordinate systems as spatial reference identifiers will appear within the System Data folder in the data source.

 

 

For example, creating a new data source from a new PostgreSQL installation that has had PostGIS extension installed, we will see an mfd_srid table within the System Data folder as shown above. This is a virtual table maintained by the dataport and not actually created within the PostgreSQL database, even though it appears within the PostgreSQL data source.

 

 

If we open the mfd_srid table we see it lists SRIDs within the database, many of which have been derived from EPSG codes and definitions.  The table uses gray background to indicate it is read-only.

 

See the Projections topic to start drilling into coordinate systems / projections.  

 

Miscellaneous

 

SELECT * FROM [mysql]::[Products];

 

 

COPY in general takes much less time than INSERT.  Eight to ten times faster record inserts are common.

COPY is much friendlier to slow connections, where it saves even more time due to significantly reduced round trips.

 

 

--SQL9

SELECT [area], [code], [name], [geom]

INTO [gdb]::[states] (

  PROPERTY 'FieldGeomType.geom' 'area' -- the default would be 'point'

) FROM [states];

 

For databases that limit geometry types to a single type, values can be area, line, point, or pointmulti).  The any type is also allowed for other databases, but is the default with no need explicitly to specify any.  

 

Notes

'Database' vs 'data source' - In Manifold documentation and build notes, these two terms are frequently more or less the same thing. Data source is a collective name for files,  databases, or web servers which provide data.   When we start Manifold and create a new .map file, we set that .map file to be an unnamed root data source, 'root' because it is a root of a potential data source tree. When we link a SHP file, we do so by creating a new data source component, that is, a new data source.   That data source behaves like a database in that it exposes data through tables, and it has system tables with reserved names which describe its contents. The object representing an opened data source for a script is usually called Database, and in other ways data sources are treated as databases within Manifold.

 

Compacting Data Sources - If a data source can be compacted, right-clicking on that data source in the Project pane will show a Compact context menu command.  If the data source does not support compacting, the command will not appear.  In addition to the context menu Compact command that can be used manually, the Manifold object model includes means to compact a data source, to check whether a data source supports compacting, to check whether a data source has data to compact or has unsaved changes.

 

Constraints - The Constraint dialog in the Add - Constraint choice of the Schema dialog allows choosing a not-NULL template for a field, as long as the database hosting the table supports not-NULL constraints, as most databases do.

 

Oracle Spatial Index - Creating a spatial index for a table hosted on an Oracle database uses SPATIAL_INDEX_V2, if available, instead of SPATIAL_INDEX.  See the Schema topic.

 

Unique Index Names - Manifold allows the same name for an index to be used in different tables, while Oracle and DB2 require the name of an index to be unique throughout the entire DBMS.  If we have linked an Oracle database into our project and we are adding indexes to tables within that database, we cannot add an index named cities_x to a table if some other table in that Oracle database already uses that name.  Instead, the Schema dialog will automatically generate a unique name for that index using a GUID.

 

BTREE Uniqueness - Btree indexes can imply uniqueness requirements, for example, with only a given value stored once for a primary key.  The system implements such uniqueness requirements for BTREExxx indexes for all databases using UNIQUE indexes instead of UNIQUE constraints, to better match semantics used by the system.

 

Mixed Geometry Types - Copying and pasting a table with geometry data from a .map file into a data source that does not allow mixing geometry types within a single table, for example, GDB,  automatically sets the geometry type of pasted data to that used by most records.   This is a way of ensuring that tables which use only one type (points, lines, areas) have the type that they use correctly specified in the destination data source.

See Also

Collations

 

Schema

 

Project Pane

 

File - Create - New Data Source

 

Command Window

 

Real and Virtual Components

 

Microsoft Access and Manifold ODBC

 

Connect to MySQL

 

Connect to SQL Server

 

Connect to PostgreSQL

 

Connect to Oracle

 

Connect to Db2

 

Big List of Formats and Data Sources

 

Example: Switching between Manifold and Native Query Engines