SQL Example: Extract Airport Runways from an OpenStreetMap PBF

We write a simple SQL query using INNER JOIN to extract runway lines from an OpenStreetMap PBF of Cyprus, and to save those lines to a new drawing and table.

 

OpenStreetMap (OSM) has the tremendous virtue of providing vast amounts of free data worldwide, but it has the disadvantage of using  native data structures designed primarily to support the server-side, distributed internal functioning of OSM.  When published in what has become the de facto native export format for OSM, PBF, the data relationships can be difficult to understand in comparison to typical GIS data sets, and often require some processing to extract features of interest.

 

In this example we download a PBF with complete OSM data for the island of Cyprus.  We then extract runway lines from the tables.   We use Cyprus because the PBF for Cyprus is small, easily downloaded by anyone who wants to repeat this example, and the workflow shown in this topic can rapidly be done even on older and weaker computers.   

 

Faster workflow encourages experimentation, such as adapting the workflow in this topic to extract other features of interest.  Once we feel confident in our abilities and in the results of tests on small PBF files like that for Cyprus, we can tackle larger countries with multi-hundred MB PBF files that expand into projects that are many gigabytes in size, and which can take hours to import and many minutes, or even hours, to process.

Download PBF File

We visit the geofabrik.de website for European downloads, at https://download.geofabrik.de/europe.html

 

From that site we download the PBF file for Cyprus, from https://download.geofabrik.de/europe/cyprus-latest.osm.pbf

 

Websites and links change frequently.  If the links above no longer work, any good search engine will find resources for downloading OSM data as PBF files.

 

The Cyprus PBF file is just over 15 MB and downloads quickly.   Thank you, Geofabrik, for providing this service!

Import

We launch Manifold and choose File - Import.

 

 

In the Import dialog, we navigate to the folder to which we downloaded the file, and we double-click on cyprus-latest.osm.pbf.   Manifold automatically recognizes it as a PBF and imports it.

 

 

The main drawing is the cyprus-latest Drawing, using the table cyprus-latest.   Other tables provide supplementary information associated with the objects in that drawing.   

 

Current Manifold builds have a limit of 2 billion records per table, which can be exceeded when importing PBF files that provide OSM data for the entire planet at once.   To avoid that limit, Manifold imports the PBF with separate tags tables, with tags tables for addresses, buildings, sources, and tags table for everything else, called cyprus-latest Tags.  

 

Our task is to use the information in the cyprus-latest Tags table to find OSM id values for records for where the key is aeroway and the value is runway, and to then use a simple JOIN to get the object records for those id values from the cyprus-latest table.

 

The query to do that is very simple.  However, as often the case in GIS, much of this topic is workflow to prepare to run the query, such as ensuring the tables we use have the indexes we want to support fast processing.

Configure Indexes

As imported from the PBF, the tables we will be using do not have indexes on fields we will use in the join, so we will add those.  That is easy to do with a few clicks in the Schema dialog.

 

We first will adjust the cyprus-latest table, and then we will adjust the cyprus-latest Tags table.

 

 

In the Project pane, we double-click the cyprus-latest table to open it.    Since this table is used to create a drawing, on import Manifold automatically adds a guaranteed unique mfd_id key field and an index on that.   However, the table does not have an index on the id field that we will use in the join operation later on in this topic.

 

We will add an index on the id field.   With the focus on the opened table, in the main menu we choose Edit - Schema.  

 

 

In the Schema dialog, we press the Add command button and choose Index.  

 

 

In the Index dialog, we specify id_x as the Name (a memorable name for an index on the id field), we choose regular index (btree) as the Type of index, and we check Allow duplicate values. In the pull down menu for the first Field box we choose id.

 

Press OK.

 

 

The new index is shown in provisional blue color, as something that will be added if we press Save Changes.   We double-check our work and press Save Changes.

 

Next, in the Project pane we double-click open the cyprus-latest Tags table.    We will add indexes on the id field, on the key field, and on the value field.   The id field will be used in the join, and the key field and the value field will be used in a WHERE clause.

 

 

With the focus on the open cyprus-latest Tags table, in the main menu we choose Edit - Schema.

 

 

In the Schema dialog, we press the Add command button and choose Index.  

 

 

As we did for the prior table, in the Index dialog, we specify id_x as the Name, we choose regular index (btree) as the Type of index, and we check Allow duplicate values. In the pull down menu for the first Field box we choose id.

 

Press OK.

 

 

The new index is shown in provisional blue color, as something that will be added if we press Save Changes.   However, we have two more indexes to add.   First we will add an index for the key field.

 

 

We press the Add command button and choose Index.  

 

 

In the Index dialog, we specify key_x as the Name, we choose regular index (btree) as the Type of index, we check Allow duplicate values, and we also check Allow null values.  In the pull down menu for the first Field box we choose key.

 

Press OK.

 

 

A new row appears for the proposed key_x index.

 

We repeat the procedure above, pressing Add, choosing Index and so on, remembering to check Allow duplicate values and Allow null values, to add an index called value_x on the value field.

 

 

We now have three indexes we propose to add to the table.     We double-check our work and press Save Changes.

Create a Query

With the above workflow we have added indexes to the tables we will use in the query that extracts runway features.   We now will create that query.

 

In the main menu, we choose File - Create - New Query.

 

 

In the New Query dialog we enter Extract Runways as the name for our new query.   Press Create Query.

 

 

The new query appears in the Project pane.   Double-click the new query to open it in a Command window.  

 

The query opens with default, stub text.   We replace that text with the query text we would like to use.

 

 

The SQL text we enter is:

 

CREATE TABLE [cyprus runways Table] (

  [mfd_id] INT64,

  [id] INT64,

  [type] INT8,

  [version] INT32,

  [name] NVARCHAR,

  [Geom] GEOM,

  [value] NVARCHAR,

  INDEX [mfd_id_x] BTREE ([mfd_id]),

  INDEX [Geom_x] RTREE ([Geom]),

  PROPERTY 'FieldCoordSystem.Geom' ComponentCoordSystem([cyprus-latest Drawing])

);

CREATE DRAWING [cyprus runways] (

  PROPERTY 'Table' '[cyprus runways Table]',

  PROPERTY 'FieldGeom' 'Geom'

);

PRAGMA ('progress.percentnext' = '100');

INSERT INTO [cyprus runways Table] (

  [id], [type], [version], [name], [Geom], [value]

)

SELECT [cyprus-latest].id,  [cyprus-latest].type,

   [cyprus-latest].version, [cyprus-latest].name,

   [cyprus-latest].Geom, [Right].value

   FROM [cyprus-latest]

   INNER JOIN

      (SELECT [id], [value] FROM [cyprus-latest Tags]

         WHERE [key] = 'aeroway' AND [value] = 'runway')

         AS [Right]

   ON [cyprus-latest].[id] = [Right].[id] AND [cyprus-latest].[type] = 1;

 

The query generates a simple drawing and table, with only one additional field added from the tags table, the value field.  For discussion on how the query functions, see the How the Query Works section at the end of this topic.

 

We run the query by pressing the ! Run button in the main toolbar.

 

 

The results table reported in the Command Window is a single value of 23, meaning there were 23 records inserted into the new table and drawing the query created.

 

 

The new table and drawing appear in the Project pane.     We could double-click the drawing to take a look at it, but to better see it in context we will create a map that uses it.

 

In the main menu, we choose File - Create - New Map.

 

 

The New Map dialog shows layers available for use in a new map.  We enter Runways Map as a Name, we choose Bing streets as a base layer, and we check the boxes for the cyprus-runways and cyprus-latest Drawing layers.

 

Press Create Map.

 

 

A new Runways Map appears in the Project pane.  We double-click it to open it.

 

 

We zoom into the view above, and we double-click the cyprus-latest Drawing layer tab to turn off that layer.   

 

We use the Style pane to style the runway lines as thick green lines with black borders on the lines.

 

 

Zooming in to the airport by Nicosia, we see it has two runways.

 

 

We can turn on the cyprus-latest Drawing layer to see the mass of objects from which we extracted the runway lines.

 

 

Turning off the cyprus runways layer, and selecting the corresponding lines in the cyprus-latest Drawing layer to highlight them in red (this requires adding an mfd_id index to the table, a single click to turn on selection for that layer)  we can see which lines were picked out as runway lines by our query.

 

 

The runway lines at the Nicosia airport might have been more or less obvious by visual inspection, but the many other small airport lines scattered throughout Cyprus would have been very hard to find manually.   Above we see a small airport in a mostly unpopulated part of the island, that would have been very difficult to find manually.  We have added a Bing satellite image server layer for a satellite image background.

How the Query Works

Consider the query we wrote, to extract runway lines:

 

 

The SQL text is:

 

CREATE TABLE [cyprus runways Table] (

  [mfd_id] INT64,

  [id] INT64,

  [type] INT8,

  [version] INT32,

  [name] NVARCHAR,

  [Geom] GEOM,

  [value] NVARCHAR,

  INDEX [mfd_id_x] BTREE ([mfd_id]),

  INDEX [Geom_x] RTREE ([Geom]),

  PROPERTY 'FieldCoordSystem.Geom' ComponentCoordSystem([cyprus-latest Drawing])

);

CREATE DRAWING [cyprus runways] (

  PROPERTY 'Table' '[cyprus runways Table]',

  PROPERTY 'FieldGeom' 'Geom'

);

PRAGMA ('progress.percentnext' = '100');

INSERT INTO [cyprus runways Table] (

  [id], [type], [version], [name], [Geom], [value]

)

SELECT [cyprus-latest].id,  [cyprus-latest].type,

   [cyprus-latest].version, [cyprus-latest].name,

   [cyprus-latest].Geom, [Right].value

   FROM [cyprus-latest]

   INNER JOIN

      (SELECT [id], [value] FROM [cyprus-latest Tags]

         WHERE [key] = 'aeroway' AND [value] = 'runway')

         AS [Right]

   ON [cyprus-latest].[id] = [Right].[id] AND [cyprus-latest].[type] = 1;

 

 

A Manifold query can contain more than one statement ending in a semicolon ; character, with each statement being executed in sequence.  The query above contains four statements: two CREATE statements, a PRAGMA, and then finally an INSERT INTO.

 

The query has two main parts:  the first part creates a table and a drawing, and the second part populates the table (which automatically, of course, populates the drawing).    

 

Let us unpack the query to see how it works.   The first part is simple:

 

CREATE TABLE [cyprus runways Table] (

  [mfd_id] INT64,

  [id] INT64,

  [type] INT8,

  [version] INT32,

  [name] NVARCHAR,

  [Geom] GEOM,

  [value] NVARCHAR,

  INDEX [mfd_id_x] BTREE ([mfd_id]),

  INDEX [Geom_x] RTREE ([Geom]),

  PROPERTY 'FieldCoordSystem.Geom' ComponentCoordSystem([cyprus-latest Drawing])

);

CREATE DRAWING [cyprus runways] (

  PROPERTY 'Table' '[cyprus runways Table]',

  PROPERTY 'FieldGeom' 'Geom'

);

PRAGMA ('progress.percentnext' = '100');

 

The first two CREATE statements create a table and then create a drawing based on that table.   The table is created with six fields.   Two indexes are created, one on id and one on Geom.   The table is created using the same coordinate system as the cyprus-latest Drawing drawing.

 

Drawings in Manifold contain no data but just consist of a minimum of two properties that say which table they should use for their geometry and which field in that table contains the geometry.   Creating a drawing in Manifold in SQL is thus trivial.

 

The PRAGMA statement creates a progress bar that shows progress as the rest of the query proceeds.

 

The second, main part of the query finds records in the tags table that correspond to runways, joins them into the table with geometry to create a results table of objects that are runways, and then inserts those into the newly-created table:

 

INSERT INTO [cyprus runways Table] (

  [id], [type], [version], [name], [Geom], [value]

)

SELECT [cyprus-latest].id,  [cyprus-latest].type,

   [cyprus-latest].version, [cyprus-latest].name,

   [cyprus-latest].Geom, [Right].value

   FROM [cyprus-latest]

   INNER JOIN

      (SELECT [id], [value] FROM [cyprus-latest Tags]

         WHERE [key] = 'aeroway' AND [value] = 'runway')

         AS [Right]

   ON [cyprus-latest].[id] = [Right].[id] AND [cyprus-latest].[type] = 1;

 

The core part of the query is a SELECT statement that accomplishes an INNER JOIN, and within that is a SELECT that finds records in the tags table using WHERE to find only those that are aeroways and are runways.   

 

The result of that SELECT, finding records in the tags table that are runways, is used in the SELECT ... INNER JOIN.    The ON condition is not just a match on the [id] field common to both tables, it also includes an AND clause to restrict matches to only those that have [type] of 1 in the cyprus-latest table.  OSM encodes the type of object using a type of 1 for lines.  We want only airport runway lines, not objects which have the same id and which are tagged as runways, but which may be points or areas.  

 

The result of the SELECT ... INNER JOIN in turn is used to feed the INSERT INTO, populating the new table (and thus the new drawing) that the query creates.

 

Videos

Join Videos

 

Find Percentages of Open Space in ZIP Code Area - Find the percentage of open space in each ZIP code area given a layer of polygons representing ZIP codes and a layer of polygons showing open spaces like parks and green spaces. This video shows how to do that start to finish in a few simple steps, from initial importing of shape files to final results, in just five minutes, with an additional few minutes of explanation what each step does. Works in Manifold Release 9 or using the free Manifold Viewer.

 

See Also

Tables

 

Queries

 

Drawings

 

Maps

 

Style: Drawings

 

Schema

 

Join

 

Join Examples

 

Command Window

 

PBF .pbf, OSM, O5M

 

Example: Find Percentages of Open Space in ZIP Code Areas - Given a drawing showing ZIP codes as areas (polygons) and another drawing showing open spaces like parks and nature preserves, we add a field to each ZIP code area that gives the percentage of open space in that ZIP code area.  The workflow we show handles situations where some open space regions overlap multiple ZIP code areas, correctly reckoning only that part of the open space within each ZIP code area.

 

Example: Create a Map Showing OSM Use by Country - A start-to-finish real life example of map creation that combines various Manifold capabilities.  Copying a table of numbers from a web site, we create a map that is thematically colored to show usage of OpenStreetMap by country in proportion to the population of that country.

 

Example: Style Pane Quickstart - A tutorial introduction to using the Style pane to apply color, symbology, size and rotation to areas, lines and points in drawings.

 

Example: Format a Drawing using the Style Pane - In this example we provide a first, step by step look at how to format areas in a drawing using the Style pane.  We can specify the same formatting for all areas or use a field to automatically set formatting, a process usually known as thematic formatting.

 

Example: Connect to an OSM Vector Server - We connect to an OSM Server that provides a vector layer containing points and lines in the OpenStreetMap database.  We then show how to scrape (copy) data from the OpenStreetMap server into local storage.  We extract building footprints from the local copy.