In this example create a data source for a CSV web server, to acquire current data on cases and deaths worldwide for the Covid-19 pandemic. The CSV file on the server is updated, so our project can use the latest data as it is published. We then use the point and click Edit - Join dialog to rapidly join that data, automatically aggregated as we desire, into a world map for visualization. The project created for this example topic may be downloaded as the pandemic.map project in the Examples page on the Manifold website.
We will connect to a web server run by the European Centre for Disease Prevention and Control (ECDC) that provides data on the Covid-19 pandemic as a CSV file. The link to the file is
https://opendata.ecdc.europa.eu/covid19/casedistribution/csv
We open Manifold with a new, blank project. Choose File - Create - New Data Source.
In the New Data Source dialog we enter a useful Name for the new data source, covid_csvserver, so later on we can tell at a glance what this data source provides. We choose Web Server: csvserver as the Type and we enter the URL above into the Source box. We leave all other options at their default settings, and then press Create Data Source.
For more details on using the CSV web server dataport, see the CSV web server topic.
That creates a new data source in our project called covid_csvserver. When we expand the project, we can see it contains a table called csv. We double-click the csv table to open it.
This particular data set has multiple records for each country, with each record providing a calendar date it was reported in the dateRep field, the new Covid cases reported on that date in the cases field and deaths that occurred on that date in the deaths field. Many countries in the data set do not have recent data or have missed reporting on various dates. While having a time sequence like this provides a historical record of what cases and deaths were reported on each date that a report was made, it means that to learn the total number of cases and deaths as of the latest report date, we must sum cases and deaths values from multiple records for each country. The Join dialog can do that for us.
Note also that the csv table has a field for each country that appears to give the three letter ISO code for the country, in the countryterritoryCode field, as well as population data for the country as of 2018 in the popData2018 field.
We will use the countryterritoryCode field as a key field for the Join dialog, and then later on when we normalize the data to population for a better cartographic display, we could use the popData2018 field, if we so choose.
The csv table resides on a server in Europe somewhere, in the covid_csvserver data source. We will first create a query that dynamically captures what is in that csv table and repeats it within our local .map project data source, so the Join dialog can use it.
We right-click within the local .map portion of the project (that is, outside the covid_csvserver data source) and we choose Create - New Query.
We name our new query data and then we press the Create Query button. We have just created a new, blank query.
To enter SQL into the query, we double-click the data query to open it in a Command Window.
We enter a single, short line of SQL:
TABLE [covid_csvserver]::[csv];
That line refers to the csv table within the covid_csvserver data source, using the TABLE statement to simply repeat whatever is in the table. This is like creating a Windows shortcut in File Explorer.
There is no need to save the change, as making a change to a query in the Command Window edits the query in place. We close the Command Window.
Most people doing GIS have favorite maps of the world that they use for thematic presentations, such as coloring countries by various attributes.
We import a world map of countries originally taken from Natural Earth. We choose this data set because it has a three letter ISO code for each country, which will be a convenient key field to match to the country codes in our csv table.
Double-clicking open the drawing, we see it is in Latitude / Longitude projection. We have cleaned it up a bit, deleting Antarctica and removing fields we will not use from the table (Natural Earth has a very large number of fields).
Popping open the table, we see the names of countries and the estimated population for each, along with the three letter ISO code for each country in the ISO_A3 field. The table also has a POP_EST field that we will use in the continuation of this topic to normalize the data reported from absolute numbers to relative numbers of cases and deaths per million population, for a better cartographic display.
We now will use the Edit - Join dialog to add fields to the countries Table that capture and aggregate covid data for each country.
With the focus on the countries Table, we choose Edit - Join in the main menu.
The countries Table is automatically loaded as the original table. We choose the ISO_A3 field to use as the key field for the original table.
Next, we choose the data query from the pull down menu as the joined table, and in the box below we choose countryterritoryCode as the key field for the joined table.
Press the Add button and choose dateRep as the field in the joined table to use as the basis for a new field.
Check the fields desired and press OK.
That adds a new field to the original table, shown in provisional blue background color. To change the name of the field, we double-click into the field.
We change the name of the field to LastReport and we press Enter.
Next, we double-click into the join method box for the LastReport field and we change copy to max. We want to take the maximum value, that is, the latest date, that occurs in the dateRep field for each country and put it into the LastReport field.
We double-click into the join method cell for the new cases field and in the pull down menu that appears we choose sum.
This will take the sum of all cases numbers that occur for each country and put it into the new cases field. We might have named the new field total_cases_to_date, but we think cases is OK.
We also double-click into the join method cell for the new deaths field and in the pull down menu that appears we choose sum. That will populate the field by the sum of the deaths values for each country found in the data query, that is, in the csv table.
The proposed new fields are shown with provisional blue background color. Press Join to apply the operation and to add them, populated as specified, to the table.
Instantly, the Join dialog internally composes an SQL update query that takes data from the joined table and joins it into the original table, creating three new fields and populating them with data.
We now have a LastReport field that reports for each country the most recent date for which we have a report, a new cases field that gives the sum of all case values for all reports, and a new deaths field that gives the sum of deaths values for all reports.
We can now color each country using a thematic format based on the number of cases and deaths in each country.
Using totals is not great cartography: when coloring countries by some variable we should normalize that variable by population, for example, cases per one million population. We will do that in the continuation of this topic, using the popData2018 field. For now, we just show the simple case of creating a thematic format.
An easy way to create multiple thematic maps in Manifold is to simply copy the countries drawing and paste it to make as many copies as we want. All the data is in the drawing's table, so we can create a hundred copies of the drawing without increasing the size of the project. A drawing is just a few bytes that says how to Style geometry and data in the table.
We create two copies and name one cases and the other deaths. Like the original countries drawing, both take their data from the same countries Table.
We use the Style pane to style the cases drawing with a thematic format based on the cases field, showing more red where there are more cases, using exponential intervals to color.
We use the Style pane to style the deaths drawing with a thematic format based on the deaths field, showing more red where there are more deaths.
We can create a map with all three of the drawings as layers, providing a dark gray background. This is a convenient way to use whatever projection we want, such as Pseudo-Mercator, shown above. We can also conveniently turn layers on and off in a map to compare different data.
To update the data used in the map, we open the csv table in the covid_csvserver data source and we choose View - Refresh. That will pull the latest data from the CSV file on the ECDC server.
Next, right-click the saved countries Table Update query in the project and choose Run. That re-runs the query and re-populates the countries Table with fresh data pulled from the csv table, via the data query.
The countries Table immediately updates, for example, showing new LastReport dates and new cases and deaths for those countries that have updated data in the CSV file on the ECDC server. For example, we can see that Andorra as of 12 June 2020 reported a total of 853 cases and 51 deaths.
Our next task, in the continuation of this example, will be to create normalized fields showing cases and deaths per one million population.
To normalize the data presented, we will create two new computed fields in the countries Table, one a field called cases per million and the other called deaths per million.
With the focus on the opened countries Table, we choose Edit - Schema in the main menu.
We press the Add button in the Schema dialog.
We specify cases per million as the name of the new field, and float64 as the data type. The cases field we will take data from is an integer field, but we want the cases per million field to be a floating point number so it can hold fractional numbers in countries with less than one case per million.
We press Edit Expression.
The expression we use is simple, and uses fields we can double-click in the query builder to avoid manually keyboarding the names of fields.
([cases] * 1000000) / [POP_EST]
We press OK.
Back in the Field dialog we press OK. That adds the new computed field to the table schema.
We repeat the procedure above to add a computed field based on the deaths field, using the expression
([deaths] * 1000000) / [POP_EST]
The two new fields we have added are shown in provisional blue background color. To commit changes and to alter the table's schema, we press Save Changes.
Instantly, two new computed fields appear in the table, computed using the cases, deaths, and POP_EST fields. As the cases and deaths fields are updated in the future, the new cases per million and deaths per million fields will automatically compute the correct numbers.
We can now use these two fields to do a more sensible thematic format of countries, showing those with a higher rate of cases and deaths per million population in more red color.
We copy the countries drawing and then paste twice, to create two more copies of the drawing, which we rename to cases per million and deaths per million respectively. We Style the cases per million drawing as seen above, applying a thematic format using natural breaks with 8 intervals and applying the Color Brewer Red palette.
We Style the deaths per million drawing as seen above, again applying a thematic format using natural breaks with 8 intervals and applying the Color Brewer Red palette.
The maps above illustrate a failing of this style of cartography, in that apparently large regions such as Greenland show high rates per population that misrepresent reality.
We can Alt-click Greenland in the map above to see the attributes for that area. What we see right away is that Greenland, an overseas territory of Denmark, is part of the same area as Denmark. That is a typical situation for world maps that combine the various overseas territories of countries into a single, multi-branched area object.
Alt-clicking Greenland opens the Record pane, showing that Greenland is bundled into the same area object as Denmark, using the same figures for population, cases and so on.
Given that the ECDC data set reports only for countries, and does not break out their overseas territories separately, we are doing the best we can using the data we have available. We simply do not have finer resolution data to be able to show figures for Greenland separately from Denmark. If we did, we could have used a different countries drawing, that resolved individual countries into separate overseas territories, such as Greenland for Denmark, Tahiti for France, and the US Virgin Islands for the US.
In fact, we see a similar situation when looking at the cartographic presentation that bundles Alaska with the lower 48 US states. Another skewed presentation of such style maps is that coloring the entire US in the same, even color, can give the impression that cases and deaths per one million population are evenly spread throughout the US when in reality they are (at the time of this writing) highly localized to large cities like New York City.
Terminology - The original table also may be called the target table, and the table providing data also may be called the source table.
Quick reference - With the Join dialog open, press F1 to open a quick reference guide in a browser window.
Why only in the same data source? - The Join dialog works only with tables and queries within the same data source to keep the dialog simple and fast. An important part of keeping the dialog simple is to show only those tables or queries in the pull down list for the joined table that can be used in a join. Scanning all tables and queries in the same data source is reasonable, but if there are many data sources or many levels of nesting (data sources within data sources) there could be hundreds of thousands of tables and queries within the hierarchy to scan. For people working in map projects without such elaborate data sources, most of the time what they do will be within the .map project data source so everything works by default. To use tables and queries from other data sources as the joined table in the Join dialog we can quickly add a simple aliasing query, as illustrated in the Join Example: Joins between Data Sources topic. This method also makes it possible for the same Join dialog to work within other data sources, when aliasing queries refer back either to the .map project data source or to other data sources. Almost all queries can be used as the source, joined table. The Join dialog also can also use a query as the target, original table, to be used to copy data into the table through the query, as shown in the Join Example: Join into a Query and Join Example: Join into a Query into a Selection topics, if the result table is writable and the Join does not attempt to create new columns in the table.
Example: Use Edit - Join to Map a Pandemic - Creating a data source for a CSV web server, we acquire current data on cases and deaths worldwide for the Covid-19 pandemic. We then use the point and click Edit - Join dialog to rapidly join that data, automatically aggregated as we desire, into a world map for visualization.
Example: Create a Map Showing OSM Use by Country - A start-to-finish real life example of map creation that combines various Manifold capabilities. Copying a table of numbers from a web site, we create a map that is thematically colored to show usage of OpenStreetMap by country in proportion to the population of that country.