Short Course: Tools & Best Practices for the Integration of Spatial Data

Half Day PM (1:30 p.m. – 5:30 p.m.)

This short course introduces a set of novel tools and associated best practices for the integration of spatial data. The course will give participants an overview of the foundations for the integration of spatial data, including key conceptual and technical challenges, followed by specific applications. In particular, the course focuses on two important contexts in which integration is valuable. Participants will be introduced to two innovative software tools—geomerge and MELTT—that implement best practices for spatial data integration, receiving practical exposure via hands-on exercises using illustrative datasets and code.

The first context arises when a study draws on indicators (dependent variables, independent variables, and covariates) that have different geographic resolutions. These disparities can be inherent to the measurement of certain indicators. They can also arise when indicators are drawn from distinct sources with varying spatial units of measurement or reporting. A major hurdle to overcome before conducting analysis involving multiple indicators is to ensure that all the original data are matched properly, reflecting their spatial properties, and placed at an appropriate spatial resolution, potentially a single one. Methodologies and tools exist to perform this basic integration task, but they have not previously been compiled and made accessible to a range of users working with a variety of research designs. These tasks are handled by an innovative tool, geomerge, which consolidates the methodologies in a streamlined manner.
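To make the idea of placing data at a single spatial resolution concrete, here is a minimal Python sketch that assigns point observations to square grid cells and summarizes them per cell. It is an illustration only and does not use or reproduce the geomerge interface; the function names, the `(lon, lat, value)` tuple layout, and the use of unprojected degrees for the cell size are all assumptions made for this example.

```python
import math
from collections import defaultdict


def cell_of(lon, lat, cell_size):
    """Return the (column, row) index of the grid cell containing a point."""
    # math.floor keeps negative coordinates in the correct cell.
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def aggregate_to_grid(points, cell_size):
    """Count points and average their values within each grid cell.

    points: iterable of (lon, lat, value) tuples (hypothetical layout).
    Returns a dict mapping cell index -> {"count": n, "mean": m}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in points:
        cell = cell_of(lon, lat, cell_size)
        sums[cell] += value
        counts[cell] += 1
    return {c: {"count": counts[c], "mean": sums[c] / counts[c]} for c in counts}


# Illustrative data: three observations on a 1-degree grid.
observations = [(0.2, 0.3, 10.0), (0.7, 0.9, 20.0), (1.5, 0.1, 5.0)]
grid = aggregate_to_grid(observations, cell_size=1.0)
```

Once every indicator has been summarized on the same grid, the per-cell records can be joined on the cell index. A real workflow would also need to handle projections, polygons, and raster inputs, which this sketch leaves out.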

The second context is that integration of spatial data should also be a consideration when multiple datasets on the same empirical phenomenon are available. Together, these datasets can afford a more comprehensive, precise, and valid measurement of the phenomenon. A single dataset, by contrast, is likely to be less complete, exact, and reliable. To date, however, the typical empirical study that uses spatial data—in particular, geocoded event data—relies on only a single dataset at a time to measure a given phenomenon of interest. Such an approach ignores the potential value of integrating the information available from multiple datasets. These datasets cannot simply be pooled, since they may overlap in coverage. A major hurdle to overcome, therefore, is identification of clear duplicates and disambiguation among potential duplicates. These tasks are handled by another innovative tool, Matching Event Data by Location, Time and Type (MELTT), which facilitates transparent, efficient, and flexible integration of event datasets, addressing the needs for de-duplication and disambiguation.
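The de-duplication problem described above can be sketched with a simple threshold rule: two records from different datasets are treated as the same event if they share a type and fall within a chosen distance and time window. This is a minimal Python sketch under those assumptions, not the MELTT algorithm or its interface; the `Event` fields, the function names, and the default thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    source: str   # name of the originating dataset
    day: date     # event date
    lat: float    # latitude in decimal degrees
    lon: float    # longitude in decimal degrees
    kind: str     # event type label


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_duplicate(a, b, max_km, max_days):
    """True if two events agree on type and are close in space and time."""
    return (
        a.kind == b.kind
        and abs((a.day - b.day).days) <= max_days
        and haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km
    )


def integrate(primary, secondary, max_km=5.0, max_days=1):
    """Keep all primary events; add secondary events with no match in primary."""
    merged = list(primary)
    for candidate in secondary:
        if not any(is_duplicate(candidate, kept, max_km, max_days) for kept in primary):
            merged.append(candidate)
    return merged


# Illustrative records: the first secondary event is near the primary one in
# space and time, the second is several hundred kilometres away.
primary = [Event("A", date(2017, 8, 30), 37.7749, -122.4194, "protest")]
secondary = [
    Event("B", date(2017, 8, 31), 37.7800, -122.4200, "protest"),
    Event("B", date(2017, 8, 30), 34.0500, -118.2400, "protest"),
]
merged = integrate(primary, secondary)
```

A fixed-threshold rule like this only flags candidate matches. Disambiguating among several candidates that all fall within the window, and reconciling event-type labels that differ between datasets, are harder problems that it does not address.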

The course is intended for any researchers who use spatial data. It assumes a general knowledge of spatial data analysis, as well as some familiarity with GIS software and the R programming language.

**All Short Courses will take place on Wednesday, August 30 at the APSA 2017 Annual Meeting in San Francisco, CA.**