Creating a Mask for Raster Data in R

Author

Elke Windschitl

Published

September 29, 2023

Description: In this qmd, I showcase how to create a mask for raster data and apply that mask to depth data in the Santa Barbara Channel.

Introduction

Sometimes when combining spatial data, one needs to change the spatial resolution or extent, or reproject, in order to match other datasets. This is necessary when all of the data must share the same grid for an analysis such as a Maxent species distribution model. One approach is to create a common data “mask” and apply that mask to other raster data. In this qmd, I create a mask by defining the extent, coordinate reference system (crs), and resolution of the target data, and I remove areas from the mask that are not relevant to my analysis. I then apply the mask to ocean depth data in the Santa Barbara Channel. This analysis was completed by Elke Windschitl as part of the kelpGeoMod Bren School of Environmental Science and Management 2023 capstone project, authored by Erika Egg, Jessica French, Javier Parton, and Elke Windschitl. Some of the choices made here, such as the resolution and extent, were made in the context of the whole project.

The Data

I used data from two sources in this process: vector land-boundary data from California.gov and depth data from NOAA. The kelpGeoMod repository describes the data as follows:

Land Bounds:

The California County Boundaries data contain a shapefile of California lands to be masked out. County data were used to make the mask because they already include the Channel Islands.

The original data source is described as: This layer provides an initial offering as “best available” at 1:24,000 scale. Hosted on CAL FIRE AGOL.

In late 1996, the Dept of Conservation (DOC) surveyed state and federal agencies about the county boundary coverage they used. As a result, DOC adopted the 1:24,000 (24K) scale U.S. Bureau of Reclamation (USBR) dataset (USGS source) for their Farmland Mapping and Monitoring Program (FMMP) but with several modifications. Detailed documentation of these changes is provided by FMMP and included in the lineage section of the metadata.

A dataset was made available (approximately 2004) through CALFIRE - FRAP and the California Spatial Information Library (CaSIL), with additional updates throughout subsequent years. More recently, an effort was made to improve the coastal linework by using the previous interior linework from the 24k data, but replacing the coastal linework based on NOAA’s ERMA coastal dataset (which used NAIP 2010).

In this dataset, all bays (plus bay islands and constructed features) are merged into the mainland, and coastal features (such as islands and constructed features) are not included, with the exception of the Channel Islands which ARE included.

This service represents the latest released version, and is updated when new versions are released. As of June, 2019 it represents cnty19_1.

Depth:

The ETOPO Global Relief Model 2022 (Bedrock, 15 arc-second) dataset contains ocean depth data in meters for the Santa Barbara Channel. The data were downloaded via the grid extract tool at the original source and transformed using the ETOPO_2022_v1_15s_N45W120_geoid.tif and ETOPO_2022_v1_15s_N45W135_geoid.tif geoid height tiles, as recommended in section 2.2 of the original user guide.

The original data source is described as: ETOPO 2022, a global relief model with 15 arc-second resolution seamlessly integrating topographic and bathymetric data. The ETOPO 2022 model uses a combination of numerous airborne lidar, satellite-derived topography, and shipborne bathymetry datasets from U.S. national and global sources. ETOPO 2022 uses bare-earth topographic data from NASA's ICESat-2 and other vetted data sources to independently validate both the input datasets and the final ETOPO 2022 model. ETOPO 2022 is available in “Ice Surface” (top of Antarctic and Greenland ice sheets) and “Bedrock” (base of the ice sheets) versions.

Methods

To get started, there are several packages I will be using. I will use sf for handling vector data, terra and raster for handling raster data, and mapview for viewing my data.

#---- Load libraries
library(sf)
library(terra)
library(raster)
library(mapview)

#---- Set up the data directory
#data_dir <- insert your file path to your data
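
Note that data_dir just needs to point to wherever the project data live on your machine. A minimal sketch with a purely hypothetical path:

# Hypothetical location of the kelpGeoMod data folder -- replace with your own
data_dir <- "~/Documents/kelpGeoMod/final-data"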

First, I make an empty raster with my desired resolution, extent, and crs. I then explore the size of the raster and assign all cells a value of 1.

#---- Make an empty raster of 0.008 degree resolution in WGS84 with desired AOI
empty_rast <- rast() # make an empty raster

crs(empty_rast) <- "EPSG:4326" # confirm/set WGS84
crs(empty_rast) # check crs value
[1] "GEOGCRS[\"WGS 84\",\n    ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n        MEMBER[\"World Geodetic System 1984 (Transit)\"],\n        MEMBER[\"World Geodetic System 1984 (G730)\"],\n        MEMBER[\"World Geodetic System 1984 (G873)\"],\n        MEMBER[\"World Geodetic System 1984 (G1150)\"],\n        MEMBER[\"World Geodetic System 1984 (G1674)\"],\n        MEMBER[\"World Geodetic System 1984 (G1762)\"],\n        MEMBER[\"World Geodetic System 1984 (G2139)\"],\n        ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n            LENGTHUNIT[\"metre\",1]],\n        ENSEMBLEACCURACY[2.0]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433]],\n    CS[ellipsoidal,2],\n        AXIS[\"geodetic latitude (Lat)\",north,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        AXIS[\"geodetic longitude (Lon)\",east,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n    USAGE[\n        SCOPE[\"Horizontal component of 3D system.\"],\n        AREA[\"World.\"],\n        BBOX[-90,-180,90,180]],\n    ID[\"EPSG\",4326]]"
ext(empty_rast) <- c(-120.65, -118.80, 33.85, 34.59) # set extent
ext(empty_rast) # check ext
SpatExtent : -120.65, -118.8, 33.85, 34.59 (xmin, xmax, ymin, ymax)
res(empty_rast) <- c(0.008, 0.008) # set resolution
res(empty_rast) # check resolution
[1] 0.008 0.008
# Explore the size of the raster
nrow(empty_rast) # identify how many rows we have
[1] 93
ncol(empty_rast) # identify how many columns we have
[1] 231
ncell(empty_rast) # identify how many total cells we have
[1] 21483
values(empty_rast) <- 1 # fill in values of 1

mask <- empty_rast # rename to mask
mask
class       : SpatRaster 
dimensions  : 93, 231, 1  (nrow, ncol, nlyr)
resolution  : 0.008, 0.008  (x, y)
extent      : -120.65, -118.802, 33.85, 34.594  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
name        : lyr.1 
min value   :     1 
max value   :     1 
mapview(raster(mask),
        col.regions = list("blue4","#c4fbff"),
        layer.name = "Raster Value") # this should be a raster with only values of one in our AOI
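
As an aside, the same empty raster can likely be built in a single rast() call rather than setting the crs, extent, and resolution one at a time. A minimal sketch (not run here; terra may snap the extent slightly to fit whole 0.008-degree cells, as it did above):

# One-step construction of the mask template (equivalent in intent to the steps above)
empty_rast_alt <- rast(extent = ext(-120.65, -118.80, 33.85, 34.59),
                       resolution = 0.008,
                       crs = "EPSG:4326",
                       vals = 1)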

At this point the SpatRaster has the dimensions, resolution, and crs that I would expect. Next, I use land boundary data from California to remove cells from the raster where we would expect to find land, based on the vector shapes provided. I have to transform the crs, convert to a terra vector object, and then rasterize the layer.

#---- Remove land boundaries from the mask
# Read in land shapefile 
boundaries <- st_read(file.path(data_dir, "01-raw-data/02-ca-county-land-boundaries-raw/California_County_Boundaries/cnty19_1.shp"))
Reading layer `cnty19_1' from data source 
  `/Users/elkewindschitl/Documents/MEDS/kelpGeoMod/final-data/01-raw-data/02-ca-county-land-boundaries-raw/California_County_Boundaries/cnty19_1.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 69 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -13849230 ymin: 3833650 xmax: -12704980 ymax: 5162403
Projected CRS: WGS 84 / Pseudo-Mercator
boundaries <- st_transform(x = boundaries, crs = 4326) # Transform crs to exact crs we want
boundaries <- vect(boundaries) # Make a terra vector object
boundaries <- terra::rasterize(x = boundaries, 
                               y = mask) # Rasterize to a terra raster object with ext of mask
mapview(raster(boundaries),
        col.regions = list("blue4","#c4fbff"),
        layer.name = "Raster Value")
# Right now land = 1 and ocean = NaN. Create a reclassification matrix so that
# land becomes NaN and ocean becomes 1
# (column 1 = current value, column 2 = new value)
reclassification_matrix <- matrix(c(NaN, 1, 1, NaN), 
                                  ncol = 2, 
                                  byrow = TRUE) 

# Apply classification matrix
land_bounds <- classify(boundaries, rcl = reclassification_matrix)
mapview(raster(land_bounds),
        col.regions = list("blue4","#c4fbff"),
        layer.name = "Raster Value") # plot to check
mask <- land_bounds # make this the new mask

Great! I now have a mask with values of 1 for the ocean surrounding Santa Barbara and the Channel Islands. This will be used to mask the depth data below. First, though, I need to read in and manipulate the depth data as per NOAA's user documentation.

# Read in depth data
depth_dat <- terra::rast(file.path(data_dir, "01-raw-data/06-depth-noaa-raw/exportImage.tiff"))

# Plot to visualize
mapview(raster(depth_dat),
        col.regions = list("blue4","green"),
        layer.name = "Depth-Elevation (m)")

The depth data cover the area I am interested in. However, the ETOPO User Guide mentions that the data come with “an accompanying ‘geoid’ tile for converting EGM2008 geoid heights into WGS84 ellipsoid elevation heights (EPSG:4979). Since most other geoid, ellipsoid, and/or tidal vertical datums are defined by grids in reference to the WGS84 ellipsoid, this eases the conversion of ETOPO 2022 tiles into other vertical reference datums of the user’s choice.” We will do that here following their guidelines. My AOI spans two geoid tiles, so I will need to use both and combine them.

# Read in Geoid tiles for depth correction
geoid1 <- terra::rast(file.path(data_dir, "01-raw-data/06-depth-noaa-raw/ETOPO_2022_v1_15s_N45W135_geoid.tif")) # left tile

mapview(raster(geoid1),
        col.regions = list("purple","green"),
        layer.name = "Conversion Number")
geoid2 <- terra::rast(file.path(data_dir, "01-raw-data/06-depth-noaa-raw/ETOPO_2022_v1_15s_N45W120_geoid.tif")) # right tile

mapview(raster(geoid2),
        col.regions = list("purple","green"),
        layer.name = "Conversion Number")
geoid_tile <- merge(geoid1, geoid2) # merge the tiles

mapview(raster(geoid_tile),        
        col.regions = list("purple","green"),
        layer.name = "Conversion Number")
# Crop geoid tile to match depth extent
geoid_tile_c <- crop(x = geoid_tile,
     y = depth_dat)

mapview(raster(geoid_tile_c),
        col.regions = list("purple","green"),
        layer.name = "Conversion Number")
# Add the tiles together as per the user guide instructions
depth_wgs84 <- depth_dat + geoid_tile_c
  
mapview(raster(depth_wgs84),
        col.regions = list("blue4","green"),
        layer.name = "Depth-Elevation WGS84 Ellipsoid (m)")

Now I have the depth data I want, and I have my mask. It is time to get the depth data in line with the mask: I have to crop and resample. For the resampling, I use bilinear resampling because depth is a continuous numerical variable. Then I can mask.

# Check the crs, and reproject the crs if different
crs(depth_wgs84) == crs(mask)
[1] TRUE
# Crop to mask
depth_wgs84 <- crop(x = depth_wgs84,
                         y = mask) # crop to set extent to mask extent

# Resample to mask resolution
resampled_depth <- terra::resample(x = depth_wgs84,
                                   y = mask,
                                   method = "bilinear")
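# Note: bilinear interpolation is appropriate for continuous values like depth.
# If this were a categorical raster (e.g., substrate classes), "near" would be
# the method to use instead -- a hypothetical example:
# substrate_resampled <- terra::resample(x = substrate_rast, y = mask, method = "near")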

# Mask to remove land
resampled_depth <- mask(resampled_depth, mask) # set cells that are NaN/NA in the mask (land) to NA

# Confirm integrity of raster
crs(resampled_depth) == crs(mask)
[1] TRUE
ext(resampled_depth) == ext(mask)
[1] TRUE
res(resampled_depth) == res(mask)
[1] TRUE TRUE
nrow(resampled_depth) == nrow(mask)
[1] TRUE
ncol(resampled_depth) == ncol(mask)
[1] TRUE
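
As an aside, terra's compareGeom() bundles most of these geometry checks (crs, extent, rows and columns, and optionally resolution) into a single call. A minimal sketch:

# Returns TRUE if the geometries match; by default it errors if they do not
terra::compareGeom(resampled_depth, mask, res = TRUE)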

Results

mapview(raster(resampled_depth),
        col.regions=list("blue4","#c4fbff"),
        layer.name="Resampled & Masked Depth (m)")

Conclusion

A mask is a great way to standardize multiple raster datasets so they can be combined. While this example shows the masking process for a single dataset, the same mask could be applied to numerous datasets. Once data are standardized, they can be combined into a stack or a brick for use in future analyses, as sketched below. Read the kelpGeoMod technical documentation for a full account of how this project standardized the spatiotemporal resolution of many datasets in preparation for kelp forest modeling.
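
Here is a minimal sketch of that final stacking step, where kelp_rast and sst_rast stand in for hypothetical layers that have already been cropped, resampled, and masked to the same grid as the depth data above:

# Stack standardized layers into a single multi-layer SpatRaster
env_stack <- c(resampled_depth, kelp_rast, sst_rast)
names(env_stack) <- c("depth", "kelp", "sst")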