Data Preparation

The Data Preparation tab groups the following Scenario Datasheets:

In the SyncroSim UI, the Data Preparation tab can be accessed by right-clicking on a WISDM Scenario and selecting Properties from the context menu.


Template Raster

The Template Raster Datasheet can be found under the Spatial Data tab and contains information about the dimensions of the study area.

Raster File

Defines the template raster (e.g., GEOTiff) file containing the target dimensions of the study area. The Covariate Raster Files will be clipped to the extent of this raster and will have the same spatial reference and resolution (cell size) as this raster. In the SyncroSim UI, the Raster File can be selected by clicking on the yellow folder icon, removed by clicking on the X icon, and exported by clicking on upwards arrow icon.

Number of Multiprocessing Tiles

Optional. Sets the number of multiprocessing tiles to divide up a model’s processing across multiple cores. This option allows to model to run faster and more efficiently. For more information about multiprocessing, see SyncroSim Reference - Multiprocessing.


Covariate Data

The Covariate Data Datasheet can be found under the Spatial Data tab and contains the spatial information about each covariate. Covariates, also known as predictors, represent spatial and environmental data (e.g., precipitation and slope) that models rely on to infer spatial patterns to predict habitat suitability for a species.

In the SyncroSim UI, this information can be input directly or imported from a CSV file (right-click the Covariate Data page and select Import) that contains the four columns described below. If importing the data into SyncroSim, it must contain the following headers and corresponding information: “CovariatesID” (see Covariate), “RasterFilePath” (see Raster File), “ResampleMethod” (see Resample Method), and “AggregationMethod” (see Aggregation Method).

Covariate

Defines the names of the covariates that will be used in the Scenario.

Raster File

Defines the path to the covariate’s raster (.tif) file. In the SyncroSim UI, the Raster File can be selected by clicking on the yellow folder icon, removed by clicking on the X icon, and exported by clicking on upwards arrow icon.

Resample Method

Optional. Defines which resampling method each raster will use. Resampling interpolates cell values when transforming the covariate raster to the coordinate reference system (CRS) or resolution of the template raster. Resampling options include: Nearest Neighbor, Bilinear, Cubic, Cubic Spline, and Lanczos. To learn more about resampling rasters, see Resampling Methods. In the SyncroSim UI, this column is initially hidden, and can be made visible by right-clicking anywhere inside the Covariate Data Datasheet and selecting “Resample Method” from the context menu.

Default: If a covariate is categorical, "Nearest Neighbor" is the default resampling method. If a covariate is not categorical, "Bilinear" is the default resampling method.

Aggregation Method

Optional. Defines which aggregation method each raster will be using. Aggregation of covariate layers occurs when the resolution of the raster needs to be decreased to match the template raster. Aggregating ensures that the data within each cell is preserved when decreasing the resolution of the covariate raster to match the template raster (for example, re-scaling a 10 m resolution covariate raster to a 100 m resolution template raster). Options for aggregation methods include Mean, Max, Min, and Majority. For more information about these aggregation methods, see Summarizing Raster and Aggregating Rasters. In the SyncroSim UI, this column is initially hidden, and can be made visible by right-clicking anywhere inside the Covariate Data Datasheet and selecting “Aggregation Method” from the context menu.

Default: If a covariate is categorical, "Majority" is the default aggregation method. If a covariate is not categorical, "Mean" is the default aggregation method.


Restriction Raster

The Restriction Raster Datasheet can be found under the Spatial Data tab and contains information pertaining to the optional restriction layer.

Raster File

This is an optional input that accepts a raster (.tif) file that gets multiplied across the probability raster during the apply model stage. It is recommended that this is a binary raster, where 0 indicates restricted areas where prediction is not necessary. The Raster File could otherwise be a probability raster (i.e., values ranging from 0.0 to 1.0), which would act as a weighted restriction layer, adjusting the values of all pixels based on the pixel restriction values.

Description

This is also an optional input that can be used to detail the corresponding Raster File.


Field Data

The Field Data Datasheet can be found under the Field Data tab and contains information about species presence and absence locations.

X

Defines the X coordinates (longitude) in the coordinate system defined in the Options tab.

Y

Defines the Y coordinates (latitude) in the coordinate system defined in the Options tab.

Response

Defines whether there was an absence or presence detected at a given site, where 1 represents a location where the species was present and 0 represents a location where the species was absent.

Site

Optional. Sets the site ID number. In the SyncroSim UI, this column is initially hidden, and can be made visible by right-clicking anywhere inside the Field Data Datasheet and selecting “Site” from the context menu.

Use In Model Evaluation

Optional. Determines whether a given row of data was saved for testing the Scenario’s models (Yes) or used in the model training process (No). This column is auto-filled based on the parameters defined under Validation Options.

Model Selection Split

Optional. Sets in which cross-validation fold a given row of data will be included. This column is auto-filled based on the parameters defined under Validation Options. Its values will range from 1 to the defined Number of cross-validation folds. Rows of data marked for Model Selection Split will be reserved from the model training process but will still be included in model evaluation. Additionally, the split data can be stratified by the response variable to ensure the proportion of presence and absence points is balanced between the testing and training split provided Stratify cross-validation folds by the response is set to “Yes”.

Weight

Optional. Sets the weight of a given data point. This column is only used by the model if Aggregate or Weight Data in the Options Datasheet is set to “Weights”. If multiple species presence/absence points are included in a single pixel, these points can be down-weighted so that pixels with multiple points can have an equal weight to pixels with a single point. In the SyncroSim UI, this column is initially hidden, and can be made visible by right-clicking anywhere inside the Field Data Datasheet and selecting “Weight” from the context menu.


Field Data Options

The Field Data Options Datasheet can be found under the Field Data tab and controls some of the Scenario’s settings relating to the Field Data.

Authority Code (e.g., EPSG:4326)

Optional. Sets the coordinate reference system (CRS) information for the Field Data points. This entry represents the CRS that the field data are in. If left blank, WISDM assumes that the field data are in the same CRS as that of the template raster.

Aggregate or Weight Data

Optional. Determines whether the Field Data points should be “Aggregated” or “Weighted” in the event that multiple points fall within the same pixel. Aggregating will combine multiple points into one, while weighting will ensure that all of the points are downweighted so that combined, they have the same influence as just one point.


Background Data Options

The Background Data Options Datasheet controls some of the Scenario’s settings relating to the Field Data.

Generate background sites

Defines whether background sites should be generated for the Scenario. Background sites are also referred to as pseudo-absences and represent absences of a species. This information allows the models to compare environmental spaces where a species can and cannot be found. Background sites are often used when true absence data are not available for a species.

Default: No.

Number of background sites added to field data

Optional. Sets the number of background points that should be generated for this Scenario. Depends on Generate background sites being set to “Yes”.

Default: Sum of presence points from the Field Data Datasheet.

Method used for background site generation

Optional. Defines the background surface method for generating background points based on the species presence values. Two options are available: “Kernel Density Estimate (KDE)” or “Minimum Convex Polygon (MCP)”.

Default: Kernel Density Estimate (KDE).

KDE background surface method

Optional. Defines the type of KDE background surface method to be used if the Method used for background site generation is set to “Kernel Density Estimate (KDE)”. Two options are available: “Continuous” or “Binary” .

Default: Continuous.

Isopleth threshold used for binary mask creation (%)

Optional. Sets the threshold value that is required to be used when creating a binary mask if the Method used for background site generation is set to “Kernel Density Estimate (KDE)” and KDE background surface method is set to “Binary”, or if Method used for background site generation is set to “Minimum Convex Polygon (MCP)”. The isopleth threshold value is expressed as an integer, so an isopleth of 99% would require an input of 99 for this argument.

Default: 95.


Validation Options

The Validation Options Datasheet sets parameters for model validation. In this documentation, model processing, model fitting, and model training are used interchangeably and refer to modeling using training data. Model validation and model testing are used interchangeably and refer to the evaluation of model predictive performance using testing data.

Split data for model training and testing

Determines whether to withhold data from modeling to be used for model validation.

Default: No.

Proportion of data used for model training

Sets the proportion of the Field Data to be used for model training if Split data for model training and testing is set to “Yes”. Proportions should be entered decimals. For example, setting this parameter to 0.7, 70% of the Field Data will be used for model training and 30% of the Field Data will be withheld for model validation (testing).

Default: 0.5.

Use cross validation for model selection

Determines whether to split data into cross validation folds so that models are tested on a subset of the data that was not used for model training.

Default: No.

Stratify cross-validation folds by the response

Determines whether the cross-validation folds should be representative of the distribution of the response values.

Default: No.

Number of cross-validation folds

Sets the number of cross-validation folds to include in the Scenario.

Default: 10.


Spatial Multiprocessing

The Spatial Multiprocessing Datasheet contains a tiling raster used for spatial multiprocessing.

Tiling raster

Defines the raster (e.g., GEOTiff) file representing the tiles the data can be split into for parallel processing.