Getting started with ecoClassify

Here we provide a guided tutorial on ecoClassify, an open-source package for developing and applying an image classifier model.

ecoClassify is a package within the SyncroSim framework. Familiarity with SyncroSim is helpful but not required to follow this tutorial. Throughout the Quickstart tutorial, terminology associated with SyncroSim is italicized, and whenever possible, links are provided to the SyncroSim online documentation. For more on SyncroSim, please refer to the SyncroSim Overview and Quickstart tutorial.


ecoClassify Quickstart Tutorial

This quickstart tutorial will introduce you to the basics of working with ecoClassify. The steps include:

  1. Installing ecoClassify
  2. Opening a configured ecoClassify library
  3. Viewing model inputs
  4. Running models
  5. Viewing model outputs and results


Step 1: Installing ecoClassify

Before we begin, you must have SyncroSim Studio and the ecoClassify package installed. Download the latest stable release of SyncroSim Studio from the SyncroSim website and follow the installation prompts.

Open SyncroSim Studio and select File > Local Packages. This will open the Local Packages window in the main panel of SyncroSim Studio. In the bottom left corner, click on Install from Server…, select ecoClassify from the window that opens, and click OK.

If you do not have Miniconda installed on your computer, a dialog box will open asking if you would like to install Miniconda. Click Yes. Once Miniconda is done installing, a dialog box will open asking if you would like to create a new conda environment. Click Yes. Note that installing Miniconda and the ecoClassify conda environment can take several minutes. If you choose not to install the conda environment, you will need to manually install all required package dependencies.

Miniconda is an installer for conda, a package environment manager that installs any required packages and their dependencies. By default, ecoClassify runs conda to install, create, save, and load the required environment for running ecoClassify. The ecoClassify environment includes R and Python software and associated packages.


Step 2: Opening a configured ecoClassify library

Having installed the ecoClassify package, you are now ready to create your SyncroSim library. A library is a file (with extension .ssim) that contains all your model inputs and outputs. Note that the layout of each library is specific to the package for which it was initially created. You can opt to create an empty library or download the ecoClassify template library in SyncroSim Studio. In this tutorial, we will be working with the ecoClassify template library.

Start SyncroSim Studio by searching for it in the Windows taskbar and selecting it. Once SyncroSim Studio opens, navigate to File > New and select From Online Template….


Select the ecoClassify package. Notice that the only available template is the ecoClassify Example. Select C:\Temp\ as the destination folder and click OK. The ecoClassify example will automatically open in the SyncroSim Studio explorer.


The ecoClassify example library was created with SyncroSim Studio v3.1.20. If you have a more recent release of SyncroSim Studio installed, you will automatically be prompted to update the library to configure it to your installed version of the software. Click Apply.


Step 3: Viewing model inputs

The contents of your newly opened library are now displayed in the Library Explorer. The library stores information on three levels: the library, the project, and the scenarios.

Most model inputs in SyncroSim Studio are organized into scenarios, where each scenario consists of a suite of properties, one for each of the model’s required inputs. Because you downloaded and opened a complete ecoClassify library, your library already contains three demonstration scenarios with pre-configured model inputs and outputs. In this tutorial, we’ll work through the Snow Cover - Training, Snow Cover - Predicting, and Snow Cover - Post-Processing scenarios, demonstrating each of the three stages in the ecoClassify pipeline.


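To view the details of the Snow Cover - Training scenario, right-click on it in the Library Explorer and choose Open from the context menu, or double-click on the scenario. This opens the scenario properties window.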


Pipeline

Located underneath the General tab, the model Pipeline allows you to select which stages of the model to include and in what order they should be run. A full run of ecoClassify consists of three stages: Train Classifier, Predict, and Post-Process.

Note that the Predict stage is dependent on the results of the previous stage, Train Classifier, and the Post-Process stage is dependent on the Train Classifier and Predict stages. You cannot run a stage without having first run the previous required stages, either within the same scenario or included as dependencies.
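The dependency rule above can be illustrated with a small Python sketch. The dictionary `STAGE_PREREQUISITES` and the function `missing_prerequisites` are hypothetical and are not part of ecoClassify or SyncroSim; they only model the rule that each stage needs its earlier stages to have run, either earlier in the same pipeline or in a dependency scenario.

```python
# Hypothetical encoding of the stage prerequisites described above;
# this is not part of the ecoClassify or SyncroSim API.
STAGE_PREREQUISITES = {
    "Train Classifier": [],
    "Predict": ["Train Classifier"],
    "Post-Process": ["Train Classifier", "Predict"],
}


def missing_prerequisites(pipeline, completed_in_dependencies=()):
    """Map each stage in `pipeline` to the prerequisites it is missing.

    A prerequisite counts as satisfied if it appears earlier in the same
    pipeline or was already run by a dependency scenario.
    """
    available = set(completed_in_dependencies)
    problems = {}
    for stage in pipeline:
        missing = [req for req in STAGE_PREREQUISITES[stage] if req not in available]
        if missing:
            problems[stage] = missing
        available.add(stage)
    return problems


# A Predict-only scenario with no dependencies cannot run ...
print(missing_prerequisites(["Predict"]))  # {'Predict': ['Train Classifier']}
# ... but it can once a dependency has run Train Classifier.
print(missing_prerequisites(["Predict"], ["Train Classifier"]))  # {}
```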


Next, click on the Datafeeds node. Here, all of the inputs to the model are listed as individual datasheets. Notice how some rows have a green checkmark in the Data column to indicate these datasheets contain data. From here, you can navigate to these individual datasheets by clicking on their name in the View column, or by navigating to the next tab called Image Classifier.


Stage 1: Training

Input

The first node under the ecoClassify tab is the Input node. Expand this node to reveal the input datasheets described below.

The Classifier options datasheet is where you may specify a sample size and choose a model type from a drop-down menu. Model types currently available include Random Forest, Convolutional Neural Network (CNN), and MaxEnt. In this example training scenario, we use a Random Forest model.
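As a rough illustration of what a sample size controls, the sketch below draws a fixed number of pixel locations at random from a raster. It assumes that the sample size is the number of pixels drawn for training, which this tutorial does not spell out, and the helper `sample_pixel_locations` is hypothetical. The `seed` argument shows why a fixed random seed makes such a draw repeatable.

```python
import random


def sample_pixel_locations(n_rows, n_cols, sample_size, seed=None):
    """Draw `sample_size` distinct (row, col) locations uniformly at random.

    Hypothetical helper: assumes "sample size" means the number of pixels
    drawn from a raster for training. A fixed `seed` makes the draw repeatable.
    """
    total = n_rows * n_cols
    if not 0 < sample_size <= total:
        raise ValueError(f"sample_size must be between 1 and {total}")
    rng = random.Random(seed)
    # Sample flat pixel indices without replacement, then convert to (row, col).
    return [divmod(i, n_cols) for i in rng.sample(range(total), sample_size)]


locations = sample_pixel_locations(n_rows=100, n_cols=200, sample_size=5, seed=42)
print(locations)
```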


The Rasters tab contains two datasheets, Training Rasters and Covariates.

The Training Rasters datasheet is where training and user-classified spatial data are loaded into the library. Note that this datasheet also contains a Timestep column. In the ecoClassify package, timesteps are used to link training rasters with their corresponding user-classified rasters.
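The timestep linkage can be pictured as a join on the Timestep value. In this hypothetical sketch, `pair_by_timestep` takes two dictionaries mapping timestep to file path (the file names are made up) and refuses to continue if any timestep lacks its partner raster; how ecoClassify itself reports such a mismatch is not covered here.

```python
def pair_by_timestep(training_rasters, user_classified_rasters):
    """Pair each training raster with the user-classified raster sharing its timestep.

    Both arguments are hypothetical dicts mapping timestep -> file path.
    """
    # Symmetric difference: timesteps present in one dict but not the other.
    unmatched = sorted(set(training_rasters) ^ set(user_classified_rasters))
    if unmatched:
        raise ValueError(f"timesteps without a matching raster: {unmatched}")
    return [
        (t, training_rasters[t], user_classified_rasters[t])
        for t in sorted(training_rasters)
    ]


pairs = pair_by_timestep(
    {1: "training-1.tif", 2: "training-2.tif"},
    {2: "classified-2.tif", 1: "classified-1.tif"},
)
for timestep, image, labels in pairs:
    print(timestep, image, labels)
```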


The Covariates datasheet is where additional spatial data are loaded into the library. Note that there is no Timestep column; these data are applied to each timestep.
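To show what "applied to each timestep" means, the sketch below appends the same covariate layers to every timestep's training layers and builds one feature tuple per pixel. Rasters are modeled as plain 2-D lists of equal shape; the function `stack_features` and the layer names `red` and `dem` are hypothetical.

```python
def stack_features(timestep_bands, covariates):
    """Build per-pixel feature vectors for every timestep.

    timestep_bands: {timestep: {layer_name: 2-D list}}, one entry per timestep
    covariates:     {layer_name: 2-D list}, reused unchanged for every timestep
    Returns {timestep: (layer_names, 2-D list of per-pixel feature tuples)}.
    All grids are assumed to share the same shape.
    """
    stacks = {}
    for t, bands in timestep_bands.items():
        layers = {**bands, **covariates}  # the same covariates join every timestep
        names = list(layers)
        rows = len(layers[names[0]])
        cols = len(layers[names[0]][0])
        features = [
            [tuple(layers[name][r][c] for name in names) for c in range(cols)]
            for r in range(rows)
        ]
        stacks[t] = (names, features)
    return stacks


stacks = stack_features(
    {1: {"red": [[10, 20]]}, 2: {"red": [[30, 40]]}},
    {"dem": [[1500, 1700]]},
)
print(stacks[1])  # (['red', 'dem'], [[(10, 1500), (20, 1700)]])
print(stacks[2])  # (['red', 'dem'], [[(30, 1500), (40, 1700)]])
```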


The Advanced datasheet contains several sections that control how the model is trained and tuned: Raster preprocessing, Model tuning, Thresholding, Contextualization, and Reproducibility.
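The Thresholding section is not described further in this tutorial. As a generic illustration only, the sketch below turns a probability map into a binary map with a cutoff and picks, from a list of candidate cutoffs, the one with the highest accuracy against observed labels. Both functions are hypothetical, and the `>=` comparison and accuracy-based selection are assumptions rather than ecoClassify's documented behavior.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Convert a 2-D probability grid to a binary presence (1) / absence (0) grid."""
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]


def best_threshold(probabilities, observed, candidates):
    """Return the candidate cutoff with the highest accuracy on flat lists of
    probabilities and observed 0/1 labels (ties go to the first candidate)."""

    def accuracy(t):
        hits = sum((p >= t) == bool(o) for p, o in zip(probabilities, observed))
        return hits / len(observed)

    # max() returns the first candidate among equal scores.
    return max(candidates, key=accuracy)


print(apply_threshold([[0.2, 0.7], [0.5, 0.9]]))  # [[0, 1], [1, 1]]
print(best_threshold([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1], [0.3, 0.5, 0.7]))  # 0.5
```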


Output

Finally, the Output node contains the Statistics datasheet. When the Training stage runs, this datasheet is populated with key statistics for the resulting image classifier model. To view an example, click on the drop-down arrow to the left of the Snow Cover - Training scenario, double-click on the Snow Cover - Training result scenario, and navigate to the Statistics datasheet.


Stage 2: Predicting

Next, right click on the scenario named Snow Cover - Predicting in the Library Explorer and choose Open from the context menu, or double-click on the scenario.

Navigating to the Pipeline datasheet under the General tab shows that the 2-Predict stage is used in this scenario.

Input

In the 2-Predict stage, there is a new Predicting tab under Input > Rasters. This tab contains two new datasheets: Predicting Rasters and Covariates.

The Predicting Rasters datasheet is where rasters to be classified are loaded into the library. As with the Training Rasters datasheet, this datasheet contains a Timestep column that provides a unique identifier for each predicting raster. These rasters must have layer names that match the names used to train the image classifier.


The Covariates datasheet is where additional spatial data are loaded into the library. Note that these data are applied to each timestep, and their layer names must match the names used to train the image classifier.
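A quick check of the layer-name requirement, run before prediction, might look like the following. `check_layer_names` is a hypothetical helper that compares names exactly (including case), which is an assumption about how strictly names must match; the layer names are illustrative.

```python
def check_layer_names(trained_names, predicting_names, covariate_names):
    """Compare layer names supplied for prediction against those used in training.

    Returns (missing, unexpected): names the classifier expects but that were
    not supplied, and supplied names the classifier has never seen.
    Names are compared exactly, including case.
    """
    expected = set(trained_names)
    supplied = set(predicting_names) | set(covariate_names)
    return sorted(expected - supplied), sorted(supplied - expected)


missing, unexpected = check_layer_names(
    trained_names=["red", "nir", "dem"],
    predicting_names=["red", "NIR"],  # wrong case, so it does not match "nir"
    covariate_names=["dem"],
)
print(missing, unexpected)  # ['nir'] ['NIR']
```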


Stage 3: Post-Processing

Right click on the scenario named Snow Cover - Post-Processing in the Library Explorer and choose Open from the context menu, or double-click on the scenario.

The Pipeline datasheet under the General tab shows that the 3-Post-Process stage is used in this scenario.

Post-Processing Options

The Post-Process stage has a new Post-Processing Options tab with two datasheets: Filtering and Rule-Based Restrictions.

The Filtering datasheet contains options for filtering and filling in the classified rasters.
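The tutorial does not state which filter ecoClassify applies. A common choice for cleaning a binary classification is a 3x3 majority filter, which removes isolated presence pixels and fills small gaps; the sketch below implements that generic technique on a 2-D list and should not be read as ecoClassify's own algorithm.

```python
def majority_filter(grid):
    """Replace each pixel with the majority class in its 3x3 neighborhood.

    Windows are clipped at the raster edges; on a tie the pixel keeps its value.
    Isolated presence pixels are removed and small gaps are filled.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            ones = sum(window)
            zeros = len(window) - ones
            if ones > zeros:
                out[r][c] = 1
            elif zeros > ones:
                out[r][c] = 0
    return out


speckle = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
hole = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(majority_filter(speckle))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(majority_filter(hole))     # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```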


The Rule-Based Restrictions datasheet is where you can apply rules for reclassifying the rasters based on the pixel values in supplied rasters with the same resolution and spatial extent.

In this example, pixels in the classified raster are assigned a value of 1 (present) wherever the dem-cropped-30m.tif raster has a value between 1500 and 1800.
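The range rule in this example can be expressed in a few lines. In the hypothetical `apply_range_rule`, the bounds are treated as inclusive and pixels outside the range keep their classified value; both are assumptions about how the rule behaves. The grid values are made up.

```python
def apply_range_rule(classified, restriction, low, high, value=1):
    """Set classified pixels to `value` wherever the restriction raster lies in [low, high].

    Pixels outside the range keep their classified value (assumption).
    Both grids must have the same shape, mirroring the requirement that
    restriction rasters share the classified raster's resolution and extent.
    """
    if len(classified) != len(restriction) or any(
        len(a) != len(b) for a, b in zip(classified, restriction)
    ):
        raise ValueError("classified and restriction rasters must have the same shape")
    return [
        [value if low <= z <= high else cls for cls, z in zip(c_row, r_row)]
        for c_row, r_row in zip(classified, restriction)
    ]


classified = [[0, 0, 0, 1]]
dem = [[1400, 1500, 1800, 2100]]  # elevation values, illustrative only
print(apply_range_rule(classified, dem, low=1500, high=1800))  # [[0, 1, 1, 1]]
```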


Dependencies

Dependencies can be used to break each stage up into a separate scenario, as is done in this example library. Click on the drop-down icon on the left side of the Snow Cover - Predicting scenario to show the two nested folders: Dependencies and Results. Click on the Dependencies folder. The first scenario, Snow Cover - Training, is present as a dependency. The most recent results from this scenario will be used as the inputs to the Predicting scenario each time it is run.


Click on the Dependencies folder in the Snow Cover - Post-Processing scenario; the Snow Cover - Predicting scenario is present as a dependency, and the Snow Cover - Training dependency for the predicting scenario is included as well.
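The way the Training scenario is pulled in behind the Predicting scenario can be modeled as a depth-first walk over dependencies. This sketch assumes dependencies are resolved transitively in this manner; `resolve_dependencies` is a hypothetical helper that also rejects circular dependencies.

```python
def resolve_dependencies(scenario, dependencies):
    """Return every scenario `scenario` depends on, directly or indirectly,
    ordered so that each one appears after its own dependencies."""
    order, state = [], {}

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"circular dependency involving {name!r}")
        state[name] = "visiting"
        for dep in dependencies.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)

    visit(scenario)
    return order[:-1]  # drop the scenario itself


# Direct dependencies, modeled on the example library.
DEPENDENCIES = {
    "Snow Cover - Predicting": ["Snow Cover - Training"],
    "Snow Cover - Post-Processing": ["Snow Cover - Predicting"],
}
print(resolve_dependencies("Snow Cover - Post-Processing", DEPENDENCIES))
# ['Snow Cover - Training', 'Snow Cover - Predicting']
```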


Step 4: Running models

Right-click on the Snow Cover - Training scenario in the Library Explorer window and select Run from the context menu. If prompted to save your project, click Yes. The example model run should complete within a couple of minutes. If the run is successful, you will see a Status of Done in the Run Monitor window. If the run fails, you can click on the Run Log link to see a report of any problems that occurred. A blue information symbol indicates that the run log contains additional information; this is expected for all scenarios that use the Training stage.


Repeat for each of the two remaining scenarios, Snow Cover - Predicting and Snow Cover - Post-Processing.

Step 5: Viewing model outputs and results

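Once the run is complete, you can view the details of the result scenario by clicking on the drop-down arrow to the left of the scenario in the Library Explorer, expanding the Results folder, and double-clicking on the result scenario.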



You can look through the result scenario to see the updated or newly populated datasheets. You should find that the Output datasheet, Statistics, has been populated with model run outputs.


Charts

To view tabular outputs, move to the results panel at the bottom left of the Library Explorer window. Under the Charts tab, double-click on the Model Fit chart to view the accuracy metrics from the model training results.
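The exact metrics in the Model Fit chart are not listed here, but accuracy metrics for a binary classifier are usually derived from four counts comparing the user classification with the model's binary output. The hypothetical `fit_metrics` below computes several common ones; which of them ecoClassify reports is an assumption to check against the chart.

```python
def fit_metrics(observed, predicted):
    """Common binary-classification metrics from paired 0/1 labels.

    observed:  user-classified labels (1 = present, 0 = absent)
    predicted: the classifier's binary output for the same pixels
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


print(fit_metrics(observed=[1, 1, 1, 0], predicted=[1, 1, 0, 0]))
```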


Note: results can be added to or removed from the maps and charts by right-clicking on the scenario and selecting “Add to/Remove from Results” in the context menu.


Maps

To view spatial outputs, move to the Maps tab and double-click on the Training map to visualize the classified rasters.


In order, the columns show the User Classification map (true presence), Probability map, and Binary map. The Binary (Filtered), Binary (Restricted), and Binary (Restricted and Filtered) columns will be populated with results from the Snow Cover - Post-Processing scenario. Scroll through the map page to see the Post-Processed results.


Next, double-click on the Predicting map to view the Probability and Binary panels for the Snow Cover - Predicting result scenario. Note that the Binary (Restricted), Binary (Filtered), and Binary (Restricted and Filtered) panels for the predicting rasters will be populated with results from the Snow Cover - Post-Processing scenario. Scroll through the map page to see the Post-Processed results.


Images

ecoClassify also allows you to visualize the RGB images of your classified and training rasters under the Images tab.


Here, you may also view a confusion matrix quantifying the classifier’s performance, a bar chart of the classifier’s variable importance, and a histogram for each training variable overlaid with the model response.
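One model-agnostic way to estimate variable importance is permutation importance: shuffle one variable at a time and measure how much accuracy drops. This is a generic technique shown for illustration; ecoClassify's Random Forest importance may be computed differently. `toy_model` and the data are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows whose prediction matches the observed label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, seed=0):
    """Accuracy drop when each feature column is shuffled independently.

    `model` is any callable mapping a feature tuple to a 0/1 class.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # break the link between feature j and the labels
        permuted = [row[:j] + (value,) + row[j + 1:] for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances


# Hypothetical toy classifier: uses feature 0 only and ignores feature 1.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


rows = [(i / 10, (i * 7 % 10) / 10) for i in range(10)]
labels = [toy_model(row) for row in rows]
print(permutation_importance(toy_model, rows, labels))
```

Because `toy_model` never reads feature 1, shuffling that column cannot change any prediction, so its importance is exactly zero.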


Export Data

To export model outputs, add the result scenario with the desired outputs to the active results and open the Export tab at the bottom of the screen. All files available for export will be listed. To export, double-click on the desired output and choose the directory in which to save the file in the pop-up window. Note that if multiple result scenarios are active, files for each of them will be exported.