Multispectral image classification
This is the documentation for the multispectral image classification tutorial. All the steps, code and explan
- Connor Bax & Ronny A. Hernández Mora
- 23/11/2021
If you want to follow this tutorial you can download all the code from:
After you download all the materials, unzip your folder and open your .Rproj file.
Prepare your working folder
If you downloaded this repo directly from GitHub, make sure to create a
data
folder in your repository. Put all the downloaded files there in
order to run all the code below.
You can set up your data
folder with this code:
fs::dir_create(path = paste0(here::here(), "/data"))
R packages needed for the analysis
library(RStoolbox)
library(raster)
library(RStoolbox)
library(raster)
library(tidyverse)
library(sf)
library(rpart)
library(rpart.plot)
library(rasterVis)
library(mapedit)
library(mapview)
library(caret)
library(forcats)
library(patchwork)
library(cluster)
library(randomForest)
Understanding the image data
We are going to read the raster image which is in a .tiff
extention.
This image comes from Landsat 8. Bands are as follows:
- Band 1 Coastal aersol
- Band 2 Blue
- Band 3 Green
- Band 4 Red
- Band 5 Near Infrared (NIR)
- Band 6 Shortwave Infrared (SWIR) 1
- Band 7 Shortwave Infrared (SWIR) 2
- Band 8 Panchromatic
- Band 9 Cirrus
- Band 10 Thermal Infrared (TIRS) 1
- Band 11 Thermal Infrared (TIRS) 2
We are going to read individually each of the bands as follows:
band1 <- raster("data/band1.tif")
band2 <- raster("data/band2.tif")
band3 <- raster("data/band3.tif")
band4 <- raster("data/band4.tif")
band5 <- raster("data/band5.tif")
band6 <- raster("data/band6.tif")
band7 <- raster("data/band7.tif")
band8 <- raster("data/band8.tif")
band9 <- raster("data/band9.tif")
band10 <- raster("data/band10.tif")
band11 <- raster("data/band11.tif")
Exploring and preparing the data
Let’s take a look at some of the bands. Notice any variation or bands that stand out
plot(band1)
plot(band10)
plot(band5)
Now, let’s take a look of the band resolutions:
bands <- c(band1, band2, band3, band4, band5, band6,
band7, band8, band9, band10, band11)
for (i in bands) {
check <- res(i)
print(check)
}
## [1] 30 30
## [1] 30 30
## [1] 30 30
## [1] 30 30
## [1] 30 30
## [1] 30 30
## [1] 30 30
## [1] 15 15
## [1] 30 30
## [1] 30 30
## [1] 30 30
Stacking the images
In order to stack all the images together, we must ensure that all the
resolutions match. To achieve this, we are going to use the function
aggregate
to lower the resolution (larger number) of the image with
the higher resolution (lower number).
Using fact = 2
means that we are multiplying the current resolution by
2. If we wanted to divide it by 2 we can use 1/2, or 0.5
band8 <- aggregate(band8, fact = 2)
Now that we have all the images with the same resolution, we are going to procced to stack the bands into one image:
image <- stack(band1, band2, band3, band4, band5, band6,
band7, band8, band9, band10, band11)
How do we make sure that this step worked? Well, we just add 11 bands into one image, so this should have 11 layers. The function below should return 11.
nlayers(image)
## [1] 11
Spatial reference
Now we are going to check that our resulting image have a spatial reference that its still valid. Otherwise our image will be lost spatially. We can check this with the following code:
crs(image)
## Coordinate Reference System:
## Deprecated Proj.4 representation:
## +proj=utm +zone=12 +datum=WGS84 +units=m +no_defs
## WKT2 2019 representation:
## PROJCRS["WGS_1984_UTM_Zone_12N",
## BASEGEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-111,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["easting",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["northing",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## ID["EPSG",32612]]
- What coordinate system is used?
- Often the ‘zone’ is accompanied by either an S or N, what would these indicate?
Now, we also want to confirm the resolution of the image matches the aggregated resolution
res(image)
## [1] 30 30
Plotting our image
Finally we have confirmed the necessary information of our created image. Let’s try to plot it using some band combinations. We will try:
- True Color Composite
- False Color Composite
True Color Composite
par(col.axis = "white", col.lab = "white", tck = 0)
plotRGB(image, r = 4, g = 3, b = 2, axes = TRUE,
stretch = "lin", main = "True Color Composite")
box(col = "white")
- What bands are used in the True Color Composite? (Refer to the band list at the top ex: Band 9 (Cirrus))
False Color Composite
par(col.axis = "white", col.lab = "white", tck = 0)
plotRGB(image, r = 5, g = 4, b = 3, axes = TRUE, stretch = "lin",
main = "False Color Composite")
box(col = "white")
- What bands are used in the True Color Composite? (Refer to the band list at the top ex: Band 9 (Cirrus))
- What appears to be the main feature of the False Color Composite? (In RED)
Cleaning our working environment
We are done with our image, but to start we created some objects that we
no longer need, given that from now on, all our work is going to be
based on the image
object.
All the bands
object that we created before can be remove from our
Global environment
. This will allow us to free some temporal memory
space.
# Remove all the bands (11 in total)
for (i in 1:11) {
rm(list = paste0("band", i))
}
# Remove our `bands` vector
rm(bands)
Also, we can use a general ‘garbage collection’ to free up some space as well that may be occupied
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 5722153 305.6 10480042 559.7 9705519 518.4
## Vcells 15389940 117.5 38827878 296.3 75489854 576.0
Calculating NDVI index
- By observing our composites images we can see a large amount of vegetation.
- Given our available bands, we can derive an NDVI (Normalized Difference Vegetation Index)
- Recall that NDVI scales from -1 to +1, with +1 indicating more vegetation cover
- These values are largely driven by pigments in vegetation measured by the bands used
We can calculate our NDVI index as follows:
ndvi <- (image[[5]] - image[[4]]) / (image[[5]] + image[[4]])
- What bands are used in the NDVI calculation?
- Why these bands? (Recall the Vegetation Spectrum)
Exploring the NDVI result
Now, with the help of some functins, we can explore what is the resulting NDVI values for our image:
# minimum
min(ndvi@data@values, na.rm = T)
## [1] -0.1987892
# maximum
max(ndvi@data@values, na.rm = T)
## [1] 0.6099971
# standard deviation
sd(ndvi@data@values, na.rm = T)
## [1] 0.1311813
# summary
summary(ndvi)
## layer
## Min. -1.987892e-01
## 1st Qu. 1.509053e-01
## Median 2.238097e-01
## 3rd Qu. 3.335210e-01
## Max. 6.099971e-01
## NA's 6.410260e+05
summary(ndvi@data@values)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -0.2 0.2 0.2 0.2 0.3 0.6 641026
Plotting NDVI index
Also, other way to explore our resulting NDVI index, we can plot it:
as(ndvi, "SpatialPixelsDataFrame") %>%
as.data.frame() %>%
ggplot(data = .) +
geom_tile(aes(x = x, y = y, fill = layer)) +
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.grid.minor = element_blank()) +
labs(title = "NDVI for Calgary, Alberta",
x = " ",
y = " ") +
scale_fill_gradient(high = "#CEE50E",
low = "#087F28",
name = "NDVI")
- Feel free to use other color gradients and plotting schemes, be as creative as you wish
- Do the higher NDVI values match up with any thing from the True Color or False Color Composites?
Supervised Classification
We are now going to attempt a Supervised Classification to try and classify each pixel of the image into various classes. Because this is a Supervised Classification we will first need to create a training dataset as a first step.
This is an interactive process, meaning that everytime that we run the
code editMap()
an interface will open on the Viewer panel. This
interface will allow us to click and point the image to generate some
object in our Global Environment.
Details
TODO: This is not anymore polygons, we are going to use points.
The below line will open up RGB image in the plotting window as an interactive image. You can pan around with the hand and zoom in with the scroll wheel. Get familiar with the controls and movement before moving on. Once comfortable, select the option from the left menu for drawing a ‘polygon’, this will allow you to click and define a polygon You are clicking to place the vertices, so be aware that it will be drawing straight lines between points. Once you have defined a polygon click on the first point to close it, then you can click ‘cancel’ to clear the tool and move to the next area. Please only do one class at a time, meaning that if you are defining agriculture, just draw all agriculture polygons.
Once you have draw several polygons for one class, select ‘done’ in the bottom right.
We will repeat this process for each of the following classes:
- Urban
- Water
- Agriculture
- Other Vegetation
Be aware that it may be beneficial to use the False Color Composite for some classes such as water.
Other vegetation will cover things such as shrublands, forests, or grasslands that are possible not as vibrant green as agriculture
Also, this step can crash some R sessions in computers with limited
memory. So, after each polygon is created, we are using the function
saveRDS
, that is going to save the object in our data
folder.
Say for example that you spend 15 minutes creating the agriculture polygons. Now you are working on creating the urban polygons but in the middle of that process your R session just crashed.
You can load the agriculture polygons and avoid start over again with this category. The code to load the saved polygons objects is below.
Agriculture
# Create polygons
points_agriculture <- viewRGB(image, r = 4, g = 3, b = 2) %>%
editMap()
# Save object
saveRDS(points_agriculture, file = "data/points_agriculture.rds")
# Rename column with agriculture geometries
agriculture <- points_agriculture$finished$geometry %>%
st_sf() %>%
mutate(class = "agriculture", id = 1)
Urban
# Create polygons
points_urban <- viewRGB(image, r = 4, g = 3, b = 2) %>%
editMap()
# Save object
saveRDS(points_urban, file = "data/points_urban.rds")
# Rename column with urban geometries
urban <- points_urban$finished$geometry %>%
st_sf() %>%
mutate(class = "urban", id = 2)
Water
# Create polygons
points_water <- viewRGB(image, r = 5, g = 4, b = 3) %>%
editMap()
# Save object
saveRDS(points_water, file = "data/points_water.rds")
# Rename column with water geometries
water <- points_water$finished$geometry %>%
st_sf() %>%
mutate(class = "water", id = 3)
Vegetation
# Create polygons
points_vegetation <- viewRGB(image, r = 4, g = 3, b = 2) %>%
editMap()
# Save object
saveRDS(points_vegetation, file = "data/points_vegetation.rds")
# Rename column with vegetation geometries
veg <- points_vegetation$finished$geometry %>%
st_sf() %>%
mutate(class = "vegetation", id = 4)
Loading points files
If you missed one of your processes, load the point file with the next instruction. Just use the one that you need:
agriculture <- readRDS(file = "data/points_agriculture.rds")
veg <- readRDS(file = "data/points_urban.rds")
water <- readRDS(file = "data/points_water.rds")
urban <- readRDS(file = "data/points_vegetation.rds")
Collect al training data
Now that we have all the polygons, we need to collect all this information into one object that we call training dataset
training_points <- rbind(agriculture, veg, water, urban) %>%
as.data.frame()
At this point, this is just an object in Global environment
To export
this and use it in other projects or in a R session later (instead of
repeating again all the steps before), we can save it as a shapefile.
Be aware that we are saving this in our data
folder within our working
project.
write_sf(training_points,
"data/calgary_training_points.shp",
driver = "ESRI shapefile")
If we want to read this in a later R session, we can do it as follows:
training_points <- st_read("data/calgary_training_points.shp")
## Reading layer `calgary_training_points' from data source
## `/home/ronny/Documents/repos/github/ceos/ceos_tutorials/data/calgary_training_points.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 91 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -114.2097 ymin: 50.86595 xmax: -113.8678 ymax: 51.13811
## Geodetic CRS: WGS 84
Explore training dataset
Now, we are going to check the distribution of points for our training dataset. At first, we are going to read the Calgary city boudary:
city_boundary <- st_read("data/CityBoundary.geojson", quiet = TRUE)
Now, we are going to create a map looking at just the distribution of points:
points_distribution <- ggplot() +
geom_sf(data = city_boundary, fill = "light gray", color = NA) +
geom_sf(data = training_points, size = 0.5) +
labs(title = "Distribution of\nclassification points") +
theme(panel.background = element_blank(), axis.ticks = element_blank(),
axis.text = element_blank())
points_distribution
Great! We are going to create a second map looking at the distribution of points by the classification type that we created:
points_categories <- ggplot() +
geom_sf(data = city_boundary, fill = "light gray", color = NA) +
geom_sf(data = training_points, aes(color = class), size = 0.5) +
scale_color_viridis_d() +
labs(title = "Classification points by land use") +
theme(panel.background = element_blank(), axis.ticks = element_blank(),
axis.text = element_blank())
points_categories
Now, we are going to put both plots side by side:
points_distribution +
points_categories +
plot_layout(ncol = 2)
See how your distribution is allocated, if you feel as though it is biased to one section or not well distributed you may wish to redo the training data specification.
Extracting spectral data for training points
Now we will extract the spectral data or band data for our training points. First convert to a spatial point format
training_points <- as(training_points, 'Spatial')
Now we have a new object with the bands information, but it’s in a a
special structure called SpatialPointsDataFrame
. Our next step will be
to extract the values in this object and the image (trainign data
points) in a matrix structure:
training_values_matrix <- raster::extract(image, training_points) %>%
round()
## Warning in .local(x, y, ...): Transforming SpatialPoints to the
## crs of the Raster
Now, we should have a matrix of band values for each point
head(training_values_matrix)
## band1 band2 band3 band4 band5 band6 band7 band8 band9
## [1,] 9530 8610 7926 6932 18751 9616 6885 7577 5070
## [2,] 9535 8602 8141 6850 23999 9585 6947 7578 5065
## [3,] 9842 9211 10517 9432 26344 11032 7860 9947 5065
## [4,] 9626 8811 8410 7594 17444 12394 8562 8039 5055
## [5,] 9668 9063 10535 9434 27810 10496 7499 9956 5069
## [6,] 9786 9195 10887 9815 28364 11230 7973 10334 5087
## band10 band11
## [1,] 26574 24651
## [2,] 26404 24508
## [3,] 26641 24780
## [4,] 28911 26357
## [5,] 25827 24022
## [6,] 26447 24546
- Try exploring the data through plotting
- Remember you id numbers from the sections above
profiles <- training_values_matrix %>%
as.data.frame() %>%
cbind(., training_points$id) %>%
rename(id = "training_points$id") %>%
na.omit() %>%
group_by(id) %>%
summarise(band1 = mean(band1),
band2 = mean(band2),
band3 = mean(band3),
band4 = mean(band4),
band5 = mean(band5),
band6 = mean(band6),
band7 = mean(band7),
band8 = mean(band8),
band9 = mean(band9),
band10 = mean(band10),
band11 = mean(band11)) %>%
mutate(id = case_when(id == 1 ~ "agriculture",
id == 2 ~ "urban",
id == 3 ~ "water",
id == 4 ~ "other vegetation"
)) %>%
as.data.frame()
# Take a look of the first values
glimpse(profiles)
## Rows: 4
## Columns: 12
## $ id <chr> "agriculture", "urban", "water", "other vegetati…
## $ band1 <dbl> 9885.654, 11275.897, 9662.667, 9995.286
## $ band2 <dbl> 9158.038, 10688.795, 8742.917, 9274.214
## $ band3 <dbl> 9662.769, 10306.103, 7821.667, 9009.643
## $ band4 <dbl> 8673.038, 10425.872, 6791.417, 8467.929
## $ band5 <dbl> 23836.077, 14951.462, 6587.833, 18410.286
## $ band6 <dbl> 11477.308, 13372.897, 5450.083, 14023.786
## $ band7 <dbl> 8322.731, 11576.154, 5276.417, 9731.429
## $ band8 <dbl> 9166.500, 10354.333, 7408.583, 8752.643
## $ band9 <dbl> 5069.500, 5079.410, 5045.000, 5103.857
## $ band10 <dbl> 26881.77, 30648.46, 25931.75, 29389.14
## $ band11 <dbl> 24886.50, 27825.49, 24022.00, 26764.50
After all this data wrangling, we have a data frame with our values of interest that can be used to create a plot of the profiles, to check if they are indeed unique and distinguishable
profiles %>%
select(-id) %>%
gather() %>%
mutate(class = rep(c("agriculture", "urban", "water", "other vegetation"),
11)) %>%
ggplot(data = ., aes(x = fct_relevel(as.factor(key),
levels = c("band1", "band2",
"band3", "band4",
"band5", "band6",
"band7", "band8",
"band9", "band10",
"band11")), y = value,
group = class, color = class)) +
geom_point(size = 2.5) +
geom_line(lwd = 1.2) +
scale_color_viridis_d() +
labs(title = "Spectral Profile from Landsat 8 Imagery",
x = "Bands",
y = "Reflectance",
color = "Class") +
#scale_y_continuous(limits=c(5000, 15000)) +
theme(axis.text.x = element_text(angle = 45, h = 1)) +
theme(panel.background = element_blank(),
panel.grid.major = element_line(color = "gray", size = 0.5),
panel.grid.minor = element_line(color = "gray", size = 0.5),
axis.ticks = element_blank())
## Warning: Outer names are only allowed for unnamed scalar atomic
## inputs
## Warning: Outer names are only allowed for unnamed scalar atomic
## inputs
## Warning: Outer names are only allowed for unnamed scalar atomic
## inputs
Another way to assess this is through a density plot, note any severe overlap between classes at each band. The mean values will also indicate if there is a large degree of overlap between classes:
profiles %>%
select(-id) %>%
gather() %>%
mutate(class = rep(c("agriculture",
"urban",
"water",
"other vegetation"), 11)) %>%
ggplot(., aes(x = value,
group = as.factor(class),
fill = as.factor(class))) +
geom_density(alpha = 0.75) +
geom_vline(data = . %>%
group_by(class) %>%
summarise(grp.mean = mean(value)),
aes(xintercept = grp.mean, color = class),
linetype = "dashed", size = 1) +
scale_fill_manual(values = c('lawngreen',
'burlywood',
'lightblue',
'darkgreen'),
name = "class") +
scale_color_manual(values = c("black",
"red",
"orange",
"yellow")) +
theme(panel.background = element_blank(),
panel.grid.major = element_line(color = "gray", size = 0.5),
panel.grid.minor = element_line(color = "gray", size = 0.5),
axis.ticks = element_blank()) +
labs(x = "Reflectance Value",
y = "Density",
title = "Density histograms of spectral profiles",
subtitle = "Vertical lines represent mean group reflectance values")
- Note the similarities in overlap between the density plot and the spectral profile.
- These overlapping classes may prove to be difficult to distinguish via the classification.
Classifyng the image
Now we can move onto classifying the image by training the model. As a first step, we are going to combine the classes and extracted point values in one data frame:
training_set <- data.frame(training_points$class, training_values_matrix)
We then use the rpart() to train the model
model_class <- rpart(as.factor(training_points.class) ~ . ,
data = training_set, method = 'class')
Now, we are going to plot a decision tree resulting from the training
rpart.plot(model_class, box.palette = 0, main = "Classification Tree")
We are set with the training section and the model creation. Now we can run the model to obtain the predictions of the entire image:
predictions <- predict(image, model_class,
type = 'class',
progress = 'text') %>%
ratify()
## | | | 0% | |============== | 25% | |============================ | 50% | |========================================= | 75% | |=======================================================| 100%
##
After we have all the predictions for the entire image, we are going to set the levels to our selected classes:
levels(predictions) <- levels(predictions)[[1]] %>%
mutate(legend = c("agriculture","urban","water", "other vegetation"))
Finally, let’s check the result with a plot:
levelplot(predictions, maxpixels = 1e6,
col.regions = c('lawngreen', 'burlywood', 'lightblue', 'darkgreen'),
scales = list(draw = FALSE),
main = "Supervised Classification of Imagery")
- Does the results seem reasonable?
- Why or why not?
- Where are areas or conflict?
To validate our results, we can use a confusion matrix in which we can check the predicted values against the ground-truth points:
test <- raster::extract(predictions, training_points) %>%
as.data.frame() %>%
rename(id = ".")
## Warning in .local(x, y, ...): Transforming SpatialPoints to the
## crs of the Raster
test_probs <- data.frame(
obs = as.factor(training_points$id),
pred = as.factor(test$id)
) %>%
mutate(correct = ifelse(obs == pred, 1, 0))
confMatrix <- confusionMatrix(test_probs$obs, test_probs$pred)
confMatrix
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2 3 4
## 1 24 0 2 0
## 2 0 37 2 0
## 3 0 0 0 12
## 4 0 0 14 0
##
## Overall Statistics
##
## Accuracy : 0.6703
## 95% CI : (0.5639, 0.7653)
## No Information Rate : 0.4066
## P-Value [Acc > NIR] : 3.399e-07
##
## Kappa : 0.5317
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 1 Class: 2 Class: 3 Class: 4
## Sensitivity 1.0000 1.0000 0.0000 0.0000
## Specificity 0.9701 0.9630 0.8356 0.8228
## Pos Pred Value 0.9231 0.9487 0.0000 0.0000
## Neg Pred Value 1.0000 1.0000 0.7722 0.8442
## Prevalence 0.2637 0.4066 0.1978 0.1319
## Detection Rate 0.2637 0.4066 0.0000 0.0000
## Detection Prevalence 0.2857 0.4286 0.1319 0.1538
## Balanced Accuracy 0.9851 0.9815 0.4178 0.4114
- Review the results, what class or classes tend to be misrepresented or misclassified?
- The rows and columns of the confusion matrix should provide this information, as well as the sensitivity analysis
- What is your overall accuracy?
If you need help with the interpretation of the confusion matrix, you can check this information
Unsupervised Image Classification
This below section for the EAS451 class of Fall 2021 is optional
For this section we are going to be completely hands off in the classification process. The idea is that the classification method will find statistically distinct pixels or clusters of pixels and use those to define numbered classes, this will be refined multiple times to give the fewest number of distinct classes as possible.
We will use 3 different methods:
- Kmeans
- Clara Clustering
- Random Forest
Before we start, we need the original image data that we created above with the bands.
From this image, we will create a matrix of values for the classification:
v <- getValues(image)
i <- which(!is.na(v))
v <- na.omit(v) #remove NA values because they cannot be classified
Kmeans classification
For this classification method, we will use the kmeans
package. The
information of this package can be found
here
In the kmeans()
function we are specificying 10 initial centers of
clusters, runninig 100 iterations and 10 intiial configurations. Then,
we plot the results
clusters <- kmeans(v, 4, iter.max = 100, nstart = 10)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 47147350)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 47147350)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 47147350)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 47147350)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 47147350)
kmeans_raster <- raster(image)
kmeans_raster[i] <- clusters$cluster
plot(kmeans_raster)
Clara classification
For this classification method, we will use the clara
package. The
information of this package can be found
here
clus <- clara(v, 4, samples = 500, metric = "manhattan", pamLike = T)
clara_raster <- raster(image)
clara_raster[i] <- clus$clustering
plot(clara_raster, col = topo.colors(5)) #try other colors as well
Unsupervised Random Forest classification using kmeans
For this classification method, we will use the randomForest
package.
The information of this package can be found
here
vx <- v[sample(nrow(v), 500),]
rf = randomForest(vx)
rf_prox <- randomForest(vx, ntree = 1000, proximity = TRUE)$proximity
random_forest_kmeans <- kmeans(rf_prox, 4, iter.max = 100, nstart = 10)
random_forest_class <- randomForest(vx,
as.factor(random_forest_kmeans$cluster),
ntree = 500)
random_forest_raster <- predict(image, random_forest_class)
plot(random_forest_raster)
Plotting all models results
Finally, we can plot all the three models results into one layerstack to compare the results:
class_stack <- stack(kmeans_raster, clara_raster, random_forest_raster)
names(class_stack) <- c("kmeans", "clara", "randomForest")
par(mfrow = c(3, 1))
plot(class_stack)
- Ideally we would then test the accuracy against the ground truth file, however that would require the manual merging of the unsupervised classes and then assessing accuracy wich for the sake of this assignment and time boundaries is not required