THE references for this subject are:
And the best thing is: the authors are active developers of R
. The package we will use in this topic very often is spatstat
. A package that powerful that the manual alone has over 1400 pages (as of 2015-09-26).
If you use the package in your research, do not forget to cite it (this holds true for every package used, including the base software R
):
library(spatstat)
citation(package = "spatstat")
##
## To cite spatstat in publications use:
##
## Adrian Baddeley, Ege Rubak, Rolf Turner (2015). Spatial Point
## Patterns: Methodology and Applications with R. London: Chapman
## and Hall/CRC Press, 2015. In press. URL
## http://www.taylorandfrancis.com/books/details/9781482210200/
##
## Adrian Baddeley, Rolf Turner (2005). spatstat: An R Package for
## Analyzing Spatial Point Patterns. Journal of Statistical
## Software 12(6), 1-42. URL http://www.jstatsoft.org/v12/i06/.
##
## If you use hybrid models, please also cite:
##
## Adrian Baddeley, Rolf Turner, Jorge Mateu, Andrew Bevan (2013).
## Hybrids of Gibbs Point Process Models and Their Implementation.
## Journal of Statistical Software, 55(11), 1-43. URL
## http://www.jstatsoft.org/v55/i11/.
[following text largely from Knitter & Nakoinz (submitted)]
Point patterns are the result of processes that are influenced by (a) first-order effects, i.e. the location of the point is influenced by the underlying structure of the area but not by the location of other points (Wiegand & Moloney 2004, p.210); (b) second-order effects that occur when the location of a point is influenced by the presence or absence of other points (Wiegand & Moloney 2004, p.210). Point pattern analyses are common in ecological studies (e.g. Legendre & Legendre 2012; Wiegand & Moloney 2013); up to now there are only few applications in archaeological contexts (e.g. Knitter et al. 2014).
Load shapefiles and create point dataset of the city polygons.
library(rgdal)
boundary <- readOGR(dsn = "./data/DSL250-Shp/", layer = "Boundary")
## OGR data source with driver: ESRI Shapefile
## Source: "./data/DSL250-Shp/", layer: "Boundary"
## with 504 features
## It has 5 fields
cities <- readOGR(dsn = "./data/DSL250-Shp", layer = "Cities")
## OGR data source with driver: ESRI Shapefile
## Source: "./data/DSL250-Shp", layer: "Cities"
## with 27 features
## It has 6 fields
library(rgeos)
cities.p <- gCentroid(cities, byid = TRUE)
par(mfrow = c(1,2))
plot(cities)
plot(cities.p)
## prepare the data
x.co <- cities.p@coords[,1]
y.co <- cities.p@coords[,2]
n <- length(cities.p@coords[,1])
## calculate mean center
mc <- SpatialPoints(coords = cbind(sum(x.co)/n,sum(y.co)/n), proj4string = cities.p@proj4string)
mc
## class : SpatialPoints
## features : 1
## extent : 188800.9, 188800.9, 265830.7, 265830.7 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=tmerc +lat_0=7.000480277777778 +lon_0=80.77171111111112 +k=0.9999238418 +x_0=200000 +y_0=200000 +a=6377276.345 +b=6356075.41314024 +units=m +no_defs
## calculate standard distance
sd <- sqrt(sum((x.co-mean(x.co))^2+(y.co-mean(y.co))^2)/n)
sd
## [1] 128172.4
## plot it
plot(boundary)
points(cities.p, pch = 19, cex = .5)
points(mc, col="red", pch = 19)
library(plotrix)
draw.circle(x = mc@coords[,1], y = mc@coords[,2], radius = sd, border = "red")
title("Mean center and standard distance \nof cities.p in Sri Lanka")
## get Area of Sri Lanka and change it to sqkm
area <- gArea(boundary)
area <- area/1000000
## calculate intensity
intensity <- length(cities.p)/area
intensity
## [1] 0.0004115326
ppp
objects – Conducting Point Pattern AnalysisFirst, let us create the point pattern.
## another window -- when "boundary" is too detailed
# library(mapdata)
# sl <- map("worldHires","Sri Lanka")
# sl <- data.frame(x=sl$x,y=sl$y)
# sl <- sl[is.na(sl$x)==FALSE & is.na(sl$y)==FALSE,]
# sl.owin <- owin(poly = sl)
# plot(sl.owin) ## quite rough, i know... but it's fast
library(spatstat)
library(maptools)
sl.owin <- as.owin.SpatialPolygons(unionSpatialPolygons(boundary, IDs = boundary@data$OUTERB_ID))
pp.cit <- ppp(x = cities.p@coords[,1], y = cities.p@coords[,2], window = sl.owin)
#str(pp.cit)
pp.vil <- ppp(x = vil@coords[,1], y = vil@coords[,2], window = sl.owin)
## Warning in ppp(x = vil@coords[, 1], y = vil@coords[, 2], window = sl.owin):
## 51 points were rejected as lying outside the specified window
#plot(pp.vil, pch = 3, cex = .5)
plot(pp.cit, pch = 19, cols = "red")
As you can see the plot of pp.cit
already shows the boundary of Sri Lanka. Why? And: why is this important? (hint: ?owin
)
qc.cit <- quadratcount(X = pp.cit)
qc.vil <- quadratcount(X = pp.vil)
par(mfrow = c(1,2))
plot(qc.cit)
plot(qc.vil)
So the question is: does the quadratcount indicates CSR? To check we use a \(\chi^2\) test approach (remember: Relation between observed (i.e. empirical) and expected (i.e. theoretical, here CSR) amounts of points in quadrants)
qt.cit <- quadrat.test(X = pp.cit)
## Warning: Some expected counts are small; chi^2 approximation may be
## inaccurate
qt.cit
##
## Chi-squared test of CSR using quadrat counts
## Pearson X2 statistic
##
## data: pp.cit
## X2 = 30.323, df = 21, p-value = 0.1714
## alternative hypothesis: two.sided
##
## Quadrats: 22 tiles (irregular windows)
qt.vil <- quadrat.test(X = pp.vil)
qt.vil
##
## Chi-squared test of CSR using quadrat counts
## Pearson X2 statistic
##
## data: pp.vil
## X2 = 304.3, df = 21, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
## Quadrats: 22 tiles (irregular windows)
# par(mfrow = c(1,2))
# plot(qt.cit)
# plot(qt.vil)
Exercise: What influence does a change in the amount and density of the quadrants have? Is the current approach useful?
nn.cit <- nndist(X = pp.cit)
nn.vil <- nndist(X = pp.vil)
str(nn.cit)
## num [1:27] 51998 47135 47135 68439 47586 ...
mean(nn.cit)
## [1] 38571.04
mean(nn.vil)
## [1] 2819.191
hist(nn.cit)
abline(v=mean(nn.cit))
abline(v=median(nn.cit), lty=2)
hist(nn.vil)
abline(v=mean(nn.vil))
abline(v=median(nn.vil), lty=2)
“An R value of less than 1 indicates of a tendency toward clustering, since it shows that observed nearest neighbor distances are shorter than expected. An R value of more than 1 indicatives of a tendency toward evenly spaced events” (O’Sullivan & Unwin 2010, 144)
nnE <- 1/(2*sqrt((pp.vil$n/gArea(boundary))))
nnE
## [1] 2493.517
R.vil <- mean(nn.vil)/nnE
R.vil
## [1] 1.130608
nnE <- 1/(2*sqrt((pp.cit$n/gArea(boundary))))
nnE
## [1] 24647.22
R.cit <- mean(nn.cit)/nnE
R.cit
## [1] 1.564925
Amount of events within pixel…in imprecise language: a histogram in space. This is a step by step example but the code is rather inefficient.
cs <- 50000 # cellsize
# enlarge the study area by ONE pixel (in E-W and N-S direction)
xmin <- pp.cit$window$xrange[1] - cs/2 # enlarge the study area by cs/2
xmax <- pp.cit$window$xrange[2] + cs/2 # enlarge the study area by cs/2
ymin <- pp.cit$window$yrange[1] - cs/2 # enlarge the study area by cs/2
ymax <- pp.cit$window$yrange[2] + cs/2 # enlarge the study area by cs/2
rows <- round((ymax-ymin)/cs, 0) + 1 # calculate the number of rows (add 1 just because a pixl might get lost through the rounding operation)
columns <- round((xmax-xmin)/cs, 0) + 1 # calculate the number of columns (add 1 just because a pixl might get lost through the rounding operation)
z <- cbind(1:(columns*rows)) # create a vector with all grid cells
df <- data.frame(z) # create a data.frame of it
gt <- GridTopology(c(pp.cit$window$xrange[1] - cs/2,pp.cit$window$yrange[1] -
cs/2), c(cs,cs), c(columns,rows)) # create a topological description of a grid and aftecsards...
gt # ...have a look at it and...
## min min
## cellcentre.offset 37439.5 55431.19
## cellsize 50000.0 50000.00
## cells.dim 7.0 11.00
sgdf <- SpatialGridDataFrame(gt, df, proj4string = cities.p@proj4string) # ...create the grid.
for (i in seq(along=coordinates(gt)[,1])){ # loop for every cell
x <- coordinates(gt)[i,1] - cs/2 # because the coordinate is define for the center of the cell, the half cellsize is substracted
y <- coordinates(gt)[i,2] - cs/2 # because the coordinate is define for the center of the cell, the half cellsize is substracted
xi <- which(pp.cit$x>x & pp.cit$x<x+cs) # which events lie within the x direction?
yi <- which(pp.cit$y>y & pp.cit$y<y+cs) # which events lie within the y direction?
pz <- length(intersect(xi,yi)) # how many objects in x and y direction intersect?
sgdf@data$z[i]<- pz / (cs/1000)^2 # divide the number of points by the area
}
plot(raster(sgdf), col = gray.colors(25, start = 0.97, end = 0.4))
plot(boundary, add=TRUE)
points(pp.cit$x, pp.cit$y, pch=16, cex=0.4)
Kernel density estimation; again here performed step by step and rather inefficiently coded. Use it to understand but do not run the code.
sgdf_kde <- sgdf # copy the grid into a new variable
sd <- 100000 # define the bandwidth of the kernel - it is NOT the pixel size; this is defined in the beginning of the 1str approach
# ...and think of it, the bandwidth shall not be smaller than the pixelsize, cause points outside the kernel are "underestimated"
for (i in seq(along=coordinates(gt)[,1])){ # loop through all pixel
x <- coordinates(gt)[i,1]
y <- coordinates(gt)[i,2]
g2 <- 0
for (j in seq(along=pp.cit$x)){ # we use ALL points
distanz <- sqrt((pp.cit$x[j] - x)^2 + (pp.cit$y[j] - y)^2) # calculate this distance of one point to the centre of the raster cell. this is necessary for the...
g1 <- dnorm(distanz, mean=0, sd=sd) #...weighting of point in th e cell using a normal distributed kernel
g2 <- g2 + g1 # now collect the weights for the different points under in the pixel to get the density value for the pixel
}
sgdf_kde@data$z[i]<- g2
}
plot(raster(sgdf_kde), col = gray.colors(25, start = 0.97, end = 0.4))
plot(boundary, add=TRUE)
points(pp.cit$x, pp.cit$y, pch=16, cex=0.4)
Kernel density estimation using the built in function of spatstat
. Since the package is great, it is perfectly coded and fast. Play around with the values. Calculate also a KDE for the village data set.
cs <- 1000
sd <- 50000
dens_r5 <- density(pp.cit, sd, eps=cs, edge=TRUE, at="pixels")
plot(dens_r5, col = gray.colors(25, start = 0.97, end = 0.4))
#contour(dens_r5, add=T)
points(pp.cit$x, pp.cit$y, pch=16, cex=0.4)
# use another measure for sd - in this case three times the mean nearest neighbor distance
sdev <- 3*mean(nndist(pp.cit))
dens_r6 <- density(pp.cit, sdev, eps=cs, edge=TRUE, at="pixels")
plot(dens_r6, col = gray.colors(25, start = 0.97, end = 0.4))
#contour(dens_r6, add=T)
points(pp.cit$x, pp.cit$y, pch=16, cex=0.4)
density
function to calculate the KDE for the village data set.spatstat
manualFirst approach based on theoretical assumptions, i.e. CSR
g.cit <- Gest(pp.cit)
plot(g.cit)
f.cit <- Fest(pp.cit)
plot(f.cit)
Second approach based on simulations of a theortical process, i.e. CSR
g.cit.env <- envelope(pp.cit, fun = Gest, nsim = 10)
## Generating 10 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
##
## Done.
plot(g.cit.env)
f.cit.env <- envelope(pp.cit, fun = Fest, nsim = 10)
## Generating 10 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
##
## Done.
plot(f.cit.env)
Question: Interpret the results. Why two approaches? What are the advantages?
First approach based on theoretical assumptions, i.e. CSR
k.cit <- Kest(pp.cit)
plot(k.cit)
l.cit <- Lest(pp.cit)
plot(l.cit)
Second approach based on simulations of a theortical process, i.e. CSR
k.cit.env <- envelope(pp.cit, fun = Kest, nsim = 10)
## Generating 10 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
##
## Done.
plot(k.cit.env)
l.cit.env <- envelope(pp.cit, fun = Lest, nsim = 10)
## Generating 10 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
##
## Done.
plot(l.cit.env)
Question: Interpret the results. Why two approaches? What are the advantages?
spatstat
manualBaddeley, A., 2008. Analysing spatial point patterns in R, CSIRO; University of Western Australia. Available at: http://www.csiro.au/files/files/pn0y.pdf.
Bivand, R.S., Pebesma, E.J. & Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R, New York: Springer.
Diggle, P.J., 2013. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, Third Edition 3rd ed., Boca Raton: Chapman; Hall/CRC.
Knitter, D. et al., 2014. The Centrality of Aleppo and its environs. eTopoi. Journal for Ancient Studies, 3, pp.107–127.
Legendre, P. & Legendre, L., 2012. Numerical ecology Third English edition., Amsterdam: Elsevier.
Wiegand, T. & Moloney, K.A., 2013. Handbook of Spatial Point-Pattern Analysis in Ecology, CRC Press.
Wiegand, T. & Moloney, K.A., 2004. Rings, Circles, and Null-Models for Point Pattern Analysis in Ecology. Oikos, 104(2), pp.209–229. Available at: http://www.jstor.org/stable/3547954 [Accessed March 13, 2015].