What you will learn, according to the course description:
And besides this:
In the short available time we cannot cover all of these topics in detail. Therefore, we aim to provide a starting point that enables you to continue studying and learning.
We will teach you the scheduled content using the open-source software R (https://www.r-project.org/ and http://cran.r-project.org/). The reason: you do not need to learn many different software tools. If you are able to use R, you can carry out essentially any GIS- and geostatistics-related task.
Here is a non-exhaustive list of useful literature on spatial data analysis, (geo-)statistics, and machine learning.
R
R is a high-level programming language and environment for data analysis and graphics (Crawley 2012). What can R do for you? “R can do anything you can imagine” (Zuur et al. 2009, p.1). You can write functions, do calculations, apply hundreds of statistical and geostatistical techniques, create complex graphs, and adapt R to your needs by writing your own library functions. R is supported by a huge user community, so besides the continuous development of the software, you will always find experts able to help you with R-related questions, e.g. via mailing lists (general: https://www.r-project.org/mail.html or specific: https://stat.ethz.ch/mailman/listinfo/R-SIG-Geo/), on Stack Overflow (SO; http://stackoverflow.com/questions/tagged/r), via the vignettes that are available for many packages, functions, and problems (https://stat.ethz.ch/R-manual/R-devel/library/utils/html/vignette.html), and finally via a web search for everything not listed here. Ways to find help in R are nicely summarised in this SO answer: http://stackoverflow.com/questions/15289995/how-to-get-help-in-r.
A rising number of research institutes, companies, and universities have migrated to R, which becomes obvious when looking at the number of scientific articles (see http://r4stats.com/articles/popularity/) as well as at the large number of books published about R and topics related to (geo-)statistics. A non-exhaustive collection:
By the way: R is available free of charge, for everyone, everywhere, any time. R is free software. If you want to learn more about these fundamental and important aspects, have a look at https://www.fsf.org/ as well as https://www.gnu.org/.
Having said this, there are many different ways to use R. Besides R's own GUI, my personal favourite is ESS, which stands for Emacs Speaks Statistics, an add-on for the famous GNU Emacs text editor (more information at http://ess.r-project.org/). Nevertheless, since learning Emacs demands a course of its own, we are going to use RStudio, probably the most accessible and popular GUI for R at the moment (more information at https://www.rstudio.com/; a collection of GUIs for R can be found at Wikipedia).
- Video lectures (like An Introduction to Quantitative Inference and Thinking); YouTube in general is a great resource for finding help on R, statistics, etc. A complete lecture series on Geographical Analysis at the University of Utah by Dr. Steven Farber can be found here.
- Massive Open Online Courses (MOOCs) on R, e.g. at edX or Coursera
- Interactive, online “Introduction to R”: https://www.datacamp.com/courses/free-introduction-to-r
- Another good introduction to R: https://ramnathv.github.io/pycon2014-r/
- Stay updated: http://www.r-bloggers.com/
In the simplest case, R can be used directly from the console. Let’s try it by using R as a calculator. Just type the following and hit Enter after each line:
1+1
## [1] 2
10-1
## [1] 9
3*5
## [1] 15
12/3
## [1] 4
16%/%3
## [1] 5
16%%3
## [1] 1
12^2
## [1] 144
sqrt(16)
## [1] 4
log(1)
## [1] 0
log10(100)
## [1] 2
exp(0)
## [1] 1
What are %% and %/%? Let’s find out:
help("%%")
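A quick check of what the two operators do (the variable names are only illustrative): %/% is integer division and %% returns the remainder, so together they always rebuild the original number.

```r
## %/% is integer division, %% is the remainder (modulo)
a <- 16
b <- 3
q <- a %/% b      # 5: how many times 3 fits completely into 16
r <- a %% b       # 1: what is left over
q * b + r == a    # TRUE: quotient and remainder rebuild the dividend

## with a negative dividend, R rounds the quotient down (towards -Inf),
## so the remainder takes the sign of the divisor
(-16) %/% 3       # -6
(-16) %% 3        # 2

## a common use of %%: test which numbers are even
(1:6) %% 2 == 0
```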
Is there anything else you do not understand? Search the help for it! There are many different ways to do so…
Another resource for beginners is the official manual “An Introduction to R”.
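Some of the built-in ways to search the help, as a sketch (the search terms are just examples). They are meant to be typed interactively; the pages open in the RStudio help pane, your browser, or the console pager.

```r
## help page of a known function (both lines are equivalent)
help("sqrt")
?sqrt
## operators and other special names need quotes or backticks
?`%%`
## full-text search of all installed help pages (both lines are equivalent)
help.search("square root")
??"square root"
## names of all objects on the search path matching a regular expression
apropos("^log")
## run the examples at the end of a help page
example("log")
```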
The swirl package
If you want to learn more about R, you can use the interactive tutorial from the swirl package to get started on your own. “The swirl R package makes it fun and easy to learn R programming and data science. If you are new to R, have no fear.” (http://swirlstats.com/students.html)
To install and use swirl, type the following code in your R console:
install.packages("swirl", dependencies = TRUE)
library(swirl)
swirl()
Exercise
- Play around and get familiar with R and RStudio.
- Install the swirl package and do the first unit (“R Programming: The basics of programming in R”).
“A R script (basically any script) is simply a text file containing (almost) the same commands that you would enter on the command line of R” (https://cran.r-project.org/doc/contrib/Lemon-kickstart/kr_scrpt.html).
This is a great feature: while writing a script you automatically document your work, so you (and others) can reconstruct how you produced your results. Besides, you can share your script with other researchers in order to debug it, enhance it, get feedback on it, help others with it, …
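To see that a script really is just a text file of commands, the following sketch writes a tiny, made-up script to a temporary file and runs it with source(). In practice you write the script in the RStudio editor and run it with source() or the Source button.

```r
## write a small, hypothetical script to a temporary file
script_file <- tempfile(fileext = ".R")
writeLines(c("## a tiny example script",
             "x <- sqrt(16)    # 4",
             "y <- x * 2       # 8"),
           script_file)

## source() runs every command in the file, as if typed at the console
source(script_file)
y

## echo = TRUE additionally prints each command before its result
source(script_file, echo = TRUE)
```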
Think of a script as a publication and follow some basic rules to get the most out of it: give it a title, state its purpose, give references, set the license, etc. For example:
################################################################################
## An example of how to write the header of an R script
## =============================================================================
## Project: GIS in Geostatistics in Sri Lanka
## Author: Daniel Knitter
## Version: 01
## Date of last changes: Sun 30 Aug 17:26:27 CEST 2015
## Data:
## Author of data:
## Purpose: just an example
## Content: nothing yet
## Licence data: -
## Licence Script: GPL
##
## how to cite a package? citation(package="PACKAGE-NAME")
################################################################################
Please note the # symbol: it marks the rest of the line as a comment, which is not interpreted by R. Hence, you can add a remark anywhere in your script by using a comment:
sqrt(12) # sqrt() means square root of something in the brackets
## [1] 3.464102
Before you fill your script with commands, we have to define a style guide. Several style guides are available, for instance:
It does not matter which style guide you use, but be consistent. Here are some examples of points covered by a style guide:
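To illustrate what consistency means in practice, here are a few conventions found in many R style guides (for example the tidyverse style guide); which rules you adopt is up to you, and the function below is a made-up example.

```r
## consistent style: descriptive lower-case names, `<-` for assignment,
## spaces around most operators and after commas, indented function bodies
circle_area <- function(radius) {
  pi * radius^2
}
area_small <- circle_area(radius = 2)

## the same code without a style guide: it works, but is harder to read
ca=function(r){pi*r^2};a2=ca(2)
```

Both versions compute the same value; only the second is hard to read, debug, and share.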
At rOpenSci you will find packages that help you access data repositories through R. “Transforming science through open data – We are changing how science works” (https://ropensci.org/).
Since these ideas are important, here are some of these packages, which allow data access and analysis.
Install required packages
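install.packages("devtools")
require(devtools)
## what the heck is the difference between "library" and "require"? ##
install.packages("rjson")
install_github("barryrowlingson/geonames")  # GitHub repository given as "user/repo"
## geonames is also available from CRAN: install.packages("geonames")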
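One possible answer to the question in the code comment, as a sketch: notARealPackage123 is a deliberately non-existent, hypothetical package name used to provoke the failure.

```r
## library() stops with an error if a package is not installed ...
res_library <- tryCatch(library("notARealPackage123"),
                        error = function(e) "error")
res_library    # "error"

## ... whereas require() does not stop: it returns FALSE, so it can be used in if()
res_require <- suppressWarnings(require("notARealPackage123", quietly = TRUE))
res_require    # FALSE

## typical pattern (not run here): install a package only if it is missing
## if (!require("rjson")) install.packages("rjson")
```

Use library() in scripts when a missing package should stop the analysis immediately; require() is mainly useful inside conditions.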
Then load the package. This is the point where your GeoNames username is required (the GeoNames web services need a free user account):
library(geonames)
options(geonamesUsername="YOURUSERNAME")
A tutorial on how to access and use the data is here. We want to use the package to get information about the development of temperature and precipitation between 1960 and 2050.
install.packages("rWBclimate")
library(rWBclimate)
Now, get your ISO3 country code (LKA for Sri Lanka) and start downloading some data.
We are going to use topographic information that can be downloaded from the homepage of the department. Please download and extract the free set of shapefiles they offer here.
Census data of Sri Lanka can be accessed via the great, brand-new (12/2014) LankaSIS.
We collected some datasets for you that we thought might be interesting. Wait for the exercises.
[The following is to a large extent taken from Knitter & Nakoinz (submitted): “Point Pattern Analysis as Tool for Digital Geoarchaeology – A Case Study of Megalithic Graves in Schleswig-Holstein, Germany”]
Statistics is a very large, sometimes overwhelmingly large, subject. Nevertheless, there is good news: by focusing on “Geostatistics and GIS” we have already defined the focus of our statistical analyses: everything we investigate is concerned with space and hence with spatial data.
In contrast to everyday statistical data, spatial data are special because they do not fulfil one of the most common prerequisites of conventional statistical analyses: their observations are not stochastically independent, since nearby observations tend to be more similar than distant ones (Tobler 1970). This leads to the specific properties of spatial data (list after O’Sullivan & Unwin 2010, p.34):
Many of these points may sound trivial. Nevertheless, it is important to be aware of them, since they directly influence the results. Spatial data are the result of processes. By analysing them it is possible to detect functional relationships, but these do not imply causality (see Ahnert 2003, pp.19–20). Hence, it needs to be discussed continuously whether these processes are the actual cause of the configuration of the spatial data or just an artefact of the analytical approach.
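The lack of independence can be quantified. As a minimal base-R sketch (not taken from the cited literature), Moran's I, a common global measure of spatial autocorrelation, compares each value's deviation from the mean with the deviations of its neighbours. The transect of 10 points and the neighbour definition (adjacent points get weight 1) are hypothetical; for real analyses a tested implementation such as spdep::moran.test() is preferable.

```r
## Moran's I for values x and a spatial weights matrix w (w[i, j] > 0 for neighbours)
morans_i <- function(x, w) {
  z <- x - mean(x)                  # deviations from the mean
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

## hypothetical transect of 10 equally spaced points; adjacent points are neighbours
n <- 10
w <- outer(1:n, 1:n, function(i, j) as.numeric(abs(i - j) == 1))

trend <- 1:n                                  # similar values next to each other
alternating <- rep(c(0, 1), length.out = n)   # dissimilar values next to each other

morans_i(trend, w)        # about 0.78: positive spatial autocorrelation
morans_i(alternating, w)  # -1: negative spatial autocorrelation
```

Values near 0 would indicate no spatial pattern; the further from 0, the less the independence assumption of conventional statistics holds.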
An overview of the spatial analysis tools and packages available for R (135 were listed on August 31, 2015) can be found at https://cran.r-project.org/web/views/Spatial.html. A great introduction to handling spatial data is given by Lovelace & Cheshire (2015), which can be downloaded from Lovelace’s GitHub account.
“Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. It’s used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB” (https://rstudio.github.io/leaflet/)
The R package leaflet makes it easy to integrate and control Leaflet maps in R.
First, install and load the necessary packages:
## install.packages("leaflet")                 # release version from CRAN
## devtools::install_github("rstudio/leaflet")  # development version
library(leaflet)
Well, before we produce some maps, we should define a location/an area that we want to see. How about this campus? A search for your campus on http://wikimapia.org gives us the geographic coordinates in degrees, minutes, and seconds. This is a small problem, since we need them in decimal degrees. R to the rescue: we simply recalculate the values by writing our very own functions.
This is a small task for learning how to write functions. The equations used within the functions can be found on Wikipedia, and via Google I found another version here.
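For reference, the conversion that the functions implement, with the PGIS latitude as a worked example. This is the standard sexagesimal conversion written out here (valid for non-negative coordinates), not copied from the linked pages:

```latex
dd = d + \frac{m}{60} + \frac{s}{3600},
\qquad
7^\circ 15' 30'' \;\rightarrow\; 7 + \frac{15}{60} + \frac{30}{3600}
= 7 + 0.25 + 0.008\overline{3} = 7.258\overline{3}

d = \lfloor dd \rfloor,
\qquad
m = \lfloor 60\,(dd - d) \rfloor,
\qquad
s = 3600\,\Bigl(dd - d - \frac{m}{60}\Bigr)
```

Applying the inverse to 7.258\overline{3} gives d = 7, m = \lfloor 15.5 \rfloor = 15 and s = 3600 \cdot 0.008\overline{3} = 30.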
dms.to.dd <- function(d,m,s) {
dd <- d + (m/60) + (s/3600)
return(dd)
}
dd.to.dms1 <- function(dd) {
dd <- as.numeric(dd)
d <- floor(dd)
m <- floor((dd - d)*60)
s <- floor((dd - d - m/60)*3600)
dms <- paste(d,"°",m,"'",s,"\'\'",sep = "")
return(dms)
}
dd.to.dms2 <- function(dd) {
dd <- as.numeric(dd)
d <- floor(dd)
m <- floor((abs(dd) * 60))%%60
s <- floor((abs(dd) * 3600))%%60
dms <- paste(d,"°",m,"'",s,"\'\'",sep = "")
return(dms)
}
Question: Which version of the dd.to.dms function is more convenient, and why?
Question: How could the code be improved? What is bad about the code at the moment?
Let us use our brand new functions. Get the geographic coordinates of your PGIS institute (I found 7°15'30"N 80°35'47"E on http://wikimapia.org) and try them out.
co.pgis <- c(lat = dms.to.dd(7,15,30),lon = dms.to.dd(80,35,47),name = "Welcome at PGIS :)")
co.pgis
## lat lon name
## "7.25833333333333" "80.5963888888889" "Welcome at PGIS :)"
Let’s see whether our functions lead to the same results:
dms.co.pgis1 <- c(dd.to.dms1(co.pgis[1]),dd.to.dms1(co.pgis[2]))
dms.co.pgis2 <- c(dd.to.dms2(co.pgis[1]),dd.to.dms2(co.pgis[2]))
dms.co.pgis1
## [1] "7°15'29''" "80°35'47''"
dms.co.pgis2
## [1] "7°15'29''" "80°35'47''"
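One possible answer to the questions above, as a sketch rather than the definitive solution: both functions print 29 instead of 30 seconds because floor() truncates values such as 29.9999… that floating-point arithmetic produces instead of exactly 30; storing the coordinates as text (c() with the name turns co.pgis into a character vector) additionally cuts their precision. Negative (southern or western) coordinates are not handled either. The hypothetical dd_to_dms() below works in whole seconds, rounds instead of truncating, keeps the sign, and the coordinates are kept numeric in a data.frame.

```r
## decimal degrees -> "D°M'S''" string; rounds to whole seconds and keeps the sign
dd_to_dms <- function(dd) {
  dd <- as.numeric(dd)
  sgn <- ifelse(dd < 0, "-", "")
  total_s <- round(abs(dd) * 3600)   # whole seconds, avoids 29.999... -> 29
  d <- total_s %/% 3600              # full degrees
  m <- (total_s %% 3600) %/% 60      # full minutes
  s <- total_s %% 60                 # remaining seconds
  paste0(sgn, d, "°", m, "'", s, "''")
}

## keep numbers numeric: a data.frame instead of a character vector
pgis <- data.frame(name = "Welcome at PGIS :)",
                   lat = 7 + 15/60 + 30/3600,
                   lon = 80 + 35/60 + 47/3600)
dd_to_dms(c(pgis$lat, pgis$lon))   # "7°15'30''"  "80°35'47''"
dd_to_dms(-pgis$lat)               # "-7°15'30''", e.g. a point south of the equator
```

Working in total seconds also avoids results like 0°59'60'' for values just below a full degree.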
And now, produce some nice interactive maps… and try to make sense of the %>% symbol.
m <- leaflet() %>%
addTiles() %>%
addMarkers(lng=as.numeric(co.pgis[2]), lat=as.numeric(co.pgis[1]))
m
This produces an interactive map with the default OpenStreetMap background.
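The %>% (pipe) operator comes from the magrittr package and is re-exported by leaflet: x %>% f(y) is the same as f(x, y), so a chain of calls can be read from left to right instead of from the inside out. A minimal sketch with made-up numbers:

```r
library(magrittr)   # provides %>% (leaflet re-exports the same operator)

x <- c(1, 4, 9, 16)

## nested calls are read from the inside out ...
nested <- round(mean(sqrt(x)), 1)

## ... the pipe passes the left-hand side as the first argument to the next call
piped <- x %>% sqrt() %>% mean() %>% round(1)

nested   # 2.5
piped    # 2.5

## the leaflet code above therefore corresponds to
## addMarkers(addTiles(leaflet()), lng = ..., lat = ...)
```

Since R 4.1.0 there is also the native pipe |>, which behaves the same in simple chains like this one.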
You can also change the map tile provider for a wide range of different maps. An overview can be found here: http://leaflet-extras.github.io/leaflet-providers/preview/index.html
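## note: some tile providers (e.g. Thunderforest, Stadia) now require an API key;
## see the leaflet-providers page linked above
m1 <- leaflet() %>%
addProviderTiles("Thunderforest.Landscape") %>%
addMarkers(lng=as.numeric(co.pgis[2]), lat=as.numeric(co.pgis[1]), popup = "PGIS")
m1
m2 <- leaflet() %>%
addProviderTiles("Stadia.StamenWatercolor") %>%  # formerly "Stamen.Watercolor"
addMarkers(lng=as.numeric(co.pgis[2]), lat=as.numeric(co.pgis[1]), popup=as.character(co.pgis[3])) %>%
setView(lng = as.numeric(co.pgis[2]), lat = as.numeric(co.pgis[1]), zoom = 10)
m2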
Question: What are the differences between m1 and m2, besides the different map tile provider?
Exercise
Change the map tile provider and add another marker to the map (perhaps your hometown?).
Ahnert, F., 2003. Einführung in die Geomorphologie, Stuttgart: Eugen Ulmer.
Baddeley, A., 2008. Analysing spatial point patterns in R, CSIRO; University of Western Australia. Available at: http://www.csiro.au/files/files/pn0y.pdf.
Baddeley, A. & Turner, R., 2005. Spatstat: An R package for analyzing spatial point patterns. Journal of Statistical Software, 12(6), pp.1–42. Available at: www.jstatsoft.org.
Bivand, R.S., Pebesma, E.J. & Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R, New York: Springer.
Borcard, D., Gillet, F. & Legendre, P., 2011. Numerical Ecology with R, New York, NY: Springer New York. Available at: http://link.springer.com/10.1007/978-1-4419-7976-6 [Accessed March 12, 2015].
Crawley, M.J., 2012. The R Book, Chichester, UK: John Wiley & Sons, Ltd.
Diggle, P.J., 2013. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, 3rd ed., Boca Raton: Chapman & Hall/CRC.
Everitt, B., 2006. A handbook of statistical analyses using R, Boca Raton: Chapman & Hall/CRC.
Fortin, M.-J. & Dale, M.R.T., 2005. Spatial Analysis: A Guide for Ecologists, Cambridge, N.Y.: Cambridge University Press.
Friedman, J., Hastie, T. & Tibshirani, R., 2001. The Elements of Statistical Learning, Springer Series in Statistics, Berlin: Springer. Available at: http://statweb.stanford.edu/~tibs/book/preface.ps [Accessed May 22, 2015].
Gaetan, C. & Guyon, X., 2010. Spatial Statistics and Modeling, New York, NY: Springer New York. Available at: http://link.springer.com/10.1007/978-0-387-92257-7 [Accessed March 13, 2015].
Gelfand, A.E. et al., 2010. Handbook of Spatial Statistics, CRC Press.
Glenberg, A.M. & Andrzejewski, M.E., 2008. Learning from data: An introduction to statistical reasoning, 3rd ed., New York: Lawrence Erlbaum Associates.
Haining, R.P., 2003. Spatial Data Analysis: Theory and Practice, Cambridge, UK; New York: Cambridge University Press.
Hengl, T., 2009. A practical guide to geostatistical mapping, 2nd extended ed., Amsterdam: Hengl.
Illian, J. et al., 2008. Statistical Analysis and Modelling of Spatial Point Patterns, West Sussex: John Wiley & Sons.
James, G. et al., 2013. An Introduction to Statistical Learning, New York, NY: Springer New York. Available at: http://link.springer.com/10.1007/978-1-4614-7138-7 [Accessed May 11, 2015].
Legendre, P. & Legendre, L., 2012. Numerical Ecology, 3rd English ed., Amsterdam: Elsevier.
Lloyd, C.D., 2011. Local Models for Spatial Analysis, Boca Raton: CRC Press.
Lovelace, R. & Cheshire, J., 2015. Introduction to visualising spatial data in R, Available at: https://github.com/Robinlovelace/Creating-maps-in-R/raw/master/intro-spatial-rl.pdf [Accessed November 29, 2014].
Maindonald, J. & Braun, J., 2003. Data Analysis and Graphics Using R, 1st ed., Cambridge University Press.
Openshaw, S., 1984. The modifiable areal unit problem, Norwich: Geo Abstracts Univ. of East Anglia.
O’Sullivan, D. & Perry, G.L.W., 2013. Spatial simulation: Exploring pattern and process, Chichester, West Sussex, UK: John Wiley & Sons Inc.
O’Sullivan, D. & Unwin, D., 2010. Geographic information analysis, Hoboken: John Wiley & Sons.
Pilz, J., 2009. Interfacing Geostatistics and GIS, Springer.
Radziwill, N.M., 2015. Statistics (The Easier Way) with R: An informal text on applied statistics, San Francisco, California: Lapis Lucera.
Ripley, B.D., 2004. Spatial statistics, Hoboken, N.J: Wiley-Interscience.
Schabenberger, O. & Gotway, C.A., 2005. Statistical methods for spatial data analysis, Boca Raton: Chapman & Hall.
Schumacker, R. & Tomek, S., 2013. Understanding Statistics Using R, New York, NY: Springer New York. Available at: http://link.springer.com/10.1007/978-1-4614-6227-9 [Accessed March 12, 2015].
Soetaert, K., Cash, J. & Mazzia, F., 2012. Solving differential equations in R, New York: Springer.
Stevens, M.H., 2010. A Primer of Ecology with R, 1st ed. (2009 edition), Dordrecht; New York: Springer.
Tobler, W.R., 1970. A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, 46, pp.234–240. Available at: http://www.jstor.org/stable/143141 [Accessed August 22, 2012].
Wickham, H., 2009. ggplot2: Elegant Graphics for Data Analysis, New York: Springer.
Wiegand, T. & Moloney, K.A., 2013. Handbook of Spatial Point-Pattern Analysis in Ecology, CRC Press.
Zhao, Y., 2012. R and Data Mining: Examples and Case Studies, Academic Press, Elsevier. Available at: http://www.rdatamining.com/docs/RDataMining.pdf.
Zuur, A.F., Ieno, E.N. & Meesters, E., 2009. A Beginner’s Guide to R, 1st ed., Springer.