Title: | Point Process Models with LASSO-Type Penalties |
---|---|
Description: | Toolkit for fitting point process models with sequences of LASSO penalties ("regularisation paths"), as described in Renner, I.W. and Warton, D.I. (2013) <doi:10.1111/j.1541-0420.2012.01824.x>. Regularisation paths of Poisson point process models or area-interaction models can be fitted with LASSO, adaptive LASSO or elastic net penalties. A number of criteria are available to judge the bias-variance tradeoff. |
Authors: | Ian Renner |
Maintainer: | Ian Renner <[email protected]> |
License: | GPL-3 |
Version: | 1.4 |
Built: | 2025-02-17 02:58:32 UTC |
Source: | https://github.com/cran/ppmlasso |
This package contains tools to fit point process models with sequences of LASSO penalties ("regularisation paths"). Regularisation paths of Poisson point process models or area-interaction models can be fitted with LASSO, adaptive LASSO or elastic net penalties. A number of criteria are available to judge the bias-variance tradeoff.
The key functions in ppmlasso are as follows:
Useful pre-analysis functions:
findRes
Determine the optimal spatial resolution at which to perform analysis
getEnvVar
Interpolate environmental data to species presence locations
griddify
Ensure a matrix of environmental data is on a rectangular grid
ppmdat
Calculate observation weights and set up design matrix for fitting
pointInteractions
Calculate interpoint interactions for fitting area-interaction models
sampleQuad
Set up a regular grid of quadrature points
Creating regularisation paths of point process models:
ppmlasso
Fit a regularisation path of point process models
plotFit
Plot the fitted intensity of a ppmlasso
object
plotPath
Plot the regularisation path of a ppmlasso
object
print.ppmlasso
Print output from a ppmlasso
object
predict.ppmlasso
Make predictions from a fitted point process model to new data
Checking assumptions:
diagnose.ppmlasso
Create diagnostic residual plots of ppmlasso
object
envelope.ppmlasso
Create simulation envelope for goodness-of-fit checks on a ppmlasso
object
Ian W. Renner
Maintainer: Ian W. Renner <[email protected]>
Renner, I.W. & Warton, D.I. (2013). Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology Biometrics 69, 274-281.
Renner, I.W. et al (2015). Point process models for presence-only analysis. Methods in Ecology and Evolution 6, 366-379.
Renner, I.W., Warton, D.I., & Hui, F.K.C. (2021). What is the effective sample size of a spatial point process? Australian & New Zealand Journal of Statistics 63, 144-158.
# Fit a regularisation path of Poisson point process models data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2, raw = TRUE) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) # Fit a regularisation path of area-interaction models data(BlueMountains) ai.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2, raw = TRUE) ai.fit = ppmlasso(ai.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, family = "area.inter", r = 2, availability = BlueMountains$availability, n.fits = 20, writefile = FALSE) # Print a ppmlasso object print(ppm.fit, out = "model") # Residual plot of a ppmlasso object diagnose(ppm.fit, which = "smooth", type = "Pearson") # Make predictions pred.mu = predict(ppm.fit, newdata = sub.env) # Plot the intensity from a fitted ppmlasso object plotFit(ppm.fit) # Plot the regularisation path from a fitted ppmlasso object plotPath(ppm.fit)
# Fit a regularisation path of Poisson point process models data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2, raw = TRUE) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) # Fit a regularisation path of area-interaction models data(BlueMountains) ai.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2, raw = TRUE) ai.fit = ppmlasso(ai.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, family = "area.inter", r = 2, availability = BlueMountains$availability, n.fits = 20, writefile = FALSE) # Print a ppmlasso object print(ppm.fit, out = "model") # Residual plot of a ppmlasso object diagnose(ppm.fit, which = "smooth", type = "Pearson") # Make predictions pred.mu = predict(ppm.fit, newdata = sub.env) # Plot the intensity from a fitted ppmlasso object plotFit(ppm.fit) # Plot the regularisation path from a fitted ppmlasso object plotPath(ppm.fit)
This data set contains the observed presence locations of a Sydney eucalypt (eucalypt
), the values of four
environmental variables and two variables related to site accessibility throughout the region at a spatial
resolution of 500m (env
), and a matrix indicating whether locations in the region are available to the species (availability
).
data(BlueMountains)
data(BlueMountains)
A list with three objects:
A data frame with a column X
of UTM Easting coordinates (km) and a column Y
of UTM Northing coordinates (km) of observed locations of a Sydney eucalypt
A data frame containing environmental data in the Blue Mountains region near Sydney
A 301x201 matrix with UTM Northing and Easting locations stored in dimnames
indicating whether locations are accessible or not
The data frame env
contains the following environmental data:
UTM Easting coordinates (km)
UTM Northing coordinates (km)
Number of fires since 1943
Distance from the nearest main road (m)
Distance from the nearest urban area (m)
Average annual rainfall (mm)
Average maximum temperature (degrees Celsius)
Average minimum temperature (degrees Celsius)
diagnose
Methods for function diagnose
signature(fit = "ppm")
Creates residual plots for a ppm
object. See the help function for diagnose.ppm
in spatstat
for more details.
signature(fit = "ppmlasso")
Creates residual plots for a ppmlasso
object. See the help function for diagnose.ppm
in spatstat
for more details.
This function is analogous to the diagnose.ppm
function of the spatstat
package.
## S3 method for class 'ppmlasso' diagnose(object, ...)
## S3 method for class 'ppmlasso' diagnose(object, ...)
object |
A fitted regularisation path of point process models. The diagnostic plots will be created for the model that optimises the given criterion. |
... |
Other arguments for producing diagnostic plots, as given by the |
See the help file for diagnose.ppm
in the spatstat
package for further details of diagnostic plots.
Ian W. Renner
Baddeley, A.J. & Turner, R. (2005). Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12, 1-42.
envelope.ppmlasso
, for other goodness-of-fit functions inherited from spatstat
.
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) + poly(D_MAIN_RDS, D_URBAN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) diagnose(ppm.fit, which = "smooth", type = "Pearson")
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) + poly(D_MAIN_RDS, D_URBAN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) diagnose(ppm.fit, which = "smooth", type = "Pearson")
envelope
Methods for function envelope
signature(Y = "ppmlasso")
Creates Monte Carlo simulation envelopes for a given function for a ppmlasso
object. See the help function for envelope
in spatstat
for more details.
This function is analogous to the envelope
function of the spatstat
package.
## S3 method for class 'ppmlasso' envelope(Y, fun = Kest, ...)
## S3 method for class 'ppmlasso' envelope(Y, fun = Kest, ...)
Y |
A fitted regularisation path of point process models. The simulation envelopes will be calculated for the model that optimises the given criterion. |
fun |
The summary function to be computed for the given point process model. See the help file for the |
... |
Other arguments for producing diagnostic plots, as given by the |
See the help file for envelope
in the spatstat
package for further details of simulation envelopes.
Ian W. Renner
Baddeley, A.J. & Turner, R. (2005). Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12, 1-42.
diagnose.ppmlasso
, for residual plots inherited from spatstat
.
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) + poly(D_MAIN_RDS, D_URBAN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) envelope(ppm.fit, Kinhom, nsim = 20)
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) + poly(D_MAIN_RDS, D_URBAN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) envelope(ppm.fit, Kinhom, nsim = 20)
This function produces a plot to choose the optimal spatial resolution for analysis. A point process model is calculated for each nominated spatial resolution and the log-likelihood of all fitted models are plotted against the spatial resolutions.
findRes(scales, lambda = 0, coord = c("X", "Y"), sp.xy, env.grid, formula, tol = 0.01, ...)
findRes(scales, lambda = 0, coord = c("X", "Y"), sp.xy, env.grid, formula, tol = 0.01, ...)
scales |
A vector of spatial resolutions for which to produce the plot. |
lambda |
The penalty for each fitted spatial resolution. This should be a single value such that only one point process model is computed for each spatial resolution. |
coord |
A vector containing the names of the longitude and latitude coordinates,
used by |
sp.xy |
A matrix of species locations containing at least one column representing
longitude and one column representing latitude, as in |
env.grid |
The geo-referenced matrix of environmental grids, as in |
formula |
The formula of the fitted model, as in |
tol |
An optional argument to specify the tolerance level of coordinate error passed to an internal call to the |
... |
Further arguments passed to |
This function produces a plot which can be used to judge an optimal spatial resolution for analysis. As the spatial resolution gets finer, the log-likelihood tends to stabilise to a constant value. The largest spatial resolution at which the log-likelihood appears to stabilise may be considered optimal for model fitting.
A plot of log-likelihood versus spatial resolution.
Ian W. Renner
Renner, I.W. et al (2015). Point process models for presence-only analysis. Methods in Ecology and Evolution 6, 366-379.
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] scales = c(0.5, 1, 2, 4, 8, 16) ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) findRes(scales, formula = ppm.form, sp.xy = sub.euc, env.grid = sub.env)
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] scales = c(0.5, 1, 2, 4, 8, 16) ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) findRes(scales, formula = ppm.form, sp.xy = sub.euc, env.grid = sub.env)
Given a matrix of quadrature points and a list of species presences, this function extracts environmental data to presence locations using bilinear interpolation.
getEnvVar(sp.xy, env.grid, env.scale, coord = c("X", "Y"), envfilename = "SpEnvData", tol = 0.01, writefile = TRUE)
getEnvVar(sp.xy, env.grid, env.scale, coord = c("X", "Y"), envfilename = "SpEnvData", tol = 0.01, writefile = TRUE)
sp.xy |
A matrix of species locations containing at least one column representing longitude and one column representing latitude. |
env.grid |
The geo-referenced matrix of environmental grids. |
env.scale |
The spatial resolution of the quadrature scheme from which the environmental data is extracted. |
coord |
A vector containing the names of the longitude and latitude coordinates,
as in |
envfilename |
An optional argument containing the name of the saved file. Setting
|
tol |
An optional argument to specify the tolerance level of coordinate error passed to an internal call to the |
writefile |
A logical argument to determine whether the output should be written to a file or not. If |
At a given species location with coordinates , the interpolated value of the
environmental variable
is calculated as a weighted average of
at four reference
quadrature points
,
,
and
that form a square of nominated side length
env.scale
surrounding .
A matrix containing locations of species presences in the first two columns and the interpolated environmental data in the remaining columns.
Ian W. Renner
data(BlueMountains) species.env = getEnvVar(BlueMountains$eucalypt, env.grid = BlueMountains$env, env.scale = 0.5, envfilename = NA, writefile = FALSE)
data(BlueMountains) species.env = getEnvVar(BlueMountains$eucalypt, env.grid = BlueMountains$env, env.scale = 0.5, envfilename = NA, writefile = FALSE)
This function ensures that the coordinates of the supplied geo-referenced matrix of environmental grids constitutes a rectangular grid.
griddify(envframe, tol = 0.01, coord = c("X", "Y"))
griddify(envframe, tol = 0.01, coord = c("X", "Y"))
envframe |
The geo-referenced matrix of environmental grids. |
tol |
The tolerance level within which to correct coordinate errors. |
coord |
A vector containing the names of the longitude and latitude coordinates. |
The functions in the ppmlasso
package require a set of quadrature points along a rectangular grid. At times a set of quadrature points with a desired spatial resolution of will have some minor machine error in some coordinates such that the coordinates as supplied do not consistute a rectangular grid. The
griddify
function corrects this error as follows:
Let and
be the supplied coordinates contained in
envframe
. The function first determines the spatial resolution based on the median of the differences in the unique values of
and
as well as the coordinates of a rectangular grid with this spatial resolution
and
. Given the tolerance
supplied to
tol
, any coordinate for which
will be adjusted to
. Likewise, any coordinate
for which
will be adjusted to
.
Any environmental variables contained in envframe
are left unchanged.
A data frame containing the coordinates on a rectangular grid as well as any environmental variables left unchanged.
Ian W. Renner
X = seq(0, 5, 1) Y = seq(1, 11, 2) XY = expand.grid(X, Y) # generate 1 x 2 rectangular grid names(XY) = c("X", "Y") #move some coordinates off of rectangular grid XY$X[1] = XY$X[1] - 0.01 XY$Y[1] = XY$Y[1] - 0.01 XY$X[7] = XY$X[7] + 0.01 XY$Y[7] = XY$Y[7] + 0.01 #generate environmental variables XY$V1 = 0.1*XY$X + 0.2*XY$Y + rnorm(36, 0, 1) XY$V2 = -0.2*XY$X + 0.1*XY$Y + 0.05*XY$X*XY$Y + rnorm(36, 0, 5) XY_grid = griddify(XY)
X = seq(0, 5, 1) Y = seq(1, 11, 2) XY = expand.grid(X, Y) # generate 1 x 2 rectangular grid names(XY) = c("X", "Y") #move some coordinates off of rectangular grid XY$X[1] = XY$X[1] - 0.01 XY$Y[1] = XY$Y[1] - 0.01 XY$X[7] = XY$X[7] + 0.01 XY$Y[7] = XY$Y[7] + 0.01 #generate environmental variables XY$V1 = 0.1*XY$X + 0.2*XY$Y + rnorm(36, 0, 1) XY$V2 = -0.2*XY$X + 0.1*XY$Y + 0.05*XY$X*XY$Y + rnorm(36, 0, 5) XY_grid = griddify(XY)
ppmlasso
modelThis function produces a levelplot of the predicted intensity from a fitted ppmlasso
model.
plotFit(fit, pred.data = data.frame(X = fit$x[fit$pres == 0], Y = fit$y[fit$pres == 0], 1, scale(fit$data[fit$pres == 0, -1], center = -fit$s.means/fit$s.sds, scale = 1/fit$s.sds)), coord = c("X", "Y"), asp = "iso", ylab = "", xlab = "", col.regions = heat.colors(1024)[900:1], cuts = length(col.regions), cex = 1.4, main.text = paste(toupper(fit$criterion), "fit"), cex.color = 1.4)
plotFit(fit, pred.data = data.frame(X = fit$x[fit$pres == 0], Y = fit$y[fit$pres == 0], 1, scale(fit$data[fit$pres == 0, -1], center = -fit$s.means/fit$s.sds, scale = 1/fit$s.sds)), coord = c("X", "Y"), asp = "iso", ylab = "", xlab = "", col.regions = heat.colors(1024)[900:1], cuts = length(col.regions), cex = 1.4, main.text = paste(toupper(fit$criterion), "fit"), cex.color = 1.4)
fit |
A fitted |
pred.data |
A data frame which defines the coordinates and the environmental variables for which the predicted intensity will be calculated. By default, this uses the coordinates and environmental variables from the object supplied to the |
coord |
A vector containing the names of the longitude and latitude coordinates,
as in |
asp |
Aspect of the plot, with "iso" as default. See the documentation for the |
ylab |
Label for the |
xlab |
Label for the |
col.regions |
A vector of colours to define the intensity gradient. See the documentation for the |
cuts |
The number of levels the color gradient for the intensity surface is divided into. See the documentation for the |
cex |
Character size for axis labels. |
main.text |
Title of the plot. |
cex.color |
Character size for the colorkey labels. |
This function will compute the predicted intensity of a fitted ppmlasso
object using the model within the regularisation path which optimises the criterion specified in the call to ppmlasso
.
Ian W. Renner
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) + poly(D_MAIN_RDS, D_URBAN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) plotFit(ppm.fit)
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) + poly(D_MAIN_RDS, D_URBAN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) plotFit(ppm.fit)
ppmlasso
modelThis function produces a trace plot of the coefficient estimates of a fitted ppmlasso
object and identifies the models which optimise the various penalisation criteria.
plotPath(fit, colors = c("gold", "green3", "blue", "brown", "pink"), logX = TRUE)
plotPath(fit, colors = c("gold", "green3", "blue", "brown", "pink"), logX = TRUE)
fit |
A fitted |
colors |
A vector of colours for each criterion: |
logX |
A logical argument to indicate whether the plot should utilise a logarithmic scale on the x-axis (the default) or not. |
A fitted ppmlasso
object contains a matrix called betas
which stores the coefficient estimates of each of the n.fits
models fitted. This function produces a traceplot of these coefficient estimates for each environmental variable and highlights the models which optimise each of the penalisation criteria.
Ian W. Renner
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) + poly(D_MAIN_RDS, D_URBAN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) plotPath(ppm.fit)
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) + poly(D_MAIN_RDS, D_URBAN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) plotPath(ppm.fit)
This function calculates point interactions at presence locations and quadrature points required for fitting a regularisation path of area-interaction models.
pointInteractions(dat.ppm, r, availability = NA)
pointInteractions(dat.ppm, r, availability = NA)
dat.ppm |
A design matrix generated using the |
r |
The radius of point interactions. |
availability |
An optional binary matrix used in calculating point interactions indicating
whether locations are available ( |
Theoretically, the point interaction at a point
is calculated
as the proportion of available area in a circular region
or radius
centred
at
that overlaps with circles of radius
centred at other presence locations
(Baddeley & Turner, 2005).
This function discretises the study region at the same spatial resolution
as availability
by defining the matrix occupied
, a fine grid of locations
spanning the study region initialised to zero. The values of occupied
within a
distance of r
of each presence location are then augmented by 1
, such that
occupied
then contains the total number of presence locations with which each
grid location interacts. To prevent unavailable areas from being included in the
calculation of point interactions, the values of occupied
at grid locations for which
availability = 0
are set to zero.
is then estimated as the proportion of available grid locations within
that overlap circular regions around other presence locations.
The availability matrix is particularly useful for regions that have inaccessible areas (e.g. due to the presence of ocean or urban areas).
Finer resolutions of the availability
matrix will yield more precise estimates but
at a cost of greater computation time.
A vector of point interactions corresponding to the locations contained in the
dat.ppm
argument.
Ian W. Renner
Baddeley, A.J. & Turner, R. (2005). Spatstat: an R
package for analyzing spatial
point patterns. Journal of Statistical Software 12, 1-42.
ppmlasso
for fitting a regularisation path of area-interaction models
data(BlueMountains) species.ppm = ppmdat(sp.xy = BlueMountains$eucalypt, back.xy = BlueMountains$env, sp.scale = 1, datfilename = NA, writefile = FALSE) # generate design matrix species.int = pointInteractions(species.ppm, 2, BlueMountains$availability)
data(BlueMountains) species.ppm = ppmdat(sp.xy = BlueMountains$eucalypt, back.xy = BlueMountains$env, sp.scale = 1, datfilename = NA, writefile = FALSE) # generate design matrix species.int = pointInteractions(species.ppm, 2, BlueMountains$availability)
This function prepares the data for model fitting. In particular, it determines observation weights and sets up the design matrix required for fitting a regularisation path.
ppmdat(sp.xy, sp.scale, back.xy, coord = c("X","Y"), sp.dat = getEnvVar(sp.xy = sp.xy, env.scale = sp.scale, env.grid = back.xy, coord = coord, writefile = writefile), sp.file = NA, quad.file = NA, datfilename = "PPMDat", writefile = TRUE)
ppmdat(sp.xy, sp.scale, back.xy, coord = c("X","Y"), sp.dat = getEnvVar(sp.xy = sp.xy, env.scale = sp.scale, env.grid = back.xy, coord = coord, writefile = writefile), sp.file = NA, quad.file = NA, datfilename = "PPMDat", writefile = TRUE)
sp.xy |
A matrix of species locations containing at least one column representing longitude and one column representing latitude. |
sp.scale |
The spatial resolution at which to sample quadrature points. |
back.xy |
The geo-referenced matrix of environmental grids. |
coord |
A vector containing the names of the longitude and latitude coordinates, as
in |
sp.dat |
A matrix of species presence locations and the corresponding environmental
data, as generated by |
sp.file |
The name of a saved file containing a matrix of species presence locations
and the corresponding environmental data, as generated by |
quad.file |
The name of a fie containing the quadrature points created from the
|
datfilename |
An optional argument containing the name of the saved file. Setting
|
writefile |
A logical argument to determine whether the output should be written to a file or not. If |
This function will call the sampleQuad
and getEnvVar
functions to generate a quadrature scheme and interpolate environmental data to presence
locations. Alternatively, the quadrature scheme may be directly supplied to the
quad.file
argument, and the matrix of presence locations and associated environmental
data may be directly supplied to either the sp.dat
argument (as an object in the
workspace) or to the sp.file
argument (as the name of a saved file containing this
matrix).
A matrix dat.ppm
with columns representing the latitude and longitude of presence
locations and quadrature points along with the associated environmental data, as well as
a column Pres
indicating whether either point corresponds to a presence location or
a quadrature point, and a column wt
of observation weights.
Ian W. Renner
sampleQuad
for generating a regular grid of quadrature points.
getEnvVar
for interpolating environmental data to species presence locations.
## Do not run because of NOTE Examples with CPU time > 2.5 times elapsed time #data(BlueMountains) #species.ppm = ppmdat(sp.xy = BlueMountains$eucalypt, back.xy = BlueMountains$env, #sp.scale = 1, datfilename = NA, writefile = FALSE)
## Do not run because of NOTE Examples with CPU time > 2.5 times elapsed time #data(BlueMountains) #species.ppm = ppmdat(sp.xy = BlueMountains$eucalypt, back.xy = BlueMountains$env, #sp.scale = 1, datfilename = NA, writefile = FALSE)
The ppmlasso function fits point process models (either Poisson or area-interaction models) with a sequence of LASSO, adaptive LASSO or elastic net penalties (a "regularisation path").
ppmlasso(formula, sp.xy, env.grid, sp.scale, coord = c("X", "Y"), data = ppmdat(sp.xy = sp.xy, sp.scale = sp.scale, back.xy = env.grid, coord = c("X","Y"), sp.file = NA, quad.file = NA, datfilename = "PPMDat", writefile = writefile), lamb = NA, n.fits = 200, ob.wt = NA, criterion = "bic", alpha = 1, family = "poisson", tol = 1.e-9, gamma = 0, init.coef = NA, mu.min = 1.e-16, mu.max = 1/mu.min, r = NA, interactions = NA, availability = NA, max.it = 25, min.lamb = -10, standardise = TRUE, n.blocks = NA, block.size = sp.scale*100, seed = 1, writefile = TRUE)
ppmlasso(formula, sp.xy, env.grid, sp.scale, coord = c("X", "Y"), data = ppmdat(sp.xy = sp.xy, sp.scale = sp.scale, back.xy = env.grid, coord = c("X","Y"), sp.file = NA, quad.file = NA, datfilename = "PPMDat", writefile = writefile), lamb = NA, n.fits = 200, ob.wt = NA, criterion = "bic", alpha = 1, family = "poisson", tol = 1.e-9, gamma = 0, init.coef = NA, mu.min = 1.e-16, mu.max = 1/mu.min, r = NA, interactions = NA, availability = NA, max.it = 25, min.lamb = -10, standardise = TRUE, n.blocks = NA, block.size = sp.scale*100, seed = 1, writefile = TRUE)
formula |
The formula of the fitted model. For a point process model, the correct form is |
sp.xy |
A matrix of species locations containing at least one column representing
longitude and one column representing latitude. Environmental variables are
interpolated to the locations of |
env.grid |
The geo-referenced matrix of environmental grids. This matrix is used to
generate quadrature points using the |
sp.scale |
The spatial resolution at which to define the regular grid of quadrature
points. |
coord |
A vector containing the names of the longitude and latitude coordinates. |
data |
An optional data matrix generated from the |
lamb |
A vector of penalty values that will be used to create the regularisation path.
If |
n.fits |
The number of models fitted in the regularisation path. If |
ob.wt |
Quadrature weights, usually inherited from the |
criterion |
The penalisation criteria to be optimised by the regularisation path. The
options include |
alpha |
The elastic net parameter. The form of the penalty is
The default value |
family |
The family of models to be fitted – |
tol |
The convergence threshold for the descent algorithm. The algorithm continues
for a maximum of |
gamma |
The exponent of the adaptive weights for the adaptive LASSO penalty. The
default value |
init.coef |
The initial coefficients used for an adaptive LASSO penalty. |
mu.min |
The threshold for small fitted values. Any fitted value less than the threshold
is set to |
mu.max |
The threshold for large fitted values. Any fitted value larger than the threshold
will be set to |
r |
The radius of point interactions, required if |
interactions |
A vector of point interactions calculated from the |
availability |
An optional binary matrix used in calculating point interactions indicating
whether locations are available (1) or not (0). See |
max.it |
The maximum number of iterations of the descent algorithm for fitting the model. |
min.lamb |
The power |
standardise |
A logical argument indicating whether the environmental variables should be standardised to have mean 0 and variance 1. It is recommended that variables are standardised for analysis. |
n.blocks |
This argument controls the number of cross validation groups into which the spatial blocks
are divided if the |
block.size |
The length of the edges for the spatial blocks created if the |
seed |
The random seed used for controlling the allocation of spatial blocks to cross validation groups
if the |
writefile |
A logical argument passed to the |
This function fits a regularisation path of point process models provided a list of species locations and a geo-referenced grid of environmental data. It is assumed that Poisson point process models (Warton & Shepherd, 2010) fit intensity as a log-linear model of environmental covariates, and that area-interaction models (Widom & Rowlinson, 1970; Baddeley & van Lieshout, 1995) fit conditional intensity as a log-linear model of environmental covariates and point interactions. Parameter coefficients are estimated by maximum likelihood for Poisson point process models and by maximum pseudolikelihood (Besag, 1977) for area-interaction models. The expressions for both the likelihood and pseudolikelihood involve an intractable integral which is approximated using a quadrature scheme (Berman & Turner, 1992).
Each model in the regularisation path is fitted by extending the Osborne descent algorithm (Osborne, 2000) to generalised linear models with penalised iteratively reweighted least squares.
Three classes of penalty are available for the vector of parameter coefficients
:
For the LASSO (Tibshirani, 1996),
For the adaptive LASSO (Zou, 2006), , where
for some initial estimate of parameters
.
For the elastic net (Zou & Hastie, 2005), .
Note that this form of the penalty is a restricted case of the general elastic net penalty.
There are various criteria available for managing the bias-variance tradeoff (Renner, 2013). The default choice is BIC, the Bayesian Information Criterion, which has been shown to have good performance.
An alternative criterion useful when data are sparse is MSI, the maximum score
of the intercept model (Renner, in prep). For a set of presence locations, the
MSI penalty is
, where
is the smallest penalty
that shrinks all environmental coefficients to zero. The MSI penalty differs from the other criteria in that does
not require an entire regularisation path to be fitted.
It is also possible to control the magnitude of the penalty by spatial cross validation by setting the
criterion
argument to "blockCV"
. The study region is then divided into square blocks with edge
lengths controlled by the block.size
argument, which are assigned to one of a number of cross validation
groups controlled by the n.groups
argument. The penalty which maximises the predicted log-likelihood is
chosen.
An object of class "ppmlasso"
, with elements:
betas |
A matrix of fitted coefficients of the |
lambdas |
A vector containing the |
likelihoods |
A vector containing the likelihood of |
pen.likelihoods |
A vector containing the penalised likelihood of |
beta |
A vector containing the coefficients of the model that optimises the specified |
lambda |
The penalty value of the model that optimises the specified |
mu |
A vector of fitted values from the model that optimises the specified |
likelihood |
The likelihood of the model that optimises the specified |
criterion |
The specified |
family |
The specified |
gamma |
The specified |
alpha |
The specified |
init.coef |
The specified |
criterion.matrix |
A matrix with |
data |
The design matrix. For the point process models fitted with this function,
|
pt.interactions |
The calculated point interactions. |
wt |
The vector of quadrature weights. |
pres |
A vector indicating presence (1) or quadrature point (0). |
x |
A vector of point longitudes. |
y |
A vector of point latitudes. |
r |
The radius of point interactions. |
call |
The function call. |
formula |
The |
s.means |
If |
s.sds |
If |
cv.group |
The cross validation group associated with each point in the data set. |
n.blocks |
The number of cross validation groups specified. |
Ian W. Renner
Baddeley, A.J. & van Lieshout, M.N.M. (1995). Area-interaction point processes. Annals of the Institute of Statistical Mathematics 47, 601-619.
Berman, M. & Turner, T.R. (1992). Approximating point process likelihoods with GLIM. Journal of the Royal Statistics Society, Series C 41, 31-38.
Besag, J. (1977). Some methods of statistical analysis for spatial data. Bulletin of the International Statistical Institute 47, 77-91.
Osborne, M.R., Presnell, B., & Turlach, B.A. (2000). On the lasso and its dual. Journal of Computational and Graphical Statistics 9, 319-337.
Renner, I.W. & Warton, D.I. (2013). Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology. Biometrics 69, 274-281.
Renner, I.W. (2013). Advances in presence-only methods in ecology. https://unsworks.unsw.edu.au/fapi/datastream/unsworks:11510/SOURCE01
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, 267-288.
Warton, D.I. & Shepherd, L.C. (2010). Poisson point process models solve the "pseudo-absence problem" for presence-only data in ecology. Annals of Applied Statistics 4, 1383-1402.
Widom, B. & Rowlinson, J.S. (1970). New model for the study of liquid-vapor phase transitions. The Journal of Chemical Physics 52, 1670-1684.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418-1429.
Zou, H. & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67, 301-320.
print.ppmlasso
for printing features of the fitted regularisation path.
predict.ppmlasso
for predicting intensity for a set of new data.
envelope.ppmlasso
for constructing a K-envelope of the model which optimises
the given criterion from the spatstat
package.
diagnose.ppmlasso
for diagnostic plots from the spatstat
package.
# Fit a regularisation path of Poisson point process models data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) #Fit a regularisation path of area-interaction models data(BlueMountains) ai.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) ai.fit = ppmlasso(ai.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, family = "area.inter", r = 2, availability = BlueMountains$availability, n.fits = 20, writefile = FALSE)
# Fit a regularisation path of Poisson point process models data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) #Fit a regularisation path of area-interaction models data(BlueMountains) ai.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) ai.fit = ppmlasso(ai.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, family = "area.inter", r = 2, availability = BlueMountains$availability, n.fits = 20, writefile = FALSE)
"ppmlasso"
A class ppmlasso
which represents a point process model with a LASSO-type penalty.
signature(object = "ppmlasso")
: Produce diagnostic plots for a fitted point process model.
signature(Y = "ppmlasso")
: Produce a Monte Carlo simulation envelope for a summary function of a fitted point process model.
signature(object = "ppmlasso")
: Calculate the predicted intensity for a fitted point process model to a set of data.
signature(x = "ppmlasso")
: Print the details of a fitted point process model.
Ian W. Renner
showClass("ppmlasso")
showClass("ppmlasso")
predict
Methods for function predict
signature(object = "ppmlasso")
Creates a vector of predicted intensities for an object of class ppmlasso
.
Given a fitted regularisation path produced by ppmlasso
, this function will predict the intensity for a new
set of data.
## S3 method for class 'ppmlasso' predict(object, ..., newdata, interactions = NA)
## S3 method for class 'ppmlasso' predict(object, ..., newdata, interactions = NA)
object |
A fitted regularisation path produced by |
... |
Additional arguments impacting the prediction calculations. |
newdata |
A data frame of new environmental data for which predicted intensities are computed. |
interactions |
A vector of point interactions for predictions of area-interaction models. |
A vector of predicted intensities corresponding to the environmental data provided in the newdata
argument.
Ian W. Renner
ppmlasso
for fitting a regularisation path of point process models.
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2, raw = TRUE) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) pred.mu = predict(ppm.fit, newdata = sub.env)
data(BlueMountains) sub.env = BlueMountains$env[BlueMountains$env$Y > 6270 & BlueMountains$env$X > 300,] sub.euc = BlueMountains$eucalypt[BlueMountains$eucalypt$Y > 6270 & BlueMountains$eucalypt$X > 300,] ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2, raw = TRUE) ppm.fit = ppmlasso(ppm.form, sp.xy = sub.euc, env.grid = sub.env, sp.scale = 1, n.fits = 20, writefile = FALSE) pred.mu = predict(ppm.fit, newdata = sub.env)
print
Methods for function print
signature(x = "ppmlasso")
Prints output for a ppmlasso
object with details controlled by arguments of the
print.ppmlasso
function.
This function prints output from a fitted regularisation path.
## S3 method for class 'ppmlasso' print(x, ..., output = c("all", "path", "model", "interaction"))
## S3 method for class 'ppmlasso' print(x, ..., output = c("all", "path", "model", "interaction"))
x |
A regularisation path fitted by |
... |
Further arguments controlling the printed output. |
output |
This argument controls what output is printed to the screen. If |
N/A
Ian W. Renner
ppmlasso
for fitting regularisation paths.
# Fit a regularisation path of Poisson point process models data(BlueMountains) ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = BlueMountains$eucalypt, env.grid = BlueMountains$env, sp.scale = 1, n.fits = 20, writefile = FALSE) print(ppm.fit)
# Fit a regularisation path of Poisson point process models data(BlueMountains) ppm.form = ~ poly(FC, TMP_MIN, TMP_MAX, RAIN_ANN, degree = 2) ppm.fit = ppmlasso(ppm.form, sp.xy = BlueMountains$eucalypt, env.grid = BlueMountains$env, sp.scale = 1, n.fits = 20, writefile = FALSE) print(ppm.fit)
This function generates a regular grid of quadrature points and associated environmental data at a nominated spatial resolution.
sampleQuad(env.grid, sp.scale, coord = c("X", "Y"), quadfilename = "Quad", tol = 0.01, writefile = TRUE)
sampleQuad(env.grid, sp.scale, coord = c("X", "Y"), quadfilename = "Quad", tol = 0.01, writefile = TRUE)
env.grid |
The geo-referenced matrix of environmental grids. It must have a vector of longitude and a vector of latitude. |
sp.scale |
The spatial resolution at which to sample quadrature points. |
coord |
A vector containing the names of the longitude and latitude coordinates. |
quadfilename |
An optional argument containing the prefix of the name of the saved file. The
default is |
tol |
An optional argument to specify the tolerance level of coordinate error passed to an internal call to the |
writefile |
A logical argument to determine whether the output should be written to a file or not. If |
The output is a matrix of quadrature points at the spatial resolution
supplied to sp.scale
. If a vector of resolutions is supplied, the output is a list of
matrices for each spatial resolution.
Ian W. Renner
data(BlueMountains) quad.1 = sampleQuad(env.grid = BlueMountains$env, sp.scale = 1, quadfilename = NA, writefile = FALSE)
data(BlueMountains) quad.1 = sampleQuad(env.grid = BlueMountains$env, sp.scale = 1, quadfilename = NA, writefile = FALSE)