1 Introductory items

1.1 Participants

  • Lab members: Kitakado, Ndiaye, Lurkpranee, Hiwatashi, Nagatsuka (leader), Mitsuyu, Tsubosaka

  • Guests: Sheng-Ping Wang, Wen-Pei Tsai, Sung Il Lee, Hee Won Park, Jung-Hyun Lim, Matsuzaka

1.2 Goal of this workshop

2 Models

2.1 Binomial (Tsubosaka)

2.1.1 Overview of the Distribution

The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success. It is widely used in statistics and various fields to model binary outcomes, such as pass/fail, yes/no, or success/failure.

2.1.2 Mathematical Definition of the Distribution

Mathematically, a binomial distribution with parameters \(n\) (number of trials) and \(p\) (probability of success in each trial) is defined by the probability mass function (PMF):

\[ P(Y = y) = \binom{n}{y} p^y (1-p)^{n-y} \]

where \(\binom{n}{y} = \frac{n!}{y!(n-y)!}\) is a binomial coefficient, and \(Y\) is the random variable representing the number of successes.

2.1.3 Characterization, Properties and So On

  • expectation and variance:
    • expectation : \(E[Y] = np\)
    • variance : \(\text{Var}(Y) = np(1-p)\)
  • Reproductive property:

The moment generating function can be derived using the binomial theorem (where \(q = 1 - p\)):

\[ M(t) = \mathbb{E}[e^{tX}] = \sum_k \binom{n}{k} (pe^t)^k q^{n-k} = (pe^t + q)^n \]

Thus, the characteristic function is

\[ \phi(t) = (pe^{it} + q)^n \]

Let \(X \sim \text{Bin}(n_1, p)\) and \(Y \sim \text{Bin}(n_2, p)\) be independent. Then,

\[ \phi_{X+Y}(t) = \phi_X(t) \phi_Y(t) = (pe^{it} + q)^{n_1} (pe^{it} + q)^{n_2} = (pe^{it} + q)^{n_1 + n_2} \]

Thus, \(X + Y \sim \text{Bin}(n_1 + n_2, p)\), demonstrating that the binomial distribution is closed under convolution.
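As a quick numerical illustration (a sketch with arbitrary assumed values \(n_1 = 10\), \(n_2 = 15\), \(p = 0.3\)), the empirical distribution of the sum of two independent binomial samples can be compared with \(\text{Bin}(n_1 + n_2, p)\):

# Simulation check of the reproductive property (assumed toy values)
set.seed(1)
n1 <- 10; n2 <- 15; p <- 0.3
s <- rbinom(1e5, n1, p) + rbinom(1e5, n2, p)
emp <- as.numeric(table(factor(s, levels = 0:10))) / 1e5
round(rbind(empirical = emp, theory = dbinom(0:10, n1 + n2, p)), 3)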

  • Binomial Distribution and its Relationship with Other Distributions:
    • Poisson Distribution: When the number of trials is large and the probability of success is small, the binomial distribution approximates the Poisson distribution.
    • Normal distribution: when the number of trials is large, the binomial distribution tends to resemble the normal distribution.
  • When data have many zeros: in a binomial distribution, the variance is determined by the expected value (the product of the success probability and the number of trials). In real data, however, the variance is often larger than the binomial distribution predicts, a phenomenon called overdispersion. Data with many zeros in particular tend to show a variance larger than expected. To address this, alternatives such as the negative binomial distribution can be effective.

2.1.4 Examples

Detection of defective products: suppose you inspect 100 products, each of which independently has a 1% chance of being defective. The number of defective products follows a binomial distribution with \(n = 100\) and \(p = 0.01\).
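In R, these probabilities can be computed with dbinom and pbinom; a minimal sketch for this example:

# Binomial probabilities for n = 100 inspected products, p = 0.01
n <- 100; p <- 0.01
dbinom(0, n, p)      # probability of no defective products
1 - pbinom(0, n, p)  # probability of at least one defective product
dbinom(0:3, n, p)    # probabilities of 0, 1, 2, 3 defective products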

2.2 Poisson (Mitsuyu)

2.2.1 Overview of the Distribution

The Poisson distribution is, together with the binomial distribution, one of the most commonly used discrete distributions. A major difference between the two is that while the binomial distribution has a lower limit (0) and an upper limit (\(N\)) for the number of occurrences within \(N\) trials, the Poisson distribution also takes values of 0 or more but has no specific upper limit. The difference is easy to see by comparing the observation procedures: when sampling organisms, a binomial observation is of the form “how many of the \(N\) sampled individuals are male”, whereas a Poisson observation is of the form “how many individuals are caught”, with no fixed total. Thus, the Poisson distribution is a discrete probability distribution describing the number of occurrences of rare events in a given time interval or spatial domain. It applies in particular when the occurrences of events are independent and the average number of occurrences per time interval or spatial domain is constant. It is widely used in statistics and in various fields to model randomly occurring events, such as the number of incoming telephone calls, the volume of car traffic, or the number of natural disasters.

2.2.2 Mathematical Definition of the Distribution

The probability mass function (PMF) of the Poisson distribution is expressed by the following equation:

\[ P(Y = y) = \frac{\lambda^y e^{-\lambda}}{y!}, \quad y = 0, 1, 2, \ldots \quad (\lambda > 0) \]

Here, \(y\) is the number of observed events, and \(\lambda\) is the average rate of occurrence (the average number of events per unit time). The Poisson distribution has a mean of \(\lambda\) and a variance of \(\lambda\).


2.2.3 Characterization, properties and so on

The expected value is obtained as follows:

\[ \begin{array}{ll} E[Y] &= \sum_{y=0}^{\infty} y \cdot P(Y = y) \\ &= \sum_{y=1}^{\infty} y \cdot e^{-\lambda} \frac{\lambda^y}{y!} \quad (\text{When } y=0, \text{ Since } y \cdot P(Y = y) = 0 ) \\ &= \lambda \cdot e^{-\lambda} \sum_{y=1}^{\infty} \frac{\lambda^{y-1}}{(y-1)!} \\ &= \lambda \cdot e^{-\lambda} \sum_{z=0}^{\infty} \frac{\lambda^z}{z!} \quad (\text{Let } z = y-1, \text{ So } z=0,1,\ldots) \\ &= \lambda \cdot e^{-\lambda} \cdot e^{\lambda} \\ &= \lambda \end{array} \]

The expected value \(E[Y]\) is \(\lambda\).

The variance is obtained as follows, using the well-known decomposition of the variance:

\[ \begin{array}{ll} V[Y] &= E[(Y - E[Y])^2] \\ &= E[Y^2 - 2YE[Y] + E[Y]^2] \\ &= E[Y^2] - 2E[Y] \cdot E[Y] + E[Y]^2 \\ &= E[Y^2] - E[Y]^2 \\ &= E[Y(Y - 1) + Y] - E[Y]^2 \\ &= E[Y(Y - 1)] + E[Y] - E[Y]^2 \end{array} \] From a calculation almost similar to that for \(E[Y]\), we obtain \(E[Y(Y - 1)]\) as follows.

\[ \begin{array}{ll} E[Y(Y - 1)] &= \sum_{y=0}^{\infty} y(y - 1) \cdot P(Y = y) \\ &= \sum_{y=2}^{\infty} y(y - 1) \cdot e^{-\lambda} \frac{\lambda^y}{y!} \quad (\text{When } y=0,1, \text{ Since } y(y-1) \cdot P(Y = y) = 0 ) \\ &= \lambda^2 \cdot e^{-\lambda} \sum_{y=2}^{\infty} \frac{\lambda^{y-2}}{(y-2)!} \\ &= \lambda^2 \cdot e^{-\lambda} \sum_{z=0}^{\infty} \frac{\lambda^z}{z!} \quad (\text{Let } z = y-2, \text{ So } z=0,1,\ldots) \\ &= \lambda^2 \cdot e^{-\lambda} \cdot e^{\lambda} \\ &= \lambda^2 \end{array} \]

Therefore, the variance is

\[ \begin{array}{ll} V[Y] &= E[Y(Y - 1)] + E[Y] - E[Y]^2 \\ &= \lambda^2 + \lambda - \lambda^2 \\ &= \lambda \end{array} \] Thus, in the Poisson distribution, the expectation and variance are equal, which is one of the characteristics of the Poisson distribution.

2.2.4 Relationship with the Binomial Distribution (Poisson Limit Theorem)

In a binomial distribution with parameters \(n\) and \(p = \lambda / n\), if we keep \(\lambda\) constant and let \(n\) approach infinity, the distribution approaches a Poisson distribution with mean \(\lambda\). That is,

\[ \lim_{n \to \infty,\ p = \lambda/n} \binom{n}{k} p^k (1 - p)^{n - k} = \frac{\lambda^k e^{-\lambda}}{k!} \]

This is known as the Poisson limit theorem. This theorem shows that the Poisson distribution is derived as the limit of the binomial distribution.

The detailed derivation is shown below. The following relation is used in the calculation:

\[ \lim_{{n \to \infty}} \left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda} \]

Here, if \(p = \frac{\lambda}{n}\), then

\[ \begin{array}{ll} \lim_{{n \to \infty}} P(Y = k) &= \lim_{{n \to \infty}} \binom{n}{k} p^k (1 - p)^{n - k} \\ &= \lim_{{n \to \infty}} \frac{n!}{(n - k)! k!} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n - k} \\ &= \lim_{{n \to \infty}} \frac{n (n - 1) (n - 2) \cdots (n - k + 1)}{n^k} \cdot \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^n \left(1 - \frac{\lambda}{n}\right)^{-k} \end{array} \]

The limit exists and becomes

\[ \frac{\lambda^k e^{-\lambda}}{k!} \]
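The convergence can be illustrated numerically; the following sketch (with \(\lambda = 3\) as an assumed example value) compares binomial probabilities for increasing \(n\) with the Poisson limit:

# Binomial pmf with p = lambda/n approaches the Poisson pmf as n grows
lambda <- 3
k <- 0:10
round(rbind(n10     = dbinom(k, 10, lambda / 10),
            n100    = dbinom(k, 100, lambda / 100),
            n1000   = dbinom(k, 1000, lambda / 1000),
            poisson = dpois(k, lambda)), 4)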

2.2.5 Poisson distribution reproductive property

Let \(Y_1 \sim \text{Poisson}(\lambda_1)\) and \(Y_2 \sim \text{Poisson}(\lambda_2)\) be independent, and let \(k \ge 0\) be an integer. Since \(Y_1\) and \(Y_2\) are independent, \[ \begin{array}{ll} P(Y_1 + Y_2 = k) &= \sum_{n=0}^{k} P(Y_1 = n, Y_2 = k-n) \\ &= \sum_{n=0}^{k} P(Y_1 = n)P(Y_2 = k-n) \\ &= \sum_{n=0}^{k} \frac{\lambda_1^n}{n!} e^{-\lambda_1} \frac{\lambda_2^{k-n}}{(k-n)!} e^{-\lambda_2} \\ &= \sum_{n=0}^{k} \frac{1}{n!(k-n)!} \lambda_1^n \lambda_2^{k-n} e^{-(\lambda_1 + \lambda_2)} \end{array} \]

By the binomial theorem,

\[ \begin{array}{ll} (\lambda_1 + \lambda_2)^k &= \sum_{n=0}^{k} \frac{k!}{n!(k-n)!} \lambda_1^n \lambda_2^{k-n} \end{array} \]

Therefore,

\[ \sum_{n=0}^{k} \frac{1}{n!(k-n)!} \lambda_1^n \lambda_2^{k-n} e^{-(\lambda_1 + \lambda_2)} = \frac{(\lambda_1 + \lambda_2)^k}{k!} e^{-(\lambda_1 + \lambda_2)} \]

Hence, \(P(Y_1 + Y_2 = k) = \frac{(\lambda_1 + \lambda_2)^k}{k!} e^{-(\lambda_1 + \lambda_2)}\), and we can see that \(Y_1 + Y_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)\).

2.2.6 Examples

Suppose that on a certain coastline, seabird dives are observed an average of 7 times per hour. Answer the following questions in this situation:

  1. What is the probability of observing no dives in one hour?
  2. What is the probability of observing 10 or more dives in one hour?
  3. What is the probability of seeing between 3 and 8 dives per hour?
set.seed(2024)
# Average seabird dive observation counts
lambda <- 7

# Generate observation data for 1000 hours
random_data <- rpois(1000, lambda)

# Plot the histogram
hist(random_data, breaks = 30, main = "Histogram of Poisson Distribution (Seabird Dive Observations)", xlab = "Dive Counts", ylab = "Frequency", col = "skyblue")

  1. The probability of observing no dives in one hour \[ P(X = 0) = \frac{7^0 e^{-7}}{0!} = e^{-7} \]
## [1] 0.000911882
  2. The probability of observing 10 or more dives in one hour \[ P(X \geq 10) = 1 - P(X \leq 9) \]
## [1] 0.1695041
  3. The probability of seeing between 3 and 8 dives per hour

\[ P(3 \leq X \leq 8) = P(X \leq 8) - P(X \leq 2) \]

## [1] 0.6994551
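The three results above can be reproduced with dpois and ppois; the hidden code presumably looks like this sketch:

lambda <- 7
dpois(0, lambda)                     # 1. P(X = 0), approx 0.000912
1 - ppois(9, lambda)                 # 2. P(X >= 10), approx 0.1695
ppois(8, lambda) - ppois(2, lambda)  # 3. P(3 <= X <= 8), approx 0.6995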

2.3 Negative binomial (Ndiaye)

  • The Poisson distribution’s single parameter (the rate parameter) represents both the mean and the variance.

  • In nearly all practical problems, a count variable’s variance is most likely larger than its mean, a feature known as over-dispersion.

  • As a result, when using the Poisson distribution, we often consider options to model the response variable variance that is larger than the mean.

  • The negative binomial distribution is often used in ecological studies to model over-dispersed count data.

  • The negative binomial distribution is seen as an extension of the Poisson distribution.

  • Fisher et al. [1943] derived the negative binomial distribution as a natural extension of the Poisson distribution by using a \(\chi^{2}\) distribution to model the Poisson rate parameter.

The negative binomial distribution has three different parameterizations. The initial definition of the distribution is through independent and identically distributed Bernoulli trials (each with the same probability of success \(p\)), as in the binomial distribution.

In the binomial distribution, we consider the number of successes out of n trials as the random variable. In the negative binomial distribution, the random variable \(Y\) is the number of failures before r successes are achieved.

2.3.1 Overview of the distribution

As the last trial is the \(r^{th}\) success, the probability of \(Y = y\) can be derived through the binomial distribution:

\[ P(Y=y \vert r,p)=\binom{y+r-1}{r-1}p^r (1-p)^{y} \]

This parameterization is also known as the Pascal distribution. It is a generalization of the geometric distribution (when \(r = 1\)). When the definition of \(r\) is extended to real values (not limited to integer values as in the Pascal distribution), the negative binomial is also known as the Polya distribution. When \(r\) takes a real value, the binomial coefficient is extended to its real value definition through the gamma function:

\[ P(Y=k \vert r,p)=\dfrac{\Gamma(k+r)}{k!\Gamma(r)} p^r (1-p)^{k} \]

When the negative binomial distribution is used in the generalized linear model, we want to relate predictor variables to the mean of the distribution, which is \(\mu = \dfrac{rp}{1-p}\). Hence, the distribution is reparameterized by replacing \(p\) with \(\mu\), that is, setting \(p = \dfrac{r}{\mu+r}\) and \(1-p = \dfrac{\mu}{\mu+r}\), resulting in the third parameterization

\[ P(Y=k \vert r,\mu)=\dfrac{\Gamma(k+r)}{k!\Gamma(r)} \left(\dfrac{r}{\mu+r}\right)^r \left(\dfrac{\mu}{\mu+r}\right)^{k} \]

Because the gamma distribution is the conjugate prior of the Poisson rate parameter, we also define the negative binomial through the gamma distribution parameters \(\alpha\) and \(\beta\), by setting \(r = \alpha\) and \(\mu = \alpha/\beta\):

\[ P(Y=k \vert \alpha,\beta)=\dfrac{\Gamma(\alpha+k)}{k!\Gamma(\alpha)} \left(\dfrac{\beta}{\beta+1}\right)^\alpha \left(\dfrac{1}{\beta+1}\right)^{k} \]

2.3.2 Mathematical definition of the distribution

For example, suppose that, based on past experience, we know that, on average, 3 fish are caught. Using the Negative Binomial distribution we can calculate the probability that \(Y = 0\) fish are caught or that \(Y = 1\) fish are caught. The distribution function allows us to enter values for \(\mu\) and \(k\) to calculate the probability of any value of \(Y\). Let us assume for the present that \(\mu = 3\) and \(k = 1\).

Calculating the probability for every possible \(Y\) value forms the distribution function.

In the following figure we plot negative binomial distributions for three \(\mu\) and three \(k\) values. Note that for large \(k\) values the negative binomial distribution becomes a Poisson distribution, while for small values of \(k\) the negative binomial distribution allows for a large number of zeros.

NOTE: The negative binomial distribution can only be used for a response variable that has values equal to, or greater than, 0 and is discrete.

Nine density curves from a negative binomial distribution NB(mu, k), in which mu is the mean and k is the dispersion parameter. The panels in the right column have large k values, and these negative binomial curves approximate the Poisson distribution (A. Zuur, 2015).
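A minimal sketch of how such a panel of curves can be produced with dnbinom(), which accepts the (mu, size = k) parameterization; the grid values below are assumptions, not the exact values used for the figure:

mu_vals <- c(1, 5, 10)     # assumed means
k_vals  <- c(0.1, 1, 100)  # assumed dispersion parameters
op <- par(mfrow = c(3, 3), mar = c(3, 3, 2, 1))
for (mu in mu_vals) {
  for (k in k_vals) {
    plot(0:30, dnbinom(0:30, mu = mu, size = k), type = "h",
         main = paste0("NB(", mu, ", ", k, ")"), xlab = "", ylab = "")
  }
}
par(op)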

2.3.3 Characterization, properties

As mentioned above, there are various ways of presenting the negative binomial distribution. Its density function is given by

\[ f(y; k,\mu)=\dfrac{\Gamma(y+k)}{\Gamma(k)\Gamma(y+1)}\times \left(\dfrac{k}{\mu+k}\right)^{k} \left(1-\dfrac{k}{\mu+k}\right)^{y} \] The distribution function has the parameters \(\mu\) and \(k\). The symbol \(\Gamma\) denotes the gamma function, which satisfies \(\Gamma(y + 1) = y!\) for integer \(y\). The first component in the density function is only a number.

The other two components are power functions of the two unknown parameters. The mean and variance of \(Y\) of a negative binomial GLM are given by

\[ E(Y)=\mu ~~~~~~~ \text{var}(Y)=\mu + \dfrac{\mu^2}{k} \] If \(k\) is large relative to \(\mu^2\), the term \(\dfrac{\mu^2}{k}\) is close to 0 and the variance of \(Y\) is \(\mu\). In such cases the negative binomial converges to the Poisson distribution.
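This convergence is easy to verify numerically; a sketch with an assumed mean of \(\mu = 3\):

# NB(mu, k) probabilities approach Poisson(mu) as k increases
mu <- 3
y <- 0:10
round(rbind(nb_k1   = dnbinom(y, mu = mu, size = 1),
            nb_k100 = dnbinom(y, mu = mu, size = 100),
            nb_k1e4 = dnbinom(y, mu = mu, size = 1e4),
            poisson = dpois(y, mu)), 4)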

Hilbe (2011) uses a different notation for the variance: \[ \text{var}(Y) = \mu + \alpha \times \mu^2 \]

  • This notation is slightly simpler, as \(\alpha = 0\) means that the quadratic term disappears. Under this parameterization, which is used in most commercial statistics packages (e.g. SAS, Stata, SPSS, Genstat, Limdep) a negative binomial model with a dispersion or heterogeneity (scale) parameter equal to zero, \(\alpha = 0\), is Poisson.

  • Larger values of \(\alpha\) indicate increasing levels of variability, or dispersion, over what is expected given the Poisson distribution.

  • When the negative binomial distribution upon which the negative binomial GLM is based is parameterized with \(\alpha =\dfrac{1}{k}\) such that its variance is \(\mu + \alpha \times \mu^2\), there is a direct relationship between \(\mu\) and the dispersion parameter, \(\alpha\).

  • When the parameterization of the negative binomial results in a variance of \(\mu+\dfrac{\mu^2}{k}\), the relationship is indirect.

  • The coefficients, standard errors, and fitted values of the direct and indirect parameterizations are the same; only the dispersion parameter and statistics using it differ.

Recall that the dispersion statistic is defined as:

\[ \text{dispersion statistic}=\dfrac{\chi^{2}}{\text{residual degrees of freedom}} \]

The residual degrees of freedom is equal to the number of observations minus the number of parameters, including the intercept. We also write this as \(N - p\).

  • For both parameterizations, values of the dispersion statistic greater than 1 indicate overdispersion. An ideally fitted Poisson or negative binomial GLM has a dispersion statistic of 1.0.

  • Do not confuse the dispersion statistic with the dispersion parameter, \(\alpha\).

  • A new R function called nbinomial displays negative binomial GLM results based on the direct parameterization. It is in the msme package on CRAN.

But for now we will continue using the indirect parameterization of the dispersion statistic in our discussion of the negative binomial, since it is used in glm.nb, which comes with the standard R software.

2.3.4 Examples 1

Suppose a start-up is looking for 5 investors. They ask investors repeatedly, where each independently has a \(20\%\) chance of saying yes. Let \(X\) = total number of investors that they ask, and note that \(X\) itself does not follow a negative binomial distribution. Find \(f(x)\) and \(f(10)\).

Solution :

Let \(Y\) = the number who say no before 5 say yes. We know \(X = Y + 5\) and

\[ Y \sim NB(5, 0.2) \]

Hence \(f(x) = P(X = x) = P(Y = x - 5) = \binom{x-1}{4}\, 0.2^5\, 0.8^{x-5}\) for \(x = 5, 6, \ldots\), and in particular \(f(10) = \binom{9}{4}\, 0.2^5\, 0.8^5 \approx 0.0132\).
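This can be checked with dnbinom(), which is parameterized by the number of failures before size successes:

dnbinom(5, size = 5, prob = 0.2)  # P(Y = 5) = P(X = 10)
choose(9, 4) * 0.2^5 * 0.8^5      # same value by hand, approx 0.0132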

2.3.5 Examples 2: Application to the fisheries data

2.3.5.1 Data

The data used in this workshop were presented in Bailey et al. (2008), who investigated the impact of commercial fishing on deep-sea fish populations. They analysed data of a bottom trawl survey conducted in the northeast Atlantic Ocean from 1979 to 1989 and from 1997 to 2002.

head(Fish,5)
##   Site TotAbund        Dens MeanDepth Year Period       Xkm       Ykm SweptArea
## 1    1       76 0.002070281       804 1978      1  98.75575 -57.46692  36710.00
## 2    2      161 0.003519799       808 2001      2  76.80388 178.64798  45741.25
## 3    3       39 0.000980515       809 2001      2 103.79283 -50.05184  39775.00
## 4    4      410 0.008039216       848 1979      1  91.53227 146.44797  51000.00
## 5    5      177 0.005933375       853 2002      2 107.14419 -37.07544  29831.25

The underlying biological question is whether the stress imposed by commercial fishing on a portion of this depth gradient has affected species assemblages throughout the entire depth gradient.

Following Bailey et al. (2009), we use total abundance, defined as the total number of fish per site.

We investigate the relationship between total fish abundance and mean depth for each of the time periods.

Before applying any statistical model, we present a scatterplot (Figure below).

The graph shows a pattern of decline in total abundance at greater depths.

Scatterplot of total abundances versus mean depth for each time period (1979 – 1989 versus 1997 – 2002).

2.3.5.2 Recall: Results from Poisson distribution applied to the same data

M1 <- glm(TotAbund ~ MeanDepth, 
          data = Fish, 
          family = poisson(link = "log"))
summary(M1)
## 
## Call:
## glm(formula = TotAbund ~ MeanDepth, family = poisson(link = "log"), 
##     data = Fish)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.64334    0.01273  521.70   <2e-16 ***
## MeanDepth   -0.62870    0.00670  -93.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 27779  on 145  degrees of freedom
## Residual deviance: 15770  on 144  degrees of freedom
## AIC: 16741
## 
## Number of Fisher Scoring iterations: 5
A: Pearson residuals plotted versus fitted values, on the scale of the response variable. B: Pearson residuals plotted versus fitted values, on the scale of the predictor function. C: Pearson residuals plotted versus Mean Depth. D: Pearson residuals plotted versus Period.

Given a Poisson GLM the dispersion statistic may be obtained by

#Checking for overdispersion
E1 <- resid(M1, type = "pearson")  # Pearson residuals of the Poisson GLM
N <- nrow(Fish)
p <- length(coef(M1))
Dispersion <- sum(E1^2) / (N - p)
Dispersion
## [1] 115.5856

The dispersion statistic = 115.5856

There is substantial overdispersion in the data, possibly to the extent that even a negative binomial GLM will not sufficiently adjust for the variation. We will need to apply the negative binomial GLM to determine if that is the case.

Simulation tests show that a true or perfectly fitted Poisson model has a dispersion statistic of 1.0. Values greater than 1 indicate likely overdispersion; values smaller than 1 indicate underdispersion. Overdispersion, which is common in count data, occurs when the variation in the data exceeds the theoretical or expected amount of variability based on Poisson distribution assumptions.

If our Poisson GLM has a dispersion statistic greater than 1, we should check the following possible sources before concluding that the model is truly overdispersed.

  1. The model is missing a required explanatory variable.
  2. The model has one or more outliers.
  3. The model requires one or more interaction terms.
  4. One or more explanatory variables are not measured on the most appropriate scale; i.e. a term may need to be logged, squared, inverted, etc.
  5. A continuous covariate has a non-linear effect.
  6. The model is mis-specified; e.g. it has the wrong link function.
  7. There is zero-inflation.
  8. There is an inherent dependency structure in the data.

If we have eliminated all these possibilities, and the model still shows a dispersion statistic greater than 1, the model is genuinely overdispersed.

There is a variety of methods for handling Poisson overdispersed data. One may simply adjust the standard errors by scaling, which is what the quasipoisson family does.

However, only the standard errors will be adjusted, not the coefficients. In many cases overdispersion affects model coefficients. The most common method used by statisticians to adjust for Poisson overdispersion is the negative binomial GLM.

The negative binomial distribution contains a second parameter called the dispersion or heterogeneity parameter, which is used to accommodate Poisson overdispersion.

The quasi-Poisson distribution only modifies the standard errors of the parameters in the Poisson GLM and not the parameter estimates. The mechanism behind overdispersion is likely to affect the regression parameters, and therefore quasi-Poisson is less useful as a solution for overdispersion.

2.3.5.3 Fitting the negative binomial to the fisheries data

We are using the same fishery data to determine whether negative binomial GLM will adequately adjust for the Poisson overdispersion in the data. We use the following model:

\[ TotAbund_i \sim NB(\mu_i,k) \]

\[ \text{E}(TotAbund_i) =\mu_i ~~~~~~~~~~~ \text{var}(TotAbund_i) = \mu_i +\dfrac{\mu_i^2}{k} \]

\[ \text{log}(\mu_i)=\eta_i \]

\[ \eta_i = \beta_1 + \beta_2 \times MeanDepth_i+ \beta_3 \times Period_i + \beta_4 \times MeanDepth_i\times Period_i+ \text{log}(SweptArea_i) \]

The same covariates are used as for the Poisson GLM. The only difference is that we now use a negative binomial distribution.

Fish$fPeriod <- factor(Fish$Period)
Fish$LogSA <- log(Fish$SweptArea)
M4 <- glm.nb(TotAbund ~ MeanDepth * fPeriod + offset(LogSA), 
          data = Fish)

summary(M4)
## 
## Call:
## glm.nb(formula = TotAbund ~ MeanDepth * fPeriod + offset(LogSA), 
##     data = Fish, init.theta = 1.950355727, link = log)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -3.25515    0.16064 -20.263   <2e-16 ***
## MeanDepth          -1.03764    0.06032 -17.202   <2e-16 ***
## fPeriod2           -0.61216    0.27395  -2.235   0.0254 *  
## MeanDepth:fPeriod2  0.07571    0.10209   0.742   0.4583    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.9504) family taken to be 1)
## 
##     Null deviance: 461.62  on 145  degrees of freedom
## Residual deviance: 158.32  on 142  degrees of freedom
## AIC: 1754.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.950 
##           Std. Err.:  0.219 
## 
##  2 x log-likelihood:  -1744.729

Let’s check first for overdispersion:

E4 <- resid(M4, type = "pearson")
N <- nrow(Fish)
p <- length(coef(M4)) + 1  # "+1" is due to k
Dispersion <- sum(E4^2) / (N - p)
Dispersion
## [1] 1.001618

The "+1" used in determining the number of parameters (\(p\)) accounts for \(k\). There is no overdispersion. Recall that the Poisson GLM AIC was 16741 and its dispersion statistic was 115.5856; it is easy to conclude that the negative binomial is a far superior model to the Poisson. Here we have an AIC of 1754.7 and a dispersion statistic of 1.0016.

2.3.5.4 Model validation

As with linear regression, we need to use residuals in a model validation.

Here we use Pearson residuals. These are standardized ordinary residuals and are calculated by

\[ \varepsilon_i=\dfrac{TotAbund_i-\text{E}(TotAbund_i)}{\sqrt{\text{var}(TotAbund_i)}} \]

  • Deviance residuals, used by default by some R functions, quantify the contribution of each observation to the residual deviance.

NOTE: It should be noted that we are not looking for normality in the Pearson or deviance residuals. The issue is lack of fit, and we look for patterns in the Pearson residuals.

  • Pearson residuals are extracted with the resid function applied to a glm object:
E1 <- resid(M1, type = "pearson")

Now that we have obtained the residuals, we plot them against fitted values, against each covariate in the model, and each covariate not in the model. If sampling took place at different time points or spatial points, we also need to investigate temporal or spatial correlation in the residuals. The auto-correlation function and variogram functions can be used for this.

We refit the model without the interaction term and present a portion of the numerical output.

#drop1(M4, test = "Chi")
M5 <- glm.nb(TotAbund ~ MeanDepth + fPeriod + offset(LogSA), 
          data = Fish)

#drop1(M5, test = "Chi")
summary(M5)
## 
## Call:
## glm.nb(formula = TotAbund ~ MeanDepth + fPeriod + offset(LogSA), 
##     data = Fish, init.theta = 1.943752383, link = log)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.31134    0.13703 -24.164  < 2e-16 ***
## MeanDepth   -1.01372    0.04876 -20.788  < 2e-16 ***
## fPeriod2    -0.43027    0.12650  -3.401 0.000671 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.9438) family taken to be 1)
## 
##     Null deviance: 460.09  on 145  degrees of freedom
## Residual deviance: 158.28  on 143  degrees of freedom
## AIC: 1753.2
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.944 
##           Std. Err.:  0.218 
## 
##  2 x log-likelihood:  -1745.204

We use this model for the validation step. We need to extract Pearson residuals and plot them against fitted values and also versus each covariate in the model and not in the model (Figure below). There are no obvious problems.

A: Pearson residuals versus fitted values for model M5. B: Cook’s distance values for model M5. C: Pearson residuals versus depth. D: Pearson residuals versus Period.

We also plot the Pearson residuals versus Mean Depth per Period (Figure below).

We also need to inspect the Pearson residuals for spatial correlation. To do this we can make a variogram of the Pearson residuals (e.g. per Period).

2.3.5.5 Model interpretation

  • As in linear regression, the Akaike Information Criterion (AIC) can be used to find the optimal model.

  • In linear regression we use the total sum of squares and the residual sum of squares to calculate \(R^2\), which informs us of the explained variance. In a GLM we use the deviance to gain this information.

  • The null deviance is the maximum likelihood equivalent of the total sum of squares.

  • The residual deviance is the maximum likelihood equivalent of the residual sum of squares.

We do not have an \(R^2\) in GLM models; the closest we can get is the explained deviance. From the output above (null deviance 461.62 on 145 degrees of freedom; residual deviance 158.32 on 142 degrees of freedom), \[ 100\times \dfrac{\text{null deviance}-\text{residual deviance}}{\text{null deviance}}=100\times\dfrac{461.62-158.32}{461.62}=65.70\% \] So the explanatory variables explain \(65.70\%\) of the variation in total fish abundance.

Next let’s talk about the estimated parameters.

For each regression parameter we have an estimate, standard error, \(z\)-value (the ratio of the estimate to the standard error), and a \(p\)-value.

Under the null hypothesis of no effect (\(H_0: \beta = 0\)), we expect a \(z\)-value of \(0\).

The larger the absolute \(z\)-value, the less plausible the null hypothesis.

The \(p\)-value quantifies the evidence against this hypothesis.

In this case both the intercept and slopes are significantly different from 0 at the 5% level.

The estimated model is given by

\[ TotAbund_i \sim NB(\mu_i,1.94) \]

\[ \text{E}(TotAbund_i) =\mu_i ~~~~~~~~~~~ \text{var}(TotAbund_i) = \mu_i +\dfrac{\mu_i^2}{1.94} \]

\[ \begin{array}{rcl} \text{Period 1:}~~ \mu_i & = & e^{\eta_i} \\[0.2cm] \eta_i & = & -3.31 - 1.01\times MeanDepth_i + \text{log}(SweptArea_i) \\[0.2cm] \text{Period 2:}~~ \mu_i & = & e^{\eta_i} \\[0.2cm] \eta_i & = & -3.74 - 1.01\times MeanDepth_i + \text{log}(SweptArea_i) \end{array} \]

Hence we have two exponential curves, each with the same slope for Mean Depth but with different intercepts. The best way to demonstrate the effects of covariates on the fitted values is to plot the fitted values relative to the mean depth. A practical problem is that the model also contains the log of Swept Area as an offset. To avoid plotting 3-dimensional graphs, we substitute the log of the mean Swept Area values for the offset. Using R we calculated the two fitted lines (one per period).
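A sketch of how the two fitted curves can be computed (assuming the objects Fish and M5 created above; the grid of 100 depth values is an arbitrary choice):

# Prediction grid per period, substituting the mean offset value
MyData <- expand.grid(
  MeanDepth = seq(min(Fish$MeanDepth), max(Fish$MeanDepth), length = 100),
  fPeriod   = factor(1:2))
MyData$LogSA <- mean(Fish$LogSA)  # mean log(SweptArea) in place of the offset
MyData$mu <- predict(M5, newdata = MyData, type = "response")
plot(Fish$MeanDepth, Fish$TotAbund, xlab = "Mean depth", ylab = "Total abundance")
for (per in 1:2)
  lines(MyData$MeanDepth[MyData$fPeriod == per],
        MyData$mu[MyData$fPeriod == per], lwd = 2, lty = per)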

Model fit of the NB GLM. The two grey shades correspond to the two time periods (A. Zuur, 2015).

2.4 Delta model (Lurkpranee)

2.4.1 Overview of the model

The Delta-model is a two-component model used to handle data that contains a large number of zeros and positive continuous values. This structure is particularly useful in ecological and environmental studies, as well as in other fields where zero-inflated data is common.

The Delta-model can be categorized as a mixture model as it is divided into two separate but related modeling steps:

  • Modeling the Probability of Zero versus Non-Zero Values
  • Modeling the Distribution of Positive Values

 

2.4.1.1 Key Components of the Delta-Lognormal Model

  1. Binary Indicator for Zeroes and Non-Zeroes

Delta Distribution (Discrete Part): This component models the occurrence of zero values in the data. It uses a binary distribution (such as the Bernoulli distribution) to represent the probability of observing a zero versus a non-zero value.

To represent the binary indicator for zeroes and non-zeroes:

\[ X = \begin{cases} 1 & \text{if non-zero} \\ 0 & \text{otherwise} \end{cases} \] This means \(X\) takes the value 1 if a non-zero value occurs, and 0 otherwise. This is a deterministic way of defining \(X\) based on the occurrence of the event.

Bernoulli Distribution

The Bernoulli distribution is a discrete probability distribution for a random variable which takes the value 1 with probability \(p\) and the value 0 with probability \(1-p\). It is denoted as:

\[ X \sim \text{Bernoulli}(p) \]

  2. Lognormal Distribution for Positive Values

Lognormal Distribution (Continuous Part): This component models the positive continuous values. Positive values are assumed to follow a lognormal distribution, which is appropriate for non-negative data that is right-skewed.

Given that the indicator equals 1 (the value is positive), the positive values follow a lognormal distribution: \[ \log(X) \sim \mathcal{N}(\mu, \sigma^2) \] Finally, an overall distribution is obtained by multiplying the outcomes of the first (probability) and second (lognormal) steps.


2.4.2 Mathematical definition of the distribution

Parameter Estimation

To fit the Delta-Lognormal model with the data, the following parameters need to be estimated:

  • \(p\): The probability \(p\) of observing a non-zero value is estimated as the proportion of non-zero observations in the dataset.

  • \(\mu\) and \(\sigma\): The parameters \(\mu\) and \(\sigma\) of the lognormal distribution are estimated using the log-transformed positive values. This can be done using maximum likelihood estimation (MLE) or other appropriate statistical techniques.

Probability Density Function (PDF)

The probability density function (pdf) of the Delta-Lognormal model can be expressed as a mixture of a discrete probability mass at zero and a continuous lognormal distribution for positive values:

\[ f(x) = \begin{cases} 1 - p & \text{if } x = 0 \\ p \cdot \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\log x - \mu)^2}{2 \sigma^2}\right) & \text{if } x > 0 \end{cases} \]

Where:

  • \(p\) is the probability of a non-zero value.
  • \(\mu\) is the mean of the log-transformed positive values from the lognormal model.
  • \(\sigma\) is the standard deviation of the log-transformed positive values.

The moments of the delta-lognormal variable combine the two components, where \(\mu_{LN}\) and \(\sigma^2_{LN}\) denote the mean and variance of the log-normal distribution:

\[ E[X] = p\cdot\mu_{LN} \]

\[ V[X] = p\cdot\sigma_{LN}^2+p(1-p)\mu_{LN}^2 \]
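These moment formulas can be checked by simulation; a sketch with arbitrary assumed parameter values:

# Delta-lognormal moments: theory vs. simulation
set.seed(123)
p <- 0.6; mu <- 0; sigma <- 1
mu_LN  <- exp(mu + sigma^2 / 2)                       # lognormal mean
var_LN <- (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)  # lognormal variance
x <- rbinom(1e6, 1, p) * rlnorm(1e6, mu, sigma)
c(theory = p * mu_LN, simulated = mean(x))
c(theory = p * var_LN + p * (1 - p) * mu_LN^2, simulated = var(x))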


2.4.3 Characterization, properties

2.4.3.1 Proof: Probability density function of the log-normal distribution

We denote a random variable \(X\) following a log-normal distribution with parameters \(\mu\) and \(\sigma^2\) as:

\[ X \sim \ln N(\mu, \sigma^2) \]

The probability density function of the log-normal distribution is given by:

\[ f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}} \cdot \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right] \]

A log-normally distributed random variable is defined as the exponential function of a normal random variable:

\[ Y \sim N(\mu, \sigma^2) \Rightarrow X = \exp(Y) \sim \ln N(\mu, \sigma^2) \]

Consider a univariate lognormal sample, where \(y = \{y_1, \ldots, y_n\}\) are lognormally distributed random variables.

The maximum likelihood estimates for the mean \(\mu\) and variance \(\sigma^2\) are given by:

\[ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \log(y_i) \] and

\[ \hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\log(y_i) - \hat{\mu})^2} \ \]

2.4.3.2 Let’s break it down

The probability density function of the normal distribution is:

\[ f_Y(y) = \frac{1}{\sigma\sqrt{2\pi}} \cdot \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right] \]

The change of variables technique involves the inverse function:

\[ X = g(Y) = \exp(Y) \] with the inverse function \[ Y = g^{-1}(X) = \ln(X) \] Because the derivative of \(\exp(Y)\) is always positive, \(g(Y)\) is strictly increasing and we can calculate the probability density function of a strictly increasing function as \[ f_X(x) = \begin{cases} {f_Y(g^{-1}(x))}\frac{dg^{-1}(x)}{dx}, & \text{if } x \in X \\ 0, & \text{if } x \notin X \end{cases} \]

where \(X = \{x = g(y) : y \in Y\}\) is the support of \(X\). We now apply this using the probability density function of the normal distribution.

And the derivative of the inverse function with respect to x is:

\[ \frac{dg^{-1}(x)}{dx} = \frac{1}{x} \]

Substituting these into the normal distribution’s PDF gives us the log-normal distribution’s PDF:

\[ \begin{array}{lcl} f_X(x) &=& f_Y(g^{-1}(x)) \cdot \frac{dg^{-1}(x)}{dx} \\ &=& \frac{1}{\sqrt{2\pi}\sigma} \cdot \exp\left[-\frac{(g^{-1}(x) - \mu)^2}{2\sigma^2}\right]\cdot \frac{dg^{-1}(x)}{dx} \\ &=& \frac{1}{\sqrt{2\pi}\sigma} \cdot \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]\cdot \frac{d(\ln x)}{dx} \\ &=& \frac{1}{\sqrt{2\pi}\sigma} \cdot \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]\cdot \frac{1}{x} \\ &=& \frac{1}{x\sigma\sqrt{2\pi}} \cdot \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right] \end{array} \]

Which simplifies to:

\[ f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}} \cdot \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right] \]
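As a minimal check, the derived density agrees with R’s built-in dlnorm() (here with \(\mu = 0\), \(\sigma = 1\) as assumed values):

x <- c(0.5, 1, 2, 5)
manual <- 1 / (x * sqrt(2 * pi)) * exp(-(log(x))^2 / 2)  # derived pdf, mu = 0, sigma = 1
all.equal(manual, dlnorm(x, meanlog = 0, sdlog = 1))     # TRUE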

2.4.3.3 References

Soch, Joram, et al. (2024). StatProofBook/StatProofBook.github.io: The Book of Statistical Proofs (Version 2023). Zenodo. https://doi.org/10.5281/ZENODO.4305949


2.4.4 Applying delta-lognormal distribution with the GAM model

2.4.4.1 Generalized additive model (GAM)

A generalized additive model (GAM) is a statistical modeling technique that allows non-linear relationships between the response and predictor variables, while maintaining additivity, following the formula:

\[ y_i = f_1(x_{i1}) + \cdots + f_p(x_{ip}) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2) \quad (i=1,2,\ldots,n) \] where the \(f_j\) are spline functions allowing a flexible and accurate modeling of the relationships between the response and predictor variables.

 

Step 1: Binomial Model for Probability of Non-Zero Values

Objective: Estimate the probability that the response variable is non-zero.

Method: Use a Generalized Additive Model (GAM) with a binomial family to model the probability of a non-zero response based on predictor variables.

The probability \(p_i\) that the response \(Y_i\) is non-zero is modeled using a binomial distribution, with the logistic (logit) link function applied to a linear predictor involving smooth functions of the covariates:

\[ \text{logit}(p_i) = \eta_i = \beta_0 + s_1(x_{i1}) + s_2(x_{i2}) + \cdots + s_k(x_{ik}) \]

Where:

          \(\text{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right)\) is the logit link function;
          \(\eta_i\) is the linear predictor;
          \(\beta_0\) is the intercept;
          and \(s_j(x_{ij})\) are smooth functions of the covariates \(x_{ij}\).

 

The model can be fitted using a Bernoulli distribution:

\[ Y_i \sim \text{Bernoulli}(p_i) \]  

Step 2: Lognormal Model for Positive Values

For the positive response values \(Y_i\), the log-transformed values are modeled as: \[ \log(Y_i) = \mu_i + \epsilon_i \] Where:

\[ \mu_i = \beta_0 + s_1(x_{i1}) + s_2(x_{i2}) + \cdots + s_k(x_{ik}) \] And: \[ \epsilon_i \sim \mathcal{N}(0, \sigma^2) \]

Where:

          \(\log(Y_i)\) is the log-transformed response;
          \(\mu_i\) is the linear predictor;
          \(\beta_0\) is the intercept;
          \(s_j(\cdot)\) are smooth functions of the covariates \(x_{ij}\);
          \(\epsilon_i\) is the normally distributed error term.

 

2.4.4.2 Predicting distribution

In the context of a GAM, the inverse link function is used to transform the linear predictor back to the scale of the response variable. For the logit link of the binomial part and the log link of the lognormal part, the inverse link functions are as follows:

  • Logistic (Logit) Link Function

The logit link is defined as: \[ \text{logit}(p_i)=\eta_i = \log\left(\frac{p_i}{1 - p_i}\right) \] Solving for \(p_i\) gives the inverse logit link, which transforms the linear predictor \(\eta_i\) back to the probability \(p_i\):

\[ \begin{array}{lcl} e^{\eta_i} &=& \frac{p_i}{1-p_i} \\ p_i &=& \frac{e^{\eta_i}}{1 + e^{\eta_i}} \end{array} \]

  • Log Link Function

For the lognormal part, the link function is the natural logarithm. The log link is defined as: \[ \log(Y_i) = \mu_i \] The inverse log link, which transforms the linear predictor \(\mu_i\) back to the original response scale \(Y_i\), is given by: \[ Y_i = e^{\mu_i} \]

 

2.4.4.3 Overall Expected Value

To obtain the overall expected value \(E[Y_i]\), the probability of being non-zero is combined with the expected value of the positive values:

\[ E[Y_i] = p_i \cdot E[Y_i \mid Y_i > 0] \] Substituting the expressions: \[ E[Y_i] = p_i \cdot e^{\mu_i} \]

By multiplying the probability from the logistic model with the back-transformed prediction from the lognormal model, the complete prediction for \(Y_i\) is obtained. This approach integrates both the discrete and continuous components of the Delta-Lognormal model. (Strictly, for a lognormal variable \(E[Y_i \mid Y_i > 0] = e^{\mu_i + \sigma^2/2}\); the bias-correction term \(\sigma^2/2\) is omitted here for simplicity.)


2.4.5 Example analyses

2.4.5.1 What the binomial distribution with the logistic function looks like

# Load necessary library
library(ggplot2)

# Set the seed for reproducibility
set.seed(123)

# Generate synthetic data
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1 * x - 0.25))  
#rbinom(n, size, prob)
#plogis(x) has consequently been called the ‘inverse logit’.

# Create a data frame
data <- data.frame(x, y)

# Plot the data and the logistic regression curve
ggplot(data, aes(x = x, y = y)) +
  geom_point(alpha = 0.3,color = "black") +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE, color = "red") +
  labs(title = "Logistic Regression", x = "x", y = "Probability of Success (y)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

2.4.5.2 What the lognormal distribution looks like

# Load necessary library
library(ggplot2)

# Set the seed for reproducibility
set.seed(123)

# Generate synthetic data
n <- 1000
x <- rlnorm(n)  # Generate x from a lognormal distribution
y <- rlnorm(n, meanlog = 0, sdlog = 1)   # Generate y from a lognormal distribution

# Create a data frame
data <- data.frame(x, y)

# Plot density plot of y
density_y <- ggplot(data, aes(x = y)) +
  geom_density(fill = "lightblue", color = "black") +
  labs(title = "Density Plot of y (Lognormal Distribution)", x = "y", y = "Density") +
  theme_minimal()
density_y

2.4.5.3 Generating data for a delta-lognormal analysis

# Load necessary libraries
library(MASS)
library(mgcv)
## Loading required package: nlme
## This is mgcv 1.9-0. For overview type 'help("mgcv-package")'.
## 
## Attaching package: 'mgcv'
## The following object is masked from 'package:VGAM':
## 
##     s
# Set the seed for reproducibility
set.seed(123)

# Number of data points
n <- 1000

# Generate covariates
SST <- runif(n, min = 10, max = 35) 
MLD <- runif(n, min = 5, max = 300) 
CHL <- runif(n, min = 0.1, max = 2) 
# Define coefficients for the covariates
beta_SST <- 0.7 
beta_MLD <- -0.02  
beta_CHL <- 1.5  
intercept <- -8  
# Step 1: Generate binary indicator for zero and non-zero values
# Logistic function to determine probability
logit_p <- intercept + beta_SST * SST + beta_MLD * MLD + beta_CHL * CHL

prob_nonzero <- exp(logit_p) / (1 + exp(logit_p))
zero_indicator <- rbinom(n, 1, prob_nonzero)

# Define coefficients for Step 2
beta_mu_SST <- 0.1  
beta_mu_MLD <- -0.002  
beta_mu_CHL <- 0.2  
intercept_mu <- 1

# Step 2: Generate lognormal values for the non-zero cases
# Parameters for lognormal distribution
mu <- intercept_mu + beta_mu_SST * SST - beta_mu_MLD * MLD + beta_mu_CHL * CHL#
      #+0.05 * SST^2 - 0.001 * MLD^2 + 0.1 * log(CHL) # Adding non-linear terms 
sigma <- 0.7 

lognormal_values = exp(rnorm(n, mu, sigma))

# Combine into a single dataset
data <- zero_indicator * lognormal_values

# Create a data frame
data_df <- data.frame(response = data, SST = SST, MLD = MLD, CHL = CHL)
# Step 3: Fit the Delta-Lognormal model using GAM

# Fit the binomial part (probability of non-zero)
binary_val <-ifelse(data_df$response >0,1,0)

binomial_model <- gam(binary_val ~ s(SST) + s(MLD) + s(CHL), family = binomial, data = data_df)

# Fit the lognormal part (positive values)
pos.data <-subset(data_df, response >0)
pos.data$log_response <- log(pos.data$response)

lognormal_model <- gam(log_response ~ s(SST) + s(MLD) + s(CHL), family = gaussian, data = pos.data)

# Display the summary of the models
summary(binomial_model)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## binary_val ~ s(SST) + s(MLD) + s(CHL)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   5.3857     0.4547   11.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df Chi.sq  p-value    
## s(SST) 1.002  1.004 126.44  < 2e-16 ***
## s(MLD) 1.347  1.615  84.24  < 2e-16 ***
## s(CHL) 1.557  1.929  33.46 6.21e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.64   Deviance explained = 64.5%
## UBRE = -0.65903  Scale est. = 1         n = 1000
summary(lognormal_model)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log_response ~ s(SST) + s(MLD) + s(CHL)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.93060    0.02351   167.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df       F p-value    
## s(SST) 1.905  2.389 308.645  <2e-16 ***
## s(MLD) 1.000  1.000  47.461  <2e-16 ***
## s(CHL) 1.000  1.000   3.435  0.0642 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.504   Deviance explained = 50.6%
## GCV = 0.4578  Scale est. = 0.45507   n = 823
# Predicting using the models
# Predict the probability of non-zero
data_df$pred_prob <- predict(binomial_model, type = "response")

# Predict the lognormal values for non-zero cases
data_df$pred_lognormal <- exp(predict(lognormal_model, newdata = data_df))

# Combine the predictions
data_df$predicted_values <- data_df$pred_prob * data_df$pred_lognormal
#or
#data_df$predicted_values <- (1/(1+exp(-predict(binomial_model,data_df)))) * exp(predict(lognormal_model,data_df))

# Display the first few rows of the dataset with predictions
head(data_df)
##    response      SST       MLD       CHL pred_prob pred_lognormal
## 1  13.63902 17.18944  85.71871 0.4033806 0.9329159       21.15832
## 2  96.72935 29.70763 180.19075 0.3745801 0.9997597       89.35591
## 3  16.85298 20.22442  52.25452 0.3834427 0.9942429       26.15406
## 4 327.01686 32.07544 256.76192 1.0774251 0.9998741      142.39534
## 5 179.55899 33.51168 255.08305 1.0363719 0.9999467      165.76603
## 6   0.00000 11.13891 145.97661 1.2710513 0.2447985       15.18996
##   predicted_values
## 1        19.738929
## 2        89.334438
## 3        26.003485
## 4       142.377416
## 5       165.757188
## 6         3.718479
library(mgcViz)
## Warning: package 'mgcViz' was built under R version 4.3.3
## Loading required package: qgam
## Warning: package 'qgam' was built under R version 4.3.3
## 
## Attaching package: 'qgam'
## The following object is masked from 'package:VGAM':
## 
##     log1pexp
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
## Registered S3 method overwritten by 'mgcViz':
##   method from  
##   +.gg   GGally
## 
## Attaching package: 'mgcViz'
## The following object is masked from 'package:lattice':
## 
##     qq
## The following objects are masked from 'package:stats':
## 
##     qqline, qqnorm, qqplot
binomial_model.plot <- getViz(binomial_model)
print(plot(binomial_model.plot), pages = 1)

lognormal_model.plot <- getViz(lognormal_model)
print(plot(lognormal_model.plot), pages = 1)

# Calculate deviance explained for both parts
deviance_explained_binomial <- 1 - (binomial_model$deviance / binomial_model$null.deviance)
deviance_explained_lognormal <- 1 - (lognormal_model$deviance / lognormal_model$null.deviance)

# Print deviance explained
cat("Deviance explained (binomial model):", deviance_explained_binomial, "\n")
## Deviance explained (binomial model): 0.6452954
cat("Deviance explained (lognormal model):", deviance_explained_lognormal, "\n")
## Deviance explained (lognormal model): 0.5062341
#check the residuals between actual variable and predicted variable
resi <- data_df$response-(data_df$predicted_values)
plot(resi)

2.5 Zero-inflated (Nagatsuka)

Zero-inflated Poisson (ZIP) regression is a model for count data with excess zeros. As explained in Section 2.2, the Poisson distribution with parameter \(\lambda\) is used to model many naturally occurring events, where \(Y\) represents the number of events per unit of time or space.

The normal Poisson distribution can be expressed as follows:

\[ P(Y = y_i) = \frac{e^{-\lambda} \cdot \lambda^{y_i}}{y_i!} \]

However, the Poisson distribution may not be useful when \(Y\) contains more zeros than the distribution allows for.

In such a case it can fit the data badly and will generate poor-quality predictions; a modified Poisson distribution, the zero-inflated Poisson (ZIP) distribution, introduced below, becomes useful.

So, let’s proceed!

2.5.1 Overview of the distribution

To understand the difference between the normal Poisson distribution and the ZIP, the two probability distributions are plotted below.

Note that \(\omega\) is the structural zero probability, and \(\lambda\) is the mean of the underlying Poisson distribution.

plot(0:12, dpois(0:12, lambda=3),
     type='h',
     main='ZIP(ω=0, λ = 3) = Poi(λ=3)',
     ylab='Probability',
     xlab ='',
     lwd=10)

# dzipois() comes from the VGAM package
plot(0:12, dzipois(0:12, lambda=3, pstr0 = 0.3, log = FALSE),
     type='h',
     main='ZIP(ω=0.3, λ = 3)',
     ylab='Probability',
     xlab ='',
     lwd=10)

Note that the parameter \(\omega\) gives the extra probability at the value 0, and when \(\omega = 0\) the ZIP reduces to \(Poi(\lambda)\).

The variable is denoted with respective probability as follows:

\[ \begin{array}{ll} y_i = 0 & \text{with probability } \omega_i\\ y_i > 0 & \text{with probability } 1-\omega_i \end{array} \]

Thus, the ZIP distribution, denoted \(ZIP(\omega, \lambda)\), has two parameters \(\omega\) and \(\lambda\) and the following probability function:

\[ P(Y=y_i) = \begin{cases} \omega_i+(1-\omega_i)\exp(-\lambda_i) & \text{if } y_i=0\\[0.2cm] (1-\omega_i)\exp(-\lambda_i)\lambda_i^{y_i}/y_i! & \text{if } y_i>0 \end{cases} \]

Note that the variable 0 can be either a structural zero with probability \(\omega\) or from underlying Poisson distribution with probability \(1-\omega\). When the variable is a positive value, then it is only from the underlying Poisson distribution with probability \(1-\omega\).
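The pmf can be checked against dzipois() from the VGAM package (used again in the examples below); a sketch with \(\omega = 0.3\) and \(\lambda = 3\):

library(VGAM)
omega <- 0.3; lambda <- 3
dzipois(0, lambda, pstr0 = omega)   # P(Y = 0)
omega + (1 - omega) * exp(-lambda)  # same, by the formula above
dzipois(2, lambda, pstr0 = omega)   # P(Y = 2)
(1 - omega) * dpois(2, lambda)      # same, by the formula above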

But, how do you get to know the probability \(\omega\)? Well, we introduce a logistic regression for this purpose.

The probability parameter \(\omega (\omega_1, \omega_2,... \omega_n)\) satisfies:

\[ \text{logit}(\omega_i)=\log\left( \frac{\omega_i}{1-\omega_i}\right)=\beta_0+\beta_1 x_i \]

where \(x_i\) is a covariate and \(\beta_0, \beta_1\) are regression coefficients.

And, the Poisson mean \(\lambda (\lambda_1, \lambda_2,... \lambda_n)\) satisfies:

\[ \log(\lambda_i)=\alpha_0+\alpha_1 x_i \]

where \(\alpha_0, \alpha_1\) are regression coefficients.

Note that the covariates used in the logistic part and in the Poisson part may or may not be the same, depending on the analysis.

2.5.2 Estimation of parameters

We use maximum likelihood estimation (MLE) for parameter estimation.

The likelihood function for ZIP is defined as:

\[ \begin{array}{lcl} L_1 &=& \displaystyle\prod_{\{i;\, y_i=0\}} \frac{\exp(\beta_0+\beta_1 x_i) + \exp\left(-e^{\alpha_0+\alpha_1 x_i}\right)}{1 + \exp(\beta_0+\beta_1 x_i)} \\[0.4cm] L_2 &=& \displaystyle\prod_{\{i;\, y_i>0\}} \frac{1}{1 + \exp(\beta_0+\beta_1 x_i)} \cdot \frac{\exp\left(-e^{\alpha_0+\alpha_1 x_i}\right)\exp\left(y_i(\alpha_0+\alpha_1 x_i)\right)}{y_i!} \end{array} \]

where \(L_1\) is the likelihood for the variables \(y_i\) which are zero, and \(L_2\) is the likelihood for the variables \(y_i\) which are positive integers.

Then, taking the log of \(L_1 \times L_2\) yields the following log-likelihood:

\[ \ell(\alpha,\beta; y) = \sum_{\{i;\, y_i=0\}}\log\left(\exp(\beta_0+\beta_1 x_i)+\exp\left(-e^{\alpha_0+\alpha_1 x_i}\right)\right)+\sum_{\{i;\, y_i>0\}}\left(y_i(\alpha_0+\alpha_1 x_i)-e^{\alpha_0+\alpha_1 x_i}\right)-\sum_{i=1}^{n}\log\left(1+\exp(\beta_0+\beta_1 x_i)\right)-\sum_{\{i;\, y_i>0\}}\log(y_i!) \]

2.5.3 Analysis Examples

Now let’s see how ZIP performs.

2.5.3.1 Analysis for the normal Poisson distribution

First of all, data are generated from a Poisson distribution with mean \(\exp(0.5x + 2)\), where \(x\) is the covariate.

set.seed(1)
N <- 1000
X <- runif(N, 0, 3)
lambda <- exp(0.5*X + 2)
y <- rpois(N, lambda=lambda)
plot(X,y, main="scatter plot for Normal poisson distribution")

hist(y)

Let’s estimate parameters both by normal Poisson and by ZIP.

#negative log likelihood for Poisson
NLL.P <- function(params){
  lambda = exp(params[1]*X + params[2])
  NLL = -sum(dpois(y, lambda, log=T))
  return(NLL)
}

#negative log likelihood for ZIP
NLL.ZIP <- function(params){
  lambda = exp(params[1]*X + params[2])
  omega <- 1/(1 + exp(-(params[3]*X + params[4])))
  NLL = -sum(dzipois(y, lambda=lambda, pstr0 = omega, log = T))
  return(NLL)
}

init_params = c(0, 0, 1, 3)
res.P = optim(init_params, NLL.P, method="BFGS")
res.ZIP = optim(init_params, NLL.ZIP, method="BFGS")

#compare the result

true_values <- c(0.5, 2, 1, -2)
pred_by_poi <- c(matrix(res.P$par)[1:2], NA, NA)
pred_by_ZIP <- c(matrix(res.ZIP$par))

# the first two parameters are for the Poisson mean, the last two for the logistic part
df_result <- data.frame(true_values, pred_by_poi, pred_by_ZIP)
df_result
##   true_values pred_by_poi pred_by_ZIP
## 1         0.5   0.4933441  0.49336548
## 2         2.0   2.0038842  2.00386552
## 3         1.0          NA  0.03157685
## 4        -2.0          NA -9.77710742
#AIC for both models, Poisson (2 parameters) and ZIP (4 parameters)
print(res.P$value*2+2*2)
## [1] 5627.762
print(res.ZIP$value*2+2*4)
## [1] 5631.881

2.5.3.2 Analysis for ZIP

Now, let’s replace some of the data with zeros using a logistic function.

Notice that many zeros now appear.

omega <- 1/(1 + exp(-(X - 2)))
Z <- rbinom(N, 1, omega)
y[as.logical(Z)] <- 0
plot(X,y, main="scatter plot for ZIP")

hist(y)

In this case, the percentage of zero observations is as shown below:

head(y, n=100)
##   [1] 11  0 12 29 11 36 22  0 16  2 12  0  0  0 24  0 17 39 10 23  0  8  0 13 14
##  [26] 15  8  0  0 12 13 17  0  8 23  0  0 13 20 15  0  9  0  0 11  0  7  0  0  0
##  [51]  0  0  0  0 12  7 11  0  0  0 25 10  0 14  0 18 23  0  7  0 18  0  0 11 16
##  [76]  0  0 11  0  0 20 24  0 17  0 11  0  6 12  7  9 14  0  0  0 22  0 12  0  0
sum(y==0)/1000*100
## [1] 40.9

Let’s again estimate parameters both by normal Poisson and by ZIP.

#negative log likelihood for Poisson
NLL.P <- function(params){
  lambda = exp(params[1]*X + params[2])
  NLL = -sum(dpois(y, lambda, log=T))
  return(NLL)
}

#negative log likelihood for ZIP
NLL.ZIP <- function(params){
  lambda = exp(params[1]*X + params[2])
  omega <- 1/(1 + exp(-(params[3]*X + params[4])))
  NLL = -sum(dzipois(y, lambda=lambda, pstr0 = omega, log = T))
  return(NLL)
}

init_params = c(0, 0, 1, 3)
res.P = optim(init_params, NLL.P, method="BFGS")
res.ZIP = optim(init_params, NLL.ZIP, method="BFGS")

#compare the result

true_values <- c(0.5, 2, 1, -2)
pred_by_poi <- c(matrix(res.P$par)[1:2], NA, NA)
pred_by_ZIP <- c(matrix(res.ZIP$par))

# the first two parameters are for the Poisson mean, the last two for the logistic part
df_result <- data.frame(true_values, pred_by_poi, pred_by_ZIP)
df_result
##   true_values pred_by_poi pred_by_ZIP
## 1         0.5   0.1054291   0.5048724
## 2         2.0   2.0027277   1.9843436
## 3         1.0          NA   0.9879240
## 4        -2.0          NA  -1.9115003
#AIC for both models, Poisson and ZIP
print(res.P$value*2+2*2)
## [1] 13901.81
print(res.ZIP$value*2+2*4)
## [1] 4441.978

2.6 0-truncation (Hiwatashi)

So far, we have discussed zero-inflated data in detail, but there also exists data where samples with zero values are underrepresented (zero-deflated data). In particular, data where the value zero is never observed is called zero-truncated data. Here, we will discuss probability distributions related to such data.

A zero-truncated distribution is a discrete probability distribution supported only on the positive integers. It applies when a zero observation cannot occur, such as the size of a school of fish (a school of size zero is never recorded). Below, I explain using the zero-truncated Poisson distribution as an example.

2.6.1 Overview of the Zero-truncated Poisson distribution (ZTP)

The distribution reallocates the probability mass at \(Y=0\) of a Poisson-distributed random variable \(Y\) to the outcomes \(Y>0\).

The figure below shows the difference between a Poisson distribution and the ZTP with \(\lambda=3\).

2.6.2 Mathematical definition of the Zero-truncated Poisson distribution

The probability mass function is defined as follows,

\[ P(Y=k \mid Y>0)=\frac{Poisson(k;\lambda)}{1-Poisson(0;\lambda)}=\frac{\lambda^k e^{-\lambda}}{k!(1-e^{-\lambda})} \quad (k=1,2,\dots) \]

2.6.3 Characterization, properties and so on

Derivation of the mean and variance.

Since redistributing the probability mass at zero is a non-trivial operation, we derive them directly from the definition.

Mean:

\[ \begin{array}{lcl} E[Y] &=& \sum_{k=1}^\infty k\frac{\lambda^k e^{-\lambda}}{k!(1-e^{-\lambda})} \\ &=& \frac{\lambda}{(1-e^{-\lambda})}\sum_{\textbf{k=1}}^\infty \frac{\lambda^{k-1} e^{-\lambda}}{(k-1)!} \\ \text{let }k'&=&k-1\\ E[Y] &=& \frac{\lambda}{(1-e^{-\lambda})}\sum_{\textbf{k'=0}}^\infty \frac{\lambda^{k'} e^{-\lambda}}{k'!} \\ &=& \frac{\lambda}{(1-e^{-\lambda})}\\ \end{array} \]

Variance:

\[ \begin{array}{lcl} V[Y] &=& E[Y^2]-(E[Y])^2 \\ &=& \sum_{k=1}^\infty {k^2\frac{\lambda^k e^{-\lambda}}{k!(1-e^{-\lambda})}} - \frac{\lambda^2}{(1-e^{-\lambda})^2}\\ &=& \sum_{k=1}^\infty {(k(k-1)+k)\frac{\lambda^k e^{-\lambda}}{k!(1-e^{-\lambda})}} - \frac{\lambda^2}{(1-e^{-\lambda})^2}\\ &=& \sum_{k=1}^\infty {k(k-1)\frac{\lambda^k e^{-\lambda}}{k!(1-e^{-\lambda})}}+ \sum_{k=1}^\infty {k\frac{\lambda^k e^{-\lambda}}{k!(1-e^{-\lambda})}} - \frac{\lambda^2}{(1-e^{-\lambda})^2}\\ &=& \frac{\lambda^2}{(1-e^{-\lambda})}+\frac{\lambda}{(1-e^{-\lambda})} - \frac{\lambda^2}{(1-e^{-\lambda})^2}\\ &=& \frac{\lambda^2+\lambda}{(1-e^{-\lambda})}- \frac{\lambda^2}{(1-e^{-\lambda})^2}\\ \end{array} \]
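As a quick numerical sanity check (my own sketch, not part of the original material), ZTP draws can be obtained by rejecting the zeros of Poisson draws and compared with the formulas above:

#numerical check of the ZTP mean and variance (illustrative sketch)
set.seed(1)
lambda <- 3
z <- rpois(2e5, lambda)
z <- z[z > 0]                       #rejecting zeros yields ZTP draws
c(mean(z), lambda/(1-exp(-lambda))) #empirical vs derived mean
c(var(z), (lambda^2+lambda)/(1-exp(-lambda)) - lambda^2/(1-exp(-lambda))^2)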

2.6.4 Examples

Let’s analyze data (\(n=1000\)) on discount rates and the number of items purchased by each customer.

In this document we consider the following data.

##              num_items
## discount_rate  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
##            5  96 57 21  5  0  0  0  0  0  0  0  0  0  0  0
##            10 54 63 22  9  0  2  0  0  0  0  0  0  0  0  0
##            15 33 52 41 21  6  3  2  0  0  0  0  0  0  0  0
##            20 12 29 45 30 26 15 10  4  1  0  0  0  0  0  0
##            25  2 13 21 35 34 31 18 16  5  4  0  1  0  0  0
##            30  0  0  3  5 17 21 21 22 20 19 14  6  7  1  5

A quick review of the data shows that

  • the number of items increases as the discount rate increases
  • data with zero items is not included

The analysis is performed on this data using the usual Poisson distribution and ZTP under the following assumptions.

Assumptions:

\[ \begin{array}{lcl} \lambda_i &=& \exp(\beta_0+\beta_1 D_i)\\ Model1 &:& N_i\sim Poisson(\lambda_i)\\ Model2 &:& N_i\sim ZTP(\lambda_i)\\ \end{array} \]

where \(D_i\) and \(N_i\) are the discount rate and number of items for the \(i\)-th sample.

The analysis proceeds as follows.

  1. Define the likelihood function for each model.
  2. Perform parameter optimization using BFGS.
  3. Do AIC comparisons of the two models.
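The likelihood/optimization chunk is not shown in this document; a minimal sketch consistent with steps 1 and 2 (assuming the columns discount_rate and num_items from the data-generation code shown in the next section) might look like this:

#negative log likelihood for Model1 (Poisson)
NLL.P <- function(params){
  lambda = exp(params[1] + params[2]*discount_rate)
  -sum(dpois(num_items, lambda, log=T))
}

#negative log likelihood for Model2 (ZTP): Poisson log-density renormalized by log P(Y>0)
NLL.ZTP <- function(params){
  lambda = exp(params[1] + params[2]*discount_rate)
  -sum(dpois(num_items, lambda, log=T) - log(1 - exp(-lambda)))
}

res.P   = optim(c(0, 0), NLL.P,   method="BFGS")
res.ZTP = optim(c(0, 0), NLL.ZTP, method="BFGS")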
print(res.P$value*2+4) #AIC for Model1
## [1] 3688.319
print(res.ZTP$value*2+4) #AIC for Model2
## [1] 3473.479

2.7 Shifted distribution (Hiwatashi)

A shifted distribution is a discrete probability distribution supported on integers greater than or equal to a specified baseline.

2.7.1 Overview of the Shifted Poisson distribution

It is the standard Poisson distribution shifted by an amount \(s\), \(s\in \mathbb{Z}\).

The figure below shows the difference between a Poisson distribution, a shifted Poisson distribution, and the ZTP with \(\lambda=3, s=1\).

As shown there, like the ZTP it excludes the value zero from the standard Poisson distribution, but its shape differs from the ZTP.

2.7.2 Mathematical definition of the distribution

The probability mass function is defined as follows,

\[ P(Y=k) = Poisson(k-s;\lambda) = \frac{\lambda^{k-s} e^{-\lambda}}{(k-s)!} \quad (k=s,s+1,\dots) \]

2.7.3 Characterization, properties and so on

Derivation of the mean and variance.

Let \(Y\sim Shifted\text{-}Poisson(\lambda, s)\), \(Z\sim Poisson(\lambda)\), then \(Y=Z+s\).

Therefore, it is clear that \(E[Y] = \lambda + s\) and \(V[Y] = \lambda\).

They can also be derived directly from the definition as follows.

Mean:

\[ \begin{array}{lcl} E[Y] &=& \sum_{k=s}^\infty k\frac{\lambda^{k-s} e^{-\lambda}}{(k-s)!} \\ \text{let } k'&=&k-s\\ E[Y] &=& \sum_{k'=0}^\infty (k'+s)\frac{\lambda^{k'} e^{-\lambda}}{k'!} \\ &=& \sum_{k'=0}^\infty k'\frac{\lambda^{k'} e^{-\lambda}}{k'!} + s\sum_{k'=0}^\infty \frac{\lambda^{k'} e^{-\lambda}}{k'!}\\ &=& \lambda + s \end{array} \]

Variance:

\[ \begin{array}{lcl} V[Y] &=& E[Y^2]-(E[Y])^2 \\ &=& \sum_{k=s}^\infty {k^2\frac{\lambda^{k-s} e^{-\lambda}}{(k-s)!}} - (\lambda+s)^2\\ \text{let } k'&=&k-s\\ V[Y] &=& \sum_{k'=0}^{\infty}{\left(k'(k'-1)+k'+2k's+s^2\right)\frac{\lambda^{k'} e^{-\lambda}}{k'!}} - (\lambda+s)^2\\ && \sum_{k'=0}^{\infty}{\left(k'(k'-1)+k'\right)\frac{\lambda^{k'} e^{-\lambda}}{k'!}}=\lambda^2+\lambda\\ && 2s\sum_{k'=0}^{\infty}{k'\frac{\lambda^{k'} e^{-\lambda}}{k'!}}=2s\lambda\\ && s^2\sum_{k'=0}^{\infty}{\frac{\lambda^{k'} e^{-\lambda}}{k'!}}=s^2\\ V[Y]&=&(\lambda^2+\lambda+2s\lambda+s^2) - (\lambda+s)^2 = \lambda \end{array} \]

2.7.4 Examples

Consider applying the shifted Poisson to the data used in the ZTP example. The data were as follows.

##              num_items
## discount_rate  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
##            5  96 57 21  5  0  0  0  0  0  0  0  0  0  0  0
##            10 54 63 22  9  0  2  0  0  0  0  0  0  0  0  0
##            15 33 52 41 21  6  3  2  0  0  0  0  0  0  0  0
##            20 12 29 45 30 26 15 10  4  1  0  0  0  0  0  0
##            25  2 13 21 35 34 31 18 16  5  4  0  1  0  0  0
##            30  0  0  3  5 17 21 21 22 20 19 14  6  7  1  5

Here the minimum number of items is 1, so we can naturally assume \(s=1\). We therefore consider the following model.

  • Model3: \(N_i\sim Shifted\text{-}Poisson(\lambda_i;s=1)\)

The likelihood calculation and parameter estimation for this model are the same as for Model 3’ below.

  • Model3’: \(N_i-s\sim Poisson(\lambda_i)\)

The analysis will take this into account.
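The fitting chunk is again omitted; a minimal sketch under the same assumptions as above (shift the counts down by \(s=1\) and fit an ordinary Poisson likelihood) is:

#negative log likelihood for Model3 (shifted Poisson with s=1): N_i - s ~ Poisson(lambda_i)
NLL.SP <- function(params){
  lambda = exp(params[1] + params[2]*discount_rate)
  -sum(dpois(num_items - 1, lambda, log=T))
}
res.SP = optim(c(0, 0), NLL.SP, method="BFGS")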

print(res.P$value*2+4) #AIC for Model1(Poisson)
## [1] 3688.319
print(res.ZTP$value*2+4) #AIC for Model2(ZTP)
## [1] 3473.479
print(res.SP$value*2+4) #AIC for Model3(Shifted-Poisson)
## [1] 3458.384

Model 3 has the lowest AIC. In fact, the data were generated by the following code.

set.seed(19990725)
n = 1000
discount_rate = as.integer(runif(n)*30)%/%5 * 5 + 5 #discount rates 5, 10, ..., 30
lambda = exp(discount_rate/10 - 1)
num_items = rpois(n, lambda = lambda) + 1 #Value shift!

This code generates Poisson random numbers and then adds 1 to every value.

This means the shifted Poisson assumption matches the true data-generating model, and the model selected by AIC is indeed the shifted Poisson.

2.8 Hurdle model or zero-altered distribution (Kitakado)

2.8.1 Overview of the distribution

In the context of habitat analysis, the “zero-inflated distribution” interprets a “0” as “the site is a poor habitat and fish are absent” + “the site is a good habitat but fish were not found”. In the hurdle model, on the other hand, the zero probability is simply expressed by a single additional parameter, and the positive outcomes are modeled with a zero-truncated standard distribution.

2.8.2 Mathematical definition of the distribution

For example, the zero-altered Poisson (ZAP) is as follows: \[ \begin{array}{lcl} Pr(Y=0) &=& \theta \\ Pr(Y=y) &=& (1-\theta) \cdot e^{-\lambda} \frac{\lambda^y}{y!} \cdot \frac{1}{1-e^{-\lambda}} \quad (y=1,2,\dots) \end{array} \] By contrast, the ZIP is \[ \begin{array}{lcl} Pr(Y=0) &=& \theta + (1-\theta) \cdot e^{-\lambda}\\ Pr(Y=y) &=& (1-\theta) \cdot e^{-\lambda} \frac{\lambda^y}{y!} \quad (y=1,2,\dots) \end{array} \]
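As a small illustration (my own sketch, not code from the workshop; the function name dzap is hypothetical), the ZAP density above can be written directly in R:

#log-density of the zero-altered Poisson (ZAP) defined above (sketch)
dzap <- function(y, lambda, theta, log = FALSE){
  ll <- ifelse(y == 0,
               log(theta),
               log(1 - theta) + dpois(y, lambda, log = TRUE) - log(1 - exp(-lambda)))
  if (log) ll else exp(ll)
}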

2.9 Mixture distribution in general (Kitakado)

This type of distribution is sometimes used to extend basic distributions by accounting for heterogeneity in the parameters, treating an underlying parameter as random.

2.9.1 Overview of the distribution

Let \(f(y|\theta, \phi)\) denote a basic probability distribution for a random variable \(Y\), parameterized by two parameters \(\theta\) and \(\phi\). Suppose that \(\theta\) varies from sample to sample, so that \(\theta\) is treated as a latent random variable.

To make the model concrete, we assume the following hierarchy.

\[ \begin{array}{lcl} Y_i|\theta_i &\sim& f(\cdot|\theta_i, \phi) \quad (i=1,2,\dots,n) \\ \theta_i &\sim& f(\cdot|\lambda) \quad (i=1,2,\dots,n) \\ \end{array} \] Then the marginal distribution of \(Y_i\) can be expressed as follows: \[ f(y_i|\phi, \lambda) = \int f(y_i|\theta_i, \phi) \cdot f(\theta_i|\lambda) d\theta_i \]

2.9.2 Mathematical definition of the distribution

To simplify the expression, we define the mixture distribution as follows:

\[ f(y|\phi, \lambda) = \int f(y|\theta, \phi) \cdot f(\theta|\lambda) d\theta \]

2.9.3 Characterization, properties and so on

For example,

  • the mean and variance of \(Y\)
  • the reproductive property, i.e., whether sums retain the same distributional form (e.g., if \(Y_i \sim Po(\lambda_i)\) independently, then \(\sum_{i=1}^n Y_i \sim Po(\sum_{i=1}^n \lambda_i)\))

and so on.

2.9.4 Examples

The context of the mixture case differs from the other sections, so instead of coin-tossing-type examples I give examples of mixture distributions themselves.

2.9.4.1 Negative binomial distribution

\[ \begin{array}{lcl} Y|\lambda &\sim& Poisson (\lambda) \\ \lambda &\sim& Gamma(\alpha,\beta) \\ \Rightarrow && \\ f(y) &=& \displaystyle \frac{\Gamma(\alpha+y)}{\Gamma(\alpha) y!} \left( \frac{1}{1+\beta} \right)^\alpha \left( \frac{\beta}{1+\beta} \right)^y \end{array} \]
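A quick simulation (my own sketch; the gamma here is parameterized by shape \(\alpha\) and scale \(\beta\)) confirms that the Poisson-gamma mixture matches the negative binomial pmf with size \(=\alpha\) and prob \(=1/(1+\beta)\):

#check the Poisson-gamma mixture against the negative binomial pmf (sketch)
set.seed(1)
alpha <- 2; beta <- 3
lam <- rgamma(1e5, shape = alpha, scale = beta) #lambda ~ Gamma(alpha, beta)
y <- rpois(1e5, lam)                            #Y | lambda ~ Poisson(lambda)
emp <- as.numeric(table(factor(y, levels = 0:5)))/1e5
the <- dnbinom(0:5, size = alpha, prob = 1/(1+beta))
round(rbind(empirical = emp, theoretical = the), 4)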

2.9.4.2 Normal mixture

\[ \begin{array}{lcll} Y|I=i &\sim& N(\mu_i, \sigma_i^2) &(i=1,2,\dots,K) \\ Pr(I=i) &=& \omega_i & (i=1,2,\dots,K) \\ \Rightarrow && \\ f(y) &=& \sum_{i=1}^K \omega_i f(y|\mu_i, \sigma_i^2) \end{array} \]
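Sampling from such a mixture is straightforward (a sketch of my own): draw the component label \(I\) first, then the observation given \(I\).

#sampling from a two-component normal mixture (sketch)
set.seed(1)
I <- sample(1:2, 1e4, replace = TRUE, prob = c(0.3, 0.7)) #component labels
y <- rnorm(1e4, mean = c(0, 5)[I], sd = c(1, 2)[I])
hist(y, breaks = 50, main = "Two-component normal mixture")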

2.10 Compound Poisson distribution (Kitakado)

2.10.1 Overview of the distribution

Assume that \(N\) is a random variable with a Poisson distribution \[ N \sim \operatorname{Poisson}(\lambda) \] and that \(X_1, X_2, \dots, X_N\) are independent and identically distributed random variables. Then the probability distribution of \[ Y = \sum_{i=1}^{N} X_i \] is called a compound Poisson distribution. It’s also called a stopped-sum distribution.

There is a point mass \(Pr(N=0)=e^{-\lambda}\), which implies \(Pr(Y=0)=e^{-\lambda}\) for any underlying distribution of the \(X_i\)’s that itself puts no mass at 0 (e.g., a continuous positive \(X\)); in general \(Pr(Y=0) \ge e^{-\lambda}\).

2.10.2 Mathematical definition of the distribution

The details are omitted, but when the properties of a probability distribution are hard to obtain directly, we can sometimes use

  • “probability generating function (pgf)” (for discrete distributions particularly)
    \[ G(z) = E[z^Y], \quad Pr(Y=n) = \frac{G^{(n)}(0)}{n!}, \quad E[Y(Y-1)\cdots(Y-n+1)] = G^{(n)}(1). \]

  • “moment generating function (mgf)” (as Laplace transformation) \[ M(t) = E[e^{tY}], \quad E[Y^n] = M^{(n)}(0). \]

  • “cumulant generating function (cgf)” (log of mgf) \[ K(t) = \log M(t), \quad K'(0)=M'(0)=E[Y], \quad K^{''}(0)=V[Y] \]

  • “characteristic function (cf)” (as Fourier transformation)
    \[ \phi(t) = E[e^{itY}], \quad E[Y^n] = \phi^{(n)}(0)/i^n. \]

Each of these functions is in one-to-one correspondence with the distribution, so the distribution of a compound sum can be identified through them. For example, \[ G_Y(z) = E[z^Y] = E[z^{\sum_{i=1}^{N} X_i}] = E[E[\prod_{i=1}^N z^{X_i} | N]] = E[G_X(z)^N] = e^{\lambda (G_X(z)-1)} \]

2.10.3 Examples

  • Poisson-geometric: negative binomial distribution
  • Poisson-binomial: Hermite distribution
  • Poisson-Poisson: Neyman type A
  • Poisson–gamma distribution: special case of Tweedie distribution (1<\(p\)<2)
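As a small illustration of the Poisson-gamma case (my own sketch), simulating the compound sum confirms the zero mass \(Pr(Y=0)=e^{-\lambda}\) noted above:

#simulate Y = sum_{i=1}^N X_i with N ~ Poisson(lambda), X_i ~ Gamma (sketch)
set.seed(1)
lambda <- 1.5
N <- rpois(1e5, lambda)
Y <- sapply(N, function(n) sum(rgamma(n, shape = 2, scale = 1)))
c(empirical = mean(Y == 0), theoretical = exp(-lambda))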

2.11 Tweedie in general (Kitakado)

2.11.1 Overview of the distribution

This is a continuous distribution with a positive probability mass at 0. Because of this special characteristic, it is sometimes used for catch, CPUE, biomass, etc. The derivation is somewhat involved, but for fisheries applications the Tweedie distribution arises as a special compound Poisson distribution with gamma-distributed summands. The Tweedie is also characterized by a special mean-variance relationship, \[ V[Y] = \phi V(\mu) = \phi \mu^p \] Using this notation, the Tweedie distribution is denoted \[ Y \sim TW_p(\mu, \phi). \]

  • If \(p=0\): \(V[Y]=\phi\), and \(Y \sim TW_0(\mu,\phi) = N(\mu, \phi)\)

  • If \(p=1\): \(V[Y]=\phi \mu\), and if \(Y \sim TW_1(\mu,\phi)\) then \(Y/\phi \sim Po(\mu / \phi)\)

  • If \(p=2\): \(V[Y]=\phi \mu^2\), and \(Y\sim TW_2(\mu,\phi) = Ga(\mu, \phi)\)

2.11.2 Mathematical definition of the distribution

The Tweedie distribution is normally characterized by the cumulant function. Before defining the Tweedie, let’s define a natural exponential family as \[ f(y; \theta) = \exp \{y \theta - \kappa(\theta) \} h(y) \] In this model, \[ \int f(y; \theta) dy = \int \exp \{y \theta - \kappa(\theta) \} h(y) dy = 1 \] \[ M(t;\theta) = E[e^{tY}] = \exp\{ \kappa(\theta + t) - \kappa(\theta) \} \\ K(t;\theta) = \log M(t,\theta) = \kappa(\theta + t) - \kappa(\theta) \]

The (reproductive) exponential dispersion model is defined as a scaled transformed natural exponential family as \[ f(y; \theta, \phi) = \exp \left\{\frac{y \theta - \kappa(\theta)}{\phi} \right\} g(\phi) h(y) \]

\[ M(t;\theta, \phi) = E[e^{tY}] = \exp \left\{ \frac{\kappa(\theta + \phi t) - \kappa(\theta)}{\phi} \right\} \\ K(t;\theta, \phi) = \log M(t;\theta, \phi) = \frac{\kappa(\theta + \phi t) - \kappa(\theta)}{\phi} \]

Here, \[ K'(0) = \kappa'(\theta) = \tau(\theta) = \mu \\ K{''}(0) = \phi \kappa{''}(\theta) = \phi \tau' (\theta) = \phi V(\mu) \] and we are incorporating Taylor’s power law \(V(\mu)=\mu^p\) into the variance function to define a cumulant function \(\kappa(\theta)\) for the Tweedie. So, the equation to be solved is the following differential equation: \[ \tau' (\theta) = \mu^p \Leftrightarrow \tau' (\theta) = \tau^p(\theta) \]

  • If \(p=0\), then \(\tau(\theta)=\theta\) and \(\kappa(\theta)=\theta^2/2\) (normal).

  • If \(p=1\), then \(\tau(\theta)=e^\theta\) and \(\kappa(\theta)=e^\theta\) (Poisson).

  • If \(p=2\), then \(\tau(\theta)= -1/\theta \ (\theta<0)\) and \(\kappa(\theta)=-\log (-\theta)\) (gamma).

If \(p \neq 0, 1, 2\), \[ \begin{array}{rcl} \tau'(\theta) &=& \tau^p(\theta) \\ \Rightarrow \tau(\theta) &=& \left\{ (1-p)\theta \right\}^\frac{1}{1-p} = \mu\\ \Rightarrow \kappa(\theta) &=& \frac{1}{2-p} \left\{ (1-p)\theta \right\}^\frac{2-p}{1-p} = \frac{1}{2-p} \mu^{2-p}\\ \end{array} \] Here \(\theta=\frac{\mu^{1-p}}{1-p}\), and therefore the shape of the Tweedie distribution is essentially expressed as follows: \[ \begin{array}{rcl} f(y; \theta, \phi, p) &=& \exp \left\{ \frac{\theta y - \kappa(\theta)}{\phi} \right\} g(\phi) h(y) \\ &=& \exp \left\{ \frac{1}{\phi} \left[ y \frac{\mu^{1-p}}{1-p} - \frac{1}{2-p} \mu^{2-p} \right] \right\} g(\phi) h(y) \end{array} \] where the normalizing constant \(g(\phi) h(y)\) is not shown explicitly.

The Tweedie can be generated through a compound Poisson-gamma construction; the details are omitted here.
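A minimal sketch of that construction for \(1<p<2\) (my own illustration; the parameter mapping below is the standard one, the function name rtweedie.cp is hypothetical, and the CRAN tweedie package provides exact routines): take \(N \sim Po(\lambda)\) with \(\lambda = \mu^{2-p}/\{\phi(2-p)\}\), and sum \(N\) gamma variables with shape \((2-p)/(p-1)\) and scale \(\phi(p-1)\mu^{p-1}\).

#generate TW_p(mu, phi) for 1<p<2 as a compound Poisson-gamma sum (sketch)
rtweedie.cp <- function(n, mu, phi, p){
  lam   <- mu^(2-p)/(phi*(2-p)) #Poisson rate
  shape <- (2-p)/(p-1)          #gamma shape of each summand
  scale <- phi*(p-1)*mu^(p-1)   #gamma scale of each summand
  N <- rpois(n, lam)
  sapply(N, function(k) sum(rgamma(k, shape = shape, scale = scale)))
}
y <- rtweedie.cp(1e5, mu = 2, phi = 1, p = 1.5)
c(mean = mean(y), var = var(y), theoretical.var = 1*2^1.5) #V[Y] = phi*mu^p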

3 Example analyses (to come)

3.1 Cod parasite data

The cod parasite data accompany Zuur et al.’s mixed-effects modelling book:
https://github.com/romunov/AED?tab=readme-ov-file https://www.highstat.com/index.php/our-books?view=article&id=16&catid=18

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::collapse() masks nlme::collapse()
## ✖ dplyr::filter()   masks stats::filter()
## ✖ dplyr::lag()      masks stats::lag()
## ✖ dplyr::select()   masks MASS::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ParasiteCod <- read.table("ParasiteCod.txt", header=T)
head(ParasiteCod, 10)
##    Sample Intensity Prevalence Year Depth Weight Length Sex Stage Age Area
## 1       1         0          0 1999   220    148     26   0     0   0    2
## 2       2         0          0 1999   220    144     26   0     0   0    2
## 3       3         0          0 1999   220    146     27   0     0   0    2
## 4       4         0          0 1999   220    138     26   0     0   0    2
## 5       5         0          0 1999   220     40     17   0     0   0    2
## 6       6         0          0 1999   220     68     20   0     0   0    2
## 7       7         0          0 1999   220     52     19   0     0   0    2
## 8       8         0          0 1999   194   3848     77   0     0   0    3
## 9       9         0          0 1999   194   2576     67   0     0   0    3
## 10     10         0          0 1999   194   1972     60   0     0   0    3
tail(ParasiteCod, 10)
##      Sample Intensity Prevalence Year Depth Weight Length Sex Stage Age Area
## 1245   1245        50          1 2001   228    484     38   2     1   3    4
## 1246   1246        55          1 2001   228    704     45   1     1   4    4
## 1247   1247        75          1 2001   140    884     46   2     1   3    4
## 1248   1248        84          1 2001   260    910     48   2     1   4    2
## 1249   1249        89          1 2001   260   1414     56   1     1   6    2
## 1250   1250        90          1 2001   228    224     31   1     1   2    4
## 1251   1251       104          1 2001   140    690     43   2     1   3    4
## 1252   1252       125          1 2001   140    754     44   2     1   3    4
## 1253   1253       128          1 2001   140   1270     55   2     4   7    4
## 1254   1254       257          1 2001   228    370     35   2     1   3    4
dim(ParasiteCod)
## [1] 1254   11
str(ParasiteCod)
## 'data.frame':    1254 obs. of  11 variables:
##  $ Sample    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Intensity : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Prevalence: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year      : int  1999 1999 1999 1999 1999 1999 1999 1999 1999 1999 ...
##  $ Depth     : int  220 220 220 220 220 220 220 194 194 194 ...
##  $ Weight    : int  148 144 146 138 40 68 52 3848 2576 1972 ...
##  $ Length    : int  26 26 27 26 17 20 19 77 67 60 ...
##  $ Sex       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Stage     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Age       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Area      : int  2 2 2 2 2 2 2 3 3 3 ...
ParasiteCod$fArea <- factor(ParasiteCod$Area)
ParasiteCod$fYear <- factor(ParasiteCod$Year)
I1 <- is.na(ParasiteCod$Intensity) | is.na(ParasiteCod$Length) #flag rows with missing values
ParasiteCod2 <- ParasiteCod[!I1, ]
plot(table(ParasiteCod2$Intensity), ylab = "Frequencies", xlab = "Observed intensity values")

dim(ParasiteCod2)
## [1] 1191   13
# Number of 0 data
sum(ParasiteCod2$Intensity==0)
## [1] 651

3.2 Poisson

3.2.1 IID

y <- ParasiteCod2$Intensity
NLL.pois <- function(par) (-1.0)*sum(dpois(y, exp(par), log=T)) #par = log(lambda)
Res.pois <- optim(0, NLL.pois, method="BFGS")
Res.pois
## $par
## [1] 1.826011
## 
## $value
## [1] 13859.25
## 
## $counts
## function gradient 
##       32        6 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
lambda.pois <- exp(Res.pois$par)
lambda.pois
## [1] 6.209067
domain <- 0:max(y+1)
hist(y, freq=F, breaks=domain, main="Fitting to Pois")
points(domain, dpois(domain, lambda.pois), col="red")

Res.IID.pois <- glm(y ~ 1, family = "poisson", data=ParasiteCod2)
summary(Res.IID.pois)
## 
## Call:
## glm(formula = y ~ 1, family = "poisson", data = ParasiteCod2)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.82601    0.01163     157   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 25797  on 1190  degrees of freedom
## Residual deviance: 25797  on 1190  degrees of freedom
## AIC: 27721
## 
## Number of Fisher Scoring iterations: 7

3.2.2 Year*Area + Length

y <- ParasiteCod2$Intensity
Res.glm.pois <- glm(y ~ 0 + fYear*fArea + Length, family = "poisson", data=ParasiteCod2)
summary(Res.glm.pois)
## 
## Call:
## glm(formula = y ~ 0 + fYear * fArea + Length, family = "poisson", 
##     data = ParasiteCod2)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## fYear1999         2.9985075  0.0565707  53.005  < 2e-16 ***
## fYear2000         3.2079397  0.0789588  40.628  < 2e-16 ***
## fYear2001         0.6014964  0.1840927   3.267  0.00109 ** 
## fArea2           -0.5189996  0.0711025  -7.299 2.89e-13 ***
## fArea3           -1.0343800  0.0722101 -14.325  < 2e-16 ***
## fArea4            1.0659077  0.0453673  23.495  < 2e-16 ***
## Length           -0.0287971  0.0008796 -32.740  < 2e-16 ***
## fYear2000:fArea2 -0.3399908  0.1329193  -2.558  0.01053 *  
## fYear2001:fArea2  2.6265442  0.2010904  13.062  < 2e-16 ***
## fYear2000:fArea3  1.5536092  0.1029887  15.085  < 2e-16 ***
## fYear2001:fArea3  2.5895857  0.2018590  12.829  < 2e-16 ***
## fYear2000:fArea4  0.3906449  0.0815499   4.790 1.67e-06 ***
## fYear2001:fArea4  2.2480938  0.1870790  12.017  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 40395  on 1191  degrees of freedom
## Residual deviance: 18430  on 1178  degrees of freedom
## AIC: 20378
## 
## Number of Fisher Scoring iterations: 7
AIC(Res.glm.pois)
## [1] 20377.86

3.3 Negative binomial

3.3.1 IID

y <- ParasiteCod2$Intensity
logit <- function(p) log(p/(1-p)) 
ilogit <- function(x) 1.0/(1.0+exp(-x)) 
NLL.nb <- function(par) (-1.0)*sum(dnbinom(y, prob=ilogit(par[1]), size=exp(par[2]), log=T)) #par = (logit(prob), log(size))
Res.nb <- optim(c(0,0), NLL.nb, method="BFGS")
## Warning in dnbinom(y, prob = ilogit(par[1]), size = exp(par[2]), log = T): NaNs
## produced
Res.nb
## $par
## [1] -3.657231 -1.831582
## 
## $value
## [1] 2608.355
## 
## $counts
## function gradient 
##       31       11 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
prob.nb <- ilogit(Res.nb$par[1])
size.nb <- exp(Res.nb$par[2])
AIC.nb <- 2*Res.nb$value + 2*2
AIC.nb
## [1] 5220.709
domain <- 0:max(y+1)
hist(y, freq=F, breaks=domain, main="Fitting to NB")
points(domain, dnbinom(domain, prob=prob.nb, size=size.nb), col="red")

library(MASS)
Res.IID.nb <- glm.nb(y ~ 1, data=ParasiteCod2)
summary(Res.IID.nb)
## 
## Call:
## glm.nb(formula = y ~ 1, data = ParasiteCod2, init.theta = 0.1601896019, 
##     link = log)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.82601    0.07333    24.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.1602) family taken to be 1)
## 
##     Null deviance: 999.77  on 1190  degrees of freedom
## Residual deviance: 999.77  on 1190  degrees of freedom
## AIC: 5220.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.16019 
##           Std. Err.:  0.00837 
## 
##  2 x log-likelihood:  -5216.70900

3.3.2 Year*Area + Length

y <- ParasiteCod2$Intensity
Res.glm.nb <- glm.nb(y ~ 0 + fYear*fArea + Length, data=ParasiteCod2)
summary(Res.glm.nb)
## 
## Call:
## glm.nb(formula = y ~ 0 + fYear * fArea + Length, data = ParasiteCod2, 
##     init.theta = 0.2153077102, link = log)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## fYear1999         2.864463   0.305823   9.366  < 2e-16 ***
## fYear2000         2.982258   0.408627   7.298 2.92e-13 ***
## fYear2001         0.423865   0.396433   1.069  0.28498    
## fArea2           -0.442768   0.295274  -1.500  0.13374    
## fArea3           -1.153175   0.255511  -4.513 6.39e-06 ***
## fArea4            0.921721   0.282246   3.266  0.00109 ** 
## Length           -0.023835   0.004729  -5.040 4.66e-07 ***
## fYear2000:fArea2 -0.387188   0.528897  -0.732  0.46413    
## fYear2001:fArea2  2.506942   0.485119   5.168 2.37e-07 ***
## fYear2000:fArea3  1.492543   0.466798   3.197  0.00139 ** 
## fYear2001:fArea3  2.614731   0.445608   5.868 4.42e-09 ***
## fYear2000:fArea4  0.503002   0.511290   0.984  0.32522    
## fYear2001:fArea4  2.263310   0.468251   4.834 1.34e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.2153) family taken to be 1)
## 
##     Null deviance: 2786.8  on 1191  degrees of freedom
## Residual deviance: 1013.1  on 1178  degrees of freedom
## AIC: 5030.7
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.2153 
##           Std. Err.:  0.0120 
## 
##  2 x log-likelihood:  -5002.6730
AIC(Res.glm.nb)
## [1] 5030.673

4 Wrap-up discussion

Questions:

Findings: