Abstract | For a vector of mutually independent random variables \(\boldsymbol{Y} = (Y_1, \ldots, Y_n)^T\) that we assume depend on the values \(x_1, \ldots, x_p\), generalized linear models are defined as \begin{equation*} \begin{cases} Y_i \sim EFD(\theta_i) \\ \mathbb{E}[Y_i] = \mu_i = g^{-1}(\boldsymbol{x}_i^T\boldsymbol{\beta}), \end{cases} \end{equation*} for \(i = 1, \ldots, n\). The random variables \(Y_i \sim EFD(\theta_i)\) come from the exponential family of distributions in standard form, whose density depends on the parameter \(\theta_i\). This family contains many well-known statistical distributions, including the binomial, normal, Poisson, and gamma distributions. Generalized linear models therefore allow the dependent variable \(\boldsymbol{Y}\) to follow distributions other than the normal. Furthermore, the function \(g\) in the definition above is a monotonic differentiable function called the link function. It connects the expectation \(\mu_i\) of the dependent variable, and through it the variance, with the linear combination of the independent variables \(\boldsymbol{x}_i^T\boldsymbol{\beta}\). In this way generalized linear models also allow nonlinear relationships to be modeled. The unknown parameters \(\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^T\) are estimated by the method of maximum likelihood, by finding the maximizer \(\boldsymbol{b} = (b_0, b_1, \ldots, b_p)^T\) of the log-likelihood function based on the sample \(\boldsymbol{y}\). The maximization reduces to finding the zeros of the partial derivatives of the log-likelihood. Depending on the complexity of the model, the zeros can be found analytically or by iteratively reweighted least squares, which uses the Fisher scoring algorithm to improve the parameter estimates through a weight matrix and pseudo-responses. 
Once the parameters of the generalized linear model have been estimated and the model equation obtained, we want to check the accuracy of the model by statistical inference. This includes testing hypotheses about the significance of individual parameters and of the model as a whole, comparing models using the asymptotic \(N(0,1)\) and \(\chi^2\) distributions, and computing confidence intervals. Once we have confirmed that the model is accurate, it can be used for decision-making. |
Abstract (English) | For a vector of independent random variables \(\boldsymbol{Y} = (Y_1, \ldots, Y_n)^T\), which we assume depend on the values \(x_1, \ldots, x_p\), generalized linear models are defined as \begin{equation*} \begin{cases} Y_i \sim EFD(\theta_i) \\ \mathbb{E}[Y_i] = \mu_i = g^{-1}(\boldsymbol{x}_i^T\boldsymbol{\beta}), \end{cases} \end{equation*} for \(i = 1, \ldots, n\). The random variables \(Y_i \sim EFD(\theta_i)\) come from the exponential family of distributions in standard form, with the density depending on the parameter \(\theta_i\). This family includes numerous well-known statistical distributions, such as the binomial, normal, Poisson, and gamma distributions. Generalized linear models thus allow the dependent variable \(\boldsymbol{Y}\) to follow distributions other than the normal. Furthermore, the function \(g\) in the above expression is a monotonic differentiable function called the link function. It connects the expectation \(\mu_i\) of the dependent variable, and through it the variance, with the linear combination of the independent variables \(\boldsymbol{x}_i^T\boldsymbol{\beta}\). In this way, generalized linear models also enable the modeling of nonlinear relationships. The unknown parameters \(\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^T\) are estimated by the method of maximum likelihood, by finding the maximizer \(\boldsymbol{b} = (b_0, b_1, \ldots, b_p)^T\) of the log-likelihood function based on the sample \(\boldsymbol{y}\). Maximization reduces to finding the zeros of the partial derivatives of the log-likelihood. Depending on the complexity of the model, the zeros can be found either analytically or by iteratively reweighted least squares, which uses the Fisher scoring algorithm to improve the parameter estimates via a weight matrix and pseudo-responses. 
Once the parameters of the generalized linear model have been estimated and the model equation obtained, we want to assess the accuracy of the model through statistical inference. This includes testing hypotheses about the significance of individual parameters and of the model as a whole, comparing models using the asymptotic \(N(0,1)\) and \(\chi^2\) distributions, and computing confidence intervals. Once the accuracy of the model has been confirmed, it can be used for decision-making. |
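The estimation and inference steps summarized in the abstract can be sketched in a few lines of code. The example below is only an illustration under stated assumptions, not the implementation used in the thesis: it fixes one member of the exponential family (Poisson) with the log link \(g(\mu) = \log \mu\), the function names `irls_poisson` and `wald_summary` are hypothetical, and the data are simulated with made-up coefficients. For this choice the weights are \(w_i = \mu_i\) and the pseudo-responses are \(z_i = \eta_i + (y_i - \mu_i)/\mu_i\), with \(\eta_i = \boldsymbol{x}_i^T\boldsymbol{b}\).

```python
import numpy as np


def irls_poisson(X, y, max_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares.

    Returns the estimate b and the inverse Fisher information at b,
    which serves as the asymptotic covariance matrix of the estimator.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                  # strictly positive starting values
    eta = np.log(mu)              # linear predictor eta = g(mu)
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = mu                    # weights (dmu/deta)^2 / Var(Y); equals mu here
        z = eta + (y - mu) / mu   # pseudo-response eta + (y - mu) * g'(mu)
        XtW = X.T * w             # X^T W without building the diagonal matrix
        b_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares step
        eta = X @ b_new
        mu = np.exp(eta)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    cov = np.linalg.inv((X.T * mu) @ X)  # inverse Fisher information
    return b, cov


def wald_summary(b, cov, z_crit=1.959964):
    """Standard errors, asymptotic N(0,1) test statistics, and 95% intervals."""
    se = np.sqrt(np.diag(cov))
    z = b / se                    # tests H0: beta_j = 0
    return se, z, b - z_crit * se, b + z_crit * se


# Simulated data; beta_true is a made-up value for illustration only.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

b, cov = irls_poisson(X, y)
se, z, lower, upper = wald_summary(b, cov)
for j in range(len(b)):
    print(f"b_{j} = {b[j]:.4f}, se = {se[j]:.4f}, z = {z[j]:.2f}, "
          f"95% CI = ({lower[j]:.4f}, {upper[j]:.4f})")
```

At convergence the score equations \(\boldsymbol{X}^T(\boldsymbol{y} - \boldsymbol{\mu}) = \boldsymbol{0}\) hold up to numerical tolerance, which is the "zeros of the partial derivatives" condition for this model. Other families and links change only the expressions for the weights and the pseudo-responses.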