Facing reality: Survival modelling with missing and censored observations with numpyro. Part 3: Missing values

In the last part, Conditional survival part 2: Survival probabilities very close to zero, I dealt with boundary problems caused by extreme survival observations. These drive the survival function toward the machine's precision limit and lead to underflow, which makes computing the likelihood values problematic. Using the cumulative hazard directly in the computation of the conditional probability resolves the issue and simplifies the model. Below are the functions defined in part 1 and part 2....

December 30, 2024 · 16 min
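
The fix summarised in the excerpt above can be sketched in plain Python. This is a minimal illustration, not the code from the series: it assumes the standard relation S(t) = exp(-H(t)) between the survival function S and the cumulative hazard H, and the function names and hazard values are hypothetical.

```python
import math

def conditional_survival_naive(H_prev, H_curr):
    """P(alive at t_curr | alive at t_prev), computed as S(t_curr) / S(t_prev)."""
    S_prev = math.exp(-H_prev)  # underflows to 0.0 for large H_prev
    S_curr = math.exp(-H_curr)
    if S_prev == 0.0:
        return float("nan")  # 0 / 0: the ratio can no longer be computed
    return S_curr / S_prev

def conditional_survival_stable(H_prev, H_curr):
    """Same probability, computed from the cumulative hazard increment."""
    return math.exp(-(H_curr - H_prev))

# Hypothetical cumulative hazards for an extreme exposure scenario
H_prev, H_curr = 800.0, 801.0
naive = conditional_survival_naive(H_prev, H_curr)
stable = conditional_survival_stable(H_prev, H_curr)
print(naive, stable)
```

Working with the hazard increment never forms the two tiny survival probabilities, so the conditional probability stays finite even when both of them underflow.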

Dangerous log-normal distributions over time

Observed concentrations over time should not be modelled with a log-normal distribution. I routinely use the log-normal distribution for positively constrained values. However, because probability distributions must integrate to one, probability density values decrease at higher concentrations. This automatically makes higher concentration values less likely than lower concentration values (even relative to the mean value). As a result, higher concentration values have a proportionally higher influence on the joint likelihood....

December 27, 2024 · 1 min
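
A small sketch of the effect described in the excerpt above, using hypothetical values rather than code from the post. The log-normal density evaluated at its own median m equals 1 / (m σ √(2π)), so it shrinks as the median grows even when the log-scale spread σ stays fixed.

```python
import math

def lognormal_pdf(x, median, sigma):
    """Log-normal density, parameterised by its median and log-scale standard deviation."""
    z = (math.log(x) - math.log(median)) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

sigma = 0.5                   # hypothetical log-scale standard deviation
medians = [1.0, 10.0, 100.0]  # hypothetical concentration levels

# Density of an observation that sits exactly on the median at each level
densities_at_median = [lognormal_pdf(m, m, sigma) for m in medians]
log_densities = [math.log(d) for d in densities_at_median]

for m, d, ld in zip(medians, densities_at_median, log_densities):
    print(f"median={m:6.1f}  pdf(median)={d:.5f}  log pdf={ld:.3f}")
```

Each tenfold increase in the median lowers the density at the median by a factor of ten, which shifts the log-density contribution of an equally well-fitted observation by -log(10).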

Facing reality: Survival modelling with missing and censored observations with numpyro. Part 2: Extreme early lethality - Survival probabilities very close to zero

In Conditional survival part 2: Survival probabilities very close to zero, we verified that the multinomial and the conditional binomial probability models produce identical likelihoods when calculated with numpyro. In this part of the series, we take a look at extreme survival observations and survival probabilities very close to zero. We will find out when these break the solver and disrupt the sampler, and step by step identify the causes of the problem and the solutions for it....

December 18, 2024 · 21 min

Facing reality: Survival modelling with missing and censored observations with numpyro. Part 1: Comparison between the multinomial and the conditional binomial probability model

In my work I usually draw on probabilistic programming languages to make parameter estimates. The archetypical dataset in environmental toxicology is survival data, i.e. counts of surviving organisms or death counts over time. This type of data can be modelled statistically with a conditional binomial [@Delignette-Muller.] or multinomial distribution [@Jager.2018]. Just to be on the safe side, I always wanted to check whether the two approaches yield equivalent parameter estimates. Here I reproduce this statement with numpyro and extend the concept further to handle missing observations and censored values....

November 20, 2024 · 5 min
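
The equivalence mentioned in the excerpt above can be checked by hand for a single dataset. The sketch below uses plain Python instead of numpyro; the survivor counts, the survival probabilities and the function names are hypothetical and only serve as an illustration.

```python
import math

def log_factorial(n):
    return math.lgamma(n + 1)

def multinomial_loglik(survivors, surv_prob):
    """Deaths per interval plus final survivors as one multinomial draw."""
    deaths = [a - b for a, b in zip(survivors[:-1], survivors[1:])]
    p_death = [a - b for a, b in zip(surv_prob[:-1], surv_prob[1:])]
    counts = deaths + [survivors[-1]]   # last category: still alive at the end
    probs = p_death + [surv_prob[-1]]
    ll = log_factorial(survivors[0])
    for c, p in zip(counts, probs):
        ll += c * math.log(p) - log_factorial(c)
    return ll

def conditional_binomial_loglik(survivors, surv_prob):
    """Survivors at each time as a binomial draw from the previous survivors."""
    ll = 0.0
    steps = zip(survivors[:-1], survivors[1:], surv_prob[:-1], surv_prob[1:])
    for n_prev, n_curr, s_prev, s_curr in steps:
        q = s_curr / s_prev  # conditional survival probability
        ll += math.log(math.comb(n_prev, n_curr))
        ll += n_curr * math.log(q) + (n_prev - n_curr) * math.log(1.0 - q)
    return ll

# Hypothetical data: survivors at t0..t3 and model survival probabilities S(t0..t3)
survivors = [20, 18, 12, 5]
surv_prob = [1.0, 0.9, 0.6, 0.3]

ll_multinomial = multinomial_loglik(survivors, surv_prob)
ll_binomial = conditional_binomial_loglik(survivors, surv_prob)
print(ll_multinomial, ll_binomial)
```

The two values agree up to floating point error because both the binomial coefficients and the conditional probabilities telescope into the multinomial expression when S(t0) = 1.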

Statistical distributions of noisy concentration measurements

Calibrating physical process models on concentration data can be surprisingly tricky because there are multiple things to consider, some of which are not always obvious. Measurement data are non-negative, which requires a distribution constrained to the non-negative interval. Concentration measurements often vary multiplicatively, because samples are usually diluted or enriched to bring them into a measurable concentration range before measurement. Thus, the measurement error of a physical process that spans several orders of magnitude during its temporal evolution increases multiplicatively. Different error scales have consequences for calculating the probability of a datapoint given a distribution....

September 22, 2024 · 13 min

Pramters

I think there is not a word in all of typing that I have misspelled more often than parameter. The following is a running list of attempts to spell it out. prameters paprameter paramer parameterer parmeters parmaeter paramter

September 21, 2024 · 1 min

On the merit of making good prior predictive checks

When working on Bayesian models, I spend most of my time diagnosing divergences and bad model fits. It is surprising how much of the struggle of diagnosing a broken model can be spared when proper prior predictive checks are done. In my case, this often means testing whether the majority of predictions generated from prior draws can reproduce the experimental observations. Although I routinely do this, making predictions for a specific scenario is often a bit laborious, and sometimes I neglect them and use some standard checks....

January 19, 2023 · 2 min

The positive constrained Cauchy prior

Sometimes, when optimizing parameter values, only an approximate idea of the size of the parameter is available, and sometimes parameters operate on vastly different scales. The Cauchy distribution is often used when little is known about the true parameter value, because of its heavy tails. It is particularly frustrating to sample parameters close to zero when values smaller than zero are not allowed. A way out seems to be the positive constrained Cauchy distribution, which has the remarkable property of being symmetric on the log scale....

January 12, 2023 · 1 min
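
The log-scale symmetry mentioned in the excerpt above can be checked numerically. This sketch assumes that the positive constrained Cauchy is the half-Cauchy distribution (a Cauchy with location 0 truncated to positive values); the scale value and the function name are hypothetical.

```python
import math

def half_cauchy_cdf(x, scale):
    """CDF of a half-Cauchy distribution (Cauchy at location 0, truncated to x >= 0)."""
    return 2.0 / math.pi * math.atan(x / scale)

scale = 3.0  # hypothetical scale; it is also the median of the distribution

# Symmetry on the log scale: being a given factor below the scale is exactly
# as probable as being the same factor above it.
for factor in (2.0, 10.0, 1000.0):
    p_below = half_cauchy_cdf(scale / factor, scale)        # P(X < scale / factor)
    p_above = 1.0 - half_cauchy_cdf(scale * factor, scale)  # P(X > scale * factor)
    print(f"factor={factor:7.1f}  P(below)={p_below:.6f}  P(above)={p_above:.6f}")
```

The symmetry follows from atan(1/c) = π/2 - atan(c) for c > 0, so the prior places equal mass on any number of orders of magnitude below and above its scale.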

Peer review in question

Peer review promises that articles undergo scientific scrutiny prior to publication. This is supposed to increase trust in the content of the article, but can peer review overcome a bias inherent to a field? A short look (without any scrutiny) into the editorial board of the Pest Management Science journal produced by the Society of Chemical Industry already reveals a bias. The editors seem to overwhelmingly stem from fields that have an interest in the continued research of agricultural products such as biocides....

February 8, 2022 · 2 min