Required data format • lifelihood

library(lifelihood)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.0     ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
#> ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
#> ✔ purrr     1.0.4     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Introduction

Because life history data are by nature heterogeneous between observations from the same sample, the required data format can be somewhat tricky. Under the hood, lifelihood creates a text file containing the data (and other parameters), and then passes this file to the core source code. Fortunately, lifelihood automatically transforms the dataframe into a text file with the right format.

In this vignette, we’ll look at how the dataframe containing your data needs to be formatted for lifelihood to work properly.

Data preparation

Let’s create a simple dataset with just 7 observations and the following columns:

sex Column containing the sex of each observation.
sex_start Column containing the first date of the interval in which the sex was determined.
sex_end Column containing the second date of the interval in which the sex was determined.
maturity_start Column containing the first date of the interval in which maturity was determined.
maturity_end Column containing the second date of the interval in which maturity was determined.
clutchs Vector containing the names of the clutch columns (here clutch_start1, clutch_end1, clutch_size1, and so on). The order should be: first clutch first date, first clutch second date, first clutch size, second clutch first date, second clutch second date, second clutch size, and so on. If the observation with the most clutches has, for example, 10 clutches, then the vector must be of size 10 x 3 = 30 (3 elements per clutch: first date, second date and size).
death_start Column containing the first date of the interval in which death was determined.
death_end Column containing the second date of the interval in which death was determined.
geno Name of the first covariate column to add to the input data file.

df <- data.frame(
  sex = c(0, 0, 0, 0, 0, 0, 0),
  sex_start = c(1, 3, 2, 10, 3, 4, 5),
  sex_end = c(2, 4, 3, 11, 4, 5, 6),
  maturity_start = c(2, 1, 0, 1, 0, 2, 1),
  maturity_end = c(4, 2, 1000, 2, 1000, 3, 2),
  clutch_start1 = c(3, 2, NA, 2, NA, 3, 2),
  clutch_end1 = c(4, 3, NA, 3, NA, 4, 3),
  clutch_size1 = c(4, 6, NA, 5, NA, 2, 30),
  clutch_start2 = c(5, NA, NA, 5, NA, 4, 3),
  clutch_end2 = c(6, NA, NA, 6, NA, 5, 4),
  clutch_size2 = c(5, NA, NA, 7, NA, 10, 5),
  clutch_start3 = c(7, NA, NA, 6, NA, NA, 5),
  clutch_end3 = c(8, NA, NA, 7, NA, NA, 6),
  clutch_size3 = c(1, NA, NA, 1, NA, NA, 2),
  death_start = c(8, 11, 0, 11, 0, 7, 9),
  death_end = c(12, 11, 1, 12, 1, 8, 10),
  geno = c(1, 3, 1, 0, 2, 0, 1)
)
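
The vector of clutch column names described above can be built programmatically rather than typed by hand. The following is a minimal base R sketch, assuming the column naming used in this example (clutch_start1, clutch_end1, clutch_size1, ...); n_clutches is a hypothetical helper variable, not a lifelihood argument.

```r
# Maximum number of clutches per observation (assumption: 3, as in the example)
n_clutches <- 3

# Interleave start, end and size names clutch by clutch:
# rbind() stacks the three name vectors as rows, and as.vector()
# reads the resulting matrix column by column, i.e. one clutch at a time.
clutchs <- as.vector(rbind(
  paste0("clutch_start", seq_len(n_clutches)),
  paste0("clutch_end", seq_len(n_clutches)),
  paste0("clutch_size", seq_len(n_clutches))
))

clutchs
```

The result has 3 x 3 = 9 elements, in the order first date, second date, size for each clutch in turn.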

As you can see, some observations laid more clutches than others, so the clutch columns contain NA values for the observations with fewer clutches.

One row of the dataset should represent the life history of one observation.
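
Before going further, it can help to check that each row is internally consistent. The sketch below is not part of lifelihood: check_life_history() is a hypothetical helper, and it assumes that the first date of an interval should not exceed the second and that a clutch is either fully recorded or entirely NA. It uses a small toy dataframe so that it runs on its own.

```r
# Hypothetical helper (not a lifelihood function): returns TRUE for each row
# whose intervals are ordered and whose clutch triplets are complete.
check_life_history <- function(data, n_clutches) {
  ok <- data$sex_start <= data$sex_end &
    data$maturity_start <= data$maturity_end &
    data$death_start <= data$death_end

  for (i in seq_len(n_clutches)) {
    cols <- paste0(c("clutch_start", "clutch_end", "clutch_size"), i)
    triplet <- data[, cols]
    n_missing <- rowSums(is.na(triplet))
    # A clutch is either fully recorded (0 NA) or absent (3 NA)
    complete <- n_missing == 0 | n_missing == 3
    # When both dates are recorded, the interval must be ordered
    ordered <- is.na(triplet[[1]]) | is.na(triplet[[2]]) |
      triplet[[1]] <= triplet[[2]]
    ok <- ok & complete & ordered
  }
  ok
}

# Toy data: the second row has a clutch with a missing end date
toy <- data.frame(
  sex_start = c(1, 3), sex_end = c(2, 4),
  maturity_start = c(2, 1), maturity_end = c(4, 2),
  clutch_start1 = c(3, 2), clutch_end1 = c(4, NA), clutch_size1 = c(4, 6),
  death_start = c(8, 11), death_end = c(12, 11)
)

valid_rows <- check_life_history(toy, n_clutches = 1)
valid_rows
```

Here the first row passes and the second is flagged because its clutch triplet is only partly filled in. The same helper can be applied to the example dataframe with n_clutches = 3.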

Next step

Once our dataframe has the right format, the next step is to create a configuration file.