This function extracts a semi-colon delimited list from a column of a data frame. It excludes NAs, optionally removes duplicate elements, optionally sorts the elements, and can insert custom text as the delimiter between the last two items.

extract_semicolon_delimited_list(
  .df,
  column_name,
  sort = FALSE,
  unique_list = FALSE,
  last_delimiter = "",
  ...
)

Arguments

.df

A data frame containing the column with the delimited list to extract.

column_name

The name of the column containing the delimited list to extract.

sort

Should the delimited list be sorted (TRUE) or not (default FALSE).

unique_list

Should duplicated elements be removed from the delimited list (TRUE) or kept (default FALSE).

last_delimiter

An optional character string used to separate the last two items in the delimited list.

Value

The semi-colon separated delimited list as a character string.

Details

This function illustrates the use of a function factory, tidy evaluation and purrr's map function. It may be called on a nested data frame, for example with purrr's map_chr(), to extract the delimited list from each nested group.
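To make the function factory idea concrete, here is a minimal base R sketch of a factory that returns an extractor for a given delimiter. It is not the package's implementation: the name make_list_extractor, the internal steps, and the handling of last_delimiter are assumptions made for illustration, based only on the behaviour described above.

```r
# Hypothetical factory (not part of the package): returns a function that
# collapses one column of a data frame into a delimited character string.
make_list_extractor <- function(delimiter = "; ") {
  function(.df, column_name, sort = FALSE, unique_list = FALSE,
           last_delimiter = "") {
    items <- .df[[column_name]]
    # Drop NAs, then work with character values
    items <- as.character(items[!is.na(items)])
    if (unique_list) items <- unique(items)
    if (sort) items <- base::sort(items)
    n <- length(items)
    if (nzchar(last_delimiter) && n > 1) {
      # Assumed behaviour: last_delimiter replaces the final delimiter
      paste0(paste(items[-n], collapse = delimiter), last_delimiter, items[n])
    } else {
      paste(items, collapse = delimiter)
    }
  }
}

# The factory is called once to create a semi-colon specific extractor
sketch_semicolon_list <- make_list_extractor("; ")

fruit_df <- data.frame(fruit = c("pear", NA, "apple", "pear", "fig"))

sketch_semicolon_list(fruit_df, "fruit")
#> [1] "pear; apple; pear; fig"

sketch_semicolon_list(fruit_df, "fruit", sort = TRUE, unique_list = TRUE,
                      last_delimiter = " and ")
#> [1] "apple; fig and pear"
```

Because the delimiter is captured in the enclosing environment, the same factory could produce extractors for other delimiters without repeating the cleaning logic.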

Examples

suppressPackageStartupMessages({
  library(store)
  suppressWarnings({
    library(palmerpenguins)
    library(dplyr)
  })
})

# select top 5 heaviest penguins from each species on each island
heaviest_penguins <- penguins %>%
  select(species, island, body_mass_g) %>%
  group_by(species, island) %>%
  arrange(desc(body_mass_g)) %>%
  slice_head(n = 5) %>%
  ungroup()
heaviest_penguins
#> # A tibble: 25 x 3
#>    species island body_mass_g
#>    <fct>   <fct>        <int>
#>  1 Adelie  Biscoe        4775
#>  2 Adelie  Biscoe        4725
#>  3 Adelie  Biscoe        4600
#>  4 Adelie  Biscoe        4400
#>  5 Adelie  Biscoe        4300
#>  6 Adelie  Dream         4650
#>  7 Adelie  Dream         4600
#>  8 Adelie  Dream         4475
#>  9 Adelie  Dream         4450
#> 10 Adelie  Dream         4400
#> # ... with 15 more rows

# extract semi-colon separated list of penguin weights for each species on each island
suppressPackageStartupMessages({
  suppressWarnings({
    library(purrr)
  })
})

heaviest_penguins %>%
  group_nest(across(c(species:island)), .key = "penguins") %>%
  mutate(
    weight = map_chr(
      penguins,
      extract_semicolon_delimited_list,
      column_name = "body_mass_g"
    )
  ) %>%
  select(-penguins)
#> # A tibble: 5 x 3
#>   species   island    weight
#>   <fct>     <fct>     <chr>
#> 1 Adelie    Biscoe    4775; 4725; 4600; 4400; 4300
#> 2 Adelie    Dream     4650; 4600; 4475; 4450; 4400
#> 3 Adelie    Torgersen 4700; 4675; 4500; 4450; 4400
#> 4 Chinstrap Dream     4800; 4550; 4500; 4450; 4400
#> 5 Gentoo    Biscoe    6300; 6050; 6000; 6000; 5950