*Moved from Q&A section.*
If the desire for `{dplyr}` syntax comes from wanting to use piping, then we can just stay in base `R`.
```r
myt |>
transform(effective_arrival = pmax(date_arrived, as.Date('2020-05-01')),
effective_departure = pmin(myt$date_left, as.Date('2021-05-01'))) |>
subset(effective_arrival <= effective_departure) |>
within({
stay = effective_departure - effective_arrival
total = ave(stay, name, FUN=sum)
in_country = sprintf('%.2f%%', as.numeric(stay) / as.numeric(total) * 100)
# rm(effective_departure, effective_arrival, stay, total)
})
```
might give what you are after (your code currently issues an error and there is no displayed desired output)
```none
name country date_arrived date_left effective_arrival effective_departure in_country total stay
2 Bob Canada 2020-06-01 2020-09-01 2020-06-01 2020-09-01 42.79% 215 days 92 days
3 Bob Mexico 2020-10-01 2021-02-01 2020-10-01 2021-02-01 57.21% 215 days 123 days
5 Diana France 2019-12-01 2020-07-01 2020-05-01 2020-07-01 39.87% 153 days 61 days
6 Diana Germany 2020-08-01 2020-11-01 2020-08-01 2020-11-01 60.13% 153 days 92 days
7 Eve USA 2020-12-01 2021-02-01 2020-12-01 2021-02-01 50.41% 123 days 62 days
8 Eve Canada 2021-03-01 9999-12-31 2021-03-01 2021-05-01 49.59% 123 days 61 days
10 Frank USA 2020-03-01 2020-09-01 2020-05-01 2020-09-01 40.46% 304 days 123 days
11 Frank UK 2020-11-01 9999-12-31 2020-11-01 2021-05-01 59.54% 304 days 181 days
14 Henry USA 2020-05-01 2020-08-15 2020-05-01 2020-08-15 29.04% 365 days 106 days
15 Henry Canada 2020-08-15 2020-12-01 2020-08-15 2020-12-01 29.59% 365 days 108 days
16 Henry Mexico 2020-12-01 2021-05-01 2020-12-01 2021-05-01 41.37% 365 days 151 days
```
Note. Currently we do not need to move `in_country = sprintf('%.2f%%', as.numeric(stay) / as.numeric(total) * 100)` into an `aggregate` call, because in the example dataset no person (identified by first `name`) appears more than once in the same country. (Please adapt toy data!)
_____
Edit. Per the comment, here is one way. Modfiy data
```r
myt2 = myt
myt2[16, 'country'] = 'USA'
```
and do `ave` twice.
```r
myt2 |>
transform(
effective_arrival = pmax(date_arrived, as.Date('2020-05-01')),
effective_departure = pmin(date_left, as.Date('2021-05-01'))) |>
subset(effective_arrival <= effective_departure) |>
within({
stay = as.numeric(effective_departure - effective_arrival)
total_country = ave(stay, name, country, FUN=sum)
total_person = ave(stay, name, FUN=sum)
in_country = sprintf('%.2f%%', total_country / total_person * 100)
rm(effective_departure, effective_arrival, stay, total_country, total_person)
})
```
```none
name country date_arrived date_left in_country
2 Bob Canada 2020-06-01 2020-09-01 42.79%
3 Bob Mexico 2020-10-01 2021-02-01 57.21%
5 Diana France 2019-12-01 2020-07-01 39.87%
6 Diana Germany 2020-08-01 2020-11-01 60.13%
7 Eve USA 2020-12-01 2021-02-01 50.41%
8 Eve Canada 2021-03-01 9999-12-31 49.59%
10 Frank USA 2020-03-01 2020-09-01 40.46%
11 Frank UK 2020-11-01 9999-12-31 59.54%
14 Henry USA 2020-05-01 2020-08-15 70.41% # adjusted
15 Henry Canada 2020-08-15 2020-12-01 29.59%
16 Henry USA 2020-12-01 2021-05-01 70.41% # adjusted, input modified
```
Remove the `rm`-line to see more details. It might be benefecial to keep each row (observation). Another option would be to `aggregate`.
If the desire for `{dplyr}` syntax comes from wanting to use piping, then we can just stay in base `R`.
```r
myt |>
transform(effective_arrival = pmax(date_arrived, as.Date('2020-05-01')),
effective_departure = pmin(myt$date_left, as.Date('2021-05-01'))) |>
subset(effective_arrival <= effective_departure) |>
within({
stay = effective_departure - effective_arrival
total = ave(stay, name, FUN=sum)
in_country = sprintf('%.2f%%', as.numeric(stay) / as.numeric(total) * 100)
# rm(effective_departure, effective_arrival, stay, total)
})
```
might give what you are after (your code currently issues an error and there is no displayed desired output)
```none
name country date_arrived date_left effective_arrival effective_departure in_country total stay
2 Bob Canada 2020-06-01 2020-09-01 2020-06-01 2020-09-01 42.79% 215 days 92 days
3 Bob Mexico 2020-10-01 2021-02-01 2020-10-01 2021-02-01 57.21% 215 days 123 days
5 Diana France 2019-12-01 2020-07-01 2020-05-01 2020-07-01 39.87% 153 days 61 days
6 Diana Germany 2020-08-01 2020-11-01 2020-08-01 2020-11-01 60.13% 153 days 92 days
7 Eve USA 2020-12-01 2021-02-01 2020-12-01 2021-02-01 50.41% 123 days 62 days
8 Eve Canada 2021-03-01 9999-12-31 2021-03-01 2021-05-01 49.59% 123 days 61 days
10 Frank USA 2020-03-01 2020-09-01 2020-05-01 2020-09-01 40.46% 304 days 123 days
11 Frank UK 2020-11-01 9999-12-31 2020-11-01 2021-05-01 59.54% 304 days 181 days
14 Henry USA 2020-05-01 2020-08-15 2020-05-01 2020-08-15 29.04% 365 days 106 days
15 Henry Canada 2020-08-15 2020-12-01 2020-08-15 2020-12-01 29.59% 365 days 108 days
16 Henry Mexico 2020-12-01 2021-05-01 2020-12-01 2021-05-01 41.37% 365 days 151 days
```
Note. Currently we do not need to move `in_country = sprintf('%.2f%%', as.numeric(stay) / as.numeric(total) * 100)` into an `aggregate` call, because in the example dataset no person (identified by first `name`) appears more than once in the same country. (Please adapt toy data!)
_____
Edit. Per the comment, here is one way. Modfiy data
```r
myt2 = myt
myt2[16, 'country'] = 'USA'
```
and do `ave` twice.
```r
myt2 |>
transform(
effective_arrival = pmax(date_arrived, as.Date('2020-05-01')),
effective_departure = pmin(date_left, as.Date('2021-05-01'))) |>
subset(effective_arrival <= effective_departure) |>
within({
stay = as.numeric(effective_departure - effective_arrival)
total_country = ave(stay, name, country, FUN=sum)
total_person = ave(stay, name, FUN=sum)
in_country = sprintf('%.2f%%', total_country / total_person * 100)
rm(effective_departure, effective_arrival, stay, total_country, total_person)
})
```
```none
name country date_arrived date_left in_country
2 Bob Canada 2020-06-01 2020-09-01 42.79%
3 Bob Mexico 2020-10-01 2021-02-01 57.21%
5 Diana France 2019-12-01 2020-07-01 39.87%
6 Diana Germany 2020-08-01 2020-11-01 60.13%
7 Eve USA 2020-12-01 2021-02-01 50.41%
8 Eve Canada 2021-03-01 9999-12-31 49.59%
10 Frank USA 2020-03-01 2020-09-01 40.46%
11 Frank UK 2020-11-01 9999-12-31 59.54%
14 Henry USA 2020-05-01 2020-08-15 70.41% # adjusted
15 Henry Canada 2020-08-15 2020-12-01 29.59%
16 Henry USA 2020-12-01 2021-05-01 70.41% # adjusted, input modified
```
Remove the `rm`-line to see more details. It might be benefecial to keep each row (observation). Another option would be to `aggregate`.