CopyPastor

*Moved from Q&A section.*
If the desire for `{dplyr}` syntax comes from wanting to use piping, then we can just stay in base `R`.
```r myt |> transform(effective_arrival = pmax(date_arrived, as.Date('2020-05-01')), effective_departure = pmin(myt$date_left, as.Date('2021-05-01'))) |> subset(effective_arrival <= effective_departure) |> within({ stay = effective_departure - effective_arrival total = ave(stay, name, FUN=sum) in_country = sprintf('%.2f%%', as.numeric(stay) / as.numeric(total) * 100) # rm(effective_departure, effective_arrival, stay, total) }) ```
might give what you are after (your code currently issues an error and there is no displayed desired output)
```none name country date_arrived date_left effective_arrival effective_departure in_country total stay 2 Bob Canada 2020-06-01 2020-09-01 2020-06-01 2020-09-01 42.79% 215 days 92 days 3 Bob Mexico 2020-10-01 2021-02-01 2020-10-01 2021-02-01 57.21% 215 days 123 days 5 Diana France 2019-12-01 2020-07-01 2020-05-01 2020-07-01 39.87% 153 days 61 days 6 Diana Germany 2020-08-01 2020-11-01 2020-08-01 2020-11-01 60.13% 153 days 92 days 7 Eve USA 2020-12-01 2021-02-01 2020-12-01 2021-02-01 50.41% 123 days 62 days 8 Eve Canada 2021-03-01 9999-12-31 2021-03-01 2021-05-01 49.59% 123 days 61 days 10 Frank USA 2020-03-01 2020-09-01 2020-05-01 2020-09-01 40.46% 304 days 123 days 11 Frank UK 2020-11-01 9999-12-31 2020-11-01 2021-05-01 59.54% 304 days 181 days 14 Henry USA 2020-05-01 2020-08-15 2020-05-01 2020-08-15 29.04% 365 days 106 days 15 Henry Canada 2020-08-15 2020-12-01 2020-08-15 2020-12-01 29.59% 365 days 108 days 16 Henry Mexico 2020-12-01 2021-05-01 2020-12-01 2021-05-01 41.37% 365 days 151 days ```
Note. Currently we do not need to move `in_country = sprintf('%.2f%%', as.numeric(stay) / as.numeric(total) * 100)` into an `aggregate` call, because in the example dataset no person (identified by first `name`) appears more than once in the same country. (Please adapt toy data!)
_____
Edit. Per the comment, here is one way. Modfiy data
```r myt2 = myt myt2[16, 'country'] = 'USA' ```
and do `ave` twice.
```r myt2 |> transform( effective_arrival = pmax(date_arrived, as.Date('2020-05-01')), effective_departure = pmin(date_left, as.Date('2021-05-01'))) |> subset(effective_arrival <= effective_departure) |> within({ stay = as.numeric(effective_departure - effective_arrival) total_country = ave(stay, name, country, FUN=sum) total_person = ave(stay, name, FUN=sum) in_country = sprintf('%.2f%%', total_country / total_person * 100) rm(effective_departure, effective_arrival, stay, total_country, total_person) }) ```
```none name country date_arrived date_left in_country 2 Bob Canada 2020-06-01 2020-09-01 42.79% 3 Bob Mexico 2020-10-01 2021-02-01 57.21% 5 Diana France 2019-12-01 2020-07-01 39.87% 6 Diana Germany 2020-08-01 2020-11-01 60.13% 7 Eve USA 2020-12-01 2021-02-01 50.41% 8 Eve Canada 2021-03-01 9999-12-31 49.59% 10 Frank USA 2020-03-01 2020-09-01 40.46% 11 Frank UK 2020-11-01 9999-12-31 59.54% 14 Henry USA 2020-05-01 2020-08-15 70.41% # adjusted 15 Henry Canada 2020-08-15 2020-12-01 29.59% 16 Henry USA 2020-12-01 2021-05-01 70.41% # adjusted, input modified ```
Remove the `rm`-line to see more details. It might be benefecial to keep each row (observation). Another option would be to `aggregate`.

If the desire for `{dplyr}` syntax comes from wanting to use piping, then we can just stay in base `R`.
```r myt |> transform(effective_arrival = pmax(date_arrived, as.Date('2020-05-01')), effective_departure = pmin(myt$date_left, as.Date('2021-05-01'))) |> subset(effective_arrival <= effective_departure) |> within({ stay = effective_departure - effective_arrival total = ave(stay, name, FUN=sum) in_country = sprintf('%.2f%%', as.numeric(stay) / as.numeric(total) * 100) # rm(effective_departure, effective_arrival, stay, total) }) ```
might give what you are after (your code currently issues an error and there is no displayed desired output)
```none name country date_arrived date_left effective_arrival effective_departure in_country total stay 2 Bob Canada 2020-06-01 2020-09-01 2020-06-01 2020-09-01 42.79% 215 days 92 days 3 Bob Mexico 2020-10-01 2021-02-01 2020-10-01 2021-02-01 57.21% 215 days 123 days 5 Diana France 2019-12-01 2020-07-01 2020-05-01 2020-07-01 39.87% 153 days 61 days 6 Diana Germany 2020-08-01 2020-11-01 2020-08-01 2020-11-01 60.13% 153 days 92 days 7 Eve USA 2020-12-01 2021-02-01 2020-12-01 2021-02-01 50.41% 123 days 62 days 8 Eve Canada 2021-03-01 9999-12-31 2021-03-01 2021-05-01 49.59% 123 days 61 days 10 Frank USA 2020-03-01 2020-09-01 2020-05-01 2020-09-01 40.46% 304 days 123 days 11 Frank UK 2020-11-01 9999-12-31 2020-11-01 2021-05-01 59.54% 304 days 181 days 14 Henry USA 2020-05-01 2020-08-15 2020-05-01 2020-08-15 29.04% 365 days 106 days 15 Henry Canada 2020-08-15 2020-12-01 2020-08-15 2020-12-01 29.59% 365 days 108 days 16 Henry Mexico 2020-12-01 2021-05-01 2020-12-01 2021-05-01 41.37% 365 days 151 days ```
Note. Currently we do not need to move `in_country = sprintf('%.2f%%', as.numeric(stay) / as.numeric(total) * 100)` into an `aggregate` call, because in the example dataset no person (identified by first `name`) appears more than once in the same country. (Please adapt toy data!)
_____
Edit. Per the comment, here is one way. Modfiy data
```r myt2 = myt myt2[16, 'country'] = 'USA' ```
and do `ave` twice.
```r myt2 |> transform( effective_arrival = pmax(date_arrived, as.Date('2020-05-01')), effective_departure = pmin(date_left, as.Date('2021-05-01'))) |> subset(effective_arrival <= effective_departure) |> within({ stay = as.numeric(effective_departure - effective_arrival) total_country = ave(stay, name, country, FUN=sum) total_person = ave(stay, name, FUN=sum) in_country = sprintf('%.2f%%', total_country / total_person * 100) rm(effective_departure, effective_arrival, stay, total_country, total_person) }) ```
```none name country date_arrived date_left in_country 2 Bob Canada 2020-06-01 2020-09-01 42.79% 3 Bob Mexico 2020-10-01 2021-02-01 57.21% 5 Diana France 2019-12-01 2020-07-01 39.87% 6 Diana Germany 2020-08-01 2020-11-01 60.13% 7 Eve USA 2020-12-01 2021-02-01 50.41% 8 Eve Canada 2021-03-01 9999-12-31 49.59% 10 Frank USA 2020-03-01 2020-09-01 40.46% 11 Frank UK 2020-11-01 9999-12-31 59.54% 14 Henry USA 2020-05-01 2020-08-15 70.41% # adjusted 15 Henry Canada 2020-08-15 2020-12-01 29.59% 16 Henry USA 2020-12-01 2021-05-01 70.41% # adjusted, input modified ```
Remove the `rm`-line to see more details. It might be benefecial to keep each row (observation). Another option would be to `aggregate`.

CopyPastor

Possible Plagiarism

Original Post