Let's try [`groupby agg`]() with [Named Aggregation](), then [`reindex`]() the result to match the `horse_id` column and [`join`]() it back to the initial DataFrame:
```
df = df.join(
    df.groupby('Sire_horse_id')
      .agg(Offspring=('horse_id', 'count'), Offspring_races=('Races', 'sum'))
      .reindex(df['horse_id'], fill_value=0)
      .reset_index(drop=True)
)
```
`df`:
```
   horse_id  horse_type  Sire_horse_id  Dam_horse_id  Races  Offspring  Offspring_races
0       101    Stallion             50            80     20          3               62
1       102        Mare             51            81      3          1                5
2       103    Stallion             90            70     33          2               51
3       104        Colt            101            77     27          0                0
4       105       Filly             52           102     17          0                0
5       106       Filly            101           102     23          0                0
6       107        Mare            103            35     33          0                0
7       108        Colt            103            77     18          0                0
8       109        Colt            102           107      5          0                0
9       110       Filly            101           107     12          0                0
```
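
For anyone who wants to run this end to end, here is a minimal sketch on a trimmed copy of the data. It keeps only the three columns the aggregation touches (an assumption made for brevity), and the intermediate names `stats`, `aligned` and `result` are illustrative rather than part of the original snippet. It also assumes `df` has the default `RangeIndex`, since the final `join` lines rows up by position.

```python
import pandas as pd

# Trimmed copy of the example data: only the columns the aggregation uses.
df = pd.DataFrame({
    "horse_id":      [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    "Sire_horse_id": [50, 51, 90, 101, 52, 101, 103, 103, 102, 101],
    "Races":         [20, 3, 33, 27, 17, 23, 33, 18, 5, 12],
})

# One row per sire: number of offspring and total races run by them.
stats = df.groupby("Sire_horse_id").agg(
    Offspring=("horse_id", "count"),
    Offspring_races=("Races", "sum"),
)

# Look up every horse_id in the per-sire table; horses that never
# appear as a sire get 0. Dropping the index restores 0..n-1 so that
# join can align the rows with df by position.
aligned = stats.reindex(df["horse_id"], fill_value=0).reset_index(drop=True)

result = df.join(aligned)
print(result)
```

Passing `fill_value=0` to `reindex` avoids introducing `NaN`, so the two new columns stay integer typed.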
Out of curiosity, I took a look at what happens under the hood by running [dtruss/strace][1] on each test.

C++

    ./a.out < in
    Saw 6512403 lines in 8 seconds. Crunch speed: 814050

syscalls `sudo dtruss -c ./a.out < in`

    CALL              COUNT
    __mac_syscall         1
    <snip>
    open                  6
    pread                 8
    mprotect             17
    mmap                 22
    stat64               30
    read_nocancel     25958

Python

    ./a.py < in
    Read 6512402 lines in 1 seconds. LPS: 6512402

syscalls `sudo dtruss -c ./a.py < in`

    CALL              COUNT
    __mac_syscall         1
    <snip>
    open                  5
    pread                 8
    mprotect             17
    mmap                 21
    stat64               29

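
The `read_nocancel` row in the C++ table counts read system calls. As a rough illustration of how the size requested per call drives that count, here is a small sketch that reads the same file with two chunk sizes and counts the `os.read` calls. The helper `count_reads`, the 1 MB temporary file and the two chunk sizes are all made up for the demonstration; this is not a claim about what buffer size either program above actually used.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Count the os.read calls needed to consume the file, including the
    final call that returns an empty result at end of file."""
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # one read system call per iteration
            calls += 1
            if not data:
                break
    finally:
        os.close(fd)
    return calls


# Hypothetical input: 1,000,000 bytes of filler written to a temp file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1_000_000)
    path = tmp.name

small = count_reads(path, 4096)    # small requests, many calls
large = count_reads(path, 65536)   # large requests, few calls
os.unlink(path)

print("4 KiB chunks: ", small, "read calls")
print("64 KiB chunks:", large, "read calls")
```

The smaller the chunk, the more system calls it takes to get through the same input, which is the kind of difference `dtruss -c` makes visible.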
[1]: http://en.wikipedia.org/wiki/Strace