CopyPastor

***Setup***

    d1 = {'A': [1, 2, 3, 4, 5, 6, 7], 'B': [12, 13, 14, 15, 16, 17, 18]}
    d2 = {'A': [8, 9, 10, 11, 12, 13, 14], 'B': [18, 19, 20, 21, 22, 23, 24]}

    cat_one = 'M'
    cat_two = 'P'

---
Assuming that you know which category goes with which dictionary, you can restructure your dictionaries and use `concat`:
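
    import pandas as pd

    # One row per (key, element) pair, tagged with the dictionary's category
    df1 = pd.DataFrame([[k, el, cat_one] for k, v in d1.items() for el in v])
    df2 = pd.DataFrame([[k, el, cat_two] for k, v in d2.items() for el in v])

    # Stack both frames, renumber the index, and name the columns
    cols = {'columns': {0: 'Name', 1: 'Value', 2: 'Category'}}
    pd.concat([df1, df2]).reset_index(drop=True).rename(**cols)
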
<!-- -->

        Name  Value Category
    0      A      1        M
    1      A      2        M
    2      A      3        M
    3      A      4        M
    4      A      5        M
    5      A      6        M
    6      A      7        M
    7      B     12        M
    8      B     13        M
    9      B     14        M
    10     B     15        M
    11     B     16        M
    12     B     17        M
    13     B     18        M
    14     A      8        P
    15     A      9        P
    16     A     10        P
    17     A     11        P
    18     A     12        P
    19     A     13        P
    20     A     14        P
    21     B     18        P
    22     B     19        P
    23     B     20        P
    24     B     21        P
    25     B     22        P
    26     B     23        P
    27     B     24        P
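
If there are more than two dictionaries, the same restructure-then-`concat` idea can be wrapped in a small function. This is only a sketch: `dicts_to_long` is a hypothetical helper name, not part of pandas, and it assumes every dictionary maps a name to a list of values as in the setup.

```python
import pandas as pd


def dicts_to_long(pairs):
    """Hypothetical helper: pairs is an iterable of (dict, category) tuples."""
    frames = [
        pd.DataFrame(
            # One row per (key, element) pair, tagged with the category
            [(name, value, category)
             for name, values in mapping.items()
             for value in values],
            columns=['Name', 'Value', 'Category'],
        )
        for mapping, category in pairs
    ]
    # ignore_index=True renumbers the rows 0..n-1 across all frames
    return pd.concat(frames, ignore_index=True)


d1 = {'A': [1, 2, 3, 4, 5, 6, 7], 'B': [12, 13, 14, 15, 16, 17, 18]}
d2 = {'A': [8, 9, 10, 11, 12, 13, 14], 'B': [18, 19, 20, 21, 22, 23, 24]}

result = dicts_to_long([(d1, 'M'), (d2, 'P')])
print(result.shape)
```

Naming the columns when each frame is built avoids the separate `rename` step, and `ignore_index=True` stands in for `reset_index(drop=True)`.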

Out of curiosity, I took a look at what happens under the hood by running [dtruss/strace][1] on each test.
C++

    ./a.out < in
    Saw 6512403 lines in 8 seconds. Crunch speed: 814050

syscalls `sudo dtruss -c ./a.out < in`

    CALL                                        COUNT
    __mac_syscall                                   1
    <snip>
    open                                            6
    pread                                           8
    mprotect                                       17
    mmap                                           22
    stat64                                         30
    read_nocancel                               25958

Python

    ./a.py < in
    Read 6512402 lines in 1 seconds. LPS: 6512402

syscalls `sudo dtruss -c ./a.py < in`

    CALL                                        COUNT
    __mac_syscall                                   1
    <snip>
    open                                            5
    pread                                           8
    mprotect                                       17
    mmap                                           21
    stat64                                         29
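
As a quick sanity check on the numbers, the reported rates are just the line count divided by whole seconds, and the C++ run's `read_nocancel` count can be turned into an average number of lines per read call. The variable names below are made up for illustration and the figures are copied from the two runs. The Python table contains a `<snip>`, so no read count is taken from it.

```python
# Figures copied from the runs above
cpp_lines, cpp_seconds, cpp_reads = 6512403, 8, 25958
py_lines, py_seconds = 6512402, 1

# Integer division matches the reported "Crunch speed" and "LPS" values
cpp_rate = cpp_lines // cpp_seconds
py_rate = py_lines // py_seconds

# Average number of lines delivered per read_nocancel call in the C++ run
lines_per_read = cpp_lines / cpp_reads

print(cpp_rate, py_rate, round(lines_per_read, 1))
```

Since the timings are whole seconds, the rates are coarse. They show the order of magnitude, not a precise benchmark.
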
[1]: http://en.wikipedia.org/wiki/Strace
