CopyPastor

Detecting plagiarism made easy.

Score: 0.8029484748840332; Reported for: String similarity

Possible Plagiarism

Plagiarized on 2022-11-23
by mozway

Original Post

Original - Posted on 2014-11-12
by Thomas Kimber


The question is ambiguous, but assuming you want to perform one-hot encoding on the two columns:

```
out = (df['Topics'].str.get_dummies(sep='; ')
       .join(df['co-authors'].str.get_dummies(sep='; '))
      )
```

Output:

```
   Beriberi  Character Recognition  Crops  Deep Learning  End Effectors  IOU  Malus  Number  Object Detection  Plant Diseases and Disorders  Robot  Social Insects  Swarm  Swarm Robotics  Tesseract  Bandala, Argel A.  Billones, Robert Kerwin C.  Concepcion, Ronnie  Sybingco, E.  Vicerra, Ryan Rhay P.
0         0                      0      0              1              0    1      0       0                 1                             0      0               0      0               0          0                  1                           0                   0             0                      0
1         0                      1      0              0              0    0      0       1                 0                             0      0               0      0               0          1                  0                           0                   0             0                      1
2         0                      0      0              0              1    0      1       0                 0                             0      1               0      0               0          0                  0                           0                   1             0                      0
3         1                      0      1              0              0    0      0       0                 0                             1      0               0      0               0          0                  0                           0                   0             1                      0
4         0                      0      0              0              0    0      0       0                 0                             0      0               1      1               1          0                  0                           1                   0             0                      0
```
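Because the question's dataframe isn't shown, here is a small self-contained sketch using two made-up rows (the `df` below is hypothetical and only mimics the `'; '`-separated format). It shows what `str.get_dummies` plus `join` produce: one 0/1 column per distinct item, with each part's columns in sorted order.

```python
import pandas as pd

# Hypothetical sample data: two rows in the same '; '-separated format.
df = pd.DataFrame({
    'Topics': ['Deep Learning; Object Detection', 'Swarm; Robot'],
    'co-authors': ['Bandala, Argel A.; Sybingco, E.', 'Sybingco, E.'],
})

# str.get_dummies splits each string on sep and creates one indicator
# column per distinct item; join aligns the two results on the index.
out = (df['Topics'].str.get_dummies(sep='; ')
       .join(df['co-authors'].str.get_dummies(sep='; ')))

print(out)
```

Because the separator is `'; '`, author names containing `', '` stay intact as single columns.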
OK, there are two steps to this. First, write a function that does the translation you want. I've put an example together based on your pseudo-code:

```
def label_race(row):
    # Rules are checked in order, so the first match wins
    # (e.g. eri_hispanic == 1 takes precedence over every other flag).
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if (row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian']
            + row['eri_nat_amer'] + row['eri_white']) > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    # None of the flags above is set to 1
    return 'Other'
```
You may want to check this over, but it seems to do the trick. Notice that the parameter passed into the function is a Series object named "row".
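To see that `label_race` only needs label-based access on a Series, here is a minimal sketch that calls it on one hand-built row. The values copy the DENIRO row from the sample output; building the row with `pd.Series` directly is just for illustration.

```python
import pandas as pd


def label_race(row):
    # Same rules as above, checked in order: the first match wins.
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if (row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian']
            + row['eri_nat_amer'] + row['eri_white']) > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'


# One row of flags built by hand (values copied from the DENIRO row).
row = pd.Series({'eri_afr_amer': 0, 'eri_asian': 1, 'eri_hawaiian': 0,
                 'eri_hispanic': 0, 'eri_nat_amer': 0, 'eri_white': 1})

print(label_race(row))  # Two Or More
```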
Next, use the apply function in pandas to run it on every row, e.g.:

```
df.apply(lambda row: label_race(row), axis=1)
```
Note the axis=1 specifier: it means the function is applied to each row rather than to each column. The results are:
```
0           White
1        Hispanic
2           White
3           White
4           Other
5           White
6     Two Or More
7           White
8    Haw/Pac Isl.
9           White
```
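The effect of `axis` can be checked on a tiny, hypothetical frame (the `df` below is made up for illustration): with `axis=1` the function is called once per row and receives that row as a Series, while with the default `axis=0` it is called once per column.

```python
import pandas as pd

# Tiny hypothetical frame: two rows ('x', 'y') and two columns ('a', 'b').
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])

# axis=1: each call receives one row as a Series named after its index label.
row_names = df.apply(lambda s: s.name, axis=1)
row_sums = df.apply(lambda s: s.sum(), axis=1)

# axis=0 (the default): each call receives one column instead.
col_names = df.apply(lambda s: s.name, axis=0)
col_sums = df.apply(lambda s: s.sum(), axis=0)

print(row_names.tolist(), row_sums.tolist())  # ['x', 'y'] [4, 6]
print(col_names.tolist(), col_sums.tolist())  # ['a', 'b'] [3, 7]
```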
If you're happy with the labels produced by label_race, run it again, saving the results into a new column in your original dataframe.
```
df['race_label'] = df.apply(lambda row: label_race(row), axis=1)
```

The resulting dataframe looks like this (the new race_label column is on the far right):
```
      lname   fname  rno_cd  eri_afr_amer  eri_asian  eri_hawaiian  eri_hispanic  eri_nat_amer  eri_white  rno_defined    race_label
0      MOST    JEFF       E             0          0             0             0             0          1        White         White
1    CRUISE     TOM       E             0          0             0             1             0          0        White      Hispanic
2      DEPP  JOHNNY     NaN             0          0             0             0             0          1      Unknown         White
3     DICAP     LEO     NaN             0          0             0             0             0          1      Unknown         White
4    BRANDO  MARLON       E             0          0             0             0             0          0        White         Other
5     HANKS     TOM     NaN             0          0             0             0             0          1      Unknown         White
6    DENIRO  ROBERT       E             0          1             0             0             0          1        White   Two Or More
7    PACINO      AL       E             0          0             0             0             0          1        White         White
8  WILLIAMS   ROBIN       E             0          0             1             0             0          0        White  Haw/Pac Isl.
9  EASTWOOD   CLINT       E             0          0             0             0             0          1        White         White
```
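To reproduce the end result without the original data file, here is a sketch that rebuilds the race-flag columns from the output table above (the name and rno columns other than lname are left out for brevity) and adds race_label. As a cross-check it also computes the same labels with `numpy.select`, a vectorized technique that is not part of the original answer; `race_label_vec` is a hypothetical column name used only for that comparison.

```python
import numpy as np
import pandas as pd


def label_race(row):
    # Same rules as the answer's function; the first matching rule wins.
    if row['eri_hispanic'] == 1:
        return 'Hispanic'
    if (row['eri_afr_amer'] + row['eri_asian'] + row['eri_hawaiian']
            + row['eri_nat_amer'] + row['eri_white']) > 1:
        return 'Two Or More'
    if row['eri_nat_amer'] == 1:
        return 'A/I AK Native'
    if row['eri_asian'] == 1:
        return 'Asian'
    if row['eri_afr_amer'] == 1:
        return 'Black/AA'
    if row['eri_hawaiian'] == 1:
        return 'Haw/Pac Isl.'
    if row['eri_white'] == 1:
        return 'White'
    return 'Other'


# Flag columns copied from the output table (rows 0-9).
df = pd.DataFrame({
    'lname': ['MOST', 'CRUISE', 'DEPP', 'DICAP', 'BRANDO',
              'HANKS', 'DENIRO', 'PACINO', 'WILLIAMS', 'EASTWOOD'],
    'eri_afr_amer': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'eri_asian':    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    'eri_hawaiian': [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    'eri_hispanic': [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    'eri_nat_amer': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'eri_white':    [1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
})

# Row-wise apply, as in the answer.
df['race_label'] = df.apply(label_race, axis=1)

# Vectorized alternative: np.select takes, for each row, the choice of the
# first True condition, mirroring the if-chain's priority order.
race_cols = ['eri_afr_amer', 'eri_asian', 'eri_hawaiian',
             'eri_nat_amer', 'eri_white']
conditions = [
    df['eri_hispanic'] == 1,
    df[race_cols].sum(axis=1) > 1,
    df['eri_nat_amer'] == 1,
    df['eri_asian'] == 1,
    df['eri_afr_amer'] == 1,
    df['eri_hawaiian'] == 1,
    df['eri_white'] == 1,
]
choices = ['Hispanic', 'Two Or More', 'A/I AK Native', 'Asian',
           'Black/AA', 'Haw/Pac Isl.', 'White']
df['race_label_vec'] = np.select(conditions, choices, default='Other')

print(df[['lname', 'race_label', 'race_label_vec']])
```

The apply version is easier to read and extend; the `np.select` version avoids a Python-level call per row, which tends to matter only on large frames.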