CopyPastor

Detecting plagiarism made easy.

Score: 0.8020686507225037; Reported for: String similarity

Possible Plagiarism

Plagiarized on 2020-05-13
by Chuck P

Original Post

Original - Posted on 2012-03-11
by 2mia



            
Present in both answers; Present only in the new answer; Present only in the old answer;

Well, if you're trying to prep the data for CSV, a common dodge is to use pipes as the separator. Full credit to @akrun for the hard part, but...
```r
library(stringr)  # provides str_replace_all()

# Strip everything from the first dot onward to get each column's prefix
nm1 <- sub("\\..*", "", names(df))
# Keep only the prefixes shared by more than one column
nm2 <- names(which(table(nm1) > 1))

# For each shared prefix, join the matching columns row-wise with pipes
df[paste0('combined', seq_along(nm2))] <- lapply(nm2, function(x)
  apply(df[grep(x, names(df))], 1, function(x)
    str_replace_all(toString(x), pattern = ", ", replacement = "|")))

> df
   id mon.1 mon.2 mon.2.4.1...1 tue.6  tue    heh   wed wed.01  thu.0234 rating combined1 combined2  combined3
1 HD1     1     a           #ji     1  190    1mn  1890   <NA>   @jksdff      1   1|a|#ji    1| 190    1890|NA
2 HD2     0     b           #ki     2 2345     2a  9002  @ksdf      @sfd      2   0|b|#ki    2|2345 9002|@ksdf
3 HD3     1     c          <NA>     3   41    g78 14341   <NA> @kukg.676      3    1|c|NA     3| 41   14341|NA
4 HD4     4     d           #ui     4   89 asd324   657   <NA>   @jdkfjk      4   4|d|#ui     4| 89     657|NA
```
Just out of curiosity I've taken a look at what happens under the hood, and I've used [dtruss/strace][1] on each test.
C++

    ./a.out < in
    Saw 6512403 lines in 8 seconds. Crunch speed: 814050

syscalls `sudo dtruss -c ./a.out < in`

    CALL                COUNT
    __mac_syscall           1
    <snip>
    open                    6
    pread                   8
    mprotect               17
    mmap                   22
    stat64                 30
    read_nocancel       25958

Python

    ./a.py < in
    Read 6512402 lines in 1 seconds. LPS: 6512402

syscalls `sudo dtruss -c ./a.py < in`

    CALL                COUNT
    __mac_syscall           1
    <snip>
    open                    5
    pread                   8
    mprotect               17
    mmap                   21
    stat64                 29
[1]: http://en.wikipedia.org/wiki/Strace
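
The tables above count how often each program asked the kernel for something; the C++ run shows 25958 `read_nocancel` calls. To get a feel for how the number of underlying reads can differ for the same input, here is a minimal Python sketch. It is only an illustration, not a reconstruction of either test program. `CountingRaw` and `reads_needed` are hypothetical names, and buffer size is my assumption about one factor that can change the count, not something measured above.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts readinto() calls."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)
        self.reads = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Each call here stands in for one read request to the OS.
        self.reads += 1
        return self._src.readinto(b)


def reads_needed(data, buffer_size):
    """Return (line count, raw read count) for a given buffer size."""
    raw = CountingRaw(data)
    buffered = io.BufferedReader(raw, buffer_size=buffer_size)
    lines = sum(1 for _ in buffered)
    return lines, raw.reads


if __name__ == "__main__":
    sample = b"x\n" * 10000  # made-up input, not the file used above
    for size in (64, 65536):
        lines, reads = reads_needed(sample, size)
        print(f"buffer={size}: {lines} lines, {reads} raw reads")
```

With the same input, the smaller buffer should need many more raw reads than the larger one, while the line count stays the same.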

        