CopyPastor

Detecting plagiarism made easy.

Score: 0.8006477952003479; Reported for: String similarity

Possible Plagiarism

Plagiarized on 2018-06-09
by Noman Khan

Original Post

Original - Posted on 2012-03-11
by 2mia



            
Present in both answers; Present only in the new answer; Present only in the old answer;

You need to define an escape character so that a comma (,) inside the text is ignored while parsing.
This can be done with ***spark.read.option("escape","\"")***.
Working example:


    scala> val df = spark.read.option("header",true).option("escape","\"").csv("train.csv");
    df: org.apache.spark.sql.DataFrame = [id: string, teacher_id: string ... 14 more fields]

    scala> df.select($"project_is_approved").show
    +-------------------+
    |project_is_approved|
    +-------------------+
    |                  1|
    |                  0|
    |                  1|
    |                  0|
    |                  1|
    |                  1|
    |                  1|
    |                  1|
    |                  1|
    |                  1|
    |                  1|
    |                  1|
    |                  1|
    |                  0|
    |                  1|
    |                  1|
    |                  1|
    |                  1|
    |                  1|
    |                  0|
    +-------------------+
    only showing top 20 rows
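To see why the escape setting matters, here is a minimal sketch that uses Python's standard `csv` module instead of Spark. The sample row is made up and is not taken from `train.csv`. CSV files commonly escape a quote inside a quoted field by doubling it (`""`), while Spark's CSV reader uses a backslash as its default escape character, which is presumably why the option above is needed for this file.

```python
import csv
import io

# Made-up sample data: the "essay" field contains commas and a doubled quote ("").
raw = (
    'id,essay,project_is_approved\n'
    'p1,"My students say ""hello"", read, and write",1\n'
)

# The csv module treats "" inside a quoted field as one literal quote
# (doublequote=True is its default), so the commas stay inside the field.
rows = list(csv.reader(io.StringIO(raw)))
header, record = rows

# A naive split on commas breaks the essay apart and shifts the columns.
naive = raw.splitlines()[1].split(",")

print(len(record), record)
print(len(naive), naive)
```

The quote-aware parse keeps three columns, so `project_is_approved` stays in the last position; the naive split produces extra columns and the label ends up in the wrong place.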

Just out of curiosity I've taken a look at what happens under the hood, and I've used [dtruss/strace][1] on each test.
C++
    ./a.out < in
    Saw 6512403 lines in 8 seconds. Crunch speed: 814050

syscalls `sudo dtruss -c ./a.out < in`

    CALL                                        COUNT
    __mac_syscall                                   1
    <snip>
    open                                            6
    pread                                           8
    mprotect                                       17
    mmap                                           22
    stat64                                         30
    read_nocancel                               25958

Python
    ./a.py < in
    Read 6512402 lines in 1 seconds. LPS: 6512402

syscalls `sudo dtruss -c ./a.py < in`

    CALL                                        COUNT
    __mac_syscall                                   1
    <snip>
    open                                            5
    pread                                           8
    mprotect                                       17
    mmap                                           21
    stat64                                         29

[1]: http://en.wikipedia.org/wiki/Strace
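The `read_nocancel` figure in the C++ trace is the number of read system calls the binary issued. As a rough illustration only, the sketch below counts how many underlying reads a buffered reader makes for different buffer sizes. The trace does not show which request size the C++ program actually used, so this is an assumption-laden analogy, and `CountingReader` and `count_reads` are hypothetical helper names.

```python
import io


class CountingReader(io.RawIOBase):
    """Hypothetical helper: serves bytes from memory and counts readinto() calls."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buf):
        # Each call here stands in for one read system call on a real file.
        self.calls += 1
        return self._src.readinto(buf)


def count_reads(data, buffer_size):
    """Return (number of lines, number of raw reads) for a given buffer size."""
    raw = CountingReader(data)
    reader = io.BufferedReader(raw, buffer_size=buffer_size)
    lines = sum(1 for _ in reader)
    return lines, raw.calls


data = b"line\n" * 10000  # made-up input, 50000 bytes

small_lines, small_calls = count_reads(data, 64)
large_lines, large_calls = count_reads(data, 65536)

print("64-byte buffer:   ", small_lines, "lines,", small_calls, "reads")
print("65536-byte buffer:", large_lines, "lines,", large_calls, "reads")
```

Both runs see the same number of lines, but the small buffer needs far more underlying reads, which is the kind of difference a syscall count from dtruss or strace makes visible.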

        