CopyPastor

Detecting plagiarism made easy.

Score: 1.8527010083198547; Reported for: String similarity, Exact paragraph match

Possible Plagiarism

Reposted on 2018-10-03
by Ganondorfz

Original Post

Original - Posted on 2018-10-03
by Ganondorfz



            

**There is a serverless solution using AWS Glue!** (I nearly died figuring this out)
No need for EC2, and no need to dodge Lambda's memory limits.
**This solution has two parts:**
1. A Lambda function, triggered by S3 when a ZIP file is uploaded, that starts a Glue job run and passes the S3 object key to Glue as an argument.
2. A Glue job that unzips the files (in memory!) and uploads them back to S3.
See my code below which unzips the ZIP file and places the contents back into the same bucket (configurable).
Please upvote if helpful :)
**Lambda Script (python3) that calls a Glue Job called YourGlueJob**
    import boto3
    import urllib.parse

    glue = boto3.client('glue')

    def lambda_handler(event, context):
        # Bucket and key of the ZIP object that triggered this invocation
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        print(key)
        try:
            # Kick off the Glue job, passing the bucket and key as job arguments
            newJobRun = glue.start_job_run(
                JobName='YourGlueJob',
                Arguments={
                    '--bucket': bucket,
                    '--key': key,
                }
            )
            print("Successfully created unzip job")
            return key
        except Exception as e:
            print(e)
            print('Error starting unzip job for ' + key)
            raise e
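For the trigger itself, the bucket needs an event notification that invokes this Lambda whenever a ZIP is uploaded. A minimal sketch with boto3 is below; the bucket name, Lambda ARN, and `.zip` suffix filter are placeholders for illustration (the same thing can be set up from the S3 console), and the Lambda also needs a resource policy allowing S3 to invoke it.

    import boto3

    s3 = boto3.client('s3')

    # Assumed placeholders - substitute your bucket and the ARN of the Lambda above
    BUCKET = 'your-upload-bucket'
    LAMBDA_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:YourUnzipTrigger'

    # Invoke the Lambda whenever an object ending in .zip is created in the bucket
    s3.put_bucket_notification_configuration(
        Bucket=BUCKET,
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': LAMBDA_ARN,
                'Events': ['s3:ObjectCreated:*'],
                'Filter': {
                    'Key': {
                        'FilterRules': [{'Name': 'suffix', 'Value': '.zip'}]
                    }
                },
            }]
        },
    )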
**AWS Glue Job Script to unzip the files**
    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME', 'bucket', 'key'])

    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

    import boto3
    import zipfile
    import io

    s3 = boto3.client('s3')
    s3r = boto3.resource('s3')

    # Bucket and key of the ZIP file, passed in by the Lambda via start_job_run
    bucket = args["bucket"]
    key = args["key"]

    # Read the whole ZIP object into memory
    obj = s3r.Object(bucket_name=bucket, key=key)
    buffer = io.BytesIO(obj.get()["Body"].read())
    z = zipfile.ZipFile(buffer)

    # Extract each member in memory and upload it back to the same bucket,
    # prefixing the member name with the original object key
    file_list = z.namelist()
    for filerr in file_list:
        print(filerr)
        y = z.open(filerr)
        arcname = key + filerr
        x = io.BytesIO(y.read())
        s3.upload_fileobj(x, bucket, arcname)
        y.close()
    print(file_list)

    job.commit()
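For reference, the Glue job that the Lambda starts has to exist first. A rough boto3 sketch of creating it is below; the script location, IAM role name, and job settings are assumptions for illustration — only the job name must match the `YourGlueJob` used in the Lambda.

    import boto3

    glue = boto3.client('glue')

    # Assumed placeholders - the unzip script above uploaded to S3, plus an IAM role
    # that can run Glue jobs and read/write the target bucket
    SCRIPT_LOCATION = 's3://your-upload-bucket/scripts/unzip_job.py'
    GLUE_ROLE = 'YourGlueServiceRole'

    glue.create_job(
        Name='YourGlueJob',              # must match the JobName the Lambda starts
        Role=GLUE_ROLE,
        Command={
            'Name': 'glueetl',           # standard Spark ETL job
            'ScriptLocation': SCRIPT_LOCATION,
            'PythonVersion': '3',
        },
        DefaultArguments={
            # --bucket and --key are supplied per run by the Lambda
            '--job-language': 'python',
        },
    )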

A bit late, but for those who come after me...
**There is a serverless solution using AWS Glue!** (I nearly died figuring this out)
**This solution has two parts:**
1. A Lambda function, triggered by S3 when a ZIP file is uploaded, that starts a Glue job run and passes the S3 object key to Glue as an argument.
2. A Glue job that unzips the files (in memory!) and uploads them back to S3.
See my code below which unzips the ZIP file and places the contents back into the same bucket (configurable).
Please upvote if helpful :)
**Lambda Script (python3) that calls a Glue Job called YourGlueJob**
    import boto3
    import urllib.parse

    glue = boto3.client('glue')

    def lambda_handler(event, context):
        # Bucket and key of the ZIP object that triggered this invocation
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        print(key)
        try:
            # Kick off the Glue job, passing the bucket and key as job arguments
            newJobRun = glue.start_job_run(
                JobName='YourGlueJob',
                Arguments={
                    '--bucket': bucket,
                    '--key': key,
                }
            )
            print("Successfully created unzip job")
            return key
        except Exception as e:
            print(e)
            print('Error starting unzip job for ' + key)
            raise e
**AWS Glue Job Script to unzip the files**
    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME', 'bucket', 'key'])

    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

    import boto3
    import zipfile
    import io

    s3 = boto3.client('s3')
    s3r = boto3.resource('s3')

    # Bucket and key of the ZIP file, passed in by the Lambda via start_job_run
    bucket = args["bucket"]
    key = args["key"]

    # Read the whole ZIP object into memory
    obj = s3r.Object(bucket_name=bucket, key=key)
    buffer = io.BytesIO(obj.get()["Body"].read())
    z = zipfile.ZipFile(buffer)

    # Extract each member in memory and upload it back to the same bucket,
    # prefixing the member name with the original object key
    file_list = z.namelist()
    for filerr in file_list:
        print(filerr)
        y = z.open(filerr)
        arcname = key + filerr
        x = io.BytesIO(y.read())
        s3.upload_fileobj(x, bucket, arcname)
        y.close()
    print(file_list)

    job.commit()
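One thing to note about the script above: since each output key is built as `key + filerr`, the extracted objects land right next to the archive, prefixed with the ZIP's full key. If you would rather collect them under a separate prefix, a small hypothetical tweak to the loop (reusing `z`, `key`, `bucket`, and `s3` from the script above; the `unzipped/` prefix is just an example) could look like this:

    import io
    import posixpath

    # Hypothetical variant of the loop above: write extracted files under an
    # "unzipped/<archive name>/" prefix, e.g. unzipped/myarchive.zip/inner/file.csv
    for filerr in z.namelist():
        with z.open(filerr) as member:
            arcname = posixpath.join('unzipped', posixpath.basename(key), filerr)
            s3.upload_fileobj(io.BytesIO(member.read()), bucket, arcname)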


        