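Apache Beam's ReadFromText decodes input as UTF-8 by default, so it raises a UnicodeDecodeError when a line contains bytes that are not valid UTF-8. In my case I received an ANSI-encoded file. To process it, I passed the custom coder below to ReadFromText. The coder decodes each line as UTF-8 and ignores any invalid bytes, so reading no longer fails whatever encoding the file uses. Be aware that the ignored bytes are dropped from the output. Non-ASCII characters, such as accented letters in an ANSI file, will therefore be lost. If you need them, decode with the file's actual encoding instead.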
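
    import apache_beam as beam
    from apache_beam.coders.coders import Coder
    from apache_beam.options.pipeline_options import PipelineOptions

    class CustomCoder(Coder):
        """Reads and writes strings as UTF-8; invalid bytes are dropped on read."""
        def encode(self, value):
            # str -> bytes as UTF-8
            return value.encode("utf-8", "replace")
        def decode(self, value):
            # bytes -> str; bytes that are not valid UTF-8 are silently dropped
            return value.decode("utf-8", "ignore")
        def is_deterministic(self):
            return True

    pipeline_options = PipelineOptions()  # parses runner/project flags from the command line
    with beam.Pipeline(options=pipeline_options) as p:
        input_data = p | "Read From GCS" >> beam.io.ReadFromText(
            "input_file_path", coder=CustomCoder())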
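To see what the "ignore" error handler costs, here is a small check that uses only the standard library. It repeats the decode call from CustomCoder on one sample line. The sample text is made up. The check also assumes the ANSI file uses the Windows-1252 (cp1252) code page. That is common on Western-locale Windows systems, but it is not guaranteed for every ANSI file.

```python
# Hypothetical sample line, as it would be stored in a Windows-1252 ("ANSI") file.
raw_line = "café, naïve".encode("cp1252")

# Same call as CustomCoder.decode: decode as UTF-8 and drop invalid bytes.
utf8_ignored = raw_line.decode("utf-8", "ignore")

# Decoding with the file's real code page (assumed cp1252) keeps every character.
cp1252_decoded = raw_line.decode("cp1252")

print(utf8_ignored)    # caf, nave
print(cp1252_decoded)  # café, naïve
```

If you know the file's code page, you can use a variant of CustomCoder whose decode returns value.decode("cp1252") and whose encode uses the same codec. That variant should keep these characters instead of dropping them. Confirm the code page against the system that produced the file before relying on it.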