python - PySpark: PicklingError: Could not serialize object: TypeError: can't pickle CompiledFFI objects
I'm new to the PySpark environment and came across an error while trying to encrypt data in an RDD with the cryptography module. Here's the code:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('encrypt').getOrCreate()

df = spark.read.csv('test.csv', inferSchema=True, header=True)
df.show()
df.printSchema()

from cryptography.fernet import Fernet
key = Fernet.generate_key()
f = Fernet(key)

df_rdd = df.rdd
print(df_rdd)

mapped_rdd = df_rdd.map(lambda value: (value[0], str(f.encrypt(str.encode(value[1]))), value[2] * 100))
data = mapped_rdd.toDF()
data.show()
Everything works fine, of course, until I try mapping value[1] with str(f.encrypt(str.encode(value[1]))). I receive the following error:
PicklingError: Could not serialize object: TypeError: can't pickle CompiledFFI objects
I have not seen many resources referring to this error and wanted to see if anyone else has encountered it (or if PySpark has a recommended approach to column encryption).
Recommended approach to column encryption
You may consider Hive built-in encryption (HIVE-5207, HIVE-6329), but it is fairly limited at the moment (HIVE-7934).
Your current code doesn't work because Fernet objects are not serializable. You can make it work by distributing only the keys:
def f(value, key=key):
    return value[0], str(Fernet(key).encrypt(str.encode(value[1]))), value[2] * 100

mapped_rdd = df_rdd.map(f)
or
def g(values, key=key):
    f = Fernet(key)
    for value in values:
        yield value[0], str(f.encrypt(str.encode(value[1]))), value[2] * 100

mapped_rdd = df_rdd.mapPartitions(g)
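For completeness, here is a minimal self-contained sketch of the mapPartitions variant run end to end. The sample rows and column names are made up for illustration (the schema of test.csv isn't shown in the question), and the Fernet token is decoded to a UTF-8 string instead of being wrapped in str(), which keeps the stored value easy to decrypt later:

from pyspark.sql import SparkSession
from cryptography.fernet import Fernet

spark = SparkSession.builder.appName('encrypt').getOrCreate()

# Stand-in for spark.read.csv('test.csv', ...); the columns here are assumed
df = spark.createDataFrame(
    [(1, 'alice', 2.0), (2, 'bob', 3.5)],
    ['id', 'name', 'amount'])

key = Fernet.generate_key()  # plain bytes: picklable, unlike a Fernet instance

def g(values, key=key):
    # Fernet is constructed on the executor, so only the key bytes are shipped
    f = Fernet(key)
    for value in values:
        yield value[0], f.encrypt(str.encode(value[1])).decode('utf-8'), value[2] * 100

data = df.rdd.mapPartitions(g).toDF(['id', 'name_token', 'amount'])
data.show(truncate=False)

Since Fernet tokens are URL-safe base64, a token stored this way can later be decrypted on the driver with Fernet(key).decrypt(token.encode()).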