python - PySpark: PicklingError: Could not serialize object: TypeError: can't pickle CompiledFFI objects


I'm new to the PySpark environment and came across an error while trying to encrypt data in an RDD with the cryptography module. Here's the code:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName('encrypt').getOrCreate()

    df = spark.read.csv('test.csv', inferSchema = True, header = True)
    df.show()
    df.printSchema()

    from cryptography.fernet import Fernet
    key = Fernet.generate_key()
    f = Fernet(key)

    dfRDD = df.rdd
    print(dfRDD)
    mappedRDD = dfRDD.map(lambda value: (value[0], str(f.encrypt(str.encode(value[1]))), value[2] * 100))
    data = mappedRDD.toDF()
    data.show()

Everything works fine, of course, until I try mapping value[1] with str(f.encrypt(str.encode(value[1]))). I then receive the following error:

    PicklingError: Could not serialize object: TypeError: can't pickle CompiledFFI objects

I have not seen many resources referring to this error and wanted to see if anyone else has encountered it (or if, via PySpark, you have a recommended approach to column encryption).

"recommended approach to column encryption"

You may consider Hive's built-in encryption (HIVE-5207, HIVE-6329), but it is limited at the moment (HIVE-7934).

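Your current code doesn't work because Fernet objects are not serializable: the lambda captures f, so Spark has to pickle it to ship the closure to the workers. You can make it work by distributing only the key, which is plain bytes, and building the Fernet object on the worker: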

    # Only the key (bytes) is captured; Fernet is created inside the function.
    def f(value, key=key):
        return value[0], str(Fernet(key).encrypt(str.encode(value[1]))), value[2] * 100

    mappedRDD = dfRDD.map(f)

or, to build the Fernet object once per partition instead of once per row:

    def g(values, key=key):
        f = Fernet(key)  # created once per partition, on the worker
        for value in values:
            yield value[0], str(f.encrypt(str.encode(value[1]))), value[2] * 100

    mappedRDD = dfRDD.mapPartitions(g)
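To see why shipping the key works while shipping the cipher object does not, here is a minimal sketch that needs neither Spark nor cryptography. ToyCipher, can_pickle and encrypt_partition are hypothetical names made up for this illustration: ToyCipher stands in for Fernet by holding a threading.Lock, which (like a CompiledFFI object) cannot be pickled, and its XOR "encryption" is a placeholder, not something to use on real data.

```python
import pickle
import threading


class ToyCipher:
    """Hypothetical stand-in for Fernet: it wraps a handle that cannot be pickled."""

    def __init__(self, key: bytes):
        self.key = key
        # A lock is used only because, like a CompiledFFI object,
        # it refuses to be pickled.
        self._handle = threading.Lock()

    def encrypt(self, data: bytes) -> bytes:
        # Toy XOR transform for illustration only; this is NOT real encryption.
        return bytes(b ^ self.key[i % len(self.key)] for i, b in enumerate(data))


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


def encrypt_partition(rows, key):
    """Same shape as g above: build the cipher once from the picklable key."""
    cipher = ToyCipher(key)
    for row in rows:
        yield row[0], cipher.encrypt(row[1].encode()), row[2] * 100


key = b"not-a-real-key"
rows = [(1, "alice", 2), (2, "bob", 3)]

print(can_pickle(ToyCipher(key)))  # False: the object drags the lock along
print(can_pickle(key))             # True: plain bytes serialize fine

# Simulate one partition locally with a plain iterator.
result = list(encrypt_partition(iter(rows), key))
print([r[0] for r in result], [r[2] for r in result])  # [1, 2] [200, 300]
```

The same reasoning applies to the two functions above: whatever the function captures has to be picklable, so capture the key and construct the non-picklable object inside the function body.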
