Apache Zeppelin cannot deserialize dataset: "NoSuchMethodError"
I am trying to use Apache Zeppelin (0.7.2, net-install, running locally on a Mac) to explore data loaded into an S3 bucket. The data seems to load fine, as the command:
val p = spark.read.textFile("s3a://sparkcookbook/person")
gives the result:
p: org.apache.spark.sql.Dataset[String] = [value: string]
However, when I try to call methods on the object p, I get an error. For example:
p.take(1)
results in:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2113)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2112)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2795)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)
My conf/zeppelin-env.sh is the same as the default, except that I have the Amazon access key and secret key environment variables defined there. In the Spark interpreter in the Zeppelin notebook, I have added the following artifacts:
org.apache.hadoop:hadoop-aws:2.7.3
com.amazonaws:aws-java-sdk:1.7.9
com.fasterxml.jackson.core:jackson-core:2.9.0
com.fasterxml.jackson.core:jackson-databind:2.9.0
com.fasterxml.jackson.core:jackson-annotations:2.9.0
(I think only the first two are necessary.) The two commands above work fine in the spark-shell, but not in the Zeppelin notebook (see "How to use S3 with Apache Spark 2.2 in the Spark shell" for how I set that up).
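For context, in the spark-shell I point Spark at the S3 credentials by setting them on the Hadoop configuration. A minimal sketch, assuming the standard s3a property names from hadoop-aws and that the two environment variables are set:

// sketch: copy the AWS credentials from the environment into the s3a config
val conf = spark.sparkContext.hadoopConfiguration
conf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
conf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))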
So it seems there is a problem with one of the Jackson libraries. Maybe I'm using the wrong artifacts above for the Zeppelin interpreter?
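One quick way to check whether the artifacts were picked up at all is a class lookup in a notebook paragraph. A sketch (S3AFileSystem ships in hadoop-aws, so a ClassNotFoundException here would mean the artifact never made it onto the classpath):

// sketch: throws ClassNotFoundException if hadoop-aws is not on the classpath
Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")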
Update: Following the advice in the proposed answer below, I removed the Jackson jars that came with Zeppelin and replaced them with the following:
jackson-annotations-2.6.0.jar
jackson-core-2.6.7.jar
jackson-databind-2.6.7.jar
and updated the artifacts to match, so the artifacts are now:
org.apache.hadoop:hadoop-aws:2.7.3
com.amazonaws:aws-java-sdk:1.7.9
com.fasterxml.jackson.core:jackson-core:2.6.7
com.fasterxml.jackson.core:jackson-databind:2.6.7
com.fasterxml.jackson.core:jackson-annotations:2.6.0
The error I get from running the above commands, however, is the same.
Update 2: I removed the Jackson libraries from the list of artifacts, since they are already in the jars/ folder; the only added artifacts are now the AWS artifacts above. I then cleaned the classpath by entering the following in the notebook (as per the instructions):
%spark.dep
z.reset()
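For reference, the same dependency loader can also declare artifacts directly in that paragraph. A sketch using the coordinates above, though I set mine in the interpreter settings instead:

%spark.dep
z.reset()
// sketch: re-declare the AWS artifacts after the reset
z.load("org.apache.hadoop:hadoop-aws:2.7.3")
z.load("com.amazonaws:aws-java-sdk:1.7.9")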
I get a different error now:
val p = spark.read.textFile("s3a://sparkcookbook/person")
p.take(1)

p: org.apache.spark.sql.Dataset[String] = [value: string]
java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
  at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
Update 3: Per the suggestion in a comment on the proposed answer below, I cleaned the classpath by deleting the files in the local repo:
rm -rf local-repo/*
and restarted the Zeppelin server. To check the classpath, I executed the following in the notebook:
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
This gave the following output (I include only the Jackson libraries from the output here, as otherwise it would be too long to paste):
...
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-annotations-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-annotations-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-databind-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-databind-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-jaxrs-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-xc-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-annotations-2.6.0.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-core-2.6.7.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-databind-2.6.7.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-annotations-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-core-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-databind-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-annotations-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-core-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-databind-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-jaxrs-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-paranamer-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-scala_2.11-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-xc-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/json4s-jackson_2.11-3.2.11.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/parquet-jackson-1.8.1.jar
...
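For what it's worth, the same check can be narrowed to just the Jackson jars with a small variant of the snippet above:

// sketch: print only the classpath entries that mention jackson
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs
  .filter(_.toString.contains("jackson"))
  .foreach(println)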
It seems that multiple versions are being fetched from the repo. Should I exclude the older versions? If so, how do I do that?
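One thing I have seen in the Zeppelin docs but not yet tried is the dependency loader's exclusion API, which should in principle allow loading the AWS artifacts while excluding their transitive Jackson dependencies. An untested sketch:

%spark.dep
z.reset()
// untested sketch: exclude the transitive Jackson pull from hadoop-aws
z.load("org.apache.hadoop:hadoop-aws:2.7.3").exclude("com.fasterxml.jackson.core:jackson-databind")
z.load("com.amazonaws:aws-java-sdk:1.7.9")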
Use these jar versions:
aws-java-sdk-1.7.4.jar
hadoop-aws-2.6.0.jar
as in this script: https://github.com/2dmitrypavlov/sparkdocker/blob/master/zeppelin.sh. Do not use packages; download the jars and put them in a path, let's say in "/root/jars/", then edit zeppelin-env.sh. Run this command from the zeppelin/conf dir:
echo 'export SPARK_SUBMIT_OPTIONS="--jars /root/jars/mysql-connector-java-5.1.39.jar,/root/jars/aws-java-sdk-1.7.4.jar,/root/jars/hadoop-aws-2.6.0.jar"' >> zeppelin-env.sh
Then restart Zeppelin.
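Once it is back up, re-running the original commands from the question serves as a smoke test:

// smoke test: read from S3 and pull one record
val p = spark.read.textFile("s3a://sparkcookbook/person")
p.take(1).foreach(println)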
The code at the link above is pasted below (just in case the link becomes stale):
#!/bin/bash
# download jars
cd /root/jars
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.39/mysql-connector-java-5.1.39.jar

cd /usr/share/
wget http://archive.apache.org/dist/zeppelin/zeppelin-0.7.1/zeppelin-0.7.1-bin-all.tgz
tar -zxvf zeppelin-0.7.1-bin-all.tgz

cd zeppelin-0.7.1-bin-all/conf
cp zeppelin-env.sh.template zeppelin-env.sh
echo 'export MASTER=spark://'$MASTERZ':7077' >> zeppelin-env.sh
echo 'export SPARK_SUBMIT_OPTIONS="--jars /root/jars/mysql-connector-java-5.1.39.jar,/root/jars/aws-java-sdk-1.7.4.jar,/root/jars/hadoop-aws-2.6.0.jar"' >> zeppelin-env.sh
echo 'export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.VFSNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo"' >> zeppelin-env.sh
echo 'export ZEPPELINHUB_API_ADDRESS="https://www.zeppelinhub.com"' >> zeppelin-env.sh
echo 'export ZEPPELIN_PORT=9999' >> zeppelin-env.sh
echo 'export SPARK_HOME=/usr/share/spark' >> zeppelin-env.sh

cd ../bin/
./zeppelin.sh