scala - Apache Zeppelin cannot deserialize dataset: "NoSuchMethodError"


I am trying to use Apache Zeppelin (0.7.2, net install, running locally on a Mac) to explore data loaded from an S3 bucket. The data seems to load fine, as the command:

val p = spark.read.textFile("s3a://sparkcookbook/person")

gives the result:

p: org.apache.spark.sql.Dataset[String] = [value: string]

However, when I try to call methods on the object p, I get an error. For example:

p.take(1) 

results in:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2113)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2112)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2795)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)

My conf/zeppelin-env.sh is the same as the default, except that I have my Amazon access key and secret key environment variables defined there. In the Spark interpreter in the Zeppelin notebook, I have added the following artifacts:

org.apache.hadoop:hadoop-aws:2.7.3
com.amazonaws:aws-java-sdk:1.7.9
com.fasterxml.jackson.core:jackson-core:2.9.0
com.fasterxml.jackson.core:jackson-databind:2.9.0
com.fasterxml.jackson.core:jackson-annotations:2.9.0

(I think only the first two are necessary.) The two commands above work fine in the spark-shell, just not in the Zeppelin notebook (see How to use S3 with Apache Spark 2.2 in the Spark shell for how I set that up).

So it seems there is a problem with one of the Jackson libraries. Maybe I'm using the wrong artifacts above for the Zeppelin interpreter?
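One way to narrow a conflict like this down (a diagnostic sketch, not something from the question itself) is to ask the JVM which jar a given class was actually loaded from. The helper name `jarOf` is my own; in a notebook paragraph one could call it with, e.g., "com.fasterxml.jackson.databind.ObjectMapper" to see which of the competing Jackson jars won:

```scala
// Diagnostic sketch: report the jar a class was actually loaded from.
// Useful when several versions of the same library are on the classpath.
def jarOf(className: String): String = {
  val cls = Class.forName(className)
  Option(cls.getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString)          // typically a file: URL to the jar
    .getOrElse("(no code source, e.g. bootstrap classpath)")
}

// Example: for a library class this prints the jar's location.
println(jarOf("scala.Option"))
```

If the printed location is a 2.9.x jackson-databind jar while Spark ships jackson-module-scala built against 2.6.x, that mismatch alone can explain the failures below.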

Update: Following the advice in the proposed answer below, I removed the Jackson jars that came with Zeppelin and replaced them with the following:

jackson-annotations-2.6.0.jar
jackson-core-2.6.7.jar
jackson-databind-2.6.7.jar

and replaced the artifacts as well, so my artifacts are now:

org.apache.hadoop:hadoop-aws:2.7.3
com.amazonaws:aws-java-sdk:1.7.9
com.fasterxml.jackson.core:jackson-core:2.6.7
com.fasterxml.jackson.core:jackson-databind:2.6.7
com.fasterxml.jackson.core:jackson-annotations:2.6.0

The error I get from running the above commands, however, is the same.

Update 2: I removed the Jackson libraries from the list of artifacts, since they are already in the jars/ folder; the only added artifacts are now the AWS artifacts above. I then cleaned the classpath by entering the following in the notebook (as per the instructions):

%spark.dep
z.reset()

I get a different error now:

val p = spark.read.textFile("s3a://sparkcookbook/person")
p.take(1)

p: org.apache.spark.sql.Dataset[String] = [value: string]
java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
  at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
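A NoSuchMethodError like this usually means the class that actually got loaded comes from a build compiled against a different version of its dependency than the one on the classpath. A small reflective probe (my own sketch; the class and method names come from the stack trace above) can confirm whether the loaded class exposes the method the caller expects:

```scala
// Sketch: check whether the class that actually loads exposes a given method.
// A `false` here, for a method some caller was compiled against, is exactly
// what surfaces at runtime as NoSuchMethodError.
def hasMethod(className: String, methodName: String): Boolean =
  try Class.forName(className).getMethods.exists(_.getName == methodName)
  catch { case _: ClassNotFoundException | _: LinkageError => false }

// In the notebook one could probe the class from the stack trace, e.g.:
// hasMethod("com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$",
//           "handledType")
```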

Update 3: Per the suggestion in a comment to the proposed answer below, I cleaned the class path by deleting the files in the local repo:

rm -rf local-repo/* 

I then restarted the Zeppelin server. To check the class path, I executed the following in the notebook:

val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)

This gave the following output (I include only the Jackson libraries from the output here; otherwise the output is too long to paste):

...
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-annotations-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-annotations-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-core-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-core-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-databind-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-databind-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-jaxrs-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2ct9cpaa9/jackson-xc-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-annotations-2.6.0.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-core-2.6.7.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-databind-2.6.7.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-annotations-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-core-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-databind-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-annotations-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-core-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-databind-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-jaxrs-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-paranamer-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-scala_2.11-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-xc-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/json4s-jackson_2.11-3.2.11.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/parquet-jackson-1.8.1.jar
...

It seems that multiple versions are fetched from the repo. Should I exclude the older versions? If so, how do I do that?
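One way to spot the duplicates mechanically (a rough sketch of my own; the regex is a naming heuristic assuming `artifact-<version>.jar` file names, not a general Maven parser) is to group the classpath entries by artifact name and flag any artifact that appears in more than one version:

```scala
// Sketch: group classpath jar entries by artifact name and report any
// artifact that appears in more than one version.
def duplicateArtifacts(entries: Seq[String]): Map[String, Seq[String]] = {
  val JarName = """(.+)-(\d[\w.]*)\.jar""".r
  val parsed = entries.flatMap { entry =>
    entry.split('/').last match {
      case JarName(artifact, version) => Some(artifact -> version)
      case _                          => None
    }
  }
  parsed.groupBy(_._1)
    .map { case (artifact, pairs) => artifact -> pairs.map(_._2).distinct }
    .filter { case (_, versions) => versions.size > 1 }
}

// Example with entries like the URLs printed above:
// duplicateArtifacts(Seq(
//   "file:/zeppelin/local-repo/jackson-core-2.1.1.jar",
//   "file:/zeppelin/local-repo/jackson-core-2.2.3.jar"
// ))   // yields a map with jackson-core -> both versions
```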

Use these jar versions:

aws-java-sdk-1.7.4.jar

hadoop-aws-2.6.0.jar

as in this script: https://github.com/2dmitrypavlov/sparkdocker/blob/master/zeppelin.sh. Do not use packages; download the jars and put them in a path, let's say in "/root/jars/", then edit zeppelin-env.sh. To do that, run this command from the zeppelin/conf dir:

echo 'export SPARK_SUBMIT_OPTIONS="--jars /root/jars/mysql-connector-java-5.1.39.jar,/root/jars/aws-java-sdk-1.7.4.jar,/root/jars/hadoop-aws-2.6.0.jar"' >> zeppelin-env.sh

After that, restart Zeppelin.

The code at the link above is pasted below (just in case the link becomes stale):

#!/bin/bash
# download jars
cd /root/jars
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.39/mysql-connector-java-5.1.39.jar

cd /usr/share/
wget http://archive.apache.org/dist/zeppelin/zeppelin-0.7.1/zeppelin-0.7.1-bin-all.tgz
tar -zxvf zeppelin-0.7.1-bin-all.tgz
cd zeppelin-0.7.1-bin-all/conf
cp zeppelin-env.sh.template zeppelin-env.sh
echo 'export MASTER=spark://'$MASTERZ':7077' >> zeppelin-env.sh
echo 'export SPARK_SUBMIT_OPTIONS="--jars /root/jars/mysql-connector-java-5.1.39.jar,/root/jars/aws-java-sdk-1.7.4.jar,/root/jars/hadoop-aws-2.6.0.jar"' >> zeppelin-env.sh
echo 'export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.VFSNotebookRepo, org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo"' >> zeppelin-env.sh
echo 'export ZEPPELINHUB_API_ADDRESS="https://www.zeppelinhub.com"' >> zeppelin-env.sh
echo 'export ZEPPELIN_PORT=9999' >> zeppelin-env.sh
echo 'export SPARK_HOME=/usr/share/spark' >> zeppelin-env.sh

cd ../bin/
./zeppelin.sh
