scala - Spark SQL: Loading a file from an Excel sheet (with extension .xlsx) cannot infer the schema of a date-type column properly


I have an xlsx file containing a date/time field (my time) in the following format. Sample records:

5/16/2017 12:19:00
5/16/2017 12:56:00
5/16/2017 1:17:00 pm
5/16/2017 5:26:00 pm
5/16/2017 6:26:00 pm

I am reading the xlsx file in the following manner:

val inputdf = spark.sqlContext.read.format("com.crealytics.spark.excel")
    .option("location", "file:///c:/users/file.xlsx")
    .option("useHeader", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .option("addColorColumns", "false")
    .load()

When I try to get the schema using:

inputdf.printSchema()

the column's type is reported as double. Sometimes the schema is reported as string. When I print the data, the output is:

------------------
time
------------------
42871.014189814814
42871.03973379629
42871.553773148145
42871.72765046296
42871.76887731482
------------------

The above output is not correct for the given input.
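For context, the doubles shown above look like Excel serial date numbers: the integer part counts days and the fractional part is the time of day, so 42871 corresponds to 5/16/2017. The following is only a minimal sketch of how such a value could be converted back by hand in plain Scala. The object name `ExcelSerial` and the method `toDateTime` are hypothetical, and the sketch assumes the workbook uses Excel's default 1900 date system (not the 1904 one).

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.temporal.ChronoUnit

// Hypothetical helper, not part of spark-excel or Spark.
object ExcelSerial {
  // Assumption: 1900 date system. Excel counts a non-existent 1900-02-29,
  // so for serials from 61 (1900-03-01) onwards the effective day zero
  // is 1899-12-30.
  private val Epoch: LocalDateTime = LocalDate.of(1899, 12, 30).atStartOfDay()

  private val MillisPerDay: Long = 24L * 60L * 60L * 1000L

  // Integer part = whole days since the epoch, fraction = time of day.
  def toDateTime(serial: Double): LocalDateTime = {
    val days = math.floor(serial).toLong
    val millis = math.round((serial - days) * MillisPerDay)
    Epoch.plusDays(days).plus(millis, ChronoUnit.MILLIS)
  }
}
```

Applied to the third value above, `ExcelSerial.toDateTime(42871.553773148145)` falls on 2017-05-16 shortly after 13:17, which lines up with the 1:17 pm sample record. To use something like this on the DataFrame column, the result would presumably need to be turned into a `java.sql.Timestamp` (for example with `java.sql.Timestamp.valueOf`) inside a UDF.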

Moreover, if I convert the xlsx file to csv format and read it, I get the output correctly. Here is how I read it in csv format:

spark.sqlContext.read.format("csv")
    .option("header", "true")
    .option("inferSchema", true)
    .load("file:///c:/users/file.xlsx")

So, in this regard, how can I infer the correct schema for a column of date type?

Note: the Spark version is 2.0.0 and the language used is Scala.

