scala - Spark SQL: Loading an Excel sheet (.xlsx) cannot infer the schema of a date-type column properly
I have an xlsx file containing a date/time field (My Time) in the following format. Sample records:

    5/16/2017 12:19:00 AM
    5/16/2017 12:56:00 AM
    5/16/2017 1:17:00 PM
    5/16/2017 5:26:00 PM
    5/16/2017 6:26:00 PM

I am reading the xlsx file in the following manner:
    val inputDF = spark.sqlContext.read.format("com.crealytics.spark.excel")
      .option("location", "file:///c:/users/file.xlsx")
      .option("useHeader", "true")
      .option("treatEmptyValuesAsNulls", "true")
      .option("inferSchema", "true")
      .option("addColorColumns", "false")
      .load()

When I check the schema using:

    inputDF.printSchema()

the column type comes out as double (sometimes it is string instead). And when I print the data, the output is as follows:
    +------------------+
    |              Time|
    +------------------+
    |42871.014189814814|
    | 42871.03973379629|
    |42871.553773148145|
    | 42871.72765046296|
    | 42871.76887731482|
    +------------------+

The above output is not correct for the given input.
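The doubles above look like Excel serial date numbers: Excel stores a date/time as a count of days since its epoch, with the time of day as the fractional part. A minimal sketch in plain Scala, assuming the workbook uses Excel's default 1900 date system (not the 1904 system some Mac workbooks use), converts such a value back to a `LocalDateTime`; the object name `ExcelSerialDate` is hypothetical, not part of Spark or spark-excel.

```scala
import java.time.{LocalDate, LocalDateTime}

// Hypothetical helper object for illustration; not part of spark-excel.
object ExcelSerialDate {
  // ASSUMPTION: 1900 date system. Excel counts 1900-01-01 as day 1 but also
  // treats 1900 as a leap year, so anchoring at 1899-12-30 gives correct
  // results for every serial from 61 (1900-03-01) onwards.
  private val ExcelEpoch: LocalDateTime = LocalDate.of(1899, 12, 30).atStartOfDay()

  private val SecondsPerDay: Long = 86400L

  /** Converts an Excel serial (whole days plus fraction of a day) to a
    * timezone-naive LocalDateTime, rounded to the nearest second. */
  def toLocalDateTime(serial: Double): LocalDateTime =
    ExcelEpoch.plusSeconds(math.round(serial * SecondsPerDay))

  def main(args: Array[String]): Unit = {
    // The values printed in the question's output.
    val serials = Seq(42871.014189814814, 42871.03973379629, 42871.553773148145,
      42871.72765046296, 42871.76887731482)
    serials.foreach(s => println(s"$s -> ${toLocalDateTime(s)}"))
  }
}
```

Decoded this way, all five values fall on 2017-05-16, the date in the sample records. Rounding to whole seconds absorbs the floating-point noise in the fractional day. To apply this to the DataFrame, the function could be wrapped with `org.apache.spark.sql.functions.udf`; that wiring is not shown here.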
Moreover, if I convert the xlsx file to CSV format and read that, the output is correct. Here is how I read the CSV file:

    spark.sqlContext.read.format("csv")
      .option("header", "true")
      .option("inferSchema", true)
      .load("file:///c:/users/file.csv")

So, in this regard, how can I infer the correct schema for a column of type date?
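When the column arrives as text (the post notes the inferred type is sometimes string), another option is to parse it with an explicit pattern instead of relying on schema inference. A minimal sketch in plain Scala, assuming every cell follows the `M/d/yyyy h:mm:ss AM/PM` shape of the sample records; `ExcelTimeString` and `parse` are hypothetical names, not part of Spark or spark-excel.

```scala
import java.time.LocalDateTime
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.util.Locale
import scala.util.Try

// Hypothetical helper object for illustration; not part of Spark or spark-excel.
object ExcelTimeString {
  // ASSUMPTION: pattern taken from the sample records (month/day/year,
  // 12-hour clock, minutes, seconds, AM/PM marker). Case-insensitive so
  // "pm" and "PM" both parse; Locale.US pins the AM/PM text.
  private val Format: DateTimeFormatter = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .appendPattern("M/d/yyyy h:mm:ss a")
    .toFormatter(Locale.US)

  /** Parses one cell value; returns None for null, blank, or malformed input. */
  def parse(raw: String): Option[LocalDateTime] =
    Option(raw)
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap(s => Try(LocalDateTime.parse(s, Format)).toOption)

  def main(args: Array[String]): Unit =
    Seq("5/16/2017 12:19:00 AM", "5/16/2017 1:17:00 pm", "not a date")
      .foreach(s => println(s"$s -> ${parse(s)}"))
}
```

Returning `Option` keeps empty cells (the Excel read sets `treatEmptyValuesAsNulls`) and malformed values from throwing; the caller decides whether they become nulls.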
Note: Spark version 2.0.0; language used: Scala.