pyspark - Spark function for loading a Parquet file into memory
I have an RDD loaded from a Parquet file using Spark SQL:

data_rdd = sqlContext.read.parquet(filename).rdd

I have noticed that the actual file-read operation is only executed once an aggregation function triggers a Spark job.

I need to measure the computation time of the job without the time it takes to read the data file (i.e. as if the same input RDD/DataFrame were already there, because it was created via Spark SQL).

Is there a function that triggers loading of the file into the executors' memory?

I have tried .cache(), but it seems the read operation is still triggered as part of the first job.
Spark is lazy and only computes when needed. You can .cache() and then .count() the lines:

data_rdd = sqlContext.read.parquet(filename).rdd
data_rdd.cache()
data_rdd.count()

Any computations that follow will then start from the cached state of data_rdd, since count() forces a read of the whole table and populates the cache.