Change datatype of a column in a Spark RDD to date and query on it
By default, when loading the data, every column is considered to be of string type. The data looks like:
firstname,lastname,age,doj
dileep,gog,21,2016-01-01
avishek,ganguly,21,2016-01-02
shreyas,t,20,2016-01-03
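(For context, a minimal sketch of how such a CSV is typically loaded with plain Spark 1.x APIs; the file name "people.csv" and the header-skipping step are assumptions for illustration, not code from the question. Every field arrives as a String:)

import sqlContext.implicits._  // needed for toDF outside spark-shell

// Hypothetical loading step: read the file, drop the header, split each line.
val raw = sc.textFile("people.csv")
val header = raw.first()
val loaded = raw.filter(_ != header)
  .map(_.split(","))
  .map(a => (a(0), a(1), a(2), a(3)))   // every field is still a String here
  .toDF("firstname", "lastname", "age", "doj")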
After updating the schema of the RDD, it looks like:
temp.printSchema
 |-- firstname: string (nullable = true)
 |-- lastname: string (nullable = true)
 |-- age: string (nullable = true)
 |-- doj: date (nullable = true)
I registered a temporary table and queried on it:
temp.registerTempTable("temptable")
val temp1 = sqlContext.sql("select * from temptable")
temp1.show()

+---------+--------+---+----------+
|firstname|lastname|age|       doj|
+---------+--------+---+----------+
|   dileep|     gog| 21|2016-01-01|
|  avishek| ganguly| 21|2016-01-02|
|  shreyas|       t| 20|2016-01-03|
+---------+--------+---+----------+

val temp2 = sqlContext.sql("select * from temptable where doj > cast('2016-01-02' as date)")
But when I try to see the result, it gives me:
temp2: org.apache.spark.sql.DataFrame = [firstname: string, lastname: string, age: string, doj: date]
And when I do:
temp2.show()
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
So I have tried your code and it works for me. I suspect the problem is in how you change the schema initially; it looks a bit off to me (granted, it's a little hard to read when you post it in a comment; you should update the question with the code instead).
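For illustration, here is a guess at the kind of schema change that produces exactly this error: stamping a new StructType onto the existing rows without converting the values. Spark stores DateType internally as an integer (days since the epoch), and because evaluation is lazy, nothing fails until show() materializes the rows and the String "2016-01-01" cannot be cast. This is a hypothetical reconstruction, not the asker's actual code:

import org.apache.spark.sql.types._

// Hypothetical mistake: the schema declares doj as DateType,
// but the underlying rows still contain Strings.
val schema = StructType(Seq(
  StructField("firstname", StringType, nullable = true),
  StructField("lastname",  StringType, nullable = true),
  StructField("age",       StringType, nullable = true),
  StructField("doj",       DateType,   nullable = true)  // rows still hold Strings
))
val broken = sqlContext.createDataFrame(df.rdd, schema)  // no value conversion happens here
broken.printSchema()  // the schema looks correct...
broken.show()         // ...but throws ClassCastException when rows are materialized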
Anyway, I have done it this way:
First, simulating your input:
val df = sc.parallelize(List(
  ("dileep", "gog", "21", "2016-01-01"),
  ("avishek", "ganguly", "21", "2016-01-02"),
  ("shreyas", "t", "20", "2016-01-03")
)).toDF("firstname", "lastname", "age", "doj")
Then:
import org.apache.spark.sql.functions._

val temp = df.withColumn("doj", to_date('doj))
temp.registerTempTable("temptable")
val temp2 = sqlContext.sql("select * from temptable where doj > cast('2016-01-02' as date)")
Doing temp2.show() reveals the expected result:
+---------+--------+---+----------+
|firstname|lastname|age|       doj|
+---------+--------+---+----------+
|  shreyas|       t| 20|2016-01-03|
+---------+--------+---+----------+
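As a side note, the same filter can be expressed without registering a table, through the DataFrame API. This is a sketch of an equivalent alternative (the name temp3 is mine), not something the answer above used:

// Equivalent filter using the DataFrame API instead of SQL:
import org.apache.spark.sql.functions._
val temp3 = temp.filter(col("doj") > lit("2016-01-02").cast("date"))
temp3.show()  // same single-row result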