How to cast a string datatype to date in a Spark SQL join condition: to_date not working and cast throwing an error

by Tejas   Last Updated May 15, 2019 17:26

I'm joining two tables in Redshift using PySpark. The join condition compares a date column in one table with a date stored as a string (varchar datatype) in the other table.

sales:

Sales    id    tx_date
123.5    1234  2019-02-15
154      1235  2019-02-14

dateinfo:

Date      Multiplier 
02/15/19  2
02/14/19  3

And my Spark SQL is:

data = spark.sql("""select s.sales, s.id, d.multiplier from sales s inner join
                    dateinfo d on s.tx_date = to_date(d.date, 'mm/dd/yy')""")

This returns empty values for the multiplier column:

Sales    id    multiplier
123.5    1234  
154      1235 
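A note on the format string: Spark 2.x date patterns follow Java's SimpleDateFormat, where lowercase mm means minutes and uppercase MM means months, so 'mm/dd/yy' never reads the month field. A quick sanity check, assuming an active SparkSession named spark:

spark.sql("select to_date('02/15/19', 'mm/dd/yy') as bad, to_date('02/15/19', 'MM/dd/yy') as good").show()
# bad may come back as 2019-01-15 on Spark 2.x (the month silently defaults to January),
# or as null/an error on newer Spark versions; good parses as 2019-02-15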

I've tried to cast the date:

data = spark.sql("""select s.sales, s.id, d.multiplier from sales s inner join
                        dateinfo d on s.tx_date = to_date(cast(unix_timestamp(d.date,'mm/dd/yy')))""")

This throws an error: AnalysisException: u'Invalid number of arguments for function cast;
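The cast error itself is a syntax problem: cast requires a target type, as in cast(expr as type). With the unix_timestamp route, something along these lines should at least parse, assuming the same tables and that d.date holds the 'MM/dd/yy' string:

data = spark.sql("""select s.sales, s.id, d.multiplier from sales s inner join
                        dateinfo d on s.tx_date = to_date(cast(unix_timestamp(d.date, 'MM/dd/yy') as timestamp))""")
# unix_timestamp returns seconds since the epoch (bigint); casting that to timestamp
# and wrapping in to_date recovers the calendar date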

How should I go about doing the date conversion in Spark SQL?
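A likely fix, sketched with the month pattern uppercased and tx_date also normalized to a date. This assumes tx_date is stored in the default yyyy-MM-dd format, in which case to_date needs no pattern:

data = spark.sql("""select s.sales, s.id, d.multiplier
                    from sales s
                    inner join dateinfo d
                      on to_date(s.tx_date) = to_date(d.date, 'MM/dd/yy')""")
# with both sides converted to date, '2019-02-15' matches '02/15/19'

If tx_date is already a date column, the extra to_date is harmless and the comparison works the same way.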


