Convert week 2020531 to unix_timestamp

Hello Scala gurus.

I am trying to convert week 53 (year: 2020, day: 1) to a unix_timestamp.

val miRdd = Seq(((202053+0)*10+1).toString()).toDF("init")
val fecha = miRdd.select(unix_timestamp($"init","yyyywwu"))
fecha.show(false)

The output is null:
Output 2020531

I have the same result with value 202101:
scala> val miRdd = Seq(((202101+0)*10+1).toString()).toDF("init")
miRdd: org.apache.spark.sql.DataFrame = [init: string]

scala> val fecha = miRdd.select(unix_timestamp($"init","yyyywwu"))
fecha: org.apache.spark.sql.DataFrame = [unix_timestamp(init, yyyywwu): bigint]

scala> fecha.show(false)

+-----------------------------+
|unix_timestamp(init, yyyywwu)|
+-----------------------------+
|null                         |
+-----------------------------+

However, if I try with the value 202102, I get a converted value:
scala> val miRdd = Seq(((202102+0)*10+1).toString()).toDF("init")
miRdd: org.apache.spark.sql.DataFrame = [init: string]

scala> val fecha = miRdd.select(unix_timestamp($"init","yyyywwu"))
fecha: org.apache.spark.sql.DataFrame = [unix_timestamp(init, yyyywwu): bigint]

scala> fecha.show(false)

+-----------------------------+
|unix_timestamp(init, yyyywwu)|
+-----------------------------+
|1609740000                   |
+-----------------------------+

Is there a problem with weeks 53 and 01?
I use Scala version 2.11.12

Thank you for your support.
Best regards, Victor.

Hi.

This forum is mainly for questions regarding Scala, the language. Your question is specifically about Spark, a library written in Scala.

There are usually not many people here with Spark experience, so I’d suggest asking elsewhere. As far as I can see from the docs (https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html), there is no w or u pattern letter… ?!

As you can see, I am no expert here. Better to ask on Stack Overflow or a Spark forum.

The problem may be with Java’s GregorianCalendar. I can reproduce your last example with the following code:
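import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

static void test2() {
    long goal = 1609740000L; // the value Spark returned for 2021021 above
    GregorianCalendar gc = new GregorianCalendar(2021, 0, 1);
    gc.setTimeZone(TimeZone.getTimeZone("GMT-6"));
    gc.setWeekDate(2021, 2, Calendar.SUNDAY); // Sunday of week 2 of week year 2021
    gc.add(Calendar.DAY_OF_YEAR, 1);          // step forward one day, to Monday
    System.out.println("Unix Time Stamp?: " + gc.getTimeInMillis() / 1000);
}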

However, setting WEEK_OF_YEAR to 53 and WeekYear to 2020 in the above code:
gc.setWeekDate(2020, 53, Calendar.SUNDAY);
produces mostly garbage EXCEPT for DAY_OF_YEAR: 363. It is entirely possible that the 53rd week of 2020 (the first week of 2021) starts on the 363rd day of 2020, but I am not sure of that.
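The day-of-year guess can be checked with java.time under ISO-8601 week rules (weeks start on Monday, and week 1 needs at least four days in the new year). This is only a sketch: the object name Day363Check is made up for illustration, and ISO rules are an assumption that may not match the locale rules GregorianCalendar or Spark applied here.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.IsoFields

object Day363Check {
  // Day 363 of the leap year 2020.
  val date: LocalDate = LocalDate.ofYearDay(2020, 363)              // 2020-12-28
  val dayOfWeek: DayOfWeek = date.getDayOfWeek                      // MONDAY
  // ISO week number and the week-based year that week belongs to.
  val isoWeek: Int = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR)    // 53
  val isoWeekYear: Int = date.get(IsoFields.WEEK_BASED_YEAR)        // 2020
}
```

So under ISO rules, day 363 of 2020 is Monday 28 December, which is indeed the first day of week 53 of 2020.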

GregorianCalendar is the Swiss Army Knife of date and time classes, and this is the first time it has failed me; 53 is a valid input to WEEK_OF_YEAR. Consider checking out the Java documentation ([https://www.oracle.com/java/technologies/javase-jre8-downloads.html]) for GregorianCalendar on the Oracle website. Also see Java’s other date and time classes. These are relatively new, and very, very nice. I read a tutorial on them, and I may have found it on the Oracle website.

This depends on your environment. What does java.util.Locale.getDefault return?

For example, with a German locale, there is no 53rd week in 2021. (The 1st week runs from 4th to 10th January.)
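To see how much the locale matters, java.time exposes a locale's week rules through WeekFields.of(locale). The sketch below finds the first day of week 1 of a week-based year under those rules. The object and method names are hypothetical, and the results depend on the locale data shipped with your JDK.

```scala
import java.time.LocalDate
import java.time.temporal.WeekFields
import java.util.Locale

object LocaleWeeks {
  // First day of week 1 of the given week-based year, using the locale's
  // first-day-of-week and minimal-days-in-first-week rules.
  def firstDayOfWeekOne(weekYear: Int, locale: Locale): LocalDate = {
    val wf = WeekFields.of(locale)
    LocalDate.of(weekYear, 6, 15)              // mid-June is safely inside the week-based year
      .`with`(wf.weekOfWeekBasedYear(), 1L)    // jump to week 1, keeping the day of week
      .`with`(wf.dayOfWeek(), 1L)              // 1 = the locale's first day of the week
  }
}
```

With Locale.GERMANY this gives 2021-01-04 for 2021, matching the 4th to 10th January range above. With Locale.US it gives 2020-12-27, so the same week number points at different dates depending on the locale.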

And why are you using Sunday as the day-of-week? AFAICS, 1 means Monday. (u Day number of week (1 = Monday, ..., 7 = Sunday) )

BTW, java.time was introduced in Java 8 almost 7 years ago. You should not use the old Java Date / Calendar classes anymore. See here: https://programminghints.com/2017/05/still-using-java-util-date-dont/
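For what it is worth, here is a minimal java.time sketch of the conversion the original question asks for, using ISO-8601 week dates (day 1 = Monday). The object and method names are made up for illustration, and the GMT-6 offset in the usage note is borrowed from the Java snippet earlier in the thread, so it is an assumption about the environment. ISO numbering is also not necessarily what Spark 2 applied.

```scala
import java.time.{LocalDate, ZoneId}
import java.time.temporal.{ChronoField, IsoFields}

object IsoWeekDate {
  // Epoch seconds at the start of the given ISO week date in the given zone.
  // dayOfWeek follows ISO numbering: 1 = Monday ... 7 = Sunday.
  def toEpochSeconds(weekYear: Int, week: Int, dayOfWeek: Int, zone: ZoneId): Long =
    LocalDate.of(weekYear, 1, 4)                                  // 4 January is always in ISO week 1
      .`with`(IsoFields.WEEK_OF_WEEK_BASED_YEAR, week.toLong)     // move to the requested week
      .`with`(ChronoField.DAY_OF_WEEK, dayOfWeek.toLong)          // move to the requested day
      .atStartOfDay(zone)
      .toEpochSecond
}
```

For example, IsoWeekDate.toEpochSeconds(2020, 53, 1, ZoneOffset.UTC) is 1609113600 (2020-12-28T00:00:00Z). Note that under ISO rules it is week 1 of 2021, not week 2, that starts on 4 January: IsoWeekDate.toEpochSeconds(2021, 1, 1, ZoneOffset.ofHours(-6)) is 1609740000, the value Spark returned for 2021021.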

Thank you, cbley, for your replies.

Do you think Spark doesn't know how to handle week 53?

Best regards, Victor.

I just took a quick look and it seems they have removed week-based time parsing support in version 3.x, because of the inherent inaccuracies arising from different handling depending on the locale or whatnot.

Looking at the Spark 2 docs, they say that they simply use the SimpleDateFormat.

This code:

import java.text.SimpleDateFormat
import java.time.Instant
import java.util.Date

val formatter = new SimpleDateFormat("ww")
val date = Instant.parse("2020-12-28T00:00:00Z")

formatter.format(Date.from(date)) // returns "01"

seems to suggest that 53 is not a valid week number for this date, or at least is ambiguous since it’s also the first week of 2021.
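The result above depends on the JVM's default locale and time zone. A small variation that pins both makes the ambiguity visible. This is only a sketch: the object and method names are hypothetical, and the pattern uses 'Y' (week year) instead of 'y' (calendar year) so the year printed is the one the week number belongs to.

```scala
import java.text.SimpleDateFormat
import java.time.Instant
import java.util.{Date, Locale, TimeZone}

object WeekYearFormat {
  // Formats an instant as "<week year>-<week of year>" in UTC,
  // using the week rules of the given locale.
  def weekLabel(instant: Instant, locale: Locale): String = {
    val fmt = new SimpleDateFormat("YYYY-ww", locale)   // 'Y' = week year, 'w' = week in year
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.format(Date.from(instant))
  }
}
```

For 2020-12-28T00:00:00Z this returns "2021-01" with Locale.US and "2020-53" with Locale.GERMANY, so whether week 53 of 2020 exists at all depends on the locale the formatter runs under.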