object functions

Package: com.github.mrpowers.spark.daria.sql

Spark [has a ton of SQL functions](https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/functions.html) and spark-daria is meant to fill in any gaps.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def antiTrim(col: Column): Column

    Removes all inner whitespace but leaves leading and trailing whitespace intact (e.g. changes " this has some " to " thishassome ").

    val actualDF = sourceDF.withColumn(
      "some_string_anti_trimmed",
      antiTrim(col("some_string"))
    )

  5. def arrayExNull(cols: Column*): Column

    Like the array() function, but excludes null elements

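    The other examples on this page work against an assumed sourceDF; in the same spirit, here is a hedged sketch of arrayExNull (the c1 and c2 column names are hypothetical):

```scala
import org.apache.spark.sql.functions.col
import com.github.mrpowers.spark.daria.sql.functions.arrayExNull

// Collects c1 and c2 into an array column, skipping nulls, so a row
// with c1 = "a" and c2 = null yields ["a"] instead of ["a", null].
// sourceDF is assumed to exist with nullable string columns c1 and c2.
val actualDF = sourceDF.withColumn(
  "c1_c2_array",
  arrayExNull(col("c1"), col("c2"))
)
```
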
  6. def array_filter_nulls[T]()(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction

  7. def array_groupBy[T](f: (T) ⇒ Boolean)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction

  8. def array_map_ex_null[T](f: (T) ⇒ T)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction

  9. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  10. def beginningOfMonth(colName: String): Column

  11. def beginningOfMonthDate(col: Column): Column

  12. def beginningOfMonthTime(col: Column): Column

  13. def beginningOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column

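
    A hedged sketch of the date helpers above, assuming a sourceDF with a DateType column named some_date (the column name is hypothetical):

```scala
import org.apache.spark.sql.functions.col
import com.github.mrpowers.spark.daria.sql.functions._

val actualDF = sourceDF
  // first day of the month that some_date falls in
  .withColumn("month_start", beginningOfMonthDate(col("some_date")))
  // start of the week, with weeks assumed to end on Saturday (the default)
  .withColumn("week_start", beginningOfWeek(col("some_date")))
```
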
  14. def broadcastArrayContains[T](col: Column, broadcastedArray: Broadcast[Array[T]]): Column

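
    broadcastArrayContains is useful when a small lookup array should be shipped to the executors once rather than joined. A sketch, assuming an active SparkSession named spark; the lookup values and color column are hypothetical:

```scala
import org.apache.spark.sql.functions.col
import com.github.mrpowers.spark.daria.sql.functions.broadcastArrayContains

// Broadcast the small lookup array once, then test membership per row
// without shuffling a join.
val primaryColors = spark.sparkContext.broadcast(Array("red", "yellow", "blue"))

val actualDF = sourceDF.withColumn(
  "is_primary_color",
  broadcastArrayContains[String](col("color"), primaryColors)
)
```
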
  15. def bucketFinder(col: Column, buckets: Array[(Any, Any)], inclusiveBoundries: Boolean = false, lowestBoundLte: Boolean = false, highestBoundGte: Boolean = false): Column

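
    A hedged bucketFinder sketch, assuming a numeric age column; the bucket ranges and column name are hypothetical, and null bounds are assumed to mark open-ended ranges:

```scala
import org.apache.spark.sql.functions.col
import com.github.mrpowers.spark.daria.sql.functions.bucketFinder

// Maps each age into a bucket; inclusiveBoundries (spelled as in the
// API signature above) makes the bucket endpoints inclusive.
val actualDF = sourceDF.withColumn(
  "age_bucket",
  bucketFinder(
    col("age"),
    Array(
      (null, 17),
      (18, 64),
      (65, null)
    ),
    inclusiveBoundries = true
  )
)
```
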
  16. val capitalizeFully: UserDefinedFunction

    Like initcap, but factors in additional delimiters

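    A heavily hedged sketch of capitalizeFully; the UDF's exact arguments aren't shown in this listing, so this assumes it takes the string column plus a literal of delimiter characters (the column name and delimiter set are hypothetical — verify against the source):

```scala
import org.apache.spark.sql.functions.{col, lit}
import com.github.mrpowers.spark.daria.sql.functions.capitalizeFully

// Unlike initcap, which only capitalizes after whitespace, this is
// assumed to also capitalize after the listed delimiters (e.g. "/").
val actualDF = sourceDF.withColumn(
  "capitalized_path",
  capitalizeFully(col("messy_path"), lit("/"))
)
```
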
  17. def capitalizeFullyFun(colName: String, delimiters: String): Option[String]

  18. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. def dayOfWeekStr(col: Column): Column

  20. def endOfMonthDate(col: Column): Column

  21. def endOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column

  22. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  24. def excelEpochToDate(colName: String): Column

    Converts an Excel epoch to a date.

    Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToDate function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._

    val actualDF = testDF
      .withColumn("date", excelEpochToDate("excel_time"))

    actualDF.show()

    +-----------------+----------+
    |       excel_time|      date|
    +-----------------+----------+
    |43967.24166666666|2020-05-16|
    |33966.78333333378|1992-12-28|
    |43965.58383363244|2020-05-14|
    |33964.58393533934|1992-12-26|
    +-----------------+----------+
  25. def excelEpochToDate(col: Column): Column

    Converts an Excel epoch to a date.

    Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToDate function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._

    val actualDF = testDF
      .withColumn("date", excelEpochToDate(col("excel_time")))

    actualDF.show()

    +-----------------+----------+
    |       excel_time|      date|
    +-----------------+----------+
    |43967.24166666666|2020-05-16|
    |33966.78333333378|1992-12-28|
    |43965.58383363244|2020-05-14|
    |33964.58393533934|1992-12-26|
    +-----------------+----------+
  26. def excelEpochToTimestamp(colName: String): Column

    Converts an Excel epoch to a timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html).

    Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToTimestamp function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._

    val actualDF = testDF
      .withColumn("timestamp", excelEpochToTimestamp("excel_time"))

    actualDF.show()

    +-----------------+-------------------+
    |       excel_time|          timestamp|
    +-----------------+-------------------+
    |43967.24166666666|2020-05-16 05:47:59|
    |33966.78333333378|1992-12-28 18:48:00|
    |43965.58383363244|2020-05-14 14:00:43|
    |33964.58393533934|1992-12-26 14:00:52|
    +-----------------+-------------------+
  27. def excelEpochToTimestamp(col: Column): Column

    Converts an Excel epoch to a timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html).

    Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToTimestamp function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._

    val actualDF = testDF
      .withColumn("timestamp", excelEpochToTimestamp(col("excel_time")))

    actualDF.show()

    +-----------------+-------------------+
    |       excel_time|          timestamp|
    +-----------------+-------------------+
    |43967.24166666666|2020-05-16 05:47:59|
    |33966.78333333378|1992-12-28 18:48:00|
    |43965.58383363244|2020-05-14 14:00:43|
    |33964.58393533934|1992-12-26 14:00:52|
    +-----------------+-------------------+
  28. def excelEpochToUnixTimestamp(colName: String): Column

    Converts an Excel epoch to a unix timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html).

    Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToUnixTimestamp function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._

    val actualDF = testDF
      .withColumn("unix_timestamp", excelEpochToUnixTimestamp("excel_time"))

    actualDF.show()

    +-----------------+--------------------+
    |       excel_time|      unix_timestamp|
    +-----------------+--------------------+
    |43967.24166666666|1.5896080799999995E9|
    |33966.78333333378| 7.255684800000383E8|
    |43965.58383363244|1.5894648432258427E9|
    |33964.58393533934| 7.253784520133189E8|
    +-----------------+--------------------+
  29. def excelEpochToUnixTimestamp(col: Column): Column

    Converts an Excel epoch to a unix timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html).

    Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToUnixTimestamp function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._

    val actualDF = testDF
      .withColumn("unix_timestamp", excelEpochToUnixTimestamp(col("excel_time")))

    actualDF.show()

    +-----------------+--------------------+
    |       excel_time|      unix_timestamp|
    +-----------------+--------------------+
    |43967.24166666666|1.5896080799999995E9|
    |33966.78333333378| 7.255684800000383E8|
    |43965.58383363244|1.5894648432258427E9|
    |33964.58393533934| 7.253784520133189E8|
    +-----------------+--------------------+
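
    All three excelEpoch* conversions rest on the same arithmetic: an Excel serial number counts days (with a fractional part for the time of day) since 1899-12-30, and day 25569 on that scale is the unix epoch, 1970-01-01. A plain-Scala sketch of the unix-timestamp step:

```scala
// Subtract the 25569 days between the Excel epoch (1899-12-30) and the
// unix epoch (1970-01-01), then scale days to seconds (86400 s/day).
def excelEpochToUnix(excelTime: Double): Double =
  (excelTime - 25569.0) * 86400.0

// 43967.24166666666 maps to roughly 1.58960808e9 seconds,
// i.e. 2020-05-16 05:48 UTC
val unix = excelEpochToUnix(43967.24166666666)
```
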
  30. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  31. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  32. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  33. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  34. val isLuhnNumber: UserDefinedFunction

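
    Judging by its name, isLuhnNumber wraps the standard Luhn checksum used to validate identifiers like credit card numbers. A plain-Scala sketch of that checksum (the UDF itself is applied to a string column):

```scala
// Luhn check: walking from the right, double every second digit,
// subtract 9 from doubled digits above 9, and require the total to
// be a multiple of 10. Empty or non-numeric strings fail the check.
def isLuhn(s: String): Boolean =
  s.nonEmpty && s.forall(_.isDigit) && {
    val sum = s.reverse.zipWithIndex.map { case (c, i) =>
      val d = c.asDigit
      if (i % 2 == 1) { val doubled = d * 2; if (doubled > 9) doubled - 9 else doubled }
      else d
    }.sum
    sum % 10 == 0
  }

isLuhn("79927398713") // true: the classic valid Luhn example
```
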
  35. def multiEquals[T](value: T, cols: Column*)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): Column

    Returns true if multiple columns are all equal to a given value.

    Suppose we have the following sourceDF:

    +---+---+
    | s1| s2|
    +---+---+
    |cat|cat|
    |cat|dog|
    |pig|pig|
    +---+---+

    We can use the multiEquals function to see if multiple columns are equal to "cat".

    val actualDF = sourceDF.withColumn(
      "are_s1_and_s2_cat",
      multiEquals[String]("cat", col("s1"), col("s2"))
    )

    actualDF.show()

    +---+---+-----------------+
    | s1| s2|are_s1_and_s2_cat|
    +---+---+-----------------+
    |cat|cat|             true|
    |cat|dog|            false|
    |pig|pig|            false|
    +---+---+-----------------+
  36. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  37. def nextWeekday(col: Column): Column

  38. final def notify(): Unit

    Definition Classes
    AnyRef
  39. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  40. val regexp_extract_all: UserDefinedFunction

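
    Spark's built-in regexp_extract returns only the first match, while this UDF returns every match. The argument order isn't shown in this listing; since the companion regexp_extract_all_by_group_fun takes (pattern, text, ...), the sketch below assumes the pattern comes first (the column name and pattern are hypothetical — verify against the source):

```scala
import org.apache.spark.sql.functions.{col, lit}
import com.github.mrpowers.spark.daria.sql.functions.regexp_extract_all

// Assumed usage: extract every run of digits from some_string into
// an array column, rather than just the first match.
val actualDF = sourceDF.withColumn(
  "all_numbers",
  regexp_extract_all(lit("""\d+"""), col("some_string"))
)
```
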
  41. val regexp_extract_all_by_group: UserDefinedFunction

  42. def regexp_extract_all_by_group_fun(pattern: String, text: String, captureGroup: Int): Array[String]

  43. val regexp_extract_all_by_groups: UserDefinedFunction

  44. def regexp_extract_all_by_groups_fun(pattern: String, text: String, captureGroups: Seq[Int]): Array[Array[String]]

  45. def removeAllWhitespace(colName: String): Column

    Removes all whitespace in a string

    val actualDF = sourceDF.withColumn(
      "some_string_without_whitespace",
      removeAllWhitespace("some_string")
    )
    Since

    0.16.0

  46. def removeAllWhitespace(col: Column): Column

    Removes all whitespace in a string (e.g. changes "this has some" to "thishassome").

    val actualDF = sourceDF.withColumn(
      "some_string_without_whitespace",
      removeAllWhitespace(col("some_string"))
    )

    Since

    0.16.0

  47. def removeNonWordCharacters(col: Column): Column

    Removes all non-word characters from a string, excluding whitespace (e.g. changes " ni!!ce h^^air person " to " nice hair person ").

    val actualDF = sourceDF.withColumn(
      "some_string_remove_non_word_chars",
      removeNonWordCharacters(col("some_string"))
    )

  48. def singleSpace(col: Column): Column

    Replaces any run of whitespace in a string with a single space (e.g. changes "this  has     some" to "this has some").

    val actualDF = sourceDF.withColumn(
      "some_string_single_spaced",
      singleSpace(col("some_string"))
    )

  49. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  50. def toString(): String

    Definition Classes
    AnyRef → Any
  51. def truncate(col: Column, len: Int): Column

    Truncates StringType columns to the given length.

    sourceDF.withColumn(
      "some_string_truncated",
      truncate(col("some_string"), 3)
    )

    Truncates the "some_string" column to only have three characters.

  52. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  53. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  54. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  55. def yeardiff(end: Column, start: Column): Column

    Returns the number of years from start to end. Spark has a datediff function that calculates the number of days between two dates, but there isn't a built-in yeardiff function that calculates the number of years between two dates.

    The com.github.mrpowers.spark.daria.sql.functions.yeardiff function fills the gap. Let's see how it works!

    Suppose we have the following testDf

    +--------------------+--------------------+
    |      first_datetime|     second_datetime|
    +--------------------+--------------------+
    |2016-09-10 00:00:...|2001-08-10 00:00:...|
    |2016-04-18 00:00:...|2010-05-18 00:00:...|
    |2016-01-10 00:00:...|2013-08-10 00:00:...|
    |                null|                null|
    +--------------------+--------------------+

    We can run the yeardiff function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._
    
    val actualDf = testDf
      .withColumn("num_years", yeardiff(col("first_datetime"), col("second_datetime")))
    
    actualDf.show()
    
    +--------------------+--------------------+------------------+
    |      first_datetime|     second_datetime|         num_years|
    +--------------------+--------------------+------------------+
    |2016-09-10 00:00:...|2001-08-10 00:00:...|15.095890410958905|
    |2016-04-18 00:00:...|2010-05-18 00:00:...| 5.923287671232877|
    |2016-01-10 00:00:...|2013-08-10 00:00:...| 2.419178082191781|
    |                null|                null|              null|
    +--------------------+--------------------+------------------+

Deprecated Value Members

  1. def array_map[T](f: (T) ⇒ T)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction

    Annotations
    @deprecated
    Deprecated

    (Since version 0.38.2) Removing for Spark 3

  2. def exists[T](f: (T) ⇒ Boolean)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction

    Like the Scala Array exists method, but for ArrayType columns.

    Scala has an Array#exists function that works like this:

    Array(1, 2, 5).exists(_ % 2 == 0) // true

    Suppose we have the following sourceDF:

    +---------+
    |     nums|
    +---------+
    |[1, 4, 9]|
    |[1, 3, 5]|
    +---------+

    We can use the spark-daria exists function to see if there are even numbers in the arrays in the nums column.

    val actualDF = sourceDF.withColumn(
      "nums_has_even",
      exists[Int]((x: Int) => x % 2 == 0).apply(col("nums"))
    )
    
    actualDF.show()
    
    +---------+-------------+
    |     nums|nums_has_even|
    +---------+-------------+
    |[1, 4, 9]|         true|
    |[1, 3, 5]|        false|
    +---------+-------------+
    Annotations
    @deprecated
    Deprecated

    (Since version 0.38.2) Removing for Spark 3

  3. def forall[T](f: (T) ⇒ Boolean)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction

    Like the Scala Array forall method, but for ArrayType columns.

    Scala has an Array#forall function that works like this:

    Array("catdog", "crazy cat").forall(_.contains("cat")) // true

    Suppose we have the following sourceDF:

    +------------+
    |       words|
    +------------+
    |[snake, rat]|
    |[cat, crazy]|
    +------------+

    We can use the spark-daria forall function to see if all the strings in an array begin with "c".

    val actualDF = sourceDF.withColumn(
      "all_words_begin_with_c",
      forall[String]((x: String) => x.startsWith("c")).apply(col("words"))
    )
    
    actualDF.show()
    
    +------------+----------------------+
    |       words|all_words_begin_with_c|
    +------------+----------------------+
    |[snake, rat]|                 false|
    |[cat, crazy]|                  true|
    +------------+----------------------+
    Annotations
    @deprecated
    Deprecated

    (Since version 0.38.2) Removing for Spark 3

  4. def transform(column: Column, f: (Column, Column) ⇒ Column): Column

    Returns an array of elements after applying a transformation to each element in the input array together with its index.

    Suppose we have the following sourceDF:

    +---------+
    |     nums|
    +---------+
    |[1, 2, 3]|
    +---------+

    val actualDF = sourceDF.withColumn(
      "idx_sum", transform(col("nums"), (x, i) => x + i)
    )
    
    actualDF.show()
    
    +---------+---------+
    |     nums|  idx_sum|
    +---------+---------+
    |[1, 2, 3]|[1, 3, 5]|
    +---------+---------+
    Annotations
    @deprecated
    Deprecated

    (Since version 0.38.2) Removing for Spark 3

  5. def transform(column: Column, f: (Column) ⇒ Column): Column

    Returns an array of elements after applying a transformation to each element in the input array.

    Suppose we have the following sourceDF:

    +---------+
    |     nums|
    +---------+
    |[1, 4, 9]|
    |[1, 3, 5]|
    +---------+

    val actualDF = sourceDF.withColumn(
      "squared", transform(col("nums"), x => x * x)
    )
    
    actualDF.show()
    
    +---------+-----------+
    |     nums|    squared|
    +---------+-----------+
    |[1, 4, 9]|[1, 16, 81]|
    |[1, 3, 5]| [1, 9, 25]|
    +---------+-----------+
    Annotations
    @deprecated
    Deprecated

    (Since version 0.38.2) Removing for Spark 3

Inherited from AnyRef

Inherited from Any

Collection functions

Date time functions

Misc functions

String functions

Support functions for DataFrames