Class

com.github.mrpowers.spark.daria.sql.DataFrameExt

DataFrameMethods

implicit class DataFrameMethods extends AnyRef

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new DataFrameMethods(df: DataFrame)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def columnDiff(otherDF: DataFrame): Seq[String]

    Returns the columns in otherDF that aren't in the underlying DataFrame.
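
    A minimal usage sketch (the column names in the comment are illustrative, not from the source):

    val missingColumns = sourceDF.columnDiff(otherDF) // e.g. Seq("city", "country")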

  7. def composeTrans(customTransforms: List[CustomTransform]): DataFrame

    Executes a list of transformations wrapped in CustomTransform objects, using function composition.
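
    A minimal sketch, assuming ExampleTransforms.withGreeting() adds a "greeting" column (as in the trans() example below):

    val customTransforms = List(
      CustomTransform(
        transform = ExampleTransforms.withGreeting(),
        addedColumns = Seq("greeting")
      )
    )

    sourceDF.composeTrans(customTransforms)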

  8. def composeTransforms(transforms: (DataFrame) ⇒ DataFrame*): DataFrame

    Executes a list of custom DataFrame transformations, using function composition.

    def withGreeting()(df: DataFrame): DataFrame = { df.withColumn("greeting", lit("hello world")) }

    def withCat(name: String)(df: DataFrame): DataFrame = { df.withColumn("cats", lit(name + " meow")) }

    sourceDF.composeTransforms(withGreeting(), withCat("sandy"))

  9. def composeTransforms(transforms: List[(DataFrame) ⇒ DataFrame]): DataFrame

    Executes a list of custom DataFrame transformations, using function composition.

    def withGreeting()(df: DataFrame): DataFrame = { df.withColumn("greeting", lit("hello world")) }

    def withCat(name: String)(df: DataFrame): DataFrame = { df.withColumn("cats", lit(name + " meow")) }

    val transforms = List( withGreeting()(_), withCat("sandy")(_) )

    sourceDF.composeTransforms(transforms)

  10. def containsColumn(structField: StructField): Boolean

    Returns true if the DataFrame contains the StructField.

    sourceDF.containsColumn(StructField("team", StringType, true))

    Returns true if sourceDF contains the StructField and false otherwise.

  11. def containsColumn(colName: String): Boolean

    Returns true if the DataFrame contains the column.

    sourceDF.containsColumn("team")

    Returns true if sourceDF contains a column named "team" and false otherwise.

  12. def containsColumns(colNames: String*): Boolean

    Returns true if the DataFrame contains all the specified columns.

    sourceDF.containsColumns("team", "city")

    Returns true if sourceDF contains the "team" and "city" columns and false otherwise.

  13. def dropColumns(f: (String) ⇒ Boolean): DataFrame

    Drops all columns whose names satisfy a predicate function.

    Here is how to drop all columns that start with an underscore:

    df.dropColumns(_.startsWith("_"))

  14. def dropNestedColumn(fullColumnName: String): DataFrame

    Drops a nested column by specifying its full name (for example foo.bar).
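
    A minimal usage sketch, assuming foo is a struct column with a nested bar field:

    val prunedDF = df.dropNestedColumn("foo.bar")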

  15. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. def flattenSchema(delimiter: String = "."): DataFrame

    Converts all the StructType columns to regular columns.

    This StackOverflow answer provides a detailed description of how to use flattenSchema: https://stackoverflow.com/a/50402697/1125159
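
    A minimal sketch, assuming person is a StructType column with name and age fields:

    val flatDF = df.flattenSchema(delimiter = "_")

    The flattened DataFrame would expose the nested fields as top-level person_name and person_age columns.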

  19. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  20. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. def killDuplicates(): DataFrame

    Completely removes all duplicates from a DataFrame

  23. def killDuplicates(col1: String, cols: String*): DataFrame

    Completely removes all duplicates from a DataFrame
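
    A minimal usage sketch (the column names are illustrative, not from the source):

    val cleanedDF = df.killDuplicates("firstName", "lastName")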

  24. def killDuplicates(cols: Column*): DataFrame

    Completely removes all duplicates from a DataFrame

  25. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. final def notify(): Unit

    Definition Classes
    AnyRef
  27. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  28. def printSchemaInCodeFormat(): Unit

    Prints the schema with StructType and StructFields so it's easy to copy into code.

    Spark has a printSchema method to print the schema of a DataFrame and a schema method that returns a StructType object.

    The Dataset#schema method can be easily converted into working code for small DataFrames, but it can be a lot of manual work for DataFrames with a lot of columns.

    The printSchemaInCodeFormat DataFrame extension prints the DataFrame schema as a valid StructType object.

    Suppose you have the following sourceDF:

    +--------+--------+---------+
    |    team|   sport|goals_for|
    +--------+--------+---------+
    |    jets|football|       45|
    |nacional|  soccer|       10|
    +--------+--------+---------+
    
    `sourceDF.printSchemaInCodeFormat()` will output the following rows in the console:
    
    StructType(
      List(
        StructField("team", StringType, true),
        StructField("sport", StringType, true),
        StructField("goals_for", IntegerType, true)
      )
    )
  29. def renameColumns(f: (String) ⇒ String): DataFrame

    Renames all the columns with a function that is applied to each column name.

    Here is how to lowercase all the columns:

    df.renameColumns(_.toLowerCase)

    Here is how to trim all the columns:

    df.renameColumns(_.trim)

  30. def reorderColumns(colNames: Seq[String]): DataFrame

    Reorders the columns in a DataFrame as specified.

    val actualDF = sourceDF.reorderColumns(
      Seq("greeting", "team", "cats")
    )

    The actualDF will have the greeting column first, then the team column, then the cats column.

  31. def setNullableForAllColumns(nullable: Boolean): DataFrame

    Sets the nullable property of all the columns to the given value.
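
    For example, to mark every column as non-nullable:

    val nonNullableDF = df.setNullableForAllColumns(false)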

  32. def structureSchema(delimiter: String = "_"): DataFrame

    This method is the opposite of flattenSchema. For example, if you have a flat DataFrame with snake case columns, it will convert it to a DataFrame with nested columns.

    From:

    root
     |-- person_id: long (nullable = true)
     |-- person_name: string (nullable = true)
     |-- person_surname: string (nullable = true)

    To:

    root
     |-- person: struct (nullable = false)
     |    |-- name: string (nullable = true)
     |    |-- surname: string (nullable = true)
     |    |-- id: long (nullable = true)
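
    A minimal usage sketch producing the nested schema above (with the default "_" delimiter):

    val nestedDF = df.structureSchema()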

  33. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  34. def toString(): String

    Definition Classes
    AnyRef → Any
  35. def trans(customTransform: CustomTransform): DataFrame

    Like transform(), but for CustomTransform objects.

    Enables you to specify the columns that should be added / removed by a custom transformation and errors out if the columns that are actually added / removed are different.

    val actualDF = sourceDF
      .trans(
        CustomTransform(
          transform = ExampleTransforms.withGreeting(),
          addedColumns = Seq("greeting"),
          requiredColumns = Seq("something")
        )
      )
      .trans(
        CustomTransform(
          transform = ExampleTransforms.withCat("spanky"),
          addedColumns = Seq("cats")
        )
      )
      .trans(
        CustomTransform(
          transform = ExampleTransforms.dropWordCol(),
          removedColumns = Seq("word")
        )
      )

  36. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  38. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. def withColumnCast(columnName: String, newType: DataType): DataFrame

    Returns a new DataFrame with the column columnName cast as newType.

    columnName
      the column to cast

    newType
      the new type for columnName
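
    A minimal usage sketch, assuming an age column that should be an integer:

    import org.apache.spark.sql.types.IntegerType

    val castedDF = df.withColumnCast("age", IntegerType)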

  40. def withColumnCast(columnName: String, newType: String): DataFrame

    Returns a new DataFrame with the column columnName cast as newType.

    columnName
      the column to cast

    newType
      the new type for columnName
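
    A minimal usage sketch, assuming this overload accepts a Spark SQL type name such as "int":

    val castedDF = df.withColumnCast("age", "int")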
