Spark session ext

create_df(spark, rows_data, col_specs)

Creates a new DataFrame from the given data and column specs.

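The schema is built with PySpark's StructType and StructField classes: each tuple in col_specs is unpacked into one StructField. This extension is deprecated; calling it emits a DeprecationWarning that recommends quinn.create_df() instead.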

Parameters:

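| Name | Type | Description | Default |
| --- | --- | --- | --- |
| spark | SparkSession | the SparkSession used to create the DataFrame | required |
| rows_data | list[tuple] | the data used to create the DataFrame, one tuple per row | required |
| col_specs | list[tuple] | list of tuples containing the name and type of the field, one tuple per column | required |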

Returns:

| Type | Description |
| --- | --- |
| DataFrame | a new DataFrame |
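As a minimal sketch of how the two arguments fit together, the example below builds a small DataFrame with the same StructField/StructType steps the documented function uses. The column names, sample rows, and the helper names `build_col_specs` and `build_example_df` are hypothetical and chosen only for illustration. The PySpark imports sit inside the helpers so the module loads even where Spark is not installed.

```python
# Each row is a tuple whose values line up with the column specs by position.
rows_data = [("jose", 1), ("li", 2), ("sam", None)]


def build_col_specs():
    """Return hypothetical (name, type, nullable) tuples, one per column."""
    from pyspark.sql.types import IntegerType, StringType  # local import: needs PySpark

    return [("name", StringType(), True), ("age", IntegerType(), True)]


def build_example_df(spark):
    """Create a DataFrame from rows_data using an explicit schema."""
    from pyspark.sql.types import StructField, StructType  # local import: needs PySpark

    # Every spec tuple is unpacked positionally into StructField(name, dataType, nullable).
    struct_fields = [StructField(*spec) for spec in build_col_specs()]
    return spark.createDataFrame(data=rows_data, schema=StructType(struct_fields))
```

With an active SparkSession named `spark`, `build_example_df(spark).show()` would display the three rows. The deprecation message in the source points to `quinn.create_df()` as the replacement; assuming it accepts the same three arguments, the equivalent call would be `quinn.create_df(spark, rows_data, build_col_specs())`.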

Source code in quinn/extensions/spark_session_ext.py
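from __future__ import annotations

import warnings

from pyspark.sql import DataFrame, SparkSession
from pyspark.sql.types import StructField, StructType


def create_df(
    spark: SparkSession,
    rows_data: list[tuple],
    col_specs: list[tuple],
) -> DataFrame: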
    """Creates a new DataFrame from the given data and column specs.

    The returned DataFrame is created using the StructType and StructField classes provided by PySpark.

    :param spark: the SparkSession used to create the DataFrame
    :type spark: SparkSession
    :param rows_data: the data used to create the DataFrame
    :type rows_data: list of tuples
    :param col_specs: list of tuples containing the name and type of the field
    :type col_specs: list of tuples
    :return: a new DataFrame
    :rtype: DataFrame
    """
    warnings.warn(
        "Extensions may be removed in the future versions of quinn. Please use `quinn.create_df()` instead",
        category=DeprecationWarning,
        stacklevel=2,
    )

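    # Unpack each spec tuple positionally into StructField: name, data type, then optional nullable.
    struct_fields = [StructField(*x) for x in col_specs]
    # Build the DataFrame with an explicit schema instead of relying on type inference.
    return spark.createDataFrame(data=rows_data, schema=StructType(struct_fields))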
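
The deprecation warning in the listing above can be observed without Spark. The sketch below uses a hypothetical stand-in, `deprecated_create_df_stub`, that emits the same warning with the same category and `stacklevel`, then captures it with the standard `warnings` module. The stub and `call_and_capture` are illustrative names and not part of quinn.

```python
import warnings


def deprecated_create_df_stub():
    """Hypothetical stand-in that warns the same way the extension does."""
    warnings.warn(
        "Extensions may be removed in the future versions of quinn. "
        "Please use `quinn.create_df()` instead",
        category=DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this function
    )
    return "stub result"


def call_and_capture():
    """Call the stub and return its result plus the warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is often filtered out by default, so force it on.
        warnings.simplefilter("always")
        result = deprecated_create_df_stub()
    return result, caught


result, caught = call_and_capture()
print(caught[0].category.__name__)
print(caught[0].message)
```

Because `stacklevel=2` is used, the reported file and line belong to the code that called the deprecated function, which makes the call sites that need migrating easier to find.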