Dataframe helpers
column_to_list(df, col_name)
Collect a column into a list of values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame | required |
col_name | str | Column to collect | required |
Returns:
Type | Description |
---|---|
List[Any] | List of values |
Source code in quinn/dataframe_helpers.py
create_df(spark, rows_data, col_specs)
Create a new DataFrame from the given data and column specs.
The returned DataFrame is created using the StructType and StructField classes provided by PySpark.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spark | SparkSession | SparkSession object | required |
rows_data | array-like | The data used to create the DataFrame | required |
col_specs | list of tuples | List of tuples containing the name and type of each field | required |
Returns:
Type | Description |
---|---|
DataFrame | A new DataFrame |
Source code in quinn/dataframe_helpers.py
print_athena_create_table(df, athena_table_name, s3location)
Generate the Athena create table statement for a given DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The pyspark.sql.DataFrame to use | required |
athena_table_name | str | The name of the Athena table to generate | required |
s3location | str | The S3 location of the Parquet data | required |
Returns:
Type | Description |
---|---|
None | None. |
Source code in quinn/dataframe_helpers.py
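To make the shape of such a statement concrete, here is a minimal pure-Python sketch that builds an Athena-style `CREATE EXTERNAL TABLE` statement from a list of `(name, type)` pairs. The helper `build_athena_ddl`, the sample columns, and the exact layout of the statement are assumptions for illustration. The quinn function works from a DataFrame and returns None, so its actual output may be formatted differently.

```python
from typing import List, Tuple


def build_athena_ddl(
    columns: List[Tuple[str, str]], athena_table_name: str, s3location: str
) -> str:
    """Hypothetical helper: build a CREATE EXTERNAL TABLE statement for Parquet data."""
    # One "`name` type" entry per column, separated by commas.
    column_lines = ",\n".join(f"    `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{athena_table_name}` (\n"
        f"{column_lines}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3location}'"
    )


# Assumed sample schema, table name, and bucket path.
ddl = build_athena_ddl(
    [("id", "int"), ("name", "string")],
    "my_table",
    "s3://my-bucket/path/",
)
print(ddl)
```

Returning the string instead of printing it keeps the sketch easy to test; wrapping the call in `print` gives the print-only behavior described above.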
show_output_to_df(show_output, spark)
Convert the text output of a `show` command into a Spark DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
show_output | str | String representing the output of a `show` command in Spark | required |
spark | SparkSession | SparkSession object | required |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame object containing the output of a `show` command in Spark |
Source code in quinn/dataframe_helpers.py
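The parsing step behind a conversion like this can be sketched in plain Python, without a Spark session. The helper `parse_show_output` below is hypothetical and is not quinn's implementation: it only splits the text into column names and rows, and leaves building the DataFrame aside. It assumes the usual `show()` layout with `+---+` border lines and `|`-delimited cells.

```python
from typing import List, Tuple


def parse_show_output(show_output: str) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: split `show()` text into column names and string rows."""
    # Keep only lines that carry cell values; skip the +---+ border lines.
    lines = [
        line.strip()
        for line in show_output.strip().splitlines()
        if line.strip().startswith("|")
    ]
    # Drop the outer pipes, then split on the inner ones and trim padding.
    parsed = [[cell.strip() for cell in line[1:-1].split("|")] for line in lines]
    return parsed[0], parsed[1:]


# Assumed sample text in the layout printed by `show()`.
show_output = """
+----+---+
|name|age|
+----+---+
|jose|  1|
|  li|  2|
+----+---+
"""

columns, rows = parse_show_output(show_output)
print(columns)
print(rows)
```

In this sketch every value comes back as a string, because the printed text carries no type information, and a cell that itself contains `|` would be split incorrectly.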
to_list_of_dictionaries(df)
Convert a Spark DataFrame to a list of dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The Spark DataFrame to convert. | required |
Returns:
Type | Description |
---|---|
List[Dict[str, Any]] | A list of dictionaries representing the rows in the DataFrame. |
Source code in quinn/dataframe_helpers.py
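The shape of the `List[Dict[str, Any]]` result can be illustrated in plain Python. This is only a sketch of the data shape, not quinn's code: the column names and rows below are assumed sample data standing in for a DataFrame's schema and its collected rows.

```python
from typing import Any, Dict, List, Tuple

# Stand-in for a DataFrame: column names plus collected rows (assumed data).
columns: List[str] = ["name", "age"]
rows: List[Tuple[Any, ...]] = [("jose", 1), ("li", 2)]

# Pair each value with its column name, producing one dictionary per row.
records: List[Dict[str, Any]] = [dict(zip(columns, row)) for row in rows]
print(records)
```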
two_columns_to_dictionary(df, key_col_name, value_col_name)
Collect two columns into a dictionary, using the first column as keys and the second as values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame | required |
key_col_name | str | Key column | required |
value_col_name | str | Value column | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | Dictionary with values |
Source code in quinn/dataframe_helpers.py
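The key/value mapping can be pictured with a plain-Python sketch. The `(key, value)` pairs below are assumed sample data standing in for the two collected columns; this is not quinn's implementation.

```python
from typing import Any, Dict, List, Tuple

# Stand-in for the two collected columns as (key, value) pairs (assumed data).
pairs: List[Tuple[str, Any]] = [("jose", 1), ("li", 2), ("jose", 3)]

# Build the dictionary; in a plain Python dict a repeated key keeps the last value.
mapping: Dict[str, Any] = {key: value for key, value in pairs}
print(mapping)
```

The repeated `"jose"` key shows why unique keys matter: a dictionary can hold only one value per key. How quinn treats duplicate keys is not stated here, so it is safest to deduplicate the key column before collecting.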