UDF/UDTF/UDAF

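UDF: A user-defined function works on a single row of a table and produces a single value as output, i.e. one output row per input row. e.g. Hive's built-in UPPER() function.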
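A minimal Scala sketch of the one-row-in, one-value-out contract. The names `upperUdf` and `applyUdf` are hypothetical, and a real Hive UDF would extend Hive's `UDF` or `GenericUDF` class rather than being a plain Scala function; the point is only that applying a UDF never changes the row count.

```scala
// Hypothetical sketch: plain Scala stand-ins, not Hive's UDF/GenericUDF classes.
object UdfSketch {
  // A scalar UDF modelled on Hive's upper(): one input value -> one output value.
  // NULL in, NULL out, as SQL functions usually behave.
  def upperUdf(s: String): String =
    if (s == null) null else s.toUpperCase(java.util.Locale.ROOT)

  // Applying the UDF to a column keeps the row count: N rows in, N rows out.
  def applyUdf(column: Seq[String]): Seq[String] = column.map(upperUdf)

  def main(args: Array[String]): Unit =
    println(applyUdf(Seq("alice", "bob"))) // List(ALICE, BOB)
}
```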

UDTF: A user-defined table-generating function takes one row as input and can return multiple rows as output, so the relationship is one-to-many. e.g. Hive's built-in EXPLODE() function. A UDTF can also split one column into multiple columns.
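A toy sketch of EXPLODE()-style behaviour, assuming an array column is modelled as a Scala `Seq` and a map column as a `Map`. The names `explodeArray`, `explodeMap` and `applyToTable` are hypothetical and are not Hive's `GenericUDTF` API. `flatMap` captures the one-to-many relationship, and the map case shows one input column turning into two output columns.

```scala
// Hypothetical sketch of explode()-style UDTFs; not Hive's GenericUDTF API.
object UdtfSketch {
  // explode(array): one input row holding an array -> one output row per element.
  // A NULL or empty array produces no rows.
  def explodeArray[A](arr: Seq[A]): Seq[A] =
    if (arr == null) Seq.empty else arr

  // explode(map): one input row -> one (key, value) row per entry, i.e. the
  // single map column is split into two output columns.
  def explodeMap[K, V](m: Map[K, V]): Seq[(K, V)] =
    if (m == null) Seq.empty else m.toSeq

  // Over a whole table the relationship is one-to-many, which flatMap expresses.
  def applyToTable[A](rows: Seq[Seq[A]]): Seq[A] =
    rows.flatMap(arr => explodeArray(arr))

  def main(args: Array[String]): Unit = {
    println(applyToTable(Seq(Seq("a", "b"), Seq("c"), Seq()))) // List(a, b, c)
    println(explodeMap(Map("k1" -> 1, "k2" -> 2)))
  }
}
```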

UDAF: A user-defined aggregate function works on multiple rows and produces a single row as output. e.g. Hive's built-in MAX() or COUNT() functions.
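A simplified, hypothetical model of how a UDAF turns many rows into one value. The `Udaf` trait and its methods are assumptions of this sketch that loosely mirror the iterate / merge / terminate steps of Hive's `GenericUDAFEvaluator`; they are not the real Hive interface. Aggregating each partition separately and then merging the partial buffers shows why an aggregate needs a merge step at all.

```scala
// Hypothetical, simplified UDAF model loosely following the iterate / merge /
// terminate steps of Hive's GenericUDAFEvaluator; not the real Hive API.
object UdafSketch {
  trait Udaf[In, Buf, Out] {
    def init: Buf                          // empty aggregation buffer
    def iterate(buf: Buf, value: In): Buf  // fold one input row into the buffer
    def merge(a: Buf, b: Buf): Buf         // combine two partial buffers
    def terminate(buf: Buf): Out           // produce the final single value
  }

  // MAX(): None stands for "no rows seen", i.e. SQL NULL.
  object MaxUdaf extends Udaf[Int, Option[Int], Option[Int]] {
    def init: Option[Int] = None
    def iterate(buf: Option[Int], value: Int): Option[Int] =
      Some(buf.fold(value)(b => math.max(b, value)))
    def merge(a: Option[Int], b: Option[Int]): Option[Int] = (a, b) match {
      case (Some(x), Some(y)) => Some(math.max(x, y))
      case _                  => a.orElse(b)
    }
    def terminate(buf: Option[Int]): Option[Int] = buf
  }

  // COUNT(): the buffer is a running row count.
  object CountUdaf extends Udaf[Int, Long, Long] {
    def init: Long = 0L
    def iterate(buf: Long, value: Int): Long = buf + 1
    def merge(a: Long, b: Long): Long = a + b
    def terminate(buf: Long): Long = buf
  }

  // Many rows in, one value out: each partition is aggregated on its own
  // (partial aggregation), then the partial buffers are merged and finalized.
  def run[In, Buf, Out](udaf: Udaf[In, Buf, Out], partitions: Seq[Seq[In]]): Out = {
    val partials = partitions.map(_.foldLeft(udaf.init)(udaf.iterate))
    udaf.terminate(partials.foldLeft(udaf.init)(udaf.merge))
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(3, 7), Seq(5))
    println(run(MaxUdaf, partitions))   // Some(7)
    println(run(CountUdaf, partitions)) // 3
  }
}
```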

HiveSessionCatalog is responsible for converting Hive functions into Spark (Catalyst) expressions.

HiveFunctionRegistry is used to look up function info for Hive built-in functions.

HiveSessionCatalog.makeFunctionBuilder creates a FunctionBuilder: a function that takes the argument expressions as a Seq[Expression] and returns an Expression. createTempFunction then registers the function (its info and builder) in the functionRegistry.

type FunctionBuilder = Seq[Expression] => Expression
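A toy Scala model of the idea behind makeFunctionBuilder. Everything here is a hypothetical stand-in: `Expression` and the wrapper case classes only borrow Spark's names, and `HiveFunctionKind` replaces the check Spark performs on the actual Hive function class. The sketch shows that the wrapper type is chosen up front, while the expression itself is only built once the argument expressions are supplied.

```scala
// Hypothetical stand-ins for Catalyst expressions and Spark's Hive wrapper
// expressions; only the names mirror Spark, the definitions are toys.
object BuilderSketch {
  sealed trait Expression
  final case class Literal(value: Any) extends Expression
  final case class HiveSimpleUDF(name: String, children: Seq[Expression]) extends Expression
  final case class HiveGenericUDF(name: String, children: Seq[Expression]) extends Expression
  final case class HiveUDAFFunction(name: String, children: Seq[Expression]) extends Expression
  final case class HiveGenericUDTF(name: String, children: Seq[Expression]) extends Expression

  type FunctionBuilder = Seq[Expression] => Expression

  // Hypothetical tag for "what kind of Hive function class is this?"
  sealed trait HiveFunctionKind
  case object SimpleUdf  extends HiveFunctionKind
  case object GenericUdf extends HiveFunctionKind
  case object Udaf       extends HiveFunctionKind
  case object Udtf       extends HiveFunctionKind

  // Choose the wrapper from the function kind now, but build the expression
  // only once the argument expressions (children) are known.
  def makeFunctionBuilder(name: String, kind: HiveFunctionKind): FunctionBuilder =
    (children: Seq[Expression]) => kind match {
      case SimpleUdf  => HiveSimpleUDF(name, children)
      case GenericUdf => HiveGenericUDF(name, children)
      case Udaf       => HiveUDAFFunction(name, children)
      case Udtf       => HiveGenericUDTF(name, children)
    }

  def main(args: Array[String]): Unit = {
    val builder = makeFunctionBuilder("my_udtf", Udtf)
    println(builder(Seq(Literal(1)))) // HiveGenericUDTF(my_udtf,List(Literal(1)))
  }
}
```

Returning a closure rather than an expression is what makes the builder reusable: the same registered builder can be applied to different argument lists at each call site.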

FunctionRegistry.lookupFunction retrieves the builder by function name and applies it to the argument expressions (the children) to produce the Spark expression, which is a HiveSimpleUDF, HiveGenericUDF, HiveUDAFFunction or HiveGenericUDTF depending on the kind of Hive function.
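A minimal name-to-builder registry illustrating the register-then-lookup flow. `ToyFunctionRegistry`, `register` and `lookup` are hypothetical names; Spark's real FunctionRegistry has a different and richer interface. Lower-casing function names is an assumption of this sketch, chosen because SQL function names are usually case-insensitive.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical toy registry showing the name -> builder idea; Spark's real
// FunctionRegistry has a different and richer interface.
object RegistrySketch {
  sealed trait Expression
  final case class Literal(value: Any) extends Expression
  final case class HiveSimpleUDF(name: String, children: Seq[Expression]) extends Expression

  type FunctionBuilder = Seq[Expression] => Expression

  final class ToyFunctionRegistry {
    private val builders = mutable.Map.empty[String, FunctionBuilder]

    // Plays the role of createTempFunction: remember the builder under a name.
    // Names are lower-cased here (an assumption of this sketch).
    def register(name: String, builder: FunctionBuilder): Unit =
      builders(name.toLowerCase(Locale.ROOT)) = builder

    // Plays the role of lookupFunction: find the builder by name, then apply it
    // to the argument expressions (the children) to get the final expression.
    def lookup(name: String, children: Seq[Expression]): Expression =
      builders.get(name.toLowerCase(Locale.ROOT)) match {
        case Some(builder) => builder(children)
        case None          => throw new NoSuchElementException(s"Undefined function: $name")
      }
  }

  def main(args: Array[String]): Unit = {
    val registry = new ToyFunctionRegistry
    registry.register("my_upper", children => HiveSimpleUDF("my_upper", children))
    println(registry.lookup("MY_UPPER", Seq(Literal("x")))) // HiveSimpleUDF(my_upper,List(Literal(x)))
  }
}
```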
