[Coral-Hive] [Coral-Trino] Make named_struct a Coral IR operator and Migrate GenericProject Function #431

Merged (10 commits) on Jul 10, 2023

@aastha25 commented Jun 27, 2023

What changes are proposed in this pull request, and why are they necessary?

This PR covers two migrations: (1) named_struct() and (2) generic_project().

[1]
This PR uses code changes from open PR #412 and adds minor modifications on top of it to be compatible with the new API.

Summary from PR #412:

This patch removes the transformation from HiveConvertletTable that converts named_struct to CAST (ROW() AS ROW()). Instead, it makes named_struct a Coral IR operator. Engine translations on the RHS are also adapted to accommodate this change. This also eliminates the need to rewrite from CAST (ROW() AS ROW()) to named_struct on the Spark side, because named_struct is now maintained all along. CastToNamedStructTransformer on the Spark side will be removed in a future PR.

This PR also introduces a Trino transformer, NamedStructToCastTransformer, which converts the Coral IR operator named_struct to its Trino-compatible equivalent.
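
To make the target shape concrete, here is a minimal, string-level sketch of the rewrite from named_struct(...) to CAST(ROW(...) AS ROW(...)). It is not the code in NamedStructToCastTransformer, which operates on SqlCall nodes; the class NamedStructToCastSketch, its Field holder, and the toTrinoCast method are hypothetical names invented for illustration, and the field types are assumed to be already known as Trino type strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Coral's NamedStructToCastTransformer. */
class NamedStructToCastSketch {

  /** One struct field: its name, the SQL text of its value, and an assumed Trino type string. */
  static final class Field {
    final String name;
    final String valueSql;
    final String trinoType;

    Field(String name, String valueSql, String trinoType) {
      this.name = name;
      this.valueSql = valueSql;
      this.trinoType = trinoType;
    }
  }

  /**
   * Builds the Trino expression for a struct whose Hive form would be
   * named_struct('name1', value1, 'name2', value2, ...).
   */
  static String toTrinoCast(List<Field> fields) {
    if (fields.isEmpty()) {
      throw new IllegalArgumentException("a struct needs at least one field");
    }
    List<String> values = new ArrayList<>();
    List<String> typedNames = new ArrayList<>();
    for (Field f : fields) {
      values.add(f.valueSql);                     // goes into ROW(...)
      typedNames.add(f.name + " " + f.trinoType); // goes into AS ROW(...)
    }
    return "CAST(ROW(" + String.join(", ", values) + ") AS ROW("
        + String.join(", ", typedNames) + "))";
  }
}
```

Under these assumptions, the Hive expression named_struct('a', 1, 'b', 'x') with field types integer and varchar would come out as CAST(ROW(1, 'x') AS ROW(a integer, b varchar)). The field names move from the argument list into the target row type, which is why the Trino side needs a type string for every field.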

This PR should address #357 and also unblock the migration of the CONCAT operator in #378.

[2]
This PR also migrates the Rel transformer GenericProjectToTrinoConverter to a SqlCall transformer, GenericProjectTransformer.

How was this patch tested?

./gradlew build
Updated and added unit tests
Tested with production views for Spark, Avro, and Trino

@aastha25 changed the title from "[Coral-Hive] Make named_struct a Coral IR operator" to "[Coral-Hive] [Coral-Trino] Make named_struct a Coral IR operator and Migrate GenericProject Function" on Jun 29, 2023

@yiqiangin left a comment


LGTM

@@ -21,7 +21,7 @@
  * If a column, colA, has a RelDataType, relDataTypeA, with a Trino type string, trinoTypeStringA = buildStructDataTypeString(relDataTypeA),
  * then the following operation is syntactically and semantically correct in Trino: CAST(colA as trinoTypeStringA)
  */
-class RelDataTypeToTrinoTypeStringConverter {
+public class RelDataTypeToTrinoTypeStringConverter {

May I know why these three classes, including TrinoMapTransformValuesFunction and TrinoStructCastRowFunction, were made public? I don't see any usage of these classes in this PR.

Contributor Author


These classes are used in GenericProjectTransformer. Previously, GenericProjectTransformer was in the same package as RelDataTypeToTrinoTypeStringConverter, but it has now moved to another package, so package-private access no longer suffices.

@aastha25 merged commit 633474f into linkedin:master on Jul 10, 2023
1 check passed
@aastha25 deleted the nestedStruct2 branch on July 14, 2023 21:51