[chore] Schema Processor Revamp Implementation Parent PR #35248

Open
wants to merge 108 commits into base: main

ankitpatel96
Contributor

Description:
This is a duplicate of #35213. I had to reopen it because GitHub Actions refused to trigger.

This is the schema processor as originally written by @MovieStoreGuy in #11547, refreshed and rebased. I've rewritten some parts and added a significant amount of testing throughout, which results in quite a few changes. I will be splitting this PR into many parts.

There are several features that need to be completed before this is ready for production use:

- This only supports Schema File Format 1.0.0. We need to support 1.1.0.
- Right now only downgrades work. Upgrades need some more work before they can be enabled.
- We should improve internal metrics and observability.

MovieStoreGuy and others added 30 commits July 31, 2024 16:11
As the schema processor improves, this package path changes to help make
sense of what it is trying to do.
To help validate the work of the schema transformer in a future PR,
these additional packages add functionality to verify / abstract
efforts.
A modifier allows mutating a signal from a previous version to the
current one and vice versa.
A revision handles applying the modifiers to incoming signals and
converting a signal to the next version.
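
A rough sketch of how a modifier and a revision could fit together, for readers skimming the commits. This is not the code in this PR: `Attributes`, `RenameModifier`, and `Revision` are hypothetical stand-ins, attributes are simplified to a string map, and it assumes the rename pairs inside one modifier never chain into each other.

```go
package main

import "fmt"

// Attributes is a hypothetical stand-in for a signal's attribute map.
type Attributes map[string]string

// RenameModifier is a hypothetical modifier holding old-name -> new-name pairs.
// Assumption: pairs do not chain (no "a"->"b" together with "b"->"c").
type RenameModifier struct {
	renames map[string]string
}

// Upgrade moves values from the old key to the new key, in place.
func (m RenameModifier) Upgrade(attrs Attributes) {
	for oldKey, newKey := range m.renames {
		if v, ok := attrs[oldKey]; ok {
			delete(attrs, oldKey)
			attrs[newKey] = v
		}
	}
}

// Downgrade is the inverse: values move from the new key back to the old key.
func (m RenameModifier) Downgrade(attrs Attributes) {
	for oldKey, newKey := range m.renames {
		if v, ok := attrs[newKey]; ok {
			delete(attrs, newKey)
			attrs[oldKey] = v
		}
	}
}

// Revision is a hypothetical container for the modifiers of one schema version.
type Revision struct {
	Version   string
	Modifiers []RenameModifier
}

// Upgrade applies the modifiers in declaration order.
func (r Revision) Upgrade(attrs Attributes) {
	for _, m := range r.Modifiers {
		m.Upgrade(attrs)
	}
}

// Downgrade applies the modifiers in reverse order so that it undoes Upgrade.
func (r Revision) Downgrade(attrs Attributes) {
	for i := len(r.Modifiers) - 1; i >= 0; i-- {
		r.Modifiers[i].Downgrade(attrs)
	}
}

func main() {
	rev := Revision{
		Version: "1.1.0",
		Modifiers: []RenameModifier{
			{renames: map[string]string{"k8s.cluster.name": "kubernetes.cluster.name"}},
		},
	}
	attrs := Attributes{"k8s.cluster.name": "prod"}

	rev.Upgrade(attrs)
	fmt.Println("after upgrade:", attrs)

	rev.Downgrade(attrs)
	fmt.Println("after downgrade:", attrs)
}
```

Running the modifiers in reverse on downgrade is a design choice of this sketch, made so that a downgrade undoes an upgrade step by step.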
A translation is an immutable representation of the schema translation
file. It can be applied to each incoming signal and is managed by
the schema manager.

The schema manager has a set of providers that allow the schema
translation to be fetched from a remote location.
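
As a rough illustration of the manager/provider split described above, and only under stated assumptions: `Provider`, `staticProvider`, `Manager`, and `Translation` are hypothetical names, the translation is reduced to the raw file content as a string, and an in-memory map stands in for the remote location. The actual types in this PR may look quite different.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Provider is a hypothetical interface for fetching a schema translation
// file by its schema URL.
type Provider interface {
	Retrieve(ctx context.Context, schemaURL string) (string, error)
}

// staticProvider is an in-memory stand-in for a remote location.
type staticProvider map[string]string

func (p staticProvider) Retrieve(_ context.Context, schemaURL string) (string, error) {
	content, ok := p[schemaURL]
	if !ok {
		return "", errors.New("schema not found: " + schemaURL)
	}
	return content, nil
}

// Manager is a hypothetical schema manager: it asks each provider in turn
// and caches the first successful result per schema URL.
type Manager struct {
	mu        sync.Mutex
	providers []Provider
	cache     map[string]string
}

func NewManager(providers ...Provider) *Manager {
	return &Manager{providers: providers, cache: make(map[string]string)}
}

// Translation returns the cached content for schemaURL, fetching it on first
// use. Go strings are immutable, so callers cannot alter the cached value.
// Holding the lock while fetching keeps the sketch short; it is not tuned
// for concurrent throughput.
func (m *Manager) Translation(ctx context.Context, schemaURL string) (string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()

	if content, ok := m.cache[schemaURL]; ok {
		return content, nil
	}
	err := errors.New("no providers configured")
	for _, p := range m.providers {
		content, perr := p.Retrieve(ctx, schemaURL)
		if perr != nil {
			err = perr // remember the most recent failure and try the next provider
			continue
		}
		m.cache[schemaURL] = content
		return content, nil
	}
	return "", err
}

func main() {
	const url = "https://opentelemetry.io/schemas/1.1.0"
	remote := staticProvider{url: "file_format: 1.0.0"}

	// The first provider knows nothing, so the manager falls through to the second.
	m := NewManager(staticProvider{}, remote)
	content, err := m.Translation(context.Background(), url)
	fmt.Println(content, err)
}
```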
@ankitpatel96 ankitpatel96 requested a review from a team September 17, 2024 16:00
@github-actions github-actions bot added the processor/schema Schema processor label Sep 17, 2024
@ankitpatel96 ankitpatel96 changed the title Schema Processor Revamp implementation [chore] Schema Processor Revamp Implementation Parent PR Sep 17, 2024
@tigrannajaryan
Member

> I will be splitting this PR into many parts.

@ankitpatel96 thank you for working on this. Are you looking for reviews on this PR, or should we wait until you split it into smaller ones?

@ankitpatel96
Contributor Author

ankitpatel96 commented Sep 18, 2024

> > I will be splitting this PR into many parts.
>
> @ankitpatel96 thank you for working on this. Are you looking for reviews on this PR, or should we wait until you split it into smaller ones?

Hi @tigrannajaryan, I am definitely not trying to get a review on this entire PR haha - I am in the process of splitting it into 3-4 PRs.

Part 1 - #35214
Part 2 - #35267

The rest of the code is largely done - I'm just working on splitting it into stacked PRs that make sense.

I am working up a mini design document so people have an idea of what they are looking at.
Will leave a comment on this PR when that's done!

@jsuereth
Contributor

jsuereth commented Sep 19, 2024

Edit: Added link to specification

One major question -> Is this not intended to support resource attribute changes?

I was going through looking for the intricate "fun" code to deal with that scenario, but I didn't see anything, nor do I see tests associated with it.

I think it's fine to disallow that for an initial implementation/improvement, but you may want to consider the architecture/design for when you need to apply attribute name changes on resources in addition to signals.

Specifically -> The shuffling of Data points out of a resource and into a new one is... fun. From my SDK implementation this was much simpler, as the SDK only allows one resource. For the Collector -> I think you'll need some robust code, checks and performance tests around that.

See https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/schemas/file_format_v1.1.0.md#all-section - specifically the notion that resource attributes can be renamed there, and that the all section must apply to resources too.

@ankitpatel96
Contributor Author

Hi Josh,
I have implemented attribute renames for the all block and the resource block, but I made a mistake: I didn't realize that each ResourceLogs struct has to have a unique Resource within a plog.Logs object (and the same goes for all the other signal types). I cut a ticket for it - #35305 - and I think this is definitely something we can support in the future. Thanks for bringing it up!
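
To make the uniqueness problem concrete, here is a minimal sketch under stated assumptions. It does not use the real pdata API: `ResourceLogs` here is a hypothetical simplified struct (string attributes, log records as plain strings), and `resourceKey`, `renameResourceAttrs`, and `mergeDuplicateResources` are illustrative helpers, not functions from this PR. It also assumes a rename never collides with a key already present on the same resource. The idea is that once a rename makes two resources identical, their records have to be folded into a single entry.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ResourceLogs is a hypothetical, simplified stand-in for one resource entry:
// a set of resource attributes plus the log records grouped under it.
type ResourceLogs struct {
	Resource map[string]string
	Records  []string
}

// resourceKey builds a deterministic identity string from the attributes.
func resourceKey(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%q=%q;", k, attrs[k])
	}
	return b.String()
}

// renameResourceAttrs applies old-name -> new-name renames to every resource.
// Assumption: the new name is not already present on the same resource.
func renameResourceAttrs(groups []ResourceLogs, renames map[string]string) {
	for i := range groups {
		for oldKey, newKey := range renames {
			if v, ok := groups[i].Resource[oldKey]; ok {
				delete(groups[i].Resource, oldKey)
				groups[i].Resource[newKey] = v
			}
		}
	}
}

// mergeDuplicateResources folds entries whose resources are identical into
// the first such entry, preserving record order.
func mergeDuplicateResources(groups []ResourceLogs) []ResourceLogs {
	index := make(map[string]int, len(groups))
	merged := make([]ResourceLogs, 0, len(groups))
	for _, g := range groups {
		key := resourceKey(g.Resource)
		if i, ok := index[key]; ok {
			merged[i].Records = append(merged[i].Records, g.Records...)
			continue
		}
		index[key] = len(merged)
		// Copy the records so later appends never write into the input slice.
		merged = append(merged, ResourceLogs{
			Resource: g.Resource,
			Records:  append([]string(nil), g.Records...),
		})
	}
	return merged
}

func main() {
	groups := []ResourceLogs{
		{Resource: map[string]string{"k8s.cluster.name": "prod"}, Records: []string{"log A"}},
		{Resource: map[string]string{"kubernetes.cluster.name": "prod"}, Records: []string{"log B"}},
	}
	// After this rename both entries describe the same resource.
	renameResourceAttrs(groups, map[string]string{"k8s.cluster.name": "kubernetes.cluster.name"})

	merged := mergeDuplicateResources(groups)
	fmt.Println(len(merged), merged[0].Records)
}
```

With real pdata types the identity check and the record moves would be more involved, which is where the checks and performance tests mentioned above come in.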
