
Use ⊙ instead of * for creating weighted measures? #170

Open
cscherrer opened this issue Nov 7, 2021 · 11 comments

@cscherrer
Collaborator

@keorn pointed out that in Distributions, * and + behave like this:

julia> 3 * Dists.Normal()
Distributions.LocationScale{Float64, Distributions.Continuous, Distributions.Normal{Float64}}(
μ: 0.0
σ: 3.0
ρ: Distributions.Normal{Float64}(μ=0.0, σ=1.0)
)


julia> 3 + Dists.Normal()
Distributions.LocationScale{Float64, Distributions.Continuous, Distributions.Normal{Float64}}(
μ: 3.0
σ: 1.0
ρ: Distributions.Normal{Float64}(μ=0.0, σ=1.0)
)

This is very different from MeasureTheory, where

julia> density(3 * Normal(), 2.4) / density(Normal(), 2.4)
3.0

This issue is to consider changing this behavior, to minimize confusion for users coming from Distributions.

We currently allow for a "likelihood operating on a measure". We could potentially consider a scalar to work in a similar way, almost like a likelihood that always returns the given value.

Notes / Concerns

Currently for any constant k and measure μ we have

density(k * μ) = k * density(μ)

Under this change, this would become

density(k ⊙ μ) = k * density(μ)

Despite its common use in Distributions, it's a little strange from a type perspective to expect this to work. It feels a little like having a function f and wanting k * f to return a new function x -> k * f(x).
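As a toy sketch of that analogy (purely illustrative; neither Base nor any package defines this scale helper):

# Purely illustrative: "scaling a function" in the same spirit as scaling a density.
scale(k::Real, f) = x -> k * f(x)

g = scale(3, sin)
g(0.5) ≈ 3 * sin(0.5)    # true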

@keorn
Contributor

keorn commented Nov 7, 2021

I would say that an additional argument against using * for WeightedMeasure is that the operation of scaling a density is not commonly performed with the * operator. Having this operator, which is very common in numerical computing, be overloaded to perform this somewhat niche operation can cause confusion when reading code. I could see someone else being just as confused about its semantics here as we are about the semantics in Distributions.jl.

@cscherrer
Collaborator Author

Yep, I agree. In general, I just want to be sure to think through the semantics and any potential implications.

If we think of "lifting" the constants, we can treat them as constant (log-)likelihoods. So k ⊙ μ would give a new measure with

density(k ⊙ μ) = k * density(μ)
logdensity(k ⊙ μ) = log(k) + logdensity(μ)

Then * and + would give affine transformations. All of that seems fine.
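As a self-contained toy sketch of those semantics (the Weighted type, logdensity_sketch, and this ⊙ method are made up for illustration, not MeasureTheory's implementation):

# Hypothetical sketch: weighting a measure by k shifts its log-density by log(k).
struct Weighted{M}
    logweight::Float64
    base::M
end

⊙(k::Real, μ) = Weighted(log(k), μ)

# Stand-in base measure: a standard normal with its usual log-density
struct StdNormal end
logdensity_sketch(::StdNormal, x) = -(x^2 + log(2π)) / 2
logdensity_sketch(w::Weighted, x) = w.logweight + logdensity_sketch(w.base, x)

μ = StdNormal()
exp(logdensity_sketch(3 ⊙ μ, 2.4)) / exp(logdensity_sketch(μ, 2.4))    # ≈ 3.0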

I think the natural next question is, what if k is an array? If we allow this for scalars, the analogous thing for arrays seems natural.

Also, between two measures μ and ν, we currently have μ * ν for the product measure, and μ + ν for the superposition. So for example, k * μ would be very different from Dirac(k) * μ. But this is already very different; it would just be different in a different way :)

@keorn
Contributor

keorn commented Nov 8, 2021

Yeah, other usages of * and + are another can of worms. At PlantingSpace we prefer ⊗ for product measure and ⊕ for what you call superposition, to make things more distinct.

Also, I do not think there is a natural extension to arrays: an array currently does not have measure-theoretic semantics beyond being a possible support.

@mschauer
Member

mschauer commented Nov 8, 2021

I think we do not want to conflate *-multiplication of random variables c*X with *-multiplication of densities, so we perhaps won't follow Distributions.jl. But to avoid confusion, I think you are right that ⊙ makes sense.

@cscherrer
Collaborator Author

At PlantingSpace we prefer ⊗ for product measure and ⊕ for what you call superposition to make things more distinct.

I've considered this. I think product measure and superposition are the category-theoretic product and coproduct, in which case ⊗ and ⊕ make a lot of sense. The biggest concern I can see is that people also use ⊗ for the Kronecker product.

@mschauer
Member

mschauer commented Nov 8, 2021

The use of ⊗ for the Kronecker product seems unproblematic, given that both uses can be thought of as instances of a tensor product.
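For a concrete (if toy) illustration of the tensor-product view, not anything in MeasureTheory itself: for two discrete measures given by weight vectors, the weights of their product measure are exactly a Kronecker product.

using LinearAlgebra

w1 = [0.2, 0.8]          # weights of a measure on {a1, a2}
w2 = [0.5, 0.3, 0.2]     # weights of a measure on {b1, b2, b3}

# Product-measure weight of the pair (ai, bj) is w1[i] * w2[j]
W = [w1[i] * w2[j] for i in eachindex(w1), j in eachindex(w2)]

vec(permutedims(W)) ≈ kron(w1, w2)    # true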

@cscherrer
Collaborator Author

@mschauer what do you think of transitioning to using ⊗ and ⊕ in this way? Maybe we should talk to the Catlab folks about getting a core interface with some categorical primitives. From a categorical perspective it's pretty standardized, so finding common ground should be easy. That would help avoid future name collisions.

@ablaom

ablaom commented Nov 25, 2021

I'm not yet too familiar with this package, but I feel that scalar * measure should keep its current meaning, which is the most natural in the context of general measures. Whatever choice you make, there will be confusion for some users. Why not choose the path most consistent with the "measures" vision? This is what I would expect from the name of the package, anyhow.

I agree ⊗ makes sense for product measures, but prefer + for adding measures (as in (m1 + m2)(A) = m1(A) + m2(A)). Is this what is meant by "superposition" above? Sorry, I'm not familiar with the term in the context of measures.

Probably I misunderstand, but I would have thought the coproduct of two measure spaces is their disjoint union, which would make the coproduct of two measures different from their sum in the sense just mentioned. For the coproduct measure on a disjoint union, ⊕ makes sense to me.

@cscherrer
Collaborator Author

When I started this package, it was to address some aspects of the Distributions design that were making things difficult for my work in Soss. So certainly there's no inherent requirement to follow that design in any way.

When Distributions uses + and * as in a + b * Normal(), it's described as an affine transform of a random variable, but of course Normal() isn't a random variable. So in reality, it's lifting these operations, roughly like the pseudo-notation

a::Real + d::Distribution = (a + x for x in d)

This kind of silent conversion is very non-Julian, but unfortunately I think that ship has sailed. I've suggested this should really be written as broadcasting, but there seem to be at least implicit assumptions that broadcasting is over a finite set of values.
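For reference, that lifted reading can be made concrete as a pushforward acting only on sampling. This is a hedged sketch: Lifted is made up here, and Distributions.jl instead returns the LocationScale object shown above.

using Distributions, Random

# Hypothetical sketch: a + d as the pushforward of d under x -> a + x.
struct Lifted{D}
    a::Float64
    d::D
end

Base.rand(rng::Random.AbstractRNG, ℓ::Lifted) = ℓ.a + rand(rng, ℓ.d)

ℓ = Lifted(3.0, Normal())
rand(Random.default_rng(), ℓ)    # a draw distributed as Normal(3, 1)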

Anyway, I haven't had much luck influencing the design of Distributions, so I try not to spend much time on it. At the same time, we want people to use the package, and most new users will already be familiar with Distributions. And certainly any conflicts with Distributions should have good reasons behind them. It can be very difficult to strike the right balance.

In some cases, a compromise can be in the best interest of the package and its future users. I think this has been the case for DensityInterface.jl. Here, the challenge has been to allow an in-road for users without measure-theoretic experience who use "density" in a much looser sense. This leads to some complications in our design that aren't really ideal, but are almost certainly better than "going it alone". As a result of this, there's some potential for MeasureBase to become a dependency for Distributions. That's very much a WIP, but it seems it could help with tying the ecosystem together more cleanly.

Ok I'm rambling now. Sorry, I'll get back to the points you made.

I agree ⊗ makes sense for product measures, but prefer + for adding measures (as in (m1 + m2)(A) = m1(A) + m2(A)). Is this what is meant by "superposition" above?

Yes, that's right.

Probably I misunderstand, but I would have thought the coproduct of two measure spaces is their disjoint union, which would make the coproduct of two measures different from their sum in the sense just mentioned. For the coproduct measure on a disjoint union, ⊕ makes sense to me.

Great point. I agree it would make sense for ⊗ and ⊕ to be used for the category-theoretic product and coproduct. I think you're right that superposition is not the coproduct (the coproduct is always defined, but superposition requires two measures on the same space). But there's still an algebra, so any operator names we commit to need to be consistent with this algebra. Also, it's interesting that superposition can be considered as disjoint union (of two measures on a common space) followed by "forgetting" which component was chosen.
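To make that last point concrete, here is a toy sketch with finite measures represented as dictionaries of point masses (the helper names are made up for illustration, not MeasureTheory API):

# Superposition: both measures live on the same space; masses add pointwise.
superpose(μ::Dict, ν::Dict) = mergewith(+, μ, ν)

# Coproduct: a measure on the disjoint union, with points tagged by component.
coproduct(μ::Dict, ν::Dict) =
    merge(Dict((1, x) => m for (x, m) in μ), Dict((2, x) => m for (x, m) in ν))

# "Forgetting" the tags collapses the coproduct back to the superposition.
function forget(c::Dict)
    out = Dict{Any,Float64}()
    for ((tag, x), m) in c
        out[x] = get(out, x, 0.0) + m
    end
    return out
end

μ = Dict(:a => 0.5, :b => 0.5)
ν = Dict(:a => 1.0, :c => 2.0)

forget(coproduct(μ, ν)) == superpose(μ, ν)    # true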

@ablaom

ablaom commented Nov 25, 2021

At the same time, we want people to use the package, and most new users will already be familiar with Distributions. And certainly any conflicts with Distributions should have good reasons behind them. It can be very difficult to strike the right balance.

Thanks for the response, @cscherrer. Yes, striking this balance is difficult. I would say you are in a much better position than I to do this. I just wanted to give you another point on the graph.
