
Azure Table Storage Change Feed? #22

Open
deyanp opened this issue Apr 17, 2019 · 13 comments
Labels
question Further information is requested

Comments

deyanp commented Apr 17, 2019

Hi,

I saw the comment that the author moved from Cosmos DB to Azure Table Storage due to the high costs of the former. How do you push the data to the Read Model, though? I couldn't find a Change Feed or anything similar for Table Storage ... and polling doesn't sound workable ...

Best regards,
Deyan

Dzoukr commented Apr 17, 2019

Hi @deyanp,

there is an IObservable of appended events as part of the CosmoStore instance - see https://github.com/Dzoukr/CosmoStore/blob/master/src/CosmoStore/CosmoStore.fs#L59

Another approach (also widely used by us) is to compose a Cmd -> Event list function with an Event list -> Event list function (doing the side effect of writing to the projection database). It is a matter of taste - some people don't like the reactive approach, some do.
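
The composition approach can be sketched in miniature (in Python for brevity, though CosmoStore itself is F#; `handle_command`, `write_projections`, and the event shapes are all hypothetical illustrations, not CosmoStore's API):

```python
# Hypothetical sketch: compose a command handler (Cmd -> Event list) with a
# side-effecting projection step (Event list -> Event list).

read_model = {}  # stands in for the projection database

def handle_command(cmd):
    # Domain logic: decide which events the command produces.
    if cmd["type"] == "Deposit":
        return [{"type": "Deposited", "amount": cmd["amount"]}]
    return []

def write_projections(events):
    # Side effect: fold each event into the read model, then pass events through.
    for e in events:
        if e["type"] == "Deposited":
            read_model["balance"] = read_model.get("balance", 0) + e["amount"]
    return events

def pipeline(cmd):
    # Cmd -> Event list, composed with Event list -> Event list
    return write_projections(handle_command(cmd))

pipeline({"type": "Deposit", "amount": 50})
```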

deyanp commented Apr 17, 2019

Hi,

Does this mean that the projections get built "in-process", without any guarantee in case the process crashes after writing to the event stream?

Best regards,
Deyan

Dzoukr commented Apr 17, 2019

Yes, it would have to happen just between writing to the event store & the projection database, but it can theoretically happen, and you would have to replay the missing events in such a case. Or you can plug a queue in between and write projections in a separate process / application. AFAIK there is no Change Feed for Table Storage, so it is up to you how to lower the risks of eventual consistency.
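
The replay-of-missing-events recovery can be sketched like this (in-memory stand-ins for the event store and the projection; all names and shapes are hypothetical, not CosmoStore's API):

```python
# Hypothetical sketch: recover from a crash between the event-store write and
# the projection write by replaying events past the projection's checkpoint.

event_store = [
    {"pos": 1, "type": "Deposited", "amount": 10},
    {"pos": 2, "type": "Deposited", "amount": 30},
    {"pos": 3, "type": "Withdrawn", "amount": 5},
]

projection = {"last_pos": 1, "balance": 10}  # crashed after applying event 1

def replay_missing(store, proj):
    # Apply every event newer than the projection's checkpoint, in order.
    for e in store:
        if e["pos"] <= proj["last_pos"]:
            continue  # already applied; replay stays idempotent
        delta = e["amount"] if e["type"] == "Deposited" else -e["amount"]
        proj["balance"] += delta
        proj["last_pos"] = e["pos"]
    return proj

replay_missing(event_store, projection)
```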

deyanp commented Apr 17, 2019

Yep, this is the problem I am facing ... writing to a queue is not a solution, as I cannot (and don't want to) open a distributed transaction between Azure Table Storage and Azure Event Hub, for example ..

What issues with the costs of Cosmos DB did you face exactly (if I may ask), and do you think there is a solution to them?

Dzoukr commented Apr 17, 2019

Well, the pricing of Cosmos DB scales differently. If you need to start "low" (imagine a weekend project) with few events stored and few aggregates, you still need to provision 400 RU/s as the current minimum. And that minimum is still expensive as hell compared to Azure Table Storage, where you pay mostly for space, which is negligible.

To make it clear, I still love Cosmos DB (amazing product), but until MS changes the pricing to be more friendly to low-cost/weekend projects, it will remain a product chosen mainly by bigger companies.

deyanp commented Apr 17, 2019

Thank you for sharing your concerns - now I understand better.
I am thinking of using:

  1. Cosmos DB for the write side (taking advantage of the Change Feed)
  2. Azure Table Storage for
    a) the read side (duplicate denormalized projections)
    b) duplicating all events from Cosmos DB to Azure Table Storage for replay purposes, assuming reading all events directly from Cosmos DB would incur a lot of RUs/costs
    c) aggregate snapshots (last state of aggregate, not to have to read and replay all old events)

Alternatively I was thinking about Azure PostgreSQL for 2a), as Azure SQL Database seems to be much more expensive ...

What do you think about the above approach?
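
The flow proposed in 1/2b/2c above might be sketched like this (in-memory stand-ins only; neither the Cosmos DB nor the Table Storage SDK is used, and all names are illustrative):

```python
# Hypothetical sketch: a change-feed handler duplicates events into cheap
# table storage (keyed by stream + version, so redelivery is safe) and
# maintains per-aggregate snapshots so reads need not replay whole streams.

table_events = {}   # stands in for Azure Table Storage: (PartitionKey, RowKey) -> entity
snapshots = {}      # last known state per aggregate

def on_change_feed(batch):
    for e in batch:
        # Idempotent upsert: the same event always lands on the same key.
        table_events[(e["stream"], e["version"])] = e
        # Snapshot update, skipping anything already applied.
        snap = snapshots.get(e["stream"], {"version": 0, "balance": 0})
        if e["version"] > snap["version"]:
            snap["balance"] += e["amount"]
            snap["version"] = e["version"]
        snapshots[e["stream"]] = snap

on_change_feed([
    {"stream": "account-1", "version": 1, "amount": 100},
    {"stream": "account-1", "version": 2, "amount": -40},
])
on_change_feed([  # redelivered batch: no double-counting
    {"stream": "account-1", "version": 2, "amount": -40},
])
```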

Dzoukr commented Apr 17, 2019

> reading all events directly from Cosmos DB would incur a lot of RUs/costs

That is the funny part. If your Cosmos DB collection is provisioned at 400 RU/s, you just pay for it. Constantly. Whether you use it or not.

Otherwise it looks ok - let me know how it works.

Dzoukr added the question label Apr 18, 2019
dharmaturtle commented Oct 11, 2020

@deyanp I independently arrived at the same architecture you described (namely CosmosDB for writes and Azure Table Storage for denormalized views, changefeed duplication, and snapshots). I arrived here after googling "Azure Table Storage change feed" :) I haven't implemented anything yet, just theorycrafting my own pet project.

How did your project turn out?

@bartelink

Slight tangent, but ... I'd be interested to see how you represent the events and/or manage efficient idempotent writing to Azure Tables (the thing termed 'changefeed duplication' above).

I suspect that forking Propulsion.Cosmos.Sink might be a good way to scale the archival process. In the proArchiver template (complete, but unmerged in jet/dotnet-templates#79), I duplicate events from the primary out to CosmosDB (see in-depth discussion of my rationale).

deyanp commented Oct 11, 2020

> @deyanp I independently arrived at the same architecture you described (namely CosmosDB for writes and Azure Table Storage for denormalized views, changefeed duplication, and snapshots). I arrived here after googling "Azure Table Storage change feed" :) I haven't implemented anything yet, just theorycrafting my own pet project.
>
> How did your project turn out?

@dharmaturtle, as with many things in life, this one also took a different direction: MongoDB for writes and some reads, and Azure Data Explorer (ADX) for DWH/Reporting/more complicated reads.

Cosmos DB surprised me a bit negatively - everything must be partitioned, storage is bloated (200 bytes somehow turn into 900 bytes, and you pay for uncompressed storage), and it is missing what I need very much - atomic updates ...

ADX is something I recommend a lot; MongoDB has its quirks ..

@bartelink

> and what I need very much - missing atomic updates ...

What about the batch APIs? Can stored procs do the job? (In general you should be able to get it done with the bulk APIs, though, unless you have specific things that really benefit from the efficiency of reduced roundtrips.)

Re that per-doc overhead, I can definitely concur (which is why Equinox packs events into docs; ~30k seems to be the sweet spot, though there are lots of factors to consider).

deyanp commented Oct 15, 2020

@bartelink, neither sprocs nor anything else helps, I am afraid. I need to update a shared account balance multiple times per second in parallel (e.g. 20x), and I cannot afford any optimistic concurrency exceptions at all. I have looked at stored procedures, and under the hood they also do optimistic locking ... so no way that I found, unfortunately :(

They say they support MongoDB's API (even though only 3.2/3.6, which is outdated) and findOneAndModify/Update in particular (which is atomic, with $set, $inc etc. commands), but even though I asked (see https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/38110195-support-for-atomic-updates-in-sql-api) they did not confirm it, and I am afraid there is some optimistic concurrency going on under the hood there as well ...

@bartelink

@deyanp I'd be surprised if the Cosmos MongoDB interface offers any increment on the native functionality. I agree the bulk facility covers a very different use case.

Not sure if it's remotely useful but in Equinox.Cosmos we solved a similar problem via:

  • the Equinox Sync stored proc yields the conflicting state if there is a conflict (which does not turn into an exception or necessitate another roundtrip to sync with the state)
  • In the app layer, use AsyncBatchingGate to gather concurrent requests into a single roundtrip - i.e. if 5 inc operations need to happen concurrently, send them via the batching gate, aggregate them into a single request, and then have each caller share that fate.
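
The batching-gate idea can be sketched in miniature (inspired by, but not a port of, the AsyncBatchingGate mentioned above; all names are illustrative, and a dict plays the shared balance document):

```python
# Hypothetical sketch: concurrent increments queue up behind a gate; one flush
# applies their sum in a single "roundtrip" whose result every caller shares.

import asyncio

class BatchingGate:
    def __init__(self, store):
        self.store = store      # shared balance document
        self.pending = []       # (amount, future) pairs awaiting a flush
        self.roundtrips = 0

    async def increment(self, amount):
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((amount, fut))
        if len(self.pending) == 1:
            # First caller in this window schedules the flush; later callers
            # just piggyback on the same batch.
            asyncio.get_running_loop().call_soon(self._flush)
        return await fut

    def _flush(self):
        batch, self.pending = self.pending, []
        self.store["balance"] += sum(a for a, _ in batch)  # one write for all
        self.roundtrips += 1
        for _, fut in batch:
            fut.set_result(self.store["balance"])

async def main():
    gate = BatchingGate({"balance": 0})
    # 20 concurrent inc operations collapse into a single write
    results = await asyncio.gather(*(gate.increment(1) for _ in range(20)))
    return gate, results

gate, results = asyncio.run(main())
```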

In some cases, you can stack the requests up in some form of queue or bus.

If you're literally only looking to do an inc operation, the bottom line is that at the CosmosDB level there simply has to be a read followed by an etag-contingent update - you can rig it such that in the failure case you recurse within the stored proc.
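
The read-then-etag-contingent-update cycle can be modeled in a few lines (a dict plays the database and a counter plays the etag; in real Cosmos the etag would travel as an If-Match header, and the retry would live in the stored proc):

```python
# Hypothetical sketch: optimistic increment via an etag-conditional replace,
# retrying when a concurrent writer has bumped the etag since our read.

db = {"doc": {"balance": 0, "_etag": 0}}

def try_replace(new_doc, if_match):
    # Succeeds only if nobody wrote since we read (etag still matches).
    if db["doc"]["_etag"] != if_match:
        return False
    new_doc["_etag"] = if_match + 1
    db["doc"] = new_doc
    return True

def increment(amount):
    while True:                                        # retry loop on conflict
        current = dict(db["doc"])                      # read
        updated = {"balance": current["balance"] + amount}
        if try_replace(updated, current["_etag"]):     # etag-contingent update
            return db["doc"]["balance"]

increment(5)
stale_etag = db["doc"]["_etag"]
increment(7)  # a concurrent writer sneaks in, bumping the etag
# a replace conditioned on the stale etag is rejected:
stale_rejected = not try_replace({"balance": 99}, stale_etag)
```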
