(Modern Software Architecture) Event Sourcing

(Modern Software Architecture) Event Sourcing

Programming/Architecture 2018. 2. 25. 18:25

Content

Discovering the Domain Architecture through DDD

The DDD Layered Architecture

The "Domain Model" Supporting Architecture

The CQRS Supporting Architecture

Event Sourcing

Designing Software Driven by the Domain

Outline

Event Sourcing Introduction

Event Sourcing at a Glance

Events as the Data Source

Event-based Persistence

Data Projections from Stored Events

Event Sourcing in Action

Event-based Data Stores

Event Sourcing Introduction

The real world is mostly made of events. We see events after facts, and events actually are notifications of things that just happened. In software instead, we rarely use events in the business logic. We tend instead to build abstract models that hopefully can provide a logical path for whatever we see in the domain, and sometimes working this way we end up with a root object we just call God or Universe. Events may bring some key benefits to software architecture. Primarily this is because events are immutable pieces of information, and by tracking events, you never miss a thing that happens in and around the business domain, and finally, events can be replayed, and by replaying events you can process their content and build on top of that multiple projections of the same core data.

Event Sourcing at a Glance

Event sourcing is a design approach based on the assumption that all changes made to the application state during the entire lifetime of the application are stored as a sequence of events. Event sourcing can be summarized by saying that we end up having serialized events as the data building blocks of the application. Serialized events are actually the data source of the application.

This is not how the vast majority of today's applications work. Most applications today work by storing the current state of domain entities and use that stored state as the starting point to process business transactions.

e.g. Let's say you have a shopping cart. How would you model a shopping cart?

Reasonably, if you're going to model it, you would probably add a list of ordered products, payment information, for example, credit card details, maybe shipping address. The shopping cart has now been modeled and the model is a faithful representation of the internal logic we commonly associate with a shopping cart.

But here is another way to represent that same information you can expect to find around the shopping cart. Instead of storing all the pieces of information in the columns of a single record or in the properties of a single object, you can describe the state of the shopping cart through the sequence of events that brought it to contain a given list of items. So, add first item, add second item, add payment information, update the second item, for example, to change the quantity or the type of a product, maybe then remove the first item, add shipping address. All these events relate to the same shopping cart entity, but we don't save in any place the current state of the shopping, but just the steps and related data that brought it to be what it is actually under our eyes. By going through the list of events then, you can rebuild at any time that state of the shopping cart. This is actually an event-based representation of an entity.

For most applications, think for example of CRUD applications, today structural and event-based representations of entities are functionally equivalent, and most applications use to store their state via snapshot, so they just store the current state and they ignore events. More in general, the picture you see now onscreen is only the tip of the iceberg. There is a more general and all-encompassing way to look at storage that incorporates events, but this is just under the surface. On top of an event-based representation of data, you can still create as a snapshot database the current state representation that returns the last known good state of a domain entity. This is, at the end of the day, event sourcing.

Key Facts of Event Sourcing

An event is something that has happened in the past

Events are expression of the ubiquitous language

Events are not imperative and are named using past tense verbs

Have a persistent store for events

Append-only, no delete

Replay the (related) events to get to the last known state of an entity

Replay from the beginning or a known point (snapshot)

Events express altogether the state of a domain entity. To get that state, you need to replay events. To get the full state, you should replay from the beginning of the application timeline. Sometimes this may require you process too much data. In this case, you can define snapshots along the way and replay events as the delta of the state from the last known snapshot. In the end, a snapshot is the state of the entity at a given time.

An event is something that happened in the past. This is a very important point about events, and should be kept clearly in mind.

Once stored, events are immutable. Events can be duplicated and replicated, especially for scalability reasons.

Any behavior associated with a given event has been performed the moment in which the event is notified, replaying the event, in other words, doesn't require to repeat the behavior.

When using events, you don't miss a thing. You track everything that happened at the time it happened, and regardless of the effects it produced.

With events, any system data is saved at a lower abstraction level than today.

Events as the Data Source

CQRS is the dawn of a new software design world, and events are what we find when the dawn has actually turned into a full day.

At some point, relational databases have become the center of the universe as far as software design is concerned. They brought up the idea of the data model and persistence of the data model. It worked. And it still works. But it's about time you reconsider this approach, because chances are that it shows incapable of letting you achieve some results, getting more and more important in the coming days. Events are a revolutionary new approach to software design and architecture, except that events are not really new and not really revolutionary. Relational databases themselves don't manage current state internally, even though they expose current state outside. But internally, relational databases were looking at the actual actions that modify the initial state. Events in the end are not a brilliant new idea of these days. Thirty years ago old apps still used something based on the concept of events. Simply there was no fancy name at the time for the approach they were using. Events may be the tokens stored in the application data source in nearly every business scenario. Events just go at the root of the problem and offer a more general storage solution for just every system.

So, what should you do today? I see two options.

First option, you can insist on a system that primarily saves the current state, but start tracking relevant application facts through a separate log.

Second option, you store events and then replaying through the event stream you build any relevant facts you want to present to users as data.

So in other words, from events you create a projection of data in much the same logical way you build a projection of computed database columns when you run certain SQL queries.

To decide whether or not in your specific scenario you need events, consider what follows, the two points you see onscreen now.

Is it important to track what was added and then removed?

Is it important business-wise to track when an item was removed from the cart?

If you say yes, then you probably need events.

Storing events in a stream works really well, I would even say naturally with the implementation of commands. And commands submitted by the user form a natural stream of events. At the same time, queries work much better if you have structured data, so in the end you need both. You need commands for storing and queries and classic current state storage for queries. This is the whole point of CQRS, command and query responsibility segregation. As an architect, you only have to decide whether you want to have a command stack based on an event stream that works in sync with, say, an orders table, or more conservatively, you want to start with orders tables and then just record an event stream separately for any relevant fact. It's a sort of middle ground to proceed step-by-step ahead.

Event-based Persistence

In software, persistence is made of four key operations through which developers manipulate the state of the system. They are CREATE, UPDATE, and DELETE, to alter the state, and QUERIES to read the state without altering. A key principle of software and especially CQRS inspired software, is that asking a question should not change the answer.

CREATE

Let's see what happens when a new relevant entity is created and must be persisted in an event-based system. There is a request coming from the presentation layer or in some other known UI asynchronous way, the command stack processes that, and appends a new record, say, an order, with all the details. The existing data store is extended with a new item that contains all the information it needs to be immutable. If, say, the information is required, the current rate is better stored inside the logged information and not read dynamically from elsewhere. In this way, the stored event is really immutable and self-contained. There's also some information of other type you need to have. Each event should be identified uniquely and in some way you need to give each event an-app specific code to indicate whether you're adding an order or an invoice. This can be achieved in a variety of ways, by type, if you use document store, or through one columns if you use a relational store. In addition, you want to have a timestamp so that it's clear when the operation took place, and any relevant ID you need to identify the entity better the aggregate being stored. Finally, full details of the record must be saved as well. If you are recording the creation of an order, you want to have all order details, including shipping payment, transaction ID, confirmation ID, order ID, and so forth. If necessary, payment and shipping operations may in turn generate other events being tracked as well. And finally, when it comes to event storage, it is transparent the technology you use so it can be relational document-based, graph-based, or whatever works for you. The CREATE operation is an event-based persistence scenario that is not much different from classic persistence. You just add a record.

UPDATE

Update is bit different, as you don't override an existing record, but just add another record that contains delta. You need to store the same information as we create operations, including a unique event ID, timestamp, code to recognize the operations, and more, and changes applied. You also need the aggregate ID and delta. If you only updated the quantity of a given product in a given order, in particular, you only store the new quantity and the product ID. Storage also in this case is transparent and can be relational, document-based, graph-based, or whatever works for you. Note that in some cases, you might want to consider also storing the full state of the entity along with the specific event information. This can be seen as a form of optimization so that you keep track of individual events, but also serialize the aggregate with the most updated state in the same place to speed up first level queries. I will have more to say about queries in awhile.

DELETE

The DELETE operations are analogous to UPDATE operations. Subsequently, we can say that the deletion is logical and consists in writing that the entity with a given ID is no longer valid and should not be considered for the purposes of the business. The information in this case is just made of event ID, timestamp, code of the operation, aggregate ID. There is no strict need of having delete specific information unless, for example, the system requires a reason for the deletion. Storage is transparent and can be also in this case whatever works for you.

UNDO

Events lend themselves very well to implement the UNDO functionality. The simplest way to have it done is by deleting physically the last record as if it never happened. In this way, there is a contrast with the philosophy of event sourcing while you never miss a thing and track just anything that happens. UNDO can also be a logical operation, which, for example, allows to track how many times a user attempted to undo things. But personally I wouldn't mind a physical deletion in this specific case. What's far more important instead is not deleting events in the middle of the stream. That would really be dangerous and lead to inconsistent data.

Query... see Replay

Data Projections from Stored Events

The main aspect of event sourcing is the persistence of messages, which enables you to keep track of all changes in the state of the application. By reading back the log of messages, you rebuild the state of the system, and at that point, you get to know the state of the system. This aspect is what is commonly called the replay of events.

Replay

Replay is a two-step operation.

First, you grab all events stored for a given aggregate

Second, you look through all events in some way and extract information from events and copy that information to a fresh instance of the aggregate of choice.

A key function one expects out of an event-based data store is the ability to return the full or partial stream of events. This function is necessary to rebuild the state of an aggregate out of recorded events.

public IEnumerable<GenericEventWrapper> GetEventStream(String id)

{

return DocumentSession

.Query<GenericEventWrapper>()

.Where(t => t.AggregateId == id)

.OrderBy(t => t.Timestamp)

.ToList();

}

As you can see from the code, it's all about querying records in some way using the AggregateId and Timestamp to order or restrict. Notice that you can code the same also using a relational database.

public class GenericEventWrapper

{

public string EventId { get; set; }

public string EventOperationCode { get; set; }

public DateTime TimeStamp { get; set; }

public string AggregateId { get; set; }

public DomainEvent Data { get; set; }

}

That structure of a generic event class depends on the application. It may be different or in some way constrained if you are using some ad-hoc tools for event sourcing. In general terms, an event class can be anything like the following code. Again, key pieces of information are EventId, something that allows to distinguish and easily query the type of the event, Timestamp, AggregateId, and of course, event specific data.

The actual rebuilding of the state consists in going through all events, grab information, and then altering the state of a fresh new instance of the aggregate of choice.

public static Aggregate PlayEvents(String id, IEnumerable<DomainEvent> events)

{

var aggregate = new Aggregate(id);

foreach (var e in events)

{

if (e is AggregateCreatedEvent) aggregate.Create(e.Data);

if (e is AggregateUpdateEvent) aggregate.Update(e.Data);

if (e is AggregateDeletedEvent) aggregate.Delete(e.Data);

}

return aggregate;

}

What you want to do is storing in the fresh instance of the aggregate the current state it has in the system, or the state that results from the selected stream of events. The way you update the state of the aggregate depends on the actual interface exposed by the aggregate itself, whether it's a domain class or relies on domain services for manipulation.

There are a few things to mention about event replay.

1. Replay is not about repeating commands for any generated events. Commands are potentially long-running operations with concrete effects that generate event data. Replay is just about looking into this data and perform logic to extract information from this data.

2. Event replay copies the effects of occurred events and applies that to fresh instances of the aggregate.

3. Stored events may be processed in different ways in different applications. There is a great potential here. Events are data rendered at a lower abstraction level than plain state. From events you can rebuild any projection of data you like, including the current state of aggregates, which is just one possible way of projecting data(Custom Projection of Event Data). Ad-hoc projections can address other, more interesting scenarios, like business intelligence, statistical analysis, what if, and why not simulation.

More specifically, let's say that you have a stream of events. You can extract a specific subset, whether by date or type or anything else. Once you've got the selected events, you can replay them and apply ad-hoc calculations(Perform different calculations) and business processes(Apply different forms of business logic) and extract just the custom new information you were looking for.

Another great point about events and replay of events is that streams are constant data, and because of that, they can be replicated easily and effectively, for example, to enhance the scalability potential of the application. This is actually a very, very pleasant side effect of event immutability.

What if you get to process too many events for rebuilding the desired projection of data? (Performance concern)

Projecting state from logged events might be heavy-handed and impractical with a very large number of events, and the number of logged events in many application be aware can only grow over time, because it's an append only store.

An effective workaround consists of setting a snapshot of the aggregate state or whatever business entities you use at some recent point in time. Instead of processing the entire stream of events, you serialize the state of aggregates at a given point in time and save that as a value. Next you keep track of the snapshot point and replay events for an aggregate from the latest snapshot to the point of interest.

Event Sourcing in Action (SKIP)

Event-based Data Stores

You can definitely arrange an event sourcing solution all by yourself, but some ad-hoc tools are appearing to let you deal with storage of events in a more structured way. The main benefit of using an event aware data store is that the tool like database guarantees essentially business consistency and full respect of the event sourcing approach.

Let's briefly look at one of these event-based data stores

Event Store : geteventstore.com

Event store works by offering an API for plain HTTP and .NET and the API is for event streams. In the jargon of event store, an aggregate equates to a stream in the store. No concerns about the ability of the event store database to manage and store potentially millions of events grouped by aggregates. This is not a concern of yours, the framework underneath the tool is able to work with these numbers. You can do three basic operations on an event stream and event store. You can write events, you can read events, in particular, you can read the last event, a specific event by ID and also a slice of events, and you can subscribe to get updates. This is the representation of an event as it is written to the store. As you can see, the timestamp is managed internally, and you only have an eventId, eventType, data, and there's no way that's important to delete events arbitrarily. Another interesting feature of event store is subscriptions. There are three types of subscriptions. One is volatile, which means that a callback function is invoked every time an event is written to a given stream. You get notifications from this subscription until it is stopped. Another subscription is catch-up, and it means that you'll get notifications from a given event specified by position up to the current end of the stream. So give me all events from this moment onward, and once the end of the stream has been reached, the catch-up subscription turns into a volatile, and you still keep on getting any new event being added to the stream. And finally, the persistent subscription address set the scenario when multiple consumers are waiting for events to process. The subscription guarantees that events are delivered to customers at least once, but possibly multiple times, and if this happens, the order is unpredictable. This solution is specially designed for high scalable systems, collaborative systems, and to be effective it requires a software design that supports the notion of hidden potency, so if multiple times an operation is performed, the effect is always the same. In particular, catch-up subscriptions are good for components called denormalizers, which play a key role in a CQRS, and in CQRS jargon, denormalizers just refer to those components that projections of data for the query stack.

출처

이 모든 내용은 Pluralsight에 Dino Esposito가 올린 'Modern Software Architecture: Domain Models, CQRS, and Event Sourcing'라는 강의의 여섯번째 챕터를 듣고 정리한 것입니다(https://app.pluralsight.com/library/courses/modern-software-architecture-domain-models-cqrs-event-sourcing/table-of-contents). 제가 정리한 것보다 더 많은 내용과 Demo를 포함하고 있으며 최종 Summary는 생략하겠습니다. Microsoft 지원을 통해 한달간 무료로 Pluralsight의 강의를 들으실 수도 있습니다.

저작자표시 비영리 변경금지

'Programming > Architecture' 카테고리의 다른 글

(Modern Software Architecture) Designing Software Driven by the Domain (0)	2018.02.26
(Modern Software Architecture) The CQRS Supporting Architecture (0)	2018.02.24
(Modern Software Architecture) The "Domain Model" Supporting Architecture (0)	2018.02.22
(Modern Software Architecture) The DDD Layered Architecture (0)	2018.02.14
(Modern Software Architecture) Domain-Driven Design (DDD) (0)	2018.02.13

Posted by 코딩하는 경제학도

AND

일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

윈도우 앱개발을 향하여

TAG CLOUD

(Modern Software Architecture) Event Sourcing

'Programming > Architecture' 카테고리의 다른 글

ARTICLE CATEGORY

RECENT ARTICLE

RECENT COMMENT

CALENDAR

ARCHIVE

LINK

티스토리툴바