What can I do to prevent or reduce message loss in a microservices system?

by Rombix   Last Updated July 12, 2019 16:05 PM

Quite often I have methods that do the following:

  1. Process some data
  2. (frequent, but optional) Save some state to database
  3. Publish a message to a queue / topic

What options do I have to protect myself against errors (transient or otherwise) in step 3? Implementing a retry / repeat mechanism is one approach, but it probably won't help if the issue that prevents the message from being sent lasts longer than a few seconds or minutes.

Tags : microservices

Answers 1

Invariably, you need some sort of transactional mechanism to guarantee that the state change and the message both happen (eventually, and at least once).

That means a retry/repeat mechanism, but it also requires an acknowledgement or confirmation that both steps are complete. Usually I've seen this done with the DB as the first step, because the DB is usually the more resilient component. In a single, literal DB transaction, the state change is made and a record is added to a second table that says "this message needs to be sent". A worker process watches that table and pushes the messages to a bus/queue; when a push succeeds, it deletes the row. If the worker is down, the message is delayed. If the worker dies mid-process (after pushing but before deleting the row), the message may be sent again later, which also implies messages can show up out of order. Because of this, messages in such a system work better as idempotent "here is my new state/version" than as "here is a delta".
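The flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a production implementation: SQLite stands in for the real database, the `orders` and `outbox` tables and every function name are made up for the example, and `publish` is whatever callable pushes a payload to your bus/queue.

```python
import json
import sqlite3


def init_db(conn):
    # Hypothetical schema: one state table plus the "needs to be sent" table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id TEXT PRIMARY KEY, status TEXT, version INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox "
        "(seq INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    conn.commit()


def save_state_and_enqueue(conn, order_id, status, version):
    # One transaction: both rows are committed, or neither is.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO orders (id, status, version) VALUES (?, ?, ?)",
            (order_id, status, version),
        )
        payload = json.dumps(
            {"order_id": order_id, "status": status, "version": version}
        )
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay_once(conn, publish):
    """One pass of the worker; returns how many messages were sent."""
    rows = conn.execute("SELECT seq, payload FROM outbox ORDER BY seq").fetchall()
    sent = 0
    for seq, payload in rows:
        try:
            publish(payload)  # assumed to raise if the broker is unreachable
        except Exception:
            break  # keep this row and the rest for the next pass
        # A crash between publish() and this delete causes a resend later,
        # which is why delivery is at-least-once and not exactly-once.
        with conn:
            conn.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
        sent += 1
    return sent
```

A real worker would call `relay_once` in a loop with some delay or backoff. A long broker outage then only means the outbox table grows until the broker comes back; nothing is lost as long as the database transaction committed.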

The same sort of thing can work from the other side. Instead of writing the state first, you post a message, and one part of that message's processing is to write the new state to some DB. The message stays persisted and is retried until you get confirmation that all of its processing is complete. That too means that processing can be delayed, duplicated, and potentially done out of order. The trade-off here is that the message gets out to listeners faster, but it might show up before the source of truth hears about it.
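With either ordering, whoever processes the message has to tolerate duplicates and reordering. A minimal sketch of what that could look like, assuming each message carries the full new state plus a monotonically increasing `version` (the field names and the dict-backed `store` are hypothetical stand-ins for a real schema and database):

```python
def apply_state_message(store, message):
    """Apply a full-state message; return True if the store changed."""
    key = message["order_id"]
    current = store.get(key)
    # A duplicate or an older version is simply ignored, so redelivery
    # and out-of-order arrival cannot move the state backwards.
    if current is not None and current["version"] >= message["version"]:
        return False
    store[key] = {"status": message["status"], "version": message["version"]}
    return True
```

This only works because the message says "here is my new state at version N". A delta such as "add 5" cannot be safely dropped or replayed this way.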

July 12, 2019 15:53 PM
