You may already be using asynchronous processing with a messaging platform such as RabbitMQ, SQS or Beanstalkd.
Or you may be using a framework (Ecotone, Symfony Messenger or Laravel Queues) that hides the messaging platform details.
No matter what you use, sooner or later you will face errors when processing a message.
So how do you deal with failures when your code runs asynchronously?
Handling errors gracefully may actually change the way you write code and make your application more maintainable and more robust.
Auto-recover wherever possible
All of these frameworks come with functionality to redeliver a message in case of failure.
This is your first line of defense against being called at night or having to run support shifts.
Some errors can be recovered automatically, such as connection failures, an unavailable 3rd party service or optimistic locking exceptions.
Our main aim should be to self-heal wherever possible. Messages will fail from time to time, and this is unavoidable; what is avoidable is manual intervention.
You want to redeliver those messages with an increasing delay, so that a temporarily unavailable service has more time to come back, giving the message a higher chance to self-heal.
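A common way to increase the delay is exponential backoff with a cap. The framework-agnostic Python sketch below computes such a schedule; the initial delay, multiplier and cap are illustrative assumptions, not the defaults of Ecotone, Symfony Messenger or Laravel.

```python
def redelivery_delay_ms(attempt: int, initial_ms: int = 1000,
                        multiplier: int = 2, max_ms: int = 60_000) -> int:
    """Delay before the given redelivery attempt (1-based): grows exponentially, capped at max_ms."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(initial_ms * multiplier ** (attempt - 1), max_ms)

# Delays for the first seven redeliveries.
schedule = [redelivery_delay_ms(attempt) for attempt in range(1, 8)]
print(schedule)  # [1000, 2000, 4000, 8000, 16000, 32000, 60000]
```

The cap keeps a long outage from pushing a message's next attempt hours into the future.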
Handling everything within the HTTP Request
There may be a temptation to handle everything, like placing an order, sending an email and taking a payment, within the HTTP request.
This may lead to a solution based on try/catch blocks that save the errors so that someone can pick them up and fix them later.
This forces us to write custom code to handle failures, and may lead to a non-recoverable state or require manual intervention in order to recover.
When the work is done asynchronously instead, the system can auto-recover without any additional code. The messaging framework takes care of it, so your code can focus on business problems rather than technical ones.
This is one of the reasons why messaging platforms exist: to help you build more solid and stable code.
Let your HTTP request perform a single action, like placing an order, and as an effect publish an event message stating that the order was placed.
From there you can subscribe to this event and do the rest asynchronously.
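A minimal framework-agnostic Python sketch of this split, assuming an in-memory dict and queue as stand-ins for the orders table and the message broker (the function names are hypothetical): the endpoint only stores the order and publishes the event, and a worker consumes it outside the request.

```python
from dataclasses import dataclass
from queue import Queue

@dataclass(frozen=True)
class OrderWasPlaced:
    order_id: str

# In-memory stand-ins (assumptions) for the orders table and the message broker.
orders: dict[str, list[str]] = {}
event_queue: "Queue[OrderWasPlaced]" = Queue()

def place_order_endpoint(order_id: str, items: list[str]) -> dict:
    """HTTP handler: performs only the single action of placing the order."""
    orders[order_id] = items
    # Announce the fact; email, payment etc. subscribe to it and run asynchronously.
    event_queue.put(OrderWasPlaced(order_id))
    return {"order_id": order_id, "status": "placed"}

def run_subscriber(handle) -> None:
    """Simplified worker loop: drains the queue outside of any HTTP request."""
    while not event_queue.empty():
        handle(event_queue.get())

response = place_order_endpoint("order-42", ["book"])
```

If sending the email fails later, the HTTP response has already been returned and the order stays placed; only the asynchronous part needs to be retried.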
Multiple Handlers for a single message
If your event message is handled by multiple handlers, then when such a message is redelivered due to a failure, you may send an email twice or take a second payment.
Some external providers support idempotency keys, which let them detect and ignore duplicate calls, but that is not always the case.
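How an idempotency key protects against a redelivered message can be sketched with a hypothetical in-memory payment provider in Python (not any real provider's API). The important assumption is that the key is derived from the message, such as the order id, so it stays the same across redeliveries.

```python
import uuid

class FakePaymentProvider:
    """Hypothetical provider that honours idempotency keys."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.charges_made = 0

    def charge(self, amount_cents: int, idempotency_key: str) -> str:
        if idempotency_key in self._results:
            # Duplicate call: return the earlier result instead of charging again.
            return self._results[idempotency_key]
        self.charges_made += 1
        charge_id = f"ch_{uuid.uuid4().hex[:8]}"
        self._results[idempotency_key] = charge_id
        return charge_id

provider = FakePaymentProvider()
key = "order-42-payment"                # stable across redeliveries of the same message
first = provider.charge(1999, key)
second = provider.charge(1999, key)     # the redelivered message calls again
print(first == second, provider.charges_made)  # True 1
```

With a provider that does not offer such keys, the same redelivery would charge the customer twice.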
The best way is to have a single action per message. But how do we achieve that when two or more actions need to happen?
Let's take as an example placing an order, as a result of which we want to send an email and take a payment from a credit card.
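The straightforward implementation subscribes both the email handler and the payment handler to the same OrderWasPlaced event message.
If one of them fails and the message is redelivered, both handlers run again, so we may end up with a double payment or a duplicate email.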
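The failure mode can be reproduced with a small framework-agnostic Python sketch, where one delivery runs both handlers and the first payment attempt fails with a simulated timeout. The handler names and the retry loop are hypothetical stand-ins for a framework's redelivery.

```python
class TransientError(Exception):
    """Simulated temporary failure, e.g. a payment gateway timeout."""

sent_emails: list[str] = []
payment_attempts = 0

def send_confirmation_email(order_id: str) -> None:
    sent_emails.append(order_id)

def take_payment(order_id: str) -> None:
    global payment_attempts
    payment_attempts += 1
    if payment_attempts == 1:  # the first attempt times out
        raise TransientError("payment gateway timeout")

handlers = [send_confirmation_email, take_payment]

def deliver(order_id: str) -> None:
    """One delivery runs every handler; any failure fails the whole message."""
    for handler in handlers:
        handler(order_id)

def consume_with_retries(order_id: str, max_attempts: int = 3) -> None:
    """Redeliver the whole message until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            deliver(order_id)
            return
        except TransientError:
            if attempt == max_attempts:
                raise

consume_with_retries("order-42")
print(sent_emails)  # ['order-42', 'order-42'] -> the customer received the email twice
```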
To solve this, we need to create custom handlers that turn the event into separate messages, one per action.
This fixes the main issue, but it introduces extra messages that would not normally exist and increases the complexity of the code.
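A custom handler of this kind might look like the framework-agnostic Python sketch below. SendOrderConfirmation and TakePayment are hypothetical message names, and a list stands in for the broker; the handler does no work itself and only fans the event out, so each new message has exactly one handler that is safe to retry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderWasPlaced:
    order_id: str

@dataclass(frozen=True)
class SendOrderConfirmation:  # hypothetical extra message
    order_id: str

@dataclass(frozen=True)
class TakePayment:  # hypothetical extra message
    order_id: str

outgoing: list[object] = []  # stand-in for the message broker

def split_order_was_placed(event: OrderWasPlaced) -> None:
    """Custom handler: sends one message per action instead of doing the work."""
    outgoing.append(SendOrderConfirmation(event.order_id))
    outgoing.append(TakePayment(event.order_id))

split_order_was_placed(OrderWasPlaced("order-42"))
```

The two extra message types exist only to work around redelivery, which is the added complexity mentioned above.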
We could try to solve this by sending MakePayment messages instead of OrderWasPlaced, directly after the order was placed.
However, this puts the responsibility for crafting those messages on placing the order, which inverts the responsibility. Placing an order is a complete action in itself; the actions above are just a result of it.
In the case of Ecotone, we mark a given handler as asynchronous, not the message.
A copy of the message is then delivered to each of the handlers.
This means that each asynchronous handler works atomically and processes the message separately.
In case of failure, only the single failing handler is affected, and it is safe to retry.
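The per-handler delivery described above can be sketched in framework-agnostic Python (this is not Ecotone's API): publishing puts a copy of the event into a separate queue for each handler, and each queue is consumed and retried on its own. The handler names and the simulated gateway timeout are hypothetical.

```python
from collections import deque

class TransientError(Exception):
    """Simulated temporary failure, e.g. a payment gateway timeout."""

sent_emails: list[str] = []
payment_attempts = 0

def send_confirmation_email(order_id: str) -> None:
    sent_emails.append(order_id)

def take_payment(order_id: str) -> None:
    global payment_attempts
    payment_attempts += 1
    if payment_attempts == 1:  # the first attempt times out
        raise TransientError("payment gateway timeout")

# One queue per asynchronous handler.
queues = {send_confirmation_email: deque(), take_payment: deque()}

def publish(order_id: str) -> None:
    """Publishing the event puts a separate copy into every handler's queue."""
    for queue in queues.values():
        queue.append(order_id)

def consume(handler, max_attempts: int = 3) -> None:
    """Each handler consumes and retries only its own copy of the message."""
    queue = queues[handler]
    while queue:
        message = queue.popleft()
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                break
            except TransientError:
                if attempt == max_attempts:
                    raise

publish("order-42")
for handler in queues:
    consume(handler)
print(sent_emails, payment_attempts)  # ['order-42'] 2
```

Compared with running both handlers in one delivery, the payment is retried without sending the email a second time.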
In the case of Laravel Queues, there is no concept of handling event messages; everything is a Job that should fulfil a given action.
This solves the main problem by design, but it inverts the responsibility and makes the order placement aware of the actions that would normally subscribe to it as a result.
Temporarily Interrupted Flow
Developers who are new to messaging architectures tend to write extra code around the handling, instead of allowing messages to fail.
This may be a database field like wasEmailSent, which is populated from a try/catch block whenever sending the email fails.
Instead, we can write code that is agnostic of infrastructure errors, and keep track of those errors without adding extra storage for each of them.
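The difference can be sketched in framework-agnostic Python. The wasEmailSent field becomes a hypothetical was_email_sent key and the outage is simulated; in the second handler, the exception is simply left to the messaging framework, which retries the message or eventually moves it to a dead letter queue.

```python
class MailerUnavailable(Exception):
    """Simulated infrastructure error (hypothetical)."""

def send_email(order_id: str) -> None:
    raise MailerUnavailable("SMTP server not reachable")  # simulate an outage

orders = {"order-42": {"was_email_sent": None}}

# Anti-pattern: swallow the error and record it in a dedicated field.
def handle_with_flag(order_id: str) -> None:
    try:
        send_email(order_id)
        orders[order_id]["was_email_sent"] = True
    except MailerUnavailable:
        orders[order_id]["was_email_sent"] = False  # someone has to find and fix this later

# Letting it fail: no extra state; the framework retries the message
# and, if retries run out, moves it to a dead letter queue for replay.
def handle(order_id: str) -> None:
    send_email(order_id)
```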
Messages are first-class citizens, not just jobs to perform. They tell us a story of how the flow looks in our system.
Messages may fail for a variety of reasons; when they do, we auto-recover, and if that is not possible, we investigate, fix and replay.
Treat messages as part of a flow that can be temporarily interrupted; after the issue is fixed, the flow resumes.
There are cases when an error will not be recoverable, or will take too much time to be recovered automatically.
These kinds of errors are mostly related to issues in the application code, incorrect calls or broken compatibility with a 3rd party API, or a service we use being down for a longer period of time.
Application-level errors can be really beneficial
Unrecoverable errors are where learning happens, as they may reveal scenarios we have not thought of before.
For example, as a result of closing an account, we want to terminate the customer's electronic wallet, but the wallet has a positive balance, which ends up in an exception.
What should we do in that case: pay the money out to the customer's bank account, or close the wallet anyway?
These are the errors that may raise questions for our Product Owners / Domain Experts, and help us learn more about how the business works.
Dead Letter Queues
A Dead Letter Queue is the place where messages with unrecoverable errors land, and where manual intervention is needed to solve the problem.
After fixing the error, we can replay the message so that it is handled correctly and the flow resumes.
Dead Letter Queues are your last line of defense for errors that cannot be recovered automatically.
Let's check how our frameworks handle them.
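The mechanics can be sketched in framework-agnostic Python, reusing the wallet example: a message that fails a (hypothetical) maximum of three attempts is moved to a dead letter list together with its last error, and replaying it after the fix puts it back on the queue with a fresh attempt count. Real brokers would add a delay between redeliveries, which this in-memory sketch omits.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Envelope:
    payload: str
    attempts: int = 0
    last_error: Optional[str] = None

def consume(queue: deque, dead_letters: list[Envelope],
            handler: Callable[[str], None], max_attempts: int = 3) -> None:
    """Process all queued messages; after max_attempts failures a message is dead-lettered."""
    while queue:
        envelope = queue.popleft()
        try:
            handler(envelope.payload)
        except Exception as error:  # every failure counts as one attempt
            envelope.attempts += 1
            envelope.last_error = repr(error)
            if envelope.attempts >= max_attempts:
                dead_letters.append(envelope)  # waits for manual intervention
            else:
                queue.append(envelope)  # redeliver

def replay(dead_letters: list[Envelope], queue: deque) -> None:
    """After the fix is deployed, move dead letters back onto the queue."""
    while dead_letters:
        envelope = dead_letters.pop(0)
        envelope.attempts = 0
        queue.append(envelope)

# Usage with the wallet example: the handler fails until the business rule is clarified.
positive_balance_rule_clarified = False

def close_wallet(account_id: str) -> None:
    if not positive_balance_rule_clarified:
        raise RuntimeError(f"wallet of {account_id} has a positive balance")

queue, dead_letters = deque([Envelope("account-7")]), []
consume(queue, dead_letters, close_wallet)  # three failed attempts -> dead-lettered
positive_balance_rule_clarified = True      # fix deployed
replay(dead_letters, queue)
consume(queue, dead_letters, close_wallet)  # succeeds, the flow resumes
```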
Symfony provides a way to store unrecoverable errors so that you can review, replay or delete them.
You can review the error messages from the console or directly in the database.
When your failure storage database is down, Symfony will drop the error message and you will not be able to recover it.
Ecotone provides a way to store unrecoverable errors so that you can review, replay or delete them.
You can review the errors directly from the console or in the database.
To control error messages for all your services from a single place, Ecotone Pulse was created. It allows you to review, replay and delete error messages from a single application.
When your failure storage database is down, Ecotone will keep the message in the queue until the database is back online.
Laravel provides a way to store unrecoverable errors so that you can review, replay or delete them.
You can review the errors directly from the console or in the database.
Retries are configured on the worker; for example, the following allows up to three attempts, waiting three seconds before each retry:
php artisan queue:work redis --tries=3 --backoff=3
When your failure storage database is down, Laravel will keep the message in the queue until the database is back online.
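Whether a message is dropped or kept comes down to when it is acknowledged. A minimal framework-agnostic Python sketch, assuming a broker that redelivers any unacknowledged message (this is not the frameworks' implementation): the consumer only acknowledges after the hypothetical failure store has saved the message.

```python
class StorageDown(Exception):
    """Simulated outage of the failure storage database."""

class FailureStore:
    """Hypothetical dead letter storage backed by a database."""

    def __init__(self) -> None:
        self.online = True
        self.messages: list[str] = []

    def save(self, message: str) -> None:
        if not self.online:
            raise StorageDown("database unavailable")
        self.messages.append(message)

def move_to_dead_letter(message: str, store: FailureStore) -> bool:
    """Return True to acknowledge (remove from the queue), False to leave it in the queue."""
    try:
        store.save(message)
    except StorageDown:
        return False  # not acknowledged: the broker keeps the message for a later attempt
    return True  # stored safely, so it can be acknowledged

store = FailureStore()
store.online = False
kept_in_queue = not move_to_dead_letter("order-42", store)  # database down: nothing is lost
store.online = True
acknowledged = move_to_dead_letter("order-42", store)       # database back: stored, then acknowledged
```

Acknowledging before the write succeeds would give the "drop the message" behaviour instead.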
Existing frameworks provide variants of battle-tested solutions that will help you build more solid and stable applications.
Some errors will happen no matter how well we test or design our code, which is why we need supporting tooling that helps us recover from them.
In the end, it's about customers having a good experience, which means the system working in a stable way, even as we add tons of new features :)