Here is a summary of our updated plan. It has the benefit of triggering email-sending events directly from the satellite (so notifications should be more prompt), while keeping Customer.io as the email sender (rather than sending emails directly from the satellite). It should also make it easy to batch events, as @cameron mentioned, and to add new events in the future, such as the ones @Toyoo suggested.
The plan outline:
- Add a new satellite table, called `node_events` or similar. It will have the columns `email`, `node_id`, `event_type` (e.g. an enum representing “offline”, “disqualified”, “online”, etc.), `email_sent` (nullable timestamp), and `created_at`.
- When a “reputation event” occurs (a node gets disqualified, for example), add a new row to the `node_events` table. We already have code for these “triggers” written; we just need to replace the line that sends the email with a line that adds a row to the new table.
- Add a new satellite chore which does the following:
  - Select the oldest row in `node_events` where `email_sent` is null and `created_at` is at least 5 minutes ago (or some other configured buffer time). Call this row `r`.
  - Select all rows in `node_events` where `email_sent` is null and `email = r.email`, grouped by `event_type`.
    - This way, if one email address is associated with 10 nodes that go offline at the same time, those events will be grouped together.
  - Compile the events of each type for this email address, and send an event to Customer.io indicating that this address needs to be sent an email for that `event_type` covering one or more nodes (providing the list of node IDs to Customer.io).
  - Set `email_sent` to the current time for all of these rows.
  - Repeat. If the first step returns no row, wait 5 minutes (or some other configured buffer time) and then repeat.
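A minimal sketch of the table, the trigger change, and one pass of the chore could look like the following. It uses Python with an in-memory SQLite database purely for illustration; the real satellite database, the enum values, and the Customer.io integration will differ. `record_event`, `process_one_batch`, `run_chore`, and the `send` callback (standing in for whatever call forwards an event to Customer.io) are hypothetical names. It also assumes timestamps are stored as Unix seconds and that “not yet sent” means `email_sent` is null, since that column is a nullable timestamp.

```python
import sqlite3
import time
from collections import defaultdict
from datetime import timedelta
from enum import Enum
from typing import Callable, Dict, List, Optional

class EventType(str, Enum):
    # Hypothetical values; the plan lists "offline", "disqualified", "online", etc.
    OFFLINE = "offline"
    DISQUALIFIED = "disqualified"
    ONLINE = "online"

# Assumed layout of the proposed node_events table (timestamps in Unix seconds).
SCHEMA = """
CREATE TABLE IF NOT EXISTS node_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    email      TEXT NOT NULL,
    node_id    TEXT NOT NULL,
    event_type TEXT NOT NULL,
    email_sent REAL,            -- NULL until the event is forwarded to Customer.io
    created_at REAL NOT NULL
)
"""

BUFFER = timedelta(minutes=5)  # configurable buffer time
SendFn = Callable[[str, str, List[str]], None]  # hypothetical: (email, event_type, node_ids)

def record_event(db: sqlite3.Connection, email: str, node_id: str,
                 event_type: EventType, now: Optional[float] = None) -> None:
    """What a reputation trigger calls instead of sending an email."""
    created = time.time() if now is None else now
    db.execute(
        "INSERT INTO node_events (email, node_id, event_type, created_at) "
        "VALUES (?, ?, ?, ?)",
        (email, node_id, event_type.value, created),
    )
    db.commit()

def process_one_batch(db: sqlite3.Connection, send: SendFn,
                      buffer: timedelta = BUFFER,
                      now: Optional[float] = None) -> bool:
    """One pass of the chore; returns False when no event is old enough yet."""
    now = time.time() if now is None else now
    cutoff = now - buffer.total_seconds()
    # Oldest unsent row that has aged past the buffer: this is "r".
    r = db.execute(
        "SELECT email FROM node_events "
        "WHERE email_sent IS NULL AND created_at <= ? "
        "ORDER BY created_at, id LIMIT 1",
        (cutoff,),
    ).fetchone()
    if r is None:
        return False
    email = r[0]
    # Every unsent row for the same address, regardless of its age.
    rows = db.execute(
        "SELECT id, node_id, event_type FROM node_events "
        "WHERE email_sent IS NULL AND email = ? ORDER BY id",
        (email,),
    ).fetchall()
    # Group node IDs by event type, e.g. {"offline": ["node-1", "node-2"]}.
    groups: Dict[str, List[str]] = defaultdict(list)
    for _, node_id, event_type in rows:
        groups[event_type].append(node_id)
    # One Customer.io event per event type for this address.
    for event_type, node_ids in groups.items():
        send(email, event_type, node_ids)
    # Mark the rows only after every send has returned.
    db.executemany(
        "UPDATE node_events SET email_sent = ? WHERE id = ?",
        [(now, row_id) for row_id, _, _ in rows],
    )
    db.commit()
    return True

def run_chore(db: sqlite3.Connection, send: SendFn,
              buffer: timedelta = BUFFER) -> None:
    """Loop forever, sleeping for the buffer time whenever no batch is ready."""
    while True:
        if not process_one_batch(db, send, buffer):
            time.sleep(buffer.total_seconds())
```

For example, three `offline` events and one `disqualified` event recorded for the same address produce two calls to `send` (one per event type) once the oldest event is five minutes old, instead of four separate emails. Because rows are marked only after `send` returns, a failure partway through a batch would cause those events to be forwarded again on the next pass (at-least-once delivery), which seems preferable to silently dropping a notification.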
Advantages of this approach:
- Customer.io can still handle the emails. We don’t need to spend time engineering our own solution for unsubscribing, tracking open rates, getting off spam lists, etc.
- “Reputation change” events are triggered directly from the satellite, so there are no more dataflow/Customer.io segment/Redash query annoyances.
- Emails should go out promptly: within 5 or 10 minutes of the event occurring, which is much better than our current process.
- should combine multiple emails of the same type (e.g. node offline) for the same email address when they occur close to each other (within 5 minutes). Less spam, in other words.
- A useful new table, `node_events`, which has utility outside of email sending: we can get a detailed history of any node’s reputation events.
- It is not very different from the original design, and a lot of the code that has already been written can be preserved.
- while it still makes use of customer.io, the end-to-end process of how these emails are triggered and sent should be a lot more understandable to the average developer
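As an illustration of the history use case, a per-node query might look like the sketch below. It assumes a `node_events` table with the columns listed in the plan and `created_at` stored as Unix seconds; `node_history` is a hypothetical helper, and SQLite stands in for the satellite database (the `rowid` tie-breaker is SQLite-specific).

```python
import sqlite3
from typing import List, Tuple

def node_history(db: sqlite3.Connection, node_id: str) -> List[Tuple[str, float, bool]]:
    """Return (event_type, created_at, notified) for one node, oldest first."""
    rows = db.execute(
        "SELECT event_type, created_at, email_sent IS NOT NULL "
        "FROM node_events WHERE node_id = ? "
        "ORDER BY created_at, rowid",  # rowid breaks ties between equal timestamps
        (node_id,),
    ).fetchall()
    # SQLite returns the IS NOT NULL comparison as 0/1; convert it to a bool.
    return [(event_type, created_at, bool(sent)) for event_type, created_at, sent in rows]
```

On a real satellite, an index on `(node_id, created_at)` would likely be needed to keep this query cheap as the table grows.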