RunReveal Status
All Systems Operational · Degraded Performance · Downtime Performance · Maintenance
RunReveal
today
Ingestion API
today
Web
today
Apr 10, 2026
3 hours ago
RunReveal Data Delay
RunReveal
Resolved · April 10 at 4:28 PM (in 1 day)

Incident Report: Service Degradation - April 9, 2026

Summary

On April 9, 2026, RunReveal experienced degraded performance caused by slow query execution on one of our ClickHouse data warehouses following a patch upgrade by ClickHouse Cloud. Data ingestion was delayed by up to 5 hours while we worked to mitigate the issue. The API and web app remained available throughout.

Timeline

  • ClickHouse Cloud applied a routine patch upgrade to our production warehouse.
  • Shortly after, query performance degraded significantly, causing ingestion pipelines to back up.
  • To unblock data ingestion, we migrated to a new ClickHouse warehouse pointing at the same coherent dataset in S3.
  • During recovery, our IP geolocation database (IPDB) failed to load due to degraded performance on a shared EFS volume, preventing some backend services from restarting cleanly.
  • We disabled IP geolocation enrichment to unblock service recovery.
  • Ingestion fully caught up and all services were restored.
  • IP geolocation enrichment was re-enabled after migrating IPDB from shared storage to per-pod downloads from S3.
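
The last three steps describe a pattern that can be sketched roughly as follows: each pod downloads its own copy of the IP database to local storage, and if that download fails the service still starts with enrichment switched off. This is a minimal illustration, not RunReveal's actual code. The names `load_ipdb`, `start_enrichment`, the `fetch` callback (standing in for an S3 download), and the file name `ipdb.bin` are all hypothetical.

```python
import os
import tempfile
from typing import Callable, Optional


def load_ipdb(fetch: Callable[[str], None], dest_dir: str) -> Optional[str]:
    """Download the IP database into pod-local storage.

    `fetch` is a hypothetical callback that writes the database to the
    given path (for example, an S3 download). Returns the final path,
    or None if the download failed.
    """
    os.makedirs(dest_dir, exist_ok=True)
    final_path = os.path.join(dest_dir, "ipdb.bin")  # hypothetical file name
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    os.close(fd)
    try:
        fetch(tmp_path)
        # Rename only after a complete download so readers never see a partial file.
        os.replace(tmp_path, final_path)
        return final_path
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return None


def start_enrichment(fetch: Callable[[str], None], dest_dir: str) -> dict:
    """Start up without blocking on the database: enrichment is optional."""
    path = load_ipdb(fetch, dest_dir)
    return {"geo_enrichment_enabled": path is not None, "ipdb_path": path}
```

With this shape, a slow or failing download leaves events stored normally but without geolocation fields, rather than preventing the service from restarting.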

Impact

  • Data ingestion was delayed by up to 5 hours during the incident window. No data was lost.
  • Log events ingested during the incident were not enriched with IP geolocation data (city, country, etc.). The events themselves were stored normally.
  • Search queries were slow or timed out during the period of ClickHouse degradation.

Resolution

  • We migrated query and ingestion workloads to a new ClickHouse warehouse, restoring normal performance.
  • IP geolocation enrichment was re-enabled after rearchitecting the IPDB delivery mechanism to eliminate the EFS dependency.
  • We are awaiting confirmation from ClickHouse Cloud on the root cause of the query performance degradation following their patch upgrade.
Monitoring · April 9 at 10:28 PM (18 hours earlier)

We were able to mitigate the issue and data ingestion has now fully caught up.

The root cause is still under investigation and we'll have an update for you tomorrow.

RunReveal experienced a period of degraded performance caused by slow query execution on one of our ClickHouse data warehouses.

During the incident, data ingestion was significantly delayed (up to 5 hours), and IP geolocation enrichment was temporarily unavailable. The API and web app remained available throughout the incident.

Monitoring · April 9 at 7:50 PM (3 hours earlier)

We're slowly enabling all of our queues to warm up our datastore. We've resolved a few unrelated issues that slowed down the fix. A more detailed status report will be written and posted here, and we expect service to be fully restored shortly.

Identified · April 9 at 5:57 PM (2 hours earlier)

We're continuing to work on the data delay issue. No data is being lost, and the API / web experience should continue working normally.

We're working on a rollback of an upgrade to our underlying data store and have paused data ingestion while the rollback is in progress. We will provide an update soon.

Investigating · April 9 at 4:04 PM (2 hours earlier)

We're dealing with a data delay caused by a recent change we released to production.

Apr 6, 2026
4 days ago
IP Location Enrichments not processing
Resolved · April 6 at 7:56 PM

We've identified the issue and are deploying a fix. During our migration to Kubernetes, our IP location enrichments were not properly configured in our queue processes. Those processes continued running for roughly the past week, so logs processed during that time are missing the src and dst enrichments in the fields below (only src fields are listed; the dst fields are the same with the prefix dst).

  • srcASCountryCode
  • srcASNumber
  • srcASOrganization
  • srcCity
  • srcConnectionType
  • srcISP
  • srcLatitude
  • srcLongitude
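
To check which events from the affected week are missing these enrichments, the field list can be expanded to its dst counterparts and tested per event. The sketch below assumes an event is a plain dictionary keyed by field name and that an unenriched field is absent, null, or an empty string. That representation, and the helper name `missing_enrichments`, are assumptions for illustration.

```python
from typing import List

SRC_FIELDS = [
    "srcASCountryCode",
    "srcASNumber",
    "srcASOrganization",
    "srcCity",
    "srcConnectionType",
    "srcISP",
    "srcLatitude",
    "srcLongitude",
]

# The dst fields are the same names with the "src" prefix swapped for "dst".
DST_FIELDS = ["dst" + name[len("src"):] for name in SRC_FIELDS]


def missing_enrichments(event: dict, prefix: str = "src") -> List[str]:
    """Return the enrichment fields that are absent or empty on an event."""
    fields = SRC_FIELDS if prefix == "src" else DST_FIELDS
    return [f for f in fields if event.get(f) in (None, "")]
```

For example, `missing_enrichments({"srcCity": "Paris"})` returns the other seven src fields, which flags the event as only partially enriched.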
Mar 24, 2026
17 days ago
App Availability Issues
Web
Monitoring · March 24 at 8:03 PM

We've identified the issue and are implementing a fix. We'll begin monitoring the rollout shortly.

Investigating · March 24 at 6:58 PM (1 hour earlier)

We're currently experiencing trouble accessing the app. Data ingestion is currently unaffected. We are investigating the issue and we will provide an update as soon as we can.

Mar 4, 2026
1 month ago
Bring your own cloud errors
Resolved · March 4 at 10:10 PM (in 2 hours)

We identified the issue. Atlantis was using the wrong strategy for planning and applying Terraform configuration, so branches that had not been rebased onto the latest main branch applied out-of-date updates.

We are addressing this by ensuring that Atlantis uses the right strategy: planning and applying the Terraform config after merging the changes with the main branch (referred to as the "merge head").

We mitigated the issue at about 1pm PT, but will continue to monitor closely.
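
The difference between the two strategies can be shown with a toy model in which a configuration is a dictionary of settings. This is not Atlantis's implementation; the function names and setting keys are hypothetical. Atlantis documents a `--checkout-strategy` server option with `branch` and `merge` values, but whether that is the exact setting changed here is an assumption.

```python
def plan_branch(branch_snapshot: dict) -> dict:
    """Apply the branch exactly as checked out, ignoring newer changes on main."""
    return dict(branch_snapshot)


def plan_merge(main: dict, branch_base: dict, branch_snapshot: dict) -> dict:
    """Apply only the branch's own changes on top of the latest main."""
    result = dict(main)
    # Settings the branch added or modified relative to where it forked.
    for key, value in branch_snapshot.items():
        if key not in branch_base or branch_base[key] != value:
            result[key] = value
    # Settings the branch deleted relative to where it forked.
    for key in branch_base:
        if key not in branch_snapshot:
            result.pop(key, None)
    return result


# Hypothetical scenario: main moved to v2 after the branch was created.
base = {"api_version": "v1", "replicas": 2}
main = {"api_version": "v2", "replicas": 2}
branch = {"api_version": "v1", "replicas": 3}  # only replicas was changed

stale = plan_branch(branch)              # rolls api_version back to v1
merged = plan_merge(main, base, branch)  # keeps v2 and applies replicas=3
```

In the stale case the apply silently reverts a change that already landed on main, which matches the out-of-date updates described above.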

Investigating · March 4 at 8:24 PM (2 hours earlier)

We're working to resolve an issue with bring your own cloud deployments of RunReveal that is impacting the API.

This is not impacting data processing or the running of detections; however, the front end and API are currently not working.

We'll provide an update once this is resolved.
