If your SSO platform had a service disruption at 2am tonight, how would your team find out about it?
For most IT operations and security teams, the honest answer is: when the support tickets start arriving in the morning. Someone comes in, tries to log in, fails, and raises a ticket. By the third ticket, the pattern is clear. By the time the root cause is identified, the disruption has been running for several hours.
This is not a process failure. It is an instrumentation gap. The authentication health signal that would have indicated the disruption at 2am was present in the IAM platform’s event stream the entire time. The success rate dropped. The failure rate climbed. The volume pattern shifted. All of it was in the data. None of it was connected to a monitoring layer that could evaluate it against a threshold and generate an alert before the first user noticed anything.
The same gap applies to security incidents. A credential stuffing attack that ramps up at 3am and stops before dawn leaves its pattern in the authentication event stream. A SAML assertion configuration that broke during a maintenance window produces a consistent failure rate from the moment the change took effect. A compromised account used at unusual hours generates authentication volume that does not match the user’s baseline pattern. All detectable. All invisible if the authentication layer has no monitoring coverage.
This post covers the architectural difference between a log interface and an authentication telemetry API, what the Login Metrics API with 24-hour trend data exposes, the specific detection scenarios it enables, and how it connects to the event-level signal data from the MFA Failures and Lockout API covered in the previous post.
The Architectural Difference Between a Log Interface and a Telemetry API
Every IAM platform produces authentication event data. The question is what form it takes and what it is designed for.
A log interface is designed for a human analyst investigating a known problem. They navigate to the log viewer, apply filters, select a time range, and read the results. It answers questions when asked. It does not surface anomalies unprompted. If an analyst does not know a problem exists, the log interface provides no signal. The detection latency of a log interface is the time between an anomaly occurring and the next time a human happens to look at the right screen.
A telemetry API is designed for a monitoring system that evaluates data continuously against defined thresholds. It delivers pre-aggregated metrics in a structured format that a dashboard or alerting rule can consume without a human in the loop. The detection latency of a telemetry API connected to a monitoring layer is the time between an anomaly occurring and the alerting threshold being breached, which is minutes rather than hours.
Most IAM platforms provide the first. Akku’s Login Metrics API provides the second.
What the Login Metrics API Provides
Total Login Count Over Configurable Time Windows
Login volume is the baseline for anomaly detection. Normal Monday morning volume for a given environment might be 2,400 authentication events in the first hour. A Monday morning producing 400 events in that window is an anomaly: a service disruption, a network issue, or a configuration problem blocking users from reaching the authentication layer. Volume anomaly detection requires a baseline. The configurable time windows in the API allow the monitoring layer to compare current volume against historical baselines for the same time window and day of week automatically.
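A minimal Python sketch of the same-window baseline comparison, assuming the monitoring layer has already collected the login count for the same hour and day of week over previous weeks. The function name, the input shape, and the 0.5x/2x bounds are illustrative assumptions, not part of the Login Metrics API.

```python
from statistics import mean


def volume_anomaly(current_count, history, min_ratio=0.5, max_ratio=2.0):
    """Compare this window's login count with the same window in previous weeks.

    history: counts for the same hour and day of week in earlier weeks
    (hypothetical input shape). Returns (is_anomalous, ratio_to_baseline).
    """
    if not history:
        raise ValueError("need at least one historical window for a baseline")
    baseline = mean(history)
    if baseline == 0:
        # No historical traffic in this slot: any activity at all is unusual.
        return current_count > 0, (float("inf") if current_count else 1.0)
    ratio = current_count / baseline
    return ratio < min_ratio or ratio > max_ratio, ratio


# Monday 08:00-09:00 over the previous four Mondays averages 2,400 logins.
past_mondays = [2350, 2420, 2480, 2350]
flagged, ratio = volume_anomaly(400, past_mondays)
print(flagged, round(ratio, 2))  # True 0.17
```

Checking both a lower and an upper bound means the same rule catches a disruption that suppresses volume and an unexpected surge.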
Login Success Rate as a Percentage
Success rate is the most sensitive indicator of authentication health. In a healthy environment with MFA enforcement, success rate sits within a consistent range, typically between 93 and 97 percent once routine MFA failures are accounted for. A drop to 78 percent is a signal regardless of the cause: broken SAML assertion, misconfigured service account credentials, active attack, or integration failure. The monitoring layer does not need to identify the cause to fire the alert. It needs to know the rate has moved outside the normal range.
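The threshold logic itself is small. This sketch assumes the success and total counts for a window are available; the 93 to 97 percent band is the example range from the text and should be tuned per environment. The function names are hypothetical.

```python
def success_rate(success_count, total_count):
    """Percentage of login attempts that succeeded; None when there were no attempts."""
    if total_count == 0:
        return None
    return 100.0 * success_count / total_count


def outside_normal_band(rate, low=93.0, high=97.0):
    """True when the rate has left the expected band (example band, tune per environment)."""
    return rate is not None and not (low <= rate <= high)


rate = success_rate(1872, 2400)  # 78 percent
print(rate, outside_normal_band(rate))  # 78.0 True
```

The rule deliberately ignores cause: it only answers whether the number has left its normal range, which is all an alert needs to know.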
Login Failure Rate by Authentication Stage
Failure rate broken down by authentication stage provides diagnostic context. A spike in primary credential failures indicates something different from a spike in MFA failures. Primary credential failures at elevated rates suggest a password sync problem, stale credentials in an integration, or a credential stuffing attempt using out-of-date credentials that fail before reaching the MFA step. MFA failure spikes indicate the attack patterns described in Blog 2 of this series. Stage-level breakdown narrows the investigation before anyone opens a log viewer.
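A rough sketch of stage-level triage, assuming per-stage failure rates can be obtained as a mapping. The stage labels `primary` and `mfa`, the 2x factor, and the hint text are hypothetical and only mirror the reasoning above.

```python
def elevated_stages(current, baseline, factor=2.0):
    """Return stages whose failure rate (percent) is at least `factor` times baseline.

    current, baseline: {stage_name: failure_rate}; stage names are hypothetical.
    A stage missing from the baseline is flagged as soon as it shows any failures.
    """
    return sorted(
        stage
        for stage, rate in current.items()
        if rate > 0 and rate >= factor * baseline.get(stage, 0.0)
    )


HINTS = {
    "primary": "password sync, stale integration credentials, or credential stuffing",
    "mfa": "MFA attack patterns: pull event-level data for the failing identifiers",
}

now = {"primary": 9.0, "mfa": 3.1}
usual = {"primary": 2.5, "mfa": 3.0}
for stage in elevated_stages(now, usual):
    print(stage, "->", HINTS[stage])  # primary -> password sync, ...
```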
24-Hour Sparkline Trend Data
The 24-hour sparkline provides temporal context. A success rate of 89 percent is alarming if yesterday’s baseline was 96 percent. It may be expected if the last 24 hours show a gradual drift down from 92 percent that correlates with a known migration. The sparkline is an ordered array of hourly data points covering the previous 24 hours, structured for direct embedding in Grafana, Datadog, and custom JSON-sourced dashboards. The time series format is compatible with Grafana’s native time series panel without transformation. A visual trend line, rather than a single number, makes the distinction between a sudden anomaly and gradual drift immediately obvious.
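The same distinction can also be made programmatically by looking at the largest single hour-to-hour drop and at the net change across the window. This Python sketch works on the sparkline's values as a plain list of success-rate percentages; the thresholds are illustrative assumptions to tune per environment.

```python
def classify_trend(points, step_threshold=5.0, drift_threshold=3.0):
    """Classify an hourly success-rate series (oldest first).

    'step'   : some single hour-to-hour drop is at least step_threshold points
    'drift'  : no step, but the net decline across the window is at least drift_threshold
    'stable' : neither
    """
    if len(points) < 2:
        return "stable"
    largest_drop = max(a - b for a, b in zip(points, points[1:]))
    if largest_drop >= step_threshold:
        return "step"
    if points[0] - points[-1] >= drift_threshold:
        return "drift"
    return "stable"


# values = [p["value"] for p in sparkline] would extract the series from the API objects.
sudden = [96.0] * 20 + [78.0] * 4
gradual = [92.0 - 0.25 * h for h in range(24)]
print(classify_trend(sudden), classify_trend(gradual))  # step drift
```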
Detection Scenarios This Data Enables
SAML and OIDC Integration Failure
A SAML assertion configuration is modified during a maintenance window. The change is syntactically valid but semantically incorrect: the audience restriction is set to the wrong service provider identifier. Every authentication attempt against that integration fails from the moment the change takes effect. Without authentication telemetry, the failure accumulates invisibly until users arrive in the morning and report problems. With a success rate threshold alert connected to the Login Metrics API, the monitoring layer fires within minutes of the configuration change. The investigation identifies the maintenance window as the cause before the first user has noticed anything.
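A threshold alert that fires within minutes usually also needs a guard against one-off blips. The sketch below is a hypothetical helper that assumes the monitoring layer polls the success rate once per minute; it fires only after a configurable number of consecutive readings below the threshold. The 90 percent threshold is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdAlert:
    """Fire when the success rate stays below `threshold` for `required` consecutive polls."""

    threshold: float = 90.0
    required: int = 3
    _breaches: int = field(default=0, init=False)

    def observe(self, rate: float) -> bool:
        """Record one poll; return True only on the poll that first triggers the alert."""
        if rate < self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0  # any healthy reading resets the count
        return self._breaches == self.required


alert = ThresholdAlert(threshold=90.0, required=3)
readings = [96.0, 95.0, 12.0, 10.0, 9.0, 8.0]  # one per minute; SAML change at minute 2
for minute, rate in enumerate(readings):
    if alert.observe(rate):
        print(f"ALERT at minute {minute}: success rate {rate}%")
```

With one-minute polling and `required=3`, the alert fires roughly three minutes after the broken configuration takes effect. Lowering `required` trades speed for more false positives.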
Gradual Certificate Expiry
A certificate used in an OIDC integration approaches expiry. The relying party begins intermittently rejecting authentication attempts as the validity window narrows. The success rate drifts downward over several days from 96 percent to 88 percent. A sharp drop triggers a threshold alert immediately. A gradual drift requires trend analysis. A monitoring rule comparing the current success rate against the 7-day rolling average fires when the deviation exceeds a defined percentage, detecting the drift long before the certificate expires entirely and authentication fails completely.
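The 7-day rolling comparison might look like this in Python, assuming one success-rate value per day is retained by the monitoring layer. The function name and the 3 percent relative deviation are placeholders.

```python
from statistics import mean


def drift_alert(current_rate, last_7_days, max_deviation_pct=3.0):
    """Fire when the current success rate falls more than max_deviation_pct percent
    (relative) below the mean of the previous seven daily rates.

    Returns (fired, deviation_pct).
    """
    if len(last_7_days) != 7:
        raise ValueError("expected seven daily success-rate values")
    baseline = mean(last_7_days)
    if baseline <= 0:
        return False, 0.0  # no meaningful baseline to compare against
    deviation_pct = 100.0 * (baseline - current_rate) / baseline
    return deviation_pct > max_deviation_pct, deviation_pct


fired, deviation = drift_alert(88.0, [96, 96, 95, 94, 93, 92, 91])
print(fired, round(deviation, 1))  # True 6.2
```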
Off-Hours Authentication Volume Anomaly
A compromised account is used by an attacker outside the legitimate user’s normal working hours. Authentication volume for that identifier between midnight and 4 am significantly exceeds the user’s normal overnight baseline. Per-identifier volume monitoring built on the Login Metrics API detects this. A rule flagging identifiers with overnight volume exceeding their rolling baseline by a defined percentage surfaces the anomaly before the account owner’s working day begins.
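A sketch of per-identifier overnight monitoring, assuming the monitoring layer can retrieve login counts per identifier for the midnight to 4am window. The input dictionaries, identifiers, and thresholds here are hypothetical.

```python
from statistics import mean


def overnight_outliers(tonight, history, min_excess_pct=200.0, min_events=5):
    """Flag identifiers whose midnight-4am login count exceeds their own rolling
    overnight baseline by more than min_excess_pct percent.

    tonight: {identifier: count for 00:00-04:00 tonight}
    history: {identifier: [counts for the same window on previous nights]}
    Returns {identifier: excess_pct}.
    """
    flagged = {}
    for ident, count in tonight.items():
        if count < min_events:
            continue  # ignore tiny absolute volumes to limit noise
        baseline = mean(history.get(ident) or [0])
        # Treat a baseline below 1 as 1 so brand-new overnight activity still scores.
        excess_pct = 100.0 * (count - baseline) / max(baseline, 1)
        if excess_pct > min_excess_pct:
            flagged[ident] = round(excess_pct, 1)
    return flagged


tonight = {"a.rao": 14, "ops-bot": 40}
history = {"a.rao": [0, 1, 0, 1], "ops-bot": [38, 41, 40, 39]}
print(overnight_outliers(tonight, history))  # {'a.rao': 1350.0}
```

Comparing each identifier against its own baseline keeps service accounts with steady overnight traffic, like the hypothetical `ops-bot`, from drowning out a user account that is normally silent at night.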
How This Connects to the MFA Failures and Lockout API
The Login Metrics API provides aggregate health data for operational monitoring and threshold alerting. The MFA Failures and Lockout API covered in Blog 2 provides individual event data for SIEM detection rules and investigation. The two APIs address different points in the detection and response chain. When the metrics layer fires a success rate alert, the event layer provides the data to investigate the cause: which specific identifiers are failing, which methods are failing, and whether the pattern matches an attack or a configuration problem. Both are needed for complete authentication visibility.
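When a metrics-layer alert fires, the first investigative step is often a simple grouping of event-layer records. This sketch assumes failure events arrive as dictionaries with `identifier` and `method` keys; the real MFA Failures and Lockout API schema may differ, and the function name is hypothetical.

```python
from collections import Counter


def summarise_failures(events, top_n=3):
    """Group failure events to show which identifiers and methods dominate."""
    by_identifier = Counter(e["identifier"] for e in events)
    by_method = Counter(e["method"] for e in events)
    return {
        "total": len(events),
        "distinct_identifiers": len(by_identifier),
        "top_identifiers": by_identifier.most_common(top_n),
        "top_methods": by_method.most_common(top_n),
    }


events = [
    {"identifier": "u1", "method": "totp"},
    {"identifier": "u2", "method": "totp"},
    {"identifier": "u1", "method": "push"},
]
print(summarise_failures(events))
```

The distinct-identifier count and the method split are the two numbers that most directly feed the question of whether the pattern looks like an attack or a configuration problem.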
For ITeS and BPO organisations where authentication availability directly affects client SLA obligations, the combination of metrics-layer alerting and event-layer investigation reduces both mean time to detect (MTTD) and mean time to resolve (MTTR) for authentication incidents. For financial services organisations where RBI’s Cybersecurity Framework requires near-real-time monitoring of authentication anomalies, the Login Metrics API with sub-minute polling latency supports that monitoring requirement at the data layer.
What to Check About Your Current Authentication Visibility
Ask whoever is on call tonight this question: if the login success rate across your SSO dropped 20 percent right now, how long would it take them to find out, assuming no users report anything?
If the answer is measured in hours, authentication health is not part of your current monitoring architecture. The IAM platform is producing the data. Nothing is consuming it for alerting purposes.
The follow-up question: what is the current source of truth for authentication health? If the answer is the IAM platform’s internal admin console rather than a connected monitoring tool with configurable alerting, the detection latency for authentication anomalies is the time between the anomaly occurring and the next time someone manually opens that console. That gap is where incidents live between 2am and 9am.
Questions IT Operations and Security Teams Ask About Authentication Telemetry
Q: What is the data format and query interface for the Login Metrics API?
A: The Login Metrics API returns structured JSON. Each response includes the time window covered, total login count, success count, failure count, success rate as a percentage, failure rate as a percentage, and the 24-hour sparkline as an ordered array of objects where each object contains a timestamp and a value. The API supports configurable time windows for aggregate metrics. The sparkline provides hourly data points over the previous 24 hours by default, with the option to request higher granularity at shorter intervals. The format is designed for embedding in dashboard tools that support JSON data sources, including Grafana and Datadog.
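To make the described response shape concrete, here is a hypothetical payload and a Python parser for it. The field names (`total_logins`, `success_count`, and so on) and the sample values are assumptions for illustration only; the real schema may differ.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LoginMetrics:
    window: str
    total: int
    success: int
    failure: int
    success_rate: float
    failure_rate: float
    sparkline: list[tuple[datetime, float]]


def parse_login_metrics(raw: str) -> LoginMetrics:
    """Parse a (hypothetical) Login Metrics API JSON body into a typed record."""
    body = json.loads(raw)
    points = [
        (datetime.fromisoformat(p["timestamp"]), float(p["value"]))
        for p in body["sparkline"]
    ]
    metrics = LoginMetrics(
        window=body["window"],
        total=int(body["total_logins"]),
        success=int(body["success_count"]),
        failure=int(body["failure_count"]),
        success_rate=float(body["success_rate"]),
        failure_rate=float(body["failure_rate"]),
        sparkline=points,
    )
    # Cheap consistency check before the numbers reach an alerting rule.
    if metrics.success + metrics.failure != metrics.total:
        raise ValueError("success and failure counts do not sum to total")
    return metrics


SAMPLE = """{
  "window": "1h",
  "total_logins": 2400,
  "success_count": 2292,
  "failure_count": 108,
  "success_rate": 95.5,
  "failure_rate": 4.5,
  "sparkline": [
    {"timestamp": "2024-05-06T07:00:00+00:00", "value": 95.8},
    {"timestamp": "2024-05-06T08:00:00+00:00", "value": 95.5}
  ]
}"""

m = parse_login_metrics(SAMPLE)
print(m.success_rate, len(m.sparkline))  # 95.5 2
```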
Q: How does this API differ from the logs already available in the Akku admin console?
A: The admin console provides access to the full audit log through a user interface designed for manual review and investigation. The Login Metrics API provides pre-aggregated metrics in a format designed for programmatic consumption by monitoring tools and alerting systems. The distinction is between a log interface requiring a human to navigate and interpret the data, and a metrics API delivering structured numerical data ready for automated threshold evaluation. The underlying authentication event data is the same source. The format and delivery mechanism are optimised for different consumers.
Q: Can this data be integrated with Grafana, Datadog, or similar observability platforms?
A: Yes. The structured JSON output is compatible with any observability platform supporting HTTP-based data sources, including Grafana, Datadog, and New Relic. For Grafana, the API can be queried through the Infinity data source plugin. The sparkline data array is directly compatible with Grafana’s time series panel format without a data transformation step. For teams centralising infrastructure and application metrics in an observability platform, the Login Metrics API closes the gap where authentication health was previously absent from the unified operational dashboard.
Q: How does the 24-hour trend data help distinguish attacks from configuration problems?
A: Attacks and configuration problems produce different trend shapes. A sudden step-change drop at a specific timestamp is characteristic of a configuration event. A gradual drift downward over hours or days is more consistent with an expiring certificate or degrading integration. A failure spike that ramps up during off-hours and subsides before the working day, like the overnight credential stuffing pattern described earlier, is more consistent with an attack. A spike in failure rate affecting only certain MFA methods while others remain normal points to an MFA delivery issue rather than a broad authentication failure. The 24-hour sparkline provides the temporal context needed to distinguish these patterns, which determines whether the appropriate response is a configuration rollback, a certificate renewal, or a security incident response.
Q: How does authentication health monitoring connect to privileged session monitoring in Akku PAM?
A: Authentication health monitoring through the Login Metrics API covers the authentication layer: who is attempting to authenticate, at what volume, and with what success rate. Privileged session monitoring through AkkuReka covers what authenticated privileged users do inside their sessions: commands executed, files accessed, and configuration changes made. The two monitoring layers address different risk surfaces. Authentication monitoring detects anomalies at the perimeter of the identity layer. Session monitoring detects anomalies inside the sessions of users who have already authenticated. For a comprehensive identity security monitoring posture, both layers are needed.
Q: How does login volume monitoring support capacity planning beyond security monitoring?
A: Authentication volume data over configurable time windows provides the baseline for IAM infrastructure capacity planning. Peak authentication times, day-of-week patterns, and growth trends are all derivable from the Login Metrics API output. For organisations planning IAM platform capacity for workforce growth, mergers, or seasonal peaks, the historical volume data provides the input for capacity modelling. The security monitoring use case and the capacity planning use case share the same API output, making the integration investment serve both purposes.
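Deriving a capacity profile from hourly volume data is a small aggregation. This sketch assumes a list of (timestamp, login count) pairs collected from the API over several weeks; the function name, return shape, and synthetic sample are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def volume_profile(hourly):
    """Summarise (timestamp, login_count) pairs for capacity planning.

    Returns (busiest_hour_of_day, {weekday: average daily logins}),
    where weekday follows datetime.weekday(): 0 = Monday ... 6 = Sunday.
    """
    by_hour = defaultdict(list)
    daily_totals = defaultdict(int)
    for ts, count in hourly:
        by_hour[ts.hour].append(count)
        daily_totals[ts.date()] += count
    busiest_hour = max(by_hour, key=lambda h: mean(by_hour[h]))
    by_weekday = defaultdict(list)
    for day, total in daily_totals.items():
        by_weekday[day.weekday()].append(total)
    return busiest_hour, {wd: mean(totals) for wd, totals in by_weekday.items()}


# One synthetic Monday with a 09:00 peak.
sample = [(datetime(2024, 5, 6, h), 500 if h == 9 else 100) for h in range(24)]
print(volume_profile(sample))  # (9, {0: 2800})
```

Fed with several weeks of data, the same function yields the peak hour and day-of-week averages that capacity modelling starts from.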
