PowerAuth Cloud Monitoring
The PowerAuth Cloud (PAC) is instrumented for monitoring and observability. It exposes:
- Prometheus-compatible metrics
- Application logs with distributed tracing correlation
- Standard Spring Boot health and probe endpoints
Observability is provided by the Spring Boot framework with support for Prometheus. All standard configuration options and built-in metrics are available. These capabilities are available out of the box in all environments; only integration with your monitoring stack (Prometheus, logging, Kubernetes, etc.) is required. Optionally, you can connect PowerAuth Cloud to an observability platform supporting OpenTelemetry (OTEL).
Health Checks & Probes
The service exposes standard Spring Boot Actuator health endpoints for use by load balancers and orchestration platforms (such as Kubernetes). All health endpoints are enabled; when authenticated with the ADMIN or MONITORING role, the response also includes health details.
Endpoints
- Overall health: GET /powerauth-cloud/actuator/health
- Liveness probe: GET /powerauth-cloud/actuator/health/liveness
- Readiness probe: GET /powerauth-cloud/actuator/health/readiness
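For illustration, a Kubernetes probe configuration using these endpoints could look like the following sketch. The internal port 8000 and the timing values are assumptions and must be adjusted to your deployment:
# Sketch of container probes pointing at the actuator endpoints (adjust port, path, and timing)
livenessProbe:
  httpGet:
    path: /powerauth-cloud/actuator/health/liveness
    port: 8000
  initialDelaySeconds: 60
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /powerauth-cloud/actuator/health/readiness
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10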
Metrics (Prometheus)
By default, metrics are not enabled in PowerAuth Cloud. Please note that PowerAuth Cloud consists of several internal components, so a single shared actuator endpoint is not available; each component exposes its own endpoints.
When enabled, the internal components expose runtime metrics via Spring Boot Actuator using the plain-text Prometheus exposition format.
Endpoints:
PowerAuth Cloud (exposed at internal port 8000)
- Prometheus format: GET /powerauth-cloud/actuator/prometheus
- Spring Boot JSON format: GET /powerauth-cloud/actuator/metrics
Enrollment Server (exposed at internal port 8081)
- Prometheus format: GET /enrollment-server/actuator/prometheus
- Spring Boot JSON format: GET /enrollment-server/actuator/metrics
Push Server (exposed at internal port 8080)
- Prometheus format: GET /powerauth-push-server/actuator/prometheus
- Spring Boot JSON format: GET /powerauth-push-server/actuator/metrics
PowerAuth Server (exposed at internal port 8080)
- Prometheus format: GET /powerauth-java-server/actuator/prometheus
- Spring Boot JSON format: GET /powerauth-java-server/actuator/metrics
Configuration:
Set the following configuration properties to enable metric endpoints and their exposure over HTTP:
JAVA_OPTS=-Dpowerauth.service.scheduled.job.operationCleanup=600000 -Dmanagement.endpoint.metrics.enabled=true -Dmanagement.endpoint.prometheus.enabled=true -Dmanagement.endpoints.web.exposure.include=metrics,prometheus -Dmanagement.prometheus.metrics.export.enabled=true
You also need to expose the internal ports (i.e., 8000, 8080, and 8081) to access the endpoints. Please note that the /powerauth-cloud context is protected by the configured authorization (i.e., Basic or OAuth).
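For illustration, a Prometheus scrape configuration for two of the components could look like the following sketch. The hostnames, port mapping, and credentials are placeholders; the /powerauth-cloud job needs the credentials of a user with the ADMIN or MONITORING role because the context is protected:
# Sketch of a Prometheus scrape configuration (hostnames, ports, and credentials are placeholders)
scrape_configs:
  - job_name: powerauth-cloud
    metrics_path: /powerauth-cloud/actuator/prometheus
    basic_auth:
      username: monitoring
      password: <secret>
    static_configs:
      - targets: ['powerauth-cloud:8000']
  - job_name: powerauth-java-server
    metrics_path: /powerauth-java-server/actuator/prometheus
    static_configs:
      - targets: ['powerauth-cloud:8080']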
JMX Exporter
Alternatively, unified metrics can be gathered by the JMX Exporter:
1) Mount the JMX Exporter Java agent to directory /app/config/.
2) Create a Tomcat configuration file tomcat.yml in the same directory:
startDelaySeconds: 0
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
- pattern: ".*"
3) Configure the following environment variables:
# Java options to enable the JMX Exporter agent (example for Tomcat)
JAVA_OPTS=-javaagent:/app/config/jmx_prometheus_javaagent.jar=9000:/app/config/tomcat.yml
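Once the container starts with this agent, the aggregated metrics are served on port 9000 (path /metrics by default) and can be scraped without going through the protected application contexts. A possible scrape job, with a placeholder hostname:
# Sketch of a scrape job for the JMX Exporter port configured above
scrape_configs:
  - job_name: powerauth-cloud-jmx
    static_configs:
      - targets: ['powerauth-cloud:9000']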
Other Monitoring Options
PowerAuth Cloud can be monitored via other tools as well. Example configuration to enable monitoring via OpenTelemetry:
1) Mount the Open Telemetry Java agent to directory /app/config/.
2) Configure the following environment variables:
# Java options to enable the OpenTelemetry agent (example for Tomcat)
JAVA_OPTS=-javaagent:/app/config/opentelemetry-javaagent.jar -Dotel.jmx.target.system=tomcat
# OpenTelemetry environment variables (with example values):
# The OTLP endpoint to which traces/metrics are exported
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector.example.com:4317
# Optional: Custom headers for authentication or other purposes
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer <token>
# Optional: Comma-separated list of HTTP request headers to capture
OTEL_INSTRUMENTATION_HTTP_SERVER_CAPTURE_REQUEST_HEADERS=x-request-id,x-b3-traceid
# Resource attributes describing the service (e.g., environment, region)
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=prod,region=eu-central-1
# The logical service name for traces
OTEL_SERVICE_NAME=next-step-server
Application Logging & Distributed Tracing
The service produces structured application logs and participates in distributed tracing using the W3C Trace Context standard. Logs are written to standard output (stdout), which is suitable for containerized environments.
Each log entry is enriched (when tracing is active) with correlation identifiers, typically:
- traceId – ID of the distributed trace
- spanId – ID of the current span within that trace
These fields allow correlating log messages across multiple services that participate in the same request.
Tracing (W3C traceparent)
The service supports the W3C traceparent header for incoming and outgoing HTTP calls to allow end-to-end request inspection and performance analysis.
For incoming requests:
- If traceparent (or compatible headers) are present, the service joins the existing trace.
- If no trace headers are present, the service starts a new trace.
For outgoing requests:
- HTTP clients used by the service automatically propagate the current trace context to downstream systems.
By default, the service accepts multiple tracing header formats (W3C, B3, B3 multi) and uses the W3C format for outgoing headers.
Configuration: You can change which header formats are consumed and produced by setting the following properties:
# Accept multiple header formats
# Produce B3 format instead of W3C (example)
JAVA_OPTS= -Dmanagement.tracing.propagation.consume=B3,B3_MULTI,W3C -Dmanagement.tracing.propagation.produce=B3
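For illustration, an incoming request that joins an existing trace carries a W3C traceparent header such as the one below. The trace ID and parent span ID are example values; the same trace ID then appears in the correlated log lines shown later in this document.
traceparent: 00-1afd08b4cb31b5ff20bb85921ff3c270-3f4a329573f2c53e-01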
Structured Logging
The PowerAuth components support structured logging: all entry points to the system write log entries in the following manner:
INFO action: RegisteredAuthenticatorsRequest, state: initiated, userId: a7e28c79-99c9-4ba9-8890-87aab9a51254, applicationId: WAU-dev
INFO action: RegisteredAuthenticatorsRequest, state: succeeded, size: 1
This enables better machine processing of logs and indexing by the following keywords:
- action
- state
- userId
- applicationId
- registrationId
- activationId
- tokenId
This enables fast identification of the entity causing a problem. The full request flow can then be reconstructed using tracing.
Log example:
powerauth-cloud-server-1 | 2025-12-08T21:20:44.583Z INFO 49 --- [powerauth-java-server] [io-8080-exec-10] [1afd08b4cb31b5ff20bb85921ff3c270-3f4a329573f2c53e] i.g.s.p.a.s.c.api.ApplicationController : action: getApplicationList, state: initiated
powerauth-cloud-server-1 | 2025-12-08T21:20:44.611Z INFO 49 --- [powerauth-java-server] [io-8080-exec-10] [1afd08b4cb31b5ff20bb85921ff3c270-3f4a329573f2c53e] i.g.s.p.a.s.c.api.ApplicationController : action: getApplicationList, state: succeeded
Monitoring Targets
This section describes what should be monitored when operating the PowerAuth Cloud in production and how it maps to the metrics exposed by the server itself (via Spring Boot / Micrometer / Prometheus).
Note: Metric names below are the defaults commonly produced by Spring Boot + Micrometer Prometheus registry. Exact names may differ slightly depending on framework versions and configuration.
Resource Utilization
CPU Utilization
Monitor CPU usage per instance to detect overloads or misbehaving deployments.
What to watch
- Average CPU utilization per pod/instance.
- Sustained high CPU (e.g., ~80% or more).
- Sudden spikes correlated with increased traffic or specific operations.
Metrics produced by the server
- process_cpu_usage – Fraction of CPU used by the JVM process (0.0–1.0).
- system_cpu_usage – Overall system CPU usage on the node (if exposed).
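A PromQL sketch for these signals (the 80% threshold is an example, not a recommendation):
# Average CPU utilization of the JVM process over the last 5 minutes
avg_over_time(process_cpu_usage[5m])

# Example alerting condition: CPU sustained above ~80%
avg_over_time(process_cpu_usage[5m]) > 0.8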
Memory Utilization
Monitor memory to catch leaks and OOM situations before they happen.
What to watch
- Overall memory usage close to container limits.
- JVM heap usage and its trend over time.
- Frequent or long garbage collection pauses (if exposed).
Metrics produced by the server
- jvm_memory_used_bytes{area="heap"}
- jvm_memory_max_bytes{area="heap"}
- jvm_gc_pause_seconds_max
- jvm_gc_pause_seconds_sum
- jvm_gc_pause_seconds_count
It is useful to watch max values and trends over time. Percentiles for GC pause duration (if configured and exposed) can also be used to monitor the worst-case behaviour.
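A PromQL sketch for heap utilization and GC pause behaviour, based on the metrics above (note that some memory pools may report a max of -1, which can slightly distort the ratio):
# Heap usage as a fraction of the maximum heap, per instance
sum by (instance) (jvm_memory_used_bytes{area="heap"})
  / sum by (instance) (jvm_memory_max_bytes{area="heap"})

# Average GC pause duration over the last 5 minutes
rate(jvm_gc_pause_seconds_sum[5m]) / rate(jvm_gc_pause_seconds_count[5m])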
Database Connection Pool (HikariCP)
PowerAuth Cloud uses the HikariCP JDBC connection pool. Monitoring the pool is critical for detecting database saturation and connection issues.
Pool Utilization
What to watch
- How close the pool is to its configured maximum size.
- Whether there are threads waiting for a connection.
- Whether acquiring a connection is becoming slow or timing out.
Metrics produced by the server
- hikaricp_connections{pool="HikariPool-NextStep", state="active"}
- hikaricp_connections{pool="HikariPool-NextStep", state="idle"}
- hikaricp_connections{pool="HikariPool-NextStep", state="pending"}
- hikaricp_connections{pool="HikariPool-NextStep", state="max"}
Key signals:
- state="active" regularly close to state="max" → pool saturation.
- state="pending" consistently > 0 → threads are waiting for DB connections.
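A PromQL sketch for these signals, using the metric names as listed above (with default Micrometer naming, the same values may instead be exposed as hikaricp_connections_active, hikaricp_connections_pending, and hikaricp_connections_max):
# Fraction of the pool in use; values approaching 1 indicate saturation
hikaricp_connections{state="active"} / hikaricp_connections{state="max"}

# Threads currently waiting for a database connection; should normally be 0
hikaricp_connections{state="pending"}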
Connection Acquisition Time / Timeouts
What to watch
- Time needed to obtain a connection from the pool.
- Maximum observed acquisition times.
- Number of timeouts when acquiring a connection.
Metrics produced by the server
- hikaricp_connections_acquire_seconds_sum{pool="HikariPool-NextStep"}
- hikaricp_connections_acquire_seconds_count{pool="HikariPool-NextStep"}
- hikaricp_connections_acquire_seconds_max{pool="HikariPool-NextStep"}
Use:
- *_max to see how bad the slowest acquisitions are in recent intervals.
- The combination of *_sum and *_count to understand general acquisition time behaviour.
- If histogram buckets are configured for this timer, percentiles (p95/p99 acquisition time) are good indicators of waiting time under load, but how to compute them depends on your monitoring stack.
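A PromQL sketch for acquisition time, based on the metrics above:
# Average connection acquisition time over the last 5 minutes
rate(hikaricp_connections_acquire_seconds_sum[5m])
  / rate(hikaricp_connections_acquire_seconds_count[5m])

# Slowest acquisition observed in the most recent interval
hikaricp_connections_acquire_seconds_max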
Timeouts are typically visible via:
- Specific HikariCP timeout metrics (if enabled), and/or
- Increased HTTP 5xx responses for endpoints that perform database operations.
API Performance & Reliability
API behaviour reflects user-facing health. Use HTTP server metrics to monitor throughput, error rate, and latency.
Traffic / Throughput
What to watch
- Requests per second overall and for key API endpoints.
- Unexpected drop to zero requests during normal operating hours.
- Sudden spikes that may overload the service or downstream systems.
Metrics produced by the server
(Exact labels may vary.)
http_server_requests_seconds_count{...}
Common labels include:
- uri – endpoint pattern (e.g., /...)
- method – HTTP method (GET, POST, …)
- status – HTTP status code
- outcome – SUCCESS, CLIENT_ERROR, SERVER_ERROR, etc.
- application/service – powerauth-nextstep
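A PromQL sketch for request throughput, using the metric and labels listed above (adjust the filters to the endpoints you care about):
# Requests per second, split by endpoint and method, over the last 5 minutes
sum by (uri, method) (rate(http_server_requests_seconds_count[5m]))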
Success Rate (Error Rate)
What to watch
- Ratio of successful responses (2xx) vs. server errors (5xx).
- Error rate per key endpoint (e.g., powerauth-cloud/v2/operation APIs).
Metrics produced by the server
Use the same http_server_requests_seconds_count metric filtered by labels:
- Successful responses: http_server_requests_seconds_count{outcome="SUCCESS", ...}
- Server errors: http_server_requests_seconds_count{outcome="SERVER_ERROR", ...}
Monitoring proportion of server errors vs. total requests is recommended. Percentiles are not needed here; focus on counts and relative ratios over time.
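A PromQL sketch for the error ratio described above:
# Fraction of requests ending in a server error over the last 5 minutes
sum(rate(http_server_requests_seconds_count{outcome="SERVER_ERROR"}[5m]))
  / sum(rate(http_server_requests_seconds_count[5m]))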
Response Time / Latency
What to watch
- Response times (median and tail behaviour) overall and for key endpoints.
- Latency spikes, especially for operations involving the database or external systems.
Metrics produced by the server
- http_server_requests_seconds_sum{...}
- http_server_requests_seconds_count{...}
- http_server_requests_seconds_max{...}
These metrics describe how long requests take in total and the maximum observed duration in a given period. If HTTP request histograms are enabled, percentiles (p95/p99 latency) are very useful to monitor “worst case” performance of the API, especially under load.
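A PromQL sketch for latency; the percentile query only works if histogram buckets are enabled for HTTP server requests:
# Average response time per endpoint over the last 5 minutes
sum by (uri) (rate(http_server_requests_seconds_sum[5m]))
  / sum by (uri) (rate(http_server_requests_seconds_count[5m]))

# p95 latency per endpoint (requires histogram buckets)
histogram_quantile(0.95, sum by (le, uri) (rate(http_server_requests_seconds_bucket[5m])))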
Health & Availability
Use the Actuator health endpoints in combination with metrics to see if the service is able to accept traffic.
What to watch
- Status returned by:
  - /powerauth-cloud/actuator/health
  - /powerauth-cloud/actuator/health/liveness
  - /powerauth-cloud/actuator/health/readiness
- Frequency of transitions to DOWN or OUT_OF_SERVICE.
- Pod restarts and probe failures (usually observed via Kubernetes/platform metrics rather than application metrics).
While the health endpoints themselves are not Prometheus metrics, they are part of the overall monitoring picture and should be correlated with:
- Resource metrics (CPU, memory),
- Database pool metrics (HikariCP),
- API metrics (HTTP request counts, errors, latency),
to diagnose issues affecting PowerAuth Cloud’s availability and performance.
Log Volume & Severity
In addition to raw logs in the central logging system, PowerAuth Cloud can expose log event counters as metrics. Monitoring the rate of log messages at different severity levels helps to detect problems early:
- A rise in ERROR logs can indicate internal failures or bugs in the server.
- A rise in WARN logs often indicates issues in external systems (downstream services, databases, message brokers) that are affecting PowerAuth Cloud, even if the server is still partially functioning.
Note: The exact metric names and availability depend on the logging metrics binder being enabled (e.g., Micrometer Logback metrics). The names below assume the standard Spring Boot + Micrometer + Logback setup.
What to watch
- Number and rate of ERROR and WARN log events over time.
- Sudden spikes in ERROR-level logs (potential incident).
- Gradual or repeated increase in WARN-level logs, especially when correlated with external dependency problems (timeouts, connection issues, etc.).
Metrics produced by the server
Typical logging metrics:
- logback_events_total{level="ERROR"}
- logback_events_total{level="WARN"}
Key signals:
- A sustained increase in logback_events_total{level="ERROR"} indicates that the server is frequently encountering errors and may require immediate investigation.
- An elevated or gradually increasing logback_events_total{level="WARN"} may indicate that some external component (external API, message broker) is unstable or misconfigured, and the server is compensating but not yet failing hard.
- Large changes in the ratio of ERROR/WARN logs to INFO logs can be used as an additional early-warning indicator.
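A PromQL sketch for the log rates above (note that with the standard Micrometer Logback binder the level label values may be lowercase, e.g. level="error"):
# Rate of ERROR-level log events per instance over the last 5 minutes
sum by (instance) (rate(logback_events_total{level="ERROR"}[5m]))

# Share of WARN-level events among all log events
sum(rate(logback_events_total{level="WARN"}[5m])) / sum(rate(logback_events_total[5m]))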
These log-level metrics should be monitored together with other metrics to understand whether the problem is internal to the server or caused by external dependencies.