Metrics

The ComUnity Platform’s metrics functionality is a crucial component for monitoring your project’s performance, providing an in-depth view of various operational aspects through the Metrics dashboard. This dashboard presents critical data points and trends that are vital for maintaining and optimising your project’s health and performance.

The Metrics dashboard is one of four core components of the ComUnity Platform’s Observability framework, alongside Traces, Client Analytics and Logs.

While metrics focus on system performance, traces provide detailed request-level insights, client analytics capture user behaviour and interaction data, and logs record the individual events and error messages your services emit.

Key Benefits

Comprehensive Performance Monitoring: Gain insights into key performance indicators such as server response times, enabling you to detect and address performance issues proactively.
Informed Decision-Making: Leverage detailed metrics to make informed decisions, ensuring your project's resources are optimised for peak performance.
Enhanced System Reliability: Monitor system health and performance trends over time, aiding in the prevention of potential issues and ensuring system stability.

Detailed Insights Available on the Metrics Dashboard

Server Response Time: This graph provides a real-time view of your server's response times, helping you identify trends and potential performance bottlenecks.
Concurrent Responses: Monitor the number of concurrent responses your server is handling to understand the load and performance under various conditions.
Accumulative Users: Track the growth of user engagement by viewing the cumulative number of users interacting with your project over time.
Requests per Day (7 Days): Analyse the daily request volume over a week to identify usage patterns, peak times, and potential stress points on your infrastructure.

Accessing Your Project's Metrics Dashboard

By default, the Metrics dashboard includes a set of standard panels that provide insights such as cumulative users and system performance statistics.

The available dashboards may change over time as new metrics are introduced or existing ones are refined. Some metrics may be temporarily disabled to reduce data noise or improve performance.

Metrics are configured and visualised through Grafana.

The default dashboards are automatically generated, but additional or customised dashboards can be created directly in Grafana where access permissions allow.

In shared or hosted environments, users may not have rights to modify or add dashboards.

Alerts can be configured in Grafana to notify your team when key thresholds are reached (for example, when disk usage or response times exceed defined limits).

Future updates will expand available dashboards and may allow users to select or configure which metrics are displayed directly within the platform.

Access the "Metrics" tab in the Observability section. The Metrics dashboard will automatically display, offering a detailed overview of your project's key performance metrics.

Understanding Your Metrics Dashboard

When you open the Metrics tab, you'll see your service's health dashboard with several panels showing different aspects of performance.

What the Metrics Tell You

Server Response Time

What it shows: How long your server takes to respond to requests

Healthy range:

APIs: Under 500ms
Web pages: Under 2 seconds
Background processes: Depends on the task

When to investigate:

Sudden spikes (may indicate a performance issue)
Gradual increase over time (may indicate resource exhaustion)
Response time consistently above your target

What to check: If response time is high, look at the trace data to identify slow operations.
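You can combine the healthy ranges and warning signs above into a simple check. The sketch below is illustrative only. The `assess_response_times` function, the service categories and the spike and growth factors are assumptions and not part of the platform. It expects response-time samples that you have already read from the dashboard panel, in time order.

```python
from statistics import mean

# Targets from the "Healthy range" guidance, in milliseconds.
# Background processes are omitted because their target depends on the task.
TARGETS_MS = {"api": 500, "web_page": 2000}


def assess_response_times(samples_ms, service_type, spike_factor=1.5, growth_factor=1.2):
    """Return the warning signs found in a time-ordered list of response times.

    The spike and growth factors are illustrative assumptions; tune them
    to your own service.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    target = TARGETS_MS[service_type]
    findings = []

    # Sudden spike: the latest sample is well above the average of the earlier ones.
    if samples_ms[-1] > spike_factor * mean(samples_ms[:-1]):
        findings.append("sudden spike")

    # Gradual increase: the second half of the window is slower than the first half.
    half = len(samples_ms) // 2
    if mean(samples_ms[half:]) > growth_factor * mean(samples_ms[:half]):
        findings.append("gradual increase")

    # Consistently above target: every sample breaches the target.
    if all(s > target for s in samples_ms):
        findings.append("consistently above target")
    return findings


print(assess_response_times([600, 620, 640, 610], "api"))  # ['consistently above target']
```

With a short window, a single spike can also trigger the gradual-increase check. Use a window long enough to tell the two apart.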

Concurrent Responses

What it shows: Number of requests being handled simultaneously

What's normal: Varies by service and traffic patterns

When to investigate:

Unusually high (may indicate slow processing or stuck requests)
Drops to zero during business hours (service may be down)

What to check: Compare with the request volume (for example, Requests per Day). If requests are still coming in but concurrent responses are low, requests may be failing quickly, so check for errors.
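This comparison follows from Little's law, a general queueing result rather than a platform feature. In a steady state, the average number of requests in flight equals the arrival rate multiplied by the average response time. The figures below are hypothetical and only show how the three quantities move together.

```python
def expected_concurrency(requests_per_second: float, avg_response_seconds: float) -> float:
    """Little's law: average requests in flight = arrival rate x average time per request."""
    return requests_per_second * avg_response_seconds


# Hypothetical service receiving 50 requests per second.
normal = expected_concurrency(50, 0.2)         # 200 ms responses: about 10 in flight
failing_fast = expected_concurrency(50, 0.01)  # 10 ms error responses: about 0.5 in flight
backed_up = expected_concurrency(50, 2.0)      # 2 s responses: about 100 in flight

print(f"normal={normal:.1f} failing_fast={failing_fast:.1f} backed_up={backed_up:.1f}")
```

Steady traffic with unusually low concurrency suggests that requests are finishing very quickly, which often means they are failing. Unusually high concurrency suggests that requests are taking longer and queueing.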

Accumulative Users

What it shows: Total number of unique users who have accessed your application over time

Use this to:

Track user growth trends
Identify successful features or campaigns
Compare across time periods
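As a rough illustration of how a cumulative unique-user count behaves, the sketch below derives one from a list of access events. The `(day, user_id)` event format and the `cumulative_users` function are hypothetical. This page does not describe how the platform identifies a unique user.

```python
from datetime import date


def cumulative_users(events):
    """Turn (day, user_id) access events into a running count of unique users per day.

    Days with no events are omitted; the count only ever grows.
    """
    seen = set()
    totals = {}
    # Sorting by day means each day's entry is written after all its events are counted.
    for day, user_id in sorted(events):
        seen.add(user_id)
        totals[day] = len(seen)
    return sorted(totals.items())


events = [
    (date(2024, 5, 1), "alice"),
    (date(2024, 5, 1), "bob"),
    (date(2024, 5, 2), "alice"),  # returning user: not counted again
    (date(2024, 5, 3), "carol"),
]
for day, total in cumulative_users(events):
    print(day, total)
```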

Requests per Day (7 Days)

What it shows: Daily volume of requests over the past week

Use this to:

Identify usage patterns (weekday vs weekend)
Spot unusual traffic spikes
Plan capacity

What's normal: Consistent patterns with predictable peaks

When to investigate:

Unexpected spikes (potential attack or viral content)
Sudden drops (service issues or deployment problems)
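To make the "unexpected spike" and "sudden drop" checks concrete, this sketch compares each day's request count with the median for the week. The daily counts, the `flag_unusual_days` helper and the 50% tolerance are hypothetical. The median is used because a single spike barely moves it.

```python
from statistics import median


def flag_unusual_days(daily_counts, tolerance=0.5):
    """Flag days whose request count is more than `tolerance` (50% by default)
    above or below the median of the period."""
    baseline = median(daily_counts.values())
    flags = {}
    for day, count in daily_counts.items():
        if count > baseline * (1 + tolerance):
            flags[day] = "spike"
        elif count < baseline * (1 - tolerance):
            flags[day] = "drop"
    return flags


week = {"Mon": 1000, "Tue": 1100, "Wed": 950, "Thu": 3000, "Fri": 1050, "Sat": 400, "Sun": 1000}
print(flag_unusual_days(week))  # {'Thu': 'spike', 'Sat': 'drop'}
```

Here Saturday is flagged, even though a weekend dip may be normal for your service. Comparing each day with the same weekday in earlier weeks avoids that false alarm.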

Reading the Graphs

Time Series Graphs

Most metrics are displayed as line graphs showing values over time.

How to use them:

Hover over the line to see exact values at specific times
Click and drag to zoom into a specific time period
Compare patterns - Does today look different from yesterday?

What to look for:

Spikes - Sudden increases may indicate problems or unusual events
Drops - Sudden decreases may indicate service outages
Trends - Gradual changes over days/weeks indicate capacity needs

Understanding Percentiles

You may see metrics labelled P99 or P95 - these are percentiles.

P99 Latency = 500ms means:

99% of requests complete in under 500ms
Only 1% of requests are slower

Why this matters: Average response time can be misleading. If most requests are fast (50ms) but a few are very slow (10 seconds), the average might look okay while users are experiencing problems.

Rule of thumb:

Focus on P99 for user-facing services (it reflects the slowest 1% of requests, close to the worst case users see)
P95 shows the experience of all but the slowest 5% of requests
P50 (median) shows what "most users" experience
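The example in "Why this matters" can be reproduced with hypothetical numbers: 98 requests at 50ms and 2 at 10 seconds. The sketch uses the simple nearest-rank method. Prometheus-backed dashboards typically estimate percentiles from histogram buckets (for example with `histogram_quantile`), which interpolates, so panel values can differ slightly from this calculation.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: most requests are fast, a few are very slow.
latencies = [50] * 98 + [10_000] * 2

average = sum(latencies) / len(latencies)
print(f"average={average:.0f}ms")           # average=249ms: looks healthy
print(f"P50={percentile(latencies, 50)}ms")  # P50=50ms
print(f"P95={percentile(latencies, 95)}ms")  # P95=50ms
print(f"P99={percentile(latencies, 99)}ms")  # P99=10000ms: the slow tail shows up
```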

Common Investigation Patterns

Pattern 1: Error Rate Increases

You notice: Error percentage panel shows 5% (was normally <1%)

Steps:

Note the time when errors started
Navigate to Logs and search for errors during that time:
```
{service_name="your-service"} |= "ERROR"
```
Examine error messages to identify the cause
If the logs include a trace_id, open that trace to follow the request's detailed flow
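If you have log lines available as plain text, you can apply the same steps locally. This sketch keeps ERROR lines inside the incident window, mirroring the `|= "ERROR"` filter in the query above, and collects any trace IDs. The log line format and the `errors_in_window` helper are hypothetical. Adjust the pattern to match your service's real output.

```python
import re
from datetime import datetime

# Hypothetical format: "<ISO timestamp> <LEVEL> <message> [trace_id=<id>]"
LINE = re.compile(r"^(?P<ts>\S+) (?P<level>\w+) (?P<msg>.*?)(?: trace_id=(?P<trace_id>\w+))?$")


def errors_in_window(lines, start, end):
    """Return (timestamp, message, trace_id) for ERROR lines between start and end inclusive."""
    results = []
    for line in lines:
        m = LINE.match(line)
        if not m or m["level"] != "ERROR":
            continue
        # Replace "Z" so fromisoformat accepts the timestamp on older Python versions.
        ts = datetime.fromisoformat(m["ts"].replace("Z", "+00:00"))
        if start <= ts <= end:
            results.append((ts, m["msg"], m["trace_id"]))
    return results


sample = [
    "2024-05-01T09:58:00Z ERROR old failure trace_id=aaa111",
    "2024-05-01T10:02:03Z ERROR payment failed trace_id=abc123",
    "2024-05-01T10:03:00Z INFO request ok",
    "2024-05-01T10:05:00Z ERROR db timeout",
]
window_start = datetime.fromisoformat("2024-05-01T10:00:00+00:00")
window_end = datetime.fromisoformat("2024-05-01T10:10:00+00:00")
for ts, message, trace_id in errors_in_window(sample, window_start, window_end):
    print(ts.isoformat(), message, trace_id or "(no trace_id)")
```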

Pattern 2: Latency Spike

You notice: Server Response Time suddenly increases

Steps:

Check if error rate also increased (errors often cause latency)
Look at Concurrent Responses - are requests backing up?
View traces from the spike period to identify slow operations
Common causes:
- Slow database queries
- External API timeouts
- Memory/CPU exhaustion
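The first three steps can be expressed as a small triage helper that takes readings you have noted from the dashboard. The dictionary keys, the `triage_latency_spike` name and the 1.5x threshold are illustrative assumptions.

```python
def triage_latency_spike(current, baseline, factor=1.5):
    """Compare current metrics with a baseline and suggest where to look next.

    Both arguments are dicts with 'latency_ms', 'error_rate' and 'concurrency'.
    A metric counts as elevated when it exceeds `factor` times its baseline.
    """
    def elevated(key):
        return current[key] > factor * baseline[key]

    if not elevated("latency_ms"):
        return []  # no latency spike to investigate
    hints = []
    if elevated("error_rate"):
        hints.append("error rate also rose: search Logs for the failing operation")
    if elevated("concurrency"):
        hints.append("concurrent responses rose: requests are backing up")
    hints.append("view traces from the spike period to find the slow operation")
    return hints


baseline = {"latency_ms": 200, "error_rate": 0.01, "concurrency": 10}
now = {"latency_ms": 900, "error_rate": 0.01, "concurrency": 40}
for hint in triage_latency_spike(now, baseline):
    print("-", hint)
```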

Pattern 3: Traffic Drop

You notice: Requests per Day shows sudden decrease

Steps:

Check if service is actually down (Concurrent Responses = 0?)
Look for deployment events at that time
Check logs for startup errors or crashes
Confirm with your team whether the drop was intentional (maintenance, feature flag change)

Using Time Controls

Selecting Time Ranges

The time range selector (top right of dashboard) lets you focus on specific periods:

Quick ranges:

Last 5 minutes - Real-time monitoring
Last 1 hour - Recent issue investigation
Last 24 hours - Daily pattern analysis
Last 7 days - Weekly trend comparison

Custom range: Click the time range and select specific start/end dates

Tip: Use the refresh interval dropdown to auto-update dashboards every 5-30 seconds when actively monitoring.

Comparing Time Periods

To compare current performance to a baseline:

Note current metrics (e.g., response time = 800ms)
Change the time range to the same period yesterday
Compare values
Look for differences in patterns

Example: If latency is high now but was normal yesterday at the same time, it's likely a new issue (not normal load).
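This comparison can be reduced to a percentage change. In this sketch, `compare_to_baseline` and its 25% threshold are hypothetical. Choose a threshold that matches how much your metric normally varies from day to day.

```python
def compare_to_baseline(current, baseline, threshold_pct=25.0):
    """Percentage change from a baseline (e.g. the same time yesterday) and a rough verdict."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change_pct = (current - baseline) * 100 / baseline
    if abs(change_pct) > threshold_pct:
        verdict = "likely a new issue"
    else:
        verdict = "within normal variation"
    return change_pct, verdict


# Response time now vs the same time yesterday (hypothetical values, in ms).
change, verdict = compare_to_baseline(800, 250)
print(f"{change:+.0f}% -> {verdict}")  # +220% -> likely a new issue
```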

When to Create an Alert

Dashboards are great for investigation, but you can't watch them 24/7. Create alerts for:

Critical issues:

Error rate > 5%
Server response time > 2 seconds for 5+ minutes
Service becomes unreachable

Capacity planning:

Database connections approaching limit
Free disk space < 20%
Memory usage > 85%

Business metrics:

Payment processing rate drops
User signups below threshold
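Conditions such as "server response time > 2 seconds for 5+ minutes" only fire when the breach is sustained, which stops single spikes from paging anyone. Alerting systems express this with a hold duration, for example the `for` clause in Prometheus alerting rules or the pending period in Grafana alert rules. The sketch below shows the idea on data with one sample per minute. `should_fire` is a hypothetical helper, not the Grafana implementation.

```python
def should_fire(values, threshold, hold_samples):
    """Return True only if the last `hold_samples` readings all exceed `threshold`.

    Assumes evenly spaced samples, e.g. one per minute, so hold_samples=5 means 5 minutes.
    """
    if len(values) < hold_samples:
        return False
    return all(v > threshold for v in values[-hold_samples:])


# Server response time in seconds, one reading per minute (hypothetical data).
sustained = [1.0, 2.5, 2.6, 2.7, 2.8, 2.9]
brief_spike = [1.0, 1.1, 5.0, 1.0, 1.2, 1.1]

print(should_fire(sustained, threshold=2.0, hold_samples=5))    # True
print(should_fire(brief_spike, threshold=2.0, hold_samples=5))  # False
```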

See Alerts for how to configure notifications.

Tips for Daily Monitoring

✅ DO:

Check dashboards regularly (daily for production services)
Compare to historical data - Is this normal for this time/day?
Investigate gradual changes - Slow degradation is easy to miss
Use multiple metrics together - Latency + Errors + Requests tells the full story

❌ DON'T:

Panic at single spikes - Brief anomalies are normal
Ignore sustained issues - If it lasts >10 minutes, investigate
Forget about off-peak hours - Problems can start when traffic is low
Rely only on dashboards - Use logs and traces for root cause

Next Steps

See elevated errors? → Search Logs to find specific error messages
Identify slow requests? → View Traces to see detailed request flow
Need to be notified? → Set up Alerts for automatic notifications
Want custom metrics? → Learn about Instrumentation

Technical Details

The metrics system uses:

Grafana for visualisation and dashboards
Prometheus for metrics collection and storage
Thanos for long-term metric retention
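Grafana reads these metrics from Prometheus, which also exposes an HTTP query API (`/api/v1/query`). Whether you can reach Prometheus directly depends on your environment. In hosted setups, Grafana is likely the only access point, so treat the following as a sketch. The base URL and the label selector are placeholders, and only the standard `up` metric is used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build a Prometheus instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body):
    """Extract (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each sample's value is [unix_timestamp, "value as string"].
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def run_query(base_url, promql, timeout=10):
    """Send the query and parse the result (requires network access to Prometheus)."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(resp.read())


# Placeholder address; replace with your Prometheus endpoint if you have direct access.
# for labels, value in run_query("http://prometheus.example:9090", 'up{job="your-service"}'):
#     print(labels, value)
```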
