Which UDM/HSS KPIs are most actionable to focus on for improvement?

Hi Core Experts,

I am looking for the top four UDM/HSS KPIs that are most important and actionable to focus on for improvement.

If anyone has worked on this before, could you please share any documents, suggestions, guidance, or best practices, specifically for Huawei?

Thank you in advance.


Here is a structured answer, focusing on the most actionable KPIs, i.e. the ones that drive actual performance improvements.

Top 4 Actionable UDM/HSS KPIs

1. Registration Success Rate (SR)

This is the “bread and butter” of subscriber management. If this drops, users can’t attach to the network.

* Why it’s actionable: Low SR often points to specific issues like Diameter/SBI signaling congestion, authentication failures (triplet/quintuplet retrieval issues), or synchronization errors (SQN out of range).

* Huawei Specifics: Monitor L.HSS.Reg.Succ vs L.HSS.Reg.Att.
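
As a rough illustration of how the ratio is derived from a success/attempt counter pair, here is a minimal Python sketch. The sample values are made up, and the function does not talk to any Huawei interface; it only assumes you have already exported the two counters per measurement period.

```python
from typing import Optional


def success_rate(successes: int, attempts: int) -> Optional[float]:
    """Return the success rate in percent, or None if there were no attempts."""
    if attempts <= 0:
        # Avoid dividing by zero for idle measurement periods.
        return None
    return 100.0 * successes / attempts


# Hypothetical (successes, attempts) pairs, one per measurement period.
periods = [(9850, 10000), (9400, 10000), (0, 0)]

for succ, att in periods:
    sr = success_rate(succ, att)
    print("n/a" if sr is None else f"{sr:.2f}%")
```

Returning None for idle periods keeps empty intervals from showing up as a misleading 0% in trend charts.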

2. Authentication Success Rate

This measures the ability of the UDM/HSS to challenge and verify a subscriber.

* Why it’s actionable: Failures here are often due to HSS/UDM database sync issues or mismatched authentication algorithms (MILENAGE/TUAK). Frequent “Sync Failure” causes usually mean the SQN held in the HLR/HSS has drifted from the one in the USIM, which can be addressed by reviewing the resynchronization procedure and SQN settings.

3. Mean Processing Latency (Request Response Time)

This measures how long the UDM/HSS takes to process a message (e.g., S6a, Sh, or N8/N10 interfaces).

* Why it’s actionable: Rising latency is often a precursor to a crash or “Signaling Storm.” It is usually addressed by load balancing across Front-End (FE) nodes or by checking for CPU bottlenecks in the User Data Repository (UDR/CUDB).
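
A mean on its own can hide a slow tail, so it may help to track a high percentile alongside it. The sketch below is only an illustration on made-up response times in milliseconds; the nearest-rank percentile method and the sample data are my assumptions, not something taken from Huawei documentation.

```python
import statistics


def latency_summary(samples_ms):
    """Return the mean and nearest-rank 95th percentile of response times."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Nearest-rank p95 using integer arithmetic: ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    return {"mean": statistics.fmean(ordered), "p95": ordered[rank - 1]}


# Hypothetical response times (ms): nine fast answers and one slow outlier.
samples = [12, 15, 11, 14, 13, 12, 16, 11, 13, 180]
summary = latency_summary(samples)
print(f"mean={summary['mean']:.1f} ms, p95={summary['p95']} ms")
```

With only ten samples the p95 falls on the slowest request, which is exactly the kind of outlier the mean smooths over.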

4. Database (UDR/CUDB) Access Success Rate

The HSS/UDM is just a “brain” without its “memory” (the UDR).

* Why it’s actionable: If the FE can’t talk to the BE (Backend), the whole node fails. Actionable steps include checking the LDAP/SQL link health and ensuring the provisioning flow (MML/SOAP) isn’t locking the database during peak hours.

Best Practices for Improvement (Huawei Context)

* Audit Signaling Retries: Check if the MME/AMF is retrying too aggressively. Lengthening the retry timers (or lowering the retry count) on the core nodes can prevent the HSS from being overwhelmed during a network recovery.

* Check “User Not Found” Trends: A high rate of “User Not Found” errors often indicates a provisioning lag or a mismatch between the HLR and HSS data during migration.

* Traffic Shaping: Use Huawei’s Diameter Congestion Control (DCC) or SBI overload control to prioritize existing subscribers over new attachments during high-load scenarios.
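
To make the retry point concrete, here is a small sketch of why spaced-out retries are gentler on the HSS than fixed, short ones. It is a generic exponential backoff with jitter; the parameter names and values are hypothetical and do not correspond to actual MME/AMF configuration parameters.

```python
import random


def retry_delays(base_s, factor, max_s, attempts, jitter=0.2, rng=None):
    """Return the wait (seconds) before each retry, growing exponentially."""
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        # Nominal delay doubles (for factor=2) until it hits the cap.
        nominal = min(max_s, base_s * factor ** n)
        # Jitter spreads clients out so they do not all retry at once.
        spread = nominal * jitter
        delays.append(nominal + rng.uniform(-spread, spread))
    return delays


# Hypothetical settings: 1 s base, doubling, capped at 30 s, 6 retries.
print([round(d, 2) for d in retry_delays(1.0, 2.0, 30.0, 6)])
```

With jitter set to 0 the schedule is 1, 2, 4, 8, 16, 30 seconds; the jitter simply keeps thousands of subscribers from retrying in the same instant after an outage.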

Note: Always differentiate between Functional Failures (Network issues) and User Failures (Wrong SIM, expired subscription) to ensure you aren’t chasing “ghost” technical issues that are actually commercial ones.
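
One way to apply this split is to report a second success rate that leaves subscriber-caused rejections out of the denominator. The sketch below assumes you can export per-result-code counts for a procedure; which codes count as "user" failures is my own assumption (the numeric values are the standard S6a ones), and you would tune that set to your network, for example during a migration where "User Unknown" may really be a provisioning problem.

```python
# Assumed split: result codes treated as subscriber/commercial causes.
USER_CAUSED = {
    5001: "DIAMETER_ERROR_USER_UNKNOWN",
    5004: "DIAMETER_ERROR_ROAMING_NOT_ALLOWED",
    5420: "DIAMETER_ERROR_UNKNOWN_EPS_SUBSCRIPTION",
    5421: "DIAMETER_ERROR_RAT_NOT_ALLOWED",
}
SUCCESS = 2001  # DIAMETER_SUCCESS


def split_success_rates(counts):
    """Return (raw SR %, SR % excluding user-caused failures)."""
    total = sum(counts.values())
    ok = counts.get(SUCCESS, 0)
    user_failures = sum(n for code, n in counts.items() if code in USER_CAUSED)
    network_attempts = total - user_failures
    raw = 100.0 * ok / total if total else None
    adjusted = 100.0 * ok / network_attempts if network_attempts else None
    return raw, adjusted


# Hypothetical per-result-code counts for one busy hour.
counts = {2001: 9000, 5001: 600, 5004: 200, 3004: 150, 5012: 50}
raw, adjusted = split_success_rates(counts)
print(f"raw SR={raw:.2f}%  network-only SR={adjusted:.2f}%")
```

In this made-up example the raw figure is 90%, but most of the gap comes from subscriber-side rejections; the network-only figure is closer to 98%.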

For a deeper dive, focus on specific MML commands and counter IDs. In Huawei’s SingleSDB or CloudUDM architecture, the KPIs are often split between the Front-End (FE) and the Backend (UDR/CUDB).

Technical KPI Deep-Dive: Huawei UDM/HSS
To effectively improve performance, focus on these specific Huawei-defined counters and the logic behind them.

  1. Registration & Location Update Success Rate (S6a/N8)
    This is the primary indicator of mobility management health.
  • Key Counters:
    • HSS: L.HSS.S6a.ULR.Succ / L.HSS.S6a.ULR.Att
    • UDM: L.UDM.N8.Nudm_UECM_Registration.Succ
  • Actionable Steps: If the SR is low, run the DSP S6ACFG or DSP N8CFG commands to check interface status. Compare Result-Code 5xxx (permanent failures) against 3xxx (protocol errors) and 4xxx (transient failures). If you see a high rate of DIAMETER_TOO_BUSY (3004), adjust the Flow Control parameters in the FE.
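
The result-code families mentioned above follow the Diameter base protocol convention, where the thousands digit identifies the class. A minimal sketch for tallying a trace export by family could look like this; the list of codes is made-up sample data, and how you extract codes from your own traces is left open.

```python
from collections import Counter

# Diameter Result-Code classes by thousands digit (Diameter base protocol).
FAMILIES = {
    1: "informational",
    2: "success",
    3: "protocol error",
    4: "transient failure",
    5: "permanent failure",
}


def result_code_family(code: int) -> str:
    """Map a Diameter result code to its class name."""
    return FAMILIES.get(code // 1000, "unknown")


# Hypothetical result codes pulled from a signaling trace.
codes = [2001, 2001, 3004, 4181, 5001, 5012, 3004]
tally = Counter(result_code_family(c) for c in codes)
for family, count in tally.most_common():
    print(f"{family}: {count}")
```

A growing share of 3xxx answers such as 3004 points toward load or routing, whereas 5xxx answers are more often subscription or data issues.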
  2. Authentication Vector Retrieval Success Rate
    Critical for the initial attach and security procedures.
  • Key Counters:
    • HSS: L.HSS.S6a.AIR.Succ / L.HSS.S6a.AIR.Att
    • UDM: L.UDM.N10.Nudm_UEAU_Get.Succ
  • Actionable Steps: High failure rates here often point to a USIM/AuC mismatch. Use the LST USIM command to verify the K/OPc/algorithm configuration. If Sync Failures are high, investigate the SQN (Sequence Number) range settings in the HSS data template.
  3. Database Access Latency (UDR/CUDB)
    Since Huawei uses a de-layered architecture, the FE performance depends entirely on the Backend (BE).
  • Key Counters: L.SDB.BE.Query.MeanTime or L.UDR.LDAP.Search.Latency
  • Actionable Steps: If latency exceeds 20-50 ms, check the CUDB CPU load. You may need to optimize the LDAP indexing or check whether the provisioning system (SPG/BOSS) is running massive “Batch Queries” that lock the database threads.
  4. Notification & Termination Success Rate
    This measures the HSS/UDM’s ability to “push” data back to the network (e.g., when a profile changes or a user is kicked off).
  • Key Counters: L.HSS.S6a.IDR.Succ (Insert Subscriber Data) and L.HSS.S6a.CLR.Succ (Cancel Location).
  • Actionable Steps: Failures here usually mean the MME/AMF is unreachable or the SCTP association is flapping. Check the DSP SCTPLNK status and look for “Path Down” alarms in the alarm log.

Recommended Optimization “Toolkit” (Huawei Specific)
  • Signaling Trace: Use the U2020/NCE “Signaling Trace” tool specifically for the IMSI/MSISDN having issues. Look for the Diameter Error Code or HTTP/2 Cause Code.
  • Congestion Control: Check the Flow Control settings using LST FLOWCTRL. If the network is under stress, ensure that Low Priority traffic (like SMS-over-S6s) is throttled before High Priority traffic (ULR/AIR).
  • Data Audit: Regularly run the HSS/HLR Data Consistency Check tool provided in the Huawei Maintenance terminal to find “zombie” subscribers that cause unnecessary DB overhead.
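
To illustrate the congestion-control idea of throttling low-priority traffic first, here is a toy admission check. The priority table and load thresholds are hypothetical values chosen for the example; they are not Huawei defaults and not the output of LST FLOWCTRL.

```python
# Assumed priorities: 0 = highest. Unknown message types get the lowest.
PRIORITY = {"AIR": 0, "ULR": 0, "IDR": 1, "SMS": 2}
LOWEST = 2

# Assumed load level (percent) at which each priority starts being shed.
SHED_AT = {2: 70, 1: 85, 0: 95}


def admit(msg_type: str, load_pct: float) -> bool:
    """Return True if the message should be processed at this load level."""
    priority = PRIORITY.get(msg_type, LOWEST)
    return load_pct < SHED_AT[priority]


for load in (60, 75, 90, 96):
    accepted = [m for m in ("AIR", "ULR", "IDR", "SMS") if admit(m, load)]
    print(f"load {load}% -> accepted: {accepted}")
```

At 75% load only the SMS-related traffic is rejected, while ULR/AIR keep flowing until the node is close to saturation.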

many thanks brother