Examples of Semantic Clustering
The nlp
command can be used to extract keywords from a
string field, or to cluster records based on these extracted keywords. Keyword
extraction can be controlled using a custom NLP dictionary. If no dictionary is
provided, the default Oracle-defined dictionary is used.
Topics:
For more information on semantic clustering, see Semantic Clustering.
Cluster Kernel Errors in Linux Syslog Logs
The following query clusters Kernel messages in Linux Syslog Logs:
'Log Source' = 'Linux Syslog Logs' and kernel
| link cluster()
| where 'Potential Issue' = '1'
| nlp table = 'iSCSI Errors' cluster('Cluster Sample') as 'Cluster ID',
keywords('Cluster Sample') as Summary
| sort 'Cluster ID'
In the above query:
-
link cluster()
runs the traditional cluster and returns aCluster Sample
field. -
nlp cluster('Cluster Sample')
processes eachCluster Sample
and assigns a Cluster ID. Messages that have similar meaning would get the same Cluster ID. -
keywords('Cluster Sample')
extracts the keywords used in clustering. This is returned in theSummary
field.
The following image shows the link results returned:
-
The first and second rows are not similar, and hence get different cluster IDs.
-
The third and fourth rows have similarity in the Cluster Sample. This can be seen in the overlap of keywords extracted in the
Summary
field. -
By default, a 70% overlap is required to form a cluster. This can be overridden using the
similarity
parameter to cluster. -
The Cluster ID generated is deterministic. Thus, the Cluster ID can be used as a shortcut for the list of keywords shown in the Summary column.
Use similarity
to Control the Number of Clusters
Running cluster using the default dictionary and a lower similarity threshold would produce fewer clusters:
'Log Source' = 'Linux Syslog Logs' and kernel
| link cluster()
| where 'Potential Issue' = '1'
| nlp similarity=0.2 cluster('Cluster Sample') as 'Cluster ID',
keywords('Cluster Sample') as Summary
| sort 'Cluster ID'
This merged some of the rows into the existing clusters, as well as reduced the number of clusters:
Cluster the Database Alert Logs
The following query shows an example of semantically clustering Database Alert Logs:
'Log Source' = 'Database Alert Logs'
| link cluster()
| nlp cluster('Cluster Sample') as 'Cluster ID',
keywords('Cluster Sample') as Summary
| where Summary != null
| classify 'Start Time', Summary, 'Cluster ID' as 'Database Messages'