Named Entity Recognition
Named Entity Recognition (NER) detects named entities in text.
The NER model uses natural language processing to find a variety of named entities. For each entity extracted, NER also returns the location of the entity extracted (offset and length), and a confidence score, which is a value 0 to 1.
Supported Languages for Input Text
- English
- Spanish
Use Cases
You could use the NER endpoint effectively in these scenarios:
- Classifying content for news providers
-
It can be difficult to classify and categorize news article content. The NER model can automatically scan articles to identify the major people, organizations, and places in them. The extracted entities can be saved as tags with the related articles. Knowing the relevant tags for each article helps with automatically categorizing the articles in defined hierarchies, and content discovery.
- Customer support
-
Recognizing relevant entities in customer complaints and feedback, product specifications, department details, or company branch details, helps to classify the feedback appropriately. The entities can then be forwarded to the person responsible for the identified product.
Similarly, there could be feedback tweets where you can categorize them all based on their locations, and the products mentioned.
- Efficient search algorithms
-
You could use NER to extract entities that are then searched against the query, instead of searching for a query across the millions of articles and websites online. When run on articles, all the relevant entities associated with each article are extracted and stored separately. This separation could speed up the search process considerably. The search term is only matched with a small list of entities in each article, leading to quick and efficient searches.
It can be used for searching content from millions of research papers, Wikipedia articles, blogs, articles, and so on.
- Content recommendations
-
Extracting entities from a particular article, and recommending the other articles that have the most similar entities mentioned in them is possible with NER. For example, it can be used effectively to develop content recommendations for a media industry client. It enables the extraction of the entities associated with historical content or previous activities. NER compares them with the label assigned to other unseen content to filter relevant entities.
- Automatically summarizing job candidates
-
The NER model could facilitate the evaluation of job candidates, by simplifying the effort required to shortlist candidates with numerous applications. Recruiters could filter and categorize them based on identified entities like location, college degrees, employers, skills, designations, certifications, and patents.
Supported Entities
The following table describes the different entities that NER can extract. The entity
type and subtype depends on the API that you call
(detectDominantLanguageEntities
or
batchDetectDominantLanguageEntities
).
To maintain backward compatibility, the
detectDominantLanguageEntities
wasn't modified when we
introduced the concept of subtype. We recommend that you use the
batchDetectDominantLanguageEntities
endpoint because the
service uses types and subtypes. The isPii
property was dropped
to introduce the batching API so you can compute it with the supported entity
types as in the following table.
Entity (Full Name) | Entity Type (In Prediction) | Entity Subtype (In prediction) | Single Record API / Batch API (if blank, both APIs are consistent) | Is PII | Description |
---|---|---|---|---|---|
DATE |
DATE |
Single record |
X |
Absolute or relative dates, periods, and date range. Examples: "10th of June", "third Friday in August" "the first week of March" |
|
DATETIME |
DATE |
Batch | |||
EMAIL |
EMAIL |
√ | |||
EVENT |
EVENT |
Χ | Named hurricanes, sports events, and so on. | ||
FACILITY |
FACILITY |
Single record | Χ | Buildings, airports, highways, bridges, and so on. | |
LOCATION |
FACILITY |
Batch | |||
GEOPOLITICAL ENTITY |
GPE |
Single record | Χ | Countries, cities, and states. | |
LOCATION |
GPE |
Batch | |||
IP ADDRESS |
IPADDRESS |
√ | IP address according to IPv4 and IPv6 standards. | ||
LANGUAGE |
LANGUAGE |
Χ | Any named language. | ||
LOCATION |
LOCATION |
Χ | Non-GPE locations, mountain ranges, bodies of water. | ||
CURRENCY |
MONEY |
Single record |
X |
Monetary values, including the unit. | |
QUANTITY |
CURRENCY |
Batch | |||
|
NORP |
Χ | Nationalities, religious or political groups. | ||
ORGANIZATION |
ORG |
Χ | Companies, agencies, institutions, and so on. | ||
PERCENTAGE |
PERCENT |
Single record | Χ | Percentage. | |
QUANTITY |
PERCENTAGE |
Batch | |||
PERSON |
PERSON |
√ | People, including fictional characters. | ||
PHONENUMBER |
PHONE_NUMBER |
√ |
Supported phone numbers:
|
||
PRODUCT |
PRODUCT |
Χ | Vehicles, tools, foods, and so on (not services). | ||
NUMBER |
QUANTITY |
Single record | Χ | Measurements, as weight or distance. | |
QUANTITY |
NUMBER |
Batch | X | ||
TIME |
TIME |
Single record |
Χ
|
Anything less than 24 hours (time, duration, and so on). | |
DATETIME |
TIME |
Batch | |||
URL |
URL |
√ | URL. |
Examples
Input Text | Entities and Scores |
---|---|
|
|
|
|
The JSON for the first example is:
- Sample Request
-
POST https://<region-url>/20210101/actions/batchDetectLanguageEntities
- API Request format:
-
"{ "documents": [ { "key": "doc1", "text": " Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner." } ] }"
- Response JSON:
-
{ "documents": [ { "key": "1", "entities": [ { "offset": 0, "length": 15, "text": "Red Bull Racing", "type": "ORGANIZATION", "subType": null, "score": 0.9914557933807373, "metaInfo": null }, { "offset": 16, "length": 5, "text": "Honda", "type": "ORGANIZATION", "subType": null, "score": 0.6515499353408813, "metaInfo": null }, { "offset": 27, "length": 9, "text": "four-time", "type": "QUANTITY", "subType": null, "score": 0.9998091459274292, "metaInfo": [ { "offset": 27, "length": 9, "text": "four-time", "subType": "UNIT", "score": 0.9998091459274292 } ] }, { "offset": 47, "length": 5, "text": "World", "type": "LOCATION", "subType": "NON_GPE", "score": 0.5825434327125549, "metaInfo": null }, { "offset": 79, "length": 27, "text": "Oracle Cloud Infrastructure", "type": "ORGANIZATION", "subType": null, "score": 0.998045802116394, "metaInfo": null }, { "offset": 108, "length": 3, "text": "OCI", "type": "ORGANIZATION", "subType": null, "score": 0.9986366033554077, "metaInfo": null } ], "languageCode": "en" } ], "errors": [] }
Limitations
-
Sometimes, entities might not be separated or combined as you expect.
-
NER uses the context of the sentence to identify entities. If the context isn't present in the text processed, entities might not be extracted as you expect.
-
Malformed text (structure and semantics) might reduce the performance.
-
Age isn't a separate entity so age-related periods might be identified as a date entity.