Even with our extensive tagging, a structured news feed still needs to be interpreted and can contain duplicated information. Consider an example: the Keystone oil pipeline in the US was unexpectedly shut down on July 3, 2024, which caused a spike in the oil price just minutes after the first headline (Chart 1). To track this event accurately, we want to follow news headlines that share the same topic code.
Working from Bloomberg News headlines, users can apply large language models (LLMs) as a versatile tool for natural language processing (NLP). Using LLMs and embedding techniques, they can compute the similarity between different headlines (Chart 2) and thereby identify duplicated information (Table 1). The number of headlines in a cluster can indicate the significance of the event being reported.
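To make the similarity step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method behind Chart 2 or Table 1: the `embed` function is a toy bag-of-words stand-in for a real LLM embedding, the sample headlines are invented for the example, and the 0.6 threshold and the greedy grouping rule are arbitrary choices.

```python
import math
import re
from collections import Counter


def embed(headline):
    """Toy stand-in for an LLM embedding: a bag-of-words count vector.

    Assumption: in practice this would be replaced by a call to an
    embedding model that returns a dense vector.
    """
    return Counter(re.findall(r"[a-z0-9]+", headline.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_headlines(headlines, threshold=0.6):
    """Greedy grouping: a headline joins the first cluster whose first
    member is at least `threshold` similar, else it starts a new cluster.

    Returns a list of clusters, each a list of headline indices.
    """
    vectors = [embed(h) for h in headlines]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical headlines, invented for this example only.
headlines = [
    "Keystone oil pipeline unexpectedly shut down",
    "Keystone oil pipeline shut down, operator says",
    "Central bank holds interest rates steady",
]
clusters = cluster_headlines(headlines)
cluster_sizes = [len(c) for c in clusters]
```

Here `cluster_sizes` gives the number of headlines per cluster, which is the quantity the text suggests as a rough indicator of an event's significance.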
Once duplicates have been identified, prompt engineering can be used to extract features by asking questions such as 'Is the WTI oil market affected by the event?' or 'Is the event likely to affect oil supply?' (Table 2). An LLM can answer these questions much faster, and for far more headlines, than a human could. These features can then be used to generate market and volatility signals.
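The feature-extraction step could be sketched along the following lines. This is only an illustration: the prompt wording, the feature names, and the `ask_llm` callable are assumptions introduced here, and `fake_llm` is a placeholder that stands in for a real model call so that the example runs on its own.

```python
# The two questions are quoted from the text; the feature names are hypothetical.
FEATURE_QUESTIONS = {
    "affects_wti": "Is the WTI oil market affected by the event?",
    "affects_supply": "Is the event likely to affect oil supply?",
}


def build_prompt(headline, question):
    """Assemble a yes/no prompt for one headline and one question."""
    return (
        "You are analysing a news headline.\n"
        f"Headline: {headline}\n"
        f"Question: {question}\n"
        "Answer with exactly one word: yes or no."
    )


def parse_answer(text):
    """Map a model reply to True/False, or None if it is not a clear yes/no."""
    word = text.strip().lower().rstrip(".")
    if word == "yes":
        return True
    if word == "no":
        return False
    return None


def extract_features(headline, ask_llm):
    """Ask every question about one headline.

    `ask_llm` is any callable that takes a prompt string and returns the
    model's reply as a string (an assumption of this sketch).
    """
    return {
        name: parse_answer(ask_llm(build_prompt(headline, question)))
        for name, question in FEATURE_QUESTIONS.items()
    }


def fake_llm(prompt):
    """Placeholder for a real model call; it always replies 'Yes.'"""
    return "Yes."


# Hypothetical headline, invented for this example only.
features = extract_features(
    "Keystone oil pipeline unexpectedly shut down", fake_llm
)
```

Constraining the model to a one-word answer keeps parsing simple and makes unusable replies easy to detect, at the cost of discarding any nuance the model might otherwise express.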