Priming LLMs to “reason” on newsworthiness and find stories in real estate data
Team Real Estate Alerter is a collaboration between E24 (Norway), Hearst (USA), Detroit Free Press & Gannett (USA) and The Globe and Mail (Canada) as part of the 2024 JournalismAI Fellowship. Journalists and technologists from the organisations aimed to harness anomaly detection methods and LLMs to uncover hidden news stories within real estate data. Here, they discuss their progress so far.
Real Estate Alerter aims to find newsworthy anomalies in real estate data. In our current low-resolution prototype, we use clustering for anomaly detection and are testing whether LLMs can then assess newsworthiness.
We first pre-processed and cleaned sample data from Norway before running it through this pipeline.
Pre-processing and clustering helped us identify the outliers. These outliers were then fed into an LLM, which was instructed to review them for newsworthiness. We designed the LLM prompts around newsworthiness criteria drawn from newsroom interviews with real estate reporters. The low-resolution prototype generated relevant and insightful outputs, but it also produced false positives, which the team is working to minimize by refining the system.
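As an illustration of the clustering step, here is a minimal sketch. It assumes a density-based approach in the style of DBSCAN, which is not necessarily the algorithm used in the prototype: records that sit in no dense cluster are treated as outliers. The function name, the two scaled features and the `eps` and `min_pts` values are all hypothetical, and the sample points are made up.

```python
from math import dist


def dbscan_noise(points, eps, min_pts):
    """Return indices of the points DBSCAN would label as noise."""
    n = len(points)
    # Neighbourhood of each point within radius eps (includes the point itself).
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_pts points in their neighbourhood.
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    # Noise: not a core point and not within eps of any core point.
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbours[i])
    ]


# Made-up, already scaled features, e.g. (price, floor area).
sample = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (5.0, 5.0)]
outliers = dbscan_noise(sample, eps=0.5, min_pts=3)
print(outliers)  # the last record is far from the dense group
```

Only the flagged records would then be passed on to the LLM, which keeps the number of prompts small.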
Examples of the anomalies identified include:
Transactions with prices significantly higher or lower than the average for similar properties.
Properties that were subject to foreclosure.
Properties used for recreation or industry.
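The three kinds of anomaly listed above could also be expressed as simple checks. The sketch below is an assumption, not the prototype's code: it swaps the "average" for a median-based modified z-score, which is less distorted by the very outliers it is looking for, and the field names, category labels and threshold are hypothetical. The peer prices are invented for the example.

```python
from statistics import median


def flag_transaction(tx, peer_prices, threshold=3.5):
    """Return the reasons (if any) a transaction looks anomalous."""
    reasons = []
    # Modified z-score: distance from the median, scaled by the
    # median absolute deviation (MAD) of similar properties.
    med = median(peer_prices)
    mad = median(abs(p - med) for p in peer_prices)
    if mad > 0 and 0.6745 * abs(tx["price"] - med) / mad > threshold:
        reasons.append("price far from typical for similar properties")
    if tx["transaction_type"] == "Forced sale":
        reasons.append("forced sale")
    if tx["property_use"] in {"Recreation", "Industrial"}:
        reasons.append("unusual property use")
    return reasons


# Invented prices (NOK) for comparable properties.
peers = [3_000_000, 3_200_000, 2_900_000, 3_100_000, 3_050_000]
tx = {
    "price": 11_200_000,
    "transaction_type": "Forced sale",
    "property_use": "Farming/fishing",
}
print(flag_transaction(tx, peers))
```

Returning the reasons rather than a yes/no answer gives the LLM, and later the reporter, something concrete to weigh.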
Here’s an example of a newsworthy transaction identified during our pilot phase, in the style of how the newsroom would receive an alert:
Transaction Amount: 11,200,000 NOK
Building Type: Other
Property Use: Farming/fishing
Transaction Type: Forced sale
Municipality Info: Gjerdrum
Context: Forced sales are generally newsworthy due to the circumstances leading to such transactions. Additionally, properties used for farming/fishing in Gjerdrum are relatively unique, adding an extra layer of interest for our audience.
The context above represents the LLM's “reasoning” about newsworthiness.
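To show how an outlier and the newsroom's criteria might be combined into a request for that kind of context, here is a hedged sketch of a prompt builder. The wording of the prompt, the `build_prompt` function and the `CRITERIA` list are illustrative assumptions rather than the prompts we actually use, and no particular LLM API is implied; the returned string would be sent to whichever model is in use.

```python
# Illustrative criteria, paraphrased from the anomaly examples in this post.
CRITERIA = [
    "Price significantly higher or lower than the average for similar properties",
    "Property was subject to foreclosure or a forced sale",
    "Property is used for recreation or industry",
]


def build_prompt(transaction, criteria=CRITERIA):
    """Assemble a plain-text prompt from one outlier and the criteria."""
    facts = "\n".join(f"{key}: {value}" for key, value in transaction.items())
    rules = "\n".join(f"- {c}" for c in criteria)
    return (
        "You are assisting a real estate reporter.\n"
        "Newsworthiness criteria:\n"
        f"{rules}\n\n"
        "Transaction:\n"
        f"{facts}\n\n"
        "Say whether this transaction is newsworthy and explain why "
        "in two or three sentences under the heading 'Context'."
    )


# The example alert from this post, as a dictionary.
alert = {
    "Transaction Amount": "11,200,000 NOK",
    "Building Type": "Other",
    "Property Use": "Farming/fishing",
    "Transaction Type": "Forced sale",
    "Municipality Info": "Gjerdrum",
}
prompt = build_prompt(alert)
print(prompt)
```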
The pilot thus demonstrated the potential of the Real Estate Alerter project. By combining machine learning techniques with generative AI, we identified newsworthy anomalies and established a process for generating relevant alerts.
We aim to enhance our system with more comprehensive datasets and refined newsworthiness criteria. More comprehensive datasets bring their own challenge: separating valid results from noise. Success in selecting the right data elements to describe what is newsworthy could lead to a system that significantly benefits newsrooms and the real estate beat.
Some lessons we have learned so far:
Collaborating across a nine-hour time difference can feel like working around the clock, but it also creates practical difficulties, particularly when it comes to finding meeting times that suit everyone.
Data is not equal across geographies, and even when it is, geographical context calls for different approaches to the data and its analysis. For example, in Europe, buildings finished in the years after WW2 were put up quickly to house people as fast as possible. They were not necessarily the nicest buildings and sell for lower prices today. The same is not true of the US and Canada.
Also, some countries have a lot of inexpensive, good-quality data, while others have little or none, or only very expensive data.
Be creative in the problem-solving process. Simply asking “How do we define newsworthiness?”, followed naturally by “How do we then compute this newsworthiness?”, has given us some interesting ideas and side projects that could turn out to be more scalable than our main project.
We will leave it on this slight cliffhanger for now, but we promise further updates in a future post.
To learn more about team Real Estate Alerter’s work, email us at lakshmi@journalismai.info. The 2024 JournalismAI Fellowship brought together 40 journalists and technologists to collaboratively work on using AI to improve journalism, its systems and processes. Read more about it here.