%pip install pandas
%pip install pydantic-ai
%pip install duckduckgo-search
%pip install trafilatura
AI classification agent
You find yourself staring at a dataset with tens or hundreds of thousands of rows. Maybe you want to get up-to-date FOIA contact details for all government departments in your country, or to find out which political donors have links to the fossil fuels industry. What do you do?
This kind of work is time-consuming and challenging. However, Large Language Models (LLMs) like those powering ChatGPT can help journalists automate simple research and classification tasks that would take an unreasonably long time to do otherwise.
In this session, we’ll outline how to use LLMs, search engines and web scraping to help us identify links between Donald Trump and his donors. You can download the notebook and run it yourself, or you can run it in the cloud using Google Colab. Both links are also in the sidebar to the right, or at the bottom of the page on mobile.
Install and load libraries
First, we’ll need to install some libraries to help us call different LLMs and retrieve search results.
Then, we’ll import those libraries.
import os
# data
import pandas as pd
# scraping
from urllib import request
import ssl
import requests
from trafilatura import extract
# validation
from typing import List, Tuple
from pydantic import BaseModel
from typing import Literal
# AI
from pydantic_ai import Agent, Tool
from pydantic_ai.common_tools.duckduckgo import duckduckgo_search_tool
# make async work in jupyter
import nest_asyncio
nest_asyncio.apply()
Finally, we need to define some of the API keys we’ll be using. You can get your own API key from OpenAI or from another provider.
Get a temporary API key here.
os.environ["OPENAI_API_KEY"] = ""
Prep data
We’ll work with some data from the US Federal Election Commission (FEC) on donations to Donald Trump’s inaugural committee. Because a downloadable version isn’t provided, we’ll scrape the data directly from the website.
response = request.urlopen("https://docquery.fec.gov/cgi-bin/forms/C00894162/1889684/f132",
                           context=ssl._create_unverified_context())
html = response.read()

fec_raw = pd.read_html(html)[0]
We need to fix some of the formatting issues, like removing dollar signs and converting dates to a more standard format.
fec = fec_raw.copy()
fec = fec.iloc[:-1] # the last row is a summary

fec['Date Donation Received'] = pd.to_datetime(
    fec['Date Donation Received'],
    format='%m/%d/%Y' # Format for "12/31/2025"
)
fec['Donation Amount'] = fec['Donation Amount'].str.replace('$', '', regex=False)
fec['Donation Amount'] = pd.to_numeric(fec['Donation Amount'])

fec = fec.drop(columns=["Donor's Aggregate Donations To Date"])
Some donors donated multiple times, so we’ll want to group them together and only classify them once.
fec_total = fec.groupby('Name')['Donation Amount'].sum().reset_index()
fec_total = fec_total.sort_values(by='Donation Amount', ascending=False)
fec_total = fec_total.head(5) # To keep things simple, we're only going to use the first 5 rows of the dataset.

os.makedirs('data', exist_ok=True)
fec_total.to_csv('data/fec_unclassified.csv', index=False)
We should be ready to go. Let’s see what the data looks like.
fec_total
| | Name | Donation Amount |
|---|---|---|
| 618 | PILGRIM'S PRIDE CORPORATION | 5000000.00 |
| 652 | RIPPLE LABS, INC. | 4889345.33 |
| 824 | WARREN A STEPHENS | 4000000.00 |
| 667 | ROBINHOOD MARKETS, INC. | 2000000.00 |
| 360 | JARED ISAACMAN | 2000000.00 |
Trying to make AI lie less
Large Language Models are prone to hallucinations. If you ask an LLM a question it doesn’t know the answer to, it will often confidently make up a plausible-sounding answer that is completely wrong. This is particularly the case with lesser-known organisations or individuals that wouldn’t feature prominently in the training data.
One way to minimise (but not completely eliminate) hallucinations is by “grounding” the LLM with some factual data like documents or databases. This technique is sometimes called Retrieval-Augmented Generation (RAG).
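As a rough illustration of what grounding means in practice, the sketch below pastes retrieved text into the prompt and tells the model to answer only from it. The function name `build_grounded_prompt`, the prompt wording and the character cap are assumptions made for this example; they are not part of any library.

```python
from typing import List, Tuple


def build_grounded_prompt(question: str,
                          sources: List[Tuple[str, str]],
                          max_chars: int = 2000) -> str:
    """Build a prompt that asks the model to answer only from the given sources.

    `sources` is a list of (url, text) pairs. Each text is truncated to
    `max_chars` characters so the prompt stays a manageable size.
    """
    blocks = []
    for number, (url, text) in enumerate(sources, start=1):
        blocks.append(f"[{number}] {url}\n{text[:max_chars]}")

    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "What is the connection between WARREN A STEPHENS and Donald Trump?",
    [("https://en.wikipedia.org/wiki/Warren_Stephens",
      "Warren Amerine Stephens is an American diplomat and businessman.")],
)
print(prompt)
```

The model can still misread or ignore the sources, which is why grounding reduces hallucinations rather than removing them.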
In our case, we want to allow our agent to search the internet to look for additional information about each donor.
We could try scraping search results from Google like we did with the FEC data, but Google doesn’t like that, and they will put various roadblocks in your way such as CAPTCHAs, rate limits or even blocking your IP address. Instead, they want you to use their Custom Search API, which is paid and limited to 10k queries per day.
Other search engines like Brave provide generous free tiers and more reasonable pricing, and there are some services that specifically cater to AI applications, like Tavily and Perplexity.
We’ll use the DuckDuckGo API since it’s free (but rate-limited), doesn’t require signing up for an API key, and comes included with the agents framework we’ll be using.
We also want to give our AI the ability to visit each of those search results and extract the text from there. We’ll use a library called Trafilatura since it extracts the main text without headers, footers and other irrelevant bits that we probably don’t need for the classification.
def extract_text(urls: List[str]) -> List[Tuple[str, str]]:
    """
    Extract text content from a list of URLs.
    Args:
        urls: A list of URLs to scrape
    Returns:
        A list of tuples containing (url, extracted_text)
    """
    results = []

    for url in urls:
        # print(f"Scraping {url}...")
        try:
            response = requests.get(url, timeout=30, verify=False)
            response.raise_for_status()
            extracted_text = extract(response.text, output_format="markdown")
            results.append((url, extracted_text))
        except requests.RequestException as e:
            # print(f"Error scraping {url}: {str(e)}")
            results.append((url, f"Error: {str(e)}"))

    return results
We’ll eventually give our agent access to both the DuckDuckGo search function and the extract_text function that we just created.
Creating machine-readable data
Most modern LLMs are trained to produce structured output, which means they can return data in a specific format. This is useful for producing consistent outputs when we’re asking the model to classify or research multiple items.
We can define the output format we expect using a Pydantic model.
class TrumpConnection(BaseModel):
    donor_industry: str
    type_of_connection: Literal['Family', 'Business', 'Political', 'Other', 'No connection', 'Don\'t know or not enough information']
    explanation: str
    source_urls: str
Building agents
If we have an idea about what we want to search for, we could use a simple workflow where we first search for the donor’s name, then extract the text from the first 5-10 results, and finally feed them to an LLM. An earlier version of this project used this approach, and it worked well enough.
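The fixed workflow described above could be sketched roughly as follows. The three callables (`search`, `fetch_text` and `ask_llm`) are hypothetical stand-ins for a search API, a scraper and an LLM client. They are passed in as arguments so the sketch makes no assumptions about any particular library, and the dummy versions at the bottom exist only so the example runs offline.

```python
from typing import Callable, List


def research_donor(donor: str,
                   search: Callable[[str], List[str]],
                   fetch_text: Callable[[str], str],
                   ask_llm: Callable[[str], str],
                   max_results: int = 5) -> str:
    """Fixed pipeline: search, scrape the top results, then ask the LLM once."""
    # Step 1: search for the donor and keep the first few result URLs
    urls = search(f"{donor} Donald Trump")[:max_results]

    # Step 2: extract the text of each result
    pages = [f"Source: {url}\n{fetch_text(url)}" for url in urls]

    # Step 3: hand everything to the LLM in a single prompt
    prompt = (
        f"What is the connection between {donor} and Donald Trump?\n\n"
        + "\n\n".join(pages)
    )
    return ask_llm(prompt)


# Dummy stand-ins so the sketch runs without network access or an API key
def fake_search(query: str) -> List[str]:
    return [f"https://example.com/{i}" for i in range(10)]


def fake_fetch(url: str) -> str:
    return f"text of {url}"


def fake_llm(prompt: str) -> str:
    return f"prompt contained {prompt.count('Source:')} sources"


answer = research_donor("WARREN A STEPHENS", fake_search, fake_fetch, fake_llm)
print(answer)
```

Unlike an agent, this pipeline always runs the same three steps in the same order; the LLM never decides what to search for or when to stop.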
Since then, AI agents have become a lot more prominent. Agents are systems in which LLMs are allowed to choose their own steps, use tools appropriate for different tasks and make decisions about when to stop.
The code required to create an agent from scratch is fairly simple, but there are several frameworks which come with batteries included. We’ll use PydanticAI, but most frameworks provide a similar set of features.
First, let’s create the agent. We’ll choose a model to use, give it a system prompt, define the tools it should have access to, and the structured output we expect.
agent = Agent(
    'openai:gpt-4.1-mini',
    system_prompt='Answer the question to the best of your abilities, using the tools at your disposal (`duckduckgo_search_tool` and `extract_text`).',
    tools=[duckduckgo_search_tool(), Tool(extract_text)],
    output_type=TrumpConnection,
    model_settings={'temperature': 0.0}
)
Let’s check whether that works. We’ll ask the agent to classify a single donor, and see what it returns.
result = agent.run_sync(
    'What is the connection between WARREN A STEPHENS and Donald Trump?'
)
result.all_messages()
[ModelRequest(parts=[SystemPromptPart(content='Answer the question to the best of your abilities, using the tools at your disposal (`duckduckgo_search_tool` and `extract_text`).', timestamp=datetime.datetime(2025, 5, 21, 10, 32, 20, 9940, tzinfo=datetime.timezone.utc), dynamic_ref=None, part_kind='system-prompt'), UserPromptPart(content='What is the connection between WARREN A STEPHENS and Donald Trump?', timestamp=datetime.datetime(2025, 5, 21, 10, 32, 20, 9955, tzinfo=datetime.timezone.utc), part_kind='user-prompt')], instructions=None, kind='request'),
ModelResponse(parts=[ToolCallPart(tool_name='duckduckgo_search', args='{"query":"WARREN A STEPHENS and Donald Trump connection"}', tool_call_id='call_Nm06XTWC5CNY7d0N114NoE86', part_kind='tool-call')], usage=Usage(requests=1, request_tokens=226, response_tokens=26, total_tokens=252, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}), model_name='gpt-4.1-mini-2025-04-14', timestamp=datetime.datetime(2025, 5, 21, 10, 32, 27, tzinfo=datetime.timezone.utc), kind='response', vendor_id='chatcmpl-BZb2Jc45GK38dNYhKUptZR8KJeyRv'),
ModelRequest(parts=[ToolReturnPart(tool_name='duckduckgo_search', content=[{'title': 'Warren Stephens - Wikipedia', 'href': 'https://en.wikipedia.org/wiki/Warren_Stephens', 'body': 'Warren Amerine Stephens (born February 18, 1957) is an American diplomat and businessman who is the designate to serve as the United States ambassador to the United Kingdom under President Donald Trump in his second administration.He is the chairman, president and chief executive officer (CEO) of Stephens Inc., a privately held investment bank. [1]On December 2, 2024, President-elect Trump ...'}, {'title': "Trump's UK Ambassador Has Investments in Industries Set to Benefit from ...", 'href': 'https://www.desmog.com/2025/05/15/donald-trump-uk-ambassador-warren-stephens-investments-industries-set-to-benefit-from-trade-deal/', 'body': "Stephens Inc., founded in 1933, has a $7 billion portfolio listed with the SEC, making it one of the largest privately owned investment banks in the U.S.. Warren Stephens is listed as chairman of the company, having also served as its CEO and president prior to being appointed as Trump's ambassador to the UK in February. The firm is now run ..."}, {'title': 'Trump picks investment banker Warren Stephens as UK ambassador - BBC', 'href': 'https://www.bbc.co.uk/news/articles/c20e79nlpe6o', 'body': 'President-elect Donald Trump has nominated businessman Warren Stephens to serve as America\'s ambassador to the United Kingdom. "Warren has always dreamed of serving the United States full time ...'}, {'title': 'Trump names billionaire investment banker Warren Stephens as his envoy ...', 'href': 'https://apnews.com/article/trump-transition-appointments-ambassador-52e4744db02a0f8e078685a7a754835d', 'body': 'WASHINGTON (AP) — President-elect Donald Trump has named billionaire investment banker Warren Stephens as his envoy to Britain, a prestigious posting for the Republican donor whose contributions this year included $2 million to a Trump-backing super PAC. 
Trump, in a post on his Truth Social site Monday evening, announced he was selecting Stephens to be the U.S. ambassador to the Court of ...'}, {'title': "Who Is Warren Stephens: Net Worth, Family, Prison Scandal, and Trump's ...", 'href': 'https://www.ibtimes.co.uk/who-warren-stephens-net-worth-family-prison-scandal-trumps-pick-uk-ambassador-1729149', 'body': 'Warren Stephens, a billionaire investment banker and CEO of Stephens Inc., has been nominated by President-elect Donald Trump to serve as the next U.S. Ambassador to the United Kingdom.'}, {'title': "Trump's UK Ambassador Stands to Benefit From New Trade Deal - Jacobin", 'href': 'https://jacobin.com/2025/05/trump-uk-ambassador-trade-deal', 'body': "President Donald Trump's UK ambassador, Warren Stephens, holds shares worth hundreds of millions in the key sectors that stand to benefit from last week's UK-US trade agreement. Stephens played a key role in pushing through the deal, according to the United Kingdom's ambassador to the United States, Peter Mandelson, and campaigners have raised concerns that he will personally benefit."}, {'title': 'Meet Warren Stephens, Investment Banker Picked By Trump To Be UK ...', 'href': 'https://www.timesnownews.com/world/us/us-news/who-is-warren-stephens-donald-trump-uk-ambassador-pick-age-net-worth-career-family-controversiesotherdetails-article-115912443', 'body': 'Warren Stephens, a multibillionaire investment banker, has been nominated by President-elect Donald Trump to be the next American ambassador to the United Kingdom. Stephens, the chairman of Stephens Inc., brings to the position his conservative political connections and decades of economic success. The nomination comes amid rumors that former U.K. 
Ambassador Woody Johnson will return, and ...'}, {'title': "Donald Trump news: US President's man in London, Warren Stephens, backs ...", 'href': 'https://thenightly.com.au/business/donald-trump-news-us-presidents-man-in-london-warren-stephens-backs-aukus-partnership-c-18749933', 'body': 'Donald Trump\'s new ambassador to the UK has used his first public speech to back the AUKUS partnership with Britain and Australia. Warren Stephens highlighted how "vital the US-UK relationship is to our countries and to the world" at an event in parliament attended by UK Prime Minister Keir ...'}, {'title': 'Warren Stephens: the Arkansan Spurs fan Trump is sending to London', 'href': 'https://www.ft.com/content/ec972106-e168-42ee-9805-b9854bb65b82', 'body': "Warren Stephens took control of his family's investment bank in 2005 after the death of his father, Jack, just as Wall Street entered a debt bubble that would nearly collapse the global ..."}, {'title': 'Trump Picks Warren Stephens, Billionaire Investment Banker, for U.K ...', 'href': 'https://www.nytimes.com/2024/12/02/us/politics/britain-ambassador-trump-stephens.html', 'body': "Warren Stephens, an investment banker, gave $2 million in 2016 to a group aiming to block Donald J. Trump's political rise. More recently, he backed Asa Hutchinson, Chris Christie, Mike Pence ..."}], tool_call_id='call_Nm06XTWC5CNY7d0N114NoE86', timestamp=datetime.datetime(2025, 5, 21, 10, 32, 28, 44169, tzinfo=datetime.timezone.utc), part_kind='tool-return')], instructions=None, kind='request'),
ModelResponse(parts=[ToolCallPart(tool_name='extract_text', args='{"urls":["https://en.wikipedia.org/wiki/Warren_Stephens","https://www.bbc.co.uk/news/articles/c20e79nlpe6o","https://apnews.com/article/trump-transition-appointments-ambassador-52e4744db02a0f8e078685a7a754835d","https://www.ibtimes.co.uk/who-warren-stephens-net-worth-family-prison-scandal-trumps-pick-uk-ambassador-1729149","https://www.nytimes.com/2024/12/02/us/politics/britain-ambassador-trump-stephens.html"]}', tool_call_id='call_9R70VTnj65ktLbgkGHTUrOps', part_kind='tool-call')], usage=Usage(requests=1, request_tokens=1368, response_tokens=146, total_tokens=1514, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}), model_name='gpt-4.1-mini-2025-04-14', timestamp=datetime.datetime(2025, 5, 21, 10, 32, 30, tzinfo=datetime.timezone.utc), kind='response', vendor_id='chatcmpl-BZb2MDc6jiTXRvu9F1iTnwSp2sC6C'),
ModelRequest(parts=[ToolReturnPart(tool_name='extract_text', content=[('https://en.wikipedia.org/wiki/Warren_Stephens', '# Warren Stephens\n\n\nWarren Stephens | |\n|---|---|\n| United States Ambassador to the United Kingdom | |\nAssuming officeTBD | |\n| President | Donald Trump |\n| Succeeding | Jane D. Hartley |\n| Personal details | |\n| Born | Warren Amerine Stephens February 18, 1957 Little Rock, Arkansas, U.S. |\n| Spouse | Harriet Stephens |\n| Children | 3 |\n| Parent |\n|\n| Education | Washington and Lee University (BA) Wake Forest University (MBA) |\n| Occupation | Chairman, President, and CEO of Stephens Inc. |\n\n**Warren Amerine Stephens** (born February 18, 1957) is an American diplomat and businessman who is the designate to serve as the United States ambassador to the United Kingdom under President Donald Trump in his second administration. He is the chairman, president and chief executive officer (CEO) of Stephens Inc., a privately held investment bank.[1]\n\nOn December 2, 2024, President-elect Trump announced his nomination of Stephens to serve as the ambassador to the United Kingdom.[2] The U.S. Senate confirmed his nomination in a 59–39 vote.[3] He was sworn in on April 30, 2025.[ citation needed]\n\n## Early life and education\n\n[edit]Stephens was born in Little Rock, the son of Jackson T. Stephens and Mary Amerine Stephens. Warren\'s father, "Jack" Stephens, and his uncle, "Witt" Stephens, partnered as investors and financiers in the investment firm, Stephens Inc.[4]\n\nWarren began his education in Little Rock, and in 1975, graduated from Trinity Presbyterian High School in Montgomery, Alabama. He graduated from Washington and Lee University in 1979 with a BA in economics, and received his MBA from Wake Forest University in 1981.[5]\n\n## Career\n\n[edit]Stephens joined his father and uncle in the investment banking business in Little Rock, which had 139 employees. 
At that time, the firm resembled and operated much like one of the old British merchant banks, investing the firm\'s and family funds in various businesses and ventures, and it still operates the same way today. Stephens Inc is noted for handling the IPO of Wal-Mart Stores in 1970.[4]\n\nStephens began work as an associate in the corporate finance department, concentrating on oil and gas. He became head of the department in 1983 and spent a lot of time on mergers and acquisitions. On February 18, 1986, Stephens was appointed president and CEO of Stephens Inc.[4]\n\nIn 1990, he was the senior advisor to Tyson Foods in their acquisition of Holly Farms in a nine-month takeover battle.[4] He is only the third chairman, president and CEO in the firm\'s more than 90 years of operations since 1933.[6]\n\nIn 2006, Stephens acquired 100 percent of the outstanding shares of Stephens Inc from the other family members.[4]\n\nStephens is on the board of Dillard\'s.[7]\n\nIn January 2025, Warren Stephens revealed his decision to step down from his role as CEO of Stephens Inc., as he gets ready to take on the position of U.S. ambassador to the United Kingdom. His sons, Miles and John Stephens, have been named co-CEOs, representing the third generation of family leadership at the firm.[8]\n\n## Political involvement\n\n[edit]A Republican, he supported Bob Dole in 1996, Steve Forbes in 1999, and has supported Mike Huckabee.[9] Stephens was a bundler for Mitt Romney in 2012.[10] He has been critical of presidents Bill Clinton and Barack Obama.[11][12][9] During the 2016 election, Stephens and his brother Jackson Stephens were major financial supporters of the Stop Trump movement.[13]\n\nDuring the 2020 presidential election, Stephens donated more than $3 million to Super PACs supporting Trump. 
In the 2024 Republican presidential primaries, Stephens was a major financial backer of Asa Hutchinson and Nikki Haley.[14][15] During the 2024 presidential campaign, Stephens also contributed to MAGA Inc., a Super PAC aligned with Donald Trump.[16]\n\nOn December 2, 2024, President-elect Donald Trump announced Stephens as the nominee to serve as the United States ambassador to the United Kingdom.[17] On February 12, 2025, his nomination was sent to the Senate.[18] His nomination was confirmed by a vote of 59–39 on April 29, 2025.[3] He is awaiting the presentation of his credentials.\n\n## Philanthropy\n\n[edit]Stephens was on the board of trustees of his alma mater, Washington and Lee University.[5] Stephens and his wife Harriet are benefactors of various organizations, most notably the Episcopal Collegiate School and the Arkansas Arts Center, both in Little Rock.[19][20]\n\n## References\n\n[edit]**^**"Warren Stephens".*Stephens*.**^**"Trump picks investment banker Warren Stephens as UK ambassador".*BBC News*. Retrieved December 3, 2024.- ^\n**a**Solomon, Scott (April 29, 2025). "Warren Stephens receives confirmation to serve as U.S. Ambassador to United Kingdom".**b***KATV*. Retrieved April 29, 2025. - ^\n**a****b****c****d**"Stephens History".**e***Stephens*. Archived from the original on December 8, 2014. Retrieved December 11, 2014. - ^\n**a**"W&L".**b***Washington & Lee*. **^**"University of Arkansas/Walton College/Arkansas Business Hall of Fame/Warren A. Stephens". Retrieved December 7, 2023.**^**"Dillard\'s Board of Directors".*Bloomberg Business Week*. Archived from the original on December 13, 2014.**^**Clarke, Paul. "Warren Stephens\' UK ambassador role triggers succession at family owned investment bank".*www.fnlondon.com*. Retrieved May 16, 2025.- ^\n**a**Steven Barnes, \'PRIVATE SECTOR; Understatement, at $5 Billion\', in**b***The New York Times*, November 21, 1999 [1] **^**Schouten, Fredreka (July 13, 2012). 
"Mitt Romney announces three bundlers".*USA Today*. Retrieved October 15, 2013.**^**"Warren Stephens".*Forbes*. Archived from the original on May 28, 2023.**^**BusinessWeek profile**^**Mider, Zachary (March 21, 2016). "Arkansas Billionaire Warren Stephens A Leading Stop-Trump Donor". Southwest Times Record. Archived from the original on August 17, 2016. Retrieved July 18, 2016.**^**Schwartz, Brian (July 31, 2023). "Almost all of Trump\'s biggest 2020 super PAC donors refrain from giving to his rivals". NBC News.**^**Reston, Maeve; Morse, Clara Ence; Knowles, Hanna (February 1, 2024). "Trump spent more than $55 million in donor money on legal fees last year, filings show".*Washington Post*.**^**Woodward, Alex. "Reclusive Republican megadonor drops $50m into Trump super PAC". Independent.**^**"Donald Trump nominates billionaire investment banker Warren Stephens to be US ambassador to UK".*Sky News*. United States. December 2, 2024. Retrieved December 3, 2024.**^**"Nominations Sent to the Senate" (Press release). The White House. 
February 12, 2025.**^**"Episcopal Collegiate School: History".*Episcopal Collegiate School*.**^**"The King of Little Rock".*Barron\'s*.\n\n- 1957 births\n- Living people\n- 20th-century American businesspeople\n- 21st-century American businesspeople\n- 21st-century American diplomats\n- Ambassadors of the United States to the United Kingdom\n- American billionaires\n- American chief executives of financial services companies\n- American investment bankers\n- Arkansas Republicans\n- Businesspeople from Little Rock, Arkansas\n- People named in the Paradise Papers\n- Second Trump administration personnel\n- Wake Forest University alumni\n- Washington and Lee University alumni\n- Washington and Lee University trustees'), ('https://www.bbc.co.uk/news/articles/c20e79nlpe6o', '# Trump picks investment banker Warren Stephens as UK ambassador\n\n- Published\n\n**President-elect Donald Trump has nominated businessman Warren Stephens to serve as Americaâs ambassador to the United Kingdom.**\n\nâWarren has always dreamed of serving the United States full time,â Trump said in a statement on Truth Social. 
âI am thrilled that he will now have that opportunity as the top diplomat, representing the USA to one of Americaâs most cherished and beloved allies.â\n\nStephens, who is CEO of private Arkansas-based investment bank Stephens Inc, donated millions to Trumpâs re-election campaign this year.\n\nIf his nomination is successful, Stephens will help to maintain the so-called âspecial relationshipâ between the US and the UK.\n\nThe two countries maintain close military, intelligence, and cultural ties.\n\nTrump has vowed to rework the USâ relationships abroad, vowing to implement an âAmerica-firstâ ethos to foreign policy.\n\nThe role of US ambassador to the UK is one of the most coveted diplomatic positions, and presidents have been known to bestow the role to prominent backers.\n\nDuring his first term in office, Trump appointed Woody Johnson, a top Republican donor and owner of the American football team the New York Jets, as UK ambassador.\n\nBarack Obama chose long-time Democratic Party fundraiser and lawyer Louis Susman as his first emissary to the UK - or the Court of St James.\n\nStephens did not always back Trumpâs candidacies, however. In 2016, when Trump first ran for president, Stephens donated about $4m (£3.1m) to political groups that sought to stop Trumpâs ascent, the Arkansas Democrat-Gazette reported at the time.\n\nThe investment banker also donated over $2m to a political action committee that supported former UN Ambassador Nikki Haley as she unsuccessfully ran for president in 2024, the campaign finance watchdog OpenSecrets found.\n\nWhen Trump ultimately triumphed in this yearâs Republican presidential primary, however, Stephens threw his support behind him. 
Federal campaign finance data show he also donated to Republican groups and US Senate candidates this year.\n\nIn a statement, Stephens said he was honoured by the nomination.\n\n"I have expressed to President Trump that I would be extremely proud to serve our country and his administration, working to implement the Presidentâs agenda and further strengthen the long-standing alliance between the United States and the United Kingdom,â he said.\n\nHe and his wife also maintain a philanthropic organisation, the Harriet and Warren Stephens Family Foundation, which has donated to the Arkansas Museum of Fine Arts, the University of Mississippi, and the Episcopal Collegiate School in Little Rock.\n\n- Published2 December 2024\n\n- Published1 December 2024\n\n- Published25 February\n\n\nNorth America correspondent Anthony Zurcher makes sense of US politics in his twice weekly US Election Unspun newsletter. Readers in the UK can sign up here. Those outside the UK can sign up here.'), ('https://apnews.com/article/trump-transition-appointments-ambassador-52e4744db02a0f8e078685a7a754835d', '# Trump names billionaire investment banker Warren Stephens as his envoy to Britain\n\nWASHINGTON (AP) — President-elect Donald Trump has named billionaire investment banker Warren Stephens as his envoy to Britain, a prestigious posting for the Republican donor whose contributions this year included $2 million to a Trump-backing super PAC.\n\nTrump, in a post on his Truth Social site Monday evening, announced he was selecting Stephens to be the U.S. ambassador to the Court of Saint James. The Senate is required to confirm the choice.\n\n“Warren has always dreamed of serving the United States full time. I am thrilled that he will now have that opportunity as the top Diplomat, representing the U.S.A. 
to one of America’s most cherished and beloved Allies,” Trump said in in his post.\n\nStephens is the chairman, president and CEO of Little Rock, Arkansas-based financial services firm Stephens Inc., having taken over the firm from his father.\n\nTrump has already named many of his nominees for his Cabinet and high-profile diplomatic posts, assembling a roster of staunch loyalists. Over the weekend, Trump announced he intends to nominate real estate developer Charles Kushner, father of Trump’s son-in-law Jared Kushner, to serve as ambassador to France.\n\nDuring his first term, Trump selected Robert “Woody” Johnson, a contributor to his campaign and the owner of the New York Jets football team, as his representative to the United Kingdom.'), ('https://www.ibtimes.co.uk/who-warren-stephens-net-worth-family-prison-scandal-trumps-pick-uk-ambassador-1729149', 'Error: 403 Client Error: Forbidden for url: https://www.ibtimes.co.uk/who-warren-stephens-net-worth-family-prison-scandal-trumps-pick-uk-ambassador-1729149'), ('https://www.nytimes.com/2024/12/02/us/politics/britain-ambassador-trump-stephens.html', 'Error: 403 Client Error: Forbidden for url: https://www.nytimes.com/2024/12/02/us/politics/britain-ambassador-trump-stephens.html')], tool_call_id='call_9R70VTnj65ktLbgkGHTUrOps', timestamp=datetime.datetime(2025, 5, 21, 10, 32, 33, 605096, tzinfo=datetime.timezone.utc), part_kind='tool-return')], instructions=None, kind='request'),
ModelResponse(parts=[ToolCallPart(tool_name='final_result', args='{"donor_industry":"Investment banking and financial services","type_of_connection":"Political","explanation":"Warren A. Stephens is a billionaire investment banker and CEO of Stephens Inc. He was nominated by President Donald Trump to serve as the United States ambassador to the United Kingdom in Trump\'s second administration. Stephens has been a significant political donor, including donations to Trump-aligned Super PACs and campaigns. Although he initially supported efforts to stop Trump in 2016, he later became a major financial supporter of Trump and his aligned political groups. His appointment as ambassador is a political connection stemming from his support and donations to Trump and his administration.","source_urls":"https://en.wikipedia.org/wiki/Warren_Stephens, https://www.bbc.co.uk/news/articles/c20e79nlpe6o, https://apnews.com/article/trump-transition-appointments-ambassador-52e4744db02a0f8e078685a7a754835d"}', tool_call_id='call_lyRikPjRKAIGUgXvxW7r9lxc', part_kind='tool-call')], usage=Usage(requests=1, request_tokens=4513, response_tokens=206, total_tokens=4719, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0, 'cached_tokens': 0}), model_name='gpt-4.1-mini-2025-04-14', timestamp=datetime.datetime(2025, 5, 21, 10, 32, 34, tzinfo=datetime.timezone.utc), kind='response', vendor_id='chatcmpl-BZb2QjD4o50FAzQDPPzPcwg6DiQV0'),
ModelRequest(parts=[ToolReturnPart(tool_name='final_result', content='Final result processed.', tool_call_id='call_lyRikPjRKAIGUgXvxW7r9lxc', timestamp=datetime.datetime(2025, 5, 21, 10, 32, 38, 978804, tzinfo=datetime.timezone.utc), part_kind='tool-return')], instructions=None, kind='request')]
As you can see above, our agent started by searching the internet, read the search results, selected a few of the most relevant ones, read the text from those pages, and finally produced an output in the format we defined earlier.
Running on the full dataset
We now have a working agent that can search the internet, read websites and return structured outputs. We can now have it run on each row of our dataset and return a full classification for each donor.
First, let’s define a simple function that incorporates the agent run shown above.
def classify_donor(donor: str):
    result = agent.run_sync(f'What is the connection between {donor} and Donald Trump?')
    return result.output
Now, we can run it on the scraped data from the FEC website.
def apply_classify_donor(df):
    # Work on a copy so the original dataframe is left untouched
    df = df.copy()

    # Apply the classify_donor function to each row
    df['classification'] = df.apply(lambda row: classify_donor(donor=row['Name']), axis=1)

    # Convert the Pydantic objects to dictionaries
    classification_dicts = df['classification'].apply(lambda x: x.model_dump())

    # Create a new DataFrame from the classification dictionaries
    # This ensures the index matches the original dataframe
    classification_df = pd.DataFrame(classification_dicts.tolist(), index=df.index)

    # Combine original data with classification data
    result_df = pd.concat([df.drop(['classification'], axis=1), classification_df], axis=1)

    return result_df

fec_classified = apply_classify_donor(fec_total)
fec_classified
| | Name | Donation Amount | donor_industry | type_of_connection | explanation | source_urls |
|---|---|---|---|---|---|---|
| 618 | PILGRIM'S PRIDE CORPORATION | 5000000.00 | Poultry processing | Business | Pilgrim's Pride Corporation, a major poultry p... | https://readsludge.com/2025/04/21/trumps-top-i... |
| 652 | RIPPLE LABS, INC. | 4889345.33 | Cryptocurrency and blockchain technology | Political | Ripple Labs, Inc. has a political connection w... | https://abcnews.go.com/US/sec-drops-case-crypt... |
| 824 | WARREN A STEPHENS | 4000000.00 | Investment banking and financial services | Political | Warren A. Stephens is a billionaire investment... | https://en.wikipedia.org/wiki/Warren_Stephens,... |
| 667 | ROBINHOOD MARKETS, INC. | 2000000.00 | Financial Technology (Fintech) | Political | Robinhood Markets, Inc. made a $2 million dona... | https://www.benzinga.com/markets/equities/24/1... |
| 360 | JARED ISAACMAN | 2000000.00 | Casinos and hospitality (Trump Taj Mahal casino) | Business | Jared Isaacman was sued by Trump's Taj Mahal c... | https://www.firstpost.com/world/trumps-nasa-no... |
We now have a full classification for each donor which we can export to a CSV file, analyse further or review manually.
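Before reviewing by hand, a cheap first check is whether the `source_urls` field actually contains well-formed URLs. The sketch below assumes the field is a comma-separated string, as in the Warren Stephens output shown earlier; the helper name `find_invalid_urls` is made up for this example. It only checks the shape of each URL, not whether the page exists or supports the explanation.

```python
from typing import List
from urllib.parse import urlparse


def find_invalid_urls(source_urls: str) -> List[str]:
    """Return the entries of a comma-separated string that are not http(s) URLs."""
    invalid = []
    for raw in source_urls.split(","):
        candidate = raw.strip()
        if not candidate:
            continue  # ignore empty entries caused by trailing commas
        parsed = urlparse(candidate)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            invalid.append(candidate)
    return invalid


good = ("https://en.wikipedia.org/wiki/Warren_Stephens, "
        "https://www.bbc.co.uk/news/articles/c20e79nlpe6o")
bad = "https://en.wikipedia.org/wiki/Warren_Stephens, Wikipedia, www.bbc.co.uk"

print(find_invalid_urls(good))  # no malformed entries
print(find_invalid_urls(bad))   # the two entries without an http(s) scheme
```

Applied to the table above, something like `fec_classified['source_urls'].apply(find_invalid_urls)` would flag the rows worth looking at first.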
Next steps
The example above shows a minimal working example of how to use LLMs to research a dataset. However, there are several ways in which this process could be improved. Here are some ideas:
- Play with the prompt. We asked a simple question, but the more specific you are, the better the results. Write a good prompt, include examples and iterate.
- Make the AI reason. Allowing an LLM to “reason” about its next steps can help it produce better outputs.
- Use different search engines. DuckDuckGo is free and good for a prototype. However, their API isn’t meant to be used this way and will often return errors, especially when used on bigger datasets. It also doesn’t return the best results. I recommend switching to Google, potentially via a SERP API.
- Try other models. I like to use gpt-4.1-mini because it is reasonably fast, cheap and “smart” enough for most tasks. If you find it often gets it wrong, you can use the smarter gpt-4.1 or a “reasoning” model like o3.
- Validate the output. While we did code the classifier to return structured output, you can go beyond that and validate the output (i.e. check that the sources are valid URLs, check that the explanation includes exact quotes from the sources, etc.).
- Save your progress. Things can and will go wrong! The search engine may fail, the LLM may be down (Claude often is), your Python script may throw an unexpected error. If you don’t want to lose progress, you should save your progress to a file or a database so you can pick up from where you left off.
- Async/Multithreading. In my tests, each classification takes about 20-50 seconds, but it can take longer if the LLM is trying to access an unresponsive website. If you have a lot of rows to classify, you should consider running multiple agents in parallel.
- Verify. LLMs are still dumb and shouldn’t be trusted! Manually verify the classifications if you’re going to publish the results.
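As one illustration of the “save your progress” idea, the sketch below appends each result to a JSON Lines file as soon as it is ready and, when restarted, skips donors already in the file. The file name, the `classify` argument and the `fake_classify` stand-in are assumptions for this example; in the notebook, `classify` could be a small wrapper that calls `classify_donor` and returns its `model_dump()`.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_done(path: str) -> Dict[str, dict]:
    """Read previously saved results, keyed by donor name."""
    done = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    done[record["Name"]] = record
    return done


def classify_with_checkpoint(donors: List[str],
                             classify: Callable[[str], dict],
                             path: str) -> List[dict]:
    """Classify donors one by one, saving each result as soon as it is ready."""
    done = load_done(path)
    with open(path, "a", encoding="utf-8") as f:
        for donor in donors:
            if donor in done:
                continue  # already classified in an earlier run
            try:
                record = {"Name": donor, **classify(donor)}
            except Exception as e:
                # Leave failed donors out of the file so a rerun retries them
                print(f"Failed on {donor}: {e}")
                continue
            f.write(json.dumps(record) + "\n")
            f.flush()
            done[donor] = record
    return [done[d] for d in donors if d in done]


# Stand-in classifier that records how often it is called
calls = []


def fake_classify(donor: str) -> dict:
    calls.append(donor)
    return {"type_of_connection": "Other"}


checkpoint = os.path.join(tempfile.mkdtemp(), "classified.jsonl")
first = classify_with_checkpoint(["A", "B"], fake_classify, checkpoint)
second = classify_with_checkpoint(["A", "B", "C"], fake_classify, checkpoint)
print(calls)  # "A" and "B" are only classified once
```

Writing one line per donor means a crash loses at most the row in progress, and failed rows are retried on the next run simply because they never made it into the file.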
Augmenta: A better way
I’ve used the approach described above for several projects at Global Witness, including our annual classification of fossil fuel donors at COP.
As the codebase grew to account for the suggestions described above, I decided to package everything into a Python library called Augmenta. You can use Augmenta to build agents without writing any code. It has a few features that make life easier, such as:
- Built-in search and text extraction tools
- Support for multiple LLMs
- Asynchronous processing
- Automatic caching to save progress
- Support for third-party tools via MCP
- Logfire integration for monitoring and debugging
If you would like to test it out, you can read the documentation on GitHub and get in touch if you have any questions or suggestions.