Feed aggregator

The Homeland Security Spending Trail: How to Follow the Money Through U.S. Government Databases

EFF: Updates - Tue, 01/06/2026 - 12:08pm

This guide was co-written by Andrew Zuker with support from the Heinrich Boell Foundation.

The U.S. government publishes volumes of detailed data on the money it spends, but searching through it and finding information can be challenging. Complex search functions and poor user interfaces on government reporting sites can hamper an investigation, as can inconsistent company profiles and complex corporate ownership structures. 

This week, EFF and the Heinrich Boell Foundation released an update to our database of vendors providing technology to components of the U.S. Department of Homeland Security (DHS), such as Immigration and Customs Enforcement (ICE) and Customs and Border Protection (CBP). It includes new vendor profiles, new fields, and updated data on top contractors, so that journalists and researchers have a jumping-off point for their own investigations.

Access the dataset through Google Sheets (Google's Privacy Policy applies) or download the Excel file here

This time we thought we would also share some of the research methods we developed while assembling this dataset.

This guide covers the key databases that store information on federal spending and contracts (often referred to as "awards"), government solicitations for products and services, and the government's "online shopping superstore," plus a few other deep-in-the-weeds datasets buried in the online bureaucracy. We have provided a step-by-step guide for searching these sites efficiently, along with helpful tips for finding information. While we have written this specifically with DHS agencies in mind, it should serve as a useful resource for researching procurement across the federal government.


1. Procurement Sites: FPDS.gov and USASpending.gov

Federal Procurement Data System - fpds.gov

The Federal Procurement Data System (FPDS) is the best place to start for finding out what companies are working with DHS. It is the official system for tracking federal discretionary spending and contains current data on contracts with non-governmental entities like corporations and private businesses. Award data is up-to-date and includes detailed information on vendors and awards which can be helpful when searching the other systems. It is a little bit old-school, but that often makes it one of the easiest and quickest sites to search, once you get the hang of it, since it offers a lot of options for narrowing search parameters to specific agencies, vendors, classification of services, etc. 

How to Use FPDS
To begin searching Awards for a particular vendor, click into the “ezSearch” field in the center of the page, delete or replace the text “Google-like search to help you find federal contracts…” with a vendor name or keywords, and hit Enter to begin a new search. 

A new tab will open automatically with exact matches at the top. 

Four “Top 10” modules on the left side of the page link to top results in descending order: Department Full Name, Contracting Agency Name, Full Legal Business Name, and Treasury Account Symbol. These ranked lists help the user quickly narrow in on departments and agencies that vendors do business with. DHS may not appear in the “Top 10” results, which may indicate that the vendor hasn’t yet been awarded DHS or subagency contracts.

For example, if you search the term “FLIR”, as in Teledyne FLIR, which makes infrared surveillance systems used along the U.S.-Mexico border, DHS is the second result in the “Top 10: Department Full Name” box.

To see all DHS contracts awarded to the vendor, click “Homeland Security, Department of” from the “Top 10: Department Full Name” module. When the page loads, you will see the subcomponents of DHS (e.g., ICE, CBP, or the U.S. Secret Service) in the left-hand menu. You can click on each of those to drill down even further. You can also drill down by choosing a company.

Sorting options can be found on the right side of the page which offer the ability to refine and organize search results. One of the most useful is "Date Signed," which will arrange the results in chronological order. 

You don't have to search by a company name. You can also use a product keyword, such as "LPR" (license plate reader). However, because keywords are not consistently used by government agencies, you will need to try various permutations to gather the most data. 
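One way to keep track of those permutations is to generate the list up front and work through it term by term. The Python sketch below is only an illustration: the `keyword_variants` helper and its simple spelling rules are assumptions, not something FPDS provides, and you would still need to add agency-specific jargon by hand.

```python
def keyword_variants(phrase, abbreviations=()):
    """Return a sorted list of search-term variants for one product phrase.

    Hypothetical helper: it only covers a plural form, a hyphenated form,
    and any abbreviations you supply yourself.
    """
    words = phrase.lower().split()
    spaced = " ".join(words)            # "license plate reader"
    variants = {
        spaced,
        spaced + "s",                   # naive plural
        "-".join(words),                # hyphenated spelling
    }
    # Abbreviations are usually written in capitals in contract records.
    variants.update(abbr.upper() for abbr in abbreviations)
    return sorted(variants)


terms = keyword_variants("license plate reader", abbreviations=["lpr", "alpr"])
for term in terms:
    print(term)
```

Each printed term can then be pasted into the ezSearch field one at a time, with a note on which ones returned results.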

Each click or search filter adds a new term to the search both in the main field at the top and in the Search Criteria module on the right. They can be deleted by clicking the X next to the term in this module or by removing the text in the main search field.

For each contract item, you can click "View" to see the specific details. However, these pages don't have permalinks, so you'll want to print-to-pdf if you need to retain a permanent copy of the record. 

Often the vendor brand name we know from their marketing or news media is not the same entity that is awarded government contracts. Foreign companies in particular rely on partnerships with domestic entities that are established federal contractors. If you can’t find any spending records for a vendor, search the web for information on the company including acquisitions, partnerships, licensing agreements, parent companies, and subsidiaries. It is likely that one of these types of related companies is the contract holder. 
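Even when you have the right entity, the same company can appear under slightly different spellings across records. A minimal sketch of one way to group those spellings is shown below, assuming you have copied vendor names into a Python list. The suffix list and the sample spellings are illustrative assumptions, and this does nothing to detect parent companies or subsidiaries, which still require manual research.

```python
import re

# Common legal suffixes to ignore when comparing names (an assumed,
# incomplete list; extend it for your own data).
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation",
                  "co", "company", "ltd", "lp"}


def normalize_vendor(name):
    """Lowercase a vendor name, drop punctuation and legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [token for token in cleaned.split() if token not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def group_by_vendor(names):
    """Map each normalized name to the original spellings that produced it."""
    groups = {}
    for name in names:
        groups.setdefault(normalize_vendor(name), []).append(name)
    return groups


# Made-up spellings for illustration only.
sample = ["PALANTIR TECHNOLOGIES INC.", "Palantir Technologies, Inc", "Teledyne FLIR, LLC"]
for key, spellings in group_by_vendor(sample).items():
    print(key, "<-", spellings)
```

Grouping this way is deliberately conservative: it merges only names that differ by case, punctuation, or legal suffix, so two genuinely different companies are unlikely to be merged by accident.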

USA Spending - usaspending.gov

The Federal Funding Accountability and Transparency Act (FFATA) of 2006 and the DATA Act of 2014 require the government to publish all spending records and contracts on a single, searchable public website, including agency-specific contracts, using unified reporting standards to ensure consistent, reliable, searchable data. This led to the creation of USA Spending (usaspending.gov).

USA Spending is populated with data from multiple sources including the Federal Procurement Data System (fpds.gov) and the System for Award Management (sam.gov - which we'll discuss in the next section). It also compiles Treasury Reports and data from the financial systems of dozens of federal agencies. We relied heavily on Awards data from these systems to verify vendor information including contracts with the DHS and its subagencies such as CBP and ICE.

USA Spending has a more modern interface, but it is often very slow, and information is frequently hidden in expandable menus. In many ways it is duplicative of FPDS, but with more features, including the ability to bookmark individual pages. We often found ourselves using FPDS to quickly identify data, and then using the "Award ID" number to find the specific record within USA Spending.

USA Spending also has some visualizations and ways to analyze data in chart form, which is not possible with the largely text-based FPDS. 

How to Use USA Spending

To begin searching for DHS awards, click on either “Search Award Data” on the navigation bar, or the blue “Start Searching Awards” button.

On the left of the Search page is a list of drop-down menus with options. You can enter a vendor name as a keyword, or expand the “Recipient” menu if you know the full company name or their Unique Entity Identifier (UEI) number. Expand the “Agency Tab” and enter DHS, which will bring up the Department of Homeland Security option.

In the example below, we entered “Palantir Technologies” as a keyword, and selected DHS in the Agency dropdown:

For vendors with hundreds of contracts that return many pages of results, consider adding more filters to the search such as a specific time period or specifying a Funding Agency such as ICE or CBP. In this example, the filters “Palantir Technologies” and “DHS” returned 13 results (at the time of publication). It is important to note that the search results table is larger than what displays in that module. You can scroll down to view more Awards and scroll to the right to see much more information. 

Scroll down outside of that module to reveal more info including modules for Results by Category, Results over Time, and Results by Geography, all of which can be viewed as a list or graph. 

Once you've identified a contract, you can click the "Prime Award ID" to see the granular details for each item.

From the search, you can also select just the agency to see all the contracts on file. Each agency also has its own page showing a breakdown for every fiscal year of how much money they had to spend and which components spent the most. For example, here's DHS's page.
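If the website is too slow for a large search, the same keyword, agency, and time-period filters can in principle be sent to USA Spending's public API instead. The sketch below only builds and prints a request body. The endpoint path, filter keys, and field names reflect our understanding of the API's award search and should be treated as assumptions to check against the official API documentation. `build_award_search` and `fetch_awards` are hypothetical helper names, and the date range is arbitrary.

```python
import json
import urllib.request

# Assumed endpoint for award searches; verify against the API documentation.
API_URL = "https://api.usaspending.gov/api/v2/search/spending_by_award/"


def build_award_search(keyword, agency_name, start_date, end_date, limit=50):
    """Build a request body mirroring the keyword, agency, and date filters."""
    return {
        "filters": {
            "keywords": [keyword],
            "agencies": [
                {"type": "awarding", "tier": "toptier", "name": agency_name}
            ],
            "time_period": [{"start_date": start_date, "end_date": end_date}],
            # A-D are assumed here to be the contract award type codes.
            "award_type_codes": ["A", "B", "C", "D"],
        },
        # Column names are assumptions; the API rejects unknown fields.
        "fields": ["Award ID", "Recipient Name", "Award Amount", "Awarding Sub Agency"],
        "limit": limit,
        "page": 1,
    }


def fetch_awards(payload):
    """POST the payload and return the result rows (makes a network call)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]


payload = build_award_search(
    "Palantir Technologies", "Department of Homeland Security",
    "2020-10-01", "2025-09-30",
)
print(json.dumps(payload, indent=2))
```

Calling `fetch_awards(payload)` would send the request. It is left out of the example so the sketch runs without a network connection.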

2. Contracting Opportunities - SAM.gov

So far we've talked about how to track contracts and spending, but now let's take a step back and look at how those contracts come to be. The System for Award Management, SAM.gov, is the site that allows companies to see what products and services the government intends to buy so they can bid on the contract. But SAM.gov is also open to the public, which means you can see the same information, including a detailed scope of a project and sometimes even technical details. 

How to Use SAM.gov

SAM.gov does not require an account for its basic contracting opportunity searches, but you may want to create one in order to save the things you find and to receive keyword- or agency-based alerts via email when new items of interest are posted. 

First you will click "Search" in the menu bar, which will bring you to this page: 

We recommend selecting both "Active" and "Inactive" in the Status menu. Contracts quickly go inactive, and besides, sometimes the contracts you are most interested in are several years old. 

If you are researching a particular technology such as unmanned aerial vehicles, you might just type "unmanned" in the Simple Search bar. That will bring up every solicitation with that keyword across the federal government.

One of the most useful features is filtering by agency, while leaving the keyword search blank. This will return a running list of an agency's calls for bids and related procurement activities. It is worth checking regularly. For example, here's what CBP's looks like on a given day: 

If you click on an item, you should next scroll down to see if there are attachments. These tend to contain the most details. Specifically, you should look for the term "SOW," the abbreviation for "Statement of Work." For example, here are the attachments for a CBP contracting opportunity for "Cellular Covert Cameras": 

The first document is the Statement of Work, which tells you the exact brand, model, and number of devices they want to acquire: 

The attachments also included a "BNO Justification." BNO stands for "Brand Name Only," and this document explains in even more detail why CBP wants that specific product:

If you see the term "Sole Source" in a listing, that also means that an agency has decided that only one product meets its requirements and it will not open bidding to other companies.
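If you end up with a long list of attachment names to triage, a short script can flag the ones most likely to be worth opening first. This is a rough sketch, not a SAM.gov feature: the file names are invented, and the patterns cover only the three terms discussed above (Statement of Work, Brand Name Only, and Sole Source).

```python
import re

# Patterns for the document types discussed above (an assumed, minimal set).
DOCUMENT_FLAGS = {
    "statement_of_work": re.compile(r"\bSOW\b|statement of work", re.IGNORECASE),
    "brand_name_only": re.compile(r"\bBNO\b|brand name only", re.IGNORECASE),
    "sole_source": re.compile(r"sole\s?source", re.IGNORECASE),
}


def flag_attachment(filename):
    """Return the sorted labels whose pattern appears in the file name."""
    # Treat underscores, hyphens, and dots as spaces so "SOW_Cameras.pdf"
    # still matches the whole-word pattern for "SOW".
    text = re.sub(r"[_\-.]+", " ", filename)
    return sorted(
        label for label, pattern in DOCUMENT_FLAGS.items() if pattern.search(text)
    )


# Hypothetical attachment names for illustration.
attachments = [
    "SOW_Cellular_Covert_Cameras.pdf",
    "BNO-Justification.pdf",
    "Sole_Source_Notice.docx",
    "price_sheet.xlsx",
]
for name in attachments:
    print(name, flag_attachment(name))
```

File names are inconsistent, so an empty result does not mean an attachment is uninteresting. It only means none of these three terms appear in its name.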

In addition to contracting, many agencies announce "Industry Day" events, usually virtual, that members of the public can join. This is a unique opportunity to listen in on what contractors are being told by government purchasing officials. The presentation slides are also often later uploaded to the SAM.gov page. Occasionally, the list of attendees will also be posted, and you'll find several examples of those lists in our dataset.

3. The Government's "Superstore" - gsaadvantage.gov

Another way to investigate DHS purchasing is by browsing the catalog of items and services immediately available to them. The General Services Administration operates GSA Advantage, which it describes as "the government's central online shopping superstore." The website's search is open, allowing members of the public to view any vendor's offerings, including both products and services, as easily as they would with any online marketplace.

For example, you could search for "license plate reader" and produce a list of available products: 

If you click "Advanced Search," you can also isolate every product available from a particular manufacturer. For example, here are the results when you search for products available from Skydio, a drone manufacturer.

If you switch from "Products" to "Services" you can export datasets for each company about their offerings. For example, if you search for "Palantir" you'll get results that look like this:

This means all these companies are offering some sort of Palantir-related services. If you click "Matches found in Terms and Conditions," you'll download a PDF with a lot of details about what the company offers. 

For example, here's a screengrab from Anduril's documentation:

If you click "Matches Found in Price List" you'll download a spreadsheet that serves as a blueprint of what the company offers, including contract personnel. Here's a snippet from Palantir's: 

4. Other Resources

Daily Public Report of Covered Contract Awards - Maybe FPDS isn't enough for you and you want to know every day what contracts have been signed. Buried in the DHS website are links to a daily feed of all contracts worth $4 million or more. It's available in XML, JSON, CSV and XLSX formats. 

DHS Acquisition Planning Forecast System (APFS) - DHS operates a site for vendors to learn about upcoming contracts greater than $350,000. You can sort by agency at a granular level, such as upcoming projects by ICE Enforcement & Removal Operations. This is one to check regularly for updates.

DHS Artificial Intelligence Use Case Inventory - Many federal agencies are required to maintain datasets of "AI Use Cases." DHS has broken these out for each of its subcomponents, including ICE and CBP. Advanced users will find the spreadsheet versions of these inventories more interesting.

NASA Solutions for Enterprise-Wide Procurement (SEWP) - SEWP is a way for agencies to fast track acquisition of "Information Technology, Communication and Audio Visual" products through existing contracts. The site provides an index of existing contract holders, but the somewhat buried "Provider Lookup" has a more comprehensive list of companies involved in this type of contracting, illustrating how the companies serve as passthroughs for one another. Relatedly, DHS's list of "Prime Contractors" shows which companies hold master contracts with the agency and its components. 

TechInquiry - TechInquiry is a small non-profit that aggregates records from a wide variety of sources about tech companies, particularly those involved in government contracting.
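Because the Daily Public Report mentioned earlier in this section is offered in machine-readable formats such as CSV, it lends itself to simple scripted filtering. The sketch below shows the general idea with Python's standard `csv` module. The column names and the three sample rows are invented for illustration and will not match the real feed, so adjust them after inspecting an actual download.

```python
import csv
import io

# Invented sample data; the real feed's columns will differ.
SAMPLE_FEED = """award_id,vendor,component,amount
EXAMPLE-0001,Example Vendor A,CBP,5200000
EXAMPLE-0002,Example Vendor B,ICE,4100000
EXAMPLE-0003,Example Vendor C,CBP,12750000
"""


def awards_over(csv_text, minimum, component=None):
    """Return (award_id, vendor, amount) rows at or above a dollar threshold.

    Optionally keep only one component, and sort largest awards first.
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = float(row["amount"])
        if amount < minimum:
            continue
        if component and row["component"] != component:
            continue
        selected.append((row["award_id"], row["vendor"], amount))
    return sorted(selected, key=lambda item: item[2], reverse=True)


for award_id, vendor, amount in awards_over(SAMPLE_FEED, 5_000_000):
    print(f"{award_id}  {vendor}  ${amount:,.0f}")
```

The same function could be pointed at a saved copy of the daily CSV by reading the file's text and passing it in place of `SAMPLE_FEED`.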

A Cyberattack Was Part of the US Assault on Venezuela

Schneier on Security - Tue, 01/06/2026 - 11:08am

We don’t have many details:

President Donald Trump suggested Saturday that the U.S. used cyberattacks or other technical capabilities to cut power off in Caracas during strikes on the Venezuelan capital that led to the capture of Venezuelan President Nicolás Maduro.

If true, it would mark one of the most public uses of U.S. cyber power against another nation in recent memory. These operations are typically highly classified, and the U.S. is considered one of the most advanced nations in cyberspace operations globally.

5 climate court battles to watch in 2026

ClimateWire News - Tue, 01/06/2026 - 6:26am
The Trump administration is playing a leading role in litigation to stop climate action.

Trump admin launches new bid to pressure US oil companies on Venezuela

ClimateWire News - Tue, 01/06/2026 - 6:25am
The president’s Energy and Interior secretaries are joining the effort to cajole the petroleum businesses to invest in the country’s shattered oil fields.

Judge keeps Honolulu climate case alive

ClimateWire News - Tue, 01/06/2026 - 6:24am
The ruling rejected efforts by oil giants to dismiss the 2020 lawsuit seeking compensation for the costs of dealing with climate change.

Deadly climate collision: Cutting forests and raging floods

ClimateWire News - Tue, 01/06/2026 - 6:23am
The devastating flood that killed more than 1,000 people in Indonesia was exacerbated by years of deforestation.

Scientists go global in attempt to better predict atmospheric rivers

ClimateWire News - Tue, 01/06/2026 - 6:22am
A long-running collaboration between NOAA and Scripps will launch new research flights from Canada and Ireland this winter.

Court upholds New Jersey’s landmark environmental justice rule

ClimateWire News - Tue, 01/06/2026 - 6:21am
It’s unclear if the industrial groups that are fighting the rule will keep fighting in court.

Why Europe’s night-train renaissance derailed

ClimateWire News - Tue, 01/06/2026 - 6:21am
Aging carriages, high costs and reluctant incumbents choked off the night-train revival — even as passengers clamor for more.

UK set new annual heat and sunshine records last year

ClimateWire News - Tue, 01/06/2026 - 6:20am
The record amount of sunshine helped fuel a boom in solar generation.

South Africa’s Ramaphosa names new presidential climate commission

ClimateWire News - Tue, 01/06/2026 - 6:19am
President Cyril Ramaphosa will announce the deputy chair at the first meeting of the commission in 2026 and further outline its priorities from now until 2030, his office said.

Banks notch higher fees from green bonds than fossil fuel debt

ClimateWire News - Tue, 01/06/2026 - 6:19am
Lenders generated roughly $3.7 billion of revenue from climate-related loans and bond underwriting in 2025, according to data compiled by Bloomberg.

AI-generated sensors open new paths for early cancer detection

MIT Latest News - Tue, 01/06/2026 - 5:00am

Detecting cancer in the earliest stages could dramatically reduce cancer deaths because cancers are usually easier to treat when caught early. To help achieve that goal, MIT and Microsoft researchers are using artificial intelligence to design molecular sensors for early detection.

The researchers developed an AI model to design peptides (short proteins) that are targeted by enzymes called proteases, which are overactive in cancer cells. Nanoparticles coated with these peptides can act as sensors that give off a signal if cancer-linked proteases are present anywhere in the body.

Depending on which proteases are detected, doctors would be able to diagnose the particular type of cancer that is present. These signals could be detected using a simple urine test that could even be done at home.

“We’re focused on ultra-sensitive detection in diseases like the early stages of cancer, when the tumor burden is small, or early on in recurrence after surgery,” says Sangeeta Bhatia, the John and Dorothy Wilson Professor of Health Sciences and Technology and of Electrical Engineering and Computer Science at MIT, and a member of MIT’s Koch Institute for Integrative Cancer Research and the Institute for Medical Engineering and Science (IMES).

Bhatia and Ava Amini ’16, a principal researcher at Microsoft Research and a former graduate student in Bhatia’s lab, are the senior authors of the study, which appears today in Nature Communications. Carmen Martin-Alonso PhD ’23, a founding scientist at Amplifyer Bio, and Sarah Alamdari, a senior applied scientist at Microsoft Research, are the paper’s lead authors.

Amplifying cancer signals

More than a decade ago, Bhatia’s lab came up with the idea of using protease activity as a marker of early cancer. The human genome encodes about 600 proteases, which are enzymes that can cut through other proteins, including structural proteins such as collagen. They are often overactive in cancer cells, as they help the cells escape their original locations by cutting through proteins of the extracellular matrix, which normally holds cells in place.

The researchers’ idea was to coat nanoparticles with peptides that can be cleaved by a specific protease. These particles could then be ingested or inhaled. As they traveled through the body, if they encountered any cancer-linked proteases, the peptides on the particles would be cleaved.

Those peptides would be secreted in the urine, where they could be detected using a paper strip similar to a pregnancy test strip. Measuring those signals would reveal the overactivity of proteases deep within the body.

“We have been advancing the idea that if you can make a sensor out of these proteases and multiplex them, then you could find signatures of where these proteases were active in diseases. And since the peptide cleavage is an enzymatic process, it can really amplify a signal,” Bhatia says.

The researchers have used this approach to demonstrate diagnostic sensors for lung, ovarian, and colon cancers.

However, in those studies, the researchers used a trial-and-error process to identify peptides that would be cleaved by certain proteases. In most cases, the peptides they identified could be cleaved by more than one protease, which meant that the signals that were read could not be attributed to a specific enzyme.

Nonetheless, using “multiplexed” arrays of many different peptides yielded distinctive sensor signatures that were diagnostic in animal models of many different types of cancer, even if the precise identity of the proteases responsible for the cleavage remained unknown.

In their new study, the researchers moved beyond the traditional trial-and-error process by developing a novel AI system, named CleaveNet, to design peptide sequences that could be cleaved efficiently and specifically by target proteases of interest.

Users can prompt CleaveNet with design criteria, and CleaveNet will generate candidate peptides likely to fit those criteria. In this way, CleaveNet enables users to tune the efficiency and specificity of peptides generated by the model, opening a path to improving the sensors’ diagnostic power.

“If we know that a particular protease is really key to a certain cancer, and we can optimize the sensor to be highly sensitive and specific to that protease, then that gives us a great diagnostic signal,” Amini says. “We can leverage the power of computation to try to specifically optimize for these efficiency and selectivity metrics.”

For a peptide that contains 10 amino acids, there are about 10 trillion possible combinations. Using AI to search that immense space allows for prediction, testing, and identification of useful sequences much faster than humans would be able to find them, while also considerably reducing experimental costs.
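The 10 trillion figure can be checked with a line of arithmetic. The short Python sketch below assumes the 20 standard amino acids, with any of them allowed at each of the 10 positions. The article does not spell out that assumption, but it matches the number quoted.

```python
# Assumption: 20 standard amino acids, chosen independently at each position.
AMINO_ACIDS = 20
PEPTIDE_LENGTH = 10

combinations = AMINO_ACIDS ** PEPTIDE_LENGTH
print(f"{combinations:,}")  # prints 10,240,000,000,000
```

That is roughly 1.02 x 10^13 sequences, or about 10 trillion.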

Predicting enzyme activity

To create CleaveNet, the researchers developed a protein language model to predict the amino acid sequences of peptides, analogous to how large language models can predict sequences of text. For the training data, they used publicly available data on about 20,000 peptides and their interactions with different proteases from a family known as matrix metalloproteinases (MMPs).

Using these data, the researchers trained one model to generate peptide sequences that are predicted to be cleaved by proteases. These sequences could then be fed into another model that predicted how efficiently each peptide would be cleaved by any protease of interest.

To demonstrate this approach, the researchers focused on a protease called MMP13, which cancer cells use to cut through collagen and help them metastasize from their original locations. Prompting CleaveNet with MMP13 as a target allowed the models to design peptides that could be cut by MMP13 with considerable selectivity and efficiency. This cleavage profile is particularly useful for diagnostic and therapeutic applications.

“When we set the model up to generate sequences that would be efficient and selective for MMP13, it actually came up with peptides that had never been observed in training, and yet these novel sequences did turn out to be both efficient and selective,” Martin-Alonso says. “That was very exciting to see.”

This kind of selectivity could help to reduce the number of different peptides needed to diagnose a given type of cancer, to identify novel biomarkers, and to provide insight into specific biological pathways for study and therapeutic testing, the researchers say.

Bhatia’s lab is currently part of an ARPA-H funded project to create reporters for an at-home diagnostic kit that could potentially detect and distinguish between 30 different types of cancer, in early stages of disease, based on measurements of protease activity. These sensors could include detection of not only MMP-mediated cleavage, but other enzymes such as serine proteases and cysteine proteases.

Peptides designed using CleaveNet could also be incorporated into cancer therapeutics such as antibody treatments. Using a specific peptide to attach a therapeutic such as a cytokine or small molecule drug to a targeting antibody could enable the medicine to be released only when the peptides are exposed to proteases in the tumor environment, improving efficacy and reducing side effects.

Beyond direct applications in diagnostics and therapeutics, combining efforts from the ARPA-H work with this modeling framework could enable the creation of a comprehensive “protease activity atlas” that spans multiple protease classes and cancers. Such a resource could further accelerate research in early cancer detection, protease biology, and AI models for peptide design.

The research was funded by La Caixa Foundation, the Ludwig Center at MIT, and the Marble Center for Cancer Nanomedicine.

Sean Luk: Addressing the urgent need for better immunotherapy

MIT Latest News - Tue, 01/06/2026 - 12:00am

In elementary school, Sean Luk loved donning an oversized lab coat and helping her mom pipette chemicals at Johns Hopkins University. A few years later, she started a science blog and became fascinated by immunoengineering, which is now her concentration as a biological engineering major at MIT.

Her grandparents’ battles with cancer made Luk, now a senior, realize how urgently patients need advancements in immunotherapy, which leverages a patient’s immune system to fight tumors or pathogens.

“The idea of creating something that is actually able to improve human health is what really drives me now. You want to fight that sense of helplessness when you see a loved one suffering through this disease, and it just further motivates me to be excellent at what I do,” Luk says.

A varsity athlete and entrepreneur as well as a researcher, Luk thrives when bringing people together for a common cause.

Working with immunotherapies

Luk was introduced to immunotherapies in high school after she listened to a seminar about using components of the immune system, such as antibodies and cytokines, to improve graft tolerance.

“The complexity of the immune system really fascinated me, and it is incredible that we can build antibodies in a very logical way to address disease,” Luk says.

She worked in several Johns Hopkins labs as a high school student in Maryland, and a professor there connected her to MIT Professor Dane Wittrup. Luk has worked in the Wittrup lab throughout her time at MIT. One of her main projects involves developing ultra-stable cyclic peptide drugs to help treat autoimmune diseases, which could potentially be taken orally instead of injected.

Luk has been a co-author on two published articles and has become increasingly interested in the intersection between computational and experimental protein design. Currently, she is working on engineering an interferon gamma construct that preferentially targets myeloid cells in the tumor microenvironment.

“We're trying to target and reprogram the immunosuppressive myeloid cells surrounding the cancer cells, so that they can license T cells to attack cancer cells and kickstart the cancer immunity cycle,” she explains.

Communication for all

Through her work in high school with Best Buddies, an organization that aims to promote one-on-one friendships between students with and without intellectual and developmental disabilities, Luk became passionate about empowering people with special needs. At MIT, she started a project focusing on children with Down syndrome, with support from the Sandbox Innovation Fund.

“Through talking to a lot of parents and caretakers, the biggest issue that people with Down syndrome face is communication. And when you think about it, communication is crucial to everything that we do,” Luk says. “We want to communicate our thoughts. We want to be able to interact with our peers. And if people are unable to do that, it’s isolating, it’s frustrating.”

Her solution was to co-found EasyComm, an online game platform that helps children with Down syndrome work on verbal communication.

“We thought it would be a great way to improve their verbal communication skills while having fun and incentivize that kind of learning through gamification,” Luk says. She and her co-founder recently filed a provisional patent and plan to make the platform available to a wider audience.

A global perspective

Luk grew up in Hong Kong before moving to Maryland in the fifth grade. She’s always been athletic; in Hong Kong, she was a competitive jump roper. At just 9 years old, she won bronze in the Asian Jump Rope Championships among children 14 years old and younger. At 7 years old, she started playing soccer on her brother’s team, despite being the only girl. She says the sport was considered “manly” in Hong Kong, and girls were discouraged from joining, but her coaches and family were supportive.

Moving to the U.S. meant that her time in competitive jump roping was cut short, and Luk focused more on soccer. Her team in the U.S. felt far more intense than boys’ soccer in Hong Kong, but the Luk family was in it together, Luk says. She credits her success to the combination of the hard-working nature she learned in Hong Kong and the innovation and experiences she was exposed to in the U.S.

“We had a really close bond within the family,” Luk says. “Figuring out taxes for my dad and our family, like driving and houses and all that stuff, it was totally new. But I think we really took it in stride, just adjusting as we went.”

Luk continued soccer throughout high school and eventually committed to play on the MIT team. She likes that the team allows players to prioritize academics while still being competitive. Last season, she was elected captain.

“It’s really a pleasure to be captain, and it’s challenging, but it’s also very rewarding when you see the team be cohesive. When you see the team out there winning games through grit,” Luk says.

During her first year at MIT, Luk got back in touch with her old soccer coach from Hong Kong, who then worked on the national team. After sending over some tape, she was offered a spot on the U-20 national team, and played in the U20 Asian Football Championship Qualifiers.

“It was so, so cool to be able to represent Hong Kong because I played soccer all my life but it just carries a different weight to it when you’re wearing your country’s jersey,” Luk says.

Besides her cross-cultural background, Luk is also proud of her international experiences playing soccer, staying with host families and doing lab work in Copenhagen, Denmark; Stuttgart, Germany; and Ancona, Italy. She speaks English, Cantonese, and Mandarin fluently.

“Aside from the textbook academic knowledge, I feel like a global perspective is so important when you’re trying to collaborate with other people from different walks of life,” Luk says. “When you’re just thinking about science or the impact that you can have in general, it’s important to realize you don’t have all the answers and to learn from the world outside your little bubble.”

MIT scientists investigate memorization risk in the age of clinical AI

MIT Latest News - Mon, 01/05/2026 - 4:55pm

What is patient privacy for? The Hippocratic Oath, thought to be one of the earliest and most widely known medical ethics texts in the world, reads: “Whatever I see or hear in the lives of my patients, whether in connection with my professional practice or not, which ought not to be spoken of outside, I will keep secret, as considering all such things to be private.” 

As privacy becomes increasingly scarce in the age of data-hungry algorithms and cyberattacks, medicine is one of the few remaining domains where confidentiality remains central to practice, enabling patients to trust their physicians with sensitive information.

But a paper co-authored by MIT researchers investigates how artificial intelligence models trained on de-identified electronic health records (EHRs) can memorize patient-specific information. The work, which was recently presented at the 2025 Conference on Neural Information Processing Systems (NeurIPS), recommends a rigorous testing setup to ensure targeted prompts cannot reveal information, emphasizing that leakage must be evaluated in a health care context to determine whether it meaningfully compromises patient privacy.

Foundation models trained on EHRs should normally generalize knowledge to make better predictions, drawing upon many patient records. But in “memorization,” the model draws upon a single patient record to deliver its output, potentially violating patient privacy. Notably, foundation models are already known to be prone to data leakage.

“Knowledge in these high-capacity models can be a resource for many communities, but adversarial attackers can prompt a model to extract information on training data,” says Sana Tonekaboni, a postdoc at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard and first author of the paper. Given the risk that foundation models could also memorize private data, she notes, “this work is a step towards ensuring there are practical evaluation steps our community can take before releasing models.”

To conduct research on the potential risk EHR foundation models could pose in medicine, Tonekaboni approached MIT Associate Professor Marzyeh Ghassemi, who is a principal investigator at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) and a member of the Computer Science and Artificial Intelligence Lab. Ghassemi, a faculty member in the MIT Department of Electrical Engineering and Computer Science and Institute for Medical Engineering and Science, runs the Healthy ML group, which focuses on robust machine learning in health.

Just how much information does a bad actor need to expose sensitive data, and what are the risks associated with the leaked information? To assess this, the research team developed a series of tests that they hope will lay the groundwork for future privacy evaluations. These tests are designed to measure various types of uncertainty, and assess their practical risk to patients by measuring various tiers of attack possibility.  

“We really tried to emphasize practicality here; if an attacker has to know the date and value of a dozen laboratory tests from your record in order to extract information, there is very little risk of harm. If I already have access to that level of protected source data, why would I need to attack a large foundation model for more?” says Ghassemi. 

With the inevitable digitization of medical records, data breaches have become more commonplace. In the past 24 months, the U.S. Department of Health and Human Services has recorded 747 data breaches of health information that each affected more than 500 individuals, with the majority categorized as hacking/IT incidents.

Patients with unique conditions are especially vulnerable, given how easy it is to pick them out. “Even with de-identified data, it depends on what sort of information you leak about the individual,” Tonekaboni says. “Once you identify them, you know a lot more.”

In their structured tests, the researchers found that the more information the attacker has about a particular patient, the more likely the model is to leak information. They demonstrated how to distinguish model generalization cases from patient-level memorization, to properly assess privacy risk. 

The paper also emphasized that some leaks are more harmful than others. For instance, a model revealing a patient’s age or demographics could be characterized as a more benign leakage than the model revealing more sensitive information, like an HIV diagnosis or alcohol abuse. 

The researchers note that because patients with unique conditions are so easy to pick out, they may require higher levels of protection. The researchers plan to expand the work to become more interdisciplinary, adding clinicians and privacy experts as well as legal experts.

“There’s a reason our health data is private,” Tonekaboni says. “There’s no reason for others to know about it.”

This work was supported by the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, Wallenberg AI, the Knut and Alice Wallenberg Foundation, the U.S. National Science Foundation (NSF), a Gordon and Betty Moore Foundation award, a Google Research Scholar award, and the AI2050 Program at Schmidt Sciences. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.

Telegram Hosting World’s Largest Darknet Market

Schneier on Security - Mon, 01/05/2026 - 7:01am

Wired is reporting on Chinese darknet markets on Telegram.

The ecosystem of marketplaces for Chinese-speaking crypto scammers hosted on the messaging service Telegram have now grown to be bigger than ever before, according to a new analysis from the crypto tracing firm Elliptic. Despite a brief drop after Telegram banned two of the biggest such markets in early 2025, the two current top markets, known as Tudou Guarantee and Xinbi Guarantee, are together enabling close to $2 billion a month in money-laundering transactions, sales of scam tools like stolen data, fake investment websites, and AI deepfake tools, as well as other black market services as varied as ...

Coal over wind: How Trump used emergency powers to help a favored fuel

ClimateWire News - Mon, 01/05/2026 - 6:17am
The president propped up old coal plants and killed offshore wind farms in the name of national security. That has raised accusations of contradictory energy policies.

DOE forces Colorado coal plant to keep running

ClimateWire News - Mon, 01/05/2026 - 6:16am
Energy Secretary Chris Wright says keeping the plant online would prevent dangerous outages. State leaders disagree.

Trump’s Venezuela gambit relies on oil boom for payback

ClimateWire News - Mon, 01/05/2026 - 6:12am
A global crude glut may complicate the president's plans to boost production in the South American country, whose leader was captured over the weekend.

EU might expand carbon fees on imports to include appliances

ClimateWire News - Mon, 01/05/2026 - 6:12am
The 27-nation bloc starts imposing tariffs this year on raw materials with high carbon intensity. Washing machines and car parts could be next.
