The Internet Archive faces a new threat: Wary publishers who opt out to stop scraping by AI bots

Ruins of the Library of Pantainos in Athens, Greece. Photo (cc) 2018 by Michael Kogan.

Has the Internet Archive reached the end of the line? The 30-year-old nonprofit, which has saved and made searchable more than a trillion webpages, has proved itself to be of enormous value over the years.

I’ve used it to track changes in reporting, as in this blog post about The New York Times’ shifting coverage of an explosion at Ahli Arab Hospital in Gaza City in the days after Hamas’ October 2023 terrorist attack on Israel. The Times and other news organizations initially reported that Israeli forces had bombed the hospital, but they later had to walk back that unverified claim.

Follow my Bluesky newsfeed for additional news and commentary. And please join my Patreon for just $6 a month. You’ll receive a supporters-only newsletter every Thursday.

The Internet Archive is also home to The Boston Phoenix’s online digital and print archives thanks to an agreement that it made with Northeastern University, which acquired the Phoenix’s intellectual property after the legendary alt-weekly went out of business in 2013. (Note: I was a longtime staff columnist for the Phoenix, and I helped arrange the donation to Northeastern.)

Now, though, the Internet Archive and its Wayback Machine, which reproduces web content from years past, are facing an existential threat. News organizations ranging from the Times to USA Today are inserting code into their sites that blocks the Archive from crawling their content, mainly to prevent AI companies from accessing their journalism without permission.

As Kate Knibbs reports for Wired, the irony is that USA Today recently published an important piece of investigative journalism documenting ICE detention statistics that wouldn’t have been possible without the Archive. Knibbs writes:

According to analysis by the artificial-intelligence-detection startup Originality AI, 23 major news sites are currently blocking ia_archiverbot, the web crawler commonly used by the Internet Archive for the Wayback project. The social platform Reddit is too. Other outlets are limiting the project in different ways: The Guardian does not block the crawler, but it excludes its content from the Internet Archive API and filters out articles from the Wayback Machine interface, which makes it harder for regular people to access archived versions of its articles.
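For readers curious about what "blocking a crawler" can look like in practice, here is a minimal sketch. It assumes the block is expressed through a robots.txt file, which is one common mechanism; the reporting quoted above does not say which mechanism each publisher uses. The rules and the news.example address are hypothetical and are not taken from any publisher's actual site. Only the crawler name comes from the Wired excerpt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (an assumption for illustration only):
# the crawler named in the Wired excerpt is shut out, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ia_archiverbot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# news.example is a placeholder address, not a real news site.
archive_allowed = can_crawl(ROBOTS_TXT, "ia_archiverbot", "https://news.example/story")
other_allowed = can_crawl(ROBOTS_TXT, "SomeOtherBot", "https://news.example/story")
print(archive_allowed, other_allowed)
```

A robots.txt file is a request rather than a technical barrier. It works only to the extent that a crawler chooses to honor it.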

The Electronic Frontier Foundation, which is helping to lead a signature drive in support of the Archive, compares the publishers’ actions to “a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper,” according to a recent EFF article by Joe Mullin, who writes:

For nearly three decades, historians, journalists, and the public have relied on the Internet Archive to preserve news sites as they appeared online. Those archived pages are often the only reliable record of how stories were originally published. In many cases, articles get edited, changed, or removed—sometimes openly, sometimes not. The Internet Archive often becomes the only source for seeing those changes. When major publishers block the Archive’s crawlers, that historical record starts to disappear.

This is not the first time the Archive has run into legal problems. One major challenge was of its own making: a project begun during the COVID pandemic to make books available for free without permission and without any compensation to publishers or authors. Not surprisingly, the Archive lost that case in a federal appeals court in 2024. As I wrote in describing that decision: “The Archive claimed that it was in compliance with copyright law because it limited e-book borrowing to correspond with physical books that it had in its collection or that was owned by one of its partner libraries. That’s not the way it works, though.”

The current threat involves publishers’ legal right to make their content available as they see fit. They are under no obligation to let the Internet Archive repurpose it. Ideally, they will come to understand the incalculable damage they are doing.

As EFF’s Mullin puts it: “There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake.”

The Times may not be perfect, but it remains staunchly independent in an era of bent knees

New York Times assistant managing editor Michael Slackman, left, with Northeastern School of Journalism director Jonathan Kaufman. Photo (cc) 2026 by Dan Kennedy.

This post was originally published as part of last week’s Supporters Newsletter. To receive this newsletter every Thursday, join my Patreon for just $6 a month.

Donald Trump’s second stint in the White House has been fraught with peril for independent journalism. I couldn’t possibly list the threats emanating from the regime without omitting many others, but you know what’s been happening:

Outrageous legal settlements agreed to by the parent companies of ABC News and CBS News. The suspension of Jimmy Kimmel. The arrests of reporters Don Lemon and Georgia Fort while they were covering a protest. Threats against broadcast licenses by FCC chair Brendan Carr. The pending Trump-greased acquisition of CNN by billionaires David Ellison and his father, Larry, for whom wrecking CBS wasn’t enough. The Trump-friendly direction taken by The Washington Post and Los Angeles Times opinion sections at the behest of their billionaire owners. An illegal raid on a Washington Post reporter’s home.

The fog of war: The media try to assess responsibility for the bombing of a girls’ school in Iran

Perhaps the most fraught topic during the first week of the war in Iran was the bombing of a girls’ elementary school, a horrendous event that killed about 165 people.

Some of the first reports, including one in Al Jazeera, claimed that Israel was responsible. That was followed by a social media campaign claiming that the Iranian government itself had admitted that the bombing was caused by one of its missiles that had gone astray. That was debunked by PolitiFact. Finally, investigations by media outlets like The New York Times and Bellingcat found that it was almost certain that the United States was responsible. The most likely explanation is that U.S. forces had targeted a Revolutionary Guard facility that was adjacent to the school.

I’m going to discuss with my graduate ethics students this evening how the story unfolded, and I’ve put together the slideshow you see here to go with it. You can also click here for a larger view.

On the new ‘Beat the Press,’ we look at the week in media, starting with Don Lemon’s arrest

Don Lemon reporting from Cities Church in St. Paul, Minn.

On the new “Beat the Press with Emily Rooney,” we look at Don Lemon’s arrest, when journalists should (and shouldn’t) use the word “murder,” looming cuts at The Washington Post, and transitions for Scot Lehigh, who’s retiring from The Boston Globe, and David Brooks, who’s moving from The New York Times to The Atlantic. With Emily, Scott Van Voorhis and me — plus a big assist from producer Tonia Magras.

Pundits moving on: David Brooks heads for The Atlantic, and Scot Lehigh retires from The Boston Globe

David Brooks

A couple of big moves to catch you up on in the world of newspaper punditry.

First, David Brooks is leaving The New York Times, where he’s been a center-right columnist for the past 22 years. He’ll be taking a job as a staff writer and podcaster for The Atlantic, where he’s already a contributor. He’s also joining Yale University as a Presidential Senior Fellow at the Jackson School of Global Affairs. Presumably he’ll continue as a commentator for the “PBS NewsHour.” Brooks wrote a rather downbeat farewell column today, saying in part:

We have become a sadder, meaner and more pessimistic country. One recent historical study of American newspapers finds that public discourse is more negative now than at any time since the 1850s. Large majorities say our country is in decline, that experts are not to be trusted, that elites don’t care about regular people. Only 13 percent of young adults believe America is heading in the right direction. Sixty-nine percent of Americans say they do not believe in the American dream.

Scot Lehigh

Second, and closer to home, Scot Lehigh is retiring from The Boston Globe, where he’s worked for the past 36 years. Lehigh has been a columnist for the opinion pages for most of that time, and had been on leave while finishing his second novel. Before that, Lehigh was a political reporter for The Boston Phoenix (we did not intersect) and was a finalist for a 1989 Pulitzer Prize for his coverage of Michael Dukakis’ presidential campaign. Lehigh, too, has a farewell column up today, and he says (sub. req.):

[O]nce you reach your mid-60s, you become acutely aware that time isn’t limitless and if you want to try different things, you have to saddle up and sally forth. And so I’m sallying. I had just enough luck with my first novel, “Just East of Nowhere,” a coming-of-age story set in Maine, that I’m attempting a more ambitious novel.

Mid-60s? Scot is a mere child.

Lehigh’s moderate-liberal voice will be missed, and I wish him the best on a long and productive retirement. Brooks isn’t retiring, and, since I’m already an Atlantic subscriber, I’ll continue to be a reader.

What the Times, the AP and Merriam-Webster say about the words ‘murder’ and ‘execution’

Photo (cc) 2026 by Nicole Neri / Minnesota Reformer.

Following the horrific deaths of Renee Good and Alex Pretti at the hands of ICE agents, I’ve seen a lot of references to the words “murder” and “execution.” On Tuesday, The New York Times addressed when it’s appropriate for journalists to use those terms. So, this morning, a brief lesson on journalism ethics.

As standards editor Susan Wessling explains, both of those words have a specific legal definition, which means that the Times doesn’t use them outside of those definitions. She writes:

Readers might see references elsewhere to the “murder” of Mr. Pretti or Ms. Good, but that word has a clear and significant meaning in law enforcement and the legal system. We do not use it unless a formal charge has been made or a court has found that a killing was, indeed, a murder.

We also hear from those who want to see the word “execution” in our news report. But that, too, has a distinct definition — putting someone to death as a legal penalty — and we don’t want to dilute its meaning by using it when that’s not the case.

Now, we all know that those words have generic, everyday meanings as well as precise legal definitions. In the generic sense, to “murder” someone is to kill them deliberately, which is a judgment call that lay people can make, even if it doesn’t hold up in a court of law. In that sense someone might say that ICE agent Jonathan Ross murdered Renee Good, or that Border Patrol officers murdered Alex Pretti, even though the shooters might be found guilty in court of a lesser charge such as manslaughter — or acquitted, or never charged. There’s also an everyday meaning to “execute” other than carrying out the death penalty.

In practice, I try to be careful not to use “murder” unless I’m describing a criminal charge or verdict. For instance, I referred to former police officer Derek Chauvin as having “killed” George Floyd until Chauvin was convicted. After the verdict, it wasn’t just generically true but legally adjudicated that Chauvin had in fact committed murder. I’m less careful with “execute,” and I regard “execution” as a valid description of how Pretti was killed.

The Associated Press Stylebook, which many news organizations use, has an entry for “homicide, murder, manslaughter” that reads:

Do not say that a victim was murdered until someone has been convicted in court. Instead, say that a victim was killed, stabbed to death, etc.

Use caution in the phrasing charged with murdering; not everyone charged with murder is accused of the act of shooting, stabbing, etc. An alternative, in such cases, is charged in the murder of …

That’s an interesting observation about using “charged in the murder of” rather than “charged with murder.” If I were a copy editor, as I was at one time in my career, I would probably be guided by what specific behavior the defendant had been accused of. To go back to my earlier example, “Chauvin was charged with murder” would be both generically and legally accurate.

Here’s what the AP says about “execute, execution”: “To execute a person is to kill that person in compliance with a military order or judicial decision.” The guide also cautions against referring to an “execution-style” killing: “Avoid use of this term to describe how people are killed, since it means different things to different people. Be specific as to how the person was killed, if that information is necessary.”

Now, you might ask whether the guidance from the Times and the AP Stylebook is too specific to journalism, and whether it’s all right for non-journalists to use those terms in everyday speech or on social media. As it happens, the Merriam-Webster Dictionary (which, by the way, is what the AP Stylebook instructs journalists to use for issues that aren’t covered in its own guide) backs up the stylebook on “murder” but is more permissive on “execute.”

“Murder,” according to Merriam-Webster, is “the crime of unlawfully and unjustifiably killing a person.” To “execute” a person is “to put (someone) to death especially in compliance with a legal sentence.” I take that “especially” to mean that we are free to use “execute” and “execution” in the generic sense if the facts fit what happened — at least according to Merriam-Webster if not the AP Stylebook.

New York Times editor says his paper did not hold back on reporting that the U.S. would attack Venezuela

Photo (cc) 2019 by Dan Kennedy

Semafor reported on Jan. 3 that The New York Times and The Washington Post learned of the pending U.S. raid on Venezuela shortly before it began but held off reporting on it “to avoid endangering US troops.”

Now Times executive editor Joe Kahn says it’s not true, at least with regard to his paper. He chose an unusually low-key forum in which to push back — in a response to a reader question in The Morning Newsletter. Here’s the relevant part of his answer (sub. req.). The boldface is mine, not his.

We reported on U.S. missions targeting Venezuela, including boat strikes and preparation for land-based military action, in considerable detail for several months. Our Pentagon, national-security and intelligence-agency beat reporters talked repeatedly with their sources about heightened preparations for bolder action against the Venezuelan leadership. Contrary to some claims, however, The Times did not have verified details about the pending operation to capture Maduro or a story prepared, nor did we withhold publication at the request of the Trump administration….

While not relevant in this case, The Times does consult with the military when there are concerns that exposure of specific operational information could risk the lives of American troops. We take those concerns seriously, and have at times delayed publication or withheld details if they might lead to direct threats to members of the military. But in all such cases, we make our editorial decisions independently. And we have often published accountability and investigative stories about military and intelligence operations and national-security decision making that government officials pressed us to withhold.

Last week I wrote about the parallels between Venezuela and the Bay of Pigs invasion of Cuba in 1961, noting that the Times was accused of withholding key details. I cited research I did as a Boston University graduate student in the 1980s that showed the Times actually published what it knew and held back only on aspects of the story it couldn’t verify. The parallels between then and now may be even closer than I realized.

I don’t believe that the Post has responded to the Semafor story, which has not been corrected or amended.

Why the Times’ and Post’s decision not to publish calls to mind the Bay of Pigs myth of 1961

Front and center: The New York Times reports on the imminent invasion of Cuba on April 7, 1961.

The New York Times and The Washington Post learned about U.S. plans to attack Venezuela shortly before the raid began, according to Max Tani and Shelby Talcott of Semafor. But they declined to run with the story “to avoid endangering US troops, two people familiar with the communications between the administration and the news organizations said.”

Sign up for free email delivery of Media Nation. You can also join my Patreon for just $6 a month and receive a weekly newsletter with exclusive content.

The decision was reminiscent of the legend over how the Times reported on an imminent U.S.-backed invasion of Cuba in 1961, which I’ll get to in a few moments.

But first, regarding the Venezuela decision: Right call or wrong call? As the Semafor story notes, the decision was “in keeping with longstanding American journalistic traditions.” Independent media commentator Margaret Sullivan writes that she’s torn and asks her readers to weigh in. At the Columbia Journalism Review, Jem Bartholomew leans toward saying that yes, they should have published, on the grounds that the Times and the Post knew the raid would violate international law.

David Brooks tells the ‘PBS NewsHour’ that he didn’t know Jeffrey Epstein was in the room

The last thing I want to be doing on the Saturday morning before Christmas is writing about David Brooks’ undisclosed (by him) encounter with the notorious pedophile and sex criminal Jeffrey Epstein. But it’s in the news, and there are plenty of people, especially on social media, who are demanding that the New York Times columnist and “PBS NewsHour” commentator be held accountable.

So let’s review the facts that have come out. As Jeremy Barr reported in The Guardian, photos released on Thursday by the House Committee on Oversight and Government Reform reveal that Brooks attended a lunch or dinner where Epstein was present in 2011. Unlike photos of many other powerful men that have been released recently, there are no photos of Brooks actually with Epstein.

A right-wing influencer smears CNN; plus, murder on the high seas, and an immigration outrage

The Pentagon. Photo (cc) by Wiyre Media.

On the latest edition of the public radio program “On the Media,” co-host Micah Loewinger engages in a wonderfully contentious interview with right-wing influencer Cam Higby, a newly minted member of the Pentagon press corps. Higby is among a gaggle of MAGA promoters who’ve moved in after actual reporters walked out rather than sign Secretary of Defense Pete Hegseth’s directive that they agree not to report any unauthorized news.
