Hillary Clinton claimed she used automated keyword searches to sort official messages on her private server from personal correspondence, so she could (eventually) give the former to the State Department and delete the latter at her leisure. We were given assurances this process was extremely sophisticated and highly accurate.
When that claim didn’t go over well, Camp Clinton hurriedly backpedaled and said someone read every message before it was deleted, but there is little sign anyone is actually buying the new story. Some interesting thoughts on the difficulty of using algorithms to sort email were offered on Wednesday at New York magazine by Michael Wolraich, who worked on just such a project in 1994 as a government contractor for Bill Clinton’s administration.
“One day a colleague invited me to join a mysterious new project for the Executive Office of the President (EOP),” Wolraich writes. “The White House had hired IMC to archive its email after the court ordered it to preserve electronic records. Few people had multiple email accounts back then and many federal employees used their work accounts for personal communication, so we had to figure out some way to distinguish work email from personal correspondence.”
IMC is Information Management Consultants, the company he was working for at the time, wittily described as “one of many three-letter-acronym corporations that ring Washington’s famous beltway and feed off government contracts.”
As Wolraich describes it, his company’s task was to create an algorithm that would do exactly what Hillary Clinton claims she did with her private mail server: analyze correspondence for certain keywords and sort personal messages from professional ones. The team put a great deal of effort into the program’s logic, created an exhaustive list of keywords, and came up with “abysmal” results of, at best, 70 percent accuracy.
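To see why keyword matching stumbles, here is a minimal Python sketch of the general approach. It is not IMC's program or anything Clinton's team used: the keyword list, the function names, and the four sample messages are all made up for illustration. A message counts as "work" if it contains any listed keyword, and accuracy is the share of hand-labeled messages the sorter gets right.

```python
import re

# Hypothetical keyword list; the article does not say which keywords IMC used.
WORK_KEYWORDS = {"meeting", "budget", "memo", "briefing", "policy"}


def is_work_email(text):
    """Label a message as work-related if it contains any listed keyword."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & WORK_KEYWORDS)


def accuracy(labeled_messages):
    """Fraction of (text, is_work) pairs that the keyword sorter labels correctly."""
    correct = sum(
        1 for text, is_work in labeled_messages if is_work_email(text) == is_work
    )
    return correct / len(labeled_messages)


# Invented sample messages with hand-assigned labels (True = work, False = personal).
SAMPLES = [
    ("Please review the budget memo before Friday.", True),
    ("Dinner at our place on Saturday?", False),
    # Work-related, but contains none of the keywords, so it is missed.
    ("Can you call the senator's office back today?", True),
    # Personal, but "budget" trips the filter.
    ("Our household budget is a mess, call me.", False),
]

if __name__ == "__main__":
    print(f"accuracy on samples: {accuracy(SAMPLES):.0%}")
```

The two misfiled samples show both failure modes at once: a work message that happens to avoid every keyword, and a personal message that happens to use one. A longer keyword list trades one kind of error for the other, which fits Wolraich's point that recognizing words is not the same as understanding them.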
While he allows for the possibility that superior search algorithms might have been developed since 1994, Wolraich makes some interesting points about the extreme difficulty of working around the imprecision of human communication: “Our problem was that natural language – the way people ordinarily speak and write – is notoriously difficult to parse. To make sense of natural language, it’s not sufficient to recognize the words; you also need to understand grammar, appreciate nuance, interpret metaphors, grasp allusions, infer from context, and even have a sense of humor. Right now, only humans can do that reliably.”
Sorting email this way would be tricky even if we assumed the people who produced the correspondence, and those implementing the search algorithms, were making every possible good-faith effort to achieve total accuracy, which of course is hardly the case with Hillary Clinton.
As Wolraich notes, nothing about the procedure Clinton described sounds like a super-advanced expert system that took advantage of cutting-edge 2015 artificial-intelligence technology. She talked about doing the same sort of keyword search that couldn’t deliver better than 70 percent accuracy. Even if Clinton’s team is given unwarranted credit for diligence, and we assume she didn’t delete anything important before the keyword search was undertaken – an assumption for which we will never have supporting evidence – that’s not a very encouraging success rate for a trove of 60,000 emails.
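For a rough sense of scale, the arithmetic can be sketched in a few lines of Python. This assumes, purely for illustration, that the 70 percent figure from the 1994 project carries over unchanged to a 60,000-message archive. Nothing in the article establishes the actual accuracy of Clinton's search, and the function name is made up.

```python
def expected_misfiled(total_messages, accuracy_percent):
    """Messages a sorter would put in the wrong pile at a given accuracy (whole percent)."""
    return total_messages * (100 - accuracy_percent) // 100


if __name__ == "__main__":
    # 60,000 messages at an assumed 70 percent accuracy
    print(expected_misfiled(60_000, 70))  # 18000
```

Under that assumption, roughly 18,000 of the 60,000 messages would land in the wrong pile, in one direction or the other.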
Wolraich makes the very reasonable suggestion that Clinton should release detailed information about her search algorithm and its results, including whether it was tested before implementation. Given how secretive Clinton has been every step of the way, I wouldn’t expect such disclosure any time soon.