Examining the data, UpGuard found "loose correlations" to regional U.S. security concerns in countries such as Iraq and Pakistan. The Department of Defense was storing the billions of public posts it had collected in Amazon S3 buckets, and a storage misconfiguration allowed anyone with a free AWS account to browse and even download the data.
The security firm was able to access the data because a Defense Department contractor had stored it in a way that made it accessible to anyone with an AWS account. One bucket alone is estimated to contain 1.8 billion posts gathered over eight years. CENTCOM is the U.S. Central Command, responsible for United States military operations from East Africa to Central Asia, including the wars in Iraq and Afghanistan.
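For readers who manage their own buckets, one way this kind of exposure can arise is an S3 access control list (ACL) that grants read permission to a broad predefined group. The sketch below is only an illustration under that assumption; the reporting does not say exactly how these buckets were configured. It takes a dictionary shaped like the response of boto3's `get_bucket_acl` call and flags grants to the "all users" and "authenticated users" groups. The function name `find_broad_grants` and the sample ACL are hypothetical.

```python
# Group URIs that Amazon S3 ACLs use for broad grants.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"

BROAD_GROUPS = {
    ALL_USERS: "anyone on the internet",
    AUTHENTICATED_USERS: "anyone with an AWS account",
}


def find_broad_grants(acl):
    """Return (audience, permission) pairs for grants made to broad groups.

    `acl` is assumed to be a dict shaped like the response of boto3's
    S3 client method get_bucket_acl, i.e. it carries a "Grants" list.
    The function name is hypothetical, chosen for this illustration.
    """
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        # Only group grantees carry a URI; individual accounts are skipped.
        if grantee.get("Type") == "Group" and uri in BROAD_GROUPS:
            findings.append((BROAD_GROUPS[uri], grant.get("Permission")))
    return findings


# Illustrative ACL only; it is not taken from the buckets described here.
example_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
    ]
}

for audience, permission in find_broad_grants(example_acl):
    print(f"{permission} granted to {audience}")
```

Against a real account, the dictionary would come from something like `boto3.client("s3").get_bucket_acl(Bucket=name)`. The check covers ACLs only; a bucket policy can also open a bucket to outsiders and would need a separate review.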
In response to UpGuard's discovery, CENTCOM complained that the company had used "unauthorized access" to obtain the data, "employing methods to circumvent security protocols". Given the enormous size of these data stores, even a cursory search reveals a number of foreign-sourced posts that appear entirely benign, with no apparent ties to areas of concern for U.S. intelligence agencies, along with posts that originate from American citizens, including a vast quantity of Facebook and Twitter posts, some stating political opinions. It's hard to say, though, whether anyone else had accessed the data before UpGuard found it.
The U.S. government's thinking seems to be that gathering as much information as possible helps it "find the needle in the haystack". In fact, collecting data that is irrelevant to national security merely adds more hay to the stack.
Even intelligence gatherers aren't immune to making mistakes that leave data wide open.
UpGuard researchers discovered the public buckets on September 6, but it is unclear how long the data had been publicly available or how many malicious actors may have taken advantage of the Pentagon's error.
The DoD has since confirmed the data leak to CNN. One reason the Pentagon may have kept the data in plain text is that it wanted other intelligence agencies and third-party tools to have access to it, which is much easier when the data is not encrypted.