Data leak
Microsoft AI Research Division 38TB Data Exposure via SAS Token - GitHub Misconfiguration
Incident Details
In July 2020, Microsoft’s AI research division accidentally published an overly permissive Azure Shared Access Signature (SAS) token while sharing open-source AI training data on GitHub. The token granted anyone with the link full access to the entire Azure Storage account, not just the intended public training data. The account held 38 terabytes of sensitive data, including secrets, private keys, passwords, other internal Microsoft files, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees.

The token remained exposed for approximately 3 years, from July 2020 until Wiz security researchers discovered and reported it in June 2023; the incident was publicly disclosed in September 2023. Microsoft revoked the token and secured the storage account after Wiz’s notification, and stated that no customer data was exposed and no other internal Microsoft services were put at risk.

Because the token also allowed write and delete access, anyone who discovered it could have modified or deleted the exposed data, or planted malicious data in AI training datasets. The case illustrates a fundamental risk with Azure SAS tokens: possession of the URL is the only credential (no further authentication is required), so accidental exposure in code or documentation is particularly dangerous, and a token with a distant expiry can remain valid for years if it is not carefully managed.
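The risk factors described above (mutating permissions, account-wide scope, and a distant expiry) are all visible in the query string of a SAS URL, so they can be checked before a link is shared. The following is a minimal sketch, not Microsoft's or Wiz's tooling. It relies on the documented SAS query parameters `sp` (permissions), `se` (expiry), `ss`/`srt` (present on account-level SAS tokens), and `sig` (signature). The function name `audit_sas_url`, the 7-day lifetime limit, and the example URL are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# SAS permission letters that allow changing data, not just reading it.
MUTATING_PERMISSIONS = {"w": "write", "d": "delete", "a": "add", "c": "create"}

# Hypothetical URL for illustration only; the signature is a placeholder.
EXAMPLE_URL = (
    "https://example.blob.core.windows.net/models/model.pt"
    "?sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&se=2051-10-06T00:00:00Z&sig=REDACTED"
)


def audit_sas_url(url, now=None, max_lifetime=timedelta(days=7)):
    """Return a list of human-readable warnings for a SAS URL."""
    now = now or datetime.now(timezone.utc)
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

    if "sig" not in params:
        return ["no 'sig' parameter: this does not look like a SAS URL"]

    warnings = []

    # sp = signed permissions, e.g. "r" (read only) or "rwdlac".
    for letter, name in MUTATING_PERMISSIONS.items():
        if letter in params.get("sp", ""):
            warnings.append(f"grants {name} permission")

    # ss/srt appear only on account-level SAS tokens.
    if "ss" in params or "srt" in params:
        warnings.append("account-level SAS (not scoped to one container or blob)")

    # se = signed expiry, an ISO 8601 UTC timestamp.
    expiry_raw = params.get("se")
    if expiry_raw is None:
        warnings.append("no expiry in URL (may rely on a stored access policy)")
    else:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        if expiry - now > max_lifetime:
            warnings.append(
                f"expires {expiry.date()}, beyond the {max_lifetime.days}-day limit"
            )
    return warnings


if __name__ == "__main__":
    for warning in audit_sas_url(EXAMPLE_URL):
        print("WARNING:", warning)
```

A check like this only inspects what the URL declares. It cannot tell whether the token has already been revoked, and the lifetime limit is a policy choice rather than a fixed rule.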
Technical Details
- Initial Attack Vector: Microsoft AI researchers accidentally included an overly permissive Azure Shared Access Signature (SAS) token when publishing open-source training data to a public GitHub repository; the SAS token granted full read, write, and delete access to the entire Azure Storage account, not just the intended public dataset
- Vendor / Product: Microsoft Azure Storage (AI division internal data)
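Since the token reached the public through a file committed to a repository, one possible safeguard is to scan text for SAS-bearing URLs before it is published. The sketch below is a heuristic under stated assumptions, not a replacement for a dedicated secret scanner. It treats any URL whose query string contains `sig` together with `sv` or `se` as a likely SAS URL. The names `find_sas_urls` and `URL_PATTERN` and the sample text are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Rough URL matcher; stops at whitespace, quotes, and common closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def find_sas_urls(text):
    """Yield (line_number, redacted_url) for URLs that appear to carry a SAS token."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in URL_PATTERN.finditer(line):
            url = match.group(0)
            query = parse_qs(urlsplit(url).query)
            # Heuristic: a signature plus a version or expiry parameter.
            if "sig" in query and ("sv" in query or "se" in query):
                # Redact the query so the finding itself does not leak the token.
                yield line_number, url.split("?", 1)[0] + "?<redacted>"


if __name__ == "__main__":
    sample = (
        "Docs: https://example.com/readme\n"
        "Download: https://example.blob.core.windows.net/models/model.pt"
        "?sv=2020-08-04&sp=r&se=2051-10-06T00:00:00Z&sig=REDACTED\n"
    )
    for line_number, url in find_sas_urls(sample):
        print(f"line {line_number}: possible SAS URL {url}")
```

Run as a pre-commit step, a scan like this would flag the link for review. Whether the flagged token is acceptable to publish still depends on its scope, permissions, and expiry.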
Timeline
- 2020-07-01 Breach occurred
- 2023-09-18 Publicly disclosed