Know your cloud storage: Integrity, privacy and SLAs

Cloud storage services have been around for years, and the offerings have been improving rapidly. There are Google Drive, Microsoft OneDrive, Dropbox, Amazon S3, Azure (Blob) Storage, Google Cloud Storage, iCloud, et cetera.

What I see ever more often is a lack of understanding of what each of the storage services is meant for. This is made more confusing by the tendency of storage service providers to wrap their services in tools that closely resemble the operating system's file explorer view, even if internally the storage is nothing like plain disk storage.

Findings published last week showed how OneDrive alters stored files, which rightly caused many to reconsider using the service for sensitive data or corporate information. Integrity of the files means that you expect to get files out in a condition identical to the one they were in when you put them in. OneDrive fails at this and, what is worse, does so while making it superficially look like nothing has changed. The content that actually changed was some sort of metadata about the files.
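One way to check integrity yourself is to record a cryptographic hash of a file before uploading it and compare it with the hash of the copy you download later. The following is a minimal sketch using only the Python standard library; the function names and file paths are hypothetical and not tied to any provider's tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(original_digest, downloaded_path):
    """True if the downloaded copy is byte-for-byte identical to the original."""
    return sha256_of(downloaded_path) == original_digest
```

In use, you would call `sha256_of("report.docx")` before uploading, store the digest somewhere outside the storage service, and later call `is_unchanged(stored_digest, "report_downloaded.docx")`. Any change to the bytes of the file, including metadata embedded in it, yields a different digest.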

End user services

From the list above, I would classify OneDrive, Dropbox, Google Drive and iCloud as built for end users, and therefore intended not to require any prior knowledge of how the service works internally.

OneDrive and iCloud are built into specific desktop operating systems and mobile platforms, to the degree that it actually takes more user effort to disable them than to use them. Dropbox needs to be installed separately, but it similarly ‘just works’ as an automatically cloud-syncing file system path.

Google Drive differs from the rest in that it is intended primarily as an online office document creation service, and only provides the storage capability as a side effect of that function. Even so, many use it for lightweight cloud storage, myself included. The tools and the service experience are still built for layman use, and should be considered from that viewpoint.

The OneDrive incident only shows that Microsoft stays true to its vision of making the end user experience as easy as possible, regardless of how a small portion of technically inclined users might feel about the service effectively breaking the integrity of the stored files. Microsoft has a long history of altering user data, starting with Media Player and MP3 metadata. And can you blame them for enforcing their vision on their own ecosystem, when Apple would do the same thing on theirs? Only if you assumed that you should back up your corporate files with the same service that teenagers use for sharing their selfies.

I came across another breach of implied service level with Dropbox shared directories. I purchased my old work laptop when leaving an employer, so I ended up with a previously shared Dropbox directory containing monthly sales summary documents. My access rights to those files had been revoked, but they still existed on my local hard drive, and Dropbox kept trying to sync them with the online versions: the actual file contents were not updated, but I still got metadata updates showing when the files had been changed. My personal assumption had been that once you revoke someone's access to files, they will get nothing related to those files. Turns out I was wrong.

Infrastructure services

Amazon S3, Azure Storage and Google Cloud Storage are entirely different from the end user services. While at their core they act as storage services just like the previous ones, they stand in stark contrast to them by providing very versatile access controls but only crude graphical tools for file operations. Access to and manipulation of the files are intended to be done through APIs, and heavy users need a fairly thorough understanding of the storage architecture to use the services effectively.

The most significant difference comes from having a service level agreement (or SLA) for these infrastructure services. The Amazon S3 SLA promises users a 99.9% Monthly Uptime Percentage, with up to 25% of monthly costs given as service credits for failing to live up to that promise. In addition, the product details describe the service as designed for 99.999999999% durability and 99.99% availability of objects over a given year, and to sustain the concurrent loss of data in two facilities.
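To make these percentages concrete, the sketch below converts an availability figure into allowed downtime and a durability figure into an expected loss rate. It assumes a 30-day month and a 365-day year, and it reads the durability figure as the yearly probability of losing any single object; the function names are hypothetical.

```python
from fractions import Fraction


def allowed_downtime_minutes(availability_percent, period_days):
    """Minutes of downtime permitted in a period at the given availability.

    availability_percent is passed as a string (e.g. "99.9") so the
    arithmetic stays exact instead of picking up floating-point error.
    """
    unavailable = 1 - Fraction(availability_percent) / 100
    return float(unavailable * period_days * 24 * 60)


def years_per_lost_object(durability_percent, object_count):
    """Expected number of years between single-object losses (assumed model)."""
    yearly_loss_probability = 1 - Fraction(durability_percent) / 100
    expected_losses_per_year = yearly_loss_probability * object_count
    return 1 / expected_losses_per_year


monthly_sla = allowed_downtime_minutes("99.9", 30)           # 43.2 minutes per month
yearly_design = allowed_downtime_minutes("99.99", 365)       # 52.56 minutes per year
loss_interval = years_per_lost_object("99.999999999", 10_000)  # 10,000,000 years
```

In other words, the 99.9% SLA still allows about 43 minutes of unavailability in a month before any service credits apply, while the durability target under this reading means that with 10,000 stored objects you would expect to lose one roughly every ten million years.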

Azure Storage SLA and Google Cloud Storage SLA provide similar promises.

Building blocks

You should also be aware that some of the end user services are actually built on those infrastructure services. The most famous example is Dropbox, which is often cited as a good example of how to use Amazon S3 innovatively as a storage backend. Dropbox only stores user information and metadata about the files on their own servers, and does all the heavy lifting with S3. Comparing their billing ($9.99/month for Dropbox Pro, 100 GB) with S3's ($0.03/month/GB * 100 GB = $3/month), they get a decent cut for providing a good user experience on top of another provider's infrastructure.
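The comparison can be spelled out as a small calculation. This sketch uses only the prices quoted above and assumes the user fills the whole 100 GB quota; it ignores S3 request and data transfer charges as well as Dropbox's own operating costs, so the resulting margin is a rough upper bound, not an actual figure.

```python
from decimal import Decimal

# Prices as quoted in the text above
dropbox_pro_price = Decimal("9.99")   # USD per month for 100 GB
s3_price_per_gb = Decimal("0.03")     # USD per GB per month
quota_gb = Decimal("100")

# Raw storage cost if the whole quota is in use
s3_storage_cost = s3_price_per_gb * quota_gb

# What remains of the subscription price after paying for storage
gross_margin = dropbox_pro_price - s3_storage_cost

# Margin as a percentage of the subscription price, rounded to one decimal
margin_percent = (gross_margin / dropbox_pro_price * 100).quantize(Decimal("0.1"))
```

Under these assumptions the storage costs $3.00, leaving $6.99 per month, or about 70% of the subscription price.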

Jukka Dahlbom
Head of Data Engineering, Co-founder. As a counterbalance to knowledge work, I hike, make music, dance and play games.