An introduction to file, block, and cloud object storage at IBM Cloud.
The world we live in is made up of data. It is everywhere and can be anything: a first name, a playlist of songs, a video game, a website, or an entire feature-length film. It can also be program code, private keys, an operating system image, and everything in between. To a computer, all of it is just ones and zeroes (bits), and all of it needs to be stored somewhere.
Along with compute and networking, storage is an indispensable component of cloud computing infrastructure. It provides both the physical space and the organizational models required to keep data for cloud-native workloads. In other words, it’s how cloud-based applications like Netflix and YouTube can store terabytes and petabytes of video and stream it anywhere in the world.
At IBM Cloud, storage teams are split between raw storage and managed databases:
- Raw storage: stores just the data, with no management software layered on top. In exchange, users have the flexibility to assemble it the way they want and add the services they need, such as backup, encryption, and snapshots.
- Managed databases: a collection of raw storage controlled by a database management system such as IBM Db2, PostgreSQL, Microsoft SQL Server, MongoDB, or Cloudant.
We’ll focus on raw storage for this article and take a deep dive into databases another time. Within raw storage, there are three primary types: file, block, and object.
File storage is the simplest type of storage and is best for sharing data across multiple servers in your cloud environment. Files are stored hierarchically in folders and subfolders, similar to what you see in Finder on a Mac or File Explorer on Windows. If you’ve ever used a shared network drive at work or at school, you’ve used file storage. File storage works by storing data on a Network Attached Storage (NAS) device, which is a type of server that other servers can connect to over a network.
Since file storage uses human-readable labels in folders and subfolders, it’s the most intuitive to use for retrieving documents. It’s also excellent for sharing and collaborating on files. However, every file has to be located by following its path through the folder and subfolder hierarchy, so updating many different documents can be a slow process. File storage is easy to use on a small scale, but a company with lots of data that needs to be updated constantly might want to use other storage methods for faster performance.
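To make the folder-and-subfolder idea concrete, here is a minimal sketch of how a file is reached by walking a hierarchy one level at a time. The nested dictionary, the folder names, and the `read_file` helper are all hypothetical and exist only for illustration; this is not how a real NAS device is implemented.

```python
# Hypothetical in-memory model of a folder hierarchy:
# nested dicts are folders, strings are file contents.
tree = {
    "projects": {
        "reports": {"q1.txt": "Q1 summary"},
        "notes.txt": "meeting notes",
    }
}


def read_file(root: dict, path: str) -> str:
    """Walk the hierarchy one folder at a time until the file is reached."""
    node = root
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    if isinstance(node, dict):
        raise IsADirectoryError(path)
    return node


print(read_file(tree, "/projects/reports/q1.txt"))  # prints "Q1 summary"
```

Every lookup repeats this walk from the top, which is one way to picture why touching many files spread across a deep hierarchy adds up.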
Block storage is like a portable hard drive that you attach to a single server, and it is best for managing a lot of structured data. Unlike file storage, where a file is stored hierarchically in one piece, block storage breaks a file down into equal-sized chunks of data (“blocks”), which are stored separately wherever it’s most efficient in the storage environment. When you need to retrieve the file, those blocks are quickly reassembled to present the file back as a whole piece. Because of this, retrieving a large file is generally faster with block storage than with file storage.
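The split-and-reassemble idea can be sketched in a few lines. Everything here is an assumption made for illustration: the 4-byte block size is far smaller than anything a real volume would use, the dictionary stands in for a storage device, and the placement rule in `write_file` is arbitrary.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny so the result is easy to inspect


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut data into equal-sized blocks, zero-padding the last one."""
    return [
        data[start:start + block_size].ljust(block_size, b"\x00")
        for start in range(0, len(data), block_size)
    ]


def write_file(volume: dict, data: bytes) -> tuple:
    """Scatter blocks across the volume; return (block map, file length)."""
    block_map = []
    for block in split_into_blocks(data):
        address = 1000 - len(volume)  # hypothetical placement policy
        volume[address] = block
        block_map.append(address)
    return block_map, len(data)


def read_file(volume: dict, block_map: list, length: int) -> bytes:
    """Fetch blocks in map order and trim the padding off the end."""
    return b"".join(volume[address] for address in block_map)[:length]


volume = {}
original = b"hello block storage"
block_map, length = write_file(volume, original)
print(block_map)                                         # [1000, 999, 998, 997, 996]
print(read_file(volume, block_map, length) == original)  # True
```

The block map is what lets the pieces live anywhere: as long as the order of addresses is remembered, the file comes back whole.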
If you’re familiar with the story of Charlie and the Chocolate Factory by Roald Dahl, there’s a scene where master inventor Willy Wonka demonstrates sending a physical chocolate bar through the TV. He explains that his machine breaks the chocolate bar into little particles transmitted through the air, then reassembles the bar for the viewer on the other end. That’s similar to block storage, where the little pieces are blocks.
Block storage is great if you have a lot of structured data that needs to be accessed and processed quickly, as in the case of databases. In addition, if you want to make changes to a file, only the specific block with the changes needs to be updated, as opposed to rewriting the entire file. Some drawbacks of block storage are that other servers can’t access the same data simultaneously without additional configuration, and it’s generally more expensive to maintain than file storage. For these reasons, companies might use cloud object storage for more static types of data.
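As a rough illustration of updating only the block that changed, the sketch below treats a `bytearray` as a tiny volume and overwrites a single block in place. The 4-byte block size and the `write_block` helper are assumptions for the example, not part of any IBM Cloud API.

```python
BLOCK_SIZE = 4  # bytes; illustrative only


def write_block(volume: bytearray, index: int, block: bytes) -> None:
    """Overwrite exactly one block in place; other blocks are untouched."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("block must be exactly BLOCK_SIZE bytes")
    start = index * BLOCK_SIZE
    if index < 0 or start + BLOCK_SIZE > len(volume):
        raise IndexError("block index out of range")
    volume[start:start + BLOCK_SIZE] = block


volume = bytearray(b"AAAABBBBCCCC")  # three blocks
write_block(volume, 1, b"XXXX")      # change only the middle block
print(bytes(volume))                 # b'AAAAXXXXCCCC'
```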
Unlike file or block storage, cloud object storage (COS) groups data into “objects” and stores them in a flat, non-hierarchical way without being attached to a single server. Objects contain metadata, which makes it easy to store and analyze lots of data like videos, images, or backups. If you’ve used any social media platform, it is likely using object storage for its cloud infrastructure behind the scenes.
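A flat namespace with metadata can be pictured as a single dictionary keyed by object name. The sketch below is a toy model under that assumption: `StoredObject`, `put_object`, `find_keys`, and the metadata fields are hypothetical names chosen for illustration, not the IBM Cloud Object Storage API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


# Flat namespace: one dict of keys. The "/" in a key is just a character,
# not a folder boundary.
bucket = {}


def put_object(key: str, data: bytes, **metadata: str) -> None:
    bucket[key] = StoredObject(data, dict(metadata))


def find_keys(**wanted: str) -> list:
    """Select objects by metadata alone, without reading their data."""
    return sorted(
        key
        for key, obj in bucket.items()
        if all(obj.metadata.get(name) == value for name, value in wanted.items())
    )


put_object("videos/cat.mp4", b"<video bytes>", kind="video", year="2021")
put_object("images/logo.png", b"<image bytes>", kind="image", year="2021")
put_object("videos/dog.mp4", b"<video bytes>", kind="video", year="2022")

print(find_keys(kind="video"))               # ['videos/cat.mp4', 'videos/dog.mp4']
print(find_keys(kind="video", year="2022"))  # ['videos/dog.mp4']
```

Note that `find_keys` never touches `data`, which is the point: the metadata is enough to answer the question.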
Objects are stored in a “storage pool”, which is a collection of servers called nodes. Think of these nodes as buoys in an ocean. An object is broken down into pieces based on the metadata (not in equal-sized blocks like block storage), which are then stored across the various nodes. When you retrieve the object, it comes back whole, similar to block storage. The key difference from block storage is that the pieces are stored with redundancy, so even if a node goes down, you can still get your complete object.
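To show how an object can survive the loss of a node, here is a deliberately simplified sketch that uses a single XOR parity slice. This is a stand-in technique chosen for brevity: the article does not describe the actual dispersal scheme, the sketch cuts the object into equal slices rather than by metadata, and it tolerates only one lost node. The function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, data_nodes: int = 3) -> list:
    """Split data into equal slices and append one XOR parity slice."""
    slice_len = -(-len(data) // data_nodes)  # ceiling division
    padded = data.ljust(slice_len * data_nodes, b"\x00")
    slices = [
        padded[i * slice_len:(i + 1) * slice_len] for i in range(data_nodes)
    ]
    return slices + [reduce(xor_bytes, slices)]


def recover(slices: list, length: int) -> bytes:
    """Rebuild the object even if one node's slice is missing (None)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("this sketch tolerates only one lost node")
    slices = list(slices)
    if missing:
        survivors = [s for s in slices if s is not None]
        # XOR of all surviving slices reproduces the missing one.
        rebuilt: Optional[bytes] = reduce(xor_bytes, survivors)
        slices[missing[0]] = rebuilt
    return b"".join(slices[:-1])[:length]  # drop parity, trim padding


payload = b"cloud object storage"
nodes = disperse(payload)  # 3 data slices + 1 parity slice = 4 nodes
nodes[1] = None            # simulate one node going down
print(recover(nodes, len(payload)))  # b'cloud object storage'
```

Production systems generally rely on more capable redundancy schemes that can tolerate several simultaneous failures; the single parity slice here only demonstrates the principle.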
One benefit of COS is that storing metadata makes it easy to filter and analyze lots of information without having to open each object. This makes it a strong choice for long-term data archiving. COS is also extremely scalable since it doesn’t need to be attached to a single server the way block or file storage does. However, this also means that it may take longer to retrieve an entire object.
Wrapping this up
While there are nuances to consider when selecting the best type of storage, storage itself is an indispensable piece of any cloud infrastructure. As applications become more technologically demanding and require ever more data, the need for highly scalable, fast, and secure storage grows. We hope this short introduction to the basics of enterprise storage at IBM Cloud has been helpful and encourages you to continue exploring.
– Written by: Alissa Chan, Josef Bodine, and Austin Edwards
– Illustrations by: Austin Edwards
– Special thanks to reviewers Violet Rodriguez and Liz Mitchell!