SimpliVity Data Protection
Data protection requires that data copies be maintained both locally for operational recovery and offsite for disaster recovery.
Data protection is any process or technology that makes a secondary copy of data. Taking copies at one or more points in the course of a day provides an “insurance policy” should the primary copy of the data be lost or corrupted. An effective data protection strategy includes local backup copies for operational recovery and remote backup copies, held at an off-site location, for disaster recovery.
Sounds simple, right? Not exactly. Plenty can go wrong with a data protection strategy. There are many factors to consider: how the copy is made, where it is stored, how long it is retained, how frequently copies are made, how much has to be copied, how much time it takes to make the copy, and more. And, just when you think you’ve perfected data protection in your environment, something changes and disrupts an established data protection process, schedule, or component.
The cycle of perfecting data protection and then having some catalyst event “break” it is fairly common. It’s frustrating and can be costly in time, in risk to your business, and in your data protection budget.
One of the major issues plaguing data protection over the last several years has been data growth. With more data to back up and recover, it’s difficult to complete copy processes within prescribed backup windows. It’s also difficult to store more data and to transfer it across your LAN, SAN or WAN. These data protection challenges have likely forced you to adapt continually, and probably to make significant investments in data protection hardware and/or software.
Data Protection – Virtualized Data Centers
Virtualizing your data center is one of those catalyst events that “breaks” data protection. Pre-virtualization, you probably relied on traditional backup software and performed file-level backup to, and recovery from, disk and/or tape. Post-virtualization, you probably adapted your data protection to take advantage of hypervisor APIs that facilitate LAN-free image backup of virtual machines. The switch helped you eliminate inefficiency and resource contention, but it required more data protection components and added cost.
In a pre-virtualization environment, a data protection approach leveraging array-based snapshots and replication made sense. There was a one-to-one mapping of an application on a physical server to an allocated LUN (block storage) or a share (file storage). LUN-level policies for data protection were straightforward and simple.
Applying the same data protection strategy to virtual workloads introduces efficiency and management challenges. After virtualization, there is a many-to-one relationship between applications and storage. A single datastore on a LUN contains the data of multiple virtual machines, so policies applied to a LUN apply to all virtual machines sharing the datastore. Taking a snapshot of virtual machines at the datastore level removes the impact of data protection on server resources. However, the snapshot now includes a mix of workloads, and it’s more difficult to capture application-consistent backups. Policies for frequency, retention time, and backup location cannot be refined for an individual virtual machine. When the data protection policy calls for an off-site copy, the replicated snapshot transferred to the remote disaster recovery site contains all virtual machines associated with the LUN. There is no way to be selective.
Backup software, backup hardware, deduplication features or standalone products, and WAN optimization add to the cost and complexity of virtual machine data protection.
Data Protection – Remote/Branch Offices
If you have multiple locations to oversee, how challenging is it to ensure data protection? There are pros and cons to centralizing data protection, and there are pros and cons to maintaining remote infrastructure for local data protection. A centralization strategy requires transfer of backup data across a WAN, which could extend the time it takes to complete your backup, and be more costly due to bandwidth requirements. A distributed strategy allows you to maintain data protection at the remote/branch office. However, you may have challenges remotely managing and troubleshooting data protection issues if the remote office is not staffed.
Ideally, your remote office data protection strategy should allow for both local and off-site copies, highly efficient transfer of data, and centralized management. SimpliVity’s integrated data protection accommodates such a strategy.
Data protection was a forethought, not an afterthought, in the design of SimpliVity’s OmniCube hyperconverged infrastructure. Modern data protection is a feature of the system. OmniCube automates data protection through policies for backup frequency, retention, destination, and application consistency. VM-level backup at one or more points in the day allows for on-premises operational recovery to meet the most aggressive recovery objectives. Transferring bandwidth- and storage-optimized backup copies between OmniCubes at central and remote sites enables cost-effective disaster recovery.
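To make the idea of a per-VM policy concrete, here is a minimal sketch in Python. It is not OmniCube's interface: the BackupPolicy class, the field names, the VM names and the destinations are all hypothetical, chosen only to mirror the four policy attributes named above (frequency, retention, destination, application consistency).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical per-VM policy; field names are illustrative assumptions."""
    frequency: timedelta    # how often a backup is taken
    retention: timedelta    # how long each backup is kept
    destination: str        # where the copy is stored
    app_consistent: bool    # whether application consistency is requested


def backup_times(policy, start, end):
    """Return the points in time in [start, end) at which backups are taken."""
    times = []
    t = start
    while t < end:
        times.append(t)
        t += policy.frequency
    return times


def is_expired(policy, taken_at, now):
    """A backup expires once it is older than the retention period."""
    return now - taken_at > policy.retention


# Each VM gets its own policy, rather than inheriting one from a shared LUN.
policies = {
    "sql-vm": BackupPolicy(timedelta(hours=1), timedelta(days=30), "remote-site", True),
    "web-vm": BackupPolicy(timedelta(hours=6), timedelta(days=7), "local", False),
}

day_start = datetime(2024, 1, 1)
day_end = day_start + timedelta(days=1)
schedule = {vm: backup_times(p, day_start, day_end) for vm, p in policies.items()}
```

In this sketch the database VM is backed up hourly to a remote site and kept for 30 days, while the web VM is backed up every six hours locally and kept for a week. That per-VM granularity is exactly what a LUN-level policy cannot express.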
Data Protection – Disaster Recovery
Your data protection strategy is incomplete without disaster recovery. Operational recovery happens every day. A disk or server fails, a database table gets corrupted, or you have a software defect. Having a local backup to use for recovery ensures minimal downtime. Disaster recovery is less common. A systemic failure, or a natural or man-made disaster, creates a situation that requires access to physical resources and data copies housed at a remote location.
Even though virtualization created challenges in data protection, it also enabled big improvements in disaster recovery, including:
- Encapsulating the virtual machine into a single file to enable mobility.
- Eliminating the need to mirror the physical system for disaster recovery.
- Delivering flexibility with disaster recovery testing.
Virtualization is facilitating disaster recovery for organizations that thought implementing it was too complex or costly. According to Gartner, “By 2017, 50% of large enterprises will use IT services failover between multiple data center sites as their primary disaster recovery strategy.”
Cost has always been a key consideration in implementing and maintaining disaster recovery. Virtualization reduces expenses by enabling physical-to-virtual and virtual-to-virtual disaster recovery scenarios, and hyperconverged infrastructure is further changing the economics of disaster recovery.
Deploying an OmniCube system in two sites, with each system being the disaster recovery target for the other, facilitates disaster recovery. SimpliVity captures backup copies at the primary site and optimizes their transfer to, and storage at, the disaster recovery site. SimpliVity stores data in a deduplicated, compressed and optimized state at inception and throughout its lifecycle, including data protection copies.
Data Protection – Efficiency with Deduplication
Deduplication reduces bandwidth and storage capacity needs by eliminating redundant data and retaining only one unique instance of the data on storage media. Replacing redundant data with a pointer to the unique data takes significantly less storage capacity. Considering the multiple copies made for data protection, introducing efficiency with deduplication in data protection processes is key.
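The pointer mechanism can be illustrated with a toy content-addressed store. This is a generic sketch, not SimpliVity's implementation: the fixed 4 KB block size, the use of SHA-256 as the fingerprint, and the DedupStore class and file names are assumptions made purely for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Toy content-addressed store: each unique block is kept exactly once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes (the single unique instance)
        self.files = {}   # name -> list of digests (the "pointers")

    def write(self, name, data):
        pointers = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Follow the pointers to reassemble the original data.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
original = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store.write("vm1.img", original)
store.write("vm1-backup.img", original)  # second copy adds no new blocks
```

Here two logical copies of a four-block image occupy only two blocks of physical storage, because three of the four blocks are identical and the backup consists entirely of pointers to blocks already stored.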
There are many techniques for determining redundancy, and splitting hairs over deduplication ratios is not what’s important. Efficiency is key, but you will want to understand the risks and tradeoffs of the different deduplication approaches. One key distinction to understand is “inline” versus “post-process” deduplication.
Performing deduplication “inline” is efficient: deduplication occurs as data is being written to disk. “Post-process” deduplication can be less efficient. That’s because data is written to disk in its regular state, and then, at a later time, a process kicks off to deduplicate the data. Writing data to disk, reading data from disk, deduplicating data, and writing data to disk again takes up resources unnecessarily. Any contention for resources slows application performance.
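The difference in disk activity can be sketched by counting I/O operations under a deliberately simplified model. The model is an assumption, not a description of any particular product: it treats every block as one I/O, and it assumes the post-process pass reads back every block and rewrites each unique block once. The function names are hypothetical.

```python
import hashlib


def inline_dedup_io(blocks):
    """Deduplicate before writing: one disk write per unique block, no reads."""
    seen = set()
    writes = 0
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            writes += 1
    return {"writes": writes, "reads": 0}


def post_process_dedup_io(blocks):
    """Write everything first, then read it back and rewrite the unique blocks."""
    writes = len(blocks)   # raw data lands on disk in its regular state
    reads = len(blocks)    # the later pass reads every block back
    unique = {hashlib.sha256(b).digest() for b in blocks}
    writes += len(unique)  # unique blocks are written again in deduplicated form
    return {"writes": writes, "reads": reads}


incoming = [b"a", b"b", b"a", b"a"]  # four blocks, two of them unique
inline = inline_dedup_io(incoming)
post = post_process_dedup_io(incoming)
```

Under these assumptions the same four incoming blocks cost two writes inline, versus six writes and four reads post-process. Real systems differ in the details, but the extra read and rewrite pass is the overhead described above.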
SimpliVity has inline deduplication. SimpliVity’s Data Virtualization Platform is software that abstracts data from its underlying hardware and the hypervisor. It achieves data efficiency by deduplicating, compressing and optimizing all data at inception in real time, once and forever – and without a performance penalty. One way that SimpliVity delivers zero overhead in its deduplication process is via an accelerator card in OmniCube to “boost” resources for this process.
One thing that’s important to note is that data remains in its deduplicated state throughout its lifecycle. If a copy is taken for backup, the copy is made in this optimized state. There’s no process of “rehydrating” data to make backup copies and re-deduplicating it after.
Deduplication addresses another (even larger) issue in the modern data center. IOPS requirements have increased by 10x in post-virtualization environments. Hard disk drive IOPS are stagnant and can’t keep pace with today’s requirements. Using more flash storage is one way to address this problem. However, flash is pricey and it’s only suitable for portions of the data lifecycle. Worrying about having adequate capacity to keep pace with data growth is no longer what keeps IT professionals up at night. The challenge is ensuring adequate performance (IOPS) to meet application requirements, and doing so in the most efficient way.
That’s where SimpliVity comes in. Deduplication, compression and optimization of data are performed at the time the data is created, once and forever, saving IOPS and improving performance across all data lifecycles, tiers, data centers, and to the cloud.
Data Protection – Public Cloud Storage
Lacking a second site creates a hole in your data protection strategy. Where do you store off-premises copies for disaster recovery?
That’s why public cloud storage is becoming more popular in data protection strategies. Transferring copies to and storing copies in a public cloud repository is a failsafe measure if you don’t maintain a second site.
Since only the last full copy is required for disaster recovery, and public cloud storage services are billed on a per-use basis (in this case, based on capacity), public cloud storage is cost-effective. It’s even more compelling when the backup copy is capacity- and bandwidth-optimized.
SimpliVity integrates with Amazon Web Services (AWS) public cloud storage. Running an instance of OmniCube software in AWS allows for seamless integration with the OmniCube federation. This configuration enables a protection policy to securely move data to and from public cloud storage – in the same manner you would move backup copies between data centers.
If the primary production system and any local copies become unavailable due to a disaster, you can reconstitute the physical environment and recover the copy held in public cloud storage to resume operations.