MirrorDisk/UX
=============

Key benefits

* allows for drive redundancy, reducing the impact of total disk failures and bad blocks, as the software can spare out the bad block and reread the data from the other drive in the mirror set
* when configured to mirror across two disk interfaces, protects against data loss in the event of an interface failure
* when an interface or disk is brought back online after a failure, resyncs the mirror automatically
* simplifies system backup with the ability to split a mirror pair, allowing one copy to continue serving data while the other is being backed up

Note, however, that mirroring does *NOT* protect against writing bad data or accidental deletion of files. Mirroring is a supplement to proper tape backup and should not be thought of as a replacement.

Concepts

A normal logical volume (LV) under HP-UX 10.x and 11.x is built from logical extents (LE) which map one to one onto physical extents (PE). MirrorDisk/UX extends this functionality by allowing you to map an LE onto two or three PEs, providing as many as two mirror copies of your data.

MirrorDisk/UX allows you to choose from two scheduling policies: parallel or sequential. Under the parallel policy, the scheduler will read from the physical volume (PV) with the fewest outstanding operations and write to the PVs simultaneously. Under sequential, reads and writes are scheduled in PV order. Parallel scheduling provides the best throughput in virtually all cases and is the default. Sequential scheduling is typically only used to provide extreme caution in maintaining mirror consistency.

If a disk goes down (for example, due to failure of disk or interface) then the scheduled I/O for that disk will be timed out by the disk driver. LVM will continue retries over a period of time, but will eventually mark the disk as unavailable. The status of the disk is then posted to the volume group reserved area (VGRA) in a high priority write.
If mirroring is in use, I/O will continue to the other PEs if they are on a different PV that is currently available. On reboot, if the stale disk is available again, the VGRA is updated and the PEs are resynchronized from the good copy.

LVM commands for mirroring

    lvcreate [-m copies] [-L size in MB] VG
    lvextend [-m copies] LV PV
    lvreduce [-m copies] LV PV
    lvdisplay -v LV

These options are in addition to those available to LVM without the MirrorDisk/UX product. When using lvextend to add a mirror to an existing volume, the entire contents of the volume will be copied. The possible values for the -m option are 0, 1 or 2; i.e. having no mirror, single mirror or double mirror.

It is not possible to use the size option and the mirror option on the same lvextend command. If you wish to place your mirrors on specific PVs, you must first create an LV with no size option, then use *two* lvextend commands, one to place the original, and the second to place the mirror copy on a specific disk.

For example,

    lvcreate -n mylvol /dev/vg00

creates an LV mylvol in the first volume group,

    lvextend -L 200 /dev/vg00/mylvol /dev/dsk/c0t6d0

maps 200MB of space to mylvol on the disk with id 6 on the first disk interface,

    lvextend -m 1 /dev/vg00/mylvol /dev/dsk/c1t6d0

finally creates the mirror copy, located on the disk with id 6 on the second disk interface.

LVM commands for synchronization

    lvsync LV
    vgsync VG

Resynchronization is rarely done manually, as mirrors automatically synchronize themselves upon activation at boot time or through the use of the vgchange command. However, there are instances where you will need to manually resync your mirror set, primarily when you replace a disk in a volume group that is already active. When you reactivate an already active VG, vgsync is not done automatically.

Procedure for online backups

    lvsplit [-s suffix] lvpath [lvpath...]
    lvmerge backup_lvpath master_lvpath

To reduce downtime, it is possible to split one copy out of a mirror set for the purpose of doing an online backup through the use of the lvsplit command. Lvsplit will freeze the mirror copy and create a second logical volume which points to it.

When splitting a logical volume it is desirable to close it first or at least flush any data that may be stored in disk buffers. With a file system, of course, unmounting would be the preferred course of action, as this ensures that all data belonging on the disk is written out. If this is not possible, execute sync prior to splitting the volume and fsck the split off volume prior to backing up.

Once the volume has been split off, the original LV can continue to be used as normal. By utilizing the three-way mirroring provided by MirrorDisk/UX, it is possible to maintain data redundancy while still allowing for online backup.

Example

    sync
    lvsplit -s backup /dev/vg00/mylvol
    fsck -p /dev/vg00/rmylvolbackup
    mount /dev/vg00/mylvolbackup /backup

Then you can use your preferred backup command (tar, cpio, fbackup) to write /backup to tape. Notice that fsck was run with the -p flag; this is to indicate preening mode. You should always use this flag when checking filesystems split out of a mirror set.

Once the backup is completed, you can use the lvmerge command to remerge the backup LV back into the mirror set. When remerging a mirror you must close the backup volume (unmount it if it is a filesystem), then use lvmerge. The master copy can remain active and mounted. Once the mirror is rejoined, it will resynchronize any modified physical extents.

Example

    umount /backup
    lvmerge /dev/vg00/mylvolbackup /dev/vg00/mylvol

Warning - lvmerge is able to merge the volumes in either direction. Merging the master volume into the backup will result in losing any changes that happened whilst the backup copy was split.
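
The split, check, mount, back up and merge sequence described above can be collected into one small script. What follows is only a sketch: the helper names (run, split_name, raw_name, online_backup) are made up for illustration, and by default the script only prints the commands it would issue. It assumes, as in the examples above, that the LV created by lvsplit is named by appending the suffix to the original path, and that the raw device carries an "r" prefix on the LV name.

```shell
#!/bin/sh
# Dry-run sketch of an online backup of one mirrored LV.
# Commands are printed, not executed, unless RUN=1 is set
# (only sensible on an HP-UX host with MirrorDisk/UX installed).

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Name of the LV that lvsplit creates: original path plus suffix.
split_name() {
    echo "$1$2"
}

# Raw (character) device for an LV path:
# /dev/vg00/foo -> /dev/vg00/rfoo
raw_name() {
    echo "$(dirname "$1")/r$(basename "$1")"
}

# online_backup LV SUFFIX MOUNTPOINT
online_backup() {
    lv=$1
    suffix=$2
    mnt=$3
    copy=$(split_name "$lv" "$suffix")

    run sync                           # flush buffers before splitting
    run lvsplit -s "$suffix" "$lv"     # freeze one mirror copy
    run fsck -p "$(raw_name "$copy")"  # preen the split-off copy
    run mount "$copy" "$mnt"
    # Back up "$mnt" to tape here with tar, cpio or fbackup.
    run umount "$mnt"
    run lvmerge "$copy" "$lv"          # backup copy first, master second
}

online_backup /dev/vg00/mylvol backup /backup
```

Note the argument order on the final lvmerge: the backup copy is given first and the master second, which is the direction that preserves changes made while the copy was split off.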
If you have multiple logical volumes that contain data that must be kept consistent, such as a database, it is recommended that you split them off in a single lvsplit command. This ensures that the split is atomic; i.e. all the logical volumes are brought offline in one system call.

Mirrored I/O scheduling and recovery

In the event of an ungraceful shutdown, it is unlikely that all pending I/O will have completed. When using a mirror set, this means that the PEs in each mirror may not hold exactly the same data when the machine is brought back up. With a three-way mirror, for example, a write I/O may not have the chance to complete on all three mirrors at the same time, so it becomes necessary to have some mechanism for tracking which I/Os have completed. Any given scheme requires a delicate balance: the tracking process needs to be fast, so it does not affect I/O throughput, and it needs to survive an ungraceful shutdown so that the mirrors can be resynchronized when the system reboots.

Tracking of pending I/O under MirrorDisk/UX is done in a cache called the Mirror Write Cache (MWC). This holds the status of write I/Os to relatively large areas of a logical volume called logical track groups (LTG). When a write I/O is scheduled, it is resolved onto the LTG it affects and the entry is marked dirty. When the I/O completes, the entry is marked clean.

When a write I/O request is made, the LVM driver checks the MWC to see if the LTG is marked dirty. If it is, then the I/O is scheduled; if not, then the LTG is marked as dirty and a high priority I/O is scheduled to write the MWC to disk, where it becomes a mirror consistency record (MCR). Once this is completed, the I/O is unblocked and the data write is scheduled.

Mirror consistency options

You do have some control over what type of mirror consistency you maintain on a system. There are advantages and trade-offs to each which have to be evaluated based on your application's requirements.
You would set the policy when you create the LV with the lvcreate command, or you can change it later using the lvchange command.

MWC/MCR on
    gain : fast recovery on a system crash
    loss : additional runtime writes
    flag : -M y

MWC/MCR off, mirror consistency recovery on
    gain : no extra overhead at runtime
    loss : slow recovery on system crash
    flag : -M n -c y

MWC/MCR off, mirror consistency recovery off
    gain : application can do necessary recovery
    loss : no system managed recovery
    flag : -M n -c n

For most applications, it is advisable to leave MWC/MCR on, as it provides for the fastest and most reliable recovery in the case of a system crash.

Extent allocation policies

As LVM mirrors data rather than whole drives, it becomes important to understand the concept of extents and the associated allocation policies before you set up your logical volumes. For example, it is entirely possible to have two mirror copies residing on the same PV, eliminating the redundancy that mirroring is supposed to provide.

There are three extent policies that can be used when creating mirrors: non-strict, strict and PVG-strict. At minimum, you should be using strict, which forces logical extents to mirror to separate physical volumes.

The PVG-strict option is an advanced setup that allows you to specify a group of PVs as being in a physical volume group (PVG). A typical scenario where this comes into play is where you wish to force mirroring across controllers. Under the strict policy you have no guarantee that extents allocated are being mirrored to a specific disk or controller. In this scenario you would define the PVG either by editing /etc/lvmpvg manually or by using the -g flag on either vgcreate or vgextend.

Example

    vgextend -g BUS1 /dev/vg00 /dev/dsk/c1t6d0 /dev/dsk/c1t5d0

would create a PVG named BUS1 in the volume group vg00, consisting of the disks at id 6 and 5 on the second controller.
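
For readers who prefer to edit /etc/lvmpvg by hand, the sketch below prints a stanza for one volume group with two PVGs. It is illustrative only: the helper name lvmpvg_stanza, the BUS0 group and the c0t5d0 device are assumptions that do not appear elsewhere in this document, and the VG / PVG / device-path layout it emits should be checked against the lvmpvg(4) manual page on your system before anything is written to the real file.

```shell
#!/bin/sh
# Print an /etc/lvmpvg stanza for one volume group to stdout.
# Usage: lvmpvg_stanza VG PVG:pv[,pv...] [PVG:pv[,pv...] ...]
# Nothing is written to /etc/lvmpvg; redirect the output yourself
# once you have reviewed it.

lvmpvg_stanza() {
    vg=$1
    shift
    echo "VG $vg"
    for group in "$@"; do
        name=${group%%:*}     # text before the first colon
        disks=${group#*:}     # comma-separated device list after it
        echo "PVG $name"
        echo "$disks" | tr ',' '\n'
    done
}

# One PVG per controller, so that PVG-strict allocation places
# each mirror copy on a different interface.
lvmpvg_stanza /dev/vg00 \
    BUS0:/dev/dsk/c0t6d0,/dev/dsk/c0t5d0 \
    BUS1:/dev/dsk/c1t6d0,/dev/dsk/c1t5d0
```

Grouping the disks by controller in this way is what lets the PVG-strict policy guarantee that the copies of an extent never share an interface.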

LVM commands revisited

    lvcreate [-m copies] [-d schedule] [-s policy] [-M y/n] [-c y/n] VG
    lvextend [-m copies] LV [PV]
    lvreduce [-m copies] LV [PV]
    lvchange [-d schedule] [-s policy] [-M y/n] [-c y/n] LV

When creating a new mirror you should consider carefully what allocation policy you wish to use. Once an LV has been mirrored, it is not possible to use lvchange to tighten up these policies. The suggested configuration for a single interface would be a strict allocation policy with no physical volume groups configured; for a system with multiple interfaces, it is typically recommended that you configure all the disks on a given interface into a PVG and enforce PVG-strict allocation.

Summary : Mirrored Volume Options

Number of copies

This is fairly straightforward; the number does not include the master copy, so 0 is no mirroring, 1 is single mirrored and 2 is double mirrored.

Strictness

The default setting for extent allocation is to take each physical extent that makes up a logical extent from a separate physical volume. This is known as strict allocation. Of the other options, only PVG-strict is typically useful, and is used in order to mirror across controllers.

Mirror Write Cache

This is LVM's standard way of maintaining consistency across a mirrored logical volume; it involves marking areas of the logical volume as "dirty" when changes have been made to the master copy, allowing for quick recovery in the event of a system failure. The other options available are either marking the entire mirror copy as dirty rather than tracking individual writes, which speeds up throughput but takes a hit in resyncing, or doing no consistency checking at all. The latter is typically only used when an application will be doing its own recovery, or for swap areas where no data is held across a system reboot.

Schedule

When accessing a mirrored LV the LVM driver will split a request to write to an LE into writes to all the PEs that it contains.
These requests can happen in parallel or in sequence, where one request must complete before the system attempts to write the next copy of the data.

Mirroring boot volumes

By using mirror copies of the root and primary swap LVs you will be able to keep your system operational in the event of a boot disk failure. Since MirrorDisk/UX is a data mirroring product rather than a disk mirroring product, certain steps must be taken to provide all critical boot information to the system on the backup disk. A typical mirror would require these steps:

1. Create a physical volume with a boot area reserved.

    pvcreate -B /dev/rdsk/c1t6d0

2. Add the physical volume to the root volume group

    vgextend /dev/vg00 /dev/dsk/c1t6d0

3. Use mkboot to place the boot utilities in the boot area and add the AUTO file

    mkboot /dev/rdsk/c1t6d0
    mkboot -a "hpux -lq" /dev/rdsk/c1t6d0

Note the inclusion of the -lq flag in the boot string. This indicates that the system is to boot without quorum checking enabled; on a normal boot-up you are required to have half the disks plus one operational in order for the boot sequence to complete.

4. Use mkboot to update the AUTO file on the primary boot disk

    mkboot -a "hpux -lq" /dev/rdsk/c0t6d0

5. Mirror the stand, root and swap logical volumes

    lvextend -m 1 /dev/vg00/lvol1 /dev/dsk/c1t6d0
    lvextend -m 1 /dev/vg00/lvol2 /dev/dsk/c1t6d0
    lvextend -m 1 /dev/vg00/lvol3 /dev/dsk/c1t6d0

6. Update the boot data reserved area (BDRA) and the LABEL file

    lvlnboot -R /dev/vg00

7. You may want to update your server's alternate boot path to point to the mirror copy of the boot disk. This is machine specific.

8. Typically you want to change the mirror consistency recovery policy of your swap partition. The default policy (MWC) requires additional overhead to maintain consistency. Since there is no need to recover swap space in the event of a system crash, it is safe to change this to none.
Since you cannot change the policy on a logical volume while it is in use, you must perform this command in LVM maintenance mode; i.e. when bringing the system up, interact with the ISL (or IPL) and boot with the -lm flag.

Activate the volume group and change the recovery policy

    vgchange -a y vg00
    lvchange -M n -c n /dev/vg00/lvol2

Update the BDRA again

    lvlnboot -R /dev/vg00

Do *NOT* go directly into multi-user mode. Reboot the system instead.

    reboot
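
Steps 1 to 6 of the boot mirroring procedure lend themselves to a small script. The following is a sketch under stated assumptions, not a tested installation tool: the helper names (show, run, mirror_boot_disk) are invented for this example, the device and LV names are the ones used in the steps above, and by default the commands are only printed so that the sequence can be reviewed first. Steps 7 and 8 are deliberately left out, since they are machine specific or must be carried out in LVM maintenance mode.

```shell
#!/bin/sh
# Dry-run sketch of steps 1 to 6 for mirroring a boot disk.
# Commands are printed, not executed, unless RUN=1 is set
# (only sensible on an HP-UX host with MirrorDisk/UX installed).

# Print a command line, quoting any argument that contains a space.
show() {
    line=
    for arg in "$@"; do
        case $arg in
            *" "*) line="$line \"$arg\"" ;;
            *)     line="$line $arg" ;;
        esac
    done
    printf '%s\n' "${line# }"
}

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        show "$@"
    fi
}

# mirror_boot_disk PRIMARY MIRROR VG LV [LV...]
# PRIMARY and MIRROR are device names such as c0t6d0 and c1t6d0.
mirror_boot_disk() {
    primary=$1
    mirror=$2
    vg=$3
    shift 3

    run pvcreate -B "/dev/rdsk/$mirror"            # step 1
    run vgextend "$vg" "/dev/dsk/$mirror"          # step 2
    run mkboot "/dev/rdsk/$mirror"                 # step 3
    run mkboot -a "hpux -lq" "/dev/rdsk/$mirror"
    run mkboot -a "hpux -lq" "/dev/rdsk/$primary"  # step 4
    for lv in "$@"; do                             # step 5
        run lvextend -m 1 "$vg/$lv" "/dev/dsk/$mirror"
    done
    run lvlnboot -R "$vg"                          # step 6
}

mirror_boot_disk c0t6d0 c1t6d0 /dev/vg00 lvol1 lvol2 lvol3
```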