Patch-ID# 104347-08
Keywords: cvm drl pdb sparccluster vm catman
Synopsis: Ultra Enterprise PDB 1.2: Multiple fixes for Cluster Volume Manager
Date: Jun/22/99

Solaris Release: 2.5.1

SunOS Release: 5.5.1

Unbundled Product: SPARCcluster SUNWvxvm SUNWvmman

Unbundled Release: vxvm2.2/cvm2.2

Relevant Architectures: sparc

BugId's fixed with this patch: 4009536 4007143 4018956 4025400 4044803
4061879 4065499 4061474 4066260 4076902 4078677 4101872 4103257 4151066
4155764 4183493

Changes incorporated in this version: 4183493

Patches accumulated and obsoleted by this patch:

Patches which conflict with this patch:

Patches required with this patch:

Obsoleted by:

Files included with this patch:

[SUNWvmdev]
/opt/SUNWvxvm/include/volintf.h

[SUNWvmman]
/opt/SUNWvxvm/man/man1m/vxdctl.1m
/opt/SUNWvxvm/man/man1m/vxdg.1m
/opt/SUNWvxvm/man/man1m/vxnotify.1m
/opt/SUNWvxvm/man/man1m/vxrecover.1m
/opt/SUNWvxvm/man/man1m/vxstat.1m
/opt/SUNWvxvm/man/man1m/vxvol.1m

[SUNWvxvm]
/kernel/drv/vxio
/kernel/drv/vxio.conf
/kernel/drv/vxspec
/kernel/drv/vxspec.conf
/sbin/vxconfigd
/usr/sbin/vxclust
/usr/sbin/vxinstall

[Unbundled]
scripts/upgrade_start
scripts/upgrade_finish

Problem Description:

4183493: This problem has been isolated, root-caused, and fixed.
The function vol_mem_allocsio_start did not return a value. The
function itself puts the memallocsio on the idle_start queue. Because
the caller then read whatever garbage was left on the stack as the
return value, a wrong value caused the caller to put the same
memallocsio on the idle_start queue a second time. Putting one sio on
a queue more than once, or on multiple queues at the same time, can
make entries in the queue disappear. The symptom can be seen in the
core by examining volsioq_idle_start, which contains NULL head/tail
pointers but a non-zero counter, i.e. there should be work on the
queue, but the work isn't there.
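The failure mode described above can be modelled with a small C
sketch. It is an illustration only, not the vxio source: the types
struct sio and struct sioq, the helpers sioq_put and sioq_get, the
status values SIO_QUEUED and SIO_NOT_QUEUED, and the functions
start_fixed, caller and double_queue_demo are all hypothetical names
chosen for this example. The sketch assumes a singly linked queue with
head, tail and count fields, which matches the three words shown by
adb but is otherwise a guess at the layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real vxio structures. */
struct sio  { struct sio *next; };
struct sioq { struct sio *head; struct sio *tail; unsigned count; };

/* Append s to the tail of q. */
static void sioq_put(struct sioq *q, struct sio *s)
{
    assert(q != NULL && s != NULL);
    if (q->tail != NULL)
        q->tail->next = s;
    else
        q->head = s;
    s->next = NULL;
    q->tail = s;
    q->count++;
}

/* Remove and return the head of q, or NULL if q is empty. */
static struct sio *sioq_get(struct sioq *q)
{
    struct sio *s = q->head;
    if (s == NULL)
        return NULL;
    q->head = s->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->count--;
    return s;
}

enum { SIO_NOT_QUEUED = 0, SIO_QUEUED = 1 };

/* Fixed shape: the function queues s and says so on every path. */
static int start_fixed(struct sioq *q, struct sio *s)
{
    sioq_put(q, s);
    return SIO_QUEUED;
}

/* The caller queues s only when told it has not been queued yet. */
static void caller(struct sioq *q, struct sio *s)
{
    if (start_fixed(q, s) == SIO_NOT_QUEUED)
        sioq_put(q, s);
}

/*
 * Model of the damage: the same sio is queued twice (as happens when
 * the caller misreads a garbage return value), then dequeued once.
 * Returns the counter left behind.
 */
static unsigned double_queue_demo(struct sioq *q, struct sio *s)
{
    sioq_put(q, s);
    sioq_put(q, s);
    (void)sioq_get(q);
    return q->count;
}
```

In this model, after double_queue_demo the queue has NULL head and
tail pointers but a counter of 1, the same kind of inconsistency as
the "0 0 2" seen in the core. With the explicit return value, caller
queues the sio exactly once.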
How to identify this problem:

Go into adb and check CVM's global external structure
volsioq_idle_start:

    volsioq_idle_start/3X
    volsioq_idle_start:
    volsioq_idle_start:     0     0     2

The first two NULL values are the head and tail pointers. Since the
head and tail pointers are NULL, the last value, a counter, should
also be 0. The counter is not 0, it is 2. So we presume there are two
I/Os that are trackable only via the volsioq_idle_start queue and
are now missing, hence the hang. In any case, the structure is
inconsistent and this is a bug.

4155764: When plexes are detached as a result of I/O failure on one
or more disks, the volume manager stores this information in the
config copies and/or logs. If the diskgroup is deported before the
failure is corrected, then when the detached disk comes back online
it might be auto-imported, and in the process the old/stale
config/log overwrites the new/clean config/log. This could lead to
loss of configuration changes and to data corruption (as the fact
that the disk was detached is lost and no recovery takes place).

4151066: Volumes with DRL logs created in a specific manner fail to
start with the error "Cannot get record from ". Under such
circumstances it is not possible to do any volume reconfiguration.

4103257: During a master takeover the surviving node aborts due to a
deadlock triggered by lack of resources (an inadequate number of
vxiod daemons). The vxclust utility used to set the vxiod count to
10, which will not suffice for a system with more than 10 shared
diskgroups. The current approach is not to set it from vxclust and
to require the Administrator to increase the vxiod count in
/etc/rcS.d/S85vxvm-startup2, if required.

4101872: Under certain circumstances, inconsistency of information
between master and slave prevents disks from being added to a shared
disk group.

4078677: Slave fails to take over as master (and aborts) if the
master does a snapshot and leaves the cluster.

4076902: vxtrace, when invoked with the "-o disk" option, panics the
node.

4066260: CVM gets hung in vol_klog_lock.
Klogging used to use the sio_flags field for non-sio related
synchronization. The KLOG_WANTED flag was overlapping
VOLSIO_FLAG_WAIT_SIODONE. Now separate, klog-specific flags are used
for klogging.

4065499: When a fibre cable is pulled, causing many detaches and much
activity, sometimes vold does not complete removing the failing disk
from its disk group. As a result, when the fibre is plugged back in
and vxreattach is run, the slave does not agree to replace the disk
in the disk group.

4061879: On a 2.5.1 system the 'catman' command treats nroff font
directives as part of the command, breaking generation of windex.
Font directives have been dropped from the NAME section of the above
manual pages.

4061474: vxconfigd on the slave dies if the fiber cable is pulled on
one of the nodes.

4044803: vxconfigd has been modified so that, in addition to looking
for ssa's attached to the system, it also checks whether the
UltraCluster's failfast driver is loaded. If it is, then the cvm
will be enabled. vxinstall also needed to be modified because it
also checked for the presence of an ssa.

4025400: System panicking in module "vxio", function
vol_ack_get_mlock. The vol_ack_get_mlock function is no longer called
or used in the cvm kernel. The locking mechanism was changed and this
code path is no longer used.

4018956: A global lock was being released twice.

4009536: The system is not properly handling lock requests when
servicing many cvm utilities.

4007143: The upgrade_finish script would look for a vxio file that
was named with the sunos version. This was done to provide the
capability to support the same vm when upgrading Solaris. This is
the vxvm model but is not required for the cvm, as we upgrade the
entire release along with the OS. This was taken into account in the
upgrade_start script, and no vxio file with the os version was
created. The upgrade_finish script was overlooked and was still
looking for this file name, which never existed.
Patch Installation Instructions:
--------------------------------
Refer to the Install.info file for instructions on using the generic
'installpatch' and 'backoutpatch' scripts provided with each patch.
Any other special or non-generic installation instructions should be
described below as special instructions.

Special Install Instructions:
-----------------------------
The system should be rebooted after the patch has been installed.

NOTE: If you will be using more than 10 shared diskgroups, increase
the argument to vxiod set (line 58, /etc/rcS.d/S85vxvm-startup2).
The required minimum is the number of diskgroups plus five.

The fix for bug 4007143 consists of a modified version of the
upgrade_finish script. This script is in the scripts directory of
the patch. When the patch has been untarred, cd to the patch
directory (cd /tmp/patchid) and you will see the scripts
subdirectory alongside the installpatch/backoutpatch files.

The proper procedure for using this updated upgrade_finish script is
outlined as follows. This is a condensed version of the procedure
defined in section 1.5.2.3 Upgrading to CVM Release 2.2, found in
the Ultra Enterprise PDB Cluster Volume Manager Administration
Guide-November 1996.

NOTE - this procedure is only required if you have encapsulated your
system's boot disk.

 - Mount the Ultra Enterprise PDB 1.2 cdrom.
 - Run the upgrade_start script from the directory /CVM/scripts.
 - Remove the PDB 1.0 or 1.1 packages by using the pdbinstall
   command.
 - Upgrade the operating system to Solaris 2.5.1.
 - Reboot the system so it is running the new OS.
 - Add the Ultra Enterprise PDB 1.2 packages.
 - Install patch 104347-07.
 - Complete the upgrade by running the new upgrade_finish script
   from the directory 104347-07/scripts.

Two additional files are included with, but not installed by, this
patch. They are:

    scripts/fixstartup
    scripts/upgrade_finish