
WoooooW, I have been chosen as one of the few EMC Elect.

In November 2012, EMC announced a new recognition program called EMC Elect. This program provides community-driven recognition to individuals who have demonstrated the highest levels of commitment to thought leadership in the fields of information management, cloud computing, and big data. The scope of this award is unprecedented within EMC. The EMC Elect program is designed to span the gap between brand loyalty and brand advocacy. The idea of the program is similar to the Microsoft MVP and VMware vExpert programs.

The top three reasons people were chosen for EMC Elect are:

  1. Regular engagement on the EMC Community Network
  2. Great conversations on Twitter
  3. Strong technical presence via blogs and at industry events

 

On the 15th of January 2013, EMC announced the first 75 community members selected for the EMC Elect 2013 program based on their contributions over the past year, and …… WOOOOWWW … I have been selected as one of those 75 honored members, out of 270 nominees. I’m very pleased to be one of the top contributors that were selected. Thanks to all my followers and other community members! Here is the official list of all members:

EMC Elect of 2013 – Official List

I also congratulate everyone else who has been selected as an EMC Elect. I am truly honored and humbled to be selected for the EMC Elect community, and incredibly excited to further promote the EMC brand and to help others in the community along the way. Thanks EMC.

Troubleshooting media database corruption

This post is about how to repair simple media database corruption, and how to compress the media database.

 

Symptoms & Error Messages

  • Media database corruption
  • Error: ‘media database must be scavenged’
  • Error: ‘nsrmmdbd: WISS error, database was not shutdown cleanly’
  • Index corruption
  • Error: ‘error on open of recfile for index [index-name]’
  • Error: ‘Cannot mount index [index-name]’
  • Error: ‘nothing in index at this time’
  • Unable to perform backup due to index corruption
  • Unable to perform recovery due to index corruption
  • Slow writing to tape
  • Error: ‘nsrmmdbd: WISS error: Unable to mount /nsr/mm/mmvolume6: bad database header’
  • Error: ‘nsrmmdbd: WARNING: clients file missing from /nsr/mm/mmvolume6’
  • Error: ‘nsrd: nsrmmdbd has exited on signal 11’
  • Error: ‘nsrmmdbd: WISS error: invalid slot number’
  • Error: ‘nsrmon #(number): query resdb failed, err: Cancelled.’
  • Error: ‘nsrmmdbd: media db must be scavenged’
  • Error: ‘bad database header’
  • Error: ‘WISS Error’
  • Error: ‘nsrmmdbd: Error on close of volume index file #x (invalid file number)’
  • mmlocate loses locations of volumes when NetWorker daemons are restarted
  • Error: ‘save: error, media index problem’

Note: Not all of the symptoms listed above will appear in every case. However, each one indicates index corruption, so the fixes listed below are relevant to each situation.
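As a rough aid for spotting these symptoms, here is a minimal Python sketch that scans log lines for substrings taken from the error messages above. The function name and the choice of substrings are my own for illustration, and the sketch makes no assumption about where your NetWorker logs live: you pass in the lines yourself.

```python
import re

# Substrings copied from the error messages listed in this post.
# Matching is case-insensitive because the list shows both
# "WISS error" and "WISS Error".
MEDIA_DB_SIGNATURES = [
    "must be scavenged",
    "WISS error",
    "bad database header",
    "clients file missing",
    "invalid slot number",
    "media index problem",
    "error on open of recfile",
    "Cannot mount index",
]

_PATTERN = re.compile(
    "|".join(re.escape(s) for s in MEDIA_DB_SIGNATURES), re.IGNORECASE
)


def find_corruption_hints(lines):
    """Return (line_number, text) pairs for lines that contain a signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if _PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

A hit only means the line looks like one of the messages above; it does not prove that the media database is corrupt.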

The error messages noted above are typically the result of problems in the media database. Corruption in the media database can be caused by any number of events, such as hardware failure or improperly shutting down the NetWorker daemons. The general approach to resolving any fatal media database problem is to use the nsrim and nsrck tools. If those tools do not solve the issue, you have to proceed with scavenging the media database.

Follow the steps listed below to perform media database compression and consistency checks to correct possible media database corruption. You may also follow this procedure to simply compress the backup server’s indexes (Client File Index, and Media Database) as part of periodic system maintenance.

Media Database Consistency Check / Compression:

  1. Ensure that there are no running jobs (backup, recovery, cloning, etc…)
  2. Stop NetWorker daemons/services (NetWorker Backup and Recovery Server and NetWorker Remote Exec Service)
  3. Empty the files in the /nsr/tmp directory
  4. Delete the /nsr/mm/cmprssd file (this forces NetWorker to compress the media database)
  5. Start NetWorker Services again (first start the NetWorker Remote Exec Service and then NetWorker Backup and Recovery Server)
  6. Run the following command line: nsrim -X
  7. Run the following command line: nsrck -L6
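The file-handling parts of the steps above (3 and 4) and the two commands (6 and 7) can be sketched in Python as follows. This is only a sketch under my own assumptions: the function names are hypothetical, only top-level files in the tmp directory are touched, and stopping and starting the services (steps 2 and 5) is left to you because the exact commands depend on the platform. It defaults to a dry run that deletes nothing.

```python
import subprocess
from pathlib import Path


def force_compression_prep(nsr_root="/nsr", dry_run=True):
    """Steps 3 and 4: collect the files in <nsr_root>/tmp plus the
    <nsr_root>/mm/cmprssd flag file, and delete them unless dry_run is set.

    Run this only while the NetWorker services are stopped.
    Returns the list of paths that were (or would be) deleted.
    """
    root = Path(nsr_root)
    tmp_dir = root / "tmp"
    targets = []
    if tmp_dir.is_dir():
        # Assumption: only plain files directly inside tmp are removed.
        targets = sorted(p for p in tmp_dir.iterdir() if p.is_file())
    flag = root / "mm" / "cmprssd"
    if flag.exists():
        targets.append(flag)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


# Steps 6 and 7, exactly as written in the post.
POST_START_COMMANDS = [["nsrim", "-X"], ["nsrck", "-L6"]]


def run_checks(runner=subprocess.run):
    """Run nsrim -X and then nsrck -L6, stopping at the first failure."""
    for command in POST_START_COMMANDS:
        runner(command, check=True)


if __name__ == "__main__":
    # Dry run: print what would be deleted, delete nothing.
    for path in force_compression_prep():
        print("would delete", path)
```

Call force_compression_prep(dry_run=False) only after checking the dry-run output, and call run_checks() only after the services are running again.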

If the problem has not been solved with the above steps, proceed with the following procedures for Media Database Scavenging:

 

Media Database Scavenging:

  1. Make sure you have a bootstrap backup before commencing.
  2. Ensure that there are no running jobs (backup, recovery, cloning, etc…)
  3. Stop NetWorker daemons/services
  4. Cause a “controlled corruption” by removing:  /nsr/mm/mmvolume6/{*_i*,ss,vol}
  5. BE VERY CAREFUL WITH THIS STEP. Please read it all before proceeding. In /nsr/mm/mmvolume6, delete all files EXCEPT these four kinds of files: ss.N, vol.N, clients.N, and VolHdr (for example ss.0, vol.0, clients.0, ss.1, vol.1, clients.1, ss.2, vol.2, clients.2, VolHdr). You MUST keep those files.

NOTE: Be careful when deleting the files, and make sure not to delete the numbered files (vol.0, ss.0, clients.0, vol.1, ss.1, clients.1, vol.2, ss.2, clients.2, vol.3, ss.3, clients.3, etc.). These files are NEEDED to rebuild the media database. If you run (# rm ss*), this will delete ss.0 as well, and the media database is basically gone.

Note: this step is simply a way to force NetWorker to rebuild the media database. The files that are deleted can all be rebuilt from the media database files. The files you are cautioned not to delete (ss.0, vol.0, clients.0, VolHdr, etc.) cannot be rebuilt, and NetWorker will not be able to rebuild your media database without them, so take great care in this step.

  6. Empty the files in the /nsr/tmp directory

  7. Restart NetWorker daemons/services
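Because one wrong wildcard in step 5 can destroy the media database, it may help to list what would be kept and what would be deleted before touching anything. The Python sketch below does only that and never deletes a file. The keep rule (ss.N, vol.N, clients.N, and VolHdr) is my reading of the file names given in this post, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Files the post says must never be deleted: ss.N, vol.N, clients.N,
# where N is a number. This pattern is an assumption derived from the
# names listed above, not an official NetWorker rule.
_KEEP_NUMBERED = re.compile(r"^(ss|vol|clients)\.\d+$")


def must_keep(name):
    """True if the file name is one the post says is needed for the rebuild."""
    # The post spells the header file both "VolHdr" and "volhdr",
    # so the comparison ignores case.
    return bool(_KEEP_NUMBERED.match(name)) or name.lower() == "volhdr"


def plan_scavenge(mmvolume_dir):
    """Return (keep, delete) lists of file names. Nothing is removed."""
    keep, delete = [], []
    for path in sorted(Path(mmvolume_dir).iterdir()):
        if path.is_file():
            (keep if must_keep(path.name) else delete).append(path.name)
    return keep, delete
```

Review both lists by hand (and make sure the bootstrap backup from step 1 exists) before deleting anything in the delete list.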

In the situation where the corruption is severe and the above procedure did not resolve the media database corruption, the media database can be recovered from the bootstrap using the mmrecov command to get a good copy of the media database. NetWorker software attempts to generate a bootstrap every day, so no more than one day of data should be lost.

NetWorker 8 Client Access Feature

EMC NetWorker version 8 includes many features that make this solid backup solution even better. One of those is the Client Direct feature, also known as direct file access. This technology allows NetWorker client agents to back up directly to an advanced file type device (AFTD), or to a Data Domain appliance using the DD Boost protocol.

Previously, client backups could not be written directly to AFTD devices; only the NetWorker Backup Server, Storage Nodes, and specific database backup modules could write directly to AFTD and DD Boost devices. Letting client backup data bypass the Storage Node and travel directly over the IP network to the Data Domain eliminates the Storage Node as a bottleneck during backups.

With DD Boost, in addition to client direct backup, Distributed Segment Processing (DSP) allows parts of the de-duplication process to be offloaded to the NetWorker clients, so that only new, unique data segments are sent from the clients to the Data Domain. This dramatically reduces IP network traffic during backup. It also usually decreases the I/O load on those clients during backup, because segmenting and fingerprinting data requires less processing than reading and transferring large volumes of data. This processing would normally take place on the Storage Node, but with Client Direct enabled, the Storage Node simply manages the client without handling the backup workflow. If Client Direct is not available, the data is sent through the Storage Node to be deduplicated and then transferred to the Data Domain.
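To make the idea of sending only new unique segments concrete, here is a toy Python illustration. It is not the DD Boost protocol or its real algorithm: the fixed 4096-byte segment size, the SHA-256 fingerprints, and all names are my own assumptions, chosen only to show why client-side segmentation cuts network traffic.

```python
import hashlib


def segments(data, size=4096):
    """Split data into fixed-size segments (the last one may be shorter)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def client_side_backup(data, known_fingerprints, size=4096):
    """Fingerprint each segment on the client and decide what must be sent.

    known_fingerprints: fingerprints the target already stores.
    Returns (recipe, to_send):
      recipe  - ordered fingerprints needed to rebuild the data
      to_send - {fingerprint: segment} for segments the target lacks
    """
    recipe, to_send = [], {}
    for segment in segments(data, size):
        fingerprint = hashlib.sha256(segment).hexdigest()
        recipe.append(fingerprint)
        if fingerprint not in known_fingerprints and fingerprint not in to_send:
            to_send[fingerprint] = segment
    return recipe, to_send
```

In this toy model, a second backup of mostly unchanged data sends the full recipe of fingerprints but only the few segments whose fingerprints the target has not seen before.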

For a demo of this feature, see the following YouTube video:

http://www.youtube.com/watch?v=EmoYfnyNQ2g

Hotfixes included in NMM 2.3 build 109

The cumulative hotfix for NetWorker Module for Microsoft Applications version 2.3 build 109 has been released. This cumulative hotfix includes the following new fixes:

NW131394 – Cannot do incremental backup of Public Folder Database on EXCH2010 DAG
https://solutions.emc.com/emcsolutionview.asp?id=esg123776

NW130259 – Restore of Exchange Database hangs.

NW131966 – NMM: Exchange 2010 – PopUp shows “recovery database is failed to created”

NW134775 – nsrsnap_vss_save.exe causes Dr. Watson on Windows 2003 Server for Exchange 2003 backups

NW135097 – Exchange log files are not getting encrypted.
https://solutions.emc.com/emcsolutionview.asp?id=esg127991

NW134883 – NMM 2.3 backup of DPM Replica Volume D: of protected client fails at 26 GB consistently

NW134814 – Exchange 2010 DAG -normal & RDB recovery fails with No cover-sets found. (Problem with text case of server name)

NW135893 – not able to create RDB . Getting error Not able to mount database

NW135291 – Exchange 2010 backup fails if there are any old JET errors in the event viewer

NW135648 – NMM24:SQL dbs at mount point locations are not getting encrypted
https://solutions.emc.com/emcsolutionview.asp?id=esg127991

NW135789 – Recovery of an Exchange 2010 database from DataDomain device hangs before completing

NW132245 – irccd.exe intermittently core dumps during file system backup with NMM 2.3

NW132515 – irccd.exe crashing and Exchange 2010 backups failing

NW133418 – irccd.exe crashing intermittently during Exchange 2010 backups.

NW137010 – Intermittent crashing while browsing after successful GLR recovery

This new package is available at: ftp://ftp.legato.com/pub/NetWorker/NMM/Cumulative_Hotfixes/2.3/

Cumulative Hotfixes included in 7.6.3.3 build 870

The cumulative hotfix for NetWorker version 7.6.3.3 build 870 has been released. This cumulative hotfix includes the following new fixes:

NW136987: NMC SQL Anywhere temp files has World Writable Permissions

https://solutions.emc.com/emcsolutionview.asp?id=esg127831

NW136447: DPA reports negative saveset size due to large number provided by networker

NW136785: License Mgr unable to allocate a Cluster client license as a traditional client connection license

NW131790: messages truncated in windows nsrwatch with Japanese Language pack installed

NW133576: Client connection licenses are no longer being used for virtual clients

https://solutions.emc.com/emcsolutionview.asp?id=esg127859

NW132605: Illegal date format pop up message when updating Expiration time in the Groups Resource via NMC

https://solutions.emc.com/emcsolutionview.asp?id=esg125840

NW137085: Creating new devices by copy resets some attributes to default values

https://solutions.emc.com/emcsolutionview.asp?id=esg128102

NW127325: savepnpc:preclntsave executed twice

https://solutions.emc.com/emcsolutionview.asp?id=esg127920

NW136125: gstd cores after update to 7.6.2 on aix with powerpath installed

https://solutions.emc.com/emcsolutionview.asp?id=esg127828

NW136935: Concurrent Staging and Cloning : nsrclone loops forever even with:F option

NW136650: nsrvadp_save crashes when backing a VM with *FULL* that has millions of files with FLR enabled

https://solutions.emc.com/emcsolutionview.asp?id=esg128126

NW132964: lgtolm res corruption occures on license update in 7.6.2.4 prevents bootstrap backup

This build is available now at:

ftp://ftp.legato.com/pub/NetWorker/Cumulative_Hotfixes/7.6

Technical Alert: Avamar VMware image backups created with changed block tracking (CBT) enabled might not be able to be restored.

If you are using Avamar to back up VMware with Changed Block Tracking (CBT) enabled, check the following EMC Technical Alert. Under specific, limited circumstances, Avamar VMware image backups created with CBT enabled might not complete, or might not be restorable either through the file level recovery (FLR) functionality or as a full image restore.

For details, please check the following article:

http://solutions.emc.com/emcsolutionview.asp?id=esg127567

RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host. 74209:save: Quit signal received.


This error message usually points to either a firewall issue or a TCP timeout problem. You will need to test the connectivity from the backup server to the client, and vice versa, using the following commands:

Note: run each command with both the FQDN and the short name of the machine, and run the commands in both directions (server > client and client > server).

1-      Ping

2-      nslookup

3-      rpcinfo -p  MACHINENAME

4-      nsradmin -v1 -s CLIENTNAME -p 390113   (this is run from the backup server against the client machine)

If communication is good, check that you can telnet from the backup server to the client on ports 7937 and 7938.
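If telnet is not installed, the name resolution and port checks can be approximated with a short Python sketch like the one below. It only covers the DNS lookup and a plain TCP connect to ports 7937 and 7938; it does not replace rpcinfo or nsradmin. The function names are my own, and any host names you pass in are yours to supply.

```python
import socket


def resolves(name):
    """True if the name (FQDN or short name) resolves to an address."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def port_open(host, port, timeout=5.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_host(names, ports=(7937, 7938), timeout=5.0):
    """Check each name for DNS resolution and for the given TCP ports."""
    report = {}
    for name in names:
        ok = resolves(name)
        report[name] = {
            "resolves": ok,
            # Skip the connect attempt when the name does not resolve.
            "ports": {p: ok and port_open(name, p, timeout) for p in ports},
        }
    return report
```

Run check_host with both the FQDN and the short name of the other machine, once from the backup server and once from the client, to mirror the two-way checks described above.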

If all of these checks work correctly and you are still receiving that RPC error message, then it is a TCP timeout issue. To avoid TCP timeouts, I would suggest the following:

1-      Change the group properties, and set “Client Inactivity Timeout” to “0”

2-      Setting Keepalives to Prevent Timeouts:

               https://solutions.emc.com/emcsolutionview.asp?id=esg91994

If you are using Windows operating systems for the Backup server and client, use the following recommendations as well:

1-      Disable TCP Chimney on Server and Client

               https://solutions.emc.com/emcsolutionview.asp?id=esg118437

2-      Set TCP/IP keep alive settings at OS level.

              https://solutions.emc.com/emcsolutionview.asp?id=esg60759

Microsoft also publishes TCP/IP tuning recommendations that help avoid TCP timeouts. You need to apply those recommendations on both the backup server and the client.

I would suggest applying the following registry changes on Windows machines to avoid TCP/IP timeouts:

Note: You will have to reboot the machines after applying these changes for them to take effect.

Create the following DWORD values in the registry.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize=256000

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\GlobalMaxTcpWindowSize=16777216

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval=1000

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime=600000
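If you need to apply the four values above to several machines, one option is to generate a .reg file once and import it on each machine. The Python sketch below only builds the text of such a file from the values in this post; the function name is hypothetical, and you should review the output and back up the registry before importing anything. Note that .reg files store DWORD values in hexadecimal.

```python
TCPIP_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Decimal values exactly as listed in the post.
TCP_VALUES = {
    "TcpWindowSize": 256000,
    "GlobalMaxTcpWindowSize": 16777216,
    "KeepAliveInterval": 1000,   # milliseconds (1 second)
    "KeepAliveTime": 600000,     # milliseconds (10 minutes)
}


def to_reg_file(key, values):
    """Build .reg file text that sets each name to a DWORD value."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, number in values.items():
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name}={number} does not fit in a DWORD")
        # DWORDs are written as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(to_reg_file(TCPIP_KEY, TCP_VALUES))
```

Save the output to a file with a .reg extension to import it; as noted above, a reboot is still needed afterwards.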
