Dell EMC NetWorker Troubleshooting

January 20, 2013

Wow, I have been chosen as one of the few EMC Elect

In November 2012, EMC announced a new recognition program called EMC Elect. This program provides community-driven recognition to individuals who have demonstrated the highest levels of commitment to thought leadership in the fields of information management, cloud computing, and big data. The scope of this award is unprecedented within EMC. The EMC Elect program has been designed to span the gap between brand loyalty and brand advocacy. The idea of the program is similar to the Microsoft MVP and VMware vExpert programs.

The top three reasons people were chosen for EMC Elect are:

Regular engagement on the EMC Community Network
Great conversations on Twitter
Strong technical presence via blogs and at industry events

On 15 January 2013, EMC announced the first 75 community members selected for the EMC Elect 2013 program based on their contributions in the past year, and … WOW … I have been selected as one of those 75 honored members out of 270 nominees. I’m very pleased to be one of the 75 top contributors. Thanks to all my followers and other community members! Here is the official list of all members:

EMC Elect of 2013 – Official List

Besides myself, I congratulate everyone else who has been selected as an EMC Elect. I am truly honored and humbled to be selected to the EMC Elect community, and incredibly excited to further promote the EMC brand and to help others in the community along the way. Thanks, EMC.

November 8, 2012

Troubleshooting media database corruption

This post is about how to repair simple media database corruption and how to compress the media database.

Symptoms & Error Messages

Media database corruption
Error: ‘media database must be scavenged’
Error: ‘nsrmmdbd: WISS error, database was not shutdown cleanly’
Index corruption
Error: ‘error on open of recfile for index [index-name]’
Error: ‘Cannot mount index [index-name]’
Error: ‘nothing in index at this time’
Unable to perform backup due to index corruption
Unable to perform recovery due to index corruption
Slow writing to tape
Error: ‘nsrmmdbd: WISS error: Unable to mount /nsr/mm/mmvolume6: bad database header’
Error: ‘nsrmmdbd: WARNING: clients file missing from /nsr/mm/mmvolume6’
Error: ‘nsrmmdbd: WISS error, database was not shutdown cleanly’
Error: ‘nsrd: nsrmmdbd has exited on signal 11’
Error: ‘nsrmmdbd: WISS error: invalid slot number’
Error: ‘nsrmon #(number): query resdb failed, err: Cancelled.’
Error: ‘nsrmmdbd: media db must be scavenged’
Error: ‘media database must be scavenged’
Error: ‘bad database header’
Error: ‘WISS Error’
Error: ‘nsrmmdbd: Error on close of volume index file #x (invalid file number)’
mmlocate loses locations of volumes when NetWorker daemons are restarted
Error: ‘save: error, media index problem’

Note: Not all of the symptoms listed above will be found in every case. However, each is indicative of index corruption, so the fixes listed below are relevant to each situation.

The error messages noted above are typically the result of problems in the media database. Corruption in the media database can be caused by any number of events, such as hardware failure or improperly shutting down the NetWorker daemons. The general approach to resolving any fatal media database problem is to use the nsrim and nsrck tools. If those tools do not solve the issue, you have to proceed with scavenging the media database.

Follow the steps listed below to perform media database compression and consistency checks to correct possible media database corruption. You may also follow this procedure to simply compress the backup server’s indexes (Client File Index, and Media Database) as part of periodic system maintenance.

Media Database Consistency Check / Compression:

Ensure that there are no running jobs (backup, recovery, cloning, etc…)
Stop NetWorker daemons/services (NetWorker Backup and Recovery Server and NetWorker Remote Exec Service)
Empty the files in the /nsr/tmp directory
Delete the /nsr/mm/cmprssd file (which will force NetWorker to compress the media db)
Start NetWorker Services again (first start the NetWorker Remote Exec Service and then NetWorker Backup and Recovery Server)
Run the following command line: nsrim -X
Run the following command line: nsrck -L6
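The sequence above can be sketched as a small dry-run planner. This is only an illustration under stated assumptions: the function name compression_plan and the stop_cmd/start_cmd arguments are hypothetical placeholders, because the way the services are stopped and started differs between platforms and is not given here. Only the paths and the nsrim -X and nsrck -L6 commands come from the steps above. The sketch prints the plan and does not touch the system.

```python
from pathlib import PurePosixPath


def compression_plan(stop_cmd, start_cmd, nsr_root="/nsr"):
    """Return the ordered steps of the consistency check / compression
    procedure as (description, action) tuples. Nothing is executed.

    stop_cmd and start_cmd are placeholders supplied by the caller
    (assumption: they are platform specific and not given in the post).
    """
    root = PurePosixPath(nsr_root)
    return [
        ("Confirm no backup, recovery or cloning jobs are running", None),
        ("Stop NetWorker services", list(stop_cmd)),
        ("Empty the temporary directory", f"remove files in {root / 'tmp'}"),
        ("Force media database compression", f"delete {root / 'mm' / 'cmprssd'}"),
        ("Start NetWorker services", list(start_cmd)),
        ("Run nsrim", ["nsrim", "-X"]),
        ("Run nsrck", ["nsrck", "-L6"]),
    ]


if __name__ == "__main__":
    # The stop/start commands below are made-up placeholders.
    plan = compression_plan(["<stop-networker>"], ["<start-networker>"])
    for number, (description, action) in enumerate(plan, start=1):
        print(f"{number}. {description}: {action}")
```

Keeping the plan as data makes it easy to review the order before running anything by hand.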

If the problem has not been solved with the above steps, proceed with the following procedures for Media Database Scavenging:

Media Database Scavenging:

Make sure you have a bootstrap backup before commencing.
Ensure that there are no running jobs (backup, recovery, cloning, etc…)
Stop NetWorker daemons/services
Cause a “controlled corruption” by removing: /nsr/mm/mmvolume6/{*_i*,ss,vol}
BE VERY CAREFUL WITH THIS STEP. Please read it all before proceeding. In /nsr/mm/mmvolume6 delete all files EXCEPT these files (vol.0, ss.0, clients.0, vol.1, ss.1, clients.1, vol.2, ss.2, clients.2, VolHdr). You MUST keep those files.

NOTE: Be careful when deleting the files and make sure not to delete vol.0, ss.0, clients.0, vol.1, ss.1, clients.1, vol.2, ss.2, clients.2, vol.3, ss.3, clients.3, etc. These files are NEEDED to rebuild the media database. If you run (# rm ss*), this will delete ss.0 as well, and the media db is basically gone.

Note: this step is simply a way to force NetWorker to rebuild the media database. The files that are deleted can all be rebuilt from the media db files. The files you are cautioned above not to delete cannot be rebuilt, and NetWorker will not be able to rebuild your media db without them (ss.0, vol.0, VolHdr, clients.0, etc.), so take great care with this step.

6. Empty the files in the /nsr/tmp directory

7. Restart NetWorker daemons/services
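The keep/delete rule from the “controlled corruption” step can be sketched as a small classifier that only reports what would be kept and what would be removed. It is an illustration, not a tool: the function name split_mmvolume6 and the sample listing are hypothetical, and the keep pattern is an assumption derived from the file names quoted above (VolHdr plus the numbered vol.N, ss.N and clients.N files). It deletes nothing.

```python
import re

# Files that must survive: VolHdr plus the numbered segment files
# (vol.0, ss.0, clients.0, vol.1, ...). The pattern is an assumption
# derived from the file names quoted in the post.
KEEP_PATTERN = re.compile(r"(?:vol|ss|clients)\.\d+")


def split_mmvolume6(filenames):
    """Split a directory listing into (keep, delete) lists."""
    keep, delete = [], []
    for name in filenames:
        if name.lower() == "volhdr" or KEEP_PATTERN.fullmatch(name):
            keep.append(name)
        else:
            delete.append(name)
    return keep, delete


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    listing = ["VolHdr", "vol.0", "ss.0", "clients.0", "ss.1",
               "ss_i0", "vol_i1", "clients_i0", "ss", "vol"]
    keep, delete = split_mmvolume6(listing)
    print("keep:  ", keep)
    print("delete:", delete)
```

Listing the two groups first, and comparing them against the names in the note, is a cheap guard against a wildcard such as ss* catching ss.0.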

In the situation where the corruption is severe and the above procedure did not resolve the media database corruption, the media database can be recovered from the bootstrap using the mmrecov command to get a good copy of the media database. NetWorker software attempts to generate a bootstrap every day, so no more than one day of data should be lost.

November 5, 2012

NetWorker 8 Client Access Feature

EMC NetWorker version 8 includes many features that make this solid backup solution even better. One of those is the Client Direct feature, also known as direct file access. This technology allows NetWorker client agents to back up directly to an advanced file type device (AFTD) or to a Data Domain appliance using the DD Boost protocol.

Previously, no client backups could be written directly to AFTD devices and only the NetWorker Backup Server, Storage Nodes and specific database backup Modules could write directly to AFTD & DD Boost devices. The ability for client backup data to bypass the Storage Node and write directly over the IP network to Data Domain eliminates the Storage Node as a bottleneck during backups.

With DD Boost, in addition to client direct backup, Distributed Segment Processing (DSP) allows parts of the de-duplication process to be offloaded to the NetWorker clients, resulting in only new unique data segments being sent from clients to the Data Domain, dramatically reducing IP network traffic during backup. This will usually result in decreased I/O load on those clients during backup due to the lower processing requirements of data segmentation and fingerprinting versus having to read and transfer large volumes of data. This process would normally take place on the Storage Node, but with Client Direct enabled, the Storage Node simply manages the client without handling the backup workflow. If Client Direct is not available, the data is instead sent through the storage node to be deduplicated and then transferred to the Data Domain.
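The idea of sending only new unique segments can be illustrated with a toy model. This is not how DD Boost is implemented: fixed-size segments, SHA-256 fingerprints and an in-memory set standing in for the Data Domain are all simplifying assumptions, and the function names are made up for the sketch.

```python
import hashlib


def segments(data, size=4096):
    """Split data into fixed-size segments (a simplification)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def client_backup(data, known_fingerprints, size=4096):
    """Fingerprint each segment on the client and return only the
    segments whose fingerprint the target has not seen yet.

    known_fingerprints models the fingerprints already stored on the
    target; it is updated in place as new segments are 'sent'.
    """
    to_send = []
    for segment in segments(data, size):
        fingerprint = hashlib.sha256(segment).hexdigest()
        if fingerprint not in known_fingerprints:
            known_fingerprints.add(fingerprint)
            to_send.append(segment)
    return to_send


if __name__ == "__main__":
    stored = set()
    first = bytes(4096) + b"A" * 4096 + b"B" * 4096
    second = bytes(4096) + b"A" * 4096 + b"C" * 4096  # one segment changed
    print("segments sent, first backup: ", len(client_backup(first, stored)))
    print("segments sent, second backup:", len(client_backup(second, stored)))
```

In this model the second backup transfers only the changed segment, which is the effect the paragraph above describes for network traffic.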

For a demo of this feature, please check the following YouTube video:

http://www.youtube.com/watch?v=EmoYfnyNQ2g

April 30, 2012

Hotfixes included in NMM 2.3 build 109

The cumulative hotfix for NetWorker Module for Microsoft Applications version 2.3 build 109 has been released. This cumulative hotfix includes the following new fixes:

NW131394 – Cannot do incremental backup of Public Folder Database on EXCH2010 DAG
https://solutions.emc.com/emcsolutionview.asp?id=esg123776

NW130259 – Restore of Exchange Database hangs.

NW131966 – NMM: Exchange 2010 – PopUp shows “recovery database is failed to created”

NW134775 – nsrsnap_vss_save.exe causes Dr. Watson on Windows 2003 Server for Exchange 2003 backups

NW135097 – Exchange log files are not getting encrypted.
https://solutions.emc.com/emcsolutionview.asp?id=esg127991

NW134883 – NMM 2.3 backup of DPM Replica Volume D: of protected client fails at 26 GB consistently

NW134814 – Exchange 2010 DAG -normal & RDB recovery fails with No cover-sets found. (Problem with text case of server name)

NW135893 – not able to create RDB . Getting error Not able to mount database

NW135291 – Exchange 2010 backup fails if there are any old JET errors in the event viewer

NW135648 – NMM24:SQL dbs at mount point locations are not getting encrypted
https://solutions.emc.com/emcsolutionview.asp?id=esg127991

NW135789 – Recovery of an Exchange 2010 database from DataDomain device hangs before completing

NW132245 – irccd.exe intermittently core dumps during file system backup with NMM 2.3

NW132515 – irccd.exe crashing and Exchange 2010 backups failing

NW133418 – irccd.exe crashing intermittently during Exchange 2010 backups.

NW137010 – Intermittent crashing while browsing after successful GLR recovery

This new package is available at: ftp://ftp.legato.com/pub/NetWorker/NMM/Cumulative_Hotfixes/2.3/

April 2, 2012

Cumulative Hotfixes included in 7.6.3.3 build 870

The cumulative hotfix for NetWorker version 7.6.3.3 build 870 has been released. This cumulative hotfix includes the following new fixes:

NW136987: NMC SQL Anywhere temp files has World Writable Permissions

https://solutions.emc.com/emcsolutionview.asp?id=esg127831

NW136447: DPA reports negative saveset size due to large number provided by networker

NW136785: License Mgr unable to allocate a Cluster client license as a traditional client connection license

NW131790: messages truncated in windows nsrwatch with Japanese Language pack installed

NW133576: Client connection licenses are no longer being used for virtual clients

https://solutions.emc.com/emcsolutionview.asp?id=esg127859

NW132605: Illegal date format pop up message when updating Expiration time in the Groups Resource via NMC

https://solutions.emc.com/emcsolutionview.asp?id=esg125840

NW137085: Creating new devices by copy resets some attributes to default values

https://solutions.emc.com/emcsolutionview.asp?id=esg128102

NW127325: savepnpc:preclntsave executed twice

https://solutions.emc.com/emcsolutionview.asp?id=esg127920

NW136125: gstd cores after update to 7.6.2 on aix with powerpath installed

https://solutions.emc.com/emcsolutionview.asp?id=esg127828

NW136935: Concurrent Staging and Cloning : nsrclone loops forever even with:F option

NW136650: nsrvadp_save crashes when backing a VM with *FULL* that has millions of files with FLR enabled

https://solutions.emc.com/emcsolutionview.asp?id=esg128126

NW132964: lgtolm res corruption occures on license update in 7.6.2.4 prevents bootstrap backup

This build is available now at:

ftp://ftp.legato.com/pub/NetWorker/Cumulative_Hotfixes/7.6

March 20, 2012

Technical Alert: Avamar VMware image backups created with changed block tracking (CBT) enabled might not be able to be restored.

If you are using Avamar to back up VMware with Changed Block Tracking (CBT) enabled, you should check the following EMC Technical Alert. Under specific, limited circumstances, Avamar VMware image backups created with changed block tracking (CBT) enabled might not complete, or might not be restorable either through the file level recovery (FLR) functionality or as a full image restore.

For details, please check the following article:

http://solutions.emc.com/emcsolutionview.asp?id=esg127567

March 19, 2012

RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host. 74209:save: Quit signal received.

RPC error: RPC send operation failed; errno = An existing connection was forcibly closed by the remote host. 74209:save: Quit signal received.

This error message usually indicates either a firewall issue or a TCP timeout problem. Test the connectivity from the backup server to the client and vice versa using the following commands:

Note: Run each command with both the FQDN and the short name of the machine, and in both directions (server > client and client > server).

1- Ping

2- nslookup

3- rpcinfo -p MACHINENAME

4- nsradmin -v1 -s CLIENTNAME -p 390113 (This is from the backup server to the client machine)

If communication is good, check the ability to telnet your client from the backup server using 7937 and 7938 ports respectively.
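As a rough equivalent of the name resolution and telnet checks, the following sketch resolves each name given on the command line and tries a TCP connection to ports 7937 and 7938. It uses only the Python standard library; the function names are made up for the example, and it does not replace the rpcinfo or nsradmin checks above.

```python
import socket
import sys


def resolves(name):
    """Return True if the name resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except socket.gaierror:
        return False


def check_ports(host, ports=(7937, 7938), timeout=5.0):
    """Return {port: True/False} for TCP connectivity to host."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and resolution errors.
            results[port] = False
    return results


if __name__ == "__main__":
    # Pass both the FQDN and the short name as arguments.
    for name in sys.argv[1:]:
        ok = resolves(name)
        print(name, "resolves:", ok)
        if ok:
            print(name, "ports:", check_ports(name))
```

Run it once on the backup server against the client names and once on the client against the server names, so that both directions are covered.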

If all of these checks work correctly and you are still receiving that RPC error message, it is a TCP timeout issue. To avoid TCP timeout issues, I would suggest the following:

1- Change the group properties, and set “Client Inactivity Timeout” to “0”

2- Setting Keepalives to Prevent Timeouts:

https://solutions.emc.com/emcsolutionview.asp?id=esg91994

If you are using Windows operating systems for the Backup server and client, use the following recommendations as well:

1- Disable TCP Chimney on Server and Client

https://solutions.emc.com/emcsolutionview.asp?id=esg118437

2- Set TCP/IP keep alive settings at OS level.

https://solutions.emc.com/emcsolutionview.asp?id=esg60759

There are also Microsoft TCP/IP tuning recommendations that help avoid TCP timeout issues. You need to apply those recommendations on both the backup server and the client.

I would suggest applying these registry changes on Windows machines to avoid TCP/IP timeouts:

Note: You will have to reboot the machines after applying these changes for them to take effect.

Create the following DWORD values in the registry.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize=256000

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\GlobalMaxTcpWindowSize=16777216

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval=1000

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime=600000
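For applying the same values to several machines, one option is to generate a .reg file and import it on each host. The sketch below only builds the file text; the function name build_reg_file is made up, and the values are copied unchanged from the list above. Review the output before importing it, since it modifies system-wide TCP/IP parameters.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Values exactly as listed in the post (decimal).
TCP_VALUES = {
    "TcpWindowSize": 256000,
    "GlobalMaxTcpWindowSize": 16777216,
    "KeepAliveInterval": 1000,
    "KeepAliveTime": 600000,
}


def build_reg_file(key=KEY, values=TCP_VALUES):
    """Render DWORD values as the text of a .reg file."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, number in values.items():
        # .reg files store DWORDs as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(), end="")
```

Redirect the output to a file with a .reg extension and inspect it before importing.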

March 9, 2012

42619:nsrndmp_recover: NDMP Service Error: Unable to perform a single file restore on a deduplicated file. Perform a full VBB restore on a separate file system and copy the deduplicated files that you want.

Celerra NDMP VBB (NVB) file-by-file recovery of deduplicated file system fails.

When an NDMP file system backed up with VBB (aka NVB) is deduplicated using Celerra native deduplication, recovering the NDMP backup data with the file-by-file recovery method restores the file names, but the files contain 0 bytes. Recovery fails with the error message:

“42619:nsrndmp_recover: NDMP Service Error: Unable to perform a single file restore on a deduplicated file. Perform a full VBB restore on a separate file system and copy the deduplicated files that you want.”

NDMP Volume Backup (NVB), also referred to as Volume Based Backup (VBB), can back up Celerra Data Deduplication-enabled file systems and restore them in full by using the full destructive restore method. Because NVB operates at the block level (while preserving the history of the files it backs up), it does not cause any data reduplication when backing up a deduplicated file system. The data in the file system is backed up in its reduced form. This means that the benefits of the storage efficiency realized in the production file system flow through to backups.

However, Celerra does not support a single-file or file-by-file restore of deduplicated files from NVB backups. Hence, EMC recommends that NVB backups of deduplicated file systems should be used as part of a strategy where a single-file or file-by-file restore is done from locally or remotely replicated SnapSure checkpoints and not from tape. Because the majority of file restores happen within the first few days after their deletion, a SnapSure checkpoint is an efficient and faster way to restore most files.

The only possible way to recover Celerra VBB (NVB) backups of a deduplicated file system is to perform a Celerra Full Destructive Restore (FDR), which requires a complete save set recovery into a raw volume. Either the original file system that was backed up must be converted into a raw volume (which will destroy the production file system), or another raw volume of equal or larger size must exist or be created on the Celerra to direct the save set recovery into.

March 7, 2012

Considerations & Limitations for Shared Advanced File Type Device (AFTD)

In a Network Attached Storage (NAS) environment, NetWorker operations can be performed concurrently on two storage nodes that share an AFTD. When sharing an AFTD, one storage node can save to a writable volume, while the other storage node either recovers or clones from a read-only volume.

In a Storage Area Network (SAN), NetWorker operations are performed sequentially when two storage nodes share an AFTD. Only one storage node at a time can use the shared AFTD.

Check the following considerations and limitations before implementing a shared AFTD:

Shared AFTD considerations:

Review these considerations before sharing AFTDs:

Ensure that operating system permissions (directory and file) and sharing are set up properly between storage node hosts for the root user or Windows administrator, to enable proper sharing of AFTDs on the file system.
Use operating system commands to create, copy, or erase directories or files on the sharing disk (such as NAS, JBOD in SAN), to ensure that sharing is possible at file system level between machines. If the operating system does not permit such sharing, then the NetWorker software cannot change that.

To share an AFTD between storage nodes, make sure to:
- Set the proper Storage Node and Clone Storage Node attributes in the Client resource.
- Specify for the Staging resource, the proper Device attribute (the one with the read-only volume mounted).

To share an AFTD in NAS on Windows storage nodes, make sure to:
- Have the nsrexecd (NetWorker Remote Exec) service started by a Windows Administrator account on the storage node.
- Create the CIFS-mapped AFTD with UNC pathnames that have the appropriate remote user and password specified. An example of UNC path syntax is: rd=sn_a:\\nas1\path\shared_aftd

Shared AFTD limitations

Shared AFTDs include these limitations:

Read-only volumes might be auto-mounted onto the writing storage node during saves. The workaround is to un-mount all sharing instances of read-only and read-write AFTD volumes from all storage nodes, and then correctly remount them.

Limitations on sharing AFTD between storage nodes in NAS/SAN:
- Supports only homogeneous storage node platforms when sharing AFTD (such as Windows with Windows storage nodes, or UNIX with UNIX storage nodes).

The sharing read-write or read-only AFTD volume must be manually un-mounted from one storage node before being mounted on another, in order to prevent a potential out-of-sync state for the volume.

All instances of the sharing read-write or read-only AFTD volume must be manually un-mounted from all of the sharing storage nodes before relabeling the sharing AFTD, in order to prevent potential data loss.

For SAN only:

One storage node at a time can perform NetWorker operations.

Use operating system or SAN commands to mount or enable the sharing disk on the second storage node after un-mounting or disabling from the first storage node. This ensures that file and directory sharing of the same disk is supported (set up) in the SAN from the sharing storage nodes, allowing the sequential sharing of the disk as an AFTD in the NetWorker software.

March 7, 2012

The Supported and Not Supported in NMM & Data Domain Integration

To integrate NetWorker Module for Microsoft Applications with Data Domain, please make sure that the following integration prerequisites are met:

Integration requirements:

The NMM Data Domain integration requires the following software:

– NetWorker server 7.6 SP1 or later

– NetWorker storage node 7.6 SP1 or later

– NetWorker client 7.6 SP1 or later

– Data Domain Appliance with Data Domain OS version supported by NetWorker client installed on NMM 2.3

– Data Domain OS 4.8 or later for DD Boost functionality

Supported operating systems:

The supported operating systems are as follows:

– Windows 2003 (x86, x64)

– Windows 2003 R2 (x86, x64)

– Windows 2008 SP2 (x86, x64)

– Windows 2008 R2 (x64)

The Data Domain Boost on NMM client is supported for the following Microsoft applications:

– Exchange Server 2010 (x64)

– SharePoint Server 2010

– SQL Server 2008 R2 (x64)

Data Domain Boost on NMM client is NOT supported for the following Microsoft applications:

– Exchange Server 2003

– Exchange Server 2007

– SQL Server 2005

– SQL Server 2008

– SharePoint Server 2007

– Active Directory

– DPM Server

– Hyper-V
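The two lists above can be kept as a small lookup table, for example in a pre-deployment check script. The sketch below is only an illustration: the function name dd_boost_status is made up, and the table reflects nothing beyond the lists in this post (NMM 2.3).

```python
# Applications taken from the lists above.
SUPPORTED = {
    "Exchange Server 2010",
    "SharePoint Server 2010",
    "SQL Server 2008 R2",
}

NOT_SUPPORTED = {
    "Exchange Server 2003",
    "Exchange Server 2007",
    "SQL Server 2005",
    "SQL Server 2008",
    "SharePoint Server 2007",
    "Active Directory",
    "DPM Server",
    "Hyper-V",
}


def dd_boost_status(application):
    """Return 'supported', 'not supported' or 'not listed'."""
    if application in SUPPORTED:
        return "supported"
    if application in NOT_SUPPORTED:
        return "not supported"
    return "not listed"


if __name__ == "__main__":
    for app in ("Exchange Server 2010", "SQL Server 2008", "Oracle"):
        print(app, "->", dd_boost_status(app))
```

Exact string matching is deliberate, so that SQL Server 2008 and SQL Server 2008 R2 are never confused.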