dslreports logo
site
 
    All Forums Hot Topics Gallery
spc

spacer




how-to block ads


Search Topic:
uniqs
2552
share rss forum feed


USR56K

join:2000-05-20
Lynnwood, WA

Tivoli TSM - delay in backup

Environment:
TSM client: v6.4r0 Windows 2008 R2 SP1.
TSM Server: v6.2r4, AIX

Problem:
During the nightly backups of severs, there is often a 2-8 hour gap during the backup process. For example:

<snip>
02/11/2013 00:33:08 ANS1898I ***** Processed    30,000 files *****
02/11/2013 00:33:20 Normal File-->        20,975,616 \\SERVERNAME\c$\Windows\System32\winevt\Logs\Security.evtx [Sent]      
02/11/2013 00:33:20 Successful incremental backup of '\\SERVERNAME\c$'
 
02/11/2013 07:50:12 Directory-->                   0 \\?\volume{6fc354c4-f72f-11df-83a1-806e6f6e6963}\Boot [Sent]      
02/11/2013 07:50:12 Normal File-->            28,672 \\?\volume{6fc354c4-f72f-11df-83a1-806e6f6e6963}\Boot\BCD [Sent]   
<snip>
 

That blank line between 00:33 and 07:50 is exactly how it appears in the log file too. This random delay, occurs on multiple of our servers. Upgrading the TSM client from v6.3 to v6.4 did not resolve this issue. The TSM server was rebooted last week, which also did not help. Nothing stands out in the TSM log files or windows event logs for the culprit.

Thoughts on the problem?

--
If it's not on Google, then it doesn't exist.

**DC++ FAQ**


dennismurphy
Put me on hold? I'll put YOU on hold
Premium
join:2002-11-19
Parsippany, NJ
kudos:3
Reviews:
·Verizon FiOS
said by USR56K:

Environment:
TSM client: v6.4r0 Windows 2008 R2 SP1.
TSM Server: v6.2r4, AIX

Problem:
During the nightly backups of severs, there is often a 2-8 hour gap during the backup process. For example:

<snip>
02/11/2013 00:33:08 ANS1898I ***** Processed    30,000 files *****
02/11/2013 00:33:20 Normal File-->        20,975,616 \\SERVERNAME\c$\Windows\System32\winevt\Logs\Security.evtx [Sent]      
02/11/2013 00:33:20 Successful incremental backup of '\\SERVERNAME\c$'
 
02/11/2013 07:50:12 Directory-->                   0 \\?\volume{6fc354c4-f72f-11df-83a1-806e6f6e6963}\Boot [Sent]      
02/11/2013 07:50:12 Normal File-->            28,672 \\?\volume{6fc354c4-f72f-11df-83a1-806e6f6e6963}\Boot\BCD [Sent]   
<snip>
 

That blank line between 00:33 and 07:50 is exactly how it appears in the log file too. This random delay, occurs on multiple of our servers. Upgrading the TSM client from v6.3 to v6.4 did not resolve this issue. The TSM server was rebooted last week, which also did not help. Nothing stands out in the TSM log files or windows event logs for the culprit.

Thoughts on the problem?

Anything on the server log? Are you waiting on a mountpoint? Operator intervention?

What's your storage pool look like?

TSM is waaaaay too complex to troubleshoot anything from 'just' the client log. :)


USR56K

join:2000-05-20
Lynnwood, WA
said by dennismurphy:

Anything on the server log? Are you waiting on a mountpoint? Operator intervention?

And this is where my problem really lies. The team who is in charge of managing the TSM infrastructure either doesn't care or have the expertise into troubleshooting into this problem for the last 6 months... which is why I've turned to the Internet for tips on troubleshooting as I'm fed up on waiting.

Since this problem is occurring on many of our TSM clients, would it be safe to assume the delay must reside with the TSM server?
--
If it's not on Google, then it doesn't exist.

**DC++ FAQ**


dennismurphy
Put me on hold? I'll put YOU on hold
Premium
join:2002-11-19
Parsippany, NJ
kudos:3
Reviews:
·Verizon FiOS
I would say, almost certainly.

Sounds like your clients are contending for mount points or available tape; do you know if the backups are staged to a disk pool or direct to tape?

Given the delays are several hours in length, could be waiting for an operator to mount a tape or somesuch.

Edit: I've been a TSM admin in a previous life. Great tool, but complex.


USR56K

join:2000-05-20
Lynnwood, WA
Thanks for the pointers. I'll poke the TSM admin with the questions and see if it helps to lead them somewhere to solve the problem.
--
If it's not on Google, then it doesn't exist.

**DC++ FAQ**

HELLFIRE
Premium
join:2009-11-25
kudos:18
reply to dennismurphy
Got any other hints and tricks from your TSM days you'd be willing to share dennismurphy See Profile?

I work in network operations, and one of my pet peeve tickets is one that starts with the client log that USR56K See Profile
posted and the comment "please check for network disconnects or slowdowns for this timeperiod."

[insert long string of four-letter expletives here]

Regards


USR56K

join:2000-05-20
Lynnwood, WA
reply to USR56K
Got a copy of the server logs.

Date/Time Message 
2/11/13 11:56:26 PM GMT-08:00 ANR2561I Schedule prompter contacting SERVERNAME (session 78376) to start a scheduled operation. (SESSION: 155) 
2/11/13 11:57:09 PM GMT-08:00 ANR0403I Session 78376 ended for node SERVERNAME (). (SESSION: 155) 
2/11/13 11:57:18 PM GMT-08:00 ANR0406I Session 78403 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(62089)). (SESSION: 78403) 
2/11/13 11:57:18 PM GMT-08:00 ANR0403I Session 78403 ended for node SERVERNAME (WinNT). (SESSION: 78403) 
2/11/13 11:57:25 PM GMT-08:00 ANR0406I Session 78405 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(62101)). (SESSION: 78405) 
2/11/13 11:57:39 PM GMT-08:00 ANR0406I Session 78408 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(62115)). (SESSION: 78408) 
2/11/13 11:58:26 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;ASR Writer&apos; component &apos;ASR&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:26 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;ASR Writer&apos; component &apos;This is the path to the boot BCD store and the boot managers.All the files in this directory need to be backed up.&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:26 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;ASR Writer&apos; component &apos;Select this volume (\??\Volume{6fc354c4-f72f-11df-83a1-806e6f6e6963}) if it is a critical volume.Critical volumes are those which ASR must restore.&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:26 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;ASR Writer&apos; component &apos;Select this volume (C:) if it is a critical volume.Critical volumes are those which ASR must restore.&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:27 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;ASR Writer&apos; component &apos;Select to include disk #0 (of signature 0x75a30edb) for ASR restore.&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:27 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;ASR Writer&apos; component &apos;Select to include disk #1 (of signature 0x99fd3133) for ASR restore.&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:27 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;COM+ REGDB Writer&apos; component &apos;COM+ REGDB&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:27 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;Performance Counters Writer&apos; component &apos;Performance Counters&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:27 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;Registry Writer&apos; component &apos;Registry&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:27 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;System Writer&apos; component &apos;System Files&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:58 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;Task Scheduler Writer&apos; component &apos;Tasks Store&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:58 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;VSS Metadata Store Writer&apos; component &apos;VSS Express Writer Metadata Store&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:58 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;IIS Config Writer&apos; component &apos;IISCONFIG&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:58 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;IIS Metabase Writer&apos; component &apos;IISMETABASE&apos; using shadow copy.(SESSION: 78405) 
2/11/13 11:58:58 PM GMT-08:00 ANE4940I (Session: 78405, Node: SERVERNAME) Backing up object &apos;WMI Writer&apos; component &apos;Windows Managment Instrumentation&apos; using shadow copy.(SESSION: 78405) 
2/12/13 12:00:21 AM GMT-08:00 ANR0406I Session 78445 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(62169)). (SESSION: 78445) 
2/12/13 12:00:46 AM GMT-08:00 ANR0406I Session 78448 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(62171)). (SESSION: 78448) 
2/12/13 12:02:05 AM GMT-08:00 ANR0403I Session 78445 ended for node SERVERNAME (WinNT). (SESSION: 78445) 
2/12/13 12:02:32 AM GMT-08:00 ANR0403I Session 78448 ended for node SERVERNAME (WinNT). (SESSION: 78448) 
2/12/13 12:32:41 AM GMT-08:00 ANR0482W Session 78405 for node SERVERNAME (WinNT) terminated - idle for more than 30 minutes. (SESSION: 78405) 
2/12/13 6:48:40 AM GMT-08:00 ANR0406I Session 81625 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(58701)). (SESSION: 81625) 
2/12/13 6:48:49 AM GMT-08:00 ANR0406I Session 81626 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(58703)). (SESSION: 81626) 
2/12/13 6:49:20 AM GMT-08:00 ANR0403I Session 81625 ended for node SERVERNAME (WinNT). (SESSION: 81625) 
2/12/13 7:18:31 AM GMT-08:00 ANR0406I Session 81783 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(59537)). (SESSION: 81783) 
2/12/13 7:18:31 AM GMT-08:00 ANR0403I Session 78408 ended for node SERVERNAME (WinNT). (SESSION: 78408) 
2/12/13 7:18:36 AM GMT-08:00 ANR0403I Session 81783 ended for node SERVERNAME (WinNT). (SESSION: 81783) 
2/12/13 7:19:07 AM GMT-08:00 ANE4941I (Session: 81626, Node: SERVERNAME) Backup of object &apos;SystemState&apos; component &apos;System State&apos; finished successfully. (SESSION: 81626) 
2/12/13 7:19:08 AM GMT-08:00 ANR0403I Session 81626 ended for node SERVERNAME (WinNT). (SESSION: 81626) 
2/12/13 7:19:15 AM GMT-08:00 ANR0406I Session 81786 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(59548)). (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4952I (Session: 81786, Node: SERVERNAME) Total number of objects inspected: 99,188 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4951I (Session: 81786, Node: SERVERNAME) Total number of objects assigned: 68,714 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4954I (Session: 81786, Node: SERVERNAME) Total number of objects backed up: 817 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4958I (Session: 81786, Node: SERVERNAME) Total number of objects updated: 0 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4960I (Session: 81786, Node: SERVERNAME) Total number of objects rebound: 0 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4957I (Session: 81786, Node: SERVERNAME) Total number of objects deleted: 0 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4970I (Session: 81786, Node: SERVERNAME) Total number of objects expired: 9 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4959I (Session: 81786, Node: SERVERNAME) Total number of objects failed: 0 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4965I (Session: 81786, Node: SERVERNAME) Total number of subfile objects: 0 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4977I (Session: 81786, Node: SERVERNAME) Total number of bytes inspected: 20.00 GB (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4961I (Session: 81786, Node: SERVERNAME) Total number of bytes transferred: 435.97 MB (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4963I (Session: 81786, Node: SERVERNAME) Data transfer time: 30.56 sec (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4966I (Session: 81786, Node: SERVERNAME) Network data transfer rate: 14,607.59 KB/sec (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4967I (Session: 81786, Node: SERVERNAME) Aggregate data transfer rate: 16.84 KB/sec (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4968I (Session: 81786, Node: SERVERNAME) Objects compressed by: 0% (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4976I (Session: 81786, Node: SERVERNAME) Total data reduction ratio: 97.88%% (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4969I (Session: 81786, Node: SERVERNAME) Subfile objects reduced by: 0% (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANE4964I (Session: 81786, Node: SERVERNAME) Elapsed processing time: 07:21:48 (SESSION: 81786) 
2/12/13 7:19:15 AM GMT-08:00 ANR0403I Session 81786 ended for node SERVERNAME (WinNT). (SESSION: 81786) 
2/12/13 7:19:18 AM GMT-08:00 ANR0406I Session 81787 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(59549)). (SESSION: 81787) 
2/12/13 7:19:18 AM GMT-08:00 ANR2507I Schedule NIGHTLY_INCR_MID for domain DATABASE started at 02/11/13 23:55:00 for node SERVERNAME completed successfully at 02/12/13 07:19:18. (SESSION: 81787) 
2/12/13 7:19:18 AM GMT-08:00 ANR0403I Session 81787 ended for node SERVERNAME (WinNT). (SESSION: 81787) 
2/12/13 7:19:18 AM GMT-08:00 ANR0406I Session 81788 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(59550)). (SESSION: 81788) 
2/12/13 7:19:18 AM GMT-08:00 ANR0403I Session 81788 ended for node SERVERNAME (WinNT). (SESSION: 81788) 
 

1. Appears the AIX server time is off, as it doesnt match the time of the client. That shouldn't matter, right?
2. Odd at 12:00 it starts and then quickly ends 2 sessions? Yet nothing else logged in that gap until 7am...

As for the infrastructure per the admin, "There is no waiting for a mountpoint because all data is going to the diskpool. Data is only sent directly to tape if the size of the individual file exceeds 350G (large snapshots/databases). Data is written to a diskpool that is locally attached storage. It is 7.25 Tb of storage and rarely fills up at night."

--
If it's not on Google, then it doesn't exist.

**DC++ FAQ**


dennismurphy
Put me on hold? I'll put YOU on hold
Premium
join:2002-11-19
Parsippany, NJ
kudos:3
Reviews:
·Verizon FiOS
reply to USR56K
I wonder if it has something to do with the WMI Writer backup? Seems that backup session runs OK and then once it starts WMI Writer backups, it terminates due to lack of data 1/2 hour later.

Quick google search asks me to ask you for the output from 'vssadmin list writers' and see what the state is for it ...

»www.windows-server-answers.com/m···ion.aspx

I'm no Windows expert, but I wonder if you have shadow copies of your volumes that are hanging the backup?

That's what it sounds like, but what do I know?

Otherwise, the server log looks clean ... No retries, no failures ...


dennismurphy
Put me on hold? I'll put YOU on hold
Premium
join:2002-11-19
Parsippany, NJ
kudos:3
Reviews:
·Verizon FiOS
reply to HELLFIRE
said by HELLFIRE:

Got any other hints and tricks from your TSM days you'd be willing to share dennismurphy See Profile?

I work in network operations, and one of my pet peeve tickets is one that starts with the client log that USR56K See Profile
posted and the comment "please check for network disconnects or slowdowns for this timeperiod."

[insert long string of four-letter expletives here]

Regards

Sure ... Learn, study, and figure out what TSM is telling you with the 'q actlog' command .... All the data's in there, just have to figure out what it all means

HELLFIRE
Premium
join:2009-11-25
kudos:18
reply to USR56K

2/12/13 12:32:41 AM GMT-08:00 ANR0482W Session 78405 for node SERVERNAME (WinNT) terminated - idle for more than 30 minutes. (SESSION: 78405)
2/12/13 6:48:40 AM GMT-08:00 ANR0406I Session 81625 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(58701)). (SESSION: 81625)

Scratching my head as well about the log timestamps -- rule #1 of ANY log review, GET A COMMON TIMEZONE SETTING FOR
THE LOGS
-- but I'd say your 12:32:41 has your answer right there. Now why it took 6hours to start back up is the million
dollar question.

@ dennismurphy See Profile
...will keep that in mind for the next ticket like that that comes around.

Regards


USR56K

join:2000-05-20
Lynnwood, WA
reply to USR56K
I've been trying different settings in dsm.opt over the past few weeks across a handful of servers. So far, the only thing which has 'fixed' the problem is disabling SYSTEMSTATE backups. Now an daily incremental backup only takes 2-5 minutes.

As all the servers are VMware with daily snapshots, will just have to rely on those in case of a catastrophic OS failure.

Time still not fixed on the AIX server either.
--
If it's not on Google, then it doesn't exist.

**DC++ FAQ**