

USR56K

join:2000-05-20
Lynnwood, WA

Tivoli TSM - delay in backup

Environment:
TSM client: v6.4r0, Windows Server 2008 R2 SP1
TSM Server: v6.2r4, AIX

Problem:
During the nightly server backups, there is often a 2-8 hour gap in the backup process. For example:


The blank line between 00:33 and 07:50 appears exactly that way in the log file, too. This random delay occurs on several of our servers. Upgrading the TSM client from v6.3 to v6.4 did not resolve it, and rebooting the TSM server last week did not help either. Nothing in the TSM log files or the Windows event logs points to a culprit.
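To spot gaps like this across many servers without eyeballing each log, a short script can flag consecutive timestamped entries that are far apart. This is a minimal sketch, assuming every entry starts with an `MM/DD/YYYY HH:MM:SS` timestamp (an assumption; adjust `TS_FORMAT` to match your actual client log). The sample lines are hypothetical placeholders, not real TSM messages.

```python
from datetime import datetime, timedelta

# Assumed layout: each entry starts with "MM/DD/YYYY HH:MM:SS"; adjust to your log.
TS_FORMAT = "%m/%d/%Y %H:%M:%S"
TS_LENGTH = 19  # characters in a timestamp such as "02/12/2013 00:33:00"


def parse_ts(line):
    """Return the timestamp at the start of a log line, or None if there isn't one."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(hours=1)):
    """Yield (before, after, gap) for consecutive entries more than `threshold` apart."""
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # blank or continuation line
        if prev is not None and ts - prev > threshold:
            yield prev, ts, ts - prev
        prev = ts


# Hypothetical placeholder lines around a 00:33 -> 07:50 gap.
sample = [
    "02/12/2013 00:33:00 <last entry before the gap>",
    "",
    "02/12/2013 07:50:00 <first entry after the gap>",
]
for before, after, gap in find_gaps(sample):
    print(f"{before} -> {after}: nothing logged for {gap}")
```

Run against the sample, it reports a single 7 hour 17 minute gap; the one-hour threshold is an arbitrary choice.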

Thoughts on the problem?

--
If it's not on Google, then it doesn't exist.

**DC++ FAQ**


dennismurphy
Put me on hold? I'll put YOU on hold
Premium
join:2002-11-19
Parsippany, NJ
said by USR56K:

Environment:
TSM client: v6.4r0 Windows 2008 R2 SP1.
TSM Server: v6.2r4, AIX

Problem:
During the nightly server backups, there is often a 2-8 hour gap in the backup process. For example:


The blank line between 00:33 and 07:50 appears exactly that way in the log file, too. This random delay occurs on several of our servers. Upgrading the TSM client from v6.3 to v6.4 did not resolve it, and rebooting the TSM server last week did not help either. Nothing in the TSM log files or the Windows event logs points to a culprit.

Thoughts on the problem?

Anything in the server log? Are you waiting on a mount point? Operator intervention?

What does your storage pool look like?

TSM is waaaaay too complex to troubleshoot anything from 'just' the client log. :)


USR56K

join:2000-05-20
Lynnwood, WA
said by dennismurphy:

Anything in the server log? Are you waiting on a mount point? Operator intervention?

And this is where my problem really lies. The team in charge of the TSM infrastructure either doesn't care or lacks the expertise to troubleshoot this problem, which has dragged on for the last 6 months. That's why I've turned to the Internet for troubleshooting tips; I'm fed up with waiting.

Since this problem occurs on many of our TSM clients, is it safe to assume the delay lies with the TSM server?


dennismurphy
Put me on hold? I'll put YOU on hold
Premium
join:2002-11-19
Parsippany, NJ
I would say, almost certainly.

Sounds like your clients are contending for mount points or available tape; do you know if the backups are staged to a disk pool or direct to tape?

Given that the delays are several hours long, the clients could be waiting for an operator to mount a tape or some such.

Edit: I've been a TSM admin in a previous life. Great tool, but complex.


USR56K

join:2000-05-20
Lynnwood, WA
Thanks for the pointers. I'll poke the TSM admin with these questions and see if they point them toward a fix.

HELLFIRE
Premium
join:2009-11-25
reply to dennismurphy
Got any other hints and tricks from your TSM days you'd be willing to share, dennismurphy?

I work in network operations, and one of my pet-peeve tickets is one that starts with a client log like the one USR56K posted and the comment "please check for network disconnects or slowdowns for this time period."

[insert long string of four-letter expletives here]

Regards


USR56K

join:2000-05-20
Lynnwood, WA
reply to USR56K
Got a copy of the server logs.


1. It appears the AIX server's clock is off, since it doesn't match the client's time. That shouldn't matter, right?
2. Oddly, at 12:00 it starts and then quickly ends two sessions, yet nothing else is logged in that gap until 7 AM...
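Whether or not the skew affects TSM itself, it makes correlating client and server logs error-prone. One way to line them up is to pick one event both machines record (for example, a session start) and shift every server timestamp by the measured offset. A minimal sketch; the timestamps below are hypothetical, not values from these logs, and network latency of a second or so is ignored.

```python
from datetime import datetime


def clock_offset(client_seen, server_seen):
    """Offset to add to server times so they read on the client's clock.

    Both arguments are the same event (e.g. one session start) as logged
    on each machine.
    """
    return client_seen - server_seen


def to_client_clock(server_ts, offset):
    """Translate a server timestamp onto the client's clock."""
    return server_ts + offset


# Hypothetical timestamps for one session start seen on both machines.
client_seen = datetime(2013, 2, 12, 0, 30, 0)
server_seen = datetime(2013, 2, 12, 0, 2, 30)
offset = clock_offset(client_seen, server_seen)
print(offset)  # 0:27:30
print(to_client_clock(datetime(2013, 2, 12, 7, 0, 0), offset))
```

If the server clock runs ahead instead, the offset simply comes out negative.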

As for the infrastructure, per the admin: "There is no waiting for a mount point because all data goes to the disk pool. Data is sent directly to tape only if an individual file exceeds 350 GB (large snapshots/databases). The disk pool is locally attached storage; it is 7.25 TB and rarely fills up at night."
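The admin's description boils down to a size-based routing rule, which makes it easy to reason about when a client could still end up waiting on tape. A rough model only: the 350 GB threshold and 7.25 TB pool size come from the admin's quote, the overflow-to-tape behaviour when the disk pool is full is an assumption, and the names are hypothetical; this is not how the TSM server implements it internally.

```python
GB = 1024**3
TB = 1024**4

MAX_FILE_TO_DISK = 350 * GB       # per the admin: larger files go straight to tape
DISK_POOL_CAPACITY = int(7.25 * TB)


def destination(file_size, disk_pool_used):
    """Return 'disk' or 'tape' for one file under the assumed routing rule."""
    if file_size > MAX_FILE_TO_DISK:
        return "tape"  # needs a mount point
    if disk_pool_used + file_size > DISK_POOL_CAPACITY:
        return "tape"  # assumed overflow when the disk pool is full
    return "disk"


print(destination(2 * GB, disk_pool_used=1 * TB))  # disk
print(destination(400 * GB, disk_pool_used=0))     # tape
```

Under this model, a Windows system state backup on an otherwise idle night should never need a mount point, which fits the admin's claim.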



dennismurphy
Put me on hold? I'll put YOU on hold
Premium
join:2002-11-19
Parsippany, NJ
reply to USR56K
I wonder if it has something to do with the WMI Writer backup. It seems the backup session runs OK, and then once it starts backing up the WMI Writer, the session terminates for lack of data half an hour later.

A quick Google search suggests asking you for the output of 'vssadmin list writers' to see what state that writer is in...

»www.windows-server-answers.com/m ··· ion.aspx
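Rather than reading `vssadmin list writers` output by eye on every server, it can be saved to a text file and scanned for writers that are not stable or report an error. A minimal sketch, assuming the usual English-language layout with `Writer name:`, `State:` and `Last error:` lines; the sample is abbreviated and illustrative, not taken from the poster's servers.

```python
def problem_writers(text):
    """Return {writer: (state, last_error)} for writers not Stable or reporting an error."""
    writers = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            name = line.split(":", 1)[1].strip().strip("'")
            writers[name] = {"state": "?", "error": "?"}
        elif name is not None and line.startswith("State:"):
            writers[name]["state"] = line.split(":", 1)[1].strip()
        elif name is not None and line.startswith("Last error:"):
            writers[name]["error"] = line.split(":", 1)[1].strip()
    return {
        n: (w["state"], w["error"])
        for n, w in writers.items()
        if not w["state"].endswith("Stable") or w["error"] != "No error"
    }


# Abbreviated, illustrative output (Writer Id lines omitted).
sample = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'WMI Writer'
   State: [8] Failed
   Last error: Timed out
"""
print(problem_writers(sample))
```

Lines such as `Writer Id:` are simply ignored, so the full command output can be fed in unchanged.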

I'm no Windows expert, but I wonder if you have shadow copies of your volumes that are hanging the backup?

That's what it sounds like, but what do I know?

Otherwise, the server log looks clean ... No retries, no failures ...


dennismurphy
Put me on hold? I'll put YOU on hold
Premium
join:2002-11-19
Parsippany, NJ
reply to HELLFIRE
said by HELLFIRE:

Got any other hints and tricks from your TSM days you'd be willing to share, dennismurphy?

I work in network operations, and one of my pet-peeve tickets is one that starts with a client log like the one USR56K posted and the comment "please check for network disconnects or slowdowns for this time period."

[insert long string of four-letter expletives here]

Regards

Sure... Learn, study, and figure out what TSM is telling you with the 'q actlog' command. All the data's in there; you just have to figure out what it all means.
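One way to start making sense of saved `q actlog` output is to tally it by message number, so unusual messages such as ANR0482W (session terminated for being idle) stand out. A minimal sketch, assuming each line carries an `ANRnnnnX` message ID and an optional `(SESSION: n)` tag; the sample is the pair of ANR lines quoted later in this thread.

```python
import re
from collections import Counter

MSG_ID = re.compile(r"\b(ANR\d{4}[A-Z])\b")
SESSION = re.compile(r"\(SESSION: (\d+)\)")


def summarize(lines):
    """Count activity-log lines per message ID and collect idle-terminated sessions."""
    counts = Counter()
    idle_sessions = []
    for line in lines:
        m = MSG_ID.search(line)
        if m is None:
            continue
        counts[m.group(1)] += 1
        if m.group(1) == "ANR0482W":  # session terminated for being idle
            s = SESSION.search(line)
            if s:
                idle_sessions.append(int(s.group(1)))
    return counts, idle_sessions


lines = [
    "2/12/13 12:32:41 AM GMT-08:00 ANR0482W Session 78405 for node SERVERNAME (WinNT) "
    "terminated - idle for more than 30 minutes. (SESSION: 78405)",
    "2/12/13 6:48:40 AM GMT-08:00 ANR0406I Session 81625 started for node SERVERNAME "
    "(WinNT) (Tcp/Ip 128.X.X.X(58701)). (SESSION: 81625)",
]
counts, idle = summarize(lines)
print(counts.most_common())
print("idle-terminated sessions:", idle)
```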

HELLFIRE
Premium
join:2009-11-25
reply to USR56K

2/12/13 12:32:41 AM GMT-08:00 ANR0482W Session 78405 for node SERVERNAME (WinNT) terminated - idle for more than 30 minutes. (SESSION: 78405)
2/12/13 6:48:40 AM GMT-08:00 ANR0406I Session 81625 started for node SERVERNAME (WinNT) (Tcp/Ip 128.X.X.X(58701)). (SESSION: 81625)

I'm scratching my head about the log timestamps as well. Rule #1 of ANY log review: GET A COMMON TIMEZONE SETTING FOR THE LOGS. That said, I'd say your 12:32:41 entry has your answer right there. Now, why it took 6 hours to start back up is the million-dollar question.
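Following that rule, the two entries above can be converted to UTC before being compared with logs from other machines. A small sketch using the timestamps and GMT-08:00 offset exactly as logged; the parsing format is an assumption based on those two lines only.

```python
from datetime import datetime, timezone

# Timestamp layout as it appears in the two actlog lines above.
FMT = "%m/%d/%y %I:%M:%S %p GMT%z"


def parse(stamp):
    """Parse an actlog timestamp into a timezone-aware datetime."""
    return datetime.strptime(stamp, FMT)


idle_end = parse("2/12/13 12:32:41 AM GMT-08:00")   # ANR0482W, session 78405
next_start = parse("2/12/13 6:48:40 AM GMT-08:00")  # ANR0406I, session 81625

print(idle_end.astimezone(timezone.utc))  # 2013-02-12 08:32:41+00:00
print(next_start - idle_end)              # 6:15:59
```

Once every log is expressed in UTC, client and server entries can be lined up directly, regardless of how each machine's clock or time zone is set.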

@dennismurphy
...will keep that in mind for the next ticket like that.

Regards


USR56K

join:2000-05-20
Lynnwood, WA
reply to USR56K
Over the past few weeks I've been trying different dsm.opt settings across a handful of servers. So far, the only thing that has 'fixed' the problem is disabling SYSTEMSTATE backups. Now a daily incremental backup takes only 2-5 minutes.

Since all the servers are VMware VMs with daily snapshots, we'll just have to rely on those in case of a catastrophic OS failure.

The time on the AIX server still isn't fixed, either.